The Global API Latency Map: Where Are APIs Fastest?
Your API might respond in 50ms from your office. But what about from São Paulo? Lagos? Jakarta? For global applications, latency isn't one number — it's a map. Geography determines whether your API feels instant or sluggish, and the difference between regions can be 10x.
How API Latency Works
The Physics Problem
Light travels through fiber optic cable at roughly 200,000 km/s — about two-thirds the speed of light in a vacuum. That means:
| Route | Distance | Fiber Minimum (one-way) | Realistic RTT |
|---|---|---|---|
| NYC → London | 5,500 km | 28ms | 70-80ms |
| NYC → Tokyo | 10,800 km | 54ms | 150-180ms |
| NYC → Sydney | 16,000 km | 80ms | 200-250ms |
| NYC → Mumbai | 12,500 km | 63ms | 180-220ms |
| London → Singapore | 10,800 km | 54ms | 160-200ms |
| São Paulo → Tokyo | 18,500 km | 93ms | 280-320ms |
These are theoretical floors. Real-world latency adds DNS lookup (10-50ms), the TLS handshake (30-100ms), server processing, and multiple network hops.
What Makes Up API Latency
Total latency = DNS + TCP + TLS + Network transit + Server processing + Response transfer
Example (NYC → London API call, ~80ms RTT):
DNS lookup: 10ms (cached) or 50ms (cold)
TCP handshake: 80ms (one RTT)
TLS handshake: 160ms (two RTTs for TLS 1.2; TLS 1.3 needs one)
Request transit: 40ms
Server processing: 20ms
Response transit: 40ms
─────────────────────
Total: ~390ms (cold) or ~100ms (warm, with connection reuse and cached DNS)
API Latency by Region
Cloud Region Performance (2026)
Average response times for a simple REST API (CRUD, <1KB response) from each region:
| Origin Region | To US East | To EU West | To Asia East | To Australia | To South America |
|---|---|---|---|---|---|
| US East (Virginia) | 5-15ms | 70-90ms | 150-200ms | 200-250ms | 120-160ms |
| US West (Oregon) | 60-80ms | 130-160ms | 100-140ms | 150-180ms | 150-180ms |
| EU West (Ireland) | 70-90ms | 5-15ms | 200-250ms | 280-320ms | 180-220ms |
| EU Central (Frankfurt) | 80-100ms | 10-20ms | 180-220ms | 260-300ms | 200-240ms |
| Asia East (Tokyo) | 150-200ms | 200-250ms | 5-15ms | 100-130ms | 280-320ms |
| Asia Southeast (Singapore) | 200-240ms | 160-200ms | 30-50ms | 80-120ms | 300-350ms |
| Australia (Sydney) | 200-250ms | 280-320ms | 100-130ms | 5-15ms | 300-350ms |
| South America (São Paulo) | 120-160ms | 180-220ms | 280-320ms | 300-350ms | 5-15ms |
The Multi-Region Gap
A single-region API has wildly different performance for global users:
API hosted in US East (Virginia):
User in New York: 15ms ✅
User in London: 80ms ⚠️ Noticeable
User in Tokyo: 180ms ❌ Feels slow
User in Sydney: 230ms ❌ Frustrating
User in São Paulo: 140ms ⚠️ Noticeable
Rule of thumb: Users notice latency above 100ms. Above 300ms feels broken.
How Major API Providers Perform
Global Latency Benchmarks
| Provider | Regions | Median Global Latency | P99 Latency | Edge PoPs |
|---|---|---|---|---|
| Cloudflare | 300+ cities | 15-30ms | 50-80ms | 300+ |
| AWS CloudFront | 60+ regions | 20-40ms | 60-100ms | 400+ |
| Google Cloud CDN | 40+ regions | 25-45ms | 70-120ms | 180+ |
| Fastly | 30+ regions | 15-35ms | 50-90ms | 90+ |
| Azure CDN | 120+ PoPs | 25-50ms | 80-130ms | 120+ |
API-Specific Provider Latency
| API Provider | Architecture | US Latency | EU Latency | Asia Latency | Global Avg |
|---|---|---|---|---|---|
| Stripe | Multi-region | 50-80ms | 50-80ms | 100-150ms | 80-100ms |
| Twilio | Multi-region | 60-100ms | 60-100ms | 120-180ms | 90-120ms |
| Auth0 | Multi-region | 80-120ms | 80-120ms | 150-250ms | 100-150ms |
| Algolia | Distributed search | 10-30ms | 10-30ms | 20-50ms | 15-35ms |
| OpenAI | US primary | 200-500ms | 250-600ms | 300-700ms | 300-600ms |
| Anthropic | US primary | 200-400ms | 250-500ms | 300-600ms | 250-500ms |
Note: AI API latency is dominated by inference time, not network latency. The 200-600ms is mostly compute, not geography.
Strategies for Reducing Global Latency
1. Deploy to Multiple Regions
// Multi-region deployment with geo-routing
// DNS routes users to the nearest region automatically

// vercel.json — deploys functions to the listed regions
{
  "regions": ["iad1", "cdg1", "hnd1", "syd1", "gru1"]
}

// Or with Cloudflare Workers — runs in 300+ cities
export default {
  async fetch(request: Request) {
    // This code runs in the city nearest the user;
    // getFromNearestDB is a placeholder for your data layer
    const data = await getFromNearestDB(request);
    return Response.json(data);
  },
};
2. Edge Caching
// Cache API responses at the edge — CDN serves from nearest PoP
export async function GET(request: Request) {
  const data = await fetchProducts(); // fetchProducts: your data layer
  return Response.json(data, {
    headers: {
      'Cache-Control': 'public, s-maxage=60, stale-while-revalidate=300',
      'CDN-Cache-Control': 'public, max-age=300',
      'Surrogate-Control': 'max-age=3600',
    },
  });
}
// Impact:
// Without cache: 200ms (origin in Virginia → user in Tokyo)
// With edge cache: 15ms (Tokyo PoP → user in Tokyo)
3. Edge Database Patterns
| Solution | Type | Read Latency | Write Latency | Consistency |
|---|---|---|---|---|
| Cloudflare D1 | SQLite at edge | <5ms | <10ms (local) | Eventual |
| Turso | Distributed SQLite | 5-15ms | 50-100ms | Strong |
| PlanetScale | Distributed MySQL | 10-30ms | 30-80ms | Strong |
| Neon | Serverless Postgres | 20-50ms | 30-80ms | Strong |
| DynamoDB Global Tables | Key-value | 5-10ms | 50-100ms | Eventual |
| CockroachDB | Distributed SQL | 10-30ms | 50-150ms | Strong |
4. Connection Optimization
// Reduce handshake overhead with connection reuse.
// Node's https.Agent keeps HTTP/1.1 connections alive; HTTP/2 goes
// further by multiplexing many requests over one connection.
import https from 'node:https';

const agent = new https.Agent({
  keepAlive: true,
  maxSockets: 10,
  keepAliveMsecs: 60000,
});
// Savings per request with warm connection:
// Cold: DNS(50) + TCP(80) + TLS(160) + Transit(80) = 370ms
// Warm: Transit(80) = 80ms (78% reduction)
5. Regional API Routing
// Route API calls to nearest provider region
const REGION_ENDPOINTS = {
  us: 'https://us.api.example.com',
  eu: 'https://eu.api.example.com',
  ap: 'https://ap.api.example.com',
};

function getNearestEndpoint(userRegion: string): string {
  if (['US', 'CA', 'MX'].includes(userRegion)) return REGION_ENDPOINTS.us;
  if (['GB', 'DE', 'FR', 'NL', 'IT', 'ES'].includes(userRegion)) return REGION_ENDPOINTS.eu;
  return REGION_ENDPOINTS.ap;
}
// Or use DNS-based routing (Route 53, Cloudflare Load Balancing)
// Automatically routes to nearest healthy endpoint
6. Prefetching and Preconnecting
<!-- Reduce perceived latency with resource hints -->
<!-- DNS prefetch — saves 50ms on first request -->
<link rel="dns-prefetch" href="https://api.example.com" />
<!-- Preconnect — saves 200ms+ (DNS + TCP + TLS) -->
<link rel="preconnect" href="https://api.example.com" />
<!-- Prefetch data — fetches before user needs it -->
<link rel="prefetch" href="https://api.example.com/v1/config" />
// Client-side: preconnect to the API on page load
if (typeof window !== 'undefined') {
  const link = document.createElement('link');
  link.rel = 'preconnect';
  link.href = 'https://api.example.com';
  document.head.appendChild(link);
}
Measuring API Latency
Tools
| Tool | What It Measures | Coverage |
|---|---|---|
| Pingdom | HTTP response time | 100+ locations |
| Uptrends | Full waterfall timing | 230+ checkpoints |
| Checkly | API monitoring from multiple regions | 20+ regions |
| Catchpoint | Network-level latency | 2,800+ nodes |
| ThousandEyes | Path analysis and latency | ISP-level visibility |
| Cloudflare Radar | Internet latency trends | Global |
DIY Global Latency Test
// Test your API from multiple regions using Cloudflare Workers
export default {
  async fetch(request: Request) {
    const endpoints = [
      { region: 'US East', url: 'https://us-east.api.example.com/health' },
      { region: 'EU West', url: 'https://eu-west.api.example.com/health' },
      { region: 'Asia', url: 'https://asia.api.example.com/health' },
    ];
    const results = await Promise.all(
      endpoints.map(async ({ region, url }) => {
        const start = performance.now();
        const response = await fetch(url);
        const latency = performance.now() - start;
        return { region, latency: Math.round(latency), status: response.status };
      })
    );
    // request.cf is populated by the Workers runtime
    return Response.json({
      measured_from: request.cf?.city,
      results,
    });
  },
};
Regional Optimization Playbook
| Your Users Are In | Deploy To | Cache At | Database |
|---|---|---|---|
| US only | us-east-1 | CloudFront US | RDS / PlanetScale |
| US + EU | us-east-1 + eu-west-1 | CloudFront | PlanetScale (multi-region) |
| Global | Edge (Workers/Vercel) | Every PoP | Turso / D1 / DynamoDB Global |
| Asia-focused | ap-northeast-1 + ap-southeast-1 | CloudFront AP | DynamoDB AP |
| Latency-critical | Every major region | Aggressive edge cache | Read replicas everywhere |
The Business Case for Latency Investment
Latency optimization is frequently framed as a technical concern, but the business case is quantitative and well-documented across industries. Amazon's internal research — widely cited in web performance literature — found that every 100ms of additional latency reduced revenue by approximately 1%. Google documented that a 500ms increase in search latency reduced searches by 20%. Akamai's research found 1-second delays reduce customer satisfaction by 16% and conversions by 7%.
For API-heavy applications, the relationship between latency and business outcomes is particularly direct. An API powering a checkout flow, a search experience, or a real-time collaborative tool creates a measurable line between API response time and revenue impact. A checkout API that adds 300ms of latency from button click to confirmation degrades conversion even when users don't consciously notice the delay.
The ROI calculation for latency investment is straightforward: measure current conversion rate by latency bucket, quantify how many users are experiencing elevated latency due to geographic distance, project the revenue impact of moving them from high-latency to low-latency, then compare against the infrastructure cost of a CDN layer or a second region. For most applications with meaningful global traffic, this math favors multi-region or edge deployment long before traffic volumes feel large enough to justify the operational complexity. A $200/month additional region that improves conversion for 20% of users typically pays for itself within a few weeks.
The counterargument is real: distributed infrastructure adds complexity, and complexity introduces failure modes. The right decision is region-by-region, driven by where your actual users are, not by the assumption that global distribution is always better.
Edge vs Multi-Region: Choosing the Right Architecture
The choice between edge deployment (Cloudflare Workers, Vercel Edge Functions, Fastly Compute) and traditional multi-region deployment (4-6 cloud regions with load balancing) involves tradeoffs that aren't immediately obvious from the latency numbers alone.
Edge deployment places compute in 100-300+ cities globally, achieving sub-30ms response times from almost anywhere on Earth. The constraint: edge runtimes are sandboxed environments with meaningful restrictions. Cloudflare Workers, for example, cap memory at 128MB, limit CPU time per invocation (as little as 10ms on the free tier), and restrict certain Node.js APIs. Full-featured database queries, session state, and CPU-intensive operations don't run at the edge — edge functions can only read from fast nearby data sources (KV stores, edge caches, or replicas with sub-10ms read latency from the edge node).
Multi-region deployment across 4-6 traditional cloud regions runs full application logic with no runtime sandbox restrictions, at the cost of moderately higher latency (10-50ms to the nearest region rather than 5-15ms to the nearest edge node) and more infrastructure complexity. This is the right architecture when requests require database writes, session management, or compute that exceeds edge runtime limits.
The architecture most teams end up with: edge functions for static assets, configuration fetches, and read-heavy endpoints with aggressive cache headers; regional deployment for authenticated routes, mutations, and any logic that requires full runtime access. This hybrid captures most of the latency benefit for the majority of requests while avoiding the edge sandbox constraints that would otherwise require significant application restructuring.
Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Single-region deployment for global users | 200-300ms for distant users | Multi-region or edge deployment |
| No CDN for API responses | Every request hits origin | Add Cache-Control headers + CDN |
| Cold DNS lookups | +50ms per new domain | DNS prefetch + keep-alive |
| No connection reuse | +200ms TLS handshake per request | HTTP/2 + keep-alive |
| Measuring latency only from your office | Miss real-world experience | Monitor from multiple global locations |
| Ignoring P99 latency | 1% of users have terrible experience | Optimize tail latency, not just median |
Compare API providers by regional performance on APIScout — latency benchmarks, uptime data, and global coverage maps.
Related: Best CDN APIs for Developers in 2026, Building Multi-Region APIs 2026, Cloudflare Workers vs AWS Lambda@Edge vs Fastly Compute