How to Cache API Responses for Better Performance
The fastest API call is the one you don't make. Caching API responses reduces latency, lowers costs, improves reliability, and keeps your app fast when the API is slow. But cache wrong and you serve stale data, break real-time features, or introduce bugs.
This guide covers caching in three layers — browser/HTTP, application (Redis), and edge/CDN — and the invalidation strategies that keep each layer fresh. The layers aren't mutually exclusive: a production app typically uses all three in combination, with different data types cached at different layers with different TTLs. The key skills are knowing which layer to add for each problem, and how to design cache keys that allow precise invalidation when data changes.
Why Cache API Responses?
| Benefit | Impact |
|---|---|
| Latency | 200ms API call → <5ms cache hit |
| Cost | 50-80% fewer API calls = 50-80% lower API bill |
| Reliability | Serve cached data when API is down |
| Rate limits | Fewer requests = stay under limits |
| User experience | Instant responses feel native |
Caching Layers
User Request
│
▼
┌──────────────────┐
│ Browser Cache │ Cache-Control headers, Service Worker
│ (0ms latency) │
└────────┬─────────┘
│ miss
▼
┌──────────────────┐
│ CDN / Edge Cache │ Cloudflare, CloudFront, Fastly
│ (5-20ms latency) │
└────────┬─────────┘
│ miss
▼
┌──────────────────┐
│ Application Cache│ Redis, Memcached, in-memory
│ (1-10ms latency) │
└────────┬─────────┘
│ miss
▼
┌──────────────────┐
│ API Call │ Third-party API
│ (50-500ms) │
└──────────────────┘
Layer 1: HTTP Caching
Use Cache-Control headers — the browser and CDN do the work for you.
```typescript
// Your API route that proxies a third-party API
export async function GET(request: Request) {
  const res = await fetch('https://api.example.com/products');
  const products = await res.json();

  return Response.json(products, {
    headers: {
      // Cache in browser for 60 seconds
      'Cache-Control': 'public, max-age=60',
      // Cache at CDN for 5 minutes, serve stale while revalidating
      'CDN-Cache-Control': 'public, max-age=300, stale-while-revalidate=600',
      // ETag for conditional requests (hashResponse is your own hashing helper)
      'ETag': `"${hashResponse(products)}"`,
    },
  });
}
```
Cache-Control Directives
| Directive | What It Does | Use When |
|---|---|---|
| `public, max-age=60` | Cache everywhere for 60s | Static data, not user-specific |
| `private, max-age=300` | Cache in browser only for 5 min | User-specific data |
| `no-store` | Never cache | Sensitive data (balances, auth) |
| `stale-while-revalidate=60` | Serve stale, fetch fresh in background | Most API responses |
| `s-maxage=300` | CDN caches for 5 min (browser uses `max-age`) | CDN-specific TTL |
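The ETag header enables conditional requests: the client sends the tag back in `If-None-Match`, and the server answers 304 with no body when nothing changed. A minimal sketch of that check (`makeEtag` and `isNotModified` are illustrative helpers, not a standard API):

```typescript
import { createHash } from 'crypto';

// Derive a stable ETag from the serialized response body
function makeEtag(body: unknown): string {
  const hash = createHash('sha256').update(JSON.stringify(body)).digest('hex');
  return `"${hash.slice(0, 16)}"`;
}

// A 304 Not Modified is valid when the client's If-None-Match
// header matches the current ETag (or is the wildcard "*")
function isNotModified(ifNoneMatch: string | null, etag: string): boolean {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === '*') return true;
  // If-None-Match may carry a comma-separated list of ETags
  return ifNoneMatch.split(',').map(v => v.trim()).includes(etag);
}
```

In the route, compare `request.headers.get('if-none-match')` against the fresh ETag and return `new Response(null, { status: 304 })` on a match; the client then reuses its cached body.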
Layer 2: Application Cache (Redis)
```typescript
import { Redis } from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);

class APICache {
  constructor(private redis: Redis) {}

  async getOrFetch<T>(
    key: string,
    fetchFn: () => Promise<T>,
    ttlSeconds: number = 300
  ): Promise<T> {
    // Try cache first
    const cached = await this.redis.get(key);
    if (cached) {
      return JSON.parse(cached);
    }

    // Cache miss — fetch from API
    const data = await fetchFn();

    // Store in cache (non-blocking)
    this.redis.set(key, JSON.stringify(data), 'EX', ttlSeconds).catch(console.error);

    return data;
  }

  async invalidate(pattern: string): Promise<void> {
    // KEYS blocks Redis while it scans; prefer SCAN for large keyspaces in production
    const keys = await this.redis.keys(pattern);
    if (keys.length > 0) {
      await this.redis.del(...keys);
    }
  }
}

// Usage
const cache = new APICache(redis);
const products = await cache.getOrFetch(
  'api:products:all',
  () => fetch('https://api.example.com/products').then(r => r.json()),
  300 // 5 minutes
);
```
Stale-While-Revalidate Pattern
```typescript
class SWRCache {
  constructor(private redis: Redis) {}

  async getOrFetch<T>(
    key: string,
    fetchFn: () => Promise<T>,
    options: { maxAge: number; staleAge: number }
  ): Promise<T & { _fromCache?: boolean; _stale?: boolean }> {
    const cached = await this.redis.get(key);

    if (cached) {
      const { data, timestamp } = JSON.parse(cached);
      const age = (Date.now() - timestamp) / 1000;

      if (age < options.maxAge) {
        // Fresh cache — return immediately
        return { ...data, _fromCache: true };
      }

      if (age < options.staleAge) {
        // Stale cache — return immediately, refresh in background
        void this.refreshInBackground(key, fetchFn, options.staleAge);
        return { ...data, _fromCache: true, _stale: true };
      }
    }

    // No cache or expired — fetch synchronously
    const data = await fetchFn();
    await this.store(key, data, options.staleAge);
    return data;
  }

  private async refreshInBackground<T>(key: string, fetchFn: () => Promise<T>, ttl: number) {
    try {
      const data = await fetchFn();
      // Store with the same stale window as the initial write
      await this.store(key, data, ttl);
    } catch (error) {
      console.error(`Background refresh failed for ${key}:`, error);
    }
  }

  private async store(key: string, data: any, ttl: number) {
    await this.redis.set(key, JSON.stringify({ data, timestamp: Date.now() }), 'EX', ttl);
  }
}

// Usage: fresh for 5 min, stale for 1 hour
const swrCache = new SWRCache(redis);
const products = await swrCache.getOrFetch(
  'products',
  fetchProducts,
  { maxAge: 300, staleAge: 3600 }
);
```
Layer 3: Edge Caching
Cache API responses at CDN edge locations for low latency worldwide:
```typescript
// Cloudflare Worker — cache at edge
export default {
  async fetch(request: Request): Promise<Response> {
    const cacheKey = new Request(request.url, request);
    const cache = caches.default;

    // Check edge cache
    let response = await cache.match(cacheKey);
    if (response) return response;

    // Cache miss — fetch from origin
    response = await fetch('https://api.example.com/products');

    // Clone and cache at edge
    const cachedResponse = new Response(response.body, response);
    cachedResponse.headers.set('Cache-Control', 'public, max-age=300');
    await cache.put(cacheKey, cachedResponse.clone());

    return cachedResponse;
  },
};
```
Cache Invalidation Strategies
Time-Based (TTL)
```typescript
// Simple but effective for most cases
const CACHE_TTLS = {
  products: 300,       // 5 min — changes infrequently
  prices: 60,          // 1 min — changes occasionally
  inventory: 10,       // 10 sec — changes frequently
  user_profile: 600,   // 10 min — user-specific, rarely changes
  search_results: 30,  // 30 sec — balances freshness and performance
  static_config: 3600, // 1 hour — almost never changes
};
```
Event-Based
```typescript
// Invalidate cache when data changes
async function updateProduct(productId: string, data: ProductUpdate) {
  // Update in database
  await db.products.update(productId, data);

  // Invalidate related caches
  await cache.invalidate(`products:${productId}`);
  await cache.invalidate('products:list:*');
  await cache.invalidate('products:search:*');
}

// Or via webhooks
async function handleWebhook(event: WebhookEvent) {
  if (event.type === 'product.updated') {
    await cache.invalidate(`products:${event.data.id}`);
  }
}
```
Tag-Based
```typescript
// Tag cache entries for group invalidation
class TaggedCache {
  constructor(private redis: Redis) {}

  async set(key: string, data: any, tags: string[], ttl: number) {
    await this.redis.set(key, JSON.stringify(data), 'EX', ttl);
    // Store key under each tag
    for (const tag of tags) {
      await this.redis.sadd(`tag:${tag}`, key);
    }
  }

  async invalidateTag(tag: string) {
    const keys = await this.redis.smembers(`tag:${tag}`);
    if (keys.length > 0) {
      await this.redis.del(...keys);
      await this.redis.del(`tag:${tag}`);
    }
  }
}

// Usage
const taggedCache = new TaggedCache(redis);
await taggedCache.set('product:123', productData, ['products', 'category:electronics'], 300);
await taggedCache.set('product:456', productData, ['products', 'category:books'], 300);

// Invalidate all products
await taggedCache.invalidateTag('products');

// Or just electronics
await taggedCache.invalidateTag('category:electronics');
```
What to Cache (and What Not To)
| Cache? | Data Type | TTL | Reason |
|---|---|---|---|
| ✅ Yes | Product catalogs | 5-60 min | Changes infrequently |
| ✅ Yes | Search results | 30-300 sec | Same queries repeat |
| ✅ Yes | User profiles | 5-10 min | Rarely changes |
| ✅ Yes | Configuration/settings | 1-24 hours | Nearly static |
| ✅ Yes | Public API data (weather, prices) | Per API recommendation | Save API calls |
| ⚠️ Carefully | Real-time inventory | 5-30 sec | Balance freshness vs load |
| ❌ No | Financial transactions | Never | Must be real-time |
| ❌ No | Authentication tokens | Never (except for sessions) | Security risk |
| ❌ No | One-time data (OTP, verification) | Never | Security risk |
| ❌ No | Rapidly changing data | Use WebSockets instead | Cache would always be stale |
Choosing Your Cache Layer
The three-layer diagram above (browser → CDN → application) is the canonical architecture, but not every app needs every layer. The right choice depends on the nature of the data and the distribution of your users.
Use HTTP caching (Cache-Control headers) when: the response is the same for all users, the data changes on a predictable schedule, and you control the route that proxies the external API. HTTP caching is zero-infrastructure — the browser and CDN do the work. The key constraint is that Cache-Control headers work on the HTTP response level; if two different users should see different data, you can't use public HTTP caching (use private for user-specific data, or move to application-layer caching).
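As a sketch of that public/private split, the header choice can be centralized in one helper (the categories and TTLs here are illustrative assumptions, not a standard):

```typescript
// Pick a Cache-Control header based on how shareable the response is.
type ResponseKind = 'shared' | 'user-specific' | 'sensitive';

function cacheControlFor(kind: ResponseKind): string {
  switch (kind) {
    case 'shared':
      // Same bytes for every user: browser and CDN may both cache it
      return 'public, max-age=60, stale-while-revalidate=300';
    case 'user-specific':
      // Browser may cache it, but shared caches (CDNs) must not
      return 'private, max-age=300';
    case 'sensitive':
      // Balances, auth responses: never written to any cache
      return 'no-store';
  }
}
```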
Use Redis application caching when: you need per-user caching (include the user ID in the cache key), you need to invalidate caches programmatically on data changes, you're calling the API from multiple server instances and need a shared cache, or you need the stale-while-revalidate pattern where you control exactly when background refresh happens. Redis adds operational complexity (you need a Redis instance, handle connection failures, manage eviction policy), but its flexibility is worth it for complex caching needs.
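A minimal sketch of per-user application caching, with an in-memory Map standing in for Redis so the example is self-contained (`userCacheKey` and `getOrFetchForUser` are illustrative names; in production the store would be a shared Redis instance):

```typescript
// Per-user cache keys: include the user ID so users never share entries.
function userCacheKey(userId: string, resource: string): string {
  return `user:${userId}:${resource}`;
}

// In-memory stand-in for Redis, to keep the sketch self-contained
const store = new Map<string, { value: unknown; expiresAt: number }>();

async function getOrFetchForUser<T>(
  userId: string,
  resource: string,
  fetchFn: () => Promise<T>,
  ttlSeconds = 300
): Promise<T> {
  const key = userCacheKey(userId, resource);
  const hit = store.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value as T; // cache hit, scoped to this user
  }
  const value = await fetchFn();
  store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  return value;
}
```

Because the user ID is part of the key, two users requesting the same resource never see each other's cached data.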
Use edge/CDN caching when: your users are geographically distributed and API response latency from your origin server is a bottleneck, or you're caching public content that can be shared across all users globally. Cloudflare Workers gives you programmable caching logic at the edge — you can cache some paths and bypass cache for others in a single Worker script. The limitation: CDN caches don't support server-sent events or WebSocket connections, and they're optimized for GET requests (POST/PUT/DELETE requests typically bypass cache).
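The cache-some-paths, bypass-others logic can be factored into a small predicate called at the top of the Worker's fetch handler (`shouldCacheAtEdge` and the bypass list are illustrative assumptions):

```typescript
// Decide per-request whether the edge cache applies.
const BYPASS_PREFIXES = ['/auth', '/checkout', '/webhooks']; // illustrative paths

function shouldCacheAtEdge(method: string, pathname: string): boolean {
  // CDNs are optimized for GET; mutations must always reach the origin
  if (method.toUpperCase() !== 'GET') return false;
  return !BYPASS_PREFIXES.some(prefix => pathname.startsWith(prefix));
}
```

In the Worker, call `shouldCacheAtEdge(request.method, new URL(request.url).pathname)` before `cache.match`, and fall through to a plain `fetch` when it returns false.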
Designing Good Cache Keys
Cache key design determines who shares a cached value and how granular your cache invalidation can be. Bad cache keys are either too broad (you serve the wrong data to someone) or too narrow (you cache the same data 10,000 times under slightly different keys, wasting memory and defeating the purpose of caching).
Include only what changes the response: If two API calls with different User-Agent headers return identical data, User-Agent shouldn't be in your cache key. Include: the URL path, any query parameters that affect the response, the user ID (if the response is user-specific), and the API version. Exclude: request headers that don't affect the response (User-Agent, Accept-Language if you're not doing localization), timestamps, and request IDs.
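Those rules can be sketched as a key builder that accepts only response-affecting inputs and sorts query parameters, so equivalent URLs map to one entry (`buildCacheKey` is a hypothetical helper):

```typescript
// Build a cache key from only the inputs that change the response.
// Sorting params makes ?a=1&b=2 and ?b=2&a=1 share one cache entry.
function buildCacheKey(
  path: string,
  params: Record<string, string>,
  userId?: string // include only when the response is user-specific
): string {
  const sorted = Object.keys(params)
    .sort()
    .map(k => `${k}=${encodeURIComponent(params[k])}`)
    .join('&');
  const base = sorted ? `${path}?${sorted}` : path;
  return userId ? `user:${userId}:${base}` : base;
}
```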
Use hierarchical keys for flexible invalidation: Structure your Redis keys as resource:id:subresource (e.g., product:123:reviews, product:123:details, user:456:profile). This lets you invalidate all data related to product 123 with a pattern match (find keys matching product:123:* with SCAN, then pass them to DEL; Redis DEL itself does not accept wildcards), or invalidate just product 123's reviews without touching other product data. Flat cache keys (e.g., productReviews123) make selective invalidation much harder.
Hash long or complex cache keys: Cache keys have practical length limits (Redis keys max at 512MB, but 100-200 bytes is a practical ceiling for readability and network efficiency). For complex cache keys (long URL with many query params, GraphQL query text), hash the key before storage: const key = 'query:' + crypto.createHash('sha256').update(queryString).digest('hex').slice(0, 16). Include a human-readable prefix so you can identify what's cached when debugging.
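The hashing approach above as a self-contained helper (`hashedKey` is an illustrative name):

```typescript
import { createHash } from 'crypto';

// Hash long inputs (full URLs, GraphQL query text) behind a readable prefix,
// so keys stay short while remaining identifiable when debugging.
function hashedKey(prefix: string, raw: string): string {
  const digest = createHash('sha256').update(raw).digest('hex').slice(0, 16);
  return `${prefix}:${digest}`;
}
```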
Cache Warming and Cold Start
A cold cache — the state immediately after deployment or cache flush — can cause a thundering herd: hundreds of requests arrive simultaneously, all miss the cache, and all hit the origin API at the same time. This is a common cause of "works in staging, breaks on deploy" issues.
Pre-warm critical caches at deploy time: Before routing traffic to a new deployment, run a warm-up script that fetches and caches the most frequently requested resources. For an e-commerce site, this might be the top 100 products, all category pages, and the site configuration. This ensures the cache has content before users arrive, rather than having the first users bear the cost of cold cache misses.
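A warm-up pass can be sketched as a loop over the hot resources, with the fetcher and cache injected so the same code runs in a deploy script or a test (`warmCache` and the resource names are illustrative):

```typescript
// Pre-warm a cache by fetching the hottest resources before traffic arrives.
// A single failed fetch is logged and skipped so it cannot block the deploy.
async function warmCache(
  resources: string[],
  fetchFn: (resource: string) => Promise<unknown>,
  cache: Map<string, unknown>
): Promise<number> {
  let warmed = 0;
  for (const resource of resources) {
    try {
      cache.set(resource, await fetchFn(resource));
      warmed++;
    } catch (err) {
      console.error(`warm-up failed for ${resource}:`, err);
    }
  }
  return warmed;
}
```

At deploy time you would call this with the top-N resource list and a real fetcher, and route traffic to the new instance only after it resolves.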
Stagger cache expiration: If you set the same TTL (300 seconds) on thousands of cache entries created at the same time, they'll all expire simultaneously — causing a synchronized thundering herd. Add a small random jitter to TTLs: instead of ttl: 300, use ttl: 300 + Math.floor(Math.random() * 60). The 60-second spread distributes expirations across a minute, smoothing the load on your origin API.
Use probabilistic early expiration (PER): Instead of waiting for a cache entry to expire before fetching a new value, begin refreshing it slightly before expiration. As an entry approaches its TTL, each request becomes increasingly likely to trigger a background refresh, which prevents the miss spike at expiry. This is more complex than a simple TTL but eliminates the cold-cache thundering herd for high-traffic keys.
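A common formulation of PER (sometimes called XFetch) refreshes when now - delta * beta * ln(rand()) >= expiry, where delta approximates how long a recompute takes and beta tunes eagerness. A sketch with the random source injected for testability (`shouldRefreshEarly` is an illustrative helper):

```typescript
// Probabilistic early expiration: as an entry nears its TTL, each request
// becomes increasingly likely to refresh it early, spreading refreshes out.
function shouldRefreshEarly(
  nowMs: number,
  expiryMs: number,                 // when the entry's TTL elapses
  recomputeMs: number,              // how long a refresh takes (delta)
  beta = 1.0,                       // > 1 favors earlier refresh
  rand: () => number = Math.random
): boolean {
  // ln(rand) is negative, so this adds a random positive margin to "now":
  // refreshes start probabilistically before expiry, and always fire after it
  return nowMs - recomputeMs * beta * Math.log(rand()) >= expiryMs;
}
```

On a cache hit, call this with the entry's stored expiry; when it returns true, serve the cached value and kick off a background refresh.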
Methodology
Redis keyspace notifications (available since Redis 2.8) are useful for cache invalidation workflows; configure notify-keyspace-events Ex to receive events when keys expire. The ioredis npm package (v5.x) is a widely used Redis client for Node.js production work; it handles cluster mode, automatic reconnection, and pipeline batching. The Cloudflare Workers Cache API follows the Service Worker cache spec, and the Cache-Control s-maxage directive controls CDN TTL separately from the browser's max-age. The stale-while-revalidate and stale-if-error directives are defined in RFC 5861 and supported by Cloudflare, Fastly, and CloudFront, but not by all CDN providers; verify support for your specific CDN before relying on them.
Common Mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| No cache invalidation strategy | Serving stale data indefinitely | Set appropriate TTLs, invalidate on write |
| Caching error responses | Users get errors from cache | Only cache 2xx responses |
| Cache key doesn't include all params | Wrong data returned | Include all query params in cache key |
| No fallback when cache is down | Error instead of slow response | Fall back to a direct API call |
| Over-caching real-time data | Users see outdated info | Short TTL or no cache for real-time |
Compare API caching strategies and CDN options on APIScout — find the best edge caching solutions for your API integrations.
Related: Building an AI Agent in 2026, Building an AI-Powered App: Choosing Your API Stack, Building an API Marketplace