API Caching Strategies: HTTP to Redis 2026
Caching is the most impactful performance optimization for APIs. A cache hit avoids database queries, computation, and network round-trips. The challenge isn't adding caching — it's invalidating caches correctly. There are three layers of caching for APIs, each serving different needs.
TL;DR
- Cache at three layers: HTTP headers (browser/proxy), CDN edge (global distribution), and Redis (application logic)
- Cache-Control and ETag headers are free performance wins that require no infrastructure changes
- Cache-aside with Redis is the most flexible pattern; use key naming conventions and TTL discipline from day one
- The hardest problem is invalidation — event-based invalidation beats pure TTL for data accuracy
- GraphQL requires a different approach since all requests are POST; use persisted queries and DataLoader
Layer 1: HTTP Caching
Cache-Control Headers
Cache-Control: public, max-age=3600, stale-while-revalidate=60
| Directive | Meaning |
|---|---|
| public | Any cache (CDN, browser, proxy) can store |
| private | Only the client (browser) can cache |
| no-cache | Must revalidate before using cached version |
| no-store | Don't cache at all (sensitive data) |
| max-age=3600 | Cache for 3600 seconds (1 hour) |
| stale-while-revalidate=60 | Serve stale content while revalidating in background |
| must-revalidate | Cache must not use stale content after expiry |
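Assuming an Express-style server, the table above can be encoded as a small policy map so Cache-Control values live in one place instead of being scattered across handlers. The category names and TTLs below are illustrative, not prescriptive:

```typescript
// Map endpoint categories to Cache-Control values (illustrative policies).
type Cacheability = 'static' | 'listing' | 'user' | 'realtime';

const CACHE_POLICIES: Record<Cacheability, string> = {
  static: 'public, max-age=86400',                           // reference data, 24h
  listing: 'public, max-age=300, stale-while-revalidate=60', // listings, 5min
  user: 'private, max-age=60',                               // per-user data, 1min
  realtime: 'no-store',                                      // never cache
};

function cacheControlFor(kind: Cacheability): string {
  return CACHE_POLICIES[kind];
}

// Hypothetical Express usage:
// app.get('/api/products', (req, res) => {
//   res.setHeader('Cache-Control', cacheControlFor('listing'));
//   res.json(products);
// });
```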
ETag for Conditional Requests
# First request
GET /api/products/123
→ 200 OK
ETag: "abc123"
# Subsequent request
GET /api/products/123
If-None-Match: "abc123"
→ 304 Not Modified (no body, save bandwidth)
ETags enable conditional requests — the server only sends the response body if the content changed. Saves bandwidth without serving stale data.
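A minimal sketch of the server side of this exchange in Node.js, using a SHA-1 content hash as a strong ETag. The helper names are illustrative (frameworks like Express generate ETags for you on res.send), but the logic is what any implementation boils down to:

```typescript
import { createHash } from 'node:crypto';

// Compute a strong ETag from the serialized response body.
function etagFor(body: string): string {
  const hash = createHash('sha1').update(body).digest('base64url').slice(0, 16);
  return `"${hash}"`;
}

// Decide whether a conditional request can be answered with 304 Not Modified.
function isNotModified(ifNoneMatch: string | undefined, etag: string): boolean {
  if (!ifNoneMatch) return false;
  // If-None-Match may carry several comma-separated ETags, or "*".
  return ifNoneMatch === '*' ||
    ifNoneMatch.split(',').map(t => t.trim()).includes(etag);
}
```

If isNotModified returns true, the server responds 304 with no body; otherwise it sends 200 with the body and the ETag header.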
What to Cache at HTTP Level
| Endpoint Type | Cache Strategy |
|---|---|
| Static reference data | public, max-age=86400 (24h) |
| Product listings | public, max-age=300, stale-while-revalidate=60 (5min) |
| User-specific data | private, max-age=60 (1min) |
| Search results | public, max-age=60 (1min) |
| Real-time data | no-store or max-age=0 |
| Sensitive data | no-store, private |
Layer 2: CDN Caching
CDNs cache responses at edge locations worldwide. Requests are served from the nearest edge node without hitting your origin server.
CDN Cache Keys
By default, CDNs cache by URL. For APIs, you may need to cache by additional factors:
Cache key = URL + Accept header + Authorization (for per-user caching)
Use Vary header to tell CDNs which request headers affect the response:
Vary: Accept, Accept-Encoding, Authorization
Surrogate Keys (Cache Tags)
Tag cached responses so you can invalidate groups of related content:
Surrogate-Key: product-123 category-electronics user-456
Invalidate all products in a category: PURGE Surrogate-Key: category-electronics
Fastly and Cloudflare support surrogate keys for instant selective purging.
Layer 3: Application Cache (Redis)
For data that changes frequently or requires computation, cache at the application level.
Cache-Aside Pattern
1. Check Redis for key
2. Cache hit → return cached data
3. Cache miss → query database
4. Store result in Redis with TTL
5. Return data
Common Cached Data
| Data | TTL | Invalidation |
|---|---|---|
| API rate limit counters | 1 minute (sliding window) | TTL |
| Session data | 30 minutes | On logout |
| User profile | 5 minutes | On profile update |
| Search results | 1 minute | TTL |
| Computed aggregations | 15 minutes | TTL + event |
| Feature flags | 30 seconds | On flag update |
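The rate-limit counter row above can be sketched with INCR and EXPIRE. Note this is the simpler fixed-window variant rather than a true sliding window, and the CounterStore interface is an illustrative stand-in for the two Redis commands (an ioredis client satisfies it directly):

```typescript
// Fixed-window rate limiter sketch: one counter key per client per window.
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<unknown>;
}

async function allowRequest(
  store: CounterStore,
  clientId: string,
  limit = 100,
  windowSeconds = 60
): Promise<boolean> {
  // All requests in the same window share one key.
  const windowStart = Math.floor(Date.now() / 1000 / windowSeconds);
  const key = `ratelimit:${clientId}:${windowStart}`;
  const count = await store.incr(key);
  if (count === 1) {
    // First hit in this window: set a TTL so the key self-destructs.
    await store.expire(key, windowSeconds);
  }
  return count <= limit;
}
```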
Cache Invalidation
As Phil Karlton famously put it, there are only two hard things in computer science: cache invalidation and naming things.
Strategies
1. TTL-based (Time-to-Live) Set an expiration. Simple, predictable, eventually consistent.
2. Event-based invalidation When data changes, publish an event that invalidates the cache.
User updated → delete cache key "user:123"
3. Write-through Update cache at the same time as the database. Cache is always fresh.
4. Cache versioning Include a version in the cache key. New version = new key = fresh cache.
cache_key = "user:123:v5"
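Strategies 2 and 4 can be sketched as follows. The KeyStore interface and key names are illustrative stand-ins for the Redis commands involved; an ioredis client provides compatible del/incr/get methods:

```typescript
// Stand-in for the Redis commands used by these two strategies.
interface KeyStore {
  del(key: string): Promise<number>;
  incr(key: string): Promise<number>;
  get(key: string): Promise<string | null>;
}

// Strategy 2 (event-based): when a user changes, delete the cached entry.
async function onUserUpdated(store: KeyStore, userId: string): Promise<void> {
  await store.del(`user:${userId}`);
}

// Strategy 4 (versioning): bump a per-user version counter. Old keys become
// unreachable and simply age out via TTL, so no explicit deletion is needed.
async function versionedUserKey(store: KeyStore, userId: string): Promise<string> {
  const version = (await store.get(`user:${userId}:version`)) ?? '0';
  return `user:${userId}:v${version}`;
}

async function bumpUserVersion(store: KeyStore, userId: string): Promise<void> {
  await store.incr(`user:${userId}:version`);
}
```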
The Stale Content Problem
Sometimes serving slightly stale data is better than waiting for fresh data. Use stale-while-revalidate in HTTP caching and background refresh in application caching.
Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Caching authenticated responses publicly | Data leak between users | Cache-Control: private for user data |
| No Vary header | CDN serves wrong response | Add Vary for all relevant headers |
| Cache key too broad | Low hit rate | Include relevant query parameters |
| Cache key too narrow | Cache duplication | Normalize query parameters |
| No TTL on application cache | Stale data forever | Always set TTL |
| Caching errors | Persistent failures | Don't cache 4xx/5xx responses |
Redis Implementation Patterns
Redis is the de facto choice for application-layer caching. Whether you use self-hosted Redis, AWS ElastiCache, or Upstash (serverless Redis), the implementation patterns are the same. Here is what a production-grade cache-aside implementation looks like using ioredis:
import Redis from 'ioredis';
const redis = new Redis(process.env.REDIS_URL!);
interface CacheOptions {
ttl?: number; // seconds
keyPrefix?: string;
}
async function withCache<T>(
key: string,
fetchFn: () => Promise<T>,
options: CacheOptions = {}
): Promise<T> {
const { ttl = 300, keyPrefix = 'api' } = options;
const cacheKey = `${keyPrefix}:${key}`;
// Check cache first
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached) as T;
}
// Cache miss — fetch from source
const data = await fetchFn();
// Store with TTL (fire and forget)
redis.setex(cacheKey, ttl, JSON.stringify(data)).catch(console.error);
return data;
}
// Usage
const user = await withCache(
`user:${userId}`,
() => db.users.findUnique({ where: { id: userId } }),
{ ttl: 300, keyPrefix: 'v2' }
);
Key naming conventions are critical for operational sanity. Use colon-delimited hierarchical keys: {app}:{version}:{resource}:{id}. This lets you scan and delete by pattern (SCAN 0 MATCH api:v2:user:*) and gives you namespace isolation between application versions. Avoid flat keys like user123 — they become unmanageable at scale.
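A sketch of pattern-based deletion built on that convention. The batch source is abstracted as an async iterable so the logic is self-contained; the ioredis wiring in the comments is an assumption based on its scanStream API (SCAN iterates incrementally and never blocks the server, unlike KEYS):

```typescript
// Delete keys in batches as a scan produces them; returns the total deleted.
async function deleteBatches(
  batches: AsyncIterable<string[]>,
  del: (keys: string[]) => Promise<number>
): Promise<number> {
  let deleted = 0;
  for await (const keys of batches) {
    if (keys.length > 0) deleted += await del(keys);
  }
  return deleted;
}

// Hypothetical ioredis wiring (scanStream is a Readable, hence async-iterable):
// const stream = redis.scanStream({ match: 'api:v2:user:*', count: 100 });
// await deleteBatches(stream as AsyncIterable<string[]>, keys => redis.del(...keys));
```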
Cache stampede prevention (also called the dogpile effect) is a real production problem. When a popular cache key expires, dozens of simultaneous requests all miss the cache and hit the database at once, potentially causing a cascade failure. The probabilistic early expiry technique solves this elegantly:
function shouldRecompute(expiresAt: number, delta: number, beta: number = 1): boolean {
// Probabilistically expire before the actual expiry timestamp.
// Math.log(Math.random()) is negative, so the subtraction pushes "now"
// forward in time; higher beta = more aggressive early expiry.
return Date.now() / 1000 - delta * beta * Math.log(Math.random()) >= expiresAt;
}
async function withProbabilisticCache<T>(
key: string,
fetchFn: () => Promise<T>,
ttl: number
): Promise<T> {
const stored = await redis.hgetall(key);
if (stored.value && !shouldRecompute(
parseFloat(stored.expires),
parseFloat(stored.delta) // fractional seconds; parseInt would truncate "0.42" to 0
)) {
return JSON.parse(stored.value) as T;
}
const start = Date.now();
const data = await fetchFn();
const delta = (Date.now() - start) / 1000; // compute time in seconds
const expires = Date.now() / 1000 + ttl;
await redis.hset(key, { // HSET accepts multiple fields; HMSET is deprecated
value: JSON.stringify(data),
expires: expires.toString(),
delta: delta.toString(),
});
await redis.expire(key, ttl + Math.ceil(delta * 3));
return data;
}
The key insight is that slower-to-compute values should start their recomputation earlier, because the cost of a stampede is proportional to the time to recompute.
For Upstash users (serverless-friendly Redis), the @upstash/redis client exposes a similar command API over HTTP rather than TCP, which lets it run in edge runtimes like Cloudflare Workers and Vercel Edge Functions where raw TCP connections are not available.
CDN Caching in Practice
CDN caching for APIs is often underutilized because developers assume CDNs are only for static files. Modern CDNs like Cloudflare can cache API responses just as effectively as HTML pages, and the performance gains are substantial — edge nodes can serve cached API responses in under 10ms from anywhere in the world.
Cloudflare Cache Rules (formerly Page Rules) let you control caching behavior by URL pattern without modifying your API code. For example, to cache all GET /api/v1/products* responses for 5 minutes, you create a Cache Rule that matches the URL pattern and sets Cache-Control: public, max-age=300. This works independently of what your API origin returns.
For selective purging, use Cloudflare Cache Tags (the Cache-Tag header, which is Cloudflare's version of surrogate keys):
// Set cache tags in your API response
res.setHeader('Cache-Tag', `product-${product.id},category-${product.categoryId}`);
res.setHeader('Cache-Control', 'public, max-age=300');
res.json(product);
// Purge by tag when product is updated
async function purgeProductCache(productId: string, categoryId: string) {
await fetch(
`https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache`,
{
method: 'POST',
headers: {
Authorization: `Bearer ${CF_TOKEN}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ tags: [`product-${productId}`, `category-${categoryId}`] }),
}
);
}
Cache tags let you purge all product listings across all CDN edges in under one second. This is far more surgical than purging by URL or purging everything. For an API with thousands of cached endpoints, cache tags are the difference between useful caching and needing to disable caching because invalidation is too blunt.
CDN cache analytics (Cloudflare Analytics or Fastly's cache reporting) should show your cache hit ratio by URL pattern. A cache hit ratio below 60% for public endpoints means your cache keys are too specific or your TTLs are too short. A hit ratio above 95% means you might be caching too aggressively — check whether fresh data is actually reaching clients when it should.
Caching GraphQL APIs
HTTP caching fundamentally does not work for most GraphQL deployments because GraphQL sends all requests as HTTP POST to a single endpoint (/graphql). CDNs treat POST requests as non-cacheable by default, and even if you forced caching, the POST body determines what data is returned — two requests to the same URL might request completely different data.
The standard solutions are persisted queries and field-level caching.
Persisted queries pre-register query documents on the server and replace the query body with a hash identifier. Instead of sending the full query text, the client sends ?queryId=abc123&variables=... as a GET request. GET requests are cacheable. Apollo Server and GraphQL Yoga both support automatic persisted queries. The workflow is: the client sends the query hash, the server checks if it knows the query. If not, the client sends the full query to register it, and future requests use the hash.
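A sketch of building such a GET request on the client. The endpoint and helper name are illustrative; the extensions shape follows the convention Apollo's automatic persisted queries use (a SHA-256 hex hash of the query document):

```typescript
import { createHash } from 'node:crypto';

// Build a cacheable GET URL carrying the query hash instead of the query text.
function persistedQueryUrl(
  endpoint: string,
  query: string,
  variables: Record<string, unknown>
): string {
  const sha256Hash = createHash('sha256').update(query).digest('hex');
  const extensions = { persistedQuery: { version: 1, sha256Hash } };
  const params = new URLSearchParams({
    variables: JSON.stringify(variables),
    extensions: JSON.stringify(extensions),
  });
  return `${endpoint}?${params}`;
}

// If the server has never seen this hash, it responds with a
// PersistedQueryNotFound error and the client retries once with the
// full query text to register it.
```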
DataLoader solves the N+1 query problem and is the primary tool for field-level caching in GraphQL resolvers. It batches and deduplicates database calls within a single request. But DataLoader can also be backed by Redis for cross-request caching:
import DataLoader from 'dataloader';
// Per-request DataLoader (deduplicates within request)
function createUserLoader() {
return new DataLoader<string, User>(async (ids) => {
const users = await db.users.findMany({ where: { id: { in: [...ids] } } });
return ids.map(id => users.find(u => u.id === id) ?? new Error(`User ${id} not found`));
});
}
// With Redis backing (cross-request caching)
function createCachedUserLoader() {
return new DataLoader<string, User>(async (ids) => {
const cacheKeys = ids.map(id => `user:${id}`);
const cached = await redis.mget(...cacheKeys);
const results = await Promise.all(
ids.map(async (id, i) => {
if (cached[i]) return JSON.parse(cached[i]!) as User;
const user = await db.users.findUnique({ where: { id } });
if (user) await redis.setex(`user:${id}`, 300, JSON.stringify(user));
return user ?? new Error(`User ${id} not found`);
})
);
return results;
});
}
Apollo Client's normalized cache (InMemoryCache) provides client-side caching for GraphQL responses based on object identity (__typename + id). Understanding how Apollo Client's cache works — and when it serves stale data vs refetches — is essential for building responsive GraphQL UIs.
Caching with Stale-While-Revalidate
The stale-while-revalidate directive is one of the most developer-friendly cache patterns available. When a cached response is stale (past max-age) but within the stale-while-revalidate window, the CDN or browser serves the stale response immediately while triggering a background revalidation. The user sees no latency — they get the slightly-stale cached response while a fresh version is being fetched in the background.
Cache-Control: public, max-age=60, stale-while-revalidate=120
This tells caches: serve the cached response for 60 seconds without question, and if the request arrives between 60 and 180 seconds after caching, serve the stale response but revalidate in the background. Only after 180 seconds will the request block on fresh data.
For most API endpoints, data that is 60-180 seconds old is completely acceptable. Product prices, user profiles, article content — these rarely need to be fresh to the millisecond. Stale-while-revalidate eliminates the "thundering herd on cache expiry" problem at the HTTP layer because fresh requests do not block, they just queue a background update.
In Node.js application servers, you can implement the same pattern manually:
interface RevalidatingCache<T> {
value: T;
expiresAt: number;
revalidatingAt: number;
revalidating: boolean;
}
const cache = new Map<string, RevalidatingCache<unknown>>();
async function staleWhileRevalidate<T>(
key: string,
fetchFn: () => Promise<T>,
maxAge: number, // seconds before stale
swrWindow: number // additional seconds to serve stale
): Promise<T> {
const now = Date.now() / 1000;
const entry = cache.get(key) as RevalidatingCache<T> | undefined;
if (entry) {
const isFresh = now < entry.expiresAt;
const isInSWRWindow = now < entry.revalidatingAt;
if (isFresh) return entry.value;
if (isInSWRWindow && !entry.revalidating) {
// Serve stale, revalidate in background
entry.revalidating = true;
fetchFn().then(data => {
cache.set(key, {
value: data,
expiresAt: Date.now() / 1000 + maxAge,
revalidatingAt: Date.now() / 1000 + maxAge + swrWindow,
revalidating: false,
});
}).catch(() => { entry.revalidating = false; });
return entry.value;
}
}
// No cache or expired — fetch fresh
const data = await fetchFn();
cache.set(key, {
value: data,
expiresAt: Date.now() / 1000 + maxAge,
revalidatingAt: Date.now() / 1000 + maxAge + swrWindow,
revalidating: false,
});
return data;
}
This pattern is why stale-while-revalidate is superior to plain TTL for user-facing APIs. Users see consistent, fast responses. Data stays reasonably fresh. And cache expiry is not a latency event.
Measuring Cache Effectiveness
Caching without measurement is guesswork. The primary metric is cache hit ratio: what percentage of requests are served from cache versus hitting your origin. A hit ratio below 50% for public endpoints suggests either your TTLs are too short, your cache keys are too granular, or you're caching too many unique endpoints.
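A minimal in-process hit/miss tracker, as a sketch of what the withCache helper above could record; in production you would export these counters to a metrics system (Prometheus, StatsD, etc.) rather than keep them in memory:

```typescript
// Track cache hits and misses and derive the hit ratio.
class CacheStats {
  private hits = 0;
  private misses = 0;

  recordHit(): void { this.hits++; }
  recordMiss(): void { this.misses++; }

  hitRatio(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```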
For Redis, monitor memory usage and eviction rate. If Redis is evicting keys under memory pressure (maxmemory-policy set to allkeys-lru or similar), your most-accessed keys may be getting evicted just before they're needed. Either increase Redis memory allocation or audit which keys are consuming the most memory — redis-cli --hotkeys identifies frequently accessed keys, and MEMORY USAGE key shows per-key memory cost.
Impact on p99 latency is the most important business metric for cache effectiveness. A cache hit on a slow database query can reduce latency from 200ms to under 5ms. Track your p99 latency for cached vs uncached endpoints separately, and track how your overall p99 changes as your cache hit ratio improves. For a comprehensive look at API latency measurement, see our guide on how to monitor API performance.
CDN cache analytics (Cloudflare Analytics dashboard, or the Cache-Status header on individual responses: cf-cache-status: HIT vs MISS vs EXPIRED) give you visibility into edge cache effectiveness. High MISS rates on cacheable endpoints mean your cache keys or Cache-Control configuration needs tuning.
Conclusion
Effective API caching requires a layered strategy. HTTP caching with Cache-Control and ETag is free performance that should be applied to every public endpoint. CDN caching with cache tags provides global distribution with surgical invalidation. Redis application caching handles user-specific data, rate limiting, and computed results that HTTP caching can't touch.
The common thread is intention: decide how cacheable each endpoint is, set appropriate headers and TTLs, implement event-driven invalidation for data that changes in real time, and measure hit ratios to verify that your caching strategy is actually working. For more on building performant APIs, see our guide on API rate limiting best practices and browse the full API tooling directory.
Related: API Testing Strategies for 2026, API Versioning Strategies, How AI Is Transforming API Design and Documentation