How to Cache API Responses for Better Performance
The fastest API call is the one you don't make. Caching API responses reduces latency, lowers costs, improves reliability, and keeps your app fast when the API is slow. But cache wrong and you serve stale data, break real-time features, or introduce bugs.
This guide covers caching in three layers — browser/HTTP, application (Redis), and edge/CDN — and the invalidation strategies that keep each layer fresh. The layers aren't mutually exclusive: a production app typically uses all three in combination, with different data types cached at different layers with different TTLs. The key skills are knowing which layer to add for each problem, and how to design cache keys that allow precise invalidation when data changes.
Why Cache API Responses?
| Benefit | Impact |
|---|---|
| Latency | 200ms API call → <5ms cache hit |
| Cost | 50-80% fewer API calls = 50-80% lower API bill |
| Reliability | Serve cached data when API is down |
| Rate limits | Fewer requests = stay under limits |
| User experience | Instant responses feel native |
Caching Layers
User Request
│
▼
┌──────────────────┐
│ Browser Cache │ Cache-Control headers, Service Worker
│ (0ms latency) │
└────────┬─────────┘
│ miss
▼
┌──────────────────┐
│ CDN / Edge Cache │ Cloudflare, CloudFront, Fastly
│ (5-20ms latency) │
└────────┬─────────┘
│ miss
▼
┌──────────────────┐
│ Application Cache│ Redis, Memcached, in-memory
│ (1-10ms latency) │
└────────┬─────────┘
│ miss
▼
┌──────────────────┐
│ API Call │ Third-party API
│ (50-500ms) │
└──────────────────┘
Layer 1: HTTP Caching
Use Cache-Control headers — the browser and CDN do the work for you.
```typescript
// Your API route that proxies a third-party API
export async function GET(request: Request) {
  const res = await fetch('https://api.example.com/products');
  const products = await res.json();

  return Response.json(products, {
    headers: {
      // Cache in browser for 60 seconds
      'Cache-Control': 'public, max-age=60',
      // Cache at CDN for 5 minutes, serve stale while revalidating
      'CDN-Cache-Control': 'public, max-age=300, stale-while-revalidate=600',
      // ETag for conditional requests (hashResponse is your own hashing helper)
      'ETag': `"${hashResponse(products)}"`,
    },
  });
}
```
Cache-Control Directives
| Directive | What It Does | Use When |
|---|---|---|
| `public, max-age=60` | Cache everywhere for 60s | Static data, not user-specific |
| `private, max-age=300` | Cache in browser only for 5 min | User-specific data |
| `no-store` | Never cache | Sensitive data (balances, auth) |
| `stale-while-revalidate=60` | Serve stale, fetch fresh in background | Most API responses |
| `s-maxage=300` | CDN caches for 5 min (browser uses `max-age`) | CDN-specific TTL |
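The ETag header enables conditional requests: the client sends the tag back in `If-None-Match`, and the server answers 304 with no body when nothing changed. A minimal sketch of that check (`makeEtag` and `isNotModified` are illustrative helpers, not a standard API):

```typescript
import { createHash } from 'crypto';

// Derive a stable ETag from the serialized response body
function makeEtag(body: unknown): string {
  const hash = createHash('sha256').update(JSON.stringify(body)).digest('hex');
  return `"${hash.slice(0, 16)}"`;
}

// A 304 Not Modified is valid when the client's If-None-Match
// header matches the current ETag (or is the wildcard "*")
function isNotModified(ifNoneMatch: string | null, etag: string): boolean {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === '*') return true;
  // If-None-Match may carry a comma-separated list of ETags
  return ifNoneMatch.split(',').map(v => v.trim()).includes(etag);
}
```

In the route, compare `request.headers.get('if-none-match')` against the fresh ETag and return `new Response(null, { status: 304 })` on a match; the client then reuses its cached body.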
Layer 2: Application Cache (Redis)
```typescript
import { Redis } from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);

class APICache {
  constructor(private redis: Redis) {}

  async getOrFetch<T>(
    key: string,
    fetchFn: () => Promise<T>,
    ttlSeconds: number = 300
  ): Promise<T> {
    // Try cache first
    const cached = await this.redis.get(key);
    if (cached) {
      return JSON.parse(cached);
    }

    // Cache miss — fetch from API
    const data = await fetchFn();

    // Store in cache (non-blocking)
    this.redis.set(key, JSON.stringify(data), 'EX', ttlSeconds).catch(console.error);

    return data;
  }

  async invalidate(pattern: string): Promise<void> {
    // KEYS blocks Redis while it scans; prefer SCAN for large keyspaces in production
    const keys = await this.redis.keys(pattern);
    if (keys.length > 0) {
      await this.redis.del(...keys);
    }
  }
}

// Usage
const cache = new APICache(redis);
const products = await cache.getOrFetch(
  'api:products:all',
  () => fetch('https://api.example.com/products').then(r => r.json()),
  300 // 5 minutes
);
```
Stale-While-Revalidate Pattern
```typescript
class SWRCache {
  constructor(private redis: Redis) {}

  async getOrFetch<T>(
    key: string,
    fetchFn: () => Promise<T>,
    options: { maxAge: number; staleAge: number }
  ): Promise<T & { _fromCache?: boolean; _stale?: boolean }> {
    const cached = await this.redis.get(key);

    if (cached) {
      const { data, timestamp } = JSON.parse(cached);
      const age = (Date.now() - timestamp) / 1000;

      if (age < options.maxAge) {
        // Fresh cache — return immediately
        return { ...data, _fromCache: true };
      }

      if (age < options.staleAge) {
        // Stale cache — return immediately, refresh in background
        void this.refreshInBackground(key, fetchFn, options.staleAge);
        return { ...data, _fromCache: true, _stale: true };
      }
    }

    // No cache or expired — fetch synchronously
    const data = await fetchFn();
    await this.store(key, data, options.staleAge);
    return data;
  }

  private async refreshInBackground<T>(key: string, fetchFn: () => Promise<T>, ttl: number) {
    try {
      const data = await fetchFn();
      // Store with the same stale window as the initial write
      await this.store(key, data, ttl);
    } catch (error) {
      console.error(`Background refresh failed for ${key}:`, error);
    }
  }

  private async store(key: string, data: any, ttl: number) {
    await this.redis.set(key, JSON.stringify({ data, timestamp: Date.now() }), 'EX', ttl);
  }
}

// Usage: fresh for 5 min, stale for 1 hour
const swrCache = new SWRCache(redis);
const products = await swrCache.getOrFetch(
  'products',
  fetchProducts,
  { maxAge: 300, staleAge: 3600 }
);
```
Layer 3: Edge Caching
Cache API responses at CDN edge locations for low latency worldwide:
```typescript
// Cloudflare Worker — cache at edge
export default {
  async fetch(request: Request): Promise<Response> {
    const cacheKey = new Request(request.url, request);
    const cache = caches.default;

    // Check edge cache
    let response = await cache.match(cacheKey);
    if (response) return response;

    // Cache miss — fetch from origin
    response = await fetch('https://api.example.com/products');

    // Clone and cache at edge
    const cachedResponse = new Response(response.body, response);
    cachedResponse.headers.set('Cache-Control', 'public, max-age=300');
    await cache.put(cacheKey, cachedResponse.clone());

    return cachedResponse;
  },
};
```
Cache Invalidation Strategies
Time-Based (TTL)
```typescript
// Simple but effective for most cases
const CACHE_TTLS = {
  products: 300,       // 5 min — changes infrequently
  prices: 60,          // 1 min — changes occasionally
  inventory: 10,       // 10 sec — changes frequently
  user_profile: 600,   // 10 min — user-specific, rarely changes
  search_results: 30,  // 30 sec — balances freshness and performance
  static_config: 3600, // 1 hour — almost never changes
};
```
Event-Based
```typescript
// Invalidate cache when data changes
async function updateProduct(productId: string, data: ProductUpdate) {
  // Update in database
  await db.products.update(productId, data);

  // Invalidate related caches
  await cache.invalidate(`products:${productId}`);
  await cache.invalidate('products:list:*');
  await cache.invalidate('products:search:*');
}

// Or via webhooks
async function handleWebhook(event: WebhookEvent) {
  if (event.type === 'product.updated') {
    await cache.invalidate(`products:${event.data.id}`);
  }
}
```
Tag-Based
```typescript
// Tag cache entries for group invalidation
class TaggedCache {
  constructor(private redis: Redis) {}

  async set(key: string, data: any, tags: string[], ttl: number) {
    await this.redis.set(key, JSON.stringify(data), 'EX', ttl);
    // Store key under each tag
    for (const tag of tags) {
      await this.redis.sadd(`tag:${tag}`, key);
    }
  }

  async invalidateTag(tag: string) {
    const keys = await this.redis.smembers(`tag:${tag}`);
    if (keys.length > 0) {
      await this.redis.del(...keys);
      await this.redis.del(`tag:${tag}`);
    }
  }
}

// Usage
const taggedCache = new TaggedCache(redis);
await taggedCache.set('product:123', productData, ['products', 'category:electronics'], 300);
await taggedCache.set('product:456', productData, ['products', 'category:books'], 300);

// Invalidate all products
await taggedCache.invalidateTag('products');

// Or just electronics
await taggedCache.invalidateTag('category:electronics');
```
What to Cache (and What Not To)
| Cache? | Data Type | TTL | Reason |
|---|---|---|---|
| ✅ Yes | Product catalogs | 5-60 min | Changes infrequently |
| ✅ Yes | Search results | 30-300 sec | Same queries repeat |
| ✅ Yes | User profiles | 5-10 min | Rarely changes |
| ✅ Yes | Configuration/settings | 1-24 hours | Nearly static |
| ✅ Yes | Public API data (weather, prices) | Per API recommendation | Save API calls |
| ⚠️ Carefully | Real-time inventory | 5-30 sec | Balance freshness vs load |
| ❌ No | Financial transactions | Never | Must be real-time |
| ❌ No | Authentication tokens | Never (except for sessions) | Security risk |
| ❌ No | One-time data (OTP, verification) | Never | Security risk |
| ❌ No | Rapidly changing data | Use WebSockets instead | Cache would always be stale |
Choosing Your Cache Layer
The three-layer diagram above (browser → CDN → application) is the canonical architecture, but not every app needs every layer. The right choice depends on the nature of the data and the distribution of your users.
Use HTTP caching (Cache-Control headers) when: the response is the same for all users, the data changes on a predictable schedule, and you control the route that proxies the external API. HTTP caching is zero-infrastructure — the browser and CDN do the work. The key constraint is that Cache-Control headers work on the HTTP response level; if two different users should see different data, you can't use public HTTP caching (use private for user-specific data, or move to application-layer caching).
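As a sketch of that public/private split, the header choice can be centralized in one helper (the categories and TTLs here are illustrative assumptions, not a standard):

```typescript
// Pick a Cache-Control header based on how shareable the response is.
type ResponseKind = 'shared' | 'user-specific' | 'sensitive';

function cacheControlFor(kind: ResponseKind): string {
  switch (kind) {
    case 'shared':
      // Same bytes for every user: browser and CDN may both cache it
      return 'public, max-age=60, stale-while-revalidate=300';
    case 'user-specific':
      // Browser may cache it, but shared caches (CDNs) must not
      return 'private, max-age=300';
    case 'sensitive':
      // Balances, auth responses: never written to any cache
      return 'no-store';
  }
}
```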
Use Redis application caching when: you need per-user caching (include the user ID in the cache key), you need to invalidate caches programmatically on data changes, you're calling the API from multiple server instances and need a shared cache, or you need the stale-while-revalidate pattern where you control exactly when background refresh happens. Redis adds operational complexity (you need a Redis instance, handle connection failures, manage eviction policy), but its flexibility is worth it for complex caching needs.
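A minimal sketch of per-user application caching, with an in-memory Map standing in for Redis so the example is self-contained (`userCacheKey` and `getOrFetchForUser` are illustrative names; in production the store would be a shared Redis instance):

```typescript
// Per-user cache keys: include the user ID so users never share entries.
function userCacheKey(userId: string, resource: string): string {
  return `user:${userId}:${resource}`;
}

// In-memory stand-in for Redis, to keep the sketch self-contained
const store = new Map<string, { value: unknown; expiresAt: number }>();

async function getOrFetchForUser<T>(
  userId: string,
  resource: string,
  fetchFn: () => Promise<T>,
  ttlSeconds = 300
): Promise<T> {
  const key = userCacheKey(userId, resource);
  const hit = store.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value as T; // cache hit, scoped to this user
  }
  const value = await fetchFn();
  store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  return value;
}
```

Because the user ID is part of the key, two users requesting the same resource never see each other's cached data.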
Use edge/CDN caching when: your users are geographically distributed and API response latency from your origin server is a bottleneck, or you're caching public content that can be shared across all users globally. Cloudflare Workers gives you programmable caching logic at the edge — you can cache some paths and bypass cache for others in a single Worker script. The limitation: CDN caches don't support server-sent events or WebSocket connections, and they're optimized for GET requests (POST/PUT/DELETE requests typically bypass cache).
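The cache-some-paths, bypass-others logic can be factored into a small predicate called at the top of the Worker's fetch handler (`shouldCacheAtEdge` and the bypass list are illustrative assumptions):

```typescript
// Decide per-request whether the edge cache applies.
const BYPASS_PREFIXES = ['/auth', '/checkout', '/webhooks']; // illustrative paths

function shouldCacheAtEdge(method: string, pathname: string): boolean {
  // CDNs are optimized for GET; mutations must always reach the origin
  if (method.toUpperCase() !== 'GET') return false;
  return !BYPASS_PREFIXES.some(prefix => pathname.startsWith(prefix));
}
```

In the Worker, call `shouldCacheAtEdge(request.method, new URL(request.url).pathname)` before `cache.match`, and fall through to a plain `fetch` when it returns false.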
Designing Good Cache Keys
Cache key design determines who shares a cached value and how granular your cache invalidation can be. Bad cache keys are either too broad (you serve the wrong data to someone) or too narrow (you cache the same data 10,000 times under slightly different keys, wasting memory and defeating the purpose of caching).
Include only what changes the response: If two API calls with different User-Agent headers return identical data, User-Agent shouldn't be in your cache key. Include: the URL path, any query parameters that affect the response, the user ID (if the response is user-specific), and the API version. Exclude: request headers that don't affect the response (User-Agent, Accept-Language if you're not doing localization), timestamps, and request IDs.
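Those rules can be sketched as a key builder that accepts only response-affecting inputs and sorts query parameters, so equivalent URLs map to one entry (`buildCacheKey` is a hypothetical helper):

```typescript
// Build a cache key from only the inputs that change the response.
// Sorting params makes ?a=1&b=2 and ?b=2&a=1 share one cache entry.
function buildCacheKey(
  path: string,
  params: Record<string, string>,
  userId?: string // include only when the response is user-specific
): string {
  const sorted = Object.keys(params)
    .sort()
    .map(k => `${k}=${encodeURIComponent(params[k])}`)
    .join('&');
  const base = sorted ? `${path}?${sorted}` : path;
  return userId ? `user:${userId}:${base}` : base;
}
```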
Use hierarchical keys for flexible invalidation: Structure your Redis keys as resource:id:subresource (e.g., product:123:reviews, product:123:details, user:456:profile). This lets you invalidate all data related to product 123 with a pattern match (find keys matching product:123:* with SCAN, then pass them to DEL; Redis DEL itself does not accept wildcards), or invalidate just product 123's reviews without touching other product data. Flat cache keys (e.g., productReviews123) make selective invalidation much harder.
Hash long or complex cache keys: Cache keys have practical length limits (Redis keys max at 512MB, but 100-200 bytes is a practical ceiling for readability and network efficiency). For complex cache keys (long URL with many query params, GraphQL query text), hash the key before storage: const key = 'query:' + crypto.createHash('sha256').update(queryString).digest('hex').slice(0, 16). Include a human-readable prefix so you can identify what's cached when debugging.
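The hashing approach above as a self-contained helper (`hashedKey` is an illustrative name):

```typescript
import { createHash } from 'crypto';

// Hash long inputs (full URLs, GraphQL query text) behind a readable prefix,
// so keys stay short while remaining identifiable when debugging.
function hashedKey(prefix: string, raw: string): string {
  const digest = createHash('sha256').update(raw).digest('hex').slice(0, 16);
  return `${prefix}:${digest}`;
}
```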
Cache Warming and Cold Start
A cold cache — the state immediately after deployment or cache flush — can cause a thundering herd: hundreds of requests arrive simultaneously, all miss the cache, and all hit the origin API at the same time. This is a common cause of "works in staging, breaks on deploy" issues.
Pre-warm critical caches at deploy time: Before routing traffic to a new deployment, run a warm-up script that fetches and caches the most frequently requested resources. For an e-commerce site, this might be the top 100 products, all category pages, and the site configuration. This ensures the cache has content before users arrive, rather than having the first users bear the cost of cold cache misses.
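A warm-up pass can be sketched as a loop over the hot resources, with the fetcher and cache injected so the same code runs in a deploy script or a test (`warmCache` and the resource names are illustrative):

```typescript
// Pre-warm a cache by fetching the hottest resources before traffic arrives.
// A single failed fetch is logged and skipped so it cannot block the deploy.
async function warmCache(
  resources: string[],
  fetchFn: (resource: string) => Promise<unknown>,
  cache: Map<string, unknown>
): Promise<number> {
  let warmed = 0;
  for (const resource of resources) {
    try {
      cache.set(resource, await fetchFn(resource));
      warmed++;
    } catch (err) {
      console.error(`warm-up failed for ${resource}:`, err);
    }
  }
  return warmed;
}
```

At deploy time you would call this with the top-N resource list and a real fetcher, and route traffic to the new instance only after it resolves.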
Stagger cache expiration: If you set the same TTL (300 seconds) on thousands of cache entries created at the same time, they'll all expire simultaneously — causing a synchronized thundering herd. Add a small random jitter to TTLs: instead of ttl: 300, use ttl: 300 + Math.floor(Math.random() * 60). The 60-second spread distributes expirations across a minute, smoothing the load on your origin API.
Use probabilistic early expiration (PER): Instead of waiting for a cache entry to expire before fetching a new value, begin refreshing it slightly before expiration. As an entry approaches its TTL, each request becomes increasingly likely to trigger a background refresh, which prevents the miss spike at expiry. This is more complex than a simple TTL but eliminates the cold-cache thundering herd for high-traffic keys.
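A common formulation of PER (sometimes called XFetch) refreshes when now - delta * beta * ln(rand()) >= expiry, where delta approximates how long a recompute takes and beta tunes eagerness. A sketch with the random source injected for testability (`shouldRefreshEarly` is an illustrative helper):

```typescript
// Probabilistic early expiration: as an entry nears its TTL, each request
// becomes increasingly likely to refresh it early, spreading refreshes out.
function shouldRefreshEarly(
  nowMs: number,
  expiryMs: number,                 // when the entry's TTL elapses
  recomputeMs: number,              // how long a refresh takes (delta)
  beta = 1.0,                       // > 1 favors earlier refresh
  rand: () => number = Math.random
): boolean {
  // ln(rand) is negative, so this adds a random positive margin to "now":
  // refreshes start probabilistically before expiry, and always fire after it
  return nowMs - recomputeMs * beta * Math.log(rand()) >= expiryMs;
}
```

On a cache hit, call this with the entry's stored expiry; when it returns true, serve the cached value and kick off a background refresh.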
Methodology
Redis keyspace notifications (available since Redis 2.8) are useful for cache invalidation workflows; configure notify-keyspace-events Ex to receive events when keys expire. The ioredis npm package (v5.x) is a widely used Redis client for Node.js production work; it handles cluster mode, automatic reconnection, and pipeline batching. The Cloudflare Workers Cache API follows the Service Worker cache spec, and the Cache-Control s-maxage directive controls CDN TTL separately from the browser's max-age. The stale-while-revalidate and stale-if-error directives are defined in RFC 5861 and supported by Cloudflare, Fastly, and CloudFront, but not by all CDN providers; verify support for your specific CDN before relying on them.
Common Mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| No cache invalidation strategy | Serving stale data indefinitely | Set appropriate TTLs, invalidate on write |
| Caching error responses | Users get errors from cache | Only cache 2xx responses |
| Cache key doesn't include all params | Wrong data returned | Include all query params in cache key |
| No fallback when cache is down | Error instead of slow response | Fall back to a direct API call |
| Over-caching real-time data | Users see outdated info | Short TTL or no cache for real-time |
Compare API caching strategies and CDN options on APIScout — find the best edge caching solutions for your API integrations.
Related: Building an AI Agent in 2026, Building an AI-Powered App: Choosing Your API Stack, Building an API Marketplace