API Gateway Patterns for Microservices 2026
An API gateway sits between clients and your microservices, handling cross-cutting concerns — routing, authentication, rate limiting, request transformation, and aggregation. Without a gateway, every microservice implements these concerns independently (or doesn't).
TL;DR
- A gateway should be thin: routing, auth, rate limiting, and observability — no business logic
- Kong is the default self-hosted choice; Zuplo is the developer-first cloud option; Envoy/Istio if you need a full service mesh
- The Backend for Frontend (BFF) pattern — a dedicated gateway per client type — is more maintainable than one monolithic aggregator
- JWT validation at the gateway (not in each service) eliminates duplicated auth logic; pass identity downstream via headers
- Add OpenTelemetry tracing at the gateway from day one — retrofitting distributed tracing later is painful
Core Gateway Patterns
1. Reverse Proxy / Router
The simplest pattern. The gateway routes requests to the correct service based on URL path, headers, or other criteria.
```
Client → Gateway → /users/*    → User Service
                 → /orders/*   → Order Service
                 → /products/* → Product Service
```
Use when: You want a single entry point without duplicating routing logic across services.
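Conceptually, the routing step is just a longest-prefix lookup against a route table. A minimal sketch in TypeScript — the service names and upstream URLs are illustrative, not a specific gateway's API:

```typescript
// Illustrative route table: path prefix → upstream service URL.
type Route = { prefix: string; upstream: string };

const routes: Route[] = [
  { prefix: "/users", upstream: "http://user-service:8080" },
  { prefix: "/orders", upstream: "http://order-service:8080" },
  { prefix: "/products", upstream: "http://product-service:8080" },
];

// Longest-prefix match, so more specific prefixes win over shorter ones.
function resolveUpstream(path: string): string | null {
  const match = routes
    .filter((r) => path === r.prefix || path.startsWith(r.prefix + "/"))
    .sort((a, b) => b.prefix.length - a.prefix.length)[0];
  return match ? match.upstream : null;
}
```

Real gateways layer host, header, and method matching on top of this, but the core dispatch is the same table lookup.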
2. API Aggregation (Backend for Frontend)
The gateway combines data from multiple services into a single response. The client makes one request instead of three.
```
Client: GET /api/dashboard
Gateway:
  → GET user-service/profile
  → GET order-service/recent
  → GET analytics-service/metrics
  → Combine → Single response
```
Use when: Frontend pages need data from multiple services. Reduces client-side complexity and round trips.
3. Authentication Gateway
The gateway handles authentication (JWT validation, API key verification) so individual services don't need to. The gateway passes the validated identity downstream via headers.
Client → Gateway (validate JWT) → X-User-Id: 123 → Service
Use when: You want centralized authentication instead of each service validating tokens independently.
4. Rate Limiting Gateway
The gateway enforces rate limits before requests reach your services. Protects backend services from abuse and ensures fair usage.
Use when: Multiple services need consistent rate limiting policies.
5. Request/Response Transformation
The gateway transforms requests and responses — header manipulation, body modification, protocol translation (REST → gRPC), and response filtering.
Client (REST/JSON) → Gateway (transform) → Service (gRPC/Protobuf)
Use when: Internal services use different protocols than external clients expect.
6. Circuit Breaker
The gateway monitors service health. When a service fails repeatedly, the gateway "opens the circuit" — returning cached responses or errors without forwarding requests to the failing service.
Use when: Cascading failures are a risk (one slow service brings down everything).
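A count-based circuit breaker fits in a few lines. This is a sketch, not a specific library's API — the thresholds, timings, and class shape are illustrative:

```typescript
// Illustrative circuit breaker: opens after N consecutive failures,
// fails fast while open, and allows a trial request after a cooldown.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly maxFailures = 5,       // consecutive failures before opening
    private readonly resetAfterMs = 30_000  // how long the circuit stays open
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen()) throw new Error("circuit open: failing fast");
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw err;
    }
  }

  isOpen(): boolean {
    if (this.failures < this.maxFailures) return false;
    // After the cooldown the circuit is half-open: one trial request goes through.
    return Date.now() - this.openedAt < this.resetAfterMs;
  }
}
```

Gateways like Kong and Envoy ship this as configuration rather than code, but the state machine (closed → open → half-open) is the same.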
Gateway vs Service Mesh
| Concern | API Gateway | Service Mesh (Istio/Linkerd) |
|---|---|---|
| Position | Edge (external traffic) | Internal (service-to-service) |
| Auth | External client auth | mTLS between services |
| Rate limiting | Per-client limits | Per-service limits |
| Routing | URL path, host, headers | Service name, labels |
| Observability | External request metrics | Internal traffic metrics |
| Aggregation | ✅ Yes | ❌ No |
| Complexity | Medium | High |
Most architectures use both: Gateway at the edge for external traffic + service mesh for internal communication.
Choosing a Gateway
| Gateway | Best For | Type |
|---|---|---|
| Kong | Self-hosted, plugin ecosystem | Open source |
| Zuplo | Edge-deployed, developer-first | Cloud |
| AWS API Gateway | Serverless + Lambda | Cloud |
| Envoy | Service mesh, high performance | Open source |
| Traefik | Kubernetes-native | Open source |
| NGINX | Simple reverse proxy | Open source |
| Cloudflare | Edge + security | Cloud |
Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Business logic in gateway | Gateway becomes a monolith | Keep gateway thin — routing, auth, rate limiting only |
| Gateway as single point of failure | One gateway failure = total outage | Multiple gateway instances, health checks |
| Over-aggregation | Gateway becomes tightly coupled to all services | Limit aggregation to BFF patterns |
| No circuit breaker | Slow service blocks gateway threads | Implement timeouts and circuit breakers |
| Gateway per team | Inconsistent policies, management overhead | Shared gateway with per-team configuration |
Kong Configuration Example
Kong's declarative configuration (deck) lets you define routes, services, and plugins as code. Here's a real-world configuration for an API with rate limiting and JWT authentication:
```yaml
# kong.yaml — declarative configuration applied via decK
_format_version: "3.0"
services:
  - name: user-service
    url: http://user-service:8080
    routes:
      - name: users-route
        paths:
          - /api/users
        methods:
          - GET
          - POST
          - PATCH
    plugins:
      - name: jwt
        config:
          key_claim_name: iss
          secret_is_base64: false
      - name: rate-limiting
        config:
          minute: 1000
          policy: redis
          redis_host: redis
          redis_port: 6379
          limit_by: consumer
  - name: orders-service
    url: http://orders-service:8080
    routes:
      - name: orders-route
        paths:
          - /api/orders
    plugins:
      - name: jwt
        config:
          key_claim_name: iss
      - name: rate-limiting
        config:
          minute: 500
          policy: redis
          redis_host: redis
          redis_port: 6379
```
The JWT plugin validates tokens using configured secrets or JWKS endpoints. After validation, Kong passes X-Consumer-Username and X-Authenticated-Userid headers downstream. Services trust these headers without implementing auth themselves.
Apply this config with:
```shell
deck sync -s kong.yaml
```
Keep your Kong config in Git. Treat it like infrastructure code — pull requests, code review, CI validation before deployment.
Backend for Frontend (BFF) Pattern
The Backend for Frontend pattern creates a dedicated gateway layer per client type, rather than one monolithic API gateway that tries to serve all clients. Each BFF is purpose-built for its client's data needs.
The motivation comes from a real problem: mobile apps need different data shapes than desktop web apps, which need different shapes than third-party integrations. A single aggregation layer becomes a tangle of conditional logic — "if mobile, return abbreviated user; if desktop, return full user; if third-party, return only public fields."
Instead, you build separate BFFs:
```
Mobile App → Mobile BFF (GraphQL or REST)
               ↓ fan out to
             User Service, Feed Service, Notifications Service

Web App    → Web BFF (REST/GraphQL)
               ↓ fan out to
             User Service, Dashboard Service, Analytics Service

Partner    → Public API Gateway (REST, versioned)
               ↓ routed to
             Stable public endpoints
```
GraphQL as a BFF
GraphQL works particularly well as a BFF because it lets clients request exactly the fields they need. The BFF implements resolvers that fetch from multiple microservices:
```typescript
// Web BFF — GraphQL resolver example
const resolvers = {
  Query: {
    dashboard: async (_, __, { userId }) => {
      // Parallel fetching from multiple services
      const [profile, recentOrders, metrics] = await Promise.all([
        userService.getProfile(userId),
        orderService.getRecent(userId, { limit: 5 }),
        analyticsService.getDashboardMetrics(userId),
      ]);
      return { profile, recentOrders, metrics };
    },
  },
};
```
One GraphQL query from the web app hits the BFF, which fans out to three microservices in parallel and assembles the response. The underlying services stay simple — they don't need to know about each other.
The BFF pattern keeps individual services focused on their domain while letting the presentation layer (the BFF) handle composition. When you add a new field to the dashboard, you modify the BFF resolver, not three separate services. For more on API aggregation patterns, see the REST vs GraphQL vs gRPC comparison.
Rate Limiting at the Gateway Layer
Centralizing rate limiting in the gateway gives you consistent enforcement across all services without each service implementing its own limits. The gateway sees all traffic — individual services only see what passes the rate limiter.
Token Bucket vs Sliding Window
Token bucket: Each client has a bucket that fills at a fixed rate (e.g., 100 tokens/minute). Each request consumes one token. Clients can burst up to bucket capacity. Good for APIs that allow burst traffic.
Sliding window: Counts requests within a moving time window. Smoother than token bucket — prevents request spikes at window boundaries. Higher implementation complexity.
For most APIs, sliding window is the right choice. It prevents the "boundary exploit" where clients send 100 requests at 11:59:59 and 100 more at 12:00:01.
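A sliding window log can be sketched with in-memory timestamps. This is illustrative only — a production gateway would typically keep the timestamps in Redis sorted sets so limits survive restarts and apply across gateway instances:

```typescript
// Illustrative sliding window log limiter: remembers each client's
// request timestamps and counts only those inside the moving window.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // client id → request timestamps (ms)

  constructor(
    private readonly limit: number,    // max requests per window
    private readonly windowMs: number  // window length in ms
  ) {}

  allow(clientId: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(clientId) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(clientId, recent);
      return false; // over the limit for this window
    }
    recent.push(now);
    this.hits.set(clientId, recent);
    return true;
  }
}
```

Because the window moves with each request, the boundary exploit described above fails: the burst at 11:59:59 still counts against the window that contains 12:00:01.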
Granularity: Per-IP vs Per-User vs Per-Tenant
Rate limiting granularity matters for multi-tenant APIs. Consider layering:
```
Per-IP:      1,000 req/min   ← Blocks DDoS, scrapers
Per-User:      100 req/min   ← Fair usage per individual
Per-Tenant: 10,000 req/min   ← Plan-based limits
```
A single enterprise user shouldn't be able to exhaust their entire company's quota. A single IP shouldn't be able to exhaust all per-user limits by rotating tokens.
In Kong, per-tenant limiting uses consumer groups:
```yaml
plugins:
  - name: rate-limiting-advanced
    config:
      limit:
        - 1000
      window_size:
        - 60
      identifier: consumer
      namespace: tenant-limits
```
For deep coverage of rate limiting patterns, see API rate limiting best practices and best rate limiting API gateway options.
Gateway Authentication Patterns
JWT Validation at the Gateway
Validate JWTs once at the gateway, not in every service. Services become simpler and more focused on domain logic.
The gateway validates:
- Token signature (using the JWKS endpoint from your auth provider)
- Expiry (`exp` claim)
- Audience (`aud` claim matches your service)
- Issuer (`iss` claim)
After validation, the gateway forwards identity to services via trusted headers:
```
X-User-Id: user_abc123
X-Tenant-Id: tenant_xyz
X-User-Roles: admin,member
```
Services trust these headers because they only accept traffic from the gateway (private network, mTLS, or a shared secret). External clients cannot forge these headers.
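The claim checks and the header mapping can be sketched as follows. Signature verification against the JWKS endpoint is assumed to happen first (via a JWT library) and is omitted here; the type and function names are illustrative:

```typescript
// Illustrative claim validation after signature verification has passed.
type Claims = { sub: string; exp: number; aud: string; iss: string; roles?: string[] };

function checkClaims(
  claims: Claims,
  expected: { aud: string; iss: string },
  nowSec = Math.floor(Date.now() / 1000)
): boolean {
  if (claims.exp <= nowSec) return false;         // token expired
  if (claims.aud !== expected.aud) return false;  // wrong audience
  if (claims.iss !== expected.iss) return false;  // wrong issuer
  return true;
}

// Identity headers the gateway forwards downstream on success.
function identityHeaders(claims: Claims): Record<string, string> {
  return {
    "X-User-Id": claims.sub,
    "X-User-Roles": (claims.roles ?? []).join(","),
  };
}
```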
API Key Rotation
API keys need rotation support — developers rotate keys when they're compromised. Your gateway should:
- Store keys hashed (not plaintext) — use bcrypt or HMAC with a secret
- Support multiple active keys per tenant/user (rotation window)
- Emit events when keys are created/rotated/revoked
- Rate limit key validation attempts (prevent enumeration)
```typescript
import { createHmac } from "node:crypto";

// `redis` and `db` are assumed to be pre-configured client instances.
type ApiKeyPayload = { userId: string; tenantId: string; scopes: string[] };

function hmac(value: string, secret: string): string {
  return createHmac("sha256", secret).update(value).digest("hex");
}

// Key lookup with caching to avoid a DB hit on every request
async function validateApiKey(rawKey: string): Promise<ApiKeyPayload | null> {
  const keyHash = hmac(rawKey, process.env.KEY_SECRET!);

  // Cache validated keys (and negative results) for 60 seconds
  const cached = await redis.get(`apikey:${keyHash}`);
  if (cached === "invalid") return null;
  if (cached) return JSON.parse(cached);

  const key = await db.apiKeys.findByHash(keyHash);
  if (!key || key.revokedAt) {
    await redis.setex(`apikey:${keyHash}`, 60, "invalid");
    return null;
  }

  const payload = { userId: key.userId, tenantId: key.tenantId, scopes: key.scopes };
  await redis.setex(`apikey:${keyHash}`, 60, JSON.stringify(payload));
  return payload;
}
```
mTLS Between Gateway and Services
For internal service communication, mTLS (mutual TLS) ensures requests genuinely come from the gateway, not a compromised internal service or a misconfigured network rule. The gateway presents a client certificate; each service validates it.
This is where service meshes (Istio, Linkerd) complement the gateway — they handle mTLS transparently between all services, without application code changes. The gateway handles external auth; the mesh handles internal identity.
Observability in the Gateway
The gateway sees every request — making it the ideal place to instrument your entire system's observability. If you don't add observability at the gateway level, you're blind to the request path before it hits individual services.
Request Logging
Log at the gateway level: timestamp, method, path, status code, latency, upstream service, client IP, user ID (from validated JWT), tenant ID. Structure logs as JSON for easy ingestion into Datadog, Grafana Loki, or CloudWatch:
```json
{
  "timestamp": "2026-03-08T12:34:56.789Z",
  "method": "GET",
  "path": "/api/orders/order_123",
  "status": 200,
  "latency_ms": 45,
  "upstream": "orders-service",
  "user_id": "user_abc",
  "tenant_id": "tenant_xyz",
  "request_id": "req_1a2b3c"
}
```
Distributed Tracing with OpenTelemetry
The gateway should start a trace span for every request and propagate trace context to upstream services via the traceparent header (W3C Trace Context standard). Kong's OpenTelemetry plugin handles this:
```yaml
plugins:
  - name: opentelemetry
    config:
      endpoint: http://otel-collector:4318/v1/traces
      resource_attributes:
        service.name: api-gateway
      propagation:
        - w3c
```
Each downstream service picks up the trace context and adds its own spans. The result: a complete trace from gateway to database for every request, with timing at each hop.
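The propagated header itself is simple: version, trace id, parent span id, and flags. A sketch of how a gateway might mint and forward it — the helper names are illustrative, and in practice the OpenTelemetry SDK handles this for you:

```typescript
// Illustrative W3C Trace Context propagation: the gateway mints a trace id
// for the whole request; each hop keeps the trace id but issues a new span id.
import { randomBytes } from "node:crypto";

function newTraceparent(): string {
  const traceId = randomBytes(16).toString("hex"); // 32 hex chars, whole request
  const spanId = randomBytes(8).toString("hex");   // 16 hex chars, this hop
  return `00-${traceId}-${spanId}-01`;             // version 00, sampled flag 01
}

function childTraceparent(parent: string): string {
  const [version, traceId] = parent.split("-");
  const spanId = randomBytes(8).toString("hex");   // new span id for the next hop
  return `${version}-${traceId}-${spanId}-01`;
}
```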
Metrics: Latency p50/p99
Track latency percentiles (p50, p95, p99), not just averages. An average of 50ms with a p99 of 2,000ms means 1% of users are experiencing 2-second responses. Averages hide outliers.
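A quick way to see the difference is a nearest-rank percentile over a synthetic sample (the numbers are illustrative):

```typescript
// Nearest-rank percentile, illustrating why the mean hides tail latency.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// 98 fast requests and 2 slow outliers: the mean is 89 ms, but p99 is 2000 ms.
const latencies = [...Array(98).fill(50), 2000, 2000];
const mean = latencies.reduce((a, b) => a + b, 0) / latencies.length;
```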
Key gateway metrics to alert on:
- `gateway_request_duration_p99` > 500ms (latency regression)
- `gateway_error_rate` > 1% (upstream service degradation)
- `gateway_rate_limit_hit_rate` spike (abuse or client bug)
- `gateway_upstream_timeout_rate` > 0.1% (service health)
For related patterns, see API authentication patterns and the API security checklist.
Conclusion
An API gateway earns its complexity by eliminating cross-cutting concerns from every microservice in your fleet. The key is keeping it thin — routing, auth, rate limiting, and observability belong in the gateway; business logic does not. Start with a simple reverse proxy, add JWT validation and rate limiting early, and instrument with OpenTelemetry before you need to debug a production incident. The BFF pattern is the right answer for aggregation when you have distinct client types with different data needs.