API Gateway Patterns for Microservices 2026
An API gateway sits between clients and your microservices, handling cross-cutting concerns — routing, authentication, rate limiting, request transformation, and aggregation. Without a gateway, every microservice implements these concerns independently (or doesn't).
TL;DR
- A gateway should be thin: routing, auth, rate limiting, and observability — no business logic
- Kong is the default self-hosted choice; Zuplo is the developer-first cloud option; Envoy/Istio if you need a full service mesh
- The Backend for Frontend (BFF) pattern — a dedicated gateway per client type — is more maintainable than one monolithic aggregator
- JWT validation at the gateway (not in each service) eliminates duplicated auth logic; pass identity downstream via headers
- Add OpenTelemetry tracing at the gateway from day one — retrofitting distributed tracing later is painful
Core Gateway Patterns
1. Reverse Proxy / Router
The simplest pattern. The gateway routes requests to the correct service based on URL path, headers, or other criteria.
```
Client → Gateway → /users/*    → User Service
                 → /orders/*   → Order Service
                 → /products/* → Product Service
```
Use when: You want a single entry point without duplicating routing logic across services.
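Conceptually, the routing step is just a longest-prefix lookup against a route table. A minimal sketch in TypeScript — the service names and upstream URLs are illustrative, not a specific gateway's API:

```typescript
// Illustrative route table: path prefix → upstream service URL.
type Route = { prefix: string; upstream: string };

const routes: Route[] = [
  { prefix: "/users", upstream: "http://user-service:8080" },
  { prefix: "/orders", upstream: "http://order-service:8080" },
  { prefix: "/products", upstream: "http://product-service:8080" },
];

// Longest-prefix match, so more specific prefixes win over shorter ones.
function resolveUpstream(path: string): string | null {
  const match = routes
    .filter((r) => path === r.prefix || path.startsWith(r.prefix + "/"))
    .sort((a, b) => b.prefix.length - a.prefix.length)[0];
  return match ? match.upstream : null;
}
```

Real gateways layer host, header, and method matching on top of this, but the core dispatch is the same table lookup.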
2. API Aggregation (Backend for Frontend)
The gateway combines data from multiple services into a single response. The client makes one request instead of three.
```
Client: GET /api/dashboard
Gateway:
  → GET user-service/profile
  → GET order-service/recent
  → GET analytics-service/metrics
  → Combine → Single response
```
Use when: Frontend pages need data from multiple services. Reduces client-side complexity and round trips.
3. Authentication Gateway
The gateway handles authentication (JWT validation, API key verification) so individual services don't need to. The gateway passes the validated identity downstream via headers.
Client → Gateway (validate JWT) → X-User-Id: 123 → Service
Use when: You want centralized authentication instead of each service validating tokens independently.
4. Rate Limiting Gateway
The gateway enforces rate limits before requests reach your services. Protects backend services from abuse and ensures fair usage.
Use when: Multiple services need consistent rate limiting policies.
5. Request/Response Transformation
The gateway transforms requests and responses — header manipulation, body modification, protocol translation (REST → gRPC), and response filtering.
Client (REST/JSON) → Gateway (transform) → Service (gRPC/Protobuf)
Use when: Internal services use different protocols than external clients expect.
6. Circuit Breaker
The gateway monitors service health. When a service fails repeatedly, the gateway "opens the circuit" — returning cached responses or errors without forwarding requests to the failing service.
Use when: Cascading failures are a risk (one slow service brings down everything).
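A count-based circuit breaker fits in a few lines. This is a sketch, not a specific library's API — the thresholds, timings, and class shape are illustrative:

```typescript
// Illustrative circuit breaker: opens after N consecutive failures,
// fails fast while open, and allows a trial request after a cooldown.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly maxFailures = 5,       // consecutive failures before opening
    private readonly resetAfterMs = 30_000  // how long the circuit stays open
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen()) throw new Error("circuit open: failing fast");
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw err;
    }
  }

  isOpen(): boolean {
    if (this.failures < this.maxFailures) return false;
    // After the cooldown the circuit is half-open: one trial request goes through.
    return Date.now() - this.openedAt < this.resetAfterMs;
  }
}
```

Gateways like Kong and Envoy ship this as configuration rather than code, but the state machine (closed → open → half-open) is the same.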
Gateway vs Service Mesh
| Concern | API Gateway | Service Mesh (Istio/Linkerd) |
|---|---|---|
| Position | Edge (external traffic) | Internal (service-to-service) |
| Auth | External client auth | mTLS between services |
| Rate limiting | Per-client limits | Per-service limits |
| Routing | URL path, host, headers | Service name, labels |
| Observability | External request metrics | Internal traffic metrics |
| Aggregation | ✅ Yes | ❌ No |
| Complexity | Medium | High |
Most architectures use both: Gateway at the edge for external traffic + service mesh for internal communication.
Choosing a Gateway
| Gateway | Best For | Type |
|---|---|---|
| Kong | Self-hosted, plugin ecosystem | Open source |
| Zuplo | Edge-deployed, developer-first | Cloud |
| AWS API Gateway | Serverless + Lambda | Cloud |
| Envoy | Service mesh, high performance | Open source |
| Traefik | Kubernetes-native | Open source |
| NGINX | Simple reverse proxy | Open source |
| Cloudflare | Edge + security | Cloud |
Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Business logic in gateway | Gateway becomes a monolith | Keep gateway thin — routing, auth, rate limiting only |
| Gateway as single point of failure | One gateway failure = total outage | Multiple gateway instances, health checks |
| Over-aggregation | Gateway becomes tightly coupled to all services | Limit aggregation to BFF patterns |
| No circuit breaker | Slow service blocks gateway threads | Implement timeouts and circuit breakers |
| Gateway per team | Inconsistent policies, management overhead | Shared gateway with per-team configuration |
Kong Configuration Example
Kong's declarative configuration (deck) lets you define routes, services, and plugins as code. Here's a real-world configuration for an API with rate limiting and JWT authentication:
```yaml
# kong.yaml — declarative configuration applied via decK
_format_version: "3.0"
services:
  - name: user-service
    url: http://user-service:8080
    routes:
      - name: users-route
        paths:
          - /api/users
        methods:
          - GET
          - POST
          - PATCH
    plugins:
      - name: jwt
        config:
          key_claim_name: iss
          secret_is_base64: false
      - name: rate-limiting
        config:
          minute: 1000
          policy: redis
          redis_host: redis
          redis_port: 6379
          limit_by: consumer
  - name: orders-service
    url: http://orders-service:8080
    routes:
      - name: orders-route
        paths:
          - /api/orders
    plugins:
      - name: jwt
        config:
          key_claim_name: iss
      - name: rate-limiting
        config:
          minute: 500
          policy: redis
          redis_host: redis
          redis_port: 6379
```
The JWT plugin validates tokens using configured secrets or JWKS endpoints. After validation, Kong passes X-Consumer-Username and X-Authenticated-Userid headers downstream. Services trust these headers without implementing auth themselves.
Apply this config with:
```shell
deck sync -s kong.yaml
```
Keep your Kong config in Git. Treat it like infrastructure code — pull requests, code review, CI validation before deployment.
Backend for Frontend (BFF) Pattern
The Backend for Frontend pattern creates a dedicated gateway layer per client type, rather than one monolithic API gateway that tries to serve all clients. Each BFF is purpose-built for its client's data needs.
The motivation comes from a real problem: mobile apps need different data shapes than desktop web apps, which need different shapes than third-party integrations. A single aggregation layer becomes a tangle of conditional logic — "if mobile, return abbreviated user; if desktop, return full user; if third-party, return only public fields."
Instead, you build separate BFFs:
```
Mobile App → Mobile BFF (GraphQL or REST)
               ↓ fan out to
             User Service, Feed Service, Notifications Service

Web App    → Web BFF (REST/GraphQL)
               ↓ fan out to
             User Service, Dashboard Service, Analytics Service

Partner    → Public API Gateway (REST, versioned)
               ↓ routed to
             Stable public endpoints
```
GraphQL as a BFF
GraphQL works particularly well as a BFF because it lets clients request exactly the fields they need. The BFF implements resolvers that fetch from multiple microservices:
```typescript
// Web BFF — GraphQL resolver example
const resolvers = {
  Query: {
    dashboard: async (_, __, { userId }) => {
      // Parallel fetching from multiple services
      const [profile, recentOrders, metrics] = await Promise.all([
        userService.getProfile(userId),
        orderService.getRecent(userId, { limit: 5 }),
        analyticsService.getDashboardMetrics(userId),
      ]);
      return { profile, recentOrders, metrics };
    },
  },
};
```
One GraphQL query from the web app hits the BFF, which fans out to three microservices in parallel and assembles the response. The underlying services stay simple — they don't need to know about each other.
The BFF pattern keeps individual services focused on their domain while letting the presentation layer (the BFF) handle composition. When you add a new field to the dashboard, you modify the BFF resolver, not three separate services. For more on API aggregation patterns, see the REST vs GraphQL vs gRPC comparison.
Rate Limiting at the Gateway Layer
Centralizing rate limiting in the gateway gives you consistent enforcement across all services without each service implementing its own limits. The gateway sees all traffic — individual services only see what passes the rate limiter.
Token Bucket vs Sliding Window
Token bucket: Each client has a bucket that fills at a fixed rate (e.g., 100 tokens/minute). Each request consumes one token. Clients can burst up to bucket capacity. Good for APIs that allow burst traffic.
Sliding window: Counts requests within a moving time window. Smoother than token bucket — prevents request spikes at window boundaries. Higher implementation complexity.
For most APIs, sliding window is the right choice. It prevents the "boundary exploit" where clients send 100 requests at 11:59:59 and 100 more at 12:00:01.
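A sliding window log can be sketched with in-memory timestamps. This is illustrative only — a production gateway would typically keep the timestamps in Redis sorted sets so limits survive restarts and apply across gateway instances:

```typescript
// Illustrative sliding window log limiter: remembers each client's
// request timestamps and counts only those inside the moving window.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // client id → request timestamps (ms)

  constructor(
    private readonly limit: number,    // max requests per window
    private readonly windowMs: number  // window length in ms
  ) {}

  allow(clientId: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(clientId) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(clientId, recent);
      return false; // over the limit for this window
    }
    recent.push(now);
    this.hits.set(clientId, recent);
    return true;
  }
}
```

Because the window moves with each request, the boundary exploit described above fails: the burst at 11:59:59 still counts against the window that contains 12:00:01.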
Granularity: Per-IP vs Per-User vs Per-Tenant
Rate limiting granularity matters for multi-tenant APIs. Consider layering:
```
Per-IP:      1,000 req/min   ← Blocks DDoS, scrapers
Per-User:      100 req/min   ← Fair usage per individual
Per-Tenant: 10,000 req/min   ← Plan-based limits
```
A single enterprise user shouldn't be able to exhaust their entire company's quota. A single IP shouldn't be able to exhaust all per-user limits by rotating tokens.
In Kong, per-tenant limiting uses consumer groups:
```yaml
plugins:
  - name: rate-limiting-advanced
    config:
      limit:
        - 1000
      window_size:
        - 60
      identifier: consumer
      namespace: tenant-limits
```
For deep coverage of rate limiting patterns, see API rate limiting best practices and best rate limiting API gateway options.
Gateway Authentication Patterns
JWT Validation at the Gateway
Validate JWTs once at the gateway, not in every service. Services become simpler and more focused on domain logic.
The gateway validates:
- Token signature (using the JWKS endpoint from your auth provider)
- Expiry (`exp` claim)
- Audience (`aud` claim matches your service)
- Issuer (`iss` claim)
After validation, the gateway forwards identity to services via trusted headers:
```
X-User-Id: user_abc123
X-Tenant-Id: tenant_xyz
X-User-Roles: admin,member
```
Services trust these headers because they only accept traffic from the gateway (private network, mTLS, or a shared secret). External clients cannot forge these headers.
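The claim checks and the header mapping can be sketched as follows. Signature verification against the JWKS endpoint is assumed to happen first (via a JWT library) and is omitted here; the type and function names are illustrative:

```typescript
// Illustrative claim validation after signature verification has passed.
type Claims = { sub: string; exp: number; aud: string; iss: string; roles?: string[] };

function checkClaims(
  claims: Claims,
  expected: { aud: string; iss: string },
  nowSec = Math.floor(Date.now() / 1000)
): boolean {
  if (claims.exp <= nowSec) return false;         // token expired
  if (claims.aud !== expected.aud) return false;  // wrong audience
  if (claims.iss !== expected.iss) return false;  // wrong issuer
  return true;
}

// Identity headers the gateway forwards downstream on success.
function identityHeaders(claims: Claims): Record<string, string> {
  return {
    "X-User-Id": claims.sub,
    "X-User-Roles": (claims.roles ?? []).join(","),
  };
}
```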
API Key Rotation
API keys need rotation support — developers rotate keys when they're compromised. Your gateway should:
- Store keys hashed (not plaintext) — use bcrypt or HMAC with a secret
- Support multiple active keys per tenant/user (rotation window)
- Emit events when keys are created/rotated/revoked
- Rate limit key validation attempts (prevent enumeration)
```typescript
import { createHmac } from "node:crypto";

// `redis` and `db` are assumed to be pre-configured client instances.
type ApiKeyPayload = { userId: string; tenantId: string; scopes: string[] };

function hmac(value: string, secret: string): string {
  return createHmac("sha256", secret).update(value).digest("hex");
}

// Key lookup with caching to avoid a DB hit on every request
async function validateApiKey(rawKey: string): Promise<ApiKeyPayload | null> {
  const keyHash = hmac(rawKey, process.env.KEY_SECRET!);

  // Cache validated keys (and negative results) for 60 seconds
  const cached = await redis.get(`apikey:${keyHash}`);
  if (cached === "invalid") return null;
  if (cached) return JSON.parse(cached);

  const key = await db.apiKeys.findByHash(keyHash);
  if (!key || key.revokedAt) {
    await redis.setex(`apikey:${keyHash}`, 60, "invalid");
    return null;
  }

  const payload = { userId: key.userId, tenantId: key.tenantId, scopes: key.scopes };
  await redis.setex(`apikey:${keyHash}`, 60, JSON.stringify(payload));
  return payload;
}
```
mTLS Between Gateway and Services
For internal service communication, mTLS (mutual TLS) ensures requests genuinely come from the gateway, not a compromised internal service or a misconfigured network rule. The gateway presents a client certificate; each service validates it.
This is where service meshes (Istio, Linkerd) complement the gateway — they handle mTLS transparently between all services, without application code changes. The gateway handles external auth; the mesh handles internal identity.
Observability in the Gateway
The gateway sees every request — making it the ideal place to instrument your entire system's observability. If you don't add observability at the gateway level, you're blind to the request path before it hits individual services.
Request Logging
Log at the gateway level: timestamp, method, path, status code, latency, upstream service, client IP, user ID (from validated JWT), tenant ID. Structure logs as JSON for easy ingestion into Datadog, Grafana Loki, or CloudWatch:
```json
{
  "timestamp": "2026-03-08T12:34:56.789Z",
  "method": "GET",
  "path": "/api/orders/order_123",
  "status": 200,
  "latency_ms": 45,
  "upstream": "orders-service",
  "user_id": "user_abc",
  "tenant_id": "tenant_xyz",
  "request_id": "req_1a2b3c"
}
```
Distributed Tracing with OpenTelemetry
The gateway should start a trace span for every request and propagate trace context to upstream services via the traceparent header (W3C Trace Context standard). Kong's OpenTelemetry plugin handles this:
```yaml
plugins:
  - name: opentelemetry
    config:
      endpoint: http://otel-collector:4318/v1/traces
      resource_attributes:
        service.name: api-gateway
      propagation:
        - w3c
```
Each downstream service picks up the trace context and adds its own spans. The result: a complete trace from gateway to database for every request, with timing at each hop.
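The propagated header itself is simple: version, trace id, parent span id, and flags. A sketch of how a gateway might mint and forward it — the helper names are illustrative, and in practice the OpenTelemetry SDK handles this for you:

```typescript
// Illustrative W3C Trace Context propagation: the gateway mints a trace id
// for the whole request; each hop keeps the trace id but issues a new span id.
import { randomBytes } from "node:crypto";

function newTraceparent(): string {
  const traceId = randomBytes(16).toString("hex"); // 32 hex chars, whole request
  const spanId = randomBytes(8).toString("hex");   // 16 hex chars, this hop
  return `00-${traceId}-${spanId}-01`;             // version 00, sampled flag 01
}

function childTraceparent(parent: string): string {
  const [version, traceId] = parent.split("-");
  const spanId = randomBytes(8).toString("hex");   // new span id for the next hop
  return `${version}-${traceId}-${spanId}-01`;
}
```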
Metrics: Latency p50/p99
Track latency percentiles (p50, p95, p99), not just averages. An average of 50ms with a p99 of 2,000ms means 1% of users are experiencing 2-second responses. Averages hide outliers.
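A quick way to see the difference is a nearest-rank percentile over a synthetic sample (the numbers are illustrative):

```typescript
// Nearest-rank percentile, illustrating why the mean hides tail latency.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// 98 fast requests and 2 slow outliers: the mean is 89 ms, but p99 is 2000 ms.
const latencies = [...Array(98).fill(50), 2000, 2000];
const mean = latencies.reduce((a, b) => a + b, 0) / latencies.length;
```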
Key gateway metrics to alert on:
- `gateway_request_duration_p99` > 500ms (latency regression)
- `gateway_error_rate` > 1% (upstream service degradation)
- `gateway_rate_limit_hit_rate` spike (abuse or client bug)
- `gateway_upstream_timeout_rate` > 0.1% (service health)
For related patterns, see API authentication patterns and the API security checklist.
Conclusion
An API gateway earns its complexity by eliminating cross-cutting concerns from every microservice in your fleet. The key is keeping it thin — routing, auth, rate limiting, and observability belong in the gateway; business logic does not. Start with a simple reverse proxy, add JWT validation and rate limiting early, and instrument with OpenTelemetry before you need to debug a production incident. The BFF pattern is the right answer for aggregation when you have distinct client types with different data needs.