
Best API Gateway and Rate Limiting Tools 2026

APIScout Team


Every public API needs rate limiting. Every microservices architecture needs an API gateway. These tools sit between clients and your backend — handling authentication, rate limiting, request routing, transformation, caching, and monitoring.

TL;DR

| Rank | Solution | Best For | Starting Price |
|------|----------|----------|----------------|
| 1 | Kong | Self-hosted, enterprise | Free (open source) |
| 2 | Zuplo | API-first, edge-deployed | Free (250K requests/mo) |
| 3 | Unkey | API key management + rate limiting | Free (100K verifications/mo) |
| 4 | AWS API Gateway | AWS ecosystem, serverless | $1/1M requests (HTTP API) |
| 5 | Cloudflare API Shield | DDoS protection + rate limiting | Included with Pro ($20/mo) |
| 6 | Traefik | Kubernetes, open source | Free (open source) |

1. Kong — Self-Hosted Gateway

Best for: Self-hosted API gateway with plugin ecosystem

Kong is the most widely deployed open-source API gateway. It is built on Nginx/OpenResty with a plugin architecture for authentication, rate limiting, logging, transformations, and more. Kong Gateway (OSS) is free; Kong Konnect is the managed cloud version with enterprise features.

Key strengths: Open source (Apache 2.0), 100+ plugins, rate limiting (multiple algorithms), OAuth2/JWT/key-auth, request/response transformation, load balancing, service mesh, multi-cloud.

Pricing: Kong Gateway OSS: free. Kong Konnect (cloud): free tier, Plus at $75/month, Enterprise custom.

Limitations: Self-hosted OSS requires operational expertise. Plugin configuration is YAML/API-based (no visual editor in OSS). Enterprise features (developer portal, analytics, RBAC) require Konnect. Memory-intensive.

2. Zuplo — API-First Gateway

Best for: Developer-first API management with edge deployment

Zuplo deploys at the edge (Cloudflare Workers) and provides API key management, rate limiting, developer portal, and OpenAPI integration as a unified platform. GitOps workflow — configure via JSON/TypeScript in your repo.

Key strengths: Edge deployment (300+ PoPs), built-in API key management, automatic developer portal from OpenAPI, rate limiting, request/response policies, GitOps configuration, TypeScript custom handlers.

Pricing: Free: 250K requests/month. Builder at $25/month (2M requests). Business at $250/month (20M requests).

Limitations: Newer platform with smaller ecosystem. Edge-only deployment may not suit all architectures. Custom policies require TypeScript. Less mature plugin ecosystem than Kong.

3. Unkey — API Key Management

Best for: API key issuing, verification, and rate limiting as a service

Unkey is purpose-built for API key management. Create, verify, and revoke API keys with per-key rate limiting, expiration, and usage analytics. Not a full API gateway — it's the authentication and rate limiting layer that sits in front of your API.

Key strengths: Per-key rate limiting, key expiration, usage analytics, temporary keys, key verification in <40ms, ratelimit API (use without key management), open source.

Pricing: Free: 100K verifications/month. Pro at $25/month (2.5M verifications). Custom enterprise.

Limitations: Not a full API gateway (no routing, transformation, caching). Requires integration into your application code. Newer platform. No request proxying — verification only.

4. AWS API Gateway — Serverless APIs

Best for: Serverless architectures on AWS with Lambda integration

AWS API Gateway creates REST and WebSocket APIs backed by Lambda, HTTP backends, or AWS services. Usage plans with API keys and throttling. Caching, request validation, and WAF integration.

Key strengths: Lambda integration, WebSocket APIs, usage plans, API key management, request validation, caching, WAF integration, CloudWatch monitoring, custom authorizers.

Pricing: REST API: $3.50/1M requests + $0.09/GB data transfer. HTTP API: $1/1M requests (simpler, cheaper). WebSocket: $1/1M messages plus per-minute connection charges.

Limitations: AWS-only. Cold start latency with Lambda. 29-second integration timeout. Complex configuration. Per-request pricing compounds at high volume. No self-hosting.

5. Cloudflare API Shield — Edge Protection

Best for: DDoS protection and rate limiting for existing APIs

Cloudflare API Shield adds rate limiting, mTLS authentication, schema validation, and sequence detection to any API behind Cloudflare. Not a gateway — it's a protection layer at the edge. Rate limiting rules based on IP, headers, cookies, or custom keys.

Key strengths: DDoS protection, rate limiting (custom rules), mTLS, API schema validation, sequence detection (abuse prevention), bot management, 300+ PoPs, included with Cloudflare plans.

Pricing: Basic rate limiting included with Pro ($20/month). Advanced rate limiting with Business ($200/month). Enterprise for full API Shield.

Limitations: Requires Cloudflare as DNS/CDN provider. Not a gateway (no routing, transformation). Advanced features require expensive plans. Rate limiting rules have configuration limits on lower tiers.

6. Traefik — Kubernetes-Native Gateway

Best for: Kubernetes API gateway with automatic service discovery

Traefik is an open-source edge router and API gateway designed for containerized environments. Automatic service discovery in Kubernetes, Docker, and Consul. Built-in rate limiting, circuit breakers, retries, and Let's Encrypt certificate management.

Key strengths: Kubernetes-native, automatic service discovery, Let's Encrypt auto-SSL, rate limiting middleware, circuit breaker, retry, access logs, Prometheus metrics, open source.

Pricing: Free (open source). Traefik Enterprise for additional features.

Limitations: Primarily a reverse proxy/load balancer — API management features are basic compared to Kong. No built-in API key management, developer portal, or analytics. Configuration via Kubernetes CRDs requires learning curve.


Rate Limiting Algorithms

Understanding the underlying algorithms makes it possible to choose the right one for your access pattern and to reason about the edge cases each one produces.

Fixed Window is the simplest algorithm. Divide time into fixed intervals (e.g., 1-minute buckets), count requests per identifier per bucket, reject when the count exceeds the limit. Implementation: a single counter in Redis per (identifier, window) pair with a TTL equal to the window duration. The problem: boundary bursting. If your limit is 100 requests per minute, a user can make 100 requests in the last second of minute 1 and 100 in the first second of minute 2 — 200 requests in 2 seconds while staying within the letter of the limit.
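
The mechanics are easy to see in code. Below is a minimal in-memory sketch (the `FixedWindowLimiter` name is ours; as described above, production counters would live in a shared store like Redis, not process memory):

```typescript
// In-memory sketch of fixed-window counting. Illustrative only: production
// counters belong in a shared store such as Redis.
class FixedWindowLimiter {
  private counts = new Map<string, number>();

  constructor(private limit: number, private windowMs: number) {}

  // `now` is passed in so the behavior is deterministic and testable.
  allow(identifier: string, now: number): boolean {
    const bucket = Math.floor(now / this.windowMs); // which fixed interval we are in
    const key = `${identifier}:${bucket}`;
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    return count <= this.limit;
  }
}

// Boundary bursting in action: with a limit of 3 per minute, 3 requests at
// the very end of one window and 3 at the start of the next all succeed.
const fw = new FixedWindowLimiter(3, 60_000);
const burst = [59_999, 59_999, 59_999, 60_001, 60_001, 60_001]
  .map((t) => fw.allow('user-1', t));
// burst is [true, true, true, true, true, true]
```

The `(identifier, bucket)` key corresponds directly to the Redis key with a TTL described above.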

Sliding Window fixes the boundary burst problem. Instead of a fixed bucket, the window slides continuously. For a 100 req/min limit with a sliding window, the limit applies to any rolling 60-second period, not just the current clock minute. The trade-off is higher storage cost — you need timestamps of recent requests rather than a single counter. Redis sorted sets (using timestamp as the score) implement this cleanly: add the current timestamp, remove entries older than the window, count remaining entries.
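
The same logic, sketched in-memory with a plain array standing in for the Redis sorted set (the `SlidingWindowLimiter` name is ours; timestamps are injected for determinism):

```typescript
// Sliding-window-log sketch. In Redis the log would be a sorted set with the
// timestamp as the score; here a per-identifier array plays that role.
class SlidingWindowLimiter {
  private log = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(identifier: string, now: number): boolean {
    const entries = this.log.get(identifier) ?? [];
    // Drop entries older than the rolling window (ZREMRANGEBYSCORE in Redis).
    const recent = entries.filter((t) => t > now - this.windowMs);
    if (recent.length >= this.limit) {
      this.log.set(identifier, recent);
      return false; // limit reached within the rolling window
    }
    recent.push(now); // record this request (ZADD in Redis)
    this.log.set(identifier, recent);
    return true;
  }
}
```

Note the storage cost the text mentions: one timestamp per allowed request instead of a single counter.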

Token Bucket is the most flexible algorithm and the one most commonly used in practice. Each identifier has a bucket with a maximum capacity (the burst limit). Tokens are added at a fixed rate (the sustained rate). Each request consumes one token. If the bucket is empty, the request is rejected. The burst capacity allows short spikes — a user who hasn't made any requests for 30 seconds has accumulated tokens and can make several requests quickly. This models real user behavior better than fixed windows. Stripe and most major APIs use token bucket semantics.
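
A sketch of the bucket math (the `TokenBucket` name and lazy-refill approach are ours; a real implementation would keep `tokens` and `lastRefill` per identifier in a shared store):

```typescript
// Token bucket sketch: `capacity` is the burst limit, `refillPerMs` the
// sustained rate. The clock is injected so the behavior is deterministic.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private capacity: number, private refillPerMs: number, now = 0) {
    this.tokens = capacity; // start full, so an idle user can burst immediately
    this.lastRefill = now;
  }

  allow(now: number): boolean {
    // Lazily add tokens for the time elapsed since the last request,
    // capped at the bucket capacity.
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// 5-token burst, refilling 1 token per second (0.001 tokens/ms):
const bucket = new TokenBucket(5, 0.001);
```

An idle user accumulates tokens up to the cap, which is exactly the burst behavior described above.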

Leaky Bucket is conceptually the inverse of token bucket. Requests arrive and are queued. The queue drains at a fixed rate. If the queue is full, incoming requests are rejected. The result is a perfectly smooth outbound request rate regardless of burst input. Useful for smoothing writes to a downstream service that can't handle bursts. Less useful for external-facing rate limiting because it adds queuing latency.
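
A sketch of the accounting (the `LeakyBucket` name is ours; a production version would also need a worker that actually dispatches queued requests at the drain rate, which this admission-control-only sketch omits):

```typescript
// Leaky bucket sketch: arrivals fill the bucket, which drains at a fixed
// rate; a full bucket rejects new arrivals.
class LeakyBucket {
  private level = 0; // current queue depth
  private lastDrain: number;

  constructor(private queueCapacity: number, private drainPerMs: number, now = 0) {
    this.lastDrain = now;
  }

  offer(now: number): boolean {
    // Account for whatever should have leaked out since the last arrival.
    this.level = Math.max(0, this.level - (now - this.lastDrain) * this.drainPerMs);
    this.lastDrain = now;
    if (this.level + 1 > this.queueCapacity) {
      return false; // bucket full, reject
    }
    this.level += 1;
    return true;
  }
}
```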

For your API, token bucket is usually the right choice — it allows legitimate burst behavior (a developer testing their integration makes 10 quick requests) while preventing sustained abuse (a bot making 10,000 requests per hour).

Unkey: Developer-First Rate Limiting

Unkey deserves a closer look because its design philosophy is different from traditional API gateways. Rather than sitting in the request path as a proxy, Unkey is called from within your application code to verify keys and check rate limits. This integration model trades some flexibility (you must add Unkey calls to your code) for simplicity (no proxy infrastructure to deploy and maintain).

The key creation and verification flow:

```typescript
// Create an API key for a new user
// Call this from your backend when a user creates an API key
import { Unkey } from '@unkey/api';

const unkey = new Unkey({ rootKey: process.env.UNKEY_ROOT_KEY! });

const { result, error } = await unkey.keys.create({
  apiId: process.env.UNKEY_API_ID!,
  prefix: 'sk',
  ownerId: userId,
  ratelimit: {
    type: 'fast',        // 'fast' uses edge nodes, 'consistent' uses central store
    limit: 100,          // max requests
    refillRate: 10,      // tokens added per interval
    refillInterval: 1000 // interval in milliseconds (1 second here)
  },
  meta: {
    plan: 'pro',
    userId: userId,
  },
});
```

```typescript
// result.key is shown to the user once
// result.keyId is stored in your DB for management

// Verify a key on each API request
// Call this at the start of your API route handler
import { verifyKey } from '@unkey/api';

export async function GET(req: Request) {
  const apiKey = req.headers.get('authorization')?.replace('Bearer ', '');
  if (!apiKey) {
    return Response.json({ error: 'Unauthorized' }, { status: 401 });
  }

  const { result, error } = await verifyKey(apiKey);

  if (error || !result) {
    return Response.json({ error: 'Unauthorized' }, { status: 401 });
  }

  // Check the rate limit code before `valid`: a rate-limited key is also
  // reported as invalid, so this branch has to come first.
  if (result.code === 'RATE_LIMITED') {
    return Response.json({
      error: 'Rate limit exceeded',
      limit: result.ratelimit?.limit,
      remaining: result.ratelimit?.remaining,
      reset: result.ratelimit?.reset,
    }, { status: 429 });
  }

  if (!result.valid) {
    return Response.json({ error: 'Unauthorized' }, { status: 401 });
  }

  // result.ownerId is your userId — no DB lookup needed
  // result.meta contains the custom metadata you set at key creation
  return Response.json({ data: 'ok', remaining: result.ratelimit?.remaining });
}
```

The remaining count in the response allows you to surface rate limit status in your API responses — clients can show users how many requests they have left before hitting a limit. Include the Unkey rate limit headers in your responses for well-behaved API clients:

```typescript
return Response.json(data, {
  headers: {
    'X-RateLimit-Limit': result.ratelimit?.limit.toString() ?? '',
    'X-RateLimit-Remaining': result.ratelimit?.remaining.toString() ?? '',
    'X-RateLimit-Reset': result.ratelimit?.reset.toString() ?? '',
  }
});
```

Kong Open Source Setup

Kong OSS is the right choice when you need a self-hosted API gateway with a broad plugin ecosystem. Docker Compose is the fastest path to a working local environment:

```yaml
# docker-compose.yml
version: "3"
services:
  kong-database:
    image: postgres:13
    environment:
      POSTGRES_USER: kong
      POSTGRES_DB: kong
      POSTGRES_PASSWORD: kongpass

  kong-migration:
    image: kong:3.6
    command: kong migrations bootstrap
    depends_on:
      - kong-database
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_PASSWORD: kongpass

  kong:
    image: kong:3.6
    depends_on:
      - kong-database
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_PASSWORD: kongpass
      KONG_PROXY_LISTEN: "0.0.0.0:8000"
      KONG_ADMIN_LISTEN: "0.0.0.0:8001"
    ports:
      - "8000:8000"   # Kong proxy
      - "8001:8001"   # Kong Admin API
```

Kong supports two configuration approaches. The Admin API approach configures Kong via HTTP calls to port 8001. The declarative (deck) approach defines configuration in YAML and applies it with the deck sync command — this is the GitOps-compatible approach that should be used in production.

Declarative configuration for a rate-limited, JWT-authenticated service:

```yaml
# kong.yaml
_format_version: "3.0"
services:
  - name: my-api
    url: http://your-backend:3000
    routes:
      - name: api-route
        paths:
          - /api
    plugins:
      - name: rate-limiting
        config:
          minute: 60
          hour: 1000
          policy: redis
          redis_host: redis
          redis_port: 6379
      - name: jwt
        config:
          secret_is_base64: false
```
Apply with `deck gateway sync kong.yaml`.

Custom plugins in Kong are written in Lua (or Go for newer versions). The plugin framework provides lifecycle hooks (access, header_filter, body_filter, log) that execute at different stages of request processing. For most use cases, the built-in plugins (rate-limiting, key-auth, jwt, oauth2, cors, ip-restriction, request-transformer) cover requirements without custom code. See our API gateway patterns for microservices for more advanced Kong patterns.

Distributed Rate Limiting

Single-server rate limiting works in development but fails in production. When your API runs across multiple instances (horizontal scaling, Kubernetes pods, edge nodes), each instance maintains its own rate limit counters. A user can send N requests to each instance and stay under the per-instance limit while making N × (number of instances) total requests. This completely defeats the purpose of rate limiting.

The solution is a shared counter store, typically Redis. Every rate limit check reads and writes to the same Redis instance (or cluster) regardless of which API server handles the request. This is why Kong's rate limiting plugin has a policy: redis option and why all serious gateway solutions require a Redis backend for rate limiting to work correctly in distributed deployments.

The implementation challenge is atomicity. A rate limit check involves read-increment-check, which is a read-modify-write operation. If two requests read the counter simultaneously (both see count=99, limit=100), both increment to 100, and both succeed — but the user has made 101 requests. The solution is atomic operations. Redis provides INCR (atomic increment) and Lua scripts for multi-step atomic operations:

```lua
-- Lua script executed atomically in Redis
-- Implements a fixed-window counter: one counter per identifier per window,
-- expiring when the window ends
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

local count = redis.call('INCR', key)
if count == 1 then
  -- First request in this window: start the TTL clock once.
  -- Calling EXPIRE on every request would keep extending the window.
  redis.call('EXPIRE', key, window)
end

if count > limit then
  return 0  -- Rate limited
end
return 1  -- Allowed
```

The Lua script executes atomically — Redis doesn't process any other commands while the script runs. This eliminates the race condition.

The eventual consistency tradeoff: if your Redis instance goes down, rate limiting fails. The failure mode choice is either fail open (allow all requests when Redis is unavailable — rate limiting breaks but your API keeps working) or fail closed (reject all requests when Redis is unavailable — rate limiting is enforced but your API is down). Fail open is usually the right choice for rate limiting; fail closed is appropriate for authentication and authorization. This connects to the broader question of API security design — understand your failure modes before deploying.
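
The fail-open choice can live in a single wrapper rather than being scattered through handlers. A sketch, assuming a hypothetical `checkLimit` function that consults Redis and throws when the store is unreachable (shown synchronous for brevity; a real check would be async):

```typescript
// Fail-open wrapper: if the rate limit store is down, allow the request.
type LimitCheck = (identifier: string) => boolean;

function failOpen(checkLimit: LimitCheck): LimitCheck {
  return (identifier) => {
    try {
      return checkLimit(identifier);
    } catch {
      // Store unavailable: allow the request so the API stays up.
      // For authentication or authorization you would return false here
      // instead (fail closed), because enforcement matters more than uptime.
      return true;
    }
  };
}
```

Centralizing the try/catch keeps the failure-mode decision in one reviewable place.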


How to Choose

| Use Case | Recommended | Why |
|----------|-------------|-----|
| Self-hosted API gateway | Kong | Most plugins, largest community |
| Developer-first API management | Zuplo | Edge deployment, GitOps, dev portal |
| API key management | Unkey | Purpose-built key + rate limiting |
| AWS serverless APIs | AWS API Gateway | Lambda integration |
| DDoS + rate limiting | Cloudflare API Shield | Edge protection |
| Kubernetes gateway | Traefik | Auto service discovery, K8s-native |

Rate limiting decisions compound over time in ways that other API gateway decisions don't. Once developers have integrated against your rate limit responses — wiring retry logic, configuring backoff intervals, building quota dashboards — changing the rate limit model breaks their clients. Define your rate limiting semantics early and document them precisely: which headers communicate remaining quota (X-RateLimit-Remaining, Retry-After), whether throttled requests are queued or dropped, and how limits are measured for burst traffic. A published, stable rate limit contract is as important as a stable API contract — treat rate limit behavior as a breaking change when you modify it, and version changes accordingly.
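
On the client side, a well-behaved consumer reads those headers to decide how long to back off. A sketch, assuming the X-RateLimit-* names used in the Unkey example above and an epoch-milliseconds reset value; `retryDelayMs` is a hypothetical helper, and it does not handle the HTTP-date form of Retry-After:

```typescript
// Decide how long a client should wait before retrying a 429 response.
function retryDelayMs(headers: Headers, nowMs: number): number {
  const retryAfter = headers.get('Retry-After');
  if (retryAfter !== null) {
    // Retry-After here is assumed to be in seconds.
    return Number(retryAfter) * 1000;
  }
  const reset = headers.get('X-RateLimit-Reset'); // assumed epoch milliseconds
  if (reset !== null) {
    return Math.max(0, Number(reset) - nowMs);
  }
  return 1000; // no hint from the server: fall back to a short fixed wait
}
```

Pinning clients to documented header semantics like these is exactly why the rate limit contract has to stay stable.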


Comparing API gateways? Explore Kong, Zuplo, Unkey, and more on APIScout. Also see our guides on API rate limiting best practices and API authentication patterns.

The API Integration Checklist (Free PDF)

Step-by-step checklist: auth setup, rate limit handling, error codes, SDK evaluation, and pricing comparison for 50+ APIs. Used by 200+ developers.
