Portkey vs Kong AI Gateway: LLM Routing APIs 2026
TL;DR
Portkey if you're building LLM-first — it understands tokens, models, and prompts natively. Fallbacks, retries, semantic caching, cost routing, and per-model observability are all built in. Kong if you're already running Kong for traditional APIs — it's significantly faster (228% in Kong's own benchmark), and you can extend familiar API gateway patterns to AI workloads, though you sacrifice LLM-aware features. LiteLLM is the open-source alternative that competes with both. For most AI-native teams in 2026, Portkey wins on features and developer experience despite the latency overhead.
Key Takeaways
- Portkey latency overhead: ~20–40ms (AI-native features cost latency)
- Kong latency advantage: 228% faster than Portkey in Kong's own benchmark (independent tests show smaller gaps)
- Portkey pricing: Free (10K requests/month) → $49/month (100K requests) → Enterprise
- Kong AI Gateway pricing: Complex multi-dimensional model — gateway + request + plugin fees; $30+ per million requests at scale
- Semantic caching: Portkey has it natively (exact + semantic); Kong requires custom plugin
- Token observability: Portkey (native); Kong (treats requests as opaque blobs, no token-level data)
- Model fallbacks: Portkey (native, configured in JSON); Kong (requires custom Lua/Python plugin)
What AI Gateways Do
An AI gateway sits between your application and LLM providers (OpenAI, Anthropic, Gemini, etc.). Instead of calling providers directly, your app calls the gateway, which handles:
- Load balancing across providers/models
- Fallbacks when a provider is down or rate-limited
- Caching to avoid redundant LLM calls
- Observability (cost tracking, latency, error rates)
- Rate limiting and quota management
- Authentication and API key management
The question is whether you want a gateway that understands LLMs (tokens, models, prompts) or a gateway that treats LLM calls as generic HTTP requests.
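The fallback behaviour listed above is the heart of what every gateway does. A minimal, provider-agnostic sketch — the `call` function stands in for a provider-specific HTTP request, and is not any gateway's actual implementation:

```typescript
// Minimal sketch of gateway-style fallback: try providers in order,
// moving to the next when one is down or rate-limited.
type CallFn = (provider: string, prompt: string) => Promise<string>;

async function withFallback(
  providers: string[],
  prompt: string,
  call: CallFn,
): Promise<string> {
  let lastError: unknown = new Error('no providers configured');
  for (const provider of providers) {
    try {
      return await call(provider, prompt);
    } catch (err) {
      lastError = err; // provider failed → try the next one
    }
  }
  throw lastError;
}
```

Portkey and Kong differ in where this loop lives (managed config vs plugin) and in what they know about each request while running it.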
Portkey: AI-Native Gateway
Getting Started
```typescript
import Portkey from 'portkey-ai';

// Drop-in replacement for the OpenAI SDK
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  virtualKey: 'openai-key-abc123', // Your stored OpenAI key
});

// Works exactly like the OpenAI SDK
const completion = await portkey.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```
For existing OpenAI SDK code, add Portkey as a base URL:
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.PORTKEY_API_KEY,
  baseURL: 'https://api.portkey.ai/v1',
  defaultHeaders: {
    'x-portkey-virtual-key': 'openai-key-abc123',
  },
});

// Your existing OpenAI code works unchanged
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write a haiku.' }],
});
```
Fallback Configuration (Configs)
Portkey's most powerful feature is declarative routing via JSON configs:
```json
{
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "virtual_key": "openai-key-abc123",
      "model": "gpt-4o",
      "override_params": { "max_tokens": 4096 }
    },
    {
      "virtual_key": "anthropic-key-abc123",
      "model": "claude-3-5-sonnet-20241022"
    },
    {
      "virtual_key": "groq-key-abc123",
      "model": "llama-3.3-70b-versatile"
    }
  ]
}
```

Reference the config by ID in your app:

```typescript
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config: 'config-id-from-dashboard',
});

// If OpenAI is down → falls back to Anthropic → then Groq
const response = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'Help me debug this code.' }],
  // model is determined by the config's primary target
});
```
Load Balancing and Cost Routing
```jsonc
// Load balance between GPT-4o and GPT-4o-mini by cost
{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "virtual_key": "openai-key-abc123",
      "model": "gpt-4o-mini",
      "weight": 0.8 // 80% of traffic
    },
    {
      "virtual_key": "openai-key-abc123",
      "model": "gpt-4o",
      "weight": 0.2 // 20% for complex queries
    }
  ]
}
```
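Under the hood, weighted load balancing reduces to weighted random selection across targets. A minimal sketch of the idea — an illustration, not Portkey's actual implementation:

```typescript
// Pick one target at random, proportional to its weight.
interface Target {
  model: string;
  weight: number;
}

function pickTarget(targets: Target[], rand: () => number = Math.random): Target {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let r = rand() * total;
  for (const t of targets) {
    r -= t.weight;
    if (r <= 0) return t;
  }
  // Guard against floating-point drift
  return targets[targets.length - 1];
}
```

With weights 0.8/0.2 as above, roughly four of five requests land on the cheaper model.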
Semantic Caching
Portkey's caching goes beyond exact-match:
```typescript
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config: {
    cache: {
      mode: 'semantic', // Fuzzy match on similar prompts
      maxAge: 3600, // Cache TTL in seconds
      // "What is the capital of France?" and
      // "Tell me the capital city of France" both hit the cache
    },
  },
});

// Or exact-match for deterministic prompts
const portkeyExact = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config: {
    cache: {
      mode: 'exact',
      maxAge: 86400,
    },
  },
});
```
Token and Cost Observability
```typescript
// Portkey tracks token usage, costs, and latency automatically;
// access the data via the dashboard or the API
const response = await portkey.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }],
});

// Attach a trace ID and metadata for filtering in the dashboard
const portkeyWithMeta = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  traceId: `trace-${Date.now()}`, // Custom trace ID
  metadata: JSON.stringify({
    userId: user.id,
    featureName: 'code-review',
    environment: 'production',
  }),
});

// Query costs programmatically
const analytics = await portkey.analytics.list({
  startDate: '2026-03-01',
  endDate: '2026-03-15',
  groupBy: 'model',
});
// Returns a cost breakdown by model, user, feature, etc.
```
Guardrails
```typescript
// Add content safety without changing your app code
const config = {
  guardrails: {
    input: [
      {
        type: 'regex',
        pattern: '(credit card|SSN|social security)',
        action: 'block',
        message: 'PII detected in input',
      },
    ],
    output: [
      {
        type: 'pii_detection',
        action: 'redact',
      },
    ],
  },
};
```
Kong AI Gateway
Kong is the enterprise API gateway that added AI capabilities. Its architecture treats LLM calls as enhanced HTTP requests through the same plugin infrastructure used for traditional APIs.
Setup
```yaml
# kong.yml
_format_version: "3.0"
services:
  - name: openai-service
    url: https://api.openai.com
    routes:
      - name: chat-completions
        paths:
          - /v1/chat/completions
plugins:
  - name: ai-proxy
    service: openai-service
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: Bearer ${OPENAI_API_KEY}
      model:
        provider: openai
        name: gpt-4o
        options:
          max_tokens: 4096
          temperature: 0.7
```
AI Proxy Plugin
```yaml
# AI proxy with provider fallback
plugins:
  - name: ai-proxy-advanced
    config:
      targets:
        - model:
            provider: openai
            name: gpt-4o
          auth:
            header_name: Authorization
            header_value: Bearer ${OPENAI_API_KEY}
          weight: 100
        - model:
            provider: anthropic
            name: claude-3-5-sonnet-20241022
          auth:
            header_name: x-api-key
            header_value: ${ANTHROPIC_API_KEY}
          weight: 0 # Failover only
      balancer:
        algorithm: round-robin
```
Rate Limiting
Kong's rate limiting (its traditional strength):
```yaml
plugins:
  - name: rate-limiting
    config:
      second: 10
      minute: 100
      hour: 1000
      policy: redis
      redis:
        host: redis-host
        port: 6379
```
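Conceptually, the counting behind a rate limit like this is simple. A minimal fixed-window sketch — an illustration of the idea, not Kong's implementation, which also supports Redis-backed cluster-wide counters as configured above:

```typescript
// Allow up to `limit` requests per fixed time window; reject the rest.
class FixedWindowLimiter {
  private count = 0;
  private windowStart = 0;

  constructor(private limit: number, private windowMs: number) {}

  allow(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.windowMs) {
      // New window: reset the counter
      this.windowStart = nowMs;
      this.count = 0;
    }
    return ++this.count <= this.limit;
  }
}
```

Kong layers several such windows (second, minute, hour) and shares the counters via Redis so every gateway node enforces the same limit.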
What Kong Lacks for LLMs
Traditional API Gateway metrics Kong tracks:
✅ HTTP status codes
✅ Request/response latency
✅ Requests per second
✅ Bandwidth bytes
LLM-specific metrics Kong does NOT track:
❌ Token count (input/output)
❌ Token cost (no model pricing knowledge)
❌ Prompt content (opaque blob)
❌ Model-aware routing (no semantic understanding)
❌ Cache hits based on semantic similarity
❌ Cost per request to different providers
Kong sees: POST /v1/chat/completions with 2KB body
Portkey sees: 850 input tokens → gpt-4o → 320 output tokens → $0.0053 total
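The per-request dollar figure in that example is just token counts multiplied by per-token prices. A sketch with illustrative prices — the USD-per-million-token rates below are assumptions; check the provider's pricing page for current numbers:

```typescript
// Per-request cost = tokens × per-token price, summed over input and output.
// Prices are illustrative (USD per 1M tokens), not authoritative.
const PRICING = {
  'gpt-4o': { input: 2.5, output: 10.0 },
} as const;

function requestCost(
  model: keyof typeof PRICING,
  inputTokens: number,
  outputTokens: number,
): number {
  const p = PRICING[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

console.log(requestCost('gpt-4o', 850, 320).toFixed(4)); // → "0.0053"
```

Portkey does this bookkeeping for you on every request; with Kong you would compute it yourself from provider responses.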
Performance Benchmark
Kong's published benchmark (Kong vs Portkey vs LiteLLM):
Environment: AWS, same region as the OpenAI API.

Latency overhead (p50):
- Kong: 12ms
- Portkey: 27ms (+125% vs Kong)
- LiteLLM: 95ms (+692% vs Kong)

Throughput:
- Kong: 8,200 req/s
- Portkey: 2,100 req/s
- LiteLLM: 890 req/s

For real-time chat applications (200ms total budget):
- 12ms gateway overhead = 6% of budget (acceptable)
- 27ms gateway overhead = 13.5% of budget (meaningful)
- 95ms gateway overhead = 47.5% of budget (problematic)
Context: This benchmark was published by Kong. Independent benchmarks show smaller differences. For most SaaS applications with 500ms+ round-trip LLM calls, 15ms extra overhead is irrelevant.
Pricing Comparison
Portkey pricing:
- Free: 10,000 requests/month, 30-day logs
- Starter: $49/month — 100K requests, 90-day logs
- Business: $249/month — 1M requests, advanced guardrails
- Enterprise: custom — unlimited requests, HIPAA, dedicated infra
- Overage: $9 per additional 100K requests (Starter)

Kong pricing:
- Free (self-hosted): open-source, no request limits
- Konnect (cloud): from $0.016 per unit (complex pricing)
- Enterprise: multi-dimensional — $0.068/unit in the Starter tier

Note: Kong Konnect has five or more pricing dimensions (gateway services, request units, paid plugins, premium plugins), making total cost unpredictable at scale. Users report $30+/million requests for the full feature stack.
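To compare the two models at a given volume, here is a back-of-the-envelope sketch using the published figures above. It is deliberately simplified — it ignores Portkey's overage pricing and Kong's multi-dimensional units, and the $30/million Kong figure is the reported estimate, not a list price:

```typescript
// Rough monthly cost by request volume, from the published tiers above.
function portkeyMonthly(requests: number): number {
  if (requests <= 10_000) return 0;       // Free tier
  if (requests <= 100_000) return 49;     // Starter
  if (requests <= 1_000_000) return 249;  // Business
  return NaN;                             // Enterprise: custom pricing
}

function kongKonnectMonthly(requests: number, perMillion = 30): number {
  // Uses the reported "$30+/million requests" full-stack estimate
  return (requests / 1_000_000) * perMillion;
}

console.log(portkeyMonthly(500_000));     // → 249
console.log(kongKonnectMonthly(500_000)); // → 15
```

At moderate volumes Kong's per-request model can undercut Portkey's tiers; the crossover depends heavily on which Kong plugins and dimensions you actually pay for.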
Feature Comparison
| Feature | Portkey | Kong AI Gateway |
|---|---|---|
| Latency overhead | ~20–40ms | ~8–15ms |
| Throughput | ~2,100 req/s | ~8,200 req/s |
| Model fallbacks | ✅ Native (JSON config) | ⚠️ Via custom plugin |
| Token observability | ✅ Native | ❌ Not available |
| Cost tracking | ✅ Per-request, per-model | ❌ Not available |
| Semantic caching | ✅ Native | ⚠️ ai-semantic-cache plugin (Enterprise) |
| Exact-match caching | ✅ | ✅ (ai-semantic-cache plugin) |
| Provider normalization | ✅ (OpenAI-compatible for all) | ⚠️ Per-provider config |
| Prompt management | ✅ Prompt hub + versioning | ❌ |
| Guardrails | ✅ Native | ❌ Custom plugin required |
| RBAC / teams | ✅ | ✅ Enterprise |
| SOC 2 Type II | ✅ | ✅ |
| HIPAA | ✅ Enterprise | ✅ Enterprise |
| Self-hosted | ✅ (open-source version) | ✅ (open-source) |
| Traditional API routing | ⚠️ Limited | ✅ Full-featured |
| Learning curve | Low (JSON configs) | High (Lua/YAML/Admin API) |
LiteLLM: The Open-Source Option
Worth mentioning: LiteLLM is the open-source alternative that both Portkey and Kong compete against:
```python
from litellm import completion

# Unified interface across 100+ providers
response = completion(
    model='openai/gpt-4o',
    messages=[{'role': 'user', 'content': 'Hello!'}],
    # Fallback order if the primary provider fails
    fallbacks=['anthropic/claude-3-5-sonnet-20241022', 'groq/llama-3.3-70b'],
)

# Or run the LiteLLM proxy server:
#   litellm --model openai/gpt-4o
# Any OpenAI-compatible client works against http://localhost:4000
```
LiteLLM is slower than both Portkey and Kong (95ms+ overhead), but self-hosted deployments cost $0 and include the full feature set.
Decision Guide
Choose Portkey if:
- Your primary concern is LLM-specific features (fallbacks, cost tracking, semantic cache)
- You're building AI-first applications where understanding token costs matters
- You want a fast setup — JSON configs, no DevOps
- HIPAA or SOC 2 is required on the managed platform
- Budget: $49–$249/month is acceptable
Choose Kong if:
- You're already running Kong for traditional APIs — unify AI and HTTP routing
- Latency performance at extreme scale (10K+ req/s) is critical
- You have DevOps capacity to write plugins for LLM-specific needs
- You want open-source self-hosting with no per-request fees
Choose LiteLLM if:
- You're cost-sensitive and have DevOps capacity
- You want full LLM-aware features (fallbacks, load balancing) at $0 cost
- Latency overhead is acceptable in your use case
Browse all AI gateway and LLM infrastructure APIs at APIScout.
Related: OpenRouter vs LiteLLM: API Gateway for Multiple AI Models · The Rise of AI Gateway APIs · Kong vs Envoy vs Tyk vs AWS API Gateway 2026 · OpenRouter API: One Key for 500+ LLM Models · How AI Is Transforming API Design and Documentation
Prompt Management and Versioning
One capability with no Kong equivalent is Portkey's Prompt Hub — a central repository for storing, versioning, and A/B testing prompts across your application. When the system prompt for your customer support bot needs updating, you edit it in the Portkey dashboard and every deployment picks up the change without a code deploy. You can also A/B test prompt variants against production traffic and measure output quality differences before committing to a change.
This matters more than it sounds. Prompt engineering in production is an ongoing process: prompts that work well during development often degrade over time as model providers update their underlying models. Having prompt versioning separate from application code means prompt changes don't require engineering involvement. Marketing and content teams can iterate on prompt templates within guardrails you've defined — model, max tokens, guardrail rules — while the engineering team owns the routing and reliability layer.
Kong has no native equivalent. Building similar functionality on top of Kong means maintaining a separate prompt registry (database, versioning system, deployment pipeline) yourself. For LLM-first teams, that operational overhead is a concrete argument for Portkey despite the latency difference.
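To make the capability concrete, here is a toy sketch of what a minimal prompt registry provides — versioned templates resolved at request time, independent of code deploys. This is an illustration of the concept, not Portkey's Prompt Hub API:

```typescript
// Toy prompt registry: publish new versions, resolve the latest at runtime.
interface PromptVersion {
  version: number;
  template: string;
}

class PromptRegistry {
  private prompts = new Map<string, PromptVersion[]>();

  publish(name: string, template: string): number {
    const versions = this.prompts.get(name) ?? [];
    const version = versions.length + 1;
    versions.push({ version, template });
    this.prompts.set(name, versions);
    return version;
  }

  latest(name: string): string | undefined {
    const versions = this.prompts.get(name);
    return versions?.[versions.length - 1]?.template;
  }
}
```

A managed prompt hub adds the parts worth paying for: access control, audit history, A/B assignment, and rollback.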
Observability and Cost Attribution
The observability gap between Portkey and Kong is most visible in production cost attribution. Portkey tracks cost at the request level — model, token count, user ID, feature name, or any metadata you attach — allowing you to answer "how much does our AI-powered search feature cost per user per day?" Kong's observability is request-count and latency based: it can tell you that 10,000 requests went through in the last hour, but not what they cost or which features drove the expense.
For SaaS products that charge usage-based fees or need to attribute AI costs to customer accounts, Portkey's per-request cost tracking is a direct revenue and margin management tool. Without it, teams resort to estimating AI costs from cloud bills — which provide aggregate data with a 24–48 hour delay, not the real-time per-request granularity needed to implement per-user rate limits or usage-based billing.
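Once each request carries a cost and metadata, per-user attribution is a simple aggregation. A sketch — the record shape below is hypothetical, not Portkey's actual analytics schema:

```typescript
// Sum per-request costs by user for one feature, to answer
// "what does our AI search cost per user?"
interface CostRecord {
  userId: string;
  featureName: string;
  costUsd: number;
}

function costByUser(records: CostRecord[], feature: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    if (r.featureName !== feature) continue;
    totals.set(r.userId, (totals.get(r.userId) ?? 0) + r.costUsd);
  }
  return totals;
}
```

The same aggregation drives per-user rate limits and usage-based billing; the hard part is getting accurate per-request cost records in the first place, which is exactly what Kong doesn't emit.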
Methodology
Benchmark figures (the 228% latency advantage, throughput numbers) are from Kong's published AI Gateway performance report, conducted in AWS us-east-1 against a same-region OpenAI endpoint. Independent benchmarks from the LLM infrastructure community show smaller latency gaps (50–100% rather than 228%) — Kong's published figures represent best-case conditions. Pricing is sourced from the Portkey and Kong Konnect pricing pages as of March 2026; both providers change pricing frequently. The feature comparison was verified against the Portkey docs (portkey.ai/docs) and Kong's AI Gateway plugin documentation.
Compare OpenAI and Anthropic on APIScout.