API Error Handling Patterns for Production 2026
API Error Handling Patterns for Production Applications
API errors in development are annoying. API errors in production lose money, users, and trust. The difference between a fragile integration and a resilient one comes down to how you handle errors — not just catching them, but categorizing, retrying, reporting, and recovering from them.
Most API integration bugs are error-handling bugs: the happy path was tested, but the 401 refresh path wasn't; the retry logic works in isolation but creates a thundering herd under load; the webhook handler acknowledges instantly in development but times out in production under database pressure. This guide is organized around the patterns that prevent those bugs — structured error typing, user-friendly messages, recovery strategies, and the monitoring that tells you when something goes wrong before your users do.
Error Categories
The Three Types of API Errors
| Category | Status Codes | Retryable | Action |
|---|---|---|---|
| Client errors | 400, 401, 403, 404, 409, 422 | No (usually) | Fix the request |
| Server errors | 500, 502, 503, 504 | Yes | Retry with backoff |
| Rate limits | 429 | Yes (after waiting) | Backoff, respect Retry-After |
Detailed Error Code Guide
| Code | Meaning | Should You Retry? | What to Do |
|---|---|---|---|
| 400 | Bad request | No | Fix request body/params |
| 401 | Unauthorized | Maybe (refresh token) | Refresh auth, re-authenticate |
| 403 | Forbidden | No | Check permissions/scopes |
| 404 | Not found | No | Resource doesn't exist |
| 409 | Conflict | Maybe | Resolve conflict, retry |
| 422 | Validation error | No | Fix input data |
| 429 | Rate limited | Yes (after delay) | Wait for Retry-After, then retry |
| 500 | Server error | Yes | Retry with backoff |
| 502 | Bad gateway | Yes | Retry with backoff |
| 503 | Service unavailable | Yes | Retry with backoff, check status page |
| 504 | Gateway timeout | Yes | Retry with backoff |
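The 429 advice depends on reading `Retry-After` correctly. HTTP allows two forms: a number of seconds (`120`) or an HTTP-date (`Wed, 21 Oct 2015 07:28:00 GMT`). Below is a minimal parser sketch. The name `parseRetryAfter` is hypothetical, not a library API, and it returns `null` so the caller can fall back to its own backoff when the header is missing or malformed.

```typescript
// Convert a Retry-After header value into a delay in milliseconds.
// Returns null when the header is absent or unparseable.
function parseRetryAfter(
  header: string | null,
  now: number = Date.now(),
): number | null {
  if (header === null) return null;
  const value = header.trim();

  // Form 1: delay-seconds, a non-negative integer
  if (/^\d+$/.test(value)) {
    return Number(value) * 1000;
  }

  // Form 2: HTTP-date. Real HTTP-dates always contain day/month names,
  // so reject letter-free strings that a lenient Date.parse might accept
  if (!/[a-z]/i.test(value)) return null;
  const at = Date.parse(value);
  if (Number.isNaN(at)) return null;

  // A date in the past means "retry now"
  return Math.max(0, at - now);
}
```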
Pattern 1: Structured Error Handling
Categorize errors and handle each type differently:
```typescript
// refreshAuth, retryQueue, showValidationErrors, and reportError are
// app-specific helpers (not shown)

class APIError extends Error {
  constructor(
    message: string,
    public status: number,
    public code: string,
    public retryable: boolean,
    public body: unknown,
  ) {
    super(message);
    this.name = 'APIError';
  }

  static fromResponse(response: Response, body: any): APIError {
    // 429 and 5xx are transient; other statuses need a changed request
    const retryable = response.status === 429 || response.status >= 500;
    const code = body?.error?.code || body?.code || `HTTP_${response.status}`;
    const message = body?.error?.message || body?.message || `HTTP ${response.status}`;
    return new APIError(message, response.status, code, retryable, body);
  }
}

async function apiCall<T>(
  url: string,
  options?: RequestInit,
  authRetried = false,
): Promise<T> {
  const response = await fetch(url, {
    ...options,
    // Fail fast instead of hanging on a stalled connection
    signal: AbortSignal.timeout(10000),
  });

  if (!response.ok) {
    // Error bodies are not always JSON (e.g. an HTML page from a proxy)
    const body = await response.json().catch(() => ({}));

    // Handle specific cases before throwing: refresh auth and retry
    // exactly once. A second 401 is thrown, so a revoked refresh token
    // cannot cause an endless refresh loop.
    if (response.status === 401 && !authRetried) {
      await refreshAuth();
      return apiCall<T>(url, options, true);
    }

    throw APIError.fromResponse(response, body);
  }

  // 204 No Content has no body to parse
  if (response.status === 204) return undefined as T;
  return response.json() as Promise<T>;
}

// Usage
async function loadResource() {
  try {
    return await apiCall('/api/resource');
  } catch (error) {
    if (!(error instanceof APIError)) {
      // Network failure, timeout, or a bug: don't swallow it silently
      reportError(error);
      throw error;
    }
    if (error.retryable) {
      // Queue for retry
      await retryQueue.add(() => apiCall('/api/resource'));
    } else if (error.status === 404) {
      // Resource doesn't exist: show appropriate UI
      return null;
    } else if (error.status === 422) {
      // Validation error: show to user
      showValidationErrors(error.body);
    } else {
      // Unexpected error: log and alert
      reportError(error);
    }
  }
}
```
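`fetch()` rejects instead of resolving when no response arrives at all, so DNS failures, refused connections, and `AbortSignal.timeout()` expiries never reach `APIError.fromResponse` and are never marked retryable. A minimal sketch of a classifier for those rejections follows; `isRetryableNetworkError` is a hypothetical name. It relies on timeouts surfacing as an error named `TimeoutError` and network failures as a `TypeError`. Note that `fetch` also throws `TypeError` for some programming mistakes (such as a malformed URL), so a retry cap still matters.

```typescript
// Decide whether a rejected fetch() is worth retrying.
function isRetryableNetworkError(err: unknown): boolean {
  if (typeof err !== 'object' || err === null || !('name' in err)) {
    return false;
  }
  const name = (err as { name: unknown }).name;

  // AbortSignal.timeout() aborts with an error named "TimeoutError"
  if (name === 'TimeoutError') return true;

  // Network-level failures (DNS, connection refused/reset) reject with TypeError
  if (name === 'TypeError') return true;

  // "AbortError" means our own code cancelled the request: don't retry
  return false;
}
```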
Pattern 2: Error Mapping for Users
Never show raw API errors to users. Map them to user-friendly messages:
```typescript
const ERROR_MESSAGES: Record<string, string> = {
  // Auth errors
  'invalid_api_key': 'Authentication failed. Please try again.',
  'token_expired': 'Your session has expired. Please sign in again.',
  'insufficient_permissions': 'You don\'t have permission to do this.',

  // Validation errors
  'invalid_email': 'Please enter a valid email address.',
  'duplicate_email': 'An account with this email already exists.',
  'password_too_short': 'Password must be at least 8 characters.',

  // Payment errors
  'card_declined': 'Your card was declined. Please try a different card.',
  'insufficient_funds': 'Insufficient funds. Please try a different payment method.',
  'expired_card': 'Your card has expired. Please update your payment method.',

  // Rate limits
  'rate_limit_exceeded': 'Too many requests. Please wait a moment and try again.',

  // Server errors
  'internal_error': 'Something went wrong on our end. Please try again.',
  'service_unavailable': 'This service is temporarily unavailable. Please try again shortly.',
};

function getUserMessage(error: APIError): string {
  // Try specific error code first
  if (error.code && ERROR_MESSAGES[error.code]) {
    return ERROR_MESSAGES[error.code];
  }

  // Fall back to status code
  if (error.status === 429) return ERROR_MESSAGES['rate_limit_exceeded'];
  if (error.status >= 500) return ERROR_MESSAGES['internal_error'];
  if (error.status === 401) return ERROR_MESSAGES['token_expired'];
  if (error.status === 403) return ERROR_MESSAGES['insufficient_permissions'];

  // Generic fallback
  return 'Something went wrong. Please try again.';
}
```
Pattern 3: Error Recovery Strategies
Different errors need different recovery approaches:
```typescript
async function withErrorRecovery<T>(
  fn: () => Promise<T>,
  options: {
    maxRetries?: number;
    onAuthError?: () => Promise<void>;
    fallback?: () => T;
    onError?: (error: APIError) => void;
  } = {}
): Promise<T> {
  const { maxRetries = 3, onAuthError, fallback, onError } = options;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (!(error instanceof APIError)) throw error;
      onError?.(error);

      // Auth error: try refreshing once
      if (error.status === 401 && attempt === 0 && onAuthError) {
        await onAuthError();
        continue;
      }

      // Retryable error: exponential backoff with full jitter, capped at 30s.
      // The random spread stops clients that failed together from retrying
      // together (the thundering herd). This loop does not read Retry-After;
      // for 429s, prefer the server's value when you have it.
      if (error.retryable && attempt < maxRetries) {
        const ceiling = Math.min(Math.pow(2, attempt) * 1000, 30000);
        const delay = Math.random() * ceiling;
        await new Promise(r => setTimeout(r, delay));
        continue;
      }

      // Non-retryable or max retries reached: use fallback if provided
      if (fallback) return fallback();
      throw error;
    }
  }

  // Only reached when the final attempt was the 401 refresh (maxRetries = 0)
  throw new Error('Max retries exceeded');
}

// Usage (refreshToken, getCachedProfile, and logError are app-specific)
const userData = await withErrorRecovery(
  () => apiCall('/api/user/profile'),
  {
    onAuthError: () => refreshToken(),
    fallback: () => getCachedProfile(),
    onError: (err) => logError('profile_fetch_failed', err),
  }
);
```
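The delay calculation can be factored out so it is testable on its own and can honor a server-supplied `Retry-After`. This is a sketch of "full jitter" backoff: the delay is drawn uniformly from `[0, min(cap, base * 2^attempt))`. The function name, its defaults, and the injectable `random` parameter are assumptions for illustration.

```typescript
// Full-jitter exponential backoff. If the server sent Retry-After
// (already converted to milliseconds), that value wins.
function backoffDelay(
  attempt: number,
  retryAfterMs: number | null = null,
  baseMs = 1000,
  capMs = 30000,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  // The server knows its own recovery time better than our formula does
  if (retryAfterMs !== null) return retryAfterMs;

  // Upper bound grows exponentially per attempt until it hits the cap
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);

  // Uniform random delay in [0, ceiling)
  return Math.floor(random() * ceiling);
}
```

Drawing from the whole range, rather than adding a small random offset to a fixed delay, spreads retries most evenly. The trade-off is that an individual retry may occasionally fire almost immediately.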
Pattern 4: Error Monitoring and Alerting
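Track error patterns so you catch issues before your users report them. This tracker keeps a one-hour window in memory, so each process sees only its own errors. In a multi-instance deployment, send the same events to a shared metrics backend and alert there: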
```typescript
class ErrorTracker {
  private errors: Array<{
    code: string;
    status: number;
    endpoint: string;
    timestamp: number;
  }> = [];

  // When each alert key last fired. Without this, a sustained spike would
  // re-alert on every error recorded after the threshold is crossed.
  // The 15-minute cooldown is an arbitrary default; tune it.
  private lastAlertAt = new Map<string, number>();
  private readonly alertCooldownMs = 15 * 60 * 1000;

  record(error: APIError, endpoint: string) {
    this.errors.push({
      code: error.code,
      status: error.status,
      endpoint,
      timestamp: Date.now(),
    });

    // Keep only last hour
    const oneHourAgo = Date.now() - 3600000;
    this.errors = this.errors.filter(e => e.timestamp > oneHourAgo);

    // Check for anomalies
    this.checkAlerts();
  }

  private checkAlerts() {
    const last5min = this.errors.filter(e => Date.now() - e.timestamp < 300000);

    // Alert: sudden spike in errors
    if (last5min.length > 50) {
      this.alert('error_spike', `${last5min.length} errors in last 5 minutes`);
    }

    // Alert: specific endpoint failing (separate cooldown per endpoint)
    const byEndpoint = this.groupBy(last5min, 'endpoint');
    for (const [endpoint, errors] of Object.entries(byEndpoint)) {
      if (errors.length > 10) {
        this.alert(`endpoint_failing:${endpoint}`, `${endpoint}: ${errors.length} errors in 5 min`);
      }
    }

    // Alert: auth errors (possible key compromise or expiry)
    const authErrors = last5min.filter(e => e.status === 401);
    if (authErrors.length > 5) {
      this.alert('auth_failures', `${authErrors.length} auth failures, check API keys`);
    }
  }

  private groupBy(items: any[], key: string) {
    return items.reduce((groups, item) => {
      (groups[item[key]] = groups[item[key]] || []).push(item);
      return groups;
    }, {} as Record<string, any[]>);
  }

  private alert(key: string, message: string) {
    const now = Date.now();
    const last = this.lastAlertAt.get(key);
    if (last !== undefined && now - last < this.alertCooldownMs) return;
    this.lastAlertAt.set(key, now);

    console.error(`[ALERT:${key}] ${message}`);
    // Send to monitoring service (PagerDuty, Slack, etc.)
  }
}
```
Pattern 5: Webhook Error Handling
Webhooks need special error handling — you don't control when they arrive:
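```typescript
// verifySignature, WEBHOOK_SECRET, deadLetterQueue, handleEvent, db, and
// isUniqueViolation are app-specific (not shown)

async function handleWebhook(req: Request): Promise<Response> {
  // 1. Verify signature FIRST, against the raw body
  //    (re-serialized JSON may not match byte-for-byte)
  const signature = req.headers.get('x-webhook-signature');
  const body = await req.text();
  if (!signature || !verifySignature(body, signature, WEBHOOK_SECRET)) {
    // Don't reveal why it failed
    return new Response('Unauthorized', { status: 401 });
  }

  // 2. Parse payload
  let event: WebhookEvent;
  try {
    event = JSON.parse(body);
  } catch {
    return new Response('Invalid JSON', { status: 400 });
  }

  // 3. Acknowledge IMMEDIATELY, process async.
  //    Return 200 fast: the sender treats a slow response as a failure and retries.
  //    On serverless platforms, work still running after the response may be
  //    frozen or killed; there, enqueue the event to a durable queue instead.
  processWebhookAsync(event).catch(error => {
    // Log but don't fail the webhook response
    console.error('Webhook processing failed:', error);
    // Queue for manual retry
    deadLetterQueue.add(event);
  });

  return new Response('OK', { status: 200 });
}

// 4. Idempotent processing (webhooks can be delivered multiple times)
async function processWebhookAsync(event: WebhookEvent) {
  // Claim the event BEFORE processing. The unique constraint on id makes
  // the claim atomic, so two concurrent deliveries cannot both pass
  // (a findById-then-create check has exactly that race).
  try {
    await db.webhookEvents.create({ id: event.id, processedAt: null });
  } catch (error) {
    if (isUniqueViolation(error)) return; // Already handled or in progress
    throw error;
  }

  try {
    await handleEvent(event);
    await db.webhookEvents.update(event.id, { processedAt: new Date() });
  } catch (error) {
    // Release the claim so a redelivery or dead-letter replay can try again
    await db.webhookEvents.delete(event.id);
    throw error;
  }
}
```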
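`verifySignature` is left undefined above. One common scheme is an HMAC-SHA256 of the raw request body, hex-encoded into a header. The sketch below assumes exactly that scheme. Real providers vary (prefixes such as `sha256=`, base64 encoding, signed timestamps for replay protection), so adapt it to your provider's documented format. It uses Node's `createHmac` and `timingSafeEqual`.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// ASSUMED scheme: header value = hex(HMAC-SHA256(secret, rawBody)).
function verifySignature(
  rawBody: string,
  signature: string | null,
  secret: string,
): boolean {
  if (!signature) return false;

  const expected = createHmac('sha256', secret).update(rawBody, 'utf8').digest();
  // Invalid hex yields a shorter buffer rather than throwing
  const received = Buffer.from(signature, 'hex');

  // timingSafeEqual throws on length mismatch, so check first
  if (received.length !== expected.length) return false;

  // Constant-time comparison avoids leaking how many bytes matched
  return timingSafeEqual(received, expected);
}
```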
Pattern 6: Validation Error Display
When APIs return validation errors, show them clearly:
```tsx
// API returns structured validation errors
interface ValidationError {
  field: string;
  message: string;
  code: string;
}

// Parse validation errors from different API formats
function parseValidationErrors(body: any): ValidationError[] {
  // Stripe format: { error: { param: "email", message: "...", code: "..." } }
  if (body?.error?.param) {
    return [{ field: body.error.param, message: body.error.message, code: body.error.code }];
  }

  // Common ad-hoc format: { errors: [{ field, message, code? }] }
  // Normalize so missing properties don't render as "undefined"
  if (Array.isArray(body?.errors)) {
    return body.errors.map((e: any) => ({
      field: e.field ?? 'general',
      message: e.message ?? 'Invalid value',
      code: e.code ?? 'validation_error',
    }));
  }

  // Zod format: { issues: [{ path: [...], message, code }] }
  if (Array.isArray(body?.issues)) {
    return body.issues.map((issue: any) => ({
      // An empty path means the error applies to the whole object
      field: issue.path.length > 0 ? issue.path.join('.') : 'general',
      message: issue.message,
      code: issue.code,
    }));
  }

  return [{ field: 'general', message: 'Validation failed', code: 'validation_error' }];
}

// React component to display errors
function FormErrors({ errors }: { errors: ValidationError[] }) {
  if (errors.length === 0) return null;

  return (
    <div role="alert" className="error-summary">
      <h3>Please fix the following:</h3>
      <ul>
        {errors.map((error, i) => (
          <li key={i}>
            <strong>{error.field}:</strong> {error.message}
          </li>
        ))}
      </ul>
    </div>
  );
}
```
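A summary list works best alongside messages shown next to each input. A small helper can group the normalized errors by field so each form control can look up its own messages. This is a sketch; `groupErrorsByField` is a hypothetical name, and the `ValidationError` shape is repeated so the block stands alone.

```typescript
interface ValidationError {
  field: string;
  message: string;
  code: string;
}

// Group messages by field, preserving the order the API returned them in.
function groupErrorsByField(errors: ValidationError[]): Map<string, string[]> {
  const byField = new Map<string, string[]>();
  for (const { field, message } of errors) {
    const list = byField.get(field) ?? [];
    list.push(message);
    byField.set(field, list);
  }
  return byField;
}
```

A form can then render `byField.get('email')` under the email input and link it with `aria-describedby`, keeping the summary for screen-reader users who land on the alert first.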
The Error Handling Checklist
| Layer | What to Handle |
|---|---|
| Network | Timeout, DNS failure, connection refused |
| HTTP | 4xx client errors, 5xx server errors, 429 rate limits |
| Response | Invalid JSON, unexpected format, missing fields |
| Business | Application-level errors (insufficient funds, duplicate entry) |
| Webhook | Signature verification, idempotency, async processing |
| Monitoring | Error rate tracking, anomaly detection, alerting |
| User | Friendly messages, actionable guidance, retry options |
Error Budgets and SLOs
Production API integrations shouldn't aim for zero errors: that's impossible, and the attempt leads to over-engineering. Instead, define an error budget: the acceptable error rate within a service level objective (SLO). If your SLO is a 99.9% success rate for API calls, your error budget is 0.1%. That is about 43 minutes of full downtime (or an equivalent error rate spread over time) per 30-day month, or roughly 8.76 hours per year. This framing changes how you respond to errors: while budget remains, optimize for velocity; at budget exhaustion, freeze releases and focus on reliability.
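The conversion from an SLO percentage to time is simple arithmetic, and a helper makes it easy to check the figures for other targets. This sketch assumes a fixed window length and uniform traffic. `errorBudgetMinutes` is a hypothetical name.

```typescript
// Error budget expressed as minutes of full outage within the window.
// Assumes uniform traffic; a request-based budget is (1 - slo) * totalRequests.
function errorBudgetMinutes(slo: number, windowDays = 30): number {
  const windowMinutes = windowDays * 24 * 60; // 43,200 for 30 days
  return (1 - slo) * windowMinutes;
}
```

For example, `errorBudgetMinutes(0.999)` is about 43.2, `errorBudgetMinutes(0.9999)` about 4.32, and `errorBudgetMinutes(0.9999, 7)` about 1.0 (the "one minute per week" figure for four nines).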
SLO definitions for API integrations: Track success rate separately for different endpoint categories. Payment API calls might have a 99.99% SLO (four nines — one minute of error tolerance per week). Product catalog API calls might have a 99.5% SLO (more tolerant because stale data from cache is acceptable). A single app-wide error SLO misses this nuance and can lead to over-investing in reliability for low-stakes operations while under-investing where it matters.
Error budget burn rate: Monitor how fast you're consuming your error budget. Burn rate is the observed error rate divided by the error rate the SLO allows (1 - SLO). A 1x burn rate exhausts the budget exactly at the end of the SLO window; a 10x burn rate would exhaust a 30-day budget in about 3 days. A 10x burn rate is a signal to page someone. A sustained 1x burn rate is a signal to investigate during business hours. The multiwindow, multi-burn-rate alerting approach described in Google's SRE Workbook can be implemented in Prometheus, Datadog, or New Relic to alert on burn rates derived from error rate metrics. For simpler setups, a weekly review of your error rate trend against your budget is sufficient.
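The burn-rate calculation and the two thresholds from the paragraph above can be sketched in a few lines. The function names and the `'page' | 'ticket' | 'none'` labels are illustrative assumptions. Production setups usually evaluate several windows (for example, 1 hour and 6 hours) rather than a single count.

```typescript
type BurnAction = 'page' | 'ticket' | 'none';

// Burn rate = observed error rate / allowed error rate (1 - slo).
// 1.0 means "on track to use exactly the whole budget by the end of the window".
function burnRate(failed: number, total: number, slo: number): number {
  if (total === 0) return 0; // no traffic, nothing burned
  const errorRate = failed / total;
  return errorRate / (1 - slo);
}

// Thresholds from the text: page at 10x, investigate at 1x.
function burnAction(rate: number): BurnAction {
  if (rate >= 10) return 'page';
  if (rate >= 1) return 'ticket';
  return 'none';
}
```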
Testing Error Scenarios
The error handling code is the code least likely to be tested and most likely to fail when it matters. Most test suites cover the happy path extensively and error paths minimally. The patterns below address this gap.
Test each HTTP error code explicitly: For every external API call in your codebase, write at least one test for each failure mode: 401 (expired auth), 429 (rate limited), 503 (service unavailable), and a network timeout. Use MSW (Mock Service Worker) to mock these responses. These tests verify that your error categorization (retryable or not, which user-facing message) is correct for each status code. Run them in CI: they're fast and catch regressions when error handling code changes.
Chaos-style integration tests: For critical integrations (payments, auth), run periodic chaos tests that inject failures at different points in the request lifecycle: return an error on the 3rd retry, inject a timeout on the 2nd request, simulate a malformed JSON response. The goal isn't to find specific bugs but to verify that your recovery mechanisms work under realistic failure conditions. Stripe's test mode provides test card numbers that trigger specific declines and errors; for other providers, MSW's handler override pattern (return an error for N requests, then succeed) works well.
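The "error for N requests, then succeed" idea does not strictly need a mocking library. The sketch below injects a hand-rolled fetch stand-in into a minimal retry loop. Both `makeFlakyFetch` and `getWithRetry` are hypothetical, and the loop skips delays so the test stays fast. It relies on the global `Response` class (Node.js 18+ or browsers).

```typescript
// Fake fetch: fails the first `failures` calls with `status`, then succeeds.
function makeFlakyFetch(failures: number, status = 503) {
  let calls = 0;
  const fakeFetch = async (_url: string): Promise<Response> => {
    calls += 1;
    if (calls <= failures) {
      return new Response(JSON.stringify({ error: 'injected' }), { status });
    }
    return new Response(JSON.stringify({ ok: true }), { status: 200 });
  };
  return { fakeFetch, callCount: () => calls };
}

// Minimal retry loop under test: retries only 429 and 5xx, no delay.
async function getWithRetry(
  fetchFn: (url: string) => Promise<Response>,
  url: string,
  maxRetries: number,
): Promise<Response> {
  let response = await fetchFn(url);
  for (let attempt = 0; attempt < maxRetries && !response.ok; attempt++) {
    const retryable = response.status === 429 || response.status >= 500;
    if (!retryable) break;
    response = await fetchFn(url);
  }
  return response;
}
```

The three cases worth asserting are: recovery within the retry budget, giving up after the budget is spent, and not retrying a permanent 4xx at all.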
Test idempotency: For operations that should be idempotent (payment charge attempts, user creation, webhook processing), send the same request twice and verify the system reaches a consistent state. If your charge endpoint creates two charges on duplicate request, that's a bug. Test this explicitly rather than discovering it from a production incident. Stripe's idempotency keys make this easy to test: send the same key twice, verify only one charge is created.
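The duplicate-request test is easiest to reason about against a service that stores results by idempotency key. This is an in-memory sketch with a hypothetical `ChargeService`, not Stripe's implementation. A production version would persist keys durably with a unique constraint and an expiry. Like Stripe, it rejects a reused key whose parameters differ.

```typescript
interface Charge {
  id: string;
  amountCents: number;
}

// Hypothetical in-memory charge service keyed by idempotency key.
class ChargeService {
  private byKey = new Map<string, Charge>();
  private charges: Charge[] = [];

  createCharge(idempotencyKey: string, amountCents: number): Charge {
    const existing = this.byKey.get(idempotencyKey);
    if (existing) {
      // Same key, different payload is almost certainly a client bug
      if (existing.amountCents !== amountCents) {
        throw new Error('Idempotency key reused with different parameters');
      }
      // Replay: return the original result instead of charging again
      return existing;
    }

    const charge: Charge = { id: `ch_${this.charges.length + 1}`, amountCents };
    this.charges.push(charge);
    this.byKey.set(idempotencyKey, charge);
    return charge;
  }

  chargeCount(): number {
    return this.charges.length;
  }
}
```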
Distributed Tracing for API Errors
When your application makes multiple downstream API calls as part of handling a single user request, a single error can be hard to isolate — you see a 500 from your API but don't know which downstream call failed. Distributed tracing connects the dots.
Request IDs propagated downstream: Generate a UUID at the entry point of every user request and include it in every outbound API call as a header (typically X-Request-ID or X-Correlation-ID). When a downstream API returns an error, log both the request ID and the downstream API's response. Now when a user reports an error, you can search your logs for their request ID and see the full chain of calls: what you sent, what each downstream API returned, and where the chain failed.
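In Node.js, one way to make the request ID available to every outbound call without threading it through function parameters is `AsyncLocalStorage`. The sketch below assumes the `X-Request-ID` header name from the paragraph above. `withRequestId` and `outboundHeaders` are hypothetical helpers. Reusing an inbound ID makes sense only if it comes from a proxy you trust.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

// Holds the current request's ID for everything that runs inside it,
// including code after an await.
const requestContext = new AsyncLocalStorage<{ requestId: string }>();

// Wrap each incoming request's handler with this at the entry point.
function withRequestId<T>(inboundId: string | null, fn: () => T): T {
  const requestId = inboundId ?? randomUUID();
  return requestContext.run({ requestId }, fn);
}

// Headers to attach to every outbound API call, e.g.
// fetch(url, { headers: outboundHeaders({ Accept: 'application/json' }) })
function outboundHeaders(extra: Record<string, string> = {}): Record<string, string> {
  const ctx = requestContext.getStore();
  return ctx ? { ...extra, 'X-Request-ID': ctx.requestId } : { ...extra };
}
```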
Structured error events for search: Log API errors as structured JSON, not formatted strings. An error event should include: timestamp, requestId, service (which downstream API failed), status (HTTP status code), errorCode (API-specific error code), latencyMs, and retryCount. This structure makes errors queryable: "show me all payment API errors in the last hour where status=402 and latency > 5000ms." String-formatted logs require grep archaeology; structured events support real queries.
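A structured event with the fields listed above can be sketched as a typed object written as one JSON line. The `level` and `msg` keys, the injectable `write` sink, and using `status: 0` when no response arrived are assumptions, not a standard schema.

```typescript
interface ApiErrorEvent {
  timestamp: string;  // ISO 8601
  requestId: string;
  service: string;    // which downstream API failed
  status: number;     // HTTP status (assumption: 0 when no response arrived)
  errorCode: string;  // API-specific error code
  latencyMs: number;
  retryCount: number;
}

// One JSON object per line, so log pipelines can index every field.
function logApiError(
  event: ApiErrorEvent,
  write: (line: string) => void = console.error,
): void {
  write(JSON.stringify({ level: 'error', msg: 'downstream_api_error', ...event }));
}
```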
Methodology
The APIError class pattern shown above extends the native Error class, which preserves the stack trace in Node.js. If you compile TypeScript to ES5, also call Object.setPrototypeOf(this, new.target.prototype) in the constructor; otherwise instanceof APIError checks fail. The AbortSignal.timeout(10000) built-in (available in Node.js 17.3+ and 16.14+, and in modern browsers) is preferred over creating AbortController + setTimeout pairs; global fetch itself requires Node.js 18+. The webhook idempotency pattern relies on storing processed event IDs in a database with a unique constraint on event.id: if a duplicate delivery arrives, db.webhookEvents.create() throws a unique constraint error that is caught and ignored. SLO calculation methodology: 99.9% = 43.2 minutes/month of error budget (about 8.76 hours/year); 99.99% = 4.32 minutes/month (about 52.6 minutes/year). These figures assume a 30-day month and uniform traffic distribution. Actual error budget calculations should account for traffic patterns: an outage during peak hours consumes more of a request-based budget than the same outage overnight.
Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Showing raw API error to user | Confusing, exposes internals | Map to user-friendly messages |
| Retrying all errors | Retrying permanent failures | Only retry 429 and 5xx |
| No error monitoring | Issues found by users | Track error rates, alert on spikes |
| Same retry strategy for all APIs | Suboptimal recovery | Per-API retry config |
| Not validating API responses | Breaks silently when API changes | Validate with Zod/schemas |
| Slow webhook processing | Webhook sender times out and retries | Acknowledge fast, process async |
Find APIs with the best error documentation on APIScout — error code references, retry guidance, and developer experience scores.