API Error Handling Patterns for Production Applications

API errors in development are annoying. API errors in production lose money, users, and trust. The difference between a fragile integration and a resilient one comes down to how you handle errors — not just catching them, but categorizing, retrying, reporting, and recovering from them.

Most API integration bugs are error-handling bugs: the happy path was tested, but the 401 refresh path wasn't; the retry logic works in isolation but creates a thundering herd under load; the webhook handler acknowledges instantly in development but times out in production under database pressure. This guide is organized around the patterns that prevent those bugs — structured error typing, user-friendly messages, recovery strategies, and the monitoring that tells you when something goes wrong before your users do.

Error Categories

The Three Types of API Errors

Category	Status Codes	Retryable	Action
Client errors	400, 401, 403, 404, 409, 422	No (usually)	Fix the request
Server errors	500, 502, 503, 504	Yes	Retry with backoff
Rate limits	429	Yes (after waiting)	Backoff, respect Retry-After

Detailed Error Code Guide

Code	Meaning	Should You Retry?	What to Do
400	Bad request	No	Fix request body/params
401	Unauthorized	Maybe (refresh token)	Refresh auth, re-authenticate
403	Forbidden	No	Check permissions/scopes
404	Not found	No	Resource doesn't exist
409	Conflict	Maybe	Resolve conflict, retry
422	Validation error	No	Fix input data
429	Rate limited	Yes (after delay)	Wait for Retry-After, then retry
500	Server error	Yes	Retry with backoff
502	Bad gateway	Yes	Retry with backoff
503	Service unavailable	Yes	Retry with backoff, check status page
504	Gateway timeout	Yes	Retry with backoff
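
The tables above can be encoded as a small lookup so that the retry decision lives in one place. The following is a minimal sketch; the names ErrorCategory and classifyStatus are illustrative, and the per-code choices mirror the tables rather than any particular API's documented behavior.

```typescript
type ErrorCategory = 'client' | 'rate_limit' | 'server' | 'unknown';

interface StatusClassification {
  category: ErrorCategory;
  retryable: boolean;
}

// Classify an HTTP status code following the tables above.
// 401 and 409 are marked non-retryable here because they need a corrective
// step (token refresh, conflict resolution) before a retry makes sense.
function classifyStatus(status: number): StatusClassification {
  if (status === 429) return { category: 'rate_limit', retryable: true };
  if (status >= 500 && status <= 599) return { category: 'server', retryable: true };
  if (status >= 400 && status <= 499) return { category: 'client', retryable: false };
  return { category: 'unknown', retryable: false };
}
```

Keeping this mapping in one function means a change in retry policy for a given status code touches a single line.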

Pattern 1: Structured Error Handling

Categorize errors and handle each type differently:

class APIError extends Error {
  constructor(
    message: string,
    public status: number,
    public code: string,
    public retryable: boolean,
    public body: unknown,
  ) {
    super(message);
    this.name = 'APIError';
  }

  static fromResponse(response: Response, body: any): APIError {
    const retryable = response.status === 429 || response.status >= 500;
    const code = body?.error?.code || body?.code || `HTTP_${response.status}`;
    const message = body?.error?.message || body?.message || `HTTP ${response.status}`;

    return new APIError(message, response.status, code, retryable, body);
  }
}

async function apiCall<T>(
  url: string,
  options?: RequestInit,
  isAuthRetry = false,
): Promise<T> {
  const response = await fetch(url, {
    ...options,
    signal: AbortSignal.timeout(10000),
  });

  if (!response.ok) {
    // Error bodies are not always JSON; fall back to an empty object
    const body = await response.json().catch(() => ({}));

    // On the first 401, refresh credentials and retry exactly once.
    // The flag stops infinite recursion if the refreshed credentials are also rejected.
    if (response.status === 401 && !isAuthRetry) {
      await refreshAuth();
      return apiCall<T>(url, options, true);
    }

    throw APIError.fromResponse(response, body);
  }

  return response.json();
}

// Usage
async function loadResource() {
  try {
    return await apiCall('/api/resource');
  } catch (error) {
    // Network failures and timeouts are not APIError instances; let them propagate
    if (!(error instanceof APIError)) throw error;

    if (error.retryable) {
      // Queue for retry
      await retryQueue.add(() => apiCall('/api/resource'));
    } else if (error.status === 404) {
      // Resource doesn't exist: show appropriate UI
      return null;
    } else if (error.status === 422) {
      // Validation error: show to user
      showValidationErrors(error.body);
    } else {
      // Unexpected error: log and alert
      reportError(error);
    }
    return null;
  }
}

Pattern 2: Error Mapping for Users

Never show raw API errors to users. Map them to user-friendly messages:

const ERROR_MESSAGES: Record<string, string> = {
  // Auth errors
  'invalid_api_key': 'Authentication failed. Please try again.',
  'token_expired': 'Your session has expired. Please sign in again.',
  'insufficient_permissions': 'You don\'t have permission to do this.',

  // Validation errors
  'invalid_email': 'Please enter a valid email address.',
  'duplicate_email': 'An account with this email already exists.',
  'password_too_short': 'Password must be at least 8 characters.',

  // Payment errors
  'card_declined': 'Your card was declined. Please try a different card.',
  'insufficient_funds': 'Insufficient funds. Please try a different payment method.',
  'expired_card': 'Your card has expired. Please update your payment method.',

  // Rate limits
  'rate_limit_exceeded': 'Too many requests. Please wait a moment and try again.',

  // Server errors
  'internal_error': 'Something went wrong on our end. Please try again.',
  'service_unavailable': 'This service is temporarily unavailable. Please try again shortly.',
};

function getUserMessage(error: APIError): string {
  // Try specific error code first
  if (error.code && ERROR_MESSAGES[error.code]) {
    return ERROR_MESSAGES[error.code];
  }

  // Fall back to status code
  if (error.status === 429) return ERROR_MESSAGES['rate_limit_exceeded'];
  if (error.status >= 500) return ERROR_MESSAGES['internal_error'];
  if (error.status === 401) return ERROR_MESSAGES['token_expired'];
  if (error.status === 403) return ERROR_MESSAGES['insufficient_permissions'];

  // Generic fallback
  return 'Something went wrong. Please try again.';
}

Pattern 3: Error Recovery Strategies

Different errors need different recovery approaches:

async function withErrorRecovery<T>(
  fn: () => Promise<T>,
  options: {
    maxRetries?: number;
    onAuthError?: () => Promise<void>;
    fallback?: () => T;
    onError?: (error: APIError) => void;
  } = {}
): Promise<T> {
  const { maxRetries = 3, onAuthError, fallback, onError } = options;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (!(error instanceof APIError)) throw error;

      onError?.(error);

      // Auth error — try refreshing once
      if (error.status === 401 && attempt === 0 && onAuthError) {
        await onAuthError();
        continue;
      }

      // Retryable error — backoff and retry
      if (error.retryable && attempt < maxRetries) {
        const delay = Math.min(Math.pow(2, attempt) * 1000, 30000);
        await new Promise(r => setTimeout(r, delay));
        continue;
      }

      // Non-retryable or max retries reached — use fallback
      if (fallback) return fallback();

      throw error;
    }
  }

  throw new Error('Max retries exceeded');
}

// Usage
const userData = await withErrorRecovery(
  () => apiCall('/api/user/profile'),
  {
    onAuthError: () => refreshToken(),
    fallback: () => getCachedProfile(),
    onError: (err) => logError('profile_fetch_failed', err),
  }
);
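
The fixed exponential delay in withErrorRecovery makes every client that failed at the same moment retry at the same moment, which is the thundering-herd problem mentioned in the introduction. One common mitigation is to randomize the delay and to honor the server's Retry-After header when it is present. The sketch below is an assumption about how that could look, not part of the pattern above; computeRetryDelayMs and its parameters are hypothetical names.

```typescript
// Compute how long to wait before retry number `attempt` (0-based).
// `retryAfter` is the raw Retry-After header value, if the server sent one.
// `random` and `nowMs` are injectable so the function can be tested deterministically.
function computeRetryDelayMs(
  attempt: number,
  retryAfter: string | null = null,
  random: () => number = Math.random,
  nowMs: number = Date.now(),
): number {
  if (retryAfter !== null && retryAfter.trim() !== '') {
    // Retry-After is either a number of seconds or an HTTP date.
    // The server's value is returned uncapped; the caller decides whether to wait that long.
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return seconds * 1000;
    }
    const dateMs = Date.parse(retryAfter);
    if (!Number.isNaN(dateMs)) {
      return Math.max(dateMs - nowMs, 0);
    }
  }

  // "Full jitter": pick a random delay between 0 and the exponential ceiling
  const ceilingMs = Math.min(2 ** attempt * 1000, 30000);
  return Math.floor(random() * ceilingMs);
}
```

Inside the retry loop, the result would replace the fixed delay, with the header value read from the failed response if the error object carries it.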

Pattern 4: Error Monitoring and Alerting

Track error patterns to catch issues early:

class ErrorTracker {
  private errors: Array<{
    code: string;
    status: number;
    endpoint: string;
    timestamp: number;
  }> = [];

  record(error: APIError, endpoint: string) {
    this.errors.push({
      code: error.code,
      status: error.status,
      endpoint,
      timestamp: Date.now(),
    });

    // Keep only last hour
    const oneHourAgo = Date.now() - 3600000;
    this.errors = this.errors.filter(e => e.timestamp > oneHourAgo);

    // Check for anomalies
    this.checkAlerts();
  }

  private checkAlerts() {
    const last5min = this.errors.filter(e => Date.now() - e.timestamp < 300000);

    // Alert: sudden spike in errors
    if (last5min.length > 50) {
      this.alert('error_spike', `${last5min.length} errors in last 5 minutes`);
    }

    // Alert: specific endpoint failing
    const byEndpoint = this.groupBy(last5min, 'endpoint');
    for (const [endpoint, errors] of Object.entries(byEndpoint)) {
      if (errors.length > 10) {
        this.alert('endpoint_failing', `${endpoint}: ${errors.length} errors in 5 min`);
      }
    }

    // Alert: auth errors (possible key compromise or expiry)
    const authErrors = last5min.filter(e => e.status === 401);
    if (authErrors.length > 5) {
      this.alert('auth_failures', `${authErrors.length} auth failures — check API keys`);
    }
  }

  private groupBy(items: any[], key: string) {
    return items.reduce((groups, item) => {
      (groups[item[key]] = groups[item[key]] || []).push(item);
      return groups;
    }, {} as Record<string, any[]>);
  }

  private alert(type: string, message: string) {
    console.error(`[ALERT:${type}] ${message}`);
    // Send to monitoring service (PagerDuty, Slack, etc.)
  }
}

Pattern 5: Webhook Error Handling

Webhooks need special error handling — you don't control when they arrive:

async function handleWebhook(req: Request): Promise<Response> {
  // 1. Verify signature FIRST
  const signature = req.headers.get('x-webhook-signature');
  const body = await req.text();

  if (!verifySignature(body, signature, WEBHOOK_SECRET)) {
    // Don't reveal why it failed
    return new Response('Unauthorized', { status: 401 });
  }

  // 2. Parse payload
  let event;
  try {
    event = JSON.parse(body);
  } catch {
    return new Response('Invalid JSON', { status: 400 });
  }

  // 3. Acknowledge IMMEDIATELY, process async
  // Return 200 fast — the sender will retry if you're slow
  processWebhookAsync(event).catch(error => {
    // Log but don't fail the webhook response
    console.error('Webhook processing failed:', error);
    // Queue for manual retry
    deadLetterQueue.add(event);
  });

  return new Response('OK', { status: 200 });
}

// 4. Idempotent processing (webhooks can be delivered multiple times)
async function processWebhookAsync(event: WebhookEvent) {
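  // Check if already processed.
  // This check-then-act sequence is not atomic: two concurrent deliveries can both
  // pass it, so back it with a unique constraint on the event ID in the database.
  const processed = await db.webhookEvents.findById(event.id);
  if (processed) return; // Already handled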

  // Process
  await handleEvent(event);

  // Mark as processed
  await db.webhookEvents.create({ id: event.id, processedAt: new Date() });
}
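
The handler above calls verifySignature without showing it. Signature schemes differ between providers, so the following is only a sketch under a stated assumption: the sender puts the hex-encoded HMAC-SHA256 of the raw request body, keyed with the shared secret, in the signature header. Check your provider's documentation for the real scheme, which often also includes a timestamp.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Assumed scheme: header value = hex(HMAC-SHA256(secret, rawBody)).
function verifySignature(
  rawBody: string,
  signature: string | null,
  secret: string,
): boolean {
  if (!signature) return false;

  const expected = createHmac('sha256', secret).update(rawBody, 'utf8').digest();
  const received = Buffer.from(signature, 'hex');

  // timingSafeEqual throws on length mismatch, so compare lengths first
  if (received.length !== expected.length) return false;

  // Constant-time comparison avoids leaking how many bytes matched
  return timingSafeEqual(received, expected);
}
```

The signature must be computed over the raw body text, which is why the handler reads req.text() before parsing JSON.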

Pattern 6: Validation Error Display

When APIs return validation errors, show them clearly:

// API returns structured validation errors
interface ValidationError {
  field: string;
  message: string;
  code: string;
}

// Parse validation errors from different API formats
function parseValidationErrors(body: any): ValidationError[] {
  // Stripe format: { error: { param: "email", message: "..." } }
  if (body?.error?.param) {
    return [{ field: body.error.param, message: body.error.message, code: body.error.code }];
  }

  // Standard format: { errors: [{ field, message }] }
  if (Array.isArray(body?.errors)) {
    return body.errors;
  }

  // Zod format: { issues: [{ path: [...], message }] }
  if (Array.isArray(body?.issues)) {
    return body.issues.map((issue: any) => ({
      field: issue.path.join('.'),
      message: issue.message,
      code: issue.code,
    }));
  }

  return [{ field: 'general', message: 'Validation failed', code: 'validation_error' }];
}

// React component to display errors
function FormErrors({ errors }: { errors: ValidationError[] }) {
  if (errors.length === 0) return null;

  return (
    <div role="alert" className="error-summary">
      <h3>Please fix the following:</h3>
      <ul>
        {errors.map((error, i) => (
          <li key={i}>
            <strong>{error.field}:</strong> {error.message}
          </li>
        ))}
      </ul>
    </div>
  );
}

The Error Handling Checklist

Layer	What to Handle
Network	Timeout, DNS failure, connection refused
HTTP	4xx client errors, 5xx server errors, 429 rate limits
Response	Invalid JSON, unexpected format, missing fields
Business	Application-level errors (insufficient funds, duplicate entry)
Webhook	Signature verification, idempotency, async processing
Monitoring	Error rate tracking, anomaly detection, alerting
User	Friendly messages, actionable guidance, retry options

Error Budgets and SLOs

Production API integrations shouldn't aim for zero errors: that's impossible, and the attempt leads to over-engineering. Instead, define an error budget: the acceptable error rate within a service level objective (SLO). If your SLO is a 99.9% success rate for API calls, your error budget is 0.1%, which is about 43 minutes of full downtime (or the equivalent in failed requests) per 30-day month, or roughly 8.7 hours per year. This framing changes how you respond to errors: while budget remains, optimize for velocity; once it is exhausted, freeze releases and focus on reliability.
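
The budget arithmetic is simple enough to keep as a helper. A minimal sketch, with errorBudgetMinutes as an illustrative name, assuming a fixed-length window and uniform traffic:

```typescript
// Minutes of total failure allowed by an SLO over a window of `windowDays`.
// `slo` is a fraction, e.g. 0.999 for 99.9%.
function errorBudgetMinutes(slo: number, windowDays: number): number {
  if (slo <= 0 || slo > 1) throw new RangeError('slo must be in (0, 1]');
  return (1 - slo) * windowDays * 24 * 60;
}

// 99.9% over a 30-day month: about 43.2 minutes
const monthlyBudget = errorBudgetMinutes(0.999, 30);

// 99.99% over a 7-day week: about 1 minute
const weeklyBudget = errorBudgetMinutes(0.9999, 7);
```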

SLO definitions for API integrations: Track success rate separately for different endpoint categories. Payment API calls might have a 99.99% SLO (four nines — one minute of error tolerance per week). Product catalog API calls might have a 99.5% SLO (more tolerant because stale data from cache is acceptable). A single app-wide error SLO misses this nuance and can lead to over-investing in reliability for low-stakes operations while under-investing where it matters.

Error budget burn rate: Monitor how fast you're consuming your error budget. A burn rate of 1x means you are on track to exhaust the budget exactly at the end of the SLO window; that is a signal to investigate during business hours. A burn rate of 10x means the budget will be gone in a tenth of the window (about three days of a 30-day month); that is a signal to page someone. The multi-window burn-rate alerting approach described in Google's SRE Workbook can be implemented in Prometheus, Datadog, or New Relic to calculate burn rates automatically from error rate metrics. For simpler setups, a weekly review of your error rate trend against your budget is sufficient.
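
Burn rate is the observed error rate divided by the error rate the SLO allows. A minimal sketch of that calculation follows; the function names are illustrative, and it assumes the observed error rate has already been measured over some recent window.

```typescript
// Burn rate = observed error rate / allowed error rate.
// 1 means the budget lasts exactly the SLO window; 10 means it lasts a tenth of it.
function burnRate(observedErrorRate: number, slo: number): number {
  const allowed = 1 - slo;
  if (allowed <= 0) throw new RangeError('slo must be below 1');
  return observedErrorRate / allowed;
}

// Days until the whole budget is gone if the current burn rate holds.
function daysToExhaustion(rate: number, windowDays: number): number {
  return rate > 0 ? windowDays / rate : Infinity;
}

// 1% of requests failing against a 99.9% SLO burns budget about 10x too fast,
// which exhausts a 30-day budget in about 3 days.
const currentRate = burnRate(0.01, 0.999);
const daysLeft = daysToExhaustion(currentRate, 30);
```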

Testing Error Scenarios

The error handling code is the code least likely to be tested and most likely to fail when it matters. Most test suites cover the happy path extensively and error paths minimally. The patterns below address this gap.

Test each HTTP error code explicitly: For every external API call in your codebase, write at least one test where the API returns 401 (expired auth), 429 (rate limited), 503 (service unavailable), and network timeout. Use MSW to mock these responses. These tests verify that your error categorization (retryable, user-facing message) is correct for each status code. Run them in CI — they're fast and catch regressions when error handling code changes.

Chaos-style integration tests: For critical integrations (payments, auth), run periodic chaos tests that inject failures at random points in the request lifecycle: return an error on the 3rd retry, inject a timeout on the 2nd request, simulate a malformed JSON response. The goal isn't to find specific bugs but to verify that your recovery mechanisms work under realistic failure conditions. Stripe's test mode provides test card numbers and tokens that trigger specific declines and errors; for other providers, MSW's handler override pattern (return an error for N requests, then succeed) works well.
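
The "fail N times, then succeed" idea does not depend on a particular mocking library. The sketch below is a plain TypeScript stand-in rather than MSW code: failThenSucceed and retryImmediately are hypothetical helpers, and the retry loop deliberately has no backoff so tests run fast.

```typescript
// Returns an async function that rejects with `error` for the first
// `failures` calls and resolves with `value` afterwards.
function failThenSucceed<T>(failures: number, error: Error, value: T) {
  let calls = 0;
  const fn = async (): Promise<T> => {
    calls += 1;
    if (calls <= failures) throw error;
    return value;
  };
  return { fn, callCount: () => calls };
}

// Minimal retry loop used only to exercise the stub (no backoff, for test speed).
async function retryImmediately<T>(
  fn: () => Promise<T>,
  maxRetries: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```

A test then asserts both the final result and the number of calls, which catches retry loops that give up one attempt early or retry one time too many.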

Test idempotency: For operations that should be idempotent (payment charge attempts, user creation, webhook processing), send the same request twice and verify the system reaches a consistent state. If your charge endpoint creates two charges on duplicate request, that's a bug. Test this explicitly rather than discovering it from a production incident. Stripe's idempotency keys make this easy to test: send the same key twice, verify only one charge is created.
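
A duplicate-delivery test can be sketched without a real database. In the example below, ProcessedEventStore is an in-memory stand-in for a table with a unique constraint on the event ID, and processOnce claims the ID before running the handler. All names here are hypothetical, and the claim-first ordering is one possible design rather than the only correct one.

```typescript
interface WebhookEvent {
  id: string;
  type: string;
}

// In-memory stand-in for a table with a unique constraint on the event ID.
class ProcessedEventStore {
  private ids = new Set<string>();

  // Returns true if this call claimed the ID, false if it was already claimed.
  claim(id: string): boolean {
    if (this.ids.has(id)) return false;
    this.ids.add(id);
    return true;
  }

  release(id: string): void {
    this.ids.delete(id);
  }
}

function processOnce(
  event: WebhookEvent,
  store: ProcessedEventStore,
  handler: (event: WebhookEvent) => void,
): 'processed' | 'duplicate' {
  if (!store.claim(event.id)) return 'duplicate';
  try {
    handler(event);
    return 'processed';
  } catch (error) {
    // Release the claim so a later redelivery can try again
    store.release(event.id);
    throw error;
  }
}

// Deliver the same event twice and count how often the handler actually runs
const store = new ProcessedEventStore();
let handled = 0;
const event: WebhookEvent = { id: 'evt_1', type: 'charge.succeeded' };
const first = processOnce(event, store, () => { handled += 1; });
const second = processOnce(event, store, () => { handled += 1; });
```

After both deliveries, first is 'processed', second is 'duplicate', and handled is 1.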

Distributed Tracing for API Errors

When your application makes multiple downstream API calls as part of handling a single user request, a single error can be hard to isolate — you see a 500 from your API but don't know which downstream call failed. Distributed tracing connects the dots.

Request IDs propagated downstream: Generate a UUID at the entry point of every user request and include it in every outbound API call as a header (typically X-Request-ID or X-Correlation-ID). When a downstream API returns an error, log both the request ID and the downstream API's response. Now when a user reports an error, you can search your logs for their request ID and see the full chain of calls: what you sent, what each downstream API returned, and where the chain failed.

Structured error events for search: Log API errors as structured JSON, not formatted strings. An error event should include: timestamp, requestId, service (which downstream API failed), status (HTTP status code), errorCode (API-specific error code), latencyMs, and retryCount. This structure makes errors queryable: "show me all payment API errors in the last hour where status=402 and latency > 5000ms." String-formatted logs require grep archaeology; structured events support real queries.
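
The two practices in this section fit together: the request ID attached to each outbound call is the same ID written into each error event. The sketch below shows one way to shape both. The field names follow the list above; withRequestId and formatErrorEvent are illustrative helpers, and the header name should match whatever your downstream APIs expect.

```typescript
interface ApiErrorEvent {
  timestamp: string;
  requestId: string;
  service: string;
  status: number;
  errorCode: string;
  latencyMs: number;
  retryCount: number;
}

// Return a copy of `headers` with the request ID attached for the outbound call.
function withRequestId(
  headers: Record<string, string>,
  requestId: string,
): Record<string, string> {
  return { ...headers, 'X-Request-ID': requestId };
}

// Serialize an error event as one line of JSON, ready for a log pipeline.
// `now` is injectable so output is reproducible in tests.
function formatErrorEvent(
  fields: Omit<ApiErrorEvent, 'timestamp'>,
  now: Date = new Date(),
): string {
  const event: ApiErrorEvent = { timestamp: now.toISOString(), ...fields };
  return JSON.stringify(event);
}
```

Because every event has the same keys, a query such as "status is 402 and latencyMs is above 5000" becomes a filter on fields instead of a regular expression over message text.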

Methodology

The APIError class pattern shown above extends the native Error class, which preserves the stack trace in Node.js. The built-in AbortSignal.timeout(10000) (available in Node.js 17.3+ and modern browsers) is preferred over hand-rolled AbortController + setTimeout pairs. The webhook idempotency pattern relies on storing processed event IDs in a database with a unique constraint on event.id. A lookup before processing handles sequential redeliveries; to cover two deliveries racing each other, insert the event ID before processing, so that the second db.webhookEvents.create() fails with a unique-constraint error that the handler catches and treats as already processed. SLO calculation methodology: over a 30-day month, a 99.9% SLO allows about 43 minutes of downtime (8.76 hours per year) and a 99.99% SLO allows about 4.3 minutes (52.6 minutes per year). These figures assume uniform traffic distribution; actual error budget calculations should account for traffic patterns, since the same outage consumes more budget during peak traffic hours.

