Building Webhooks That Don't Break in 2026

APIScout Team

Webhooks are the internet's callback mechanism — when something happens in your system, you POST a JSON payload to a URL your customer provides. Simple in concept. In practice, webhooks fail silently, arrive out of order, get replayed, and expose security vulnerabilities. Here's how to build webhooks that actually work.

TL;DR

  • Sign every payload with HMAC-SHA256 using a per-endpoint secret — unsigned webhooks can be spoofed by anyone who knows the endpoint URL
  • Use a queue-based delivery architecture so webhooks are decoupled from your application's request path
  • Retry with exponential backoff for at least 5 attempts before marking an endpoint as failing
  • Consumers should return 200 immediately and process asynchronously; slow inline processing causes timeouts and spurious retries
  • Track processed event IDs to handle duplicate delivery; the same event will be delivered more than once over time

The Basics

A webhook system has three components:

  1. Event source — something happens in your system (order created, payment completed)
  2. Delivery — you POST a JSON payload to the customer's URL
  3. Verification — the customer verifies the payload came from you (signature)

1. Sign Every Payload

Unsigned webhooks can be spoofed. Anyone who knows the endpoint URL can send fake events. Always sign payloads with HMAC-SHA256.

How Stripe does it:

Stripe-Signature: t=1710892800,v1=abc123...

The signature header includes a timestamp (which lets the receiver reject old, replayed requests) and an HMAC of the string timestamp.payload, computed with a per-endpoint secret.

Your implementation should:

  • Generate a unique signing secret per webhook endpoint
  • Include a timestamp in the signature to prevent replay attacks
  • Use HMAC-SHA256 (not MD5, not SHA1)
  • Sign the raw request body (not parsed JSON)
  • Document the verification process with code examples in 5+ languages
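On the sender side, the checklist above could be sketched roughly as follows. This is a minimal sketch rather than Stripe's implementation: the function names `generate_endpoint_secret` and `build_signature_header` and the `whsec_` prefix are assumptions for illustration, and the `t=...,v1=...` layout is modeled on the header shown earlier.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


def generate_endpoint_secret() -> str:
    """Create a random signing secret for one webhook endpoint (hypothetical format)."""
    return "whsec_" + secrets.token_hex(32)


def build_signature_header(raw_body: bytes, secret: str, timestamp: Optional[int] = None) -> str:
    """Sign the raw request bytes together with a timestamp using HMAC-SHA256."""
    ts = int(time.time()) if timestamp is None else timestamp
    # The signed string is "<timestamp>.<raw body>", built from bytes so the
    # body is never parsed or re-encoded before signing.
    signed_payload = str(ts).encode("utf-8") + b"." + raw_body
    digest = hmac.new(secret.encode("utf-8"), signed_payload, hashlib.sha256).hexdigest()
    return f"t={ts},v1={digest}"
```

The sender would call `build_signature_header` on the exact bytes it is about to POST and attach the result as a request header. Changing either the body or the timestamp changes the signature.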

2. Retry Failed Deliveries

Webhook endpoints go down. Networks fail. Servers return 500s. Retry with exponential backoff.

Retry schedule (example):

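| Attempt | Delay      | Total elapsed |
|---------|------------|---------------|
| 1       | Immediate  | 0             |
| 2       | 5 minutes  | 5 min         |
| 3       | 30 minutes | 35 min        |
| 4       | 2 hours    | 2h 35m        |
| 5       | 8 hours    | 10h 35m       |
| 6       | 24 hours   | 34h 35m       |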

Success criteria: 2xx status code within 30 seconds. Anything else (3xx, 4xx, 5xx, timeout) triggers a retry.

After all retries fail: Mark the endpoint as failing. Notify the customer via email. Pause delivery after N consecutive failures. Provide a manual replay mechanism.
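The example schedule and success criteria can be expressed as a small lookup. This is a sketch under the assumption that the worker tracks how many attempts it has already made; `RETRY_DELAYS_SECONDS`, `is_success`, and `next_retry_delay` are hypothetical names.

```python
from typing import Optional

# Delay in seconds before attempts 2 through 6, taken from the example schedule.
RETRY_DELAYS_SECONDS = [5 * 60, 30 * 60, 2 * 3600, 8 * 3600, 24 * 3600]


def is_success(status_code: Optional[int]) -> bool:
    """Only a 2xx response counts; None stands for a timeout or connection error."""
    return status_code is not None and 200 <= status_code < 300


def next_retry_delay(attempts_made: int) -> Optional[int]:
    """Return the delay before the next attempt, or None when retries are exhausted."""
    if 1 <= attempts_made <= len(RETRY_DELAYS_SECONDS):
        return RETRY_DELAYS_SECONDS[attempts_made - 1]
    return None
```

When `next_retry_delay` returns None, the endpoint would be marked as failing and the customer notified, as described above.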

3. Make Events Idempotent

Network issues and retries mean endpoints may receive the same event multiple times. Every event should include a unique ID that consumers use for deduplication.

{
  "id": "evt_abc123",
  "type": "order.completed",
  "created_at": "2026-03-08T12:00:00Z",
  "data": { ... }
}

Consumer-side: Store processed event IDs. Before processing, check if evt_abc123 was already handled. Skip if yes.
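One way to sketch the consumer-side check is to let a primary-key constraint do the deduplication, so that checking and recording happen in a single statement. The table name `processed_events` and the helper names are assumptions; SQLite stands in for whatever database the consumer already uses.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store of processed event IDs."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)")
    return conn


def claim_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True the first time an event ID is seen, False for duplicates."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
    # rowcount is 1 when the row was inserted and 0 when the ID already existed.
    return cursor.rowcount == 1
```

A handler would process the event only when `claim_event` returns True. Doing the insert and the check in one statement avoids the gap between "look up" and "mark as processed" in which two concurrent deliveries could both pass.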

4. Event Design

Consistent Event Schema

Every event should have the same top-level structure:

{
  "id": "evt_abc123",
  "type": "order.completed",
  "api_version": "2026-03-08",
  "created_at": "2026-03-08T12:00:00Z",
  "data": {
    "object": { ... }
  }
}

Event Types

Use resource.action naming: order.created, order.updated, payment.succeeded, payment.failed.
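The envelope and the resource.action naming rule can be enforced in one place when events are created. The sketch below assumes a hypothetical `build_event` helper; the `evt_` prefix and `api_version` value are copied from the example above, and the naming pattern is an assumption about what counts as a valid type.

```python
import re
import uuid
from datetime import datetime, timezone

API_VERSION = "2026-03-08"  # value from the example schema
EVENT_TYPE_PATTERN = re.compile(r"[a-z_]+\.[a-z_]+")  # resource.action


def build_event(event_type: str, obj: dict) -> dict:
    """Wrap an object in the common top-level event structure."""
    if not EVENT_TYPE_PATTERN.fullmatch(event_type):
        raise ValueError(f"event type must look like resource.action: {event_type!r}")
    return {
        "id": "evt_" + uuid.uuid4().hex,
        "type": event_type,
        "api_version": API_VERSION,
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": {"object": obj},
    }
```

Routing every event through one builder keeps the top-level fields identical across event types, which is what lets consumers write a single parser.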

Include Full Objects

Include the full current state of the object, not just the changed fields. This way, consumers don't need to make follow-up API calls.

{
  "type": "order.updated",
  "data": {
    "object": {
      "id": "ord_123",
      "status": "shipped",
      "total": 4999,
      "items": [...],
      "customer": { ... }
    }
  }
}

5. Delivery Infrastructure

Async Processing

Never block your application to deliver webhooks. Queue events and process delivery asynchronously.

Application → Event Queue → Webhook Worker → HTTP POST

Timeout

Set a 30-second timeout for webhook delivery. If the endpoint doesn't respond in 30 seconds, mark as failed and retry.

Don't Follow Redirects

Webhook delivery should not follow redirects (3xx responses). Treat redirects as failures. The configured URL should be the final destination.

IP Allowlisting

Publish the IP addresses your webhooks are sent from. Customers may need to allowlist them in their firewall.

6. Security

Prevent SSRF

Customers provide webhook URLs — don't let them point to internal services. Validate URLs:

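  • Block private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and loopback (127.0.0.0/8)
  • Block link-local addresses (169.254.0.0/16, which includes cloud metadata endpoints such as 169.254.169.254)
  • Block localhost and the IPv6 equivalents (::1, fc00::/7, fe80::/10)
  • Resolve DNS before connecting, check the resolved IP, and connect to that same IP so a second lookup cannot return a different address (DNS rebinding)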
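The address checks in this list can be sketched with the standard `ipaddress` module. `is_public_ip` and `resolve_public_addresses` are hypothetical helpers, and the set of rejected categories is an assumption about what "internal" should mean for a given deployment.

```python
import ipaddress
import socket
from typing import List


def is_public_ip(address: str) -> bool:
    """Return False for private, loopback, link-local, and other non-routable addresses."""
    ip = ipaddress.ip_address(address)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def resolve_public_addresses(hostname: str, port: int = 443) -> List[str]:
    """Resolve a hostname and raise ValueError if any resolved address is not public."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    for address in addresses:
        if not is_public_ip(address):
            raise ValueError(f"{hostname} resolves to non-public address {address}")
    return addresses
```

The delivery worker would then connect to one of the returned addresses directly instead of resolving the hostname a second time, since a second lookup could return a different answer.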

Rate Limit Deliveries

If a customer configures multiple endpoints, limit the total delivery rate per customer. One endpoint failure shouldn't trigger thousands of retry requests.
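A per-customer limit is often implemented as a token bucket shared by all of that customer's endpoints. The sketch below is one possible shape; the class, the `may_deliver` helper, and the default rate and burst values are assumptions, and a multi-worker deployment would keep the bucket state in shared storage rather than in process memory.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow `burst` deliveries at once, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_second
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.updated_at = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated_at) * self.rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


_buckets: Dict[str, TokenBucket] = {}


def may_deliver(customer_id: str, rate_per_second: float = 10.0, burst: int = 20) -> bool:
    """Check the shared bucket for a customer, creating it on first use."""
    if customer_id not in _buckets:
        _buckets[customer_id] = TokenBucket(rate_per_second, burst)
    return _buckets[customer_id].try_acquire()
```

When `may_deliver` returns False, the worker would requeue the delivery with a short delay instead of sending it, so retries for one failing endpoint cannot exceed the customer's overall budget.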

Payload Size

Limit webhook payload size (e.g., 256KB). Large payloads can overwhelm consumers. For large data, include a reference URL instead.
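A size guard can run at serialization time. In this sketch the 256KB limit comes from the example above, while `serialize_event` and the `object_url` field name are hypothetical; the reference URL is assumed to point at an API endpoint where the consumer can fetch the full object.

```python
import json

MAX_PAYLOAD_BYTES = 256 * 1024  # example limit from the text


def serialize_event(event: dict, reference_url: str) -> bytes:
    """Serialize an event, replacing oversized data with a reference URL."""
    body = json.dumps(event, separators=(",", ":")).encode("utf-8")
    if len(body) <= MAX_PAYLOAD_BYTES:
        return body
    # Keep the envelope fields but swap the large data section for a pointer.
    slim = {key: value for key, value in event.items() if key != "data"}
    slim["data"] = {"object_url": reference_url}
    return json.dumps(slim, separators=(",", ":")).encode("utf-8")
```

The check is applied to the encoded bytes, since that is what the consumer's server actually receives.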

7. Developer Experience

Webhook Dashboard

Provide a UI where customers can:

  • View delivery attempts (success/failure/pending)
  • See request and response bodies
  • Manually replay failed events
  • Test with sample events
  • Manage endpoint URLs and signing secrets

CLI Testing

Provide a CLI tool for local webhook testing:

your-cli webhooks listen --port 3000

This creates a tunnel so developers can receive webhooks on localhost during development.

Event Catalog

Document every event type with example payloads, when they fire, and what data they include.

Webhook Signing Implementation

The signature verification step is where most implementations introduce vulnerabilities. There are two subtle mistakes: using string equality comparison (vulnerable to timing attacks) and not validating the timestamp (vulnerable to replay attacks).

Here is a reference implementation in both TypeScript (for Node.js) and Python:

// Node.js - webhook signature verification
import crypto from 'crypto';

interface WebhookVerifyOptions {
  payload: string;         // Raw request body as string
  signature: string;       // From header: "t=timestamp,v1=signature"
  secret: string;          // Per-endpoint signing secret
  toleranceMs?: number;    // Max age of timestamp (default: 5 minutes)
}

function verifyWebhookSignature(opts: WebhookVerifyOptions): boolean {
  const { payload, signature, secret, toleranceMs = 5 * 60 * 1000 } = opts;

  // Parse the signature header
  const parts = Object.fromEntries(
    signature.split(',').map(part => part.split('='))
  );
  const timestamp = parts['t'];
  const v1Signature = parts['v1'];

  if (!timestamp || !v1Signature) return false;

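  // Reject malformed or stale timestamps (replay attack prevention)
  const timestampMs = Number(timestamp) * 1000;
  if (!Number.isFinite(timestampMs)) return false; // NaN would slip past the comparison below
  if (Math.abs(Date.now() - timestampMs) > toleranceMs) return false;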

  // Compute expected signature
  const signedPayload = `${timestamp}.${payload}`;
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(signedPayload, 'utf8')
    .digest('hex');

  // Constant-time comparison (prevents timing attacks)
  const expectedBuffer = Buffer.from(expectedSignature, 'hex');
  const receivedBuffer = Buffer.from(v1Signature, 'hex');

  if (expectedBuffer.length !== receivedBuffer.length) return false;

  return crypto.timingSafeEqual(expectedBuffer, receivedBuffer);
}

# Python - webhook signature verification
import hashlib
import hmac
import time

def verify_webhook_signature(
    payload: bytes,
    signature_header: str,
    secret: str,
    tolerance_seconds: int = 300
) -> bool:
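    # Parse "t=...,v1=..."; a malformed header is rejected instead of raising
    try:
        parts = dict(part.split("=", 1) for part in signature_header.split(","))
        timestamp = parts.get("t")
        v1_sig = parts.get("v1")
        if not timestamp or not v1_sig:
            return False
        timestamp_seconds = int(timestamp)
    except ValueError:
        return False

    # Reject stale timestamps
    if abs(time.time() - timestamp_seconds) > tolerance_seconds:
        return False

    # Compute expected signature over the raw bytes (no decode/re-encode round trip)
    signed_payload = timestamp.encode("utf-8") + b"." + payload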
    expected = hmac.new(
        secret.encode("utf-8"),
        signed_payload,
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison
    return hmac.compare_digest(expected, v1_sig)

The critical details: always use timingSafeEqual (Node.js) or hmac.compare_digest (Python) for the final comparison. Standard string equality leaks timing information that attackers can use to brute-force the signature one byte at a time. Always validate the timestamp — without this check, an attacker can capture a valid webhook payload and replay it days later.

For more on API security fundamentals, see our API security checklist and API authentication patterns.

Webhook Delivery Infrastructure

The architecture that makes webhooks reliable is straightforward but requires deliberate design. The naive approach — making an HTTP request to the customer's endpoint inline during your application's request handling — will cause you problems at any meaningful scale.

The production pattern is queue-based:

User Action → Your API → Write Event to Queue → Return 200
                              ↓
                    Webhook Worker (async)
                              ↓
                    HTTP POST to Customer URL
                              ↓
                    Success? → Mark delivered
                    Failure? → Schedule retry

The queue decouples event generation from event delivery. When a customer's endpoint is down, your application doesn't notice — events accumulate in the queue and are retried when the endpoint recovers. The worker can be scaled independently of your API layer.
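The flow in the diagram can be illustrated with an in-process queue and a background thread. This is only a sketch of the decoupling: a production system would use a durable queue, and `handle_user_action`, `webhook_worker`, `start_worker`, and the `send` callback are hypothetical names.

```python
import queue
import threading
from typing import Callable, List

event_queue: queue.Queue = queue.Queue()  # in-process stand-in for a durable queue
delivered: List[str] = []
retry_later: List[str] = []


def handle_user_action(event: dict) -> int:
    """Request path: enqueue the event and return immediately."""
    event_queue.put(event)
    return 200


def webhook_worker(send: Callable[[dict], bool]) -> None:
    """Background path: deliver queued events until a None sentinel arrives."""
    while True:
        event = event_queue.get()
        if event is None:  # sentinel used to stop the sketch
            break
        if send(event):
            delivered.append(event["id"])  # mark delivered
        else:
            retry_later.append(event["id"])  # schedule retry


def start_worker(send: Callable[[dict], bool]) -> threading.Thread:
    """Run the worker on its own thread so the request path never waits on delivery."""
    worker = threading.Thread(target=webhook_worker, args=(send,), daemon=True)
    worker.start()
    return worker
```

The request handler never calls `send` itself, so a slow or failing customer endpoint affects only the worker.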

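Exponential backoff with jitter is the correct retry strategy. The principle is that each delay is a fixed multiple of the previous one: a 1s → 10s → 100s → 1000s sequence illustrates it, although production schedules usually stretch over hours, like the example schedule in section 2. Adding ±25% jitter to each delay prevents thundering-herd problems when many endpoints fail simultaneously (for example, after a widespread outage) and would otherwise all be retried at the same moments.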
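The delay calculation can be sketched in a few lines. The defaults reproduce the 1s → 10s → 100s → 1000s illustration with ±25% jitter; `backoff_delay` and its parameters are hypothetical names.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base_seconds: float = 1.0, factor: float = 10.0,
                  jitter: float = 0.25, rng: Optional[random.Random] = None) -> float:
    """Delay before retry number `attempt` (1-based), randomized by +/- `jitter`."""
    generator = rng if rng is not None else random.Random()
    nominal = base_seconds * factor ** (attempt - 1)
    # Scale by a random factor in [1 - jitter, 1 + jitter] so retries spread out.
    return nominal * generator.uniform(1.0 - jitter, 1.0 + jitter)
```

Passing a seeded `random.Random` makes the delays reproducible in tests, while the default gives each call independent jitter.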

After all retries are exhausted — typically after 24–72 hours — events go to a dead letter queue. This is a separate storage location where permanently failed deliveries are preserved for manual inspection and potential replay. A webhook dashboard that shows dead-lettered events and allows manual replay is a significant DX improvement for your customers. This connects to the broader topic of API idempotency — dead-lettered events that get manually replayed months later must still be safe to reprocess.

Svix: Webhook Infrastructure as a Service

Building the signing, retry, queueing, dead letter, and dashboard components yourself is a multi-week project. Svix is a managed service that provides all of it as an API. You call Svix's API to send events, and Svix handles delivery, retry, signing, and provides a customer-facing portal where your users can manage their webhook endpoints.

The integration looks like this: instead of writing to your own queue and running your own delivery worker, you call svix.message.create() with the event payload. Svix takes over delivery. Your customers can log into a hosted portal (embeddable in your app) to see delivery history, configure endpoints, and replay failed events.

Svix's pricing model is based on message volume. The free tier includes 50,000 message deliveries per month. Paid plans start at $100/month for 1M deliveries. If you're building the webhook system for a developer-facing product and your team time is more expensive than Svix's per-message fee, the tradeoff usually favors Svix. The cases where rolling your own infrastructure makes sense are high-volume platforms where per-message pricing gets expensive, or applications with very specific retry/routing requirements that Svix's model doesn't accommodate.

Consumer-Side Best Practices

If you're on the receiving end of webhooks rather than sending them, a different set of best practices applies.

The single most important rule: return HTTP 200 immediately and process the event asynchronously. Any slow or failure-prone work (database writes, calls to other APIs, email sending) should happen outside the webhook handler, so that the response goes out well within the sender's timeout (30 seconds in the examples above). Synchronous processing causes timeouts, which the sender treats as failures, which triggers retries, which causes duplicate processing.

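// app/api/webhooks/incoming/route.ts
import { NextResponse } from 'next/server';
import { queue } from '@/lib/queue';
// Assumed app-local modules (hypothetical paths): a database client and a
// signature helper along the lines of verifyWebhookSignature shown earlier.
import { db } from '@/lib/db';
import { verifySignature } from '@/lib/webhooks';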

export async function POST(req: Request) {
  const body = await req.text();
  const signature = req.headers.get('webhook-signature') ?? '';

  // 1. Verify signature first — reject invalid before any processing
  if (!verifySignature(body, signature, process.env.WEBHOOK_SECRET!)) {
    return NextResponse.json({ error: 'Invalid signature' }, { status: 401 });
  }

  const event = JSON.parse(body);

  // 2. Check idempotency — have we already processed this event?
  const alreadyProcessed = await db.processedEvents.findUnique({
    where: { eventId: event.id }
  });

  if (alreadyProcessed) {
    return NextResponse.json({ received: true }); // Idempotent success
  }

  // 3. Enqueue for async processing — don't process inline
  await queue.enqueue('process-webhook', {
    eventId: event.id,
    type: event.type,
    data: event.data,
  });

  return NextResponse.json({ received: true }); // Immediate 200
}

Out-of-order delivery is another real-world problem. Webhooks are not guaranteed to arrive in the order they were sent — network delays and retry timing can cause order.updated to arrive before order.created. Design your event handlers to be order-independent. For state machine transitions (order status changes), check the current state in your database before applying the transition rather than assuming events arrive sequentially.
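One way to make a status handler order-independent is to compare the incoming status with the stored one and ignore anything that would move the order backwards. The lifecycle in `STATUS_RANK` is a hypothetical example, and the `orders` dict stands in for a database table.

```python
from typing import Dict

# Hypothetical order lifecycle; a status may only move forward.
STATUS_RANK = {"created": 0, "paid": 1, "shipped": 2, "delivered": 3}

orders: Dict[str, str] = {}  # order ID -> current status


def apply_status_event(order_id: str, new_status: str) -> bool:
    """Apply a status change only if it moves the order forward; return True if applied."""
    current = orders.get(order_id)
    if current is not None and STATUS_RANK[new_status] <= STATUS_RANK[current]:
        return False  # stale or duplicate event: stored state is already newer
    orders[order_id] = new_status
    return True
```

With this check, a late "created" event that arrives after "shipped" is dropped instead of resetting the order. Because events carry the full object state, an early "shipped" event can also create the record on its own.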

Debugging Webhook Failures

When webhook delivery fails in development or staging, the debugging workflow matters. The standard approaches:

Local development: Use ngrok to expose your localhost to the internet so webhook senders can reach your development server. ngrok http 3000 gives you a public URL that tunnels to your local port. Most webhook providers let you configure this URL for testing.

Inspection: webhook.site is a free service that gives you a URL that logs every HTTP request to it. Point a webhook endpoint at webhook.site, trigger the event, and inspect the full request including headers, body, and timing. This is the fastest way to verify what a sender is actually sending.

Delivery logs: Any serious webhook provider gives you per-delivery logs showing the request body, response status, response body, and retry history. Read these before debugging your own code — often the failure is a misconfigured URL or a 404 rather than a code bug.

Common failure modes: Timeouts are the most common (processing is synchronous and takes too long), followed by SSL certificate errors (expired or self-signed certs in staging environments), and 4xx responses (usually authentication errors where the endpoint checks an API key header that wasn't configured). A 400 or 401 from the consumer's own signature check is particularly common, and it often means the consumer parsed the JSON before verifying the signature. The signature is computed over the raw request bytes: once the handler has called req.json(), the body stream is consumed, and re-serializing the parsed object rarely reproduces those bytes exactly, so verification fails.
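The raw-body problem is easy to reproduce. The sketch below uses Python rather than a JavaScript request handler, with a made-up secret and payload, to show that parsing and re-serializing JSON changes the bytes and therefore the HMAC.

```python
import hashlib
import hmac
import json

secret = b"whsec_example"  # hypothetical secret
raw_body = b'{"id": "evt_abc123",  "total": 4999.0}'  # bytes exactly as sent


def sign(body: bytes) -> str:
    """HMAC-SHA256 over the given bytes, hex encoded."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


# Parsing and re-serializing normalizes whitespace, so the bytes differ.
reserialized = json.dumps(json.loads(raw_body)).encode("utf-8")

print(raw_body == reserialized)              # False
print(sign(raw_body) == sign(reserialized))  # False
```

The parsed objects are equal, but the signatures are not, which is why verification has to run against the unmodified request body.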

8. Common Mistakes

| Mistake                       | Impact                           | Fix                                    |
|-------------------------------|----------------------------------|----------------------------------------|
| No signatures                 | Endpoint spoofing                | HMAC-SHA256 every payload              |
| No retries                    | Silent data loss                 | Exponential backoff, 5+ attempts       |
| Synchronous delivery          | Application slowdown             | Queue-based async delivery             |
| No event IDs                  | Duplicate processing             | Unique ID per event                    |
| Following redirects           | SSRF vulnerability               | Treat 3xx as failure                   |
| No timeout                    | Worker threads stuck             | 30-second timeout                      |
| Partial object in payload     | Consumer needs follow-up API call | Include full object state             |
| String equality for signatures | Timing attack                   | Use constant-time comparison           |
| No timestamp validation       | Replay attacks                   | Reject timestamps older than 5 minutes |

Building event-driven APIs? Explore webhook tools on APIScout — Svix, Hookdeck, Convoy, and more compared. Also see our guides on API idempotency and API error handling.
