Building Webhooks That Don't Break in 2026
Webhooks are the internet's callback mechanism — when something happens in your system, you POST a JSON payload to a URL your customer provides. Simple in concept. In practice, webhooks fail silently, arrive out of order, get replayed, and expose security vulnerabilities. Here's how to build webhooks that actually work.
TL;DR
- Sign every payload with HMAC-SHA256 using a per-endpoint secret — unsigned webhooks can be spoofed by anyone who knows the endpoint URL
- Use a queue-based delivery architecture so webhooks are decoupled from your application's request path
- Retry with exponential backoff for at least 5 attempts before marking an endpoint as failing
- Consumers should return a 2xx immediately and process asynchronously; slow synchronous handlers cause timeouts and spurious retries
- Track processed event IDs to handle duplicate delivery; the same event will be delivered more than once over time
The Basics
A webhook system has three components:
- Event source — something happens in your system (order created, payment completed)
- Delivery — you POST a JSON payload to the customer's URL
- Verification — the customer verifies the payload came from you (signature)
1. Sign Every Payload
Unsigned webhooks can be spoofed. Anyone who knows the endpoint URL can send fake events. Always sign payloads with HMAC-SHA256.
How Stripe does it:
Stripe-Signature: t=1710892800,v1=abc123...
The header carries two parts: a timestamp, t, which limits replay attacks, and v1, an HMAC-SHA256 of the string timestamp.payload keyed with a per-endpoint secret.
Your implementation should:
- Generate a unique signing secret per webhook endpoint
- Include a timestamp in the signature to prevent replay attacks
- Use HMAC-SHA256 (not MD5, not SHA1)
- Sign the raw request body (not parsed JSON)
- Document the verification process with code examples in 5+ languages
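The checklist above can be sketched on the sender side roughly as follows. This is a minimal illustration, not Stripe's implementation: the function names `generate_endpoint_secret` and `sign_payload` are hypothetical, and the header layout simply mirrors the `t=...,v1=...` format shown earlier.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


def generate_endpoint_secret() -> str:
    """Return a fresh random signing secret for one webhook endpoint."""
    return secrets.token_urlsafe(32)


def sign_payload(secret: str, raw_body: bytes, timestamp: Optional[int] = None) -> str:
    """Build a 't=<unix seconds>,v1=<hex HMAC-SHA256>' header value.

    The HMAC covers b"<timestamp>.<raw body>", so the exact bytes that go
    on the wire are what gets signed (not a re-serialized JSON object).
    """
    ts = int(time.time()) if timestamp is None else timestamp
    signed = str(ts).encode("ascii") + b"." + raw_body
    digest = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    return f"t={ts},v1={digest}"


if __name__ == "__main__":
    secret = generate_endpoint_secret()
    print(sign_payload(secret, b'{"id":"evt_abc123"}'))
```

Serialize the event once, sign those bytes, and send those same bytes as the request body; signing one serialization and sending another is an easy way to produce signatures that never verify.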
2. Retry Failed Deliveries
Webhook endpoints go down. Networks fail. Servers return 500s. Retry with exponential backoff.
Retry schedule (example):
| Attempt | Delay | Total elapsed |
|---|---|---|
| 1 | Immediate | 0 |
| 2 | 5 minutes | 5 min |
| 3 | 30 minutes | 35 min |
| 4 | 2 hours | 2h 35m |
| 5 | 8 hours | 10h 35m |
| 6 | 24 hours | 34h 35m |
Success criteria: 2xx status code within 30 seconds. Anything else (3xx, 4xx, 5xx, timeout) triggers a retry.
After all retries fail: Mark the endpoint as failing. Notify the customer via email. Pause delivery after N consecutive failures. Provide a manual replay mechanism.
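As a rough sketch, the example schedule and the success rule can be encoded as data plus two small helpers. The names `RETRY_DELAYS`, `is_success`, and `next_delay` are made up for illustration, and `None` is used here as a stand-in for "timed out or never connected".

```python
from datetime import timedelta
from typing import Optional

# Delay before each attempt, mirroring the example table (attempt 1 is immediate).
RETRY_DELAYS = [
    timedelta(0),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=8),
    timedelta(hours=24),
]


def is_success(status: Optional[int]) -> bool:
    """Only a 2xx counts as delivered; None stands for a timeout or network error."""
    return status is not None and 200 <= status < 300


def next_delay(attempts_made: int) -> Optional[timedelta]:
    """Delay before the next attempt, or None once the schedule is exhausted."""
    if attempts_made < len(RETRY_DELAYS):
        return RETRY_DELAYS[attempts_made]
    return None


if __name__ == "__main__":
    for made in range(len(RETRY_DELAYS) + 1):
        print(made, next_delay(made))
```

When `next_delay` returns `None`, the delivery has used up its attempts and the endpoint can be marked as failing.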
3. Make Events Idempotent
Network issues and retries mean endpoints may receive the same event multiple times. Every event should include a unique ID that consumers use for deduplication.
{
  "id": "evt_abc123",
  "type": "order.completed",
  "created_at": "2026-03-08T12:00:00Z",
  "data": { ... }
}
Consumer-side: Store processed event IDs. Before processing, check if evt_abc123 was already handled. Skip if yes.
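One way to make the "check, then skip" step safe under concurrent deliveries is to let a unique constraint do the check. The sketch below uses SQLite purely for illustration; the table name `processed_events` and the helpers `open_store` and `claim_event` are assumptions, not part of any particular framework.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the table that remembers handled event IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
    )
    return conn


def claim_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True the first time an event ID is seen, False for duplicates.

    INSERT OR IGNORE plus the primary key makes the check and the write a
    single statement, so two concurrent deliveries cannot both claim the ID.
    """
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
    return cursor.rowcount == 1


if __name__ == "__main__":
    store = open_store()
    for incoming in ["evt_abc123", "evt_abc123", "evt_def456"]:
        print(incoming, "process" if claim_event(store, incoming) else "skip")
```

A separate SELECT followed by an INSERT leaves a window in which a retry arriving at the same moment passes the check too; a single guarded insert closes it.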
4. Event Design
Consistent Event Schema
Every event should have the same top-level structure:
{
  "id": "evt_abc123",
  "type": "order.completed",
  "api_version": "2026-03-08",
  "created_at": "2026-03-08T12:00:00Z",
  "data": {
    "object": { ... }
  }
}
Event Types
Use resource.action naming: order.created, order.updated, payment.succeeded, payment.failed.
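A small builder can enforce both the envelope and the naming rule in one place. The following is a sketch under assumptions: `build_event` and `EVENT_TYPE_RE` are hypothetical names, the ID format is an arbitrary choice, and the default `api_version` just reuses the example value from above.

```python
import re
import uuid
from datetime import datetime, timezone

# resource.action, lowercase with optional underscores (e.g. order.created)
EVENT_TYPE_RE = re.compile(r"^[a-z][a-z_]*\.[a-z][a-z_]*$")


def build_event(event_type: str, obj: dict, api_version: str = "2026-03-08") -> dict:
    """Wrap an object in the common top-level event structure."""
    if not EVENT_TYPE_RE.match(event_type):
        raise ValueError(f"event type must look like resource.action: {event_type!r}")
    return {
        "id": f"evt_{uuid.uuid4().hex}",
        "type": event_type,
        "api_version": api_version,
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": {"object": obj},
    }


if __name__ == "__main__":
    print(build_event("order.completed", {"id": "ord_123", "status": "completed"}))
```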
Include Full Objects
Include the full current state of the object, not just the changed fields. This way, consumers don't need to make follow-up API calls.
{
  "type": "order.updated",
  "data": {
    "object": {
      "id": "ord_123",
      "status": "shipped",
      "total": 4999,
      "items": [...],
      "customer": { ... }
    }
  }
}
5. Delivery Infrastructure
Async Processing
Never block your application to deliver webhooks. Queue events and process delivery asynchronously.
Application → Event Queue → Webhook Worker → HTTP POST
Timeout
Set a 30-second timeout for webhook delivery. If the endpoint doesn't respond in 30 seconds, mark as failed and retry.
Don't Follow Redirects
Webhook delivery should not follow redirects (3xx responses). Treat redirects as failures. The configured URL should be the final destination.
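The timeout and no-redirect rules can be sketched with Python's standard library alone. `NoRedirect` and `deliver` are illustrative names. One caveat about this sketch: `urllib`'s `timeout` applies to each blocking socket operation rather than to the request as a whole, so a production worker that needs a hard 30-second ceiling would have to enforce an overall deadline separately.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 3xx surfaces as an error."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


_opener = urllib.request.build_opener(NoRedirect)


def deliver(url: str, body: bytes, headers: Optional[Dict[str, str]] = None,
            timeout: float = 30.0) -> bool:
    """POST body to url and report success only for a 2xx response."""
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    request = urllib.request.Request(url, data=body, headers=all_headers, method="POST")
    try:
        with _opener.open(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as error:
        error.close()
        return False  # 3xx (not followed), 4xx, or 5xx
    except (urllib.error.URLError, OSError):
        return False  # DNS failure, refused connection, TLS error, or timeout
```

A `False` result feeds straight into the retry schedule; the caller does not need to distinguish a redirect from a 500.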
IP Allowlisting
Publish the IP addresses your webhooks are sent from. Customers may need to allowlist them in their firewall.
6. Security
Prevent SSRF
Customers provide webhook URLs — don't let them point to internal services. Validate URLs:
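- Block private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and loopback (127.0.0.0/8), plus their IPv6 equivalents
- Block link-local addresses (169.254.0.0/16), which include the cloud metadata endpoint 169.254.169.254
- Block localhost
- Resolve DNS before connecting, check every resolved IP, and connect to the address you checked so that a second lookup cannot return a different one (DNS rebinding)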
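The checks in this list might look something like the sketch below, which leans on the standard `ipaddress` module instead of hand-written prefix matching. The function names are hypothetical, and the `resolver` parameter exists only so the lookup can be swapped out in tests. The function returns the addresses it vetted so the caller can connect to one of them directly.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlsplit


def is_blocked_ip(address: str) -> bool:
    """True for private, loopback, link-local, and other non-public addresses."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def resolve_host(host: str) -> List[str]:
    """Resolve a hostname to every IPv4/IPv6 address it maps to."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def check_webhook_url(url: str,
                      resolver: Callable[[str], List[str]] = resolve_host) -> List[str]:
    """Return the vetted IPs for a webhook URL, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("webhook URL must be an absolute http(s) URL")
    addresses = resolver(parts.hostname)
    # Reject if ANY resolved address is internal, not just the first one.
    if not addresses or any(is_blocked_ip(a) for a in addresses):
        raise ValueError("webhook URL resolves to a blocked address")
    return addresses
```

Running this once when the customer saves the URL is not enough on its own; DNS records change, so the same check belongs in the delivery path as well.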
Rate Limit Deliveries
If a customer configures multiple endpoints, limit the total delivery rate per customer. One endpoint failure shouldn't trigger thousands of retry requests.
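A per-customer token bucket is one common way to cap the combined rate across all of a customer's endpoints. The sketch below is illustrative only: the class names, the rate, and the burst size are assumptions, and a multi-worker deployment would need shared state (for example in a database or cache) rather than an in-process dictionary.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow up to `burst` sends at once, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class CustomerRateLimiter:
    """One bucket per customer, shared by all of that customer's endpoints."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate_per_sec = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, customer_id: str) -> bool:
        if customer_id not in self.buckets:
            self.buckets[customer_id] = TokenBucket(self.rate_per_sec, self.burst, self.clock)
        return self.buckets[customer_id].try_acquire()
```

A worker that gets `False` from `allow` can put the delivery back on the queue with a short delay instead of counting it as a failed attempt.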
Payload Size
Limit webhook payload size (e.g., 256KB). Large payloads can overwhelm consumers. For large data, include a reference URL instead.
7. Developer Experience
Webhook Dashboard
Provide a UI where customers can:
- View delivery attempts (success/failure/pending)
- See request and response bodies
- Manually replay failed events
- Test with sample events
- Manage endpoint URLs and signing secrets
CLI Testing
Provide a CLI tool for local webhook testing:
your-cli webhooks listen --port 3000
This creates a tunnel so developers can receive webhooks on localhost during development.
Event Catalog
Document every event type with example payloads, when they fire, and what data they include.
Webhook Signing Implementation
The signature verification step is where most implementations introduce vulnerabilities. There are two subtle mistakes: using string equality comparison (vulnerable to timing attacks) and not validating the timestamp (vulnerable to replay attacks).
Here is a correct implementation in both Node.js and Python:
// Node.js (TypeScript) - webhook signature verification
import crypto from 'crypto';

interface WebhookVerifyOptions {
  payload: string;       // Raw request body as string
  signature: string;     // From header: "t=timestamp,v1=signature"
  secret: string;        // Per-endpoint signing secret
  toleranceMs?: number;  // Max age of timestamp (default: 5 minutes)
}

function verifyWebhookSignature(opts: WebhookVerifyOptions): boolean {
  const { payload, signature, secret, toleranceMs = 5 * 60 * 1000 } = opts;

  // Parse the signature header, splitting each part on its first '=' only
  const parts: Record<string, string> = {};
  for (const part of signature.split(',')) {
    const idx = part.indexOf('=');
    if (idx > 0) parts[part.slice(0, idx).trim()] = part.slice(idx + 1);
  }
  const timestamp = parts['t'];
  const v1Signature = parts['v1'];
  if (!timestamp || !v1Signature) return false;

  // Reject malformed or stale timestamps (replay attack prevention).
  // Without the isFinite check, a NaN timestamp would slip past the comparison.
  const timestampMs = Number(timestamp) * 1000;
  if (!Number.isFinite(timestampMs)) return false;
  if (Math.abs(Date.now() - timestampMs) > toleranceMs) return false;

  // Compute expected signature
  const signedPayload = `${timestamp}.${payload}`;
  const expectedSignature = crypto
    .createHmac('sha256', secret)
    .update(signedPayload, 'utf8')
    .digest('hex');

  // Constant-time comparison (prevents timing attacks).
  // timingSafeEqual throws on unequal lengths, so check the length first.
  const expectedBuffer = Buffer.from(expectedSignature, 'hex');
  const receivedBuffer = Buffer.from(v1Signature, 'hex');
  if (expectedBuffer.length !== receivedBuffer.length) return false;
  return crypto.timingSafeEqual(expectedBuffer, receivedBuffer);
}
# Python - webhook signature verification
import hashlib
import hmac
import time

def verify_webhook_signature(
    payload: bytes,
    signature_header: str,
    secret: str,
    tolerance_seconds: int = 300,
) -> bool:
    # Parse "t=timestamp,v1=signature"; a malformed header is rejected, not raised
    try:
        parts = dict(part.split("=", 1) for part in signature_header.split(","))
        timestamp = parts["t"]
        v1_sig = parts["v1"]
        timestamp_seconds = int(timestamp)
    except (KeyError, ValueError):
        return False

    # Reject stale timestamps
    if abs(time.time() - timestamp_seconds) > tolerance_seconds:
        return False

    # Compute expected signature over the raw bytes: b"<timestamp>.<payload>"
    # (no decode/encode round trip, which would fail on non-UTF-8 bodies)
    signed_payload = timestamp.encode("utf-8") + b"." + payload
    expected = hmac.new(
        secret.encode("utf-8"),
        signed_payload,
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison; compare bytes, since compare_digest raises
    # TypeError on non-ASCII str input
    return hmac.compare_digest(expected.encode("utf-8"), v1_sig.encode("utf-8"))
The critical details: always use timingSafeEqual (Node.js) or hmac.compare_digest (Python) for the final comparison. Standard string equality leaks timing information that attackers can use to brute-force the signature one byte at a time. Always validate the timestamp — without this check, an attacker can capture a valid webhook payload and replay it days later.
For more on API security fundamentals, see our API security checklist and API authentication patterns.
Webhook Delivery Infrastructure
The architecture that makes webhooks reliable is straightforward but requires deliberate design. The naive approach — making an HTTP request to the customer's endpoint inline during your application's request handling — will cause you problems at any meaningful scale.
The production pattern is queue-based:
User Action → Your API → Write Event to Queue → Return 200
                                 ↓
                       Webhook Worker (async)
                                 ↓
                      HTTP POST to Customer URL
                                 ↓
                      Success? → Mark delivered
                      Failure? → Schedule retry
The queue decouples event generation from event delivery. When a customer's endpoint is down, your application doesn't notice — events accumulate in the queue and are retried when the endpoint recovers. The worker can be scaled independently of your API layer.
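To make the decoupling concrete, here is a deliberately small in-process sketch using `queue.Queue` and a worker thread. A real system would use a durable queue so events survive restarts; `run_worker`, `send`, and `schedule_retry` are hypothetical names standing in for your own delivery and retry code.

```python
import queue
import threading
from typing import Callable

SendFn = Callable[[str, dict], bool]    # returns True when the endpoint answered 2xx
RetryFn = Callable[[str, dict], None]   # records the failure and schedules a retry


def run_worker(events: "queue.Queue", send: SendFn, schedule_retry: RetryFn,
               stop: threading.Event) -> None:
    """Drain (url, event) pairs from the queue until `stop` is set."""
    while not stop.is_set():
        try:
            url, event = events.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            delivered = send(url, event)
        except Exception:
            delivered = False  # a crash in delivery counts as a failed attempt
        try:
            if not delivered:
                schedule_retry(url, event)
        finally:
            events.task_done()


if __name__ == "__main__":
    q: "queue.Queue" = queue.Queue()
    stop = threading.Event()
    threading.Thread(
        target=run_worker,
        args=(q, lambda url, e: print("POST", url, e["id"]) or True,
              lambda url, e: print("retry", e["id"]), stop),
        daemon=True,
    ).start()
    q.put(("https://customer.example.test/hook", {"id": "evt_abc123"}))  # API path: enqueue and return
    q.join()
    stop.set()
```

The API request path only ever calls `q.put(...)`, which returns immediately whether or not the customer's endpoint is up.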
Exponential backoff with jitter is the correct retry strategy. A simple 1s → 10s → 100s → 1000s sequence works, but adding ±25% jitter prevents thundering herd problems when many endpoints fail simultaneously (e.g., after a widespread outage) and all start retrying at the same time.
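The 1s → 10s → 100s → 1000s sequence with ±25% jitter reduces to a few lines. In this sketch `backoff_delay` is an illustrative name, and the `rng` parameter is there only so a seeded generator can be passed in for reproducible tests.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, factor: float = 10.0,
                  jitter: float = 0.25, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    The nominal delay is base * factor ** (attempt - 1); the result is
    scaled by a random factor in [1 - jitter, 1 + jitter].
    """
    source = rng if rng is not None else random
    nominal = base * factor ** (attempt - 1)
    return nominal * source.uniform(1.0 - jitter, 1.0 + jitter)


if __name__ == "__main__":
    for attempt in range(1, 5):
        print(attempt, round(backoff_delay(attempt), 2))
```

Because each delivery draws its own random factor, retries that were scheduled at the same instant spread out over a window instead of landing on the recovering endpoint together.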
After all retries are exhausted — typically after 24–72 hours — events go to a dead letter queue. This is a separate storage location where permanently failed deliveries are preserved for manual inspection and potential replay. A webhook dashboard that shows dead-lettered events and allows manual replay is a significant DX improvement for your customers. This connects to the broader topic of API idempotency — dead-lettered events that get manually replayed months later must still be safe to reprocess.
Svix: Webhook Infrastructure as a Service
Building the signing, retry, queueing, dead letter, and dashboard components yourself is a multi-week project. Svix is a managed service that provides all of it as an API. You call Svix's API to send events, and Svix handles delivery, retry, signing, and provides a customer-facing portal where your users can manage their webhook endpoints.
The integration looks like this: instead of writing to your own queue and running your own delivery worker, you call svix.message.create() with the event payload. Svix takes over delivery. Your customers can log into a hosted portal (embeddable in your app) to see delivery history, configure endpoints, and replay failed events.
Svix's pricing model is based on message volume. The free tier includes 50,000 message deliveries per month. Paid plans start at $100/month for 1M deliveries. If you're building the webhook system for a developer-facing product and your team time is more expensive than Svix's per-message fee, the tradeoff usually favors Svix. The cases where rolling your own infrastructure makes sense are high-volume platforms where per-message pricing gets expensive, or applications with very specific retry/routing requirements that Svix's model doesn't accommodate.
Consumer-Side Best Practices
If you're on the receiving end of webhooks rather than sending them, a different set of best practices applies.
The single most important rule: return HTTP 200 immediately and process the event asynchronously. Slow or unpredictable work (database writes, calls to other APIs, email sending) belongs outside the webhook handler, so the response goes out well within the sender's timeout (30 seconds in the setup described here). Synchronous processing causes timeouts, which the sender treats as failures, which triggers retries, which causes duplicate processing.
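// app/api/webhooks/incoming/route.ts
import { NextResponse } from 'next/server';
import { queue } from '@/lib/queue';
// App-local modules, assumed to exist alongside '@/lib/queue'
import { db } from '@/lib/db';
import { verifySignature } from '@/lib/webhook-signature';

export async function POST(req: Request) {
  // Read the raw body as text; the signature is computed over these exact bytes
  const body = await req.text();
  const signature = req.headers.get('webhook-signature') ?? '';

  // 1. Verify signature first: reject invalid requests before any processing
  if (!verifySignature(body, signature, process.env.WEBHOOK_SECRET!)) {
    return NextResponse.json({ error: 'Invalid signature' }, { status: 401 });
  }

  // Parse only after verification; malformed JSON gets a 400 rather than a 500
  let event;
  try {
    event = JSON.parse(body);
  } catch {
    return NextResponse.json({ error: 'Invalid JSON' }, { status: 400 });
  }

  // 2. Check idempotency: have we already processed this event?
  //    (The async worker is expected to record the ID once it has handled the event.)
  const alreadyProcessed = await db.processedEvents.findUnique({
    where: { eventId: event.id }
  });
  if (alreadyProcessed) {
    return NextResponse.json({ received: true }); // Idempotent success
  }

  // 3. Enqueue for async processing: don't process inline
  await queue.enqueue('process-webhook', {
    eventId: event.id,
    type: event.type,
    data: event.data,
  });

  return NextResponse.json({ received: true }); // Immediate 200
}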
Out-of-order delivery is another real-world problem. Webhooks are not guaranteed to arrive in the order they were sent — network delays and retry timing can cause order.updated to arrive before order.created. Design your event handlers to be order-independent. For state machine transitions (order status changes), check the current state in your database before applying the transition rather than assuming events arrive sequentially.
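One order-independent approach, sketched below, combines the full-object payloads recommended earlier with a "newest event wins" rule: the handler compares the event's `created_at` against what is already stored and ignores anything older. This is a simplification of checking state-machine transitions, and `OrderStore` is a hypothetical in-memory stand-in for your database.

```python
from datetime import datetime
from typing import Dict, Tuple


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-03-08T12:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class OrderStore:
    """Keeps the newest known state per order, regardless of arrival order."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[datetime, dict]] = {}

    def apply(self, event: dict) -> bool:
        """Upsert the event's object; return False if the event is stale."""
        obj = event["data"]["object"]
        event_time = parse_ts(event["created_at"])
        current = self._rows.get(obj["id"])
        if current is not None and current[0] >= event_time:
            return False  # we already hold this state or a newer one
        self._rows[obj["id"]] = (event_time, obj)
        return True

    def status(self, order_id: str) -> str:
        return self._rows[order_id][1]["status"]
```

With this rule, an order.updated that arrives before its order.created simply creates the row, and the late order.created is dropped as stale.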
Debugging Webhook Failures
When webhook delivery fails in development or staging, the debugging workflow matters. The standard approaches:
Local development: Use ngrok to expose your localhost to the internet so webhook senders can reach your development server. ngrok http 3000 gives you a public URL that tunnels to your local port. Most webhook providers let you configure this URL for testing.
Inspection: webhook.site is a free service that gives you a URL that logs every HTTP request to it. Point a webhook endpoint at webhook.site, trigger the event, and inspect the full request including headers, body, and timing. This is the fastest way to verify what a sender is actually sending.
Delivery logs: Any serious webhook provider gives you per-delivery logs showing the request body, response status, response body, and retry history. Read these before debugging your own code — often the failure is a misconfigured URL or a 404 rather than a code bug.
Common failure modes: Timeouts are the most common (processing is synchronous and takes too long), followed by SSL certificate errors (expired or self-signed certs in staging environments), and 4xx responses (usually authentication errors where the endpoint checks an API key header that wasn't configured). A 400 or 401 from the consumer often traces back to parsing JSON before signature verification: the signature covers the raw bytes, so if the handler already called req.json(), the body stream is consumed, and re-serializing the parsed object rarely reproduces the original bytes, so verification fails.
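The raw-body pitfall is easy to demonstrate. In this small sketch (the secret and payload are made up), the same JSON document produces a different HMAC once it has been parsed and serialized again, because the whitespace changes.

```python
import hashlib
import hmac
import json


def hex_hmac(secret: bytes, data: bytes) -> str:
    return hmac.new(secret, data, hashlib.sha256).hexdigest()


secret = b"example-secret"
raw_body = b'{"id":"evt_abc123","total":4999}'               # bytes as sent
reserialized = json.dumps(json.loads(raw_body)).encode("utf-8")  # adds spaces

print(json.loads(raw_body) == json.loads(reserialized))  # True: same data
print(raw_body == reserialized)                           # False: different bytes
print(hex_hmac(secret, raw_body) == hex_hmac(secret, reserialized))  # False
```

The fix on the consumer side is to read the body once as raw text or bytes, verify the signature against that, and only then parse it.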
8. Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| No signatures | Endpoint spoofing | HMAC-SHA256 every payload |
| No retries | Silent data loss | Exponential backoff, 5+ attempts |
| Synchronous delivery | Application slowdown | Queue-based async delivery |
| No event IDs | Duplicate processing | Unique ID per event |
| Following redirects | SSRF vulnerability | Treat 3xx as failure |
| No timeout | Worker threads stuck | 30-second timeout |
| Partial object in payload | Consumer needs follow-up API call | Include full object state |
| String equality for signatures | Timing attack | Use constant-time comparison |
| No timestamp validation | Replay attacks | Reject timestamps older than 5 minutes |
Building event-driven APIs? Explore webhook tools on APIScout — Svix, Hookdeck, Convoy, and more compared. Also see our guides on API idempotency and API error handling.