What a webhook actually has to do
A webhook is a contract between two systems. The sender promises to deliver an event when something happens; the receiver promises to acknowledge that delivery. Underneath that one-line summary, both sides have to handle the realities of the network: requests time out, responses get lost, receivers go offline, and the same event can arrive twice.
A webhook system that works in production has to provide four things: authenticity (the receiver can verify the request really came from the sender), integrity (the receiver can detect tampering), delivery durability (events that fail get retried for long enough that transient outages don't lose data), and safe replay (the receiver can handle duplicate events without double-counting). The design choices for each interact, so it's easier to think through them together.
Signing the payload
The standard pattern is HMAC-SHA256 over the raw request body, using a shared secret known only to sender and receiver. The sender computes the signature, attaches it as a header (typically X-Signature or a vendor-prefixed equivalent), and sends the request. The receiver re-computes the signature over the body it received and compares it in constant time.
A few details that are easy to get wrong:
- Sign the raw bytes, not the parsed JSON. A receiver that re-serialises before verifying will mismatch on whitespace, key order, or numeric formatting.
- Include a timestamp in what's signed. The signature should cover both the body and a timestamp header. Reject requests where the timestamp is outside an acceptable window (a few minutes either side). This prevents an attacker who captures a valid request from replaying it indefinitely.
- Use constant-time comparison. Standard string equality is variable-time and leaks the prefix of the expected signature through timing differences. Most languages have a constant-time compare in their crypto library.
- Rotate the secret without downtime. Support two valid secrets at once during a rotation window so receivers can update without dropping deliveries. Document the rotation procedure.
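The points above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names sign and verify, the "{timestamp}.{body}" message layout, the Unix-seconds timestamp format, and the five-minute tolerance are all assumptions chosen for the example.

```python
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 300  # assumed window: five minutes either side


def sign(secret: bytes, timestamp: str, raw_body: bytes) -> str:
    """Return the hex HMAC-SHA256 over "{timestamp}.{raw_body}"."""
    message = timestamp.encode("utf-8") + b"." + raw_body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secrets, timestamp, raw_body, signature, now=None) -> bool:
    """Check freshness, then compare against every currently valid secret."""
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except (TypeError, ValueError):
        return False
    if abs(now - sent_at) > TOLERANCE_SECONDS:
        return False  # too old or too far in the future: possible replay
    matched = False
    for secret in secrets:  # two entries during a rotation window
        expected = sign(secret, timestamp, raw_body)
        # compare_digest takes the same time wherever the inputs differ
        if hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
            matched = True
    return matched
```

During a rotation the receiver would pass both the old and the new secret in the secrets list, then drop the old one once the sender has switched over. Note that raw_body is the byte string exactly as received, never a re-serialised copy.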
Delivery and retries
The sender will see one of three outcomes per delivery attempt: a successful acknowledgement (typically a 2xx response received within a timeout), a definite failure (a 4xx response that means "don't retry"), or an indeterminate result (timeout, 5xx, connection error). Only the last of these warrants a retry.
A reasonable retry schedule is exponential with jitter. Common values: retry after 30 seconds, then double each time up to a cap of about an hour, with random jitter on each interval, for a total window of 24 to 72 hours. The shape matters more than the exact numbers — back off fast enough that you don't hammer a recovering receiver, but keep trying long enough that a half-day outage doesn't drop events.
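As a rough sketch of that schedule, the generator below yields delays that start at 30 seconds, double up to a one-hour cap, and stop once the total would exceed 72 hours. The constants come from the values quoted above; the jitter style (half of each interval fixed, half random) and the name retry_delays are assumptions, and other jitter variants are equally reasonable.

```python
import random

BASE_SECONDS = 30
CAP_SECONDS = 3600
WINDOW_SECONDS = 72 * 3600


def retry_delays(rng=random.random):
    """Yield jittered delays until the total retry window would be exceeded."""
    elapsed = 0.0
    attempt = 0
    while True:
        ceiling = min(CAP_SECONDS, BASE_SECONDS * 2 ** attempt)
        # Half fixed, half random: delays spread out but never collapse to zero.
        delay = ceiling / 2 + rng() * ceiling / 2
        if elapsed + delay > WINDOW_SECONDS:
            return  # window exhausted: the caller should dead-letter the event
        yield delay
        elapsed += delay
        attempt += 1
```

Passing a fixed rng (for example lambda: 1.0) makes the schedule deterministic, which is convenient in tests: the delays then run 30, 60, 120 seconds and so on until they reach the cap.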
Two distinctions worth making explicit:
- 4xx vs 5xx. A 4xx from the receiver means "I understood your request and I'm rejecting it on its merits." Don't retry. A 5xx means "something on my side broke." Retry. Some senders treat 410 Gone as a signal to suspend the endpoint entirely, which is sensible if the receiver explicitly tells you they're done.
- Timeout vs explicit failure. If you time out, you don't know whether the receiver processed the event. Retry, and rely on the receiver's idempotency to deduplicate. If the receiver returned a 5xx, the event most likely was not processed, although a failure after a partial write, or an error generated by a proxy in front of the receiver, can leave that uncertain too. Both cases retry, but the first is what makes idempotency on the receiver mandatory.
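One way to encode these distinctions on the sender is a small classification function. The sketch below is illustrative only: the Outcome names are invented for the example, a status of None stands for a timeout or connection error, and treating any other unexpected status (such as a redirect) as retryable is an assumption rather than a rule.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DELIVERED = "delivered"  # 2xx: acknowledged
    DROP = "drop"            # 4xx: definite failure, do not retry
    RETRY = "retry"          # indeterminate: schedule another attempt
    SUSPEND = "suspend"      # 410 Gone: stop sending to this endpoint


def classify(status: Optional[int]) -> Outcome:
    """Map an HTTP status (None for timeout or connection error) to an action."""
    if status is None:
        return Outcome.RETRY
    if 200 <= status < 300:
        return Outcome.DELIVERED
    if status == 410:
        return Outcome.SUSPEND
    if 400 <= status < 500:
        return Outcome.DROP
    return Outcome.RETRY  # 5xx and anything unexpected
```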
Idempotency on the receiver
Every webhook delivery needs a stable, unique event ID — a UUID assigned by the sender at the moment the event is created. Send it as both a header (X-Event-Id) and a field in the body so receivers don't have to choose. The receiver's first action on receipt is to look up that ID; if it has been seen before, return 200 immediately and do nothing else.
This is the receiver-side mirror of the idempotency pattern covered in Idempotency Keys for APIs. The same principles apply: the lookup must be transactional with the write, the deduplication window has to cover the full retry window plus a margin, and the storage cost has to be planned.
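A minimal sketch of the transactional lookup, using SQLite from the Python standard library. The table names, the orders example, and the 73-hour retention (a 72-hour retry window plus an hour of margin) are assumptions for illustration. The key point is that recording the event ID and applying the change happen in one transaction, so a duplicate fails on the primary key before anything is written.

```python
import sqlite3
import time

RETENTION_SECONDS = 73 * 3600  # assumed: 72 h retry window plus 1 h margin


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events "
        "(event_id TEXT PRIMARY KEY, seen_at REAL NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def handle_once(conn, event_id, order_id, status, now=None) -> bool:
    """Apply the event at most once. True if applied, False if a duplicate."""
    now = time.time() if now is None else now
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "INSERT INTO processed_events (event_id, seen_at) VALUES (?, ?)",
                (event_id, now),
            )
            conn.execute(
                "INSERT OR REPLACE INTO orders (order_id, status) VALUES (?, ?)",
                (order_id, status),
            )
    except sqlite3.IntegrityError:
        return False  # event_id already recorded: acknowledge and do nothing
    return True


def prune(conn, now=None) -> None:
    """Delete IDs older than the retention window to bound storage."""
    now = time.time() if now is None else now
    with conn:
        conn.execute(
            "DELETE FROM processed_events WHERE seen_at < ?",
            (now - RETENTION_SECONDS,),
        )
```

In both the applied and the duplicate case the HTTP handler would return 200; the boolean only tells the caller whether any work was done.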
Ordering: don't promise what you can't keep
Webhook senders should not promise strict ordering. The moment retries enter the picture, an event that was generated first can be delivered after one that was generated later — first attempt of event A times out, event B is delivered and acknowledged, then the retry of A succeeds. Receivers that assume order will misbehave.
What you can promise is each event's logical timestamp, expressed as a field in the body. Receivers that need ordering can use that timestamp to reorder, or — better — to detect that they're applying an out-of-order event and skip it (for state-update events) or queue it for processing in the right order (for sequence-sensitive events).
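For state-update events, the skip-if-stale check can be very small. The sketch below assumes each event carries a resource_id, an ISO 8601 occurred_at with an explicit UTC offset, and a status; those field names and the in-memory state dictionary are hypothetical stand-ins for whatever the receiver actually stores.

```python
from datetime import datetime


def apply_if_newer(state: dict, event: dict) -> bool:
    """Apply a state-update event unless a newer (or equal) one was applied.

    state maps resource_id -> (occurred_at, status).
    """
    occurred_at = datetime.fromisoformat(event["occurred_at"])
    current = state.get(event["resource_id"])
    if current is not None and occurred_at <= current[0]:
        return False  # stale or duplicate: skip, but still acknowledge
    state[event["resource_id"]] = (occurred_at, event["status"])
    return True
```

If event B (generated later) arrives before the retry of event A, B is applied and A is then skipped, so the stored state ends up reflecting B regardless of delivery order.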
If you absolutely need in-order delivery, the only honest way to provide it is per-key serialization: queue all events for a given resource (an account, a tenant) on a single processor. That introduces a head-of-line blocking failure mode, and it's a serious commitment. Most APIs are better off documenting that webhooks are unordered and pushing the ordering decision onto the receiver where it belongs.
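To make the cost of per-key serialization concrete, here is a toy sketch: every event for a given key is routed to the same worker queue, so events for one key are delivered in submission order. The class name KeyedDispatcher, the use of threads, and the deliver callback are assumptions for illustration; deliver is assumed to handle its own retries and not raise.

```python
import queue
import threading
import zlib


class KeyedDispatcher:
    """Route all events for one key to a single worker so they stay ordered."""

    def __init__(self, deliver, workers=4):
        self._deliver = deliver
        self._queues = [queue.Queue() for _ in range(workers)]
        for q in self._queues:
            threading.Thread(target=self._run, args=(q,), daemon=True).start()

    def submit(self, key: str, event) -> None:
        # crc32 is stable across processes, unlike the built-in hash() on str
        index = zlib.crc32(key.encode("utf-8")) % len(self._queues)
        self._queues[index].put(event)

    def _run(self, q) -> None:
        while True:
            event = q.get()
            try:
                # Blocking call: a slow or failing receiver stalls every
                # event queued behind this one (head-of-line blocking).
                self._deliver(event)
            finally:
                q.task_done()

    def drain(self) -> None:
        """Block until every submitted event has been delivered."""
        for q in self._queues:
            q.join()
```

The blocking call inside the worker loop is exactly where the head-of-line problem lives: one unreachable receiver holds up every other key that hashes to the same queue.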
What to put in the body
Two schools of thought, both defensible:
- Full payload. Include everything the receiver needs to act on the event, so they don't have to call back to fetch context. Lower latency for the receiver, higher bandwidth for the sender, and harder to keep secrets out of the payload.
- Reference only. Include just the event type, the resource ID, and the event ID. Receivers fetch the full state via the API. Lower bandwidth, easier to keep payloads small and stable, but every event becomes two round trips for the receiver.
The reference pattern pairs well with idempotent, order-insensitive handling, because the receiver always reads the current state when it processes the event, so it cannot apply a stale snapshot from an out-of-order or duplicate delivery. The full-payload pattern is friendlier for simple integrations. Pick one and be consistent within an event type.
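The two shapes, side by side, as a hedged sketch. The field names are illustrative, and fetch_resource is a hypothetical stand-in for an authenticated call back to the sender's API.

```python
import json


def full_payload(event_id: str, event_type: str, payment: dict) -> bytes:
    """Everything the receiver needs travels in the event itself."""
    event = {"event_id": event_id, "type": event_type, "data": payment}
    return json.dumps(event).encode("utf-8")


def reference_payload(event_id: str, event_type: str, resource_id: str) -> bytes:
    """Only identifiers travel; the receiver fetches state afterwards."""
    event = {"event_id": event_id, "type": event_type, "resource_id": resource_id}
    return json.dumps(event).encode("utf-8")


def handle_reference(raw_body: bytes, fetch_resource) -> dict:
    """Receiver side of the reference pattern: always read current state."""
    event = json.loads(raw_body)
    # The event only says "something changed"; the API call returns the
    # state as it is now, not as it was when the event was generated.
    return fetch_resource(event["resource_id"])
```

If a late payment.succeeded event arrives after the payment has already been refunded, handle_reference still returns the refunded state, which is the behaviour the paragraph above describes.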
A worked example: payment events
Consider a webhook system delivering events for a payment platform. Three events: payment.succeeded, payment.failed, payment.refunded. The receiver needs to update an order record and notify a customer.
- Each event carries an event_id (UUID), an occurred_at timestamp, the type, and a data object with the payment record.
- Each request is signed with HMAC-SHA256 over {timestamp}.{body}. The signature and timestamp go in X-Signature and X-Timestamp headers. Receivers reject requests with timestamps more than five minutes off.
- Retries follow the schedule above for 72 hours, after which the event is moved to a dead-letter store and the integration is flagged for review.
- The receiver's handler does, in order: verify the signature, check whether event_id has been processed (return 200 if so), apply the change in a transaction that also stores the event ID, and return 200.
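The sender side of this example might look roughly like the sketch below. The function names new_event and sign_attempt, the compact JSON encoding, and the Unix-seconds timestamp are assumptions. The split into two functions reflects one consequence of the design: the body and event_id are fixed when the event is created, but a retry sent an hour later needs a fresh X-Timestamp and signature, or the receiver's five-minute check would reject it.

```python
import hashlib
import hmac
import json
import time
import uuid
from datetime import datetime, timezone


def new_event(event_type: str, payment: dict, now=None) -> bytes:
    """Build the event body once, when the event occurs."""
    now = time.time() if now is None else now
    event = {
        "event_id": str(uuid.uuid4()),
        "occurred_at": datetime.fromtimestamp(now, tz=timezone.utc).isoformat(),
        "type": event_type,
        "data": payment,
    }
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


def sign_attempt(secret: bytes, body: bytes, now=None) -> dict:
    """Build headers for one delivery attempt; call again for each retry."""
    now = time.time() if now is None else now
    timestamp = str(int(now))
    message = timestamp.encode("ascii") + b"." + body  # "{timestamp}.{body}"
    return {
        "Content-Type": "application/json",
        "X-Event-Id": json.loads(body)["event_id"],
        "X-Timestamp": timestamp,
        "X-Signature": hmac.new(secret, message, hashlib.sha256).hexdigest(),
    }
```

The stored body bytes are what get sent on every attempt, so the receiver always verifies the signature over exactly the bytes the sender signed.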
That's about as much as a webhook system needs to do. Anything more (guaranteed ordering, exactly-once rather than at-least-once delivery, push with a pull fallback) is either a bigger commitment than most senders should make, or a layer that belongs above webhooks entirely, such as a message queue or an event-sourced store.
Common mistakes
- Returning 200 before the work is done. If the receiver acknowledges a webhook and then crashes before processing it, the event is lost. Acknowledge only after the event, or the change it causes, is durably stored.
- Doing slow work synchronously. Conversely, if processing takes a long time, the sender may time out the request even though processing succeeded, and will then retry an event that was already handled. Acknowledge fast (after persisting the event), then process asynchronously.
- Trusting the source IP. IP allow-lists are a useful belt-and-braces measure but they are not a substitute for signature verification. Sender IPs change.
- No visibility for the receiver. Provide an admin view where receivers can see deliveries, statuses, and replay specific events. Without this, debugging is by mailing list.
- Silent secret rotation. Rotating the signing secret without coordinating with receivers will turn every webhook into a signature-mismatch failure overnight.
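The first two mistakes pull in opposite directions, and one common way to reconcile them is an inbox table: persist the raw event, acknowledge, and let a separate worker do the slow part. The sketch below uses SQLite and assumes the signature has already been verified; the table layout and the names receive and process_pending are illustrative.

```python
import sqlite3


def open_inbox(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inbox ("
        "event_id TEXT PRIMARY KEY, body BLOB NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def receive(conn, event_id: str, raw_body: bytes) -> int:
    """Persist the event, then acknowledge. Returns the HTTP status to send."""
    with conn:  # commits before we return
        # OR IGNORE makes a redelivered event a harmless no-op
        conn.execute(
            "INSERT OR IGNORE INTO inbox (event_id, body) VALUES (?, ?)",
            (event_id, raw_body),
        )
    return 200  # reached only after the event is durable


def process_pending(conn, handler) -> int:
    """Run by a background worker: handle every event not yet marked done."""
    rows = conn.execute("SELECT event_id, body FROM inbox WHERE done = 0").fetchall()
    for event_id, body in rows:
        handler(body)  # slow work happens here, outside the request
        with conn:
            conn.execute("UPDATE inbox SET done = 1 WHERE event_id = ?", (event_id,))
    return len(rows)
```

If the worker crashes between handler and the UPDATE, the event is handled again on the next run, so the handler itself should still be idempotent.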
Where to go next
For the pattern that the receiver applies to make retries safe, see Idempotency Keys for APIs. For the broader API design context — naming, envelope shapes, errors — see API Design Best Practices. For real-time push as an alternative to webhooks, see the WebSocket reference.