ADR 0008 — Webhook system + event bus
Status: Accepted (2026-04-18, Sprint H.4)
Supersedes: none
Blocks: J.2 (social wave), G.4 (Hatcher TG/Discord bot)
Context
Partners integrating against Hatch's public API (H.1) need a push channel
for score, enrollment, and launch events — polling /api/v1/score/:token
is wasteful and slow. A reliable, signed, replayable delivery system is
the missing piece.
Decision
1. HMAC signatures, Stripe-compatible scheme
signingInput = "${timestamp}.${body}" → HMAC-SHA256 hex →
X-Hatch-Signature: t=<ts>,v1=<hex>. Partners can vendor any generic
Stripe-webhook-verifier snippet and point it at the X-Hatch-Signature
header. Timestamp tolerance defaults to 300s (the replay window).
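A minimal sketch of signing and verifying under this scheme, using Node's built-in crypto module. The function names `signPayload` and `verifySignature` are hypothetical; only the signing input, the header value format, and the 300s default come from this ADR.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: builds the X-Hatch-Signature header value.
function signPayload(secret: string, body: string, timestamp: number): string {
  const signingInput = `${timestamp}.${body}`;
  const hex = createHmac("sha256", secret).update(signingInput, "utf8").digest("hex");
  return `t=${timestamp},v1=${hex}`;
}

// Hypothetical helper: what a partner-side verifier might look like.
function verifySignature(
  secret: string,
  body: string,
  header: string,
  nowSeconds: number,
  toleranceSeconds = 300,
): boolean {
  // Parse "t=<ts>,v1=<hex>" into a key/value map.
  const parts = new Map<string, string>();
  for (const piece of header.split(",")) {
    const idx = piece.indexOf("=");
    if (idx > 0) parts.set(piece.slice(0, idx).trim(), piece.slice(idx + 1).trim());
  }
  const received = parts.get("v1");
  const timestamp = Number(parts.get("t"));
  if (!received || !Number.isInteger(timestamp)) return false;

  // Reject timestamps outside the replay window.
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${body}`, "utf8")
    .digest("hex");
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(received, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The verifier must run against the raw request body exactly as received; re-serializing parsed JSON can change the bytes and break the comparison.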
2. AES-256-GCM at-rest encryption of per-subscription secrets
Secrets must be recoverable at delivery time to sign payloads, so
hashing isn't an option. WEBHOOK_ENCRYPTION_KEY is an env-scoped 32-byte
key, hex-encoded (64 hex characters). Format: base64url(iv[12] || authTag[16] || ciphertext).
Rotating the key without re-encrypting invalidates every stored webhook
secret, which is the correct fail-safe mode.
Plaintext is returned to the client ONCE at create time and never again.
Dashboards show secretPrefix (first 10 chars) for leak detection.
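The storage format above could be produced and consumed roughly as follows. This is a sketch under assumptions: `encryptSecret`, `decryptSecret`, and `parseKey` are hypothetical names, and the key is passed in explicitly rather than read from the environment.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_LEN = 12; // bytes, per the iv[12] field in the format
const TAG_LEN = 16; // bytes, per the authTag[16] field in the format

// Hypothetical helper: validates and decodes the hex-encoded 32-byte key.
function parseKey(hexKey: string): Buffer {
  if (!/^[0-9a-fA-F]{64}$/.test(hexKey)) {
    throw new Error("WEBHOOK_ENCRYPTION_KEY must be 64 hex characters (32 bytes)");
  }
  return Buffer.from(hexKey, "hex");
}

// Returns base64url(iv || authTag || ciphertext).
function encryptSecret(plaintext: string, hexKey: string): string {
  const iv = randomBytes(IV_LEN); // fresh random IV for every encryption
  const cipher = createCipheriv("aes-256-gcm", parseKey(hexKey), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const authTag = cipher.getAuthTag();
  return Buffer.concat([iv, authTag, ciphertext]).toString("base64url");
}

// Throws if the key is wrong or the stored value was tampered with.
function decryptSecret(encoded: string, hexKey: string): string {
  const raw = Buffer.from(encoded, "base64url");
  if (raw.length < IV_LEN + TAG_LEN) throw new Error("stored secret is too short");
  const iv = raw.subarray(0, IV_LEN);
  const authTag = raw.subarray(IV_LEN, IV_LEN + TAG_LEN);
  const ciphertext = raw.subarray(IV_LEN + TAG_LEN);
  const decipher = createDecipheriv("aes-256-gcm", parseKey(hexKey), iv);
  decipher.setAuthTag(authTag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because GCM authenticates the ciphertext, decrypting with a different key fails loudly instead of returning garbage, which is what makes a key change fail safe.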
3. Exponential-backoff retry with dead letter
Backoff curve in seconds: [60, 300, 1800, 7200, 43200, 172800] →
7 total attempts (one initial attempt plus six retries) over ~63h; the
delays sum to 225,360s and the final gap alone is 48h. After the final
failure, dead_letter_at is set and the row is no longer scheduled. The
dashboard exposes POST /v1/webhooks/deliveries/:id/replay to requeue
dead letters.
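One way to express the retry schedule is a small function that maps the number of failed attempts to either the next attempt time or a dead-letter decision. The names `scheduleAfterFailure` and `NextStep` are hypothetical, and the sketch assumes the first attempt happens immediately and each array entry is the delay before the following retry.

```typescript
const BACKOFF_SECONDS: readonly number[] = [60, 300, 1800, 7200, 43200, 172800];

type NextStep =
  | { kind: "retry"; nextAttemptAt: Date }
  | { kind: "dead_letter"; deadLetterAt: Date };

// failedAttempts counts attempts that have already failed, including the first.
function scheduleAfterFailure(failedAttempts: number, now: Date): NextStep {
  if (!Number.isInteger(failedAttempts) || failedAttempts < 1) {
    throw new RangeError("failedAttempts must be a positive integer");
  }
  const delaySeconds = BACKOFF_SECONDS[failedAttempts - 1];
  if (delaySeconds === undefined) {
    // The 7th failure has no delay left in the curve: stop scheduling.
    return { kind: "dead_letter", deadLetterAt: now };
  }
  return { kind: "retry", nextAttemptAt: new Date(now.getTime() + delaySeconds * 1000) };
}
```

Under these assumptions, failures 1 through 6 schedule a retry and the 7th failure sets the dead-letter timestamp.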
4. Pure dispatcher function driven by admin tick
runWebhookDispatcherTick holds no state between calls; everything it
touches is injected as (listDue, loadSub, markOutcome, fetch, now).
An admin-bearer-gated POST /v1/webhooks/dispatch drives it today; a
future worker (per F.5's notification dispatcher) can reuse the same function.
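The dependency-injected shape described here might look like the sketch below. Everything in it is an assumption made for illustration: the `Delivery`, `Subscription`, and `Outcome` shapes, the argument lists of the five injected functions, and the name `dispatcherTickSketch`. Only the five dependency names and the signature header come from this ADR.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical record shapes for this sketch.
interface Delivery { id: string; subscriptionId: string; body: string }
interface Subscription { url: string; secret: string }
type Outcome = { ok: true; status: number } | { ok: false; status: number | null };

interface DispatcherDeps {
  listDue(now: Date): Promise<Delivery[]>;
  loadSub(subscriptionId: string): Promise<Subscription | null>;
  markOutcome(deliveryId: string, outcome: Outcome, now: Date): Promise<void>;
  fetch(
    url: string,
    init: { method: string; headers: Record<string, string>; body: string },
  ): Promise<{ ok: boolean; status: number }>;
  now(): Date;
}

async function dispatcherTickSketch(
  deps: DispatcherDeps,
): Promise<{ delivered: number; failed: number }> {
  const now = deps.now();
  const timestamp = Math.floor(now.getTime() / 1000);
  let delivered = 0;
  let failed = 0;

  for (const delivery of await deps.listDue(now)) {
    // Default to failure; overwritten only when the POST gets a response.
    let outcome: Outcome = { ok: false, status: null };
    const sub = await deps.loadSub(delivery.subscriptionId);
    if (sub) {
      const hex = createHmac("sha256", sub.secret)
        .update(`${timestamp}.${delivery.body}`, "utf8")
        .digest("hex");
      try {
        const res = await deps.fetch(sub.url, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            "X-Hatch-Signature": `t=${timestamp},v1=${hex}`,
          },
          body: delivery.body,
        });
        outcome = res.ok ? { ok: true, status: res.status } : { ok: false, status: res.status };
      } catch {
        // Network-level failure: keep the default failed outcome.
      }
    }
    await deps.markOutcome(delivery.id, outcome, now);
    if (outcome.ok) delivered += 1;
    else failed += 1;
  }
  return { delivered, failed };
}
```

Because the clock, the HTTP client, and the storage calls are all parameters, the same function can be driven by an HTTP handler, a worker loop, or a unit test with in-memory fakes.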
5. Fire-and-forget emission from producers
emit(event, id, payload) returns void. Producers (scoring persist,
attestation publisher, enrollment onEnrolled) never await or surface
webhook errors. A failing emit logs + continues; the dispatcher retries
the delivery separately.
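A fire-and-forget emitter along these lines can be sketched as a wrapper that swallows and logs any failure from the enqueue step. `makeEmit`, `EnqueueFn`, and `LogFn` are hypothetical names; the real persistence and logging calls are not specified in this ADR.

```typescript
// Hypothetical dependency types for this sketch.
type EnqueueFn = (event: string, id: string, payload: unknown) => Promise<void>;
type LogFn = (message: string, error: unknown) => void;

function makeEmit(enqueue: EnqueueFn, log: LogFn) {
  return function emit(event: string, id: string, payload: unknown): void {
    // Starting from Promise.resolve() means a synchronous throw inside
    // enqueue is also routed to the catch handler instead of the producer.
    void Promise.resolve()
      .then(() => enqueue(event, id, payload))
      .catch((err: unknown) => log(`webhook emit failed for ${event}/${id}`, err));
  };
}
```

A producer calls `emit("score.created", token, payload)` and moves on; nothing it does afterwards depends on whether the delivery row was written.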
6. Event catalog is append-only
score.created, score.published, enrollment.created,
launch.scheduled, graduation.crossed. Breaking payload changes
require a new event name. Subscribers can subscribe to * for all
events.
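The catalog and the wildcard rule can be captured in a few lines. This is an illustrative sketch: `EVENT_CATALOG`, `isKnownEvent`, and `subscriptionMatches` are hypothetical names, and storing a subscription's events as a string array is an assumption.

```typescript
// Append-only: new names are added at the end, existing names never change meaning.
const EVENT_CATALOG = [
  "score.created",
  "score.published",
  "enrollment.created",
  "launch.scheduled",
  "graduation.crossed",
] as const;

type EventName = (typeof EVENT_CATALOG)[number];

// Type guard for validating event names supplied by subscribers.
function isKnownEvent(name: string): name is EventName {
  return (EVENT_CATALOG as readonly string[]).includes(name);
}

// A subscription matches if it lists the event explicitly or subscribes to "*".
function subscriptionMatches(subscribed: readonly string[], event: EventName): boolean {
  return subscribed.includes("*") || subscribed.includes(event);
}
```

Deriving `EventName` from the array keeps the type and the runtime list in sync when a new event is appended.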
Alternatives considered
- Vercel Queues / Redis Streams for delivery. Heavier than Hatch needs today; adds ops surface. Current design is a single dispatcher tick over Postgres — upgrade cleanly when scale demands.
- EventBridge / Svix-as-a-service. Outsources the core integration surface partners see and complicates HMAC + retry semantics. Keep in-house.
- JWT-signed payloads. Larger headers, no real security benefit over HMAC for webhook shape.
- Shared global webhook secret. Per-subscription secrets allow compromise isolation — one partner's leaked secret doesn't blast to all partners.
Risks
- Dispatcher tick endpoint has no built-in caller. If no cron or other scheduler calls it, deliveries silently queue. Mitigated by platform-level cron when deployed.
- Encryption key rotation breaks all existing subscriptions. Documented in .env.example + runbook (follow-up).
- Infinite replay loop if subscriber responds 200 but partner server then re-posts. Out of scope: standard webhook problem.
Follow-ups
- Runbook: encryption key rotation procedure (decrypt-all + re-encrypt with new key).
- H.4.1: per-subscription event filters beyond the coarse list (e.g. band=green only).
- I.4: expose delivery metrics in the transparency dashboard.