ADR 0005 — Claude Vision for the image signal (Sprint C.2.1)
Date: 2026-04-17
Status: Accepted
Context
After C.2, five of six scoring signals had real compute paths; image remained
a placeholder that returned a score from URL presence alone. We need a real
image-originality signal before the pipeline can emit a non-stub aggregate
(and therefore attest on-chain, per C.3's hasStubs gate).
Options considered:
- Self-hosted CLIP / image embedding + nearest-neighbor against a meme corpus. Best long-term signal quality but requires infra we don't yet have (GPU worker, corpus ingest, refresh cadence). Out of scope for C.2.1.
- Third-party "originality" API (e.g. reverse image search). Ops cost, rate limits, and opaque scoring make this a poor fit for a deterministic audit trail.
- Claude Vision (Sonnet 4.6) with a structured tool-use scorer. Reuses the exact client + cost-tracker + prompt-registry infra we already have for the meme signal. One new prompt file, one fetcher, done.
Decision
Add Claude Vision scoring (option 3). The image signal flow is:
no imageUrl → local stub (score=40, stub=true)
imageUrl → fetch via image-fetcher.ts (SSRF-safe)
fetch OK → Anthropic callTool with emit_image_score + image block
fetch fails → informative stub naming the fetch error code
vision call fails → informative stub (don't take down the aggregate for one signal)
client stub-mode → deterministic stub via defaultStub (hasStubs stays true)
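The flow above can be sketched as one async function with the fetcher and vision call injected. This is a minimal sketch, not the pipeline's code: ImageScorerDeps, scoreImage, callVision, and the reason strings are hypothetical names. Two further assumptions: every stub reuses score 40 (the ADR fixes 40 only for the no-image case), and the stub-mode check runs after the fetch, mirroring the order listed above.

```typescript
// Hypothetical shapes; the real pipeline's names may differ.
type SignalResult = { score: number; stub: boolean; reason?: string };

type FetchOutcome =
  | { ok: true; bytes: Uint8Array; mediaType: string }
  | { ok: false; code: string };

interface ImageScorerDeps {
  fetchImage: (url: string) => Promise<FetchOutcome>; // SSRF-safe fetcher
  callVision: (bytes: Uint8Array, mediaType: string) => Promise<number>; // score from emit_image_score
  stubMode: boolean; // true when the client has no ANTHROPIC_API_KEY
  defaultStub: () => SignalResult; // deterministic stub-mode result
}

// Assumption: 40 is specified only for "no imageUrl"; reused here for every stub.
const STUB_SCORE = 40;

async function scoreImage(imageUrl: string | undefined, deps: ImageScorerDeps): Promise<SignalResult> {
  if (!imageUrl) return { score: STUB_SCORE, stub: true, reason: "submission has no imageUrl" };

  const fetched = await deps.fetchImage(imageUrl);
  if (!fetched.ok) {
    // Informative stub that names the fetcher's error code.
    return { score: STUB_SCORE, stub: true, reason: `image fetch failed: ${fetched.code}` };
  }

  if (deps.stubMode) return deps.defaultStub();

  try {
    const score = await deps.callVision(fetched.bytes, fetched.mediaType);
    return { score, stub: false };
  } catch (err) {
    // One failing signal degrades to a stub instead of failing the aggregate.
    const msg = err instanceof Error ? err.message : String(err);
    return { score: STUB_SCORE, stub: true, reason: `vision call failed: ${msg}` };
  }
}
```

Injecting the dependencies keeps every branch testable without network access; the try/catch around the vision call is what stops one failing signal from taking down the whole scoring request.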
We explicitly do not cache vision responses in C.2.1. Each scoring request is expected to be unique; caching becomes interesting only if we see the same submission twice, which is rare enough to defer.
Why Image Fetching Is Hardened
A naïve fetch(imageUrl) inside the API process is an SSRF primitive. The
fetcher in image-fetcher.ts enforces:
- https: only; no http:, data:, file:, or ftp:
- Hostname must not be localhost, *.localhost, or *.local
- DNS-resolved IPs (and IP literals) must not be in loopback / private / link-local / multicast / test-net ranges (IPv4 + IPv6, including ::ffff: IPv4-mapped addresses)
- redirect: 'error'; a follow-up 3xx can bypass the SSRF check if not pinned
- Content-type allowlist: image/{png,jpeg,gif,webp}
- Hard cap of 5 MiB, enforced against both Content-Length and the actual stream
- 10-second timeout
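The hostname and address rules amount to a predicate applied to every address a hostname resolves to. A minimal sketch, assuming the fetcher resolves the host itself (for example with Node's dns.lookup and { all: true }) and rejects the URL if any returned address is blocked; isBlockedHostname, isBlockedAddress, and the helpers are hypothetical names, and the DNS lookup and connection pinning are left out. It also rejects 0.0.0.0/8 and 240.0.0.0/4, slightly beyond the ranges named in the list.

```typescript
// Hostnames refused before any DNS lookup.
function isBlockedHostname(host: string): boolean {
  const h = host.toLowerCase().replace(/\.$/, ""); // drop a trailing root dot
  return h === "localhost" || h.endsWith(".localhost") || h.endsWith(".local");
}

// Parses a dotted-quad IPv4 literal into four octets, or null if malformed.
function parseIPv4(s: string): number[] | null {
  const parts = s.split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) return null;
    octets.push(Number(p));
  }
  return octets;
}

function isBlockedIPv4(o: number[]): boolean {
  const [a, b, c] = o;
  return (
    a === 0 || // 0.0.0.0/8
    a === 10 || // 10.0.0.0/8 private
    a === 127 || // 127.0.0.0/8 loopback
    (a === 169 && b === 254) || // 169.254.0.0/16 link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
    (a === 192 && b === 168) || // 192.168.0.0/16 private
    (a === 192 && b === 0 && c === 2) || // 192.0.2.0/24 TEST-NET-1
    (a === 198 && b === 51 && c === 100) || // 198.51.100.0/24 TEST-NET-2
    (a === 203 && b === 0 && c === 113) || // 203.0.113.0/24 TEST-NET-3
    a >= 224 // 224.0.0.0/4 multicast, 240.0.0.0/4 reserved
  );
}

// Expands an IPv6 literal (optionally bracketed, with zone id or dotted-quad tail) to 8 hextets.
function expandIPv6(s: string): number[] | null {
  let addr = s.toLowerCase().replace(/^\[|\]$/g, "").split("%")[0];
  const lastColon = addr.lastIndexOf(":");
  const tail = addr.slice(lastColon + 1);
  if (tail.includes(".")) {
    const v4 = parseIPv4(tail);
    if (!v4) return null;
    const hi = ((v4[0] << 8) | v4[1]).toString(16);
    const lo = ((v4[2] << 8) | v4[3]).toString(16);
    addr = `${addr.slice(0, lastColon + 1)}${hi}:${lo}`;
  }
  const halves = addr.split("::");
  if (halves.length > 2) return null;
  const head = halves[0] ? halves[0].split(":") : [];
  const rest = halves.length === 2 && halves[1] ? halves[1].split(":") : [];
  const missing = 8 - head.length - rest.length;
  if (halves.length === 1 ? missing !== 0 : missing < 1) return null;
  const zeros = new Array<string>(halves.length === 2 ? missing : 0).fill("0");
  const groups = [...head, ...zeros, ...rest];
  const out: number[] = [];
  for (const g of groups) {
    if (!/^[0-9a-f]{1,4}$/.test(g)) return null;
    out.push(parseInt(g, 16));
  }
  return out;
}

function isBlockedIPv6(g: number[]): boolean {
  const zeroPrefix = g.slice(0, 5).every((x) => x === 0);
  if (zeroPrefix && g[5] === 0xffff) {
    // ::ffff:a.b.c.d is judged by its embedded IPv4 address.
    return isBlockedIPv4([g[6] >> 8, g[6] & 0xff, g[7] >> 8, g[7] & 0xff]);
  }
  const onlyLastSet = zeroPrefix && g[5] === 0 && g[6] === 0;
  return (
    (onlyLastSet && (g[7] === 0 || g[7] === 1)) || // :: unspecified, ::1 loopback
    (g[0] & 0xffc0) === 0xfe80 || // fe80::/10 link-local
    (g[0] & 0xfe00) === 0xfc00 || // fc00::/7 unique-local (private)
    (g[0] & 0xff00) === 0xff00 || // ff00::/8 multicast
    (g[0] === 0x2001 && g[1] === 0x0db8) // 2001:db8::/32 documentation
  );
}

// True if an IP literal or DNS answer must not be fetched; unparseable input fails closed.
function isBlockedAddress(ip: string): boolean {
  if (ip.includes(":")) {
    const g = expandIPv6(ip);
    return g === null || isBlockedIPv6(g);
  }
  const v4 = parseIPv4(ip);
  return v4 === null || isBlockedIPv4(v4);
}
```

Treating unparseable input as blocked keeps the check fail-closed. The check only helps if the connection then goes to the address that was checked; otherwise a second DNS answer can point elsewhere.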
These guards are enforced before any bytes are handed to Anthropic.
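The content-type and size rules can be sketched as a header preflight plus an accumulator that enforces the 5 MiB cap on the body stream itself. This is a sketch under assumptions: preflight, CappedBuffer, ImageFetchError, and the error codes are hypothetical, and the surrounding request would be something like fetch(url, { redirect: "error", signal: AbortSignal.timeout(10_000) }) with a for await loop pushing each body chunk.

```typescript
const ALLOWED_TYPES = new Set(["image/png", "image/jpeg", "image/gif", "image/webp"]);
const MAX_BYTES = 5 * 1024 * 1024; // 5 MiB

class ImageFetchError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

// Validates response headers before any body bytes are read; returns the bare media type.
function preflight(contentType: string | null, contentLength: string | null): string {
  const mediaType = (contentType ?? "").split(";")[0].trim().toLowerCase();
  if (!ALLOWED_TYPES.has(mediaType)) {
    throw new ImageFetchError("BAD_CONTENT_TYPE", `refusing content-type "${mediaType}"`);
  }
  if (contentLength !== null && Number(contentLength) > MAX_BYTES) {
    throw new ImageFetchError("TOO_LARGE", `declared length ${contentLength} exceeds ${MAX_BYTES}`);
  }
  return mediaType;
}

// Collects body chunks and fails as soon as the running total would pass the cap,
// so a missing or understated Content-Length cannot sneak a larger body through.
class CappedBuffer {
  private chunks: Uint8Array[] = [];
  private total = 0;

  push(chunk: Uint8Array): void {
    const next = this.total + chunk.byteLength;
    if (next > MAX_BYTES) {
      throw new ImageFetchError("TOO_LARGE", `body exceeded ${MAX_BYTES} bytes`);
    }
    this.total = next;
    this.chunks.push(chunk);
  }

  // Concatenates the accepted chunks into one contiguous buffer.
  bytes(): Uint8Array {
    const out = new Uint8Array(this.total);
    let offset = 0;
    for (const c of this.chunks) {
      out.set(c, offset);
      offset += c.byteLength;
    }
    return out;
  }
}
```

The declared Content-Length check rejects honest oversized responses before reading anything; counting streamed bytes catches servers that omit or understate the header.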
Why Stub Fallbacks Instead Of Throwing
A single signal failing must not fail the whole scoring call. Users see a
stubbed image score with a human-readable reason, hasStubs=true flips on,
and the C.3 attestation gate refuses to publish until the cause is fixed.
That keeps the operator feedback loop honest (the UI and attestation both visibly degrade) without burning the 30-second scoring budget on retries.
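A stubbed signal therefore travels through the aggregate as data rather than as an exception. A minimal sketch, assuming each signal result carries a stub flag and an optional reason; Signals, aggregate, and canAttest are hypothetical names, and the weighted aggregate score is omitted because this ADR does not define it.

```typescript
// Hypothetical shapes for the six signals named in this ADR.
type SignalName = "meme" | "image" | "name" | "social" | "creator" | "risk";
interface Signal {
  score: number;
  stub: boolean;
  reason?: string;
}
type Signals = Record<SignalName, Signal>;

interface Aggregate {
  signals: Signals;
  hasStubs: boolean;
  stubReasons: string[]; // human-readable reasons surfaced in the UI
}

function aggregate(signals: Signals): Aggregate {
  const stubReasons = (Object.keys(signals) as SignalName[])
    .filter((k) => signals[k].stub)
    .map((k) => `${k}: ${signals[k].reason ?? "stubbed"}`);
  return { signals, hasStubs: stubReasons.length > 0, stubReasons };
}

// C.3-style gate: refuse to attest while any signal is a stub.
function canAttest(agg: Aggregate): boolean {
  return !agg.hasStubs;
}
```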
Cost Envelope
Per-call cost with Sonnet 4.6:
- Input tokens: ~200 prompt tokens plus the image, which costs about (width × height) / 750 tokens; a 1024×1024 image is ~1,400 image tokens
- Output tokens: ≤40 (small tool payload)
- At Sonnet pricing ($3/$15 per M): ≈ $0.005 per scored image
Compared to the meme signal (~$0.002/call), an image call costs roughly 2.5× as much. That is within the scoring-cost envelope for the hackathon phase; revisit in C.8 if we see sustained traffic.
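The per-call estimate can be reproduced with a few lines of arithmetic. A sketch that hard-codes the figures stated in this section (pixel area / 750 image tokens, ~200 prompt tokens, at most 40 output tokens, $3/$15 per million tokens); it ignores any server-side downscaling of very large images, and the function names are hypothetical.

```typescript
// Figures as stated in this ADR (Sonnet 4.6, USD per million tokens).
const INPUT_USD_PER_MTOK = 3;
const OUTPUT_USD_PER_MTOK = 15;
const PROMPT_TOKENS = 200; // ~200 prompt tokens
const OUTPUT_TOKENS = 40; // upper bound on the tool payload

// Approximate image tokens from pixel area (no downscaling modeled).
function imageTokens(width: number, height: number): number {
  return Math.ceil((width * height) / 750);
}

// Estimated USD cost of one vision scoring call.
function estimateCallUsd(width: number, height: number): number {
  const inputTokens = imageTokens(width, height) + PROMPT_TOKENS;
  return (inputTokens * INPUT_USD_PER_MTOK + OUTPUT_TOKENS * OUTPUT_USD_PER_MTOK) / 1_000_000;
}
```

For a 1024×1024 image this gives 1,399 image tokens and about $0.0054 per call (1,599 input tokens ≈ $0.0048, 40 output tokens ≈ $0.0006), consistent with the ≈ $0.005 figure above.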
Consequences
- The image signal is now live and deterministically stubs only when (a) the submission omits an image, (b) the fetch is blocked or fails, (c) the client is in stub mode (no ANTHROPIC_API_KEY), or (d) the vision call itself errors.
- Four of the six signals are now real (meme, image, name, social); two remain stubbed pending external provider keys (creator → Bitquery, risk → GoPlus).
- The hasStubs attestation gate will only flip off once all signals are real. Green-band scores are reachable before then, but C.3 refuses to write HatchAttest rows while any stub is present.
- The Anthropic client now carries a generic images?: field, usable by future signals that want vision.
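A generic images?: field could translate into a Messages API request like the one below. A minimal sketch, assuming the client assembles the raw request body itself: CallToolOptions, buildToolRequest, and the 256-token default are hypothetical, while the block shapes (type: "image" with a base64 source and media_type, tool_choice: { type: "tool", name }) follow Anthropic's public Messages API. Images go before the text prompt, as Anthropic's vision documentation suggests.

```typescript
// Hypothetical option shape for the client's generic images?: field.
interface ImageInput {
  mediaType: "image/png" | "image/jpeg" | "image/gif" | "image/webp";
  base64: string;
}

interface CallToolOptions {
  model: string;
  prompt: string;
  tool: { name: string; description: string; input_schema: Record<string, unknown> };
  images?: ImageInput[];
  maxTokens?: number;
}

// Builds a request body that forces exactly one call to the given tool.
function buildToolRequest(opts: CallToolOptions) {
  const imageBlocks = (opts.images ?? []).map((img) => ({
    type: "image" as const,
    source: { type: "base64" as const, media_type: img.mediaType, data: img.base64 },
  }));
  return {
    model: opts.model,
    max_tokens: opts.maxTokens ?? 256, // placeholder cap; the tool payload is small
    tools: [opts.tool],
    tool_choice: { type: "tool" as const, name: opts.tool.name },
    messages: [
      {
        role: "user" as const,
        // Image blocks first, then the text prompt.
        content: [...imageBlocks, { type: "text" as const, text: opts.prompt }],
      },
    ],
  };
}
```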
Alternatives Not Picked
- CLIP/self-hosted — deferred to post-hackathon (needs GPU worker + corpus ingest).
- Perplexity Vision / Gemini Vision — would fork the client; no reason to fragment cost-tracking and prompt-versioning across providers in C.2.1.
- Skip image scoring entirely — rejected; image is one of the six signals in the engineering spec §6.2 and UI design expects a per-signal breakdown.