ADR 0005 — Claude Vision for the `image` signal (Sprint C.2.1)

Date: 2026-04-17
Status: Accepted

Context

After C.2, five of six scoring signals had real compute paths; image remained a placeholder that returned a score from URL presence alone. We need a real image-originality signal before the pipeline can emit a non-stub aggregate (and therefore attest on-chain, per C.3's hasStubs gate).

Options considered:

1. Self-hosted CLIP / image embedding + nearest-neighbor against a meme corpus. Best long-term signal quality but requires infra we don't yet have (GPU worker, corpus ingest, refresh cadence). Out of scope for C.2.1.
2. Third-party "originality" API (e.g. reverse image search). Ops cost, rate limits, and opaque scoring make this a poor fit for a deterministic audit trail.
3. Claude Vision (Sonnet 4.6) with a structured tool-use scorer. Reuses the exact client + cost-tracker + prompt-registry infra we already have for the meme signal. One new prompt file, one fetcher, done.

Decision

Add Claude Vision scoring (option 3). The image signal flow is:

no imageUrl        → local stub (score=40, stub=true)
imageUrl           → fetch via image-fetcher.ts (SSRF-safe)
fetch OK           → Anthropic callTool with emit_image_score + image block
fetch fails        → informative stub naming the fetch error code
vision call fails  → informative stub (don't take down the aggregate for one signal)
client stub-mode   → deterministic stub via defaultStub (hasStubs stays true)
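
The flow above can be sketched as a single function with its collaborators injected. This is a minimal sketch, not the actual implementation: the `Deps` interface, `scoreImageSignal`, `stub`, and the reason strings are hypothetical names, `fetchImage` stands in for image-fetcher.ts, and `scoreWithVision` stands in for the callTool + emit_image_score call. Reusing score=40 for every stub and checking stub mode before the fetch are both assumptions.

```typescript
// Hypothetical types and names; only the branching comes from the flow above.
type SignalResult = { score: number; stub: boolean; reason?: string };

interface Deps {
  stubMode: boolean; // true when the client has no API key
  fetchImage: (url: string) => Promise<Uint8Array>; // stands in for image-fetcher.ts
  scoreWithVision: (image: Uint8Array) => Promise<number>; // stands in for callTool
}

// The ADR gives 40 for the missing-image stub; using it for all stubs is an assumption.
const STUB_SCORE = 40;

function stub(reason: string): SignalResult {
  return { score: STUB_SCORE, stub: true, reason };
}

function errorCode(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

async function scoreImageSignal(
  imageUrl: string | undefined,
  deps: Deps,
): Promise<SignalResult> {
  if (!imageUrl) return stub("no imageUrl");
  if (deps.stubMode) return stub("client in stub mode");

  let bytes: Uint8Array;
  try {
    bytes = await deps.fetchImage(imageUrl);
  } catch (err) {
    // A blocked or failed fetch degrades to a stub that names the error.
    return stub(`image fetch failed: ${errorCode(err)}`);
  }

  try {
    const score = await deps.scoreWithVision(bytes);
    return { score, stub: false };
  } catch (err) {
    // One failing signal must not take down the aggregate.
    return stub(`vision call failed: ${errorCode(err)}`);
  }
}
```

Every failure path returns a result rather than throwing, so the caller can aggregate all six signals without its own error handling for this one.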

We explicitly do not cache vision responses in C.2.1. Each scoring request is expected to be unique; caching becomes interesting only if we see the same submission twice, which is rare enough to defer.

Why Image Fetching Is Hardened

A naïve fetch(imageUrl) inside the API process is an SSRF primitive. The fetcher in image-fetcher.ts enforces:

https:// only — no http, data:, file:, ftp:
Hostname must not be localhost / *.localhost / *.local
DNS-resolved IP (and IP literals) must not be in loopback / private / link-local / multicast / test-net ranges (IPv4 + IPv6, incl. ::ffff: maps)
redirect: 'error' (a followed 3xx could otherwise land on a host that never went through the SSRF checks)
Content-type allowlist: image/{png,jpeg,gif,webp}
Hard cap of 5 MiB, enforced against both Content-Length and actual stream
10-second timeout

These guards are enforced before any bytes are handed to Anthropic.
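
As an illustration of the scheme, hostname, and IP-literal guards, here is a minimal sketch. It is not the contents of image-fetcher.ts: `parseIPv4`, `isBlockedIPv4`, `checkImageUrl`, and the error-code strings are hypothetical, the range table covers only IPv4 and is not exhaustive, and the sketch does not cover DNS resolution, IPv6, redirects, content type, size, or timeout.

```typescript
// Parse a dotted-quad IPv4 string into an unsigned integer, or null if malformed.
function parseIPv4(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// [network, prefix length] pairs for the IPv4 ranges named in the list above.
const BLOCKED_V4: Array<[string, number]> = [
  ["10.0.0.0", 8], // private
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local
  ["172.16.0.0", 12], // private
  ["192.0.2.0", 24], // test-net
  ["192.168.0.0", 16], // private
  ["198.51.100.0", 24], // test-net
  ["203.0.113.0", 24], // test-net
  ["224.0.0.0", 4], // multicast
];

function isBlockedIPv4(ip: string): boolean {
  const addr = parseIPv4(ip);
  if (addr === null) return true; // fail closed on anything unparseable
  return BLOCKED_V4.some(([network, prefix]) => {
    const base = parseIPv4(network) as number;
    const size = 2 ** (32 - prefix);
    return addr >= base && addr < base + size;
  });
}

// Returns a hypothetical error code, or null when the URL passes these checks.
function checkImageUrl(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return "invalid_url";
  }
  if (url.protocol !== "https:") return "scheme_not_allowed";
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost") || host.endsWith(".local")) {
    return "hostname_blocked";
  }
  if (parseIPv4(host) !== null && isBlockedIPv4(host)) return "ip_blocked";
  return null;
}
```

Passing these checks is necessary but not sufficient: a public-looking hostname can still resolve to a private address, which is why the list above also requires checking the DNS-resolved IP.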

Why Stub Fallbacks Instead Of Throwing

A single failing signal must not fail the whole scoring call. Users see a stubbed image score with a human-readable reason, hasStubs flips to true, and the C.3 attestation gate refuses to publish until the cause is fixed.

That keeps the operator feedback loop honest (the UI and attestation both visibly degrade) without burning the 30-second scoring budget on retries.

Cost Envelope

Per-call cost with Sonnet 4.6:

Input tokens: image tokens (≈ width × height / 750) + ~200 prompt tokens; a 1024×1024 image is ~1400 image tokens
Output tokens: ≤40 (small tool payload)
At Sonnet pricing ($3/$15 per M): ≈ $0.005 per scored image

Compared to the meme signal (~$0.002/call), the image signal costs roughly 2.5× as much per call. That is within the scoring-cost envelope for the hackathon phase; revisit in C.8 if we see sustained traffic.
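
The per-call figure can be reproduced with a few lines of arithmetic. This sketch assumes image tokens ≈ width × height / 750, which matches the ~1400 figure quoted for a 1024×1024 image. The function names are hypothetical; the prompt size, output size, and prices are the estimates given in this section.

```typescript
// Prices in USD per million tokens, from the "$3/$15 per M" figure above.
const INPUT_USD_PER_M = 3;
const OUTPUT_USD_PER_M = 15;

// Approximate image token count, assuming width * height / 750.
function imageTokens(widthPx: number, heightPx: number): number {
  return Math.ceil((widthPx * heightPx) / 750);
}

// Estimated cost of one vision scoring call in USD.
function estimateCallCostUsd(
  widthPx: number,
  heightPx: number,
  promptTokens = 200, // ~200 prompt tokens, per the estimate above
  outputTokens = 40, // upper bound on the tool payload, per the estimate above
): number {
  const inputTokens = imageTokens(widthPx, heightPx) + promptTokens;
  return (inputTokens * INPUT_USD_PER_M + outputTokens * OUTPUT_USD_PER_M) / 1e6;
}

console.log(imageTokens(1024, 1024)); // 1399
console.log(estimateCallCostUsd(1024, 1024).toFixed(4)); // "0.0054"
```

Input tokens dominate: roughly $0.0048 of the total comes from the image and prompt, and about $0.0006 from the output.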

Consequences

The image signal is now live and deterministically stubs only when (a) the submission omits an image, (b) the fetch is blocked/fails, (c) the client is in stub mode (no ANTHROPIC_API_KEY), or (d) the vision call itself errors.
Four of the six signals are now real (meme, image, name, social); two remain stubbed pending external provider keys (creator → Bitquery, risk → GoPlus).
The hasStubs attestation gate will only flip off once all signals are real. Green-band scores are reachable before then, but C.3 refuses to write HatchAttest rows while any stub is present.
The Anthropic client now carries a generic images?: field, usable by future signals that want vision.
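
The gating rule in the consequences above amounts to "attest only when no signal is stubbed". A minimal sketch, assuming each signal result carries a boolean stub flag; the type names and `canAttest` are hypothetical, while the six signal names are the ones listed above.

```typescript
type SignalName = "meme" | "image" | "name" | "social" | "creator" | "risk";

interface SignalResult {
  score: number;
  stub: boolean;
}

type SignalResults = Record<SignalName, SignalResult>;

// True when at least one signal fell back to a stub.
function hasStubs(results: SignalResults): boolean {
  return (Object.keys(results) as SignalName[]).some((name) => results[name].stub);
}

// Hypothetical gate: attestation is allowed only when every signal is real.
function canAttest(results: SignalResults): boolean {
  return !hasStubs(results);
}
```

With creator and risk still stubbed, `canAttest` stays false no matter how high the aggregate score is.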

Alternatives Not Picked

CLIP/self-hosted — deferred to post-hackathon (needs GPU worker + corpus ingest).
Perplexity Vision / Gemini Vision — would fork the client; no reason to fragment cost-tracking and prompt-versioning across providers in C.2.1.
Skip image scoring entirely — rejected; image is one of the six signals in the engineering spec §6.2 and UI design expects a per-signal breakdown.
