# Spike #6 — idempotent / exactly-once delivery across the broker↔brain boundary

> Date: 2026-06-01. Status: **PASS**.
> Throwaway code: `../spt-spikes/spike-06-idempotent-boundary` (outside the spt-core repo, not shipped).
> Closes ADR-0004 §E.4 (Codex FATAL-adjacent #14). Design input for `REQ-HAZARD-RESTART-IDEMPOTENT` (M3b idempotency task).

## Question

The brain restarts freely (ADR-0004); the broker survives. But a brain can crash *mid-operation* — after a side-effect lands on the wire but before it is acked, or after intent is recorded but before the effect runs. Codex flagged this (#14): without durable IDs + replay rules at every broker↔brain side-effect boundary (spool write, PTY write, transfer, registry update), a restart silently drops or duplicates effects. **Can a side-effect boundary guarantee exactly-once across a crash at any point in the apply protocol?**

## Method

One Rust binary (pure std, deterministic crash injection) with two modes: `worker <crash_spec>` and `run`. The boundary is modeled abstractly: a single durable-ID + WAL + dedup mechanism stands in for all four real boundaries.

Apply protocol per item `N` (the brain logic):
1. `durable_append(JOURNAL, "PENDING N")` — fsync'd intent.
2. effect — **guarded by the durable applied-set**: append `applied N` to `EFFECTS` only if `N` is not already there.
3. `durable_append(JOURNAL, "DONE N")` — fsync'd ack.
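
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the spike's actual code: `durable_append`, `JOURNAL`, and `EFFECTS` come from the text, while `already_applied`, `apply_item`, the `u64` id type, and the exact file layout (one record per line) are assumptions made for the example.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append one record and fsync before returning, so it survives a crash.
fn durable_append(path: &Path, line: &str) -> io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(path)?;
    f.write_all(format!("{line}\n").as_bytes())?;
    f.sync_all() // the fsync the spike assumes is durable
}

/// Hypothetical helper: true if EFFECTS already holds `applied <id>`.
fn already_applied(effects: &Path, id: u64) -> io::Result<bool> {
    let wanted = format!("applied {id}");
    match fs::read_to_string(effects) {
        Ok(text) => Ok(text.lines().any(|l| l == wanted)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}

/// Hypothetical wrapper for steps 1-3. Returns whether the effect ran now.
fn apply_item(journal: &Path, effects: &Path, id: u64) -> io::Result<bool> {
    // 1. fsync'd intent
    durable_append(journal, &format!("PENDING {id}"))?;
    // 2. effect, guarded by the durable applied-set
    let ran = if already_applied(effects, id)? {
        false // dedup: the effect landed in an earlier life
    } else {
        durable_append(effects, &format!("applied {id}"))?;
        true
    };
    // 3. fsync'd ack
    durable_append(journal, &format!("DONE {id}"))?;
    Ok(ran)
}
```

Calling `apply_item` twice for the same id writes `applied N` once; the second call only re-records intent and ack. The sketch rereads the whole effects file on every check, which is fine at this scale but would presumably be replaced by an in-memory set loaded at recovery.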

On restart the worker calls `recover()` on the **broker-owned** durable files (journal + effects survive the crash) and resumes: it skips `DONE` items and re-attempts any item that is `PENDING` without a matching `DONE`. The two halves of exactly-once:
- **NO DROP** — journal-driven resume re-attempts any un-acked item.
- **NO DUP** — the dedup-by-ID guard makes re-applying an already-applied ID a no-op (the `after_effect` trap: effect landed, ack didn't).
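
As an illustration of the resume rule, recovery can be written as a pure function over the contents of the two files. Only the name `recover` and the record formats (`PENDING N`, `DONE N`, `applied N`) come from the text; the `Recovered` struct, `ids_with_prefix`, and `todo` are hypothetical names for this sketch, and the real `recover()` may look quite different.

```rust
use std::collections::BTreeSet;

/// What a restarted worker learns from the broker-owned files (assumed shape).
#[derive(Debug, PartialEq)]
struct Recovered {
    done: BTreeSet<u64>,    // acked: skip
    pending: BTreeSet<u64>, // intent recorded, no ack: re-attempt
    applied: BTreeSet<u64>, // effect already on disk: feeds the dedup guard
}

/// Collect the ids of all lines of the form "<prefix><id>".
fn ids_with_prefix(text: &str, prefix: &str) -> BTreeSet<u64> {
    text.lines()
        .filter_map(|l| l.strip_prefix(prefix))
        .filter_map(|rest| rest.trim().parse::<u64>().ok())
        .collect()
}

fn recover(journal: &str, effects: &str) -> Recovered {
    let done = ids_with_prefix(journal, "DONE ");
    let pending: BTreeSet<u64> = ids_with_prefix(journal, "PENDING ")
        .difference(&done)
        .copied()
        .collect();
    let applied = ids_with_prefix(effects, "applied ");
    Recovered { done, pending, applied }
}

/// Items still owed after recovery: every id in 0..total without a DONE.
/// This covers both PENDING-without-DONE items and items never started.
fn todo(r: &Recovered, total: u64) -> Vec<u64> {
    (0..total).filter(|id| !r.done.contains(id)).collect()
}
```

For a journal of `PENDING 0`, `DONE 0`, `PENDING 1` and effects of `applied 0`, `applied 1` (the `after_effect:1` case), this yields `pending = {1}` and `applied = {0, 1}`: item 1 is re-attempted, and the applied-set is what stops the re-attempt from duplicating the effect.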

Crash points injected: `before_pending:N`, `before_effect:N`, `after_effect:N`. The `run` gauntlet hits every dangerous window across `TOTAL=12` items, including the no-dup trap and repeated crashes around the same id (`after_effect:5` then `before_pending:5`).
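
For illustration, the crash specs could be parsed and matched along these lines. The three point names and the `name:N` / `none` spellings are taken from this report; the types `CrashPoint` and `CrashSpec` and the functions `parse_crash_spec` and `should_crash` are hypothetical, and how the worker actually terminates when a spec matches is not shown here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CrashPoint {
    BeforePending,
    BeforeEffect,
    AfterEffect,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CrashSpec {
    point: CrashPoint,
    id: u64,
}

/// Parses "none" or "<point>:<id>", e.g. "after_effect:5".
fn parse_crash_spec(s: &str) -> Result<Option<CrashSpec>, String> {
    if s == "none" {
        return Ok(None);
    }
    let (name, id) = s
        .split_once(':')
        .ok_or_else(|| format!("missing ':' in {s:?}"))?;
    let point = match name {
        "before_pending" => CrashPoint::BeforePending,
        "before_effect" => CrashPoint::BeforeEffect,
        "after_effect" => CrashPoint::AfterEffect,
        other => return Err(format!("unknown crash point {other:?}")),
    };
    let id = id
        .parse::<u64>()
        .map_err(|e| format!("bad id in {s:?}: {e}"))?;
    Ok(Some(CrashSpec { point, id }))
}

/// The worker would consult this at each of the three windows for item `id`
/// and terminate abruptly when it returns true (assumed behaviour).
fn should_crash(spec: Option<CrashSpec>, at: CrashPoint, id: u64) -> bool {
    spec == Some(CrashSpec { point: at, id })
}
```

Because the spec names both the window and the id, each crash is deterministic and repeatable, which is what lets the gauntlet target the same id several times in a row.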

Invariants: **NO DUP** (each id applied at most once) · **NO DROP** (every id in `0..TOTAL` applied) · **EXACTLY-ONCE** (both hold across the full gauntlet).
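
A check of these invariants over the list of applied ids might look like the sketch below. The `Verdict` type and `check` function are hypothetical names for this example; the spike's own checker is not shown in this report.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
struct Verdict {
    no_dup: bool,  // each id applied at most once
    no_drop: bool, // every id in 0..total applied
}

impl Verdict {
    fn exactly_once(&self) -> bool {
        self.no_dup && self.no_drop
    }
}

/// `applied` is the sequence of ids read back from the effects file.
fn check(applied: &[u64], total: u64) -> Verdict {
    let mut seen = HashSet::new();
    // `insert` returns false on a repeat, so `all` fails at the first duplicate.
    let no_dup = applied.iter().all(|id| seen.insert(*id));
    // Checked against `applied` directly, since `seen` may be incomplete
    // if the duplicate scan stopped early.
    let no_drop = (0..total).all(|id| applied.contains(&id));
    Verdict { no_dup, no_drop }
}
```

With `total = 3`, `[0, 1, 1, 2]` fails NO DUP but passes NO DROP, and `[0, 2]` does the reverse; only a list containing each of `0, 1, 2` exactly once satisfies `exactly_once()`.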

## Result

```
gauntlet: before_pending:2, before_effect:5, after_effect:5, before_pending:5,
          before_effect:8, after_effect:8, before_effect:11, none
applied (order): [0,1,2,3,4,5,6,7,8,9,10,11]   count 12 / 12
dedup fired on id 5 and id 8 (after_effect crash → effect on disk, replay skipped re-apply)
NO DUP PASS  NO DROP PASS  EXACTLY-ONCE PASS  →  OVERALL PASS ✅
```

The dedup path fired exactly where designed: after a crash with the effect written but the ack missing, the next life saw the id in the applied-set and skipped the effect — no duplicate — then wrote the ack and moved on. No item was lost despite seven crashes.

## Key finding — the recovery anchor must be broker-owned, and the effect must be the dedup point

Two design constraints fall out, both binding on the M3b idempotency task:
1. **The journal + effects (the durable applied-set) are broker-owned state**, not brain state: they are precisely what must survive the crash being recovered from. A brain-local journal would die with the brain.
2. **Dedup must live at the effect boundary, keyed by the durable ID** — not at intake. The dangerous window is *effect-applied-but-not-acked*; only an applied-set check at the effect closes it. `PENDING`/`DONE` ordering gives no-drop; the applied-set guard gives no-dup. Both are required; neither alone is exactly-once.
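
The second constraint can be illustrated with a small in-memory simulation that replays the same crash schedule under three guard placements. Everything here is a toy model built for this example (the `Guard` and `Crash` enums, the `run` function, and in particular the reading of "dedup at intake" as "skip any id whose `PENDING` was already recorded"); vectors stand in for the durable files.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Guard {
    None,     // no dedup at all
    AtIntake, // assumed model: skip ids already seen as PENDING
    AtEffect, // skip the effect if the id is in the applied-set
}

#[derive(Clone, Copy, PartialEq)]
enum Crash {
    BeforeEffect(u64),
    AfterEffect(u64),
}

/// Runs worker lives (one scheduled crash each) until a life finishes.
/// Returns the applied ids in the order the effects landed.
fn run(total: u64, guard: Guard, crashes: &[Crash]) -> Vec<u64> {
    // These three vectors play the role of the durable, broker-owned state.
    let (mut pending, mut done, mut applied) = (Vec::new(), Vec::new(), Vec::new());
    let mut schedule = crashes.iter().copied();
    loop {
        let crash = schedule.next(); // None means a clean life
        let mut crashed = false;
        for id in 0..total {
            if done.contains(&id) {
                continue; // acked in an earlier life
            }
            if guard == Guard::AtIntake && pending.contains(&id) {
                continue; // intake dedup wrongly treats "seen" as "handled"
            }
            pending.push(id);
            if crash == Some(Crash::BeforeEffect(id)) {
                crashed = true;
                break;
            }
            if !(guard == Guard::AtEffect && applied.contains(&id)) {
                applied.push(id);
            }
            if crash == Some(Crash::AfterEffect(id)) {
                crashed = true;
                break;
            }
            done.push(id);
        }
        if !crashed {
            return applied;
        }
    }
}
```

With four items and the schedule `[AfterEffect(1), BeforeEffect(2)]`, the unguarded variant applies id 1 twice, the intake-keyed variant never applies id 2, and only the effect-keyed guard yields `[0, 1, 2, 3]`.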

## What this does NOT yet prove

- Real side-effects are not all append-with-readback: a PTY write or a network send cannot be "checked then applied" atomically against a durable set the way a file append can. M3b must map each of the four real boundaries (spool / PTY write / transfer / registry) onto this pattern and decide where the durable applied-set lives for each. This spike validates the *protocol shape*; the per-boundary instantiation is M3b task B5.
- fsync durability is assumed (`sync_all`); torn-write / power-loss semantics below the FS are out of scope.

## Verdict

ADR-0004 §E.4 closes **PASS**. Durable-ID + WAL + dedup-at-effect is sufficient for exactly-once across a brain crash at any protocol point. M3b is unblocked on this gate. Carry the two design constraints (broker-owned recovery anchor; dedup-at-effect-keyed-by-ID) into `REQ-HAZARD-RESTART-IDEMPOTENT` / task B5.
