# Restoration D4 — multi-session cold-start resume; retire the `BrainState` message (task plan)

> Working doc for RESTORATION-PLAN.md **D4** (ADR-0018 Q6). D1 (process split),
> D2 (loop relocation), D3 (supervision anchor + real brain-process update
> trigger) are DONE + cross-OS CI-green on main (348a739). D3-3 made `apply`
> trigger a real brain-process restart (`request_brain_restart` → supervisor
> respawn); the outgoing brain is **gone** before the new one starts, so
> continuity can no longer ride a brain→brain frame — it must come from the
> persistent side (the broker). D4 builds that.
>
> **Scope correction (carried from the D3-3 close + the 2026-06-09 commune).**
> The master plan's D4 text points at `applyhost.rs:239` re-attaching
> `sessions.first()` with `from_seq=0` — that was the **in-process apply-path**
> handoff, **deleted in D3-3**. D4 is therefore re-scoped onto the **brain
> `cold_start` path**: the new brain the *supervisor* spawns (crash **or**
> update respawn) reconstructs continuity by querying the broker, not via a
> handoff message. Same Q6 invariant, correct seam.

## Goal (D4-close invariant)

A brain spawned by the broker's supervisor — for an update cycle (D3-3) **or** a
crash recovery (D1) — comes up via `Brain::cold_start` and reconstructs **all**
session continuity by querying the broker. Concretely at D4-close:

- The **broker is cursor-of-record**: per hosted session it tracks the
  last-**delivered** output cursor (the high-water seq it has written to a
  subscriber), and that cursor **survives subscriber detach** — it lives on the
  broker's `OutputLog`, not in the (now-dead) brain. Today the cursor lives only
  in the brain's `next_seq` and dies with the brain.
- A cold-starting brain queries the broker for **every** hosted session and its
  resume cursor, and re-attaches **each** in **resume** mode. That fixes today's
  two latent gaps the moment daemon-hosted sessions land: (1) a single-session
  `Brain` that could carry only one session, and (2) a re-subscribe from
  `from_seq=0` (full ring re-replay → duplicate output).
- **Output is at-least-once** (broker resumes from last-*sent*; a boundary chunk
  may re-send → a dup the SPIKE-05 terminal-stream contract already tolerates);
  **input/effects stay exactly-once** via the already-broker-owned
  `EffectJournal` (B5, unchanged).
- The **`BrainState` handoff *message* is retired from the production path** for
  good — the new brain cold-starts and reconstructs from the broker. `BrainState`
  / `Brain::handoff` / `Brain::snapshot` stay `pub` and compiled **for the
  integration tests only** (handoff.rs, daemon_e2e.rs, idempotent.rs, attach.rs,
  brain_swap.rs, update.rs's test-only `apply_brain_only`).

## What is already satisfied (don't re-build)

- **The production path already stopped calling the handoff frame (D3-3).**
  `apply_staged` (applyhost.rs) triggers `request_brain_restart`; the supervisor
  respawns from `current_exe()`. Grep confirms `Brain::handoff` / `.snapshot()` /
  `BrainState` have **zero non-test callers** (`update.rs::apply_brain_only` is
  itself reached only by tests). So D4-3 is **confirm + lock + document**, not a
  removal — the wire is already cut; D4-3 nails it shut and makes the
  "production resumes via cold_start, never the frame" invariant falsifiable.
- **The broker already buffers + replays per session.** `OutputLog` holds the
  bounded `ring` + `next_seq` and `attach(sub, from_seq)` already replays from a
  cursor (broker.rs:103-129). D4-1 adds the **persisted delivered-cursor**; it
  does not rebuild the ring or the replay path.
- **The brain already does per-key cursor dedup** for net streams
  (`net_cursors: HashMap<stream_id,u64>`) and presence — the exact discipline
  (accept-next / drop-dup / reject-gap, brain.rs:418-474) D4-2 generalizes to
  **per-session** output. The single-session `next_seq` path stays the legacy
  spawn/drive shape; the multi-session map is the resume shape.
- **`net-status`-derived loops already cold-restart correctly (D2/D3-3).** The
  supervised daemon brain (`run_brain`) today drives **no PTY sessions** — only
  net-consumers + shellwake, both re-derived from disk / `net-status` on start.
  So `run_brain` calling `resume_sessions()` is a **no-op today** (zero hosted
  sessions) and **forward-correct** when daemon-hosted sessions arrive with the
  live-agent adapter. D4 builds + **proves the capability with a real
  multi-session harness now** (activate-don't-pre-fail: the machinery is real and
  exercised, not dead code waiting on the adapter).
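
A minimal sketch of the replay behavior the second bullet relies on, assuming a `VecDeque`-backed ring. Only `ring` and `next_seq` are names from this plan; `RingLog`, `cap`, and `replay_seqs` are hypothetical stand-ins, not the code in broker.rs. It shows why a `filter seq >= from_seq` replay starts at the ring floor on its own when the requested cursor has already been evicted.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the broker's per-session log.
struct RingLog {
    ring: VecDeque<(u64, Vec<u8>)>, // (seq, chunk), oldest first
    next_seq: u64,                  // seq the next appended chunk receives
    cap: usize,                     // bound on retained chunks (assumed)
}

impl RingLog {
    fn new(cap: usize) -> Self {
        Self { ring: VecDeque::new(), next_seq: 0, cap }
    }

    /// Append a chunk and evict the oldest one once the bound is exceeded.
    fn append(&mut self, chunk: &[u8]) -> u64 {
        let seq = self.next_seq;
        self.ring.push_back((seq, chunk.to_vec()));
        self.next_seq += 1;
        if self.ring.len() > self.cap {
            self.ring.pop_front();
        }
        seq
    }

    /// The seqs a replay from `from_seq` would send, in order.
    fn replay_seqs(&self, from_seq: u64) -> Vec<u64> {
        self.ring
            .iter()
            .map(|(seq, _)| *seq)
            .filter(|seq| *seq >= from_seq)
            .collect()
    }
}

fn main() {
    let mut log = RingLog::new(4);
    for i in 0..10u8 {
        log.append(&[i]);
    }
    // Cursor still inside the ring: replay starts exactly at the cursor.
    assert_eq!(log.replay_seqs(8), vec![8, 9]);
    // Cursor below the ring floor (seqs 0..=5 evicted): replay starts at the floor.
    assert_eq!(log.replay_seqs(2), vec![6, 7, 8, 9]);
}
```

The second assert is the floor-jump the brain-side cursor has to tolerate: the first replayed seq (6) is greater than the requested cursor (2).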

## Per-commit discipline

Each sub-task is its own atomic commit with evidence tagged in-commit. Gates on
every commit: `cargo build` · `cargo test` · `cargo clippy` ·
`cargo build --no-default-features` · `traceable-reqs check` (EXIT=0) ·
`xtask check`. Push to a dev-freeform branch → **CI both runners** before any tag.

---

## D4-1 — Broker becomes cursor-of-record (per-session delivered cursor) · Q6  ✅ DONE (fea8eaa)

The persistent side gains memory of *where each subscriber was* so a cold brain
can ask, instead of re-replaying from 0.

- **`OutputLog` gains `delivered_through: u64`** — the next seq the subscriber
  still needs (= highest seq **successfully written** to a subscriber, +1).
  **Survives `detach_if`** (it is log state, not subscriber state), so a dead
  brain leaves its cursor behind; the resume query reads `delivered_through`.
- **[doyle amendment 1 — advance on `Ok(write)` ONLY.]** Both send sites swallow
  the write error today (`let _ = write_frame(...)` at `append` broker.rs:113 and
  the `attach` replay broker.rs:126). `delivered_through` must advance for a chunk
  **only when its socket write returned `Ok`** — a failed write to a
  dying-but-not-yet-detached subscriber must leave the cursor at the last success
  (and may detach). Advancing past a failed write would make resume skip that
  chunk **forever**: at-least-once silently degrades to **at-most-once**, the
  exact failure class Q6 exists to kill. (Plan-wide: read "after a live-send" as
  "after a **successful** live-send.")
- **[doyle amendment 2 — replay tail, same root.]** `attach`'s replay sets
  `delivered_through` to the **last-successfully-written seq +1**, *not*
  unconditionally to the ring tail — a subscriber dying mid-replay must not mark
  the unsent tail delivered.
- **[doyle amendment 3 — monotonic.]** `delivered_through` **never decreases**: a
  deliberate `attach(sub, 0)` (the legacy spawn pre-attach at broker.rs:444, or
  any rewind-replay) re-sends chunks but advances the cursor by **monotonic max**
  only — so an attach-from-0 can never *reset* the resume cursor (a regression
  guard, one assert).
- A fresh `attach(sub, from_seq)` still honors the caller's `from_seq` for what to
  **replay** (the spawn auto-attach at `from_seq=0` is unchanged); only the
  *resume cursor* obeys the monotonic-max + Ok-only rules above.
  - *At-least-once boundary:* `delivered_through` is "successfully written to the
    socket," not "acked by the brain" — a resume re-sends from there, so the
    boundary chunk can repeat. That is the at-least-once contract (SPIKE-05);
    exactly-once would need a per-chunk ack the terminal stream doesn't make.
- **`SessionInfo` gains an additive `resume_seq`** carrying the log's
  `delivered_through`; `SessionsReply` surfaces it. **KH-2.3 forward/back
  tolerant, `#[serde(default)]`:**
  - **new brain × old broker (N-1 steady state):** old broker's `SessionsReply`
    has no `resume_seq` → serde defaults 0 → the new brain resumes from 0 (full
    replay = safe, dup-only) — never a parse reject.
  - **old brain × new broker:** the old brain reads `sessions()` for the endpoint
    only and ignores the extra field.
- **No change to the spawn/drive/exit paths** — only the resume-query surface
  grows. `KIND_SESSIONS` already exists (broker.rs:366); D4-1 enriches its reply.
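
The three amendments can be sketched together as follows. This is an illustration under stated assumptions, not the broker.rs implementation: `delivered_through` is the plan's field name, while the generic subscriber `W`, the unbounded `chunks` vector, the private `send` helper, and the `FailAfter` test writer are hypothetical. The sketch also assumes a failed write detaches the subscriber, which the plan only says it "may" do.

```rust
use std::io::{self, Write};

/// Hypothetical log; the ring bound is omitted so that index == seq.
struct OutputLog<W: Write> {
    chunks: Vec<Vec<u8>>,
    delivered_through: u64, // next seq the subscriber still needs
    sub: Option<W>,
}

impl<W: Write> OutputLog<W> {
    fn new() -> Self {
        Self { chunks: Vec::new(), delivered_through: 0, sub: None }
    }

    /// Write one chunk. The cursor moves only on `Ok`, and only forward.
    fn send(&mut self, seq: u64) -> bool {
        let Some(sub) = self.sub.as_mut() else { return false };
        match sub.write_all(&self.chunks[seq as usize]) {
            Ok(()) => {
                // Amendment 3: monotonic max, so a rewind-replay never resets it.
                self.delivered_through = self.delivered_through.max(seq + 1);
                true
            }
            Err(_) => {
                // Amendment 1: cursor stays at the last success; detach (assumed).
                self.sub = None;
                false
            }
        }
    }

    /// Live send: log the chunk, then try to deliver it.
    fn append(&mut self, chunk: &[u8]) {
        self.chunks.push(chunk.to_vec());
        let seq = self.chunks.len() as u64 - 1;
        self.send(seq);
    }

    /// Replay from the caller's `from_seq`, stopping at the first failed write
    /// so the unsent tail is never marked delivered (amendment 2).
    fn attach(&mut self, sub: W, from_seq: u64) {
        self.sub = Some(sub);
        for seq in from_seq..self.chunks.len() as u64 {
            if !self.send(seq) {
                break;
            }
        }
    }
}

/// Test writer that accepts `ok_writes` writes, then fails like a dead socket.
struct FailAfter {
    ok_writes: usize,
    sink: Vec<u8>,
}

impl Write for FailAfter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.ok_writes == 0 {
            return Err(io::Error::new(io::ErrorKind::BrokenPipe, "subscriber gone"));
        }
        self.ok_writes -= 1;
        self.sink.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() {
    let mut log = OutputLog::new();
    log.attach(FailAfter { ok_writes: 2, sink: Vec::new() }, 0);
    for chunk in [b"a", b"b", b"c", b"d"] {
        log.append(chunk);
    }
    assert_eq!(log.delivered_through, 2); // "c" failed, "d" had no subscriber

    let resume = log.delivered_through;
    log.attach(FailAfter { ok_writes: 10, sink: Vec::new() }, resume);
    assert_eq!(log.delivered_through, 4); // "c" and "d" replayed, nothing skipped

    log.attach(FailAfter { ok_writes: 10, sink: Vec::new() }, 0);
    assert_eq!(log.delivered_through, 4); // attach-from-0 re-sends, cursor holds
}
```

Had `append` advanced the cursor unconditionally, the first assert would read 4 and the resume would skip "c" and "d" for good, which is the at-most-once degradation amendment 1 rules out.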

Evidence: `[impl->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` (broker custody of the
delivered cursor) · `[unit->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` (cursor advances
on a **successful** live-send + on replay, **persists across detach**, is
**monotonic** under `attach(sub,0)`, is surfaced in `SessionsReply`; **and the
amendment-1 case — kill the subscriber socket mid-stream → the cursor freezes at
the last successfully-written seq → a fresh resume replays from there, nothing
skipped**) · `[unit->REQ-HAZARD-HANDOFF-ARGV-COMPAT]` (the additive `resume_seq`:
present, absent→default-0, ignored-by-old-reader). Both REQs are already
`[…impl,unit]` active — **no toml change**.

---

## D4-2 — Brain multi-session cold-start resume · Q6, REQ-DAEMON-2  ✅ DONE (08afcb9)

The cold brain reconstructs **all** sessions from the broker's cursors.

- **Per-session output cursors.** Add `session_cursors: HashMap<u64,u64>` to
  `Brain`. When populated (resume mode), `read_event`'s `KIND_OUTPUT` branch keys
  the dedup **per session_id** instead of the single `next_seq`. When empty (the
  legacy single-session spawn/drive path), it falls back to `next_seq`
  **byte-for-byte** — strict accept-next / drop-dup / **reject-gap** unchanged —
  so the many single-session tests (handoff/idempotent/attach) are untouched.
- **[doyle D4-2 discipline — dedup-below + snap-above, seeded from `resume_seq`;
  NO reject-gap.]** The resume-mode session cursor is **neither** the verbatim
  `net_cursors` `or_insert(ev.seq)` **nor** the strict `next_seq` gap-reject — both
  break one of the harness asserts:
  - *Pure `or_insert(ev.seq)`* (seed from first received) would seed the cursor
    **at** the broker's at-least-once boundary re-send and emit it → fails the
    **no-duplicate** assert.
  - *Seed `resume_seq` + strict reject-gap* would **gap-die** when the broker's
    ring-floor clamp replays `first-seq > resume_seq` after a long outage → fails
    the **clamp** case (the clamp bites brain-side: broker replay clamps naturally
    via `filter seq >= from_seq`, but a strict brain cursor rejects the floor-jump).
  - **The hybrid that satisfies both:** at `subscribe`, seed
    `session_cursors[id] = resume_seq`; on each `KIND_OUTPUT`, **drop `seq <
    cursor`** (dedups the boundary re-send → no-dup) and **accept + snap `seq >=
    cursor`** setting `cursor = seq + 1` (covers the contiguous case **and** the
    post-eviction floor-jump → clamp). **No reject-gap branch** for resume-mode
    session output — the broker's `Mutex<OutputLog>`-held replay cannot reorder,
    so a post-eviction floor-jump is the only legitimate forward jump and must be
    accepted, not rejected. Evicted tail lost only; live stream survives.
- **`Brain::resume_sessions(&mut self)`** — the cold-start continuity primitive:
  call `sessions()`, and for each returned `SessionInfo` `subscribe(session_id,
  resume_seq)` and seed `session_cursors[session_id] = resume_seq`. Replaces the
  master plan's stale "re-attach `sessions.first()` from 0" with "re-attach **all**
  from each session's broker cursor." Idempotent + bounded by the broker's hosted
  set; an empty set is a clean no-op.
- **`run_brain` calls it once** after `cold_start`, before the idle heartbeat
  loop (brainproc.rs:147-201). Today: zero hosted sessions → no-op. Forward: a
  brain restart re-attaches every daemon-hosted session gaplessly. The
  `start-reason` (D3-2) is **not** needed here — cold/crash/update all resume the
  same way from the broker (the broker is authoritative regardless of why the
  brain restarted).
- **Input/effects unchanged** — exactly-once still rides the broker-owned
  `EffectJournal` op-id dedup; resume only re-establishes the **output**
  subscription + cursor.
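
The dedup-below + snap-above rule is small enough to state as code. A minimal sketch, assuming the cursor map is keyed by session id as described above; `SessionCursors`, `Verdict`, `seed`, and `on_output` are hypothetical names, and treating an unseeded session as cursor 0 is an assumption of this sketch, not something the plan specifies.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Verdict {
    Emit,    // seq >= cursor: accept and snap the cursor to seq + 1
    DropDup, // seq < cursor: already seen, drop silently
}

#[derive(Default)]
struct SessionCursors {
    cursors: HashMap<u64, u64>, // session_id -> next seq expected
}

impl SessionCursors {
    /// Seed from the broker-supplied `resume_seq` at subscribe time.
    fn seed(&mut self, session_id: u64, resume_seq: u64) {
        self.cursors.insert(session_id, resume_seq);
    }

    /// Resume-mode output handling: no reject-gap branch.
    fn on_output(&mut self, session_id: u64, seq: u64) -> Verdict {
        // Assumption: an unseeded session starts at 0.
        let cursor = self.cursors.entry(session_id).or_insert(0);
        if seq < *cursor {
            Verdict::DropDup
        } else {
            *cursor = seq + 1;
            Verdict::Emit
        }
    }
}

fn main() {
    let mut c = SessionCursors::default();
    c.seed(7, 5);
    assert_eq!(c.on_output(7, 4), Verdict::DropDup); // re-send below the cursor
    assert_eq!(c.on_output(7, 5), Verdict::Emit);    // contiguous case
    assert_eq!(c.on_output(7, 5), Verdict::DropDup); // duplicate of an emitted chunk
    assert_eq!(c.on_output(7, 9), Verdict::Emit);    // floor-jump accepted, not rejected
    assert_eq!(c.cursors[&7], 10);
}
```

The jump from 5 to 9 is the case a strict accept-next cursor would reject. Here it costs only the evicted seqs 6..=8.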

Evidence: `[impl->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` + `[impl->REQ-DAEMON-2]`
(cold-start queries the broker and re-attaches **all** sessions in resume mode) ·
`[unit->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` (a real multi-session harness:
broker + brain spawns **N≥2** real PTY sessions, produces output, the brain is
**hard-killed — no `snapshot()` taken** (doyle amendment 4: this same mode is
D4-3's falsifiable guard), a fresh cold-start brain `resume_sessions` re-attaches
**every** session from its cursor — all resume, **no duplicate** output, gapless,
child pids unchanged; **plus the clamp case (doyle watch-item): a `resume_seq`
below the broker ring floor after a long outage → resume succeeds, the brain
accepts forward from the floor, no gap-error, only the evicted tail lost**).
Both REQs already active — **no toml change**. The **int** process-level survival
re-point stays **D7**.
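
For orientation, the shape of `resume_sessions` described in this section, sketched against a hypothetical `BrokerClient` trait so it can run without a real broker. The trait, the `FakeBroker`, the generic `Brain<B>`, and the `usize` return value are all assumptions made for the sketch. The real `Brain` calls its own `sessions()` / `subscribe()` and presumably returns a `Result`.

```rust
use std::collections::HashMap;

struct SessionInfo {
    session_id: u64,
    resume_seq: u64, // the broker log's delivered cursor
}

/// Hypothetical seam standing in for the brain's broker connection.
trait BrokerClient {
    fn sessions(&mut self) -> Vec<SessionInfo>;
    fn subscribe(&mut self, session_id: u64, from_seq: u64);
}

struct Brain<B: BrokerClient> {
    broker: B,
    session_cursors: HashMap<u64, u64>,
}

impl<B: BrokerClient> Brain<B> {
    /// Re-attach every hosted session from its broker cursor.
    /// Returns how many sessions were resumed; zero hosted sessions is a no-op.
    fn resume_sessions(&mut self) -> usize {
        let infos = self.broker.sessions();
        for info in &infos {
            self.broker.subscribe(info.session_id, info.resume_seq);
            self.session_cursors.insert(info.session_id, info.resume_seq);
        }
        infos.len()
    }
}

/// In-memory broker that records subscribe calls.
struct FakeBroker {
    hosted: Vec<(u64, u64)>,     // (session_id, resume_seq)
    subscribed: Vec<(u64, u64)>, // (session_id, from_seq), in call order
}

impl BrokerClient for FakeBroker {
    fn sessions(&mut self) -> Vec<SessionInfo> {
        self.hosted
            .iter()
            .map(|&(session_id, resume_seq)| SessionInfo { session_id, resume_seq })
            .collect()
    }
    fn subscribe(&mut self, session_id: u64, from_seq: u64) {
        self.subscribed.push((session_id, from_seq));
    }
}

fn main() {
    let broker = FakeBroker { hosted: vec![(1, 40), (2, 7)], subscribed: Vec::new() };
    let mut brain = Brain { broker, session_cursors: HashMap::new() };
    assert_eq!(brain.resume_sessions(), 2);
    assert_eq!(brain.broker.subscribed, vec![(1, 40), (2, 7)]);
    assert_eq!(brain.session_cursors[&1], 40);
}
```

Calling it a second time re-subscribes from the same cursors and overwrites the same map entries, which is the sense in which the sketch is idempotent.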

---

## D4-3 — Retire the `BrainState` message from production (confirm + lock) · Q6  ✅ DONE (35b925e)

The wire is already cut (D3-3); D4-3 makes "production reconstructs from the
broker, never from a brain→brain frame" **explicit, documented, and
falsifiable** — so a future change can't quietly re-introduce the frame.

- **Doc-tag the retirement.** Mark `BrainState` / `Brain::handoff` /
  `Brain::snapshot` as **test-only continuity** in their rustdoc: the production
  resume path is `cold_start` + `resume_sessions` (the broker is cursor-of-record,
  D4-1/D4-2); these types remain `pub` solely because the **integration tests**
  (separate test target — `#[cfg(test)]` would hide them) still drive the
  handoff-frame shape directly. Add the matching note to KNOWN-HAZARDS (the 2.4
  generation-custody entry already records the counter moved to the broker in
  D3-2; this closes its sibling — the *message* retires here).
- **Falsifiable guard = the D4-2 hard-kill harness (doyle amendment 4).** Rather
  than a soft code-shape assertion, the proof that production needs no frame *is*
  the D4-2 multi-session harness run in its strongest mode: **kill the brain hard
  (SIGKILL / `child.kill()` — no `snapshot()` possible)**, then a fresh
  `cold_start` + `resume_sessions` resumes every session gaplessly. A brain that
  was never given the chance to snapshot, yet resumes cleanly, demonstrates the
  `BrainState` frame is not on the production path. D4-3 then carries **no new
  machinery** — only the rustdoc/KNOWN-HAZARDS tags + the documented grep-clean
  close-out (zero non-test `Brain::handoff` callers).
- **No deletion.** `update.rs::apply_brain_only` + `Brain::handoff` stay for the
  existing handoff/idempotent/daemon_e2e/brain_swap tests — removing them is out
  of scope (they still prove the gapless-resume *mechanism* the broker replay
  relies on). D4-3 only asserts they are not on the production trigger.

Evidence: `[unit->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` (production resume =
cold_start + broker query, never the handoff frame) + a doc note (KNOWN-HAZARDS +
brain.rs rustdoc). No new REQ; no toml change.

---

## Sequencing

D4-1 (broker delivered-cursor + `resume_seq` on the reply — the new persistent
surface) → **D4-2** (brain consumes it: per-session cursors + `resume_sessions`,
wired into `run_brain`) → **D4-3** (confirm + lock the `BrainState` retirement).
D4-1 before D4-2 because the brain resumes from the broker-supplied cursor; D4-3
last because it asserts the end-state both produce.

## N-1 compat — grow the D3-4 harness by one field

D4-1's `resume_seq` is the milestone's **second** additive versioned field (after
D3-2's argv `--generation`/`--start-reason`). Per the master plan ("the harness
grows one assertion per later additive field, D4–D6"), extend the D3-4
old-broker × new-brain scaffold with: **old broker's `SessionsReply` omits
`resume_seq` → new brain defaults to 0 (full-replay resume), never a deserialize
reject**; and the inverse (old brain ignores the new field). Scaffold-grow only —
it becomes the green CI gate at **D7**.

## Traceability — no toml activation in D4

| REQ | State entering D4 | D4 adds | Activation note |
|-----|-------------------|---------|-----------------|
| `REQ-HAZARD-BROKER-PROCESS-ISOLATION` | `[doc,impl,unit]` | impl + unit (delivered cursor, multi-session resume, retirement guard) | already active; **int → D7** |
| `REQ-DAEMON-2` | `int` (in-process, mis-evidenced) | impl (cold-start multi-session resume) | int re-point stays **D7** |
| `REQ-HAZARD-HANDOFF-ARGV-COMPAT` | `[impl,unit]` | unit (the additive `resume_seq` N-1 window) | already active |

No `required_stages` change lands in D4 (rule 5): every D4 REQ is already active,
and the REQ-DAEMON-2 / REQ-UPD-3 int re-points are the **D7** commit that lands
the process-level survival E2E (never two int evidences at once).
`traceable-reqs check` stays green at every commit.

## Risks / watch-items

- **Multi-session cursor regression on the legacy single path.** `read_event`'s
  output dedup is load-bearing for every existing handoff/idempotent test. The
  `session_cursors`-empty fallback to `next_seq` must be byte-identical — assert
  the legacy path is unchanged (run the full handoff/idempotent/attach suite as
  the regression gate, not just the new multi-session test).
- **`delivered_through` vs the ring bound — the clamp bites BRAIN-side (doyle).**
  The cursor can point at a seq the bounded ring (`DEFAULT_LOG_CHUNKS = 4096`) has
  already evicted if a brain is down long enough. The **broker** replay clamps
  *naturally* (`filter seq >= from_seq` starts at the ring floor when the cursor
  is below it). The hazard is on the **brain**: a strict `next_seq` cursor would
  see `first-replayed-seq > resume_seq` and **gap-reject** (brain.rs:343-346),
  killing the resume the clamp meant to save. The fix is the **dedup-below +
  snap-above** discipline (D4-2 above): seed the cursor at `resume_seq`, drop `seq
  < cursor`, accept-and-snap `seq >= cursor`, **no reject-gap** — so the
  floor-jump is accepted (evicted tail lost only) while the boundary re-send is
  still deduped. Assert this exact path in the D4-2 clamp test.
- **At-least-once is the contract, not a bug.** A resume that re-sends the
  boundary chunk is correct (SPIKE-05). Don't "fix" it into a per-chunk ack —
  that is a guarantee the terminal stream doesn't make and Q6 doesn't ask for.
- **Forward-correct, not pre-failed.** `run_brain` drives no PTY sessions today,
  so the resume path is harness-proven, not field-exercised, until the live-agent
  adapter lands. Keep the harness real (spawns actual PTY children, kills a real
  brain process) so the capability is genuinely tested, not a mocked stub.
- **Broker socket-bind flake on kitsubito** (DEFERRED): the new multi-session
  process harness adds real process-spawn tests — fix the bind determinism
  opportunistically (unlink-before-bind / per-test TempDir) rather than fighting
  flakes.
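
One possible shape for the unlink-before-bind / per-test directory fix named in the last item, assuming the broker endpoint is a Unix domain socket (this plan does not say so, and a Windows runner would need a different mechanism). `bind_fresh` and `unique_socket_path` are hypothetical helpers, and the directory naming is illustrative only.

```rust
use std::fs;
use std::io;
use std::os::unix::net::UnixListener;
use std::path::{Path, PathBuf};

/// Remove a stale socket file if present, then bind.
fn bind_fresh(path: &Path) -> io::Result<UnixListener> {
    match fs::remove_file(path) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {} // nothing stale to remove
        Err(e) => return Err(e),
    }
    UnixListener::bind(path)
}

/// Per-test socket path under the system temp dir, keyed by pid and a tag
/// so parallel tests do not share a path.
fn unique_socket_path(tag: &str) -> io::Result<PathBuf> {
    let dir = std::env::temp_dir().join(format!("broker-test-{}-{}", std::process::id(), tag));
    fs::create_dir_all(&dir)?;
    Ok(dir.join("broker.sock"))
}

fn main() -> io::Result<()> {
    let path = unique_socket_path("demo")?;
    let first = bind_fresh(&path)?;
    drop(first); // dropping the listener leaves the socket file on disk
    let _second = bind_fresh(&path)?; // succeeds because the stale file is unlinked first
    Ok(())
}
```

Unix socket paths have a short length limit, so a deep temp directory may need a shorter base than the one assumed here.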

## Immediate next step

Start **D4-1**: add `delivered_through` to `OutputLog` (advance on a **successful**
live-send + replay, persist across `detach_if`), thread `resume_seq` (additive,
`#[serde(default)]`) onto `SessionInfo`/`SessionsReply`, and tag the unit proof
that the cursor survives a detach. Then D4-2 (brain `session_cursors` +
`resume_sessions`, wired into `run_brain`).
