# Restoration D7 — proof + REQ re-pointing + fleet verify: the milestone close (task plan)

> **VET STATUS: APPROVED + GREENLIT (doyle, 2026-06-10).** All 5 open calls ruled
> (see §Resolved open calls); two real mechanics corrected in D7-1/D7-2 (the
> exe_hash breadcrumb shape + the N-1 pairing mechanics), both folded below.
> Sequencing, rule-5 choreography (re-point inside the D7-1 commit; one activation
> at D7-3 after both int legs), scope discipline (no capability, no wire field),
> and the risk register ratified as written. **GO on D7-1.** D1–D6 are DONE +
> cross-OS CI-green on main @ad15a1e. D7 adds **no new daemon capability** — it is
> the conformance close-out: it *proves* what D1–D6 built (the process-level
> survival E2E), turns the V6 N-1 scaffold into a real gate, re-points the
> regression-masked int evidence onto the new proof, activates the last hazard
> stage, and field-verifies the seamless update on the real rig.
>
> **The one additive production change D7 lands (doyle call-3 ruling):** an
> `exe_hash` field on the `brain.ready` breadcrumb (the "which bytes are resident"
> diagnostic — additive, tripwire-registered, pays rent far beyond the test: it is
> exactly the fact the v0.3.2/enlyzeam incident lacked). Everything else is tests +
> a gate + a toml line + a field run.

> Working doc for RESTORATION-PLAN.md **D7** (ADR-0018 V5, V6, §9). D1 (process
> split), D2 (loop relocation), D3 (supervision anchor + real brain-process update
> trigger), D4 (multi-session cold-start resume; `BrainState` message retired),
> D5 (durable absolute-deadline loop timing), D6 (readiness-gated auto-rollback)
> are DONE + cross-OS CI-green on main (@ad15a1e). The broker is the always-up
> per-machine anchor; it spawns + supervises a real `spt daemon brain` **child
> process**, respawns it from the executable path on every exit, gates an update
> brain on a generation-stamped ready signal, and auto-rolls-back a candidate that
> fails to come up. **What is missing is the conformance proof:** the int evidence
> for "a hosted endpoint survives a brain-PROCESS restart onto a swapped binary"
> still points at the *in-process* `Brain::handoff` shape (`brain_swap.rs`
> `[int->REQ-UPD-3]`, the M3b-B9 daemon E2E) — the very thing ADR-0018 calls
> regression-masked. D7 builds the real process-level proof and re-points the
> evidence onto it, scaffolds the V6 N-1 gate into a green CI gate, and verifies
> the whole thing on the live fleet.

## Goal (D7-close = milestone-close invariant)

The restoration milestone is **conformance-closed**: every REQ it touches is
evidenced by a test that proves the *real* property (process-level survival, not
the in-process shape), the N-1 verb window is a CI gate, and a real seamless
update is field-proven on the rig with no manual bounce. Concretely at D7-close:

- **Process-level survival is an `int` E2E [V5].** A PTY child + a live QUIC
  connection survive a brain-**process** restart that lands on a **swapped
  on-disk binary** — the PTY child's pid is unchanged, its output stream gapless,
  the QUIC conn intact, and the post-restart brain is demonstrably running the
  *new* bytes (not the old). This is the test that proves what `brain_swap.rs`
  could not: the swap is a real process respawn from a swapped binary, not an
  in-process closure relaunch.
- **The regression-masked evidence is re-pointed.** `REQ-DAEMON-2` and
  `REQ-UPD-3`'s `int` tags move off the in-process shape
  (`brain_swap.rs`/M3b-B9) **onto** the D7-1 process-level E2E, in the same
  commit the new E2E lands — never double-evidenced, never a gap.
- **The N-1 verb surface is a green CI gate [V6].** The single-binary argv
  scaffold (`n1_compat.rs`) grows into a real **old-broker × new-brain** exercise
  — a pinned prior `spt` binary supervises the current brain across the whole verb
  surface — gating both runners. KH-2.3 can no longer regress silently.
- **The last hazard stage activates.** `REQ-HAZARD-BROKER-PROCESS-ISOLATION`
  gains `int` (the D7-1 E2E + the D7-2 N-1 gate), the only `required_stages`
  change in D7. `traceable-reqs check` stays EXIT=0 at every commit.
- **The seamless update is field-proven [§9].** A restoration+1 test release is
  published and `apply`'d on the live rig (kitsubito + hfenduleam); the running
  brain pid changes / new code goes live **with no manual daemon bounce**, the
  daemon stays healthy — the acceptance the v0.3.2 roll could not get for free,
  and the regression this whole milestone exists to kill, observed fixed in the
  field. This is what makes the restoration the **last release that needs a
  manual bounce**.

## What is already satisfied / true (don't re-build, don't mis-scope)

- **The process boundary is already proven minimal (D1, `brain_split.rs`).** The
  real `spt daemon run` broker process spawns a real `spt daemon brain` child;
  killing the brain leaves the broker (the anchor) alive and the supervisor
  respawns it (a new pid in the generation-stamped `brain.ready`). D7-1 does
  **not** re-prove the bare boundary — it adds the **held endpoints** (PTY child
  + live QUIC) and the **swapped binary** on top.
- **The in-process endpoint-survival shape is proven (M3c, `brain_swap.rs`).** A
  brain-only update via the engine swaps logic with the hosted PTY child's pid
  unchanged + gapless replay — but via an **in-process `Brain::handoff` closure**,
  not a process respawn. This is the shape D7-1 **supersedes**: same invariant,
  real process boundary + real on-disk swap. Keep `brain_swap.rs` (it still proves
  the engine's brain-only classification + the handoff substrate); only its
  `[int->REQ-UPD-3]` *tag* moves to D7-1.
- **The QUIC-survival shape is spiked (SPIKE-03).** SPIKE-01 proved PTY + plain
  TCP survival across a brain restart; SPIKE-03 closed the open gap — QUIC stream
  survival (the process-local crypto/stream state TCP doesn't model). D7-1
  productionizes the SPIKE-01 PTY half + the SPIKE-03 QUIC half **together**, in
  one int test, in production topology. The throwaway spike binaries
  (`../spt-spikes/…`, single binary with `broker`/`brain <ver>`/`child`/`client`
  modes) are the *shape* to productionize — not code to import.
- **The supervisor's binary selection is already injectable + record-driven (D6).**
  `supervise_brain` takes an injected `spawn_child` closure; `spawn_brain_child`
  spawns `current_exe()` or a record-supplied path (`brainproc.rs`, the
  D6 `AppliedPending{rollback_binary}` / `RolledBack{rollback_binary}` selection).
  D7-1 reuses this seam to point the respawn at a **test-controlled on-disk path**
  whose bytes it swaps — no new supervisor surface.
- **The loopback QUIC harness exists (`netbroker.rs`).** Two in-process `Broker`
  + `NetHost` endpoints dial over loopback and survive a brain restart
  exactly-once, single-host, no `SPT_TWO_HOST` env. D7-1's QUIC half is a
  one-broker extension of this — single-host feasible, no external machine.
- **The V6 scaffold exists (`n1_compat.rs`).** It already proves the current
  brain binary accepts **both** the bare old-broker argv (defaults) and the
  stamped new-broker argv without a clap rejection
  (`[unit->REQ-HAZARD-HANDOFF-ARGV-COMPAT]`). D7-2 grows it from "argv parses" to "a real old broker
  supervises a real new brain across the verb surface." The seam is built; D7
  makes it a gate.
- **The fleet + deploy procedures are documented + memory-backed.** kitsubito
  (`ssh reavus@kitsubito`, user systemd unit, deploy-from-local-runner) and
  hfenduleam are the live 2-node rig; enlyzeam is **still <0.3.2** and is
  excluded from the D7 verify (note it, don't block on it). Push→CI **before**
  tagging any release (the v0.3.0 lesson — the single-home dev box is blind to
  the spawn/socket topology; CI on both runners is the only real cross-OS check).
- **No new daemon capability + no new IPC wire field in D7.** The ready signal is
  a disk breadcrumb, the applied record is disk state, the generation/start-reason
  ride D3-2's already-shipped argv field. D7 adds **tests + a gate + toml
  activation + a field run**; the only production change is the additive
  `exe_hash` breadcrumb field (call 3), with zero change to daemon logic (a
  test-only spawn surface, should one prove necessary, is `#[cfg(test)]`/hidden,
  not the frozen api surface).

## Per-commit discipline

Each sub-task is its own atomic commit with evidence tagged in-commit. Gates
every commit: `cargo build` · `cargo test` · `cargo clippy` · `cargo build
--no-default-features` · `traceable-reqs check` (EXIT=0) · `xtask check`. Push to
a dev-freeform branch → **CI both runners** before any tag. D7 is *specifically*
the milestone where the real process-spawn + cross-OS + cross-binary tests run —
the dev box cannot prove them. The `[twohost]` rig rungs gate D7-4; the AF_UNIX
`served_broker` bind flake on kitsubito (DEFERRED 2026-06-09) is in D7-1/D7-2's
blast radius — fix the bind determinism (unlink-before-bind / per-test TempDir)
opportunistically here rather than fighting flakes through the close-out.

---

## D7-1 — Process-level survival E2E: PTY child + live QUIC across a brain-PROCESS restart onto a swapped binary · V5

The keystone of D7 and the one real design fork (see §Open calls 1–3). This is
the test that earns the re-point: it proves *process-level* survival onto a
*swapped binary*, the property the in-process `brain_swap.rs` only approximates.

### The faithful-model decision (the crux)

There is no production path that drives a daemon-hosted PTY session through the
real `spt daemon run` — the brain child connects, resumes (a near-noop today, no
hosted sessions), and idles; nothing makes it spawn a session or dial a peer on
its own (the D4/D5/D6 "forward-correct, not field-exercised" posture). So a
fully-external "`spt daemon run` hosts a PTY that survives" E2E is not reachable
today without **adding a spawn/dial surface to the frozen daemon** — which D7 must
not do.

The honest, faithful model (doyle call-1 FAITHFUL, ratified — the milestone's
proof is a **TRIANGLE**, and D7-1's test rustdoc must name all three legs so the
`int` tag is honest on its face):

- **Leg (a) — the real-broker-process leg = D1 `brain_split.rs`.** The REAL `spt
  daemon run` broker *process* survives a brain kill + respawns it. Already
  proven; D7-1 **cites it** as the process-packaging leg.
- **Leg (b) — the survival leg = D7-1 itself.** Real brain *process* + real
  on-disk swap + real broker-**held** endpoints. The broker is modeled in-process
  — **faithful, not a weakening**: the broker is the *never-restarting anchor by
  design*, so an in-process `Broker` runs *the same Broker code holding the same
  resources* the real broker would; the claim under test is the brain process +
  swapped binary + broker-held endpoint survival, all real here. (Bonus: in-process
  broker gives the test **direct access** to cursor/ring/pid for assertions.)
- **Leg (c) — the field leg = D7-4.** Real daemon, real swap, real hardware.
- **DEFERRED upgrade (record, don't build):** when the live-agent adapter makes
  daemon-hosted sessions real, a *full-stack* single-process E2E (real `spt daemon
  run` hosting + surviving) becomes reachable — a `docs/DEFERRED.md` note, **not**
  D7 work.

Mechanics (doyle call-2 — the **real `spt daemon brain` entry** is the supervised
child, not a fixture; it is drivable today):

- **The brain is the real `spt daemon brain` child process** the **production
  `supervise_brain`** spawns from an on-disk path the test controls. The test sets
  `SPT_HOME` to the temp home; the injected `spawn_child` execs `P daemon brain
  --generation N --start-reason …` (env inherited per `spawn_brain_child`'s own
  posture); `run_brain` `connect_retry`s the test broker socket; `resume_sessions`
  re-attaches the test-established session. **This makes D7-1 the FIRST
  process-level exercise of the D4 cursor-of-record resume with N≥1 hosted
  sessions** — that alone justifies the real entry over a fixture.
- **The swap is real:** the test replaces the on-disk binary bytes at the spawn
  path between generations (fixture B = the same binary with **padding bytes
  appended** — PE and ELF both tolerate trailing data; the hash flips, behavior is
  identical); the supervisor respawns from the swapped path.
- **A test drive** spawns the PTY session into the broker + opens the QUIC conn
  **once** (before the swap), then **drops its client hard** (the D4 hard-kill
  posture — the outgoing side is gone before the new one starts); the supervised
  real brain child **resumes** the hosted session via the D4 path. Assertions read
  the **in-process broker's own state** (cursor/ring/pid) + the QUIC marker.
- **Fixture-brain fallback only if genuinely blocked**, with the gap named — not
  expected: every piece (connect, resume, ready-stamp post-resume) already exists
  in the real entry.
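
The swap mechanic above can be sketched with the standard library alone. This is a minimal illustration, not the test's actual code: `file_hash`, `make_padded_fixture` and `swap_in_place` are hypothetical helper names, and `DefaultHasher` is only a stand-in because the plan does not name the hash algorithm behind `exe_hash`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::{self, OpenOptions};
use std::hash::Hasher;
use std::io::{self, Write};
use std::path::Path;

/// Stand-in content hash. Assumption: the real `exe_hash` algorithm is not
/// specified in the plan; `DefaultHasher` is used here only for illustration.
pub fn file_hash(path: &Path) -> io::Result<u64> {
    let bytes = fs::read(path)?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&bytes);
    Ok(hasher.finish())
}

/// Build fixture B from fixture A: the same bytes plus trailing padding.
/// The content hash flips while the program itself is unchanged.
pub fn make_padded_fixture(
    fixture_a: &Path,
    fixture_b: &Path,
    padding_len: usize,
) -> io::Result<()> {
    fs::copy(fixture_a, fixture_b)?;
    let mut file = OpenOptions::new().append(true).open(fixture_b)?;
    file.write_all(&vec![0u8; padding_len])?;
    file.sync_all()
}

/// Put fixture B at the spawn path. The copy is staged next to the target and
/// renamed over it, so a respawn never observes a half-written file.
/// Assumption: the platform allows replacing the file at this point (a
/// running executable may be locked on some systems).
pub fn swap_in_place(spawn_path: &Path, fixture_b: &Path) -> io::Result<()> {
    let staged = spawn_path.with_extension("staged");
    fs::copy(fixture_b, &staged)?;
    fs::rename(&staged, spawn_path)
}
```

In the test this would run between generation N and the restart trigger: hash `P`, swap, then expect the next `brain.ready` to carry the fixture-B hash.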

### Test choreography (recommended shape)

1. Bind a test-owned `Broker` on an isolated socket + a `NetHost` loopback peer;
   spawn their serve threads. (The broker = the faithful stable kernel.)
2. Spawn a long-lived PTY echo child into the broker (`cat`/`findstr`, the
   `brain_swap.rs` `echo_spawn_req` fixture); record its pid. Open a QUIC conn
   through `NetHost` to the loopback peer; send a marker the child/peer never
   reads back until after the swap.
3. Drive the **production `supervise_brain`** to spawn generation N as the **real
   `spt daemon brain` child process** from a test-controlled on-disk path `P`
   (fixture A). The child connects to the test broker socket, resumes the hosted
   session, writes its `brain.ready` (now carrying `exe_hash`, below).
4. **Swap:** write fixture B over `P` (fixture B = `P` with **appended padding
   bytes** — identical behavior, flipped hash), trigger a brain restart
   (`request_brain_restart` / kill the child). The supervisor respawns generation
   N+1 **from the swapped `P`**.
5. Assert the survival invariant: **(a)** the PTY child's pid is **unchanged**
   (broker held it across the brain process death); **(b)** the QUIC conn is still
   open and the pre-swap marker is readable post-swap (broker-held `NetHost`
   survived); **(c)** the post-respawn `brain.ready` `exe_hash` **== fixture-B hash
   != fixture-A hash**, proving new *bytes* ran, not a relaunch of the old (the
   property that distinguishes this from `brain_swap.rs`'s in-process closure);
   **(d)** gapless: output produced across the swap window replays (the broker ring
   + cursor-of-record). All reads hit the **in-process broker's own state**.
6. Teardown: kill child + brain, assert reaping.
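
The four step-5 assertions can be collected into one check so a failure names the leg that broke. The sketch below is hypothetical: `SwapEvidence` and its field names are invented for illustration, and the real test would read these values from the in-process broker's state and the `brain.ready` breadcrumb.

```rust
/// Everything the test reads back around the swap.
/// All field names here are hypothetical, chosen for the illustration.
#[derive(Debug, Clone, Copy)]
pub struct SwapEvidence<'a> {
    pub pid_before: u32,
    pub pid_after: u32,
    pub quic_open_after: bool,
    pub marker_sent: &'a [u8],
    pub marker_read_after: &'a [u8],
    pub fixture_a_hash: &'a str,
    pub fixture_b_hash: &'a str,
    pub ready_hash_after: &'a str,
    pub output_written: &'a [u8],
    pub output_replayed: &'a [u8],
}

/// Returns one message per violated leg of the survival invariant;
/// an empty vector means (a) through (d) all hold.
pub fn survival_violations(e: &SwapEvidence<'_>) -> Vec<&'static str> {
    let mut violations = Vec::new();
    // (a) the broker held the PTY child across the brain process death.
    if e.pid_before != e.pid_after {
        violations.push("(a) PTY child pid changed");
    }
    // (b) the broker-held QUIC conn is open and the pre-swap marker is intact.
    if !e.quic_open_after || e.marker_sent != e.marker_read_after {
        violations.push("(b) QUIC conn or marker lost");
    }
    // (c) the fixtures really differ and the new brain reports fixture B.
    if e.fixture_a_hash == e.fixture_b_hash || e.ready_hash_after != e.fixture_b_hash {
        violations.push("(c) new bytes not proven");
    }
    // (d) output produced across the swap window replays without a gap.
    if e.output_written != e.output_replayed {
        violations.push("(d) replay has a gap");
    }
    violations
}
```

Checking `fixture_a_hash != fixture_b_hash` inside (c) guards against a byte-identical swap, in which case a matching hash would prove nothing.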

### The `exe_hash` breadcrumb (the one additive production change — doyle call-3)

The "new bytes ran" proof is a **production breadcrumb**, not a test-only marker:
add `exe_hash` as an additive field on the `brain.ready` JSON (alongside
`{pid, generation}`), **written always** by the real brain post-resume (one
`current_exe` read + hash per brain start — negligible). Why production, not test:
(i) one binary, no double-build — the test's fixture B is just `P` + appended
padding (hash flips, behavior identical); (ii) the supervised child is the real
entry (call 2), so the sentinel **must live in real code anyway**; (iii) this is
the "which bytes are resident" diagnostic that would have caught the enlyzeam
regression on day one — it pays rent far beyond the test (and upgrades D7-4's
field assert, call 5). **Register `exe_hash` in the D6-3 `PRE_READY_DURABLE_FILES`
tripwire set** — `brain.ready` is already a tracked pre-ready durable file
(`rollback_compat.rs`); the new field is additive / N-1-readable by construction
(an N-1 brain ignores it), so the tripwire stays green and owns the new field from
birth. This is the **only** production-code change in D7; it is additive durable
state, same posture as the D6 two-phase record.
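
As a rough picture of why the field is additive, the sketch below renders a `brain.ready` body with `exe_hash` and reads it back the way a reader that only knows `pid` and `generation` would. It is illustrative only: the field types, the hex-string encoding of the hash, and the hand-rolled JSON handling are all assumptions, standing in for whatever serializer the real breadcrumb uses.

```rust
/// Render the breadcrumb body. Assumption: `exe_hash` is carried as a
/// string; the plan only fixes the field name and that it sits alongside
/// `pid` and `generation`.
pub fn render_ready(pid: u32, generation: u64, exe_hash: &str) -> String {
    format!(
        "{{\"pid\":{},\"generation\":{},\"exe_hash\":\"{}\"}}",
        pid, generation, exe_hash
    )
}

/// Minimal numeric field lookup that skips anything it does not ask for,
/// which is how an N-1 reader stays unaffected by the extra field.
pub fn u64_field(json: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{}\":", key);
    let start = json.find(&needle)? + needle.len();
    let digits: String = json[start..]
        .chars()
        .skip_while(|c| c.is_whitespace())
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// String field lookup for readers that do know about `exe_hash`.
/// Returns `None` when the field is absent (an older breadcrumb).
pub fn str_field<'a>(json: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\":\"", key);
    let start = json.find(&needle)? + needle.len();
    let end = json[start..].find('"')? + start;
    Some(&json[start..end])
}
```

An old-shape breadcrumb without the field yields `None` from `str_field`, so a new reader has to treat the hash as optional rather than required.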

### Why each assertion is load-bearing

- **(a) pid unchanged** is the `REQ-UPD-3` absolute — the broker, not the brain,
  owns the PTY master, so a brain process death cannot reach the child.
- **(c) `exe_hash` flipped** is the **new** thing D7 proves and the reason the
  re-point is honest: `brain_swap.rs` relaunches an in-process closure (same
  process image); D7-1 proves the supervisor exec'd *different bytes from disk* —
  exactly `update.rs:233-234`'s "exec the new binary's brain" made real, the
  silent regression (apply swaps disk, never runs new code) caught by a test.
- **(b) QUIC survives** closes SPIKE-01's highest remaining risk (codex FATAL #2):
  QUIC carries process-local crypto/stream state plain TCP doesn't; broker-held
  endpoint survival must be proven for the *real* transport, not a TCP proxy.

Evidence: `[int->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` (process-level survival:
PTY child + live QUIC across a brain-PROCESS restart onto a swapped binary —
endpoints broker-held, new bytes proven to run via `exe_hash`) · `[int->REQ-UPD-3]`
(re-pointed here from `brain_swap.rs`: no endpoint terminates across the brain-only
update, at the process level) · `[int->REQ-DAEMON-2]` (re-pointed here: the
broker/brain *process* split delivers the seamless update) ·
`[impl->REQ-HAZARD-ROLLBACK-STATE-COMPAT]` (the `exe_hash` field added to `brain.ready` + registered
in `PRE_READY_DURABLE_FILES`) · `[unit->REQ-HAZARD-ROLLBACK-STATE-COMPAT]` (the
tripwire asserts the new `exe_hash` field is additive / N-1-readable). The
re-point of REQ-DAEMON-2 / REQ-UPD-3 lands **in this commit** (rule 5 — never leave
them evidenced by the old in-process test once the new test exists);
`REQ-HAZARD-BROKER-PROCESS-ISOLATION`'s `int` *activation* lands in D7-3 (after
D7-2's gate also exists). REQ-HAZARD-ROLLBACK-STATE-COMPAT is already
`[doc,impl,unit]`-active (D6-3) so the added `exe_hash` evidence needs **no toml
change**. `traceable-reqs check` stays green at each commit.

---

## D7-2 — N-1 verb-surface compat: old-broker × new-brain as a green CI gate · V6

Grow the `n1_compat.rs` scaffold from "the current brain's argv parses both
shapes" into the real steady-state proof: a **pinned prior `spt` binary acting as
the broker** spawns + supervises the **current brain** over the socket, exercising
the verb surface. Steady state after every routine update is new-brain ×
old-broker (the broker almost never updates) — KH-2.3, "the single most
update-frequency-sensitive invariant."
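
For orientation, the property the existing scaffold covers (both argv shapes parse) looks roughly like the sketch below. It is a hand-rolled stand-in for the real clap parser: `BrainArgs`, `parse_brain_args`, and the default values `0` and `"unspecified"` are assumptions made for the illustration, not the project's actual definitions.

```rust
/// Hypothetical parsed form of the arguments after `spt daemon brain`.
#[derive(Debug, PartialEq)]
pub struct BrainArgs {
    pub generation: u64,
    pub start_reason: String,
}

/// Accepts the bare old-broker argv (no flags, defaults apply) and the
/// stamped new-broker argv. Unknown flags are rejected, which is the
/// failure the compat check exists to catch.
pub fn parse_brain_args(args: &[&str]) -> Result<BrainArgs, String> {
    // Assumed defaults; the real defaults live in the clap definition.
    let mut parsed = BrainArgs {
        generation: 0,
        start_reason: "unspecified".to_string(),
    };
    let mut iter = args.iter();
    while let Some(flag) = iter.next() {
        match *flag {
            "--generation" => {
                let value = iter.next().ok_or("--generation needs a value")?;
                parsed.generation = value
                    .parse()
                    .map_err(|e| format!("bad --generation: {e}"))?;
            }
            "--start-reason" => {
                let value = iter.next().ok_or("--start-reason needs a value")?;
                parsed.start_reason = value.to_string();
            }
            other => return Err(format!("unexpected argument: {other}")),
        }
    }
    Ok(parsed)
}
```

Parsing alone says nothing about the socket verbs, which is the gap the cross-binary exercise in this sub-task closes.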

- **The pin = the restoration split boundary, NOT a pre-restoration release
  (doyle call-4a).** A pre-restoration binary (v0.3.2 and earlier) has **no brain
  supervisor** — that absence *is* the regression — so it cannot pair at all. Pin
  the **D1 skeleton ref `0c95435`** (the first commit where the broker spawns a
  brain child). D1 also gives the **wider** N-1 window: the old broker spawns its
  own brain with **bare argv** (pre-D3-2, no `--generation`/`--start-reason`),
  which `n1_compat` already proves the current brain parses — so the cross-binary
  exercise spans the whole argv evolution. **Name the pin + the advance rule
  in-test:** *"the pin advances to the latest PUBLISHED split-era release once one
  exists"* — so post-v0.4.0 the gate automatically tests the **real fleet N-1**,
  not a forever-frozen D1 ref. CI builds the pinned ref; a checked-in prebuilt
  binary is rejected (stale, cross-OS, opaque).
- **Pairing mechanics: TEST-LAUNCH the new brain at the old broker's socket — do
  NOT contort D6 record selection (doyle call-4b).** The old broker's
  `spawn_brain_child` execs **its own** `current_exe` — an old broker will *never*
  spontaneously spawn the new brain, and bending D6's record-driven selection into
  a test hack tests the wrong thing. **The compat surface is the SOCKET VERBS +
  argv, not supervision custody.** Shape: start the old `spt daemon run` (real
  process — it spawns its own old brain, fine; or brain-suppressed if the ref
  allows), then **test-launch the current-tree `spt daemon brain` against the old
  broker's socket/home** and exercise the verb surface: `hello`/`classify`,
  `net-status`, `sessions` → `resume_seq`-absent → serde-default-0, `subscribe`,
  `input`. **Assert:** no clap/arity reject, no verb desync, stable heartbeats.
  *That* is new-brain × old-broker steady state, honestly.
- **It becomes a gate on both runners.** The exercise runs in CI (ubuntu +
  windows); a verb signature regression (a non-additive change) **fails the gate**.
  This is the backstop the master plan's risk register names: "a breaking verb
  signature change is a broker-breaking update class — must be refused, not
  silently shipped."
- **Fix the broker socket-bind flake while here.** The AF_UNIX `served_broker`
  bind flakes on kitsubito (DEFERRED 2026-06-09); D7-1 + D7-2 add real
  process-spawn broker tests on that runner — fix bind determinism
  (unlink-before-bind / abstract namespace / per-test TempDir) opportunistically so
  the new gates are not flaky from birth.
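
The pin advance rule from call-4a could be written down in-test along the lines of the sketch below. It is a hedged illustration: `BrokerPin` and `n1_pin` are hypothetical names, versions are modeled as plain `(major, minor, patch)` tuples, and "split-era" is taken to mean strictly newer than 0.3.2, following the statement that v0.3.2 and earlier have no brain supervisor.

```rust
/// Which old broker the gate builds. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub enum BrokerPin {
    /// The D1 skeleton ref, used until a split-era release is published.
    D1Ref(&'static str),
    /// The latest published split-era release, as (major, minor, patch).
    Release((u32, u32, u32)),
}

const D1_SKELETON_REF: &str = "0c95435";
/// Last release without a brain supervisor; it cannot pair at all.
const LAST_PRE_SPLIT: (u32, u32, u32) = (0, 3, 2);

/// The pin advances to the newest published release above the pre-split
/// floor, and falls back to the D1 ref while none exists.
pub fn n1_pin(published: &[(u32, u32, u32)]) -> BrokerPin {
    published
        .iter()
        .copied()
        .filter(|version| *version > LAST_PRE_SPLIT)
        .max()
        .map(BrokerPin::Release)
        .unwrap_or(BrokerPin::D1Ref(D1_SKELETON_REF))
}
```

Tuples compare lexicographically, so `(0, 4, 1)` sorts above `(0, 4, 0)` without any version-parsing dependency.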

Evidence: `[int->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` (the new-brain × old-broker
verb-surface N-1 window holds across a real cross-binary pairing — the second
half of the hazard's `int`, alongside D7-1's survival E2E) · the existing
`[unit->REQ-HAZARD-HANDOFF-ARGV-COMPAT]` argv scaffold is **kept** (the cheap
fast-feedback check) and the cross-binary exercise is its `int`-grade superset.
**Tag ruling (doyle call-4):** do **NOT** add
`[int->REQ-HAZARD-HANDOFF-ARGV-COMPAT]`: that REQ's `required_stages` is `[impl,unit]`, and adding an `int` here
would be a second activation, breaking D7's one-activation discipline. Instead land
**one toml COMMENT line** on that REQ noting "the D7-2 gate exercises this
int-grade (cross-binary new-brain × old-broker)"; formalizing the `int` stage, if
ever wanted, is its own later activation commit.

---

## D7-3 — Re-point regression-masked evidence + activate the last hazard stage · §9 traceability

The one toml `required_stages` change in D7 (rule 5: activate at the commit that
delivers the evidence — here, after D7-1 + D7-2 both land their `int` evidence).

- **Re-point REQ-DAEMON-2 + REQ-UPD-3 `int`** off the in-process shape onto the
  D7-1 process-level E2E. *Mechanically this is the tag move inside D7-1's commit*
  (so the requirement is never momentarily un-evidenced); D7-3 is where the **toml
  comments** are updated to name the new evidence and the audit is confirmed clean.
  Both REQs already carry `int` in `required_stages` — re-pointing moves the tag
  *location*, not the stage set, so `traceable-reqs check` stays green throughout.
- **Activate `REQ-HAZARD-BROKER-PROCESS-ISOLATION` `int`.** Set
  `required_stages = ["doc","impl","unit","int"]` (was `["doc","impl","unit"]`),
  with `int` evidence = the D7-1 survival E2E + the D7-2 N-1 gate. This is the
  **only** `required_stages` change in D7. It lands in the commit *after* D7-1 +
  D7-2 (both `int` tags present) so the check is green at the activating commit.
- **Confirm `REQ-HAZARD-ROLLBACK-STATE-COMPAT` needs nothing in D7** — its
  `doc`/`impl`/`unit` all activated at D6-3; D7 does not touch it (it has no `int`
  stage by design — the tripwire is a unit guard).
- **Audit the whole restoration REQ set clean** at the close: every restoration
  REQ (`REQ-DAEMON-2`, `REQ-UPD-3`, `REQ-HAZARD-BROKER-PROCESS-ISOLATION`,
  `REQ-HAZARD-ROLLBACK-STATE-COMPAT`, plus the D1–D6 impl/unit tags) is evidenced
  by the *correct* test, no in-process leftovers. `traceable-reqs check` EXIT=0.

Evidence: the toml diff itself (the `int` activation + the re-point comment
updates) + a green `traceable-reqs check`. No new code; this is the traceability
ledger reconciliation. *(Per the master plan's traceability table, this is the
"Activated at D7" row for `REQ-HAZARD-BROKER-PROCESS-ISOLATION` int and the
"re-point" rows for REQ-DAEMON-2 / REQ-UPD-3.)*

---

## D7-4 — Fleet verify + milestone close-out · §9

The field acceptance the whole milestone exists for, then close the milestone.

- **Publish a restoration+1 test release.** Build current main (post-D7-1/2/3,
  CI-green both runners) → `release-keygen/sign/publish` a version above the live
  fleet floor (fleet is 0.3.2; the restoration line tags above it). Signing stays
  **manual + local** (the M6 posture; seed via password-manager CLI).
- **Apply on each live node, observe the seamless swap.** On kitsubito +
  hfenduleam (the 2-node rig; **enlyzeam excluded from the verify — still <0.3.2**;
  see the close-out item for its last manual bounce): `spt update apply` → assert
  **the running brain pid changes / new code goes live with no manual daemon
  bounce**, and the daemon stays healthy (`spt daemon status` → running, before
  *and* after).
- **The hardest field assert: `exe_hash` before/after (doyle call-5).** Read
  `brain.ready` `exe_hash` on each node **before** and **after** apply; it must
  **flip AND match the release artifact's hash**. This is a *harder* fact than
  pid-change — a pid-change proves *a* restart happened; the **hash-match proves
  THE BYTES** the new brain is running are the released artifact, not a stale
  resident. It is **exactly the assert the v0.3.2 / enlyzeam incident lacked**
  (binary on disk, old code resident, the `\r`-corruption persisting for ~a day).
  Capture the before/after `exe_hash` + pid-change + version-bump as the evidence
  appendix (the `docs/TWO-HOST-RUNBOOK.md` restoration rung).
- **Endpoint-survival scope (§Open-call-5).** The fleet daemons host **no PTY
  sessions** today (forward-correct posture), so the field run proves *pid
  changes / new code live / daemon healthy* — the seamless-swap mechanism — while
  the *endpoint-survival* guarantee is carried by D7-1's E2E (which has real
  held endpoints). Recommended: do **not** stand up a synthetic hosted workload on
  the fleet just to assert survival; the E2E owns that and the field run owns
  "the swap actually runs new code on real hardware, no bounce." Flag for doyle.
- **Push→CI both runners BEFORE tagging** (binding — the daemon-service-detection
  lesson: v0.3.0 was tagged early and CI caught 2 regressions; retag cost real
  time). The `[twohost]` rig run is the gate.
- **Milestone close-out.** Mark the restoration ✅ in ROADMAP.md (the "🔜 NEXT
  milestone" entry → delivered, with the field-run reference); confirm
  KNOWN-HAZARDS 6.7/6.8 are fully covered (the int stage closes 6.7); confirm the
  D6 open-call DEFERRED rows (current_exe reconcile residual, operator `--force`
  re-apply) are landed in `docs/DEFERRED.md`. The restoration is the **last
  release needing a manual bounce** — record that the next milestone
  (`spt-claude-code`) inherits the final topology.
- **Footnote task — enlyzeam: the LAST manual bounce in the project's history
  (doyle addition, suggest-not-block).** enlyzeam is correctly excluded from the
  verify (v0.3.2 on disk, v0.3.0 resident — open thread #1). But D7-4 is the
  natural moment to perform its catch-up bounce: bring enlyzeam onto a split-era
  binary so it joins the seamless era, and **close the open thread in the very
  milestone that makes manual bounces obsolete**. After this, no node ever needs a
  manual bounce again.
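
The call-5 field assert reduces to a small decision over two breadcrumb readings and the release artifact's hash. The sketch below is illustrative only: `NodeReading`, `FieldVerdict` and `judge_roll` are hypothetical names, and it assumes the artifact hash is computed with the same algorithm the brain uses for `exe_hash`.

```rust
/// One reading of a node's `brain.ready` (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct NodeReading {
    pub brain_pid: u32,
    pub exe_hash: String,
}

#[derive(Debug, PartialEq)]
pub enum FieldVerdict {
    /// New pid, flipped hash, and the hash matches the released artifact.
    Seamless,
    /// The brain pid did not change: no restart happened.
    NoRestart,
    /// A restart happened but the resident bytes are the old ones.
    HashUnchanged,
    /// The hash flipped, but not to the released artifact.
    ArtifactMismatch,
}

/// Judge one node's roll from the before/after readings.
pub fn judge_roll(
    before: &NodeReading,
    after: &NodeReading,
    artifact_hash: &str,
) -> FieldVerdict {
    if before.brain_pid == after.brain_pid {
        return FieldVerdict::NoRestart;
    }
    if before.exe_hash == after.exe_hash {
        return FieldVerdict::HashUnchanged;
    }
    if after.exe_hash != artifact_hash {
        return FieldVerdict::ArtifactMismatch;
    }
    FieldVerdict::Seamless
}
```

`HashUnchanged` is the case a pid check alone would miss: a restart that still leaves the old bytes resident.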

Evidence: the field-run appendix (before/after `exe_hash` / pid-change /
version-bump / daemon-healthy on both nodes, `docs/TWO-HOST-RUNBOOK.md`) + the
ROADMAP/KNOWN-HAZARDS close-out doc
edits (`xtask check` drift-clean). No REQ stage change here (D7-3 did the
activation); this is the milestone-acceptance + docs reconciliation.

---

## Sequencing

**D7-1** (the process-level survival E2E, the keystone; re-points REQ-DAEMON-2 /
REQ-UPD-3 `int` *in the same commit* it lands) → **D7-2** (grow the N-1 scaffold
into the old-broker × new-brain green CI gate) → **D7-3** (the one toml
activation: `REQ-HAZARD-BROKER-PROCESS-ISOLATION` `int`, now both D7-1 + D7-2 int
evidence exist; reconcile the traceability ledger) → **D7-4** (publish a
restoration+1 test release, field-verify the seamless swap on the rig, close the
milestone). D7-1 first because it is the evidence everything else references; D7-2
before D7-3 because the hazard's `int` activation needs *both* the survival E2E
**and** the N-1 gate present; D7-4 last because it needs a tagged release built
from a CI-green main.

## N-1 compat — D7 is where the scaffold becomes a gate

D7 adds **no new IPC wire field** (consistent with D5/D6). The D7-2 work does not
*grow* the verb surface — it *exercises* the existing surface across a real
binary version gap and makes the existing V6 scaffold a gate. The versioned
surfaces D7 asserts are the ones already shipped (D3-2's spawn-time argv field +
the D2 verb set); D7 proves the N-1 window over them holds cross-binary, where
D2/D3-4 only asserted argv-parse tolerance single-binary.

## Traceability — one toml activation in D7 (D7-3)

| REQ | State entering D7 | D7 adds | Activation note |
|-----|-------------------|---------|-----------------|
| `REQ-HAZARD-BROKER-PROCESS-ISOLATION` | `[doc,impl,unit]` | **int** (D7-1 survival E2E + D7-2 N-1 gate) | **activated at D7-3** → `[doc,impl,unit,int]` (the one D7 stage change) |
| `REQ-UPD-3` (no endpoint drop, brain-only) | `int` (in-process `brain_swap.rs`) | re-point `int` → D7-1 process-level E2E | already `int`; **tag moves in the D7-1 commit**, no stage change |
| `REQ-DAEMON-2` (broker/brain split, seamless update) | `int` (in-process shape) | re-point `int` → D7-1 process-level E2E | already `int`; **tag moves in the D7-1 commit**, no stage change |
| `REQ-HAZARD-ROLLBACK-STATE-COMPAT` | `[doc,impl,unit]` | nothing (no `int` by design) | untouched in D7 (D6-3 closed it) |

Rule 5: the **only** `required_stages` change in D7 is
`REQ-HAZARD-BROKER-PROCESS-ISOLATION` → adding `int`, landed in the D7-3 commit
after D7-1 + D7-2 deliver the `int` evidence. The REQ-DAEMON-2 / REQ-UPD-3
re-point is a tag *relocation* (same stage set) inside the D7-1 commit — never
leave them evidenced by the old in-process test once the new test exists.
`traceable-reqs check` stays green at every commit.
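
The ordering constraint behind rule 5 can be pictured with a toy model of the check. This is an assumed model, not the behavior of the real `traceable-reqs` tool: it supposes the check fails exactly when a stage listed in `required_stages` has no evidence tag, and that evidence for a not-yet-required stage is tolerated. `missing_stages` is a hypothetical name.

```rust
/// Stages that are required but have no evidence yet.
/// Assumed model of the check: it is green when this list is empty.
pub fn missing_stages<'a>(required: &[&'a str], evidenced: &[&str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|stage| !evidenced.iter().any(|have| have == stage))
        .collect()
}
```

Under this model, landing the `int` tags in D7-1 and D7-2 while `required_stages` is still `["doc","impl","unit"]` keeps the check green, and flipping the toml first would turn it red until the evidence lands. That is the reason the activation waits for D7-3.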

## Risks / watch-items (baked in for the vet)

- **The faithful-model line (§Open-call-1) is the whole credibility of D7-1.** If
  the broker-in-process model is judged to weaken the "process-level" claim, the
  re-point is not honest and the test must instead drive the real `spt daemon run`
  — which forces adding a spawn/dial surface to the frozen daemon (out of the
  milestone's "internals only, M8 surface frozen" scope). The recommendation is
  that the in-process broker is *faithful* (the broker is the never-restarting
  anchor by design; the brain process + swapped binary + endpoint survival are all
  real) — but this is doyle's call to ratify, not mine to assume.
- **"New bytes ran" must be provably distinct from a relaunch (§Open-call-3).**
  The single thing separating D7-1 from `brain_swap.rs` is that the respawn ran
  *different bytes from disk*. If the swapped binary is byte-identical or the
  version marker is not observable post-respawn, the test proves nothing new.
  The marker mechanism (a version-stamped brain fixture, two builds; or a self-exe
  hash sentinel) must be airtight and is a sub-decision.
- **The real-`spt daemon brain`-entry vs test-fixture-brain tension
  (§Open-call-2).** The real brain entry resumes-and-idles; it does not spawn a
  session or dial a peer on its own, so *something* test-side must establish the
  endpoints and the supervised child must *resume* them. If the supervised child
  is a test fixture (not the real `spt daemon brain`), D7-1 proves supervisor +
  broker survival but not the *real* brain entry's resume path. Trade-off to rule.
- **N-1 gate flakiness + the old-broker source (§Open-call-4).** A pinned-ref
  build in CI adds build time + a moving "what is N-1" definition; the AF_UNIX
  bind flake on kitsubito will bite the new process-spawn tests if not fixed here.
  Both are named so they are addressed, not discovered.
- **Fleet verify is a real-hardware roll — treat it like a release.** Push→CI
  both runners before tagging; the test release tags *above* the 0.3.2 floor;
  enlyzeam is out of scope (still <0.3.2). A botched field roll on the user's
  daily-driver fleet is the exact harm the careful-rollout posture exists to
  avoid, so verify on kitsubito (the non-daily rig) first.
- **Don't expand scope into capability.** D7 is conformance close-out: tests,
  a gate, a toml line, a field run, docs. Resist adding the deferred hosted-session
  spawn surface, the alarm port, or the current_exe reconcile — they are DEFERRED
  rows, not D7 work. If D7-1 *needs* a spawn surface, it is `#[cfg(test)]`/hidden,
  never the frozen api/CLI surface.

## Resolved open calls (doyle vet, 2026-06-10)

1. **In-process broker faithful, or drive the real `spt daemon run`?** —
   **RESOLVED: FAITHFUL, ratified.** The proof is a TRIANGLE; the in-process broker
   runs the same `Broker` code holding the same resources the real broker would, so
   it does not weaken the claim. **Condition: D7-1's test rustdoc must name all
   three legs** (D1 `brain_split.rs` = real-broker-process leg · D7-1 = survival leg
   · D7-4 = field leg), citing `brain_split.rs`, so the `int` tag is honest on its
   face. DEFERRED note: a full-stack single-process E2E becomes reachable when the
   live-agent adapter makes daemon-hosted sessions real.
2. **Who spawns the endpoints; must the supervised child be the real entry?** —
   **RESOLVED: the REAL `spt daemon brain` entry**, drivable today (SPT_HOME=temp,
   injected `spawn_child` execs `P daemon brain --generation N --start-reason …`,
   `run_brain` connect-retries, `resume_sessions` re-attaches). This makes D7-1 the
   **first process-level exercise of the D4 cursor-of-record resume with N≥1 hosted
   sessions**. The test driver establishes PTY+QUIC once and drops its client **hard** (D4
   posture); the real child resumes; assert from the in-process broker's own state.
   Fixture-brain fallback only if genuinely blocked, gap named (not expected).
3. **"New bytes ran" proof mechanism?** — **RESOLVED: self-exe-hash sentinel,
   STRENGTHENED into a production breadcrumb.** Add `exe_hash` as an additive field
   on `brain.ready` (alongside `{pid, generation}`), written **always**, registered
   in the D6-3 tripwire set. Fixture B = the same binary with **appended padding
   bytes** (PE+ELF tolerate trailing data; hash flips, behavior identical). One
   binary, no double-build; lives in real code (call 2); the "which bytes resident"
   diagnostic that would have caught enlyzeam day one. Assert: post-respawn
   `exe_hash` == fixture-B hash != fixture-A hash.
4. **N-1 old-broker source for the gate?** — **RESOLVED: pinned prior ref, with
   two corrections.** (a) The pin is **NOT a pre-restoration release** (no
   supervisor → cannot pair) — pin the **D1 skeleton `0c95435`** (widest argv
   window: bare vs stamped); name the advance rule ("pin advances to the latest
   PUBLISHED split-era release once one exists"). (b) **Pairing = test-launch the
   new brain at the old broker's socket** — do NOT contort D6 record selection; the
   compat surface is SOCKET VERBS + argv, not supervision custody. **Tag:** do NOT
   add `[int->REQ-HAZARD-HANDOFF-ARGV-COMPAT]` (keeps one-activation discipline) —
   a toml comment notes the int-grade exercise instead.
5. **Fleet-verify endpoint-survival scope?** — **RESOLVED: no synthetic fleet
   workload.** The E2E owns endpoint survival; the field run owns "new code actually
   runs, no bounce." **STRENGTHENED** with call-3's breadcrumb: read `brain.ready`
   `exe_hash` before/after apply on each node — it must flip AND match the release
   artifact's hash (proves THE BYTES, not just a restart — the assert enlyzeam
   lacked).
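
The call-3 sentinel and the call-5 field assert can be sketched as below. This is a minimal illustration under stated assumptions, not the production code: the digest behind `exe_hash` is not specified in this plan, so FNV-1a stands in for it, and the helper names (`fnv1a64`, `exe_hash`, `write_padded_copy`, `new_bytes_ran`) are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// FNV-1a 64-bit digest. ASSUMPTION: a stand-in only; the real `exe_hash`
/// algorithm is not named in this plan.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    h
}

/// Hash the bytes of a binary on disk (hypothetical helper).
fn exe_hash(path: &Path) -> io::Result<u64> {
    Ok(fnv1a64(&fs::read(path)?))
}

/// Build fixture B: fixture A's bytes followed by `pad` trailing zero bytes.
/// Only the bytes are written; permission bits are not copied.
fn write_padded_copy(src: &Path, dst: &Path, pad: usize) -> io::Result<()> {
    let mut bytes = fs::read(src)?;
    bytes.resize(bytes.len() + pad, 0);
    fs::write(dst, bytes)
}

/// The shared assert shape: the resident hash must flip AND equal the
/// expected new bytes (fixture B in the E2E, the release artifact in the field).
fn new_bytes_ran(before: u64, after: u64, expected_new: u64) -> bool {
    before != after && after == expected_new
}
```

In the E2E, `before` would be the `exe_hash` read from `brain.ready` under fixture A and `after` the value read post-respawn. In the field run, `expected_new` would be the hash of the release artifact. A real fixture B also needs its executable bit set on Unix, which this sketch leaves out.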

## Immediate next step

Greenlight is given: start **D7-1**. Stand up the test-owned `Broker` + `NetHost`
loopback peer + a hosted PTY echo child + a live QUIC conn, drive the production
`supervise_brain` to spawn a real brain child from a test-controlled on-disk path,
swap the bytes (per resolved call 3: fixture B is the same binary with appended
padding bytes, observed through `exe_hash` on `brain.ready`), respawn, and assert
the four survival invariants (pid-unchanged · QUIC-intact · new-bytes-ran · gapless).
Land the re-point of REQ-DAEMON-2 / REQ-UPD-3 `int` onto this test **in the same
commit**. Do **not** add a spawn/dial surface to the frozen daemon (test-only
seams are `#[cfg(test)]`/hidden); do **not** add an IPC wire field (D7 grows
none; `exe_hash` is a breadcrumb field, not a wire field).
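
The four survival invariants can be written down as one checker, sketched below. Everything here is hypothetical scaffolding: the `Observed` fields, the `Violation` names, and the reading of "gapless" as consecutive echo sequence numbers are assumptions made for illustration, not the shape of the real test.

```rust
/// What the test reads from the in-process broker's own state.
/// ASSUMPTION: all field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct Observed {
    pty_child_pid: u32,
    quic_conn_open: bool,
    quic_conn_id: u64,
    exe_hash: u64,
}

#[derive(Debug, PartialEq)]
enum Violation {
    PidChanged,
    QuicDropped,
    OldBytes,
    Gap { after_seq: u64 },
}

/// Check the four invariants across a brain swap. `echo_seqs` is the ordered
/// list of sequence numbers echoed back by the hosted PTY child.
fn check_survival(
    before: &Observed,
    after: &Observed,
    hash_a: u64,
    hash_b: u64,
    echo_seqs: &[u64],
) -> Vec<Violation> {
    let mut out = Vec::new();
    // pid-unchanged: the hosted child is the same process.
    if before.pty_child_pid != after.pty_child_pid {
        out.push(Violation::PidChanged);
    }
    // QUIC-intact: same connection, still open.
    if !after.quic_conn_open || before.quic_conn_id != after.quic_conn_id {
        out.push(Violation::QuicDropped);
    }
    // new-bytes-ran: resident hash is fixture B's and not fixture A's.
    if after.exe_hash != hash_b || after.exe_hash == hash_a {
        out.push(Violation::OldBytes);
    }
    // gapless: report the first break in the echoed sequence, if any.
    for w in echo_seqs.windows(2) {
        if w[0].checked_add(1) != Some(w[1]) {
            out.push(Violation::Gap { after_seq: w[0] });
            break;
        }
    }
    out
}
```

Returning every violation, rather than failing on the first, means one red run names all the broken invariants at once.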
