# v0.4.2 plan — Linux brain-respawn-path fix + promotion bytes-gate

> Build plan for the v0.4.1 fleet-roll blocker. Single-fix release. Same loop:
> plan → doyle sign → execute (per-task gates) → CI both runners → deployah
> publishes (counter 9). Restoration close is GATED on this + a post-fix Linux
> seamless observation (debug-channel rollout).

## Context — the blocker (2026-06-11)

v0.4.1 (counter 8) published; the fleet roll is the restoration D7 no-bounce
acceptance.

- **hfenduleam (Windows): GREEN.** Brain pid rolled, gen 0→1, broker pid held,
  `exe_hash` flipped to the published v0.4.1 bytes, pump wedge healed — seamless,
  no manual bounce.
- **kitsubito (Linux): RED.** `apply` swapped the canonical binary to v0.4.1
  (`/home/reavus/.local/bin/spt` == `35c9544c…`, the published linux artifact)
  and recorded `applied:8` — **but the respawned brain execs `spt.old-8`
  (`b4b7ea14…` = v0.4.0).** New code is NOT running; the record is optimistically
  wrong (the enlyzeam-class record/reality divergence, now provable via `exe_hash`).

**Root cause (confirmed in code + by `doyle`):** `brainproc.rs:817`
`spawn_brain_child` resolves the candidate binary (`binary = None`) via
`std::env::current_exe()` **per spawn**. On Linux, `current_exe()` is
`readlink(/proc/self/exe)`, which is **inode-tracking**: it follows the `apply`
rename (`spt` → `spt.old-N`), so the resident broker respawns the brain onto the
OLD binary. On Windows, `GetModuleFileName` returns the **path string at process
start**, so it stays canonical, which is why Windows is green. ADR-0018 Q3
("auto-respawns from the executable path — now the new binary") silently assumed
path-string semantics; **that assumption is the bug.**
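
The divergence can be reproduced on plain files, without a running broker. The sketch below is an analogy only: a file handle opened before the swap stands in for the inode-tracking `/proc/self/exe` view, and a remembered `PathBuf` stands in for path-at-start semantics. `apply_swap` and `observe_swap` are hypothetical helpers written for this illustration, not functions from `brainproc.rs`, and the byte strings are placeholders for real binaries.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the `apply` swap: rename the canonical file to
/// `<name>.old-<counter>`, then write the new bytes at the canonical path.
fn apply_swap(canonical: &Path, counter: u64, new_bytes: &[u8]) -> io::Result<PathBuf> {
    let mut old_name = canonical.as_os_str().to_os_string();
    old_name.push(format!(".old-{counter}"));
    let old = PathBuf::from(old_name);
    fs::rename(canonical, &old)?;
    fs::write(canonical, new_bytes)?;
    Ok(old)
}

/// Returns (bytes seen through a handle opened before the swap,
///          bytes seen by re-reading the remembered path after the swap).
fn observe_swap(dir: &Path) -> io::Result<(Vec<u8>, Vec<u8>)> {
    fs::create_dir_all(dir)?;
    let canonical = dir.join("spt");
    fs::write(&canonical, b"v0.4.0")?;

    // Inode-tracking view: a handle opened before the swap.
    let mut held = File::open(&canonical)?;
    // Path-string view: only the path is remembered.
    let remembered = canonical.clone();

    apply_swap(&canonical, 8, b"v0.4.1")?;

    let mut via_handle = Vec::new();
    held.read_to_end(&mut via_handle)?;
    let via_path = fs::read(&remembered)?;
    Ok((via_handle, via_path))
}
```

On Linux the held handle keeps reading the renamed (old) file while the remembered path reads the new bytes, which mirrors the per-spawn `current_exe()` result versus a path captured at start.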

**Second defect (the false-success hole):** promotion gates on **readiness only**
(`run_trial` → `Promoted` → `record_promoted`). A brain that comes up on the
WRONG bytes still signals ready, so the trial promoted and recorded `applied:8`
over a v0.4.0-bytes brain. Readiness ≠ new-bytes.

**E2E gap:** `brain_survive.rs` models the swap as a **path-flip** between two
distinct fixtures (`fixture_a` → `fixture_b`) through the injected spawn closure
— it never renames a *running* binary, so `current_exe()`-follows-rename is never
exercised. CI green on kitsubito, prod red. The model boundary was drawn one
layer too high (it abstracts away the production `current_exe` default — the
platform-divergent part).

## Goal

A brain-only update on Linux runs the NEW bytes with no manual bounce (parity
with Windows), and a future respawn-vs-bytes divergence **self-heals** (auto
rollback + loud notif) instead of silently recording a wrong `applied`.

## REQ to mint (rule 3 — before the fix)

`REQ-HAZARD-BRAIN-RESPAWN-PATH` — `required_stages = ["doc","impl","unit","int"]`,
activated in the fix change. Sibling of `REQ-HAZARD-BROKER-PROCESS-ISOLATION`
(6.7): the broker must respawn the brain onto the **applied bytes**, and a
respawn that lands on the wrong bytes must fail-safe (rollback), never record
success. KNOWN-HAZARDS **6.11**.

## Fix — two halves (doyle ruling + deployah R2 belt)

### Half 1 — capture the canonical exe path once at broker start
`spawn_brain_supervisor` (`brainproc.rs:858`, the broker-process entry, runs in
`Daemon::run` before the seed loop) captures `std::env::current_exe()` **once at
t=0** (before any `apply` can rename under us) and passes it as the default
binary for the `binary = None` spawn case — never per-spawn `current_exe()`.

- `spawn_brain_child` grows a `canonical: Option<&Path>` param; `None` selection
  uses `canonical` (falls back to per-spawn `current_exe()` only if the t=0
  capture failed — degrade-safe, preserves today's behavior).
- The production closure handed to `supervise_brain` binds the captured path.
  `supervise_brain`'s generic `spawn_child` signature is unchanged → unit
  harness untouched. Rollback selection (`Some(.old-N)`) unchanged.
- Symlink starts are fine: `current_exe()` fully resolves at capture (kitsubito's
  `/usr/local/bin/spt` → `~/.local/bin/spt` resolves at t=0).
- Net effect: Linux gains Windows' path-at-start semantics. Zero install-path
  discovery, no new config.
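
A minimal sketch of the Half 1 selection rule, assuming the choice can be isolated in a pure function. `resolve_spawn_binary` and `capture_canonical_exe` are hypothetical names used for illustration; the real change threads `canonical` through `spawn_brain_child`, whose full signature is not shown here.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Capture once at broker start (t=0). `.ok()` turns a failed capture into
/// `None` so startup is not aborted and the fallback below still applies.
fn capture_canonical_exe() -> Option<PathBuf> {
    std::env::current_exe().ok()
}

/// Pick the binary to exec for a brain spawn.
/// - `requested`: explicit selection, e.g. rollback onto `.old-N`.
/// - `canonical`: the path captured at broker start, if the capture succeeded.
fn resolve_spawn_binary(
    requested: Option<&Path>,
    canonical: Option<&Path>,
) -> io::Result<PathBuf> {
    match (requested, canonical) {
        // Explicit selection always wins (rollback path unchanged).
        (Some(p), _) => Ok(p.to_path_buf()),
        // Default case: use the t=0 capture, never a fresh lookup.
        (None, Some(c)) => Ok(c.to_path_buf()),
        // Degrade-safe fallback: today's per-spawn behavior.
        (None, None) => std::env::current_exe(),
    }
}
```

Keeping the rule in one function would let the T3 unit test cover all three arms without spawning a process.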

### Half 2 — promotion bytes-gate (gate AND rollback trigger)
In `supervise_brain` the `TrialStep::Promoted` arm (`brainproc.rs:668`), BEFORE
`env.record_promoted(version)`:

- `expected = env.staged_artifact_hash(version)` (the staged release metadata's
  `artifacts[current_platform()].artifact_sha256`, via `ReleaseCache`).
- `actual = env.ready_exe_hash()` (the `exe_hash` the candidate stamped in
  `brain.ready`).
- **Both present and `expected != actual`** → the candidate came up on the wrong
  bytes: do NOT promote → `rollback(env, &record)` + `generation += 1` +
  `reason = Crash` + `continue` (same path as a failed trial; the loud rollback
  notif fires). The failure mode becomes auto-rollback to `.old-N`, not a
  silently-wrong record.
- **Either absent** → degrade to today's readiness-only promote (pre-metadata
  releases, missing breadcrumb — N-1-safe).

`TrialEnv` gains two best-effort methods: `ready_exe_hash(&self) -> Option<String>`
and `staged_artifact_hash(&self, version: u64) -> Option<String>`;
`ProductionTrialEnv` reads them from `brain.ready` + `ReleaseCache`. The D7-4
*manual* bytes assert becomes an *automatic* gate.
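
The gate's truth table can be sketched as a small decision function. This is an illustration under assumptions: the `TrialEnv` trait below is reduced to the two new methods (the real trait has more), `GateDecision`, `bytes_gate`, and `FakeEnv` are hypothetical names, and hashes are compared as exact strings with no normalization.

```rust
/// Outcome of the pre-promotion bytes check.
#[derive(Debug, PartialEq, Eq)]
enum GateDecision {
    /// Both hashes present and equal: promote and record.
    Promote,
    /// A hash is missing: degrade to readiness-only promotion.
    ReadinessOnly,
    /// Both hashes present and different: do not promote, roll back.
    Rollback,
}

/// Reduced stand-in for the real trait: only the two best-effort methods.
trait TrialEnv {
    fn ready_exe_hash(&self) -> Option<String>;
    fn staged_artifact_hash(&self, version: u64) -> Option<String>;
}

fn bytes_gate(env: &impl TrialEnv, version: u64) -> GateDecision {
    match (env.staged_artifact_hash(version), env.ready_exe_hash()) {
        (Some(expected), Some(actual)) if expected == actual => GateDecision::Promote,
        (Some(_), Some(_)) => GateDecision::Rollback,
        _ => GateDecision::ReadinessOnly,
    }
}

/// Test double with fixed answers (hypothetical, for the truth table only).
struct FakeEnv {
    staged: Option<&'static str>,
    ready: Option<&'static str>,
}

impl TrialEnv for FakeEnv {
    fn ready_exe_hash(&self) -> Option<String> {
        self.ready.map(str::to_owned)
    }
    fn staged_artifact_hash(&self, _version: u64) -> Option<String> {
        self.staged.map(str::to_owned)
    }
}
```

In the `Promoted` arm, `Promote` and `ReadinessOnly` would both proceed to `record_promoted`, while `Rollback` would take the failed-trial path described above.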

> **Bootstrapping note (deployah):** Half 2 is FORWARD protection — it lives in
> the v0.4.2 broker. The v0.4.2 `apply` itself rides the resident **v0.4.0**
> broker (no gate, buggy respawn), so it still needs the one manual Linux bounce
> to load the fixed broker. The gate protects every update AFTER v0.4.2.

## Tasks (per-task gates: build · test · clippy -D · build --no-default-features · traceable-reqs check EXIT=0 · xtask check)

- **T1 — doc.** KNOWN-HAZARDS **6.11** (platform-divergent `current_exe` trap +
  readiness≠bytes); `traceable-reqs.toml` mint `REQ-HAZARD-BRAIN-RESPAWN-PATH`;
  ADR-0018 amendment note (Q3 path-semantics assumption corrected). Tag
  `<!-- [doc->REQ-HAZARD-BRAIN-RESPAWN-PATH] -->`.
- **T2 — impl.** Half 1 (`spawn_brain_supervisor` + `spawn_brain_child` canonical
  capture) + Half 2 (`supervise_brain` Promoted-arm gate + `TrialEnv` two methods
  + `ProductionTrialEnv` impls). `// [impl->REQ-HAZARD-BRAIN-RESPAWN-PATH]`.
- **T3 — unit.** Spawn-path selection (captured default used for `None`, not
  per-spawn `current_exe()`; `Some(.old-N)` unchanged) + promotion-gate truth table
  (match→promote · mismatch→rollback+notif · either-absent→readiness-only).
  `// [unit->REQ-HAZARD-BRAIN-RESPAWN-PATH]`.
- **T4 — int (the gap fix).** A `brain_survive` sibling: supervisor running the
  PRODUCTION selection (`binary = None` default through the captured path); the
  test **renames the running canonical binary `P` → `P.old-N` and writes the new
  bytes at `P`** (the real `apply` choreography, not a path flip), then asserts
  the respawned `exe_hash` == the new bytes. **Fails on Linux against the old
  per-spawn `current_exe()`, passes after Half 1.** Regression guard on both runners.
  `// [int->REQ-HAZARD-BRAIN-RESPAWN-PATH]`. (deployah offered to build this leg.)
- **T5 — release.** Bump `0.4.1 → 0.4.2`; CHANGELOG `[0.4.2]` (user-clean: a
  Linux update now runs the new version immediately, no manual restart); regen
  docs; full gates; push → **CI both runners green** → deployah tags + signs +
  publishes **counter 9**.

## Traceability

`REQ-HAZARD-BRAIN-RESPAWN-PATH` `required_stages` set in the fix change (rule 5):
doc=T1, impl=T2, unit=T3, int=T4. `traceable-reqs check` stays green at every
commit.

## Ops + close sequence (post-publish)

1. **kitsubito** — apply v0.4.2 (rides the buggy v0.4.0 broker → records applied,
   brain on v0.4.1 bytes `.old-9`), then **one manual broker bounce** loads the
   fixed v0.4.2 broker. kitsubito's last bounce.
2. **enlyzeam** — catch up straight to v0.4.2; its one catch-up bounce doubles as
   the fix-load. The project's final manual bounce, paired.
3. **Linux seamless observation (operator: debug-channel rollout).** With the
   fixed broker resident, push a debug-channel build to the lab nodes
   (`DEBUG-ROLLOUT.md`, REQ-UPD-6) and observe a brain-only roll: brain pid
   changes, `exe_hash` == the debug artifact, broker pid held, NO manual bounce.
   This is the Linux leg the path-flip E2E could not give us in the field.
4. **Restoration CLOSE** — ROADMAP restoration → ✅ (both OS legs proven);
   KNOWN-HAZARDS 6.7/6.8 close-out; the kitsubito `applied:8`-over-v0.4.0 record
   captured in the appendix as evidence FOR the bytes-gate (NOT hand-corrected —
   v0.4.2's gated promotion supersedes it).

## Notes / non-goals
- v0.4.1 artifacts are sound — **no yank**. The defect is in the deployed v0.4.0
  broker's respawn logic.
- Rollback-selection-not-apply (`brainproc.rs:779`) is unchanged; the bytes-gate
  reuses the existing `rollback` path.
- The broker-side QUIC-op bound (pump-IPC-deadline B-half, DEFERRED) is NOT in
  scope — separate broker-update batch.
