# Known Hazards

Hard-won edge cases harvested from the sister project (`claude_skill_owl`, ~80 commits / 12+ phases / multiple production incidents). Per ADR-0001, this is a **test checklist for the spt-core rebuild** — the clean-room rebuild must re-satisfy each invariant rather than re-discover the bug.

**Architecture-translation note.** The sister project runs poll listeners and Psyche wrappers as *separate processes*. spt-core consolidates both into the one `spt-daemon` (brain), with a stable broker beneath it (ADR-0004). Many hazards below were inter-process races in the sister project; in spt-core some become intra-daemon concerns (potentially easier) while others move to the daemon↔broker IPC boundary or the network boundary (potentially new failure surface). Each entry notes the mapping where it differs. Citations point at sister-project paths for reference, not at spt-core.

---

## 1. Race conditions & ordering

### 1.1 Phantom INIT_SIGNOFF after grace period
- **Failure:** orphan teardown enqueues INIT_SIGNOFF before the grace-period recheck; a transient Self recovery (binary handoff, brief stale poll) then makes the recheck pass as alive, but the signoff is already spooled and drains on the next iteration → teardown despite a live Self.
- **Invariant:** grace-period wait MUST complete *before* composing/delivering INIT_SIGNOFF; the recheck must bind `still_gone` before any envelope write.
- **spt-core mapping:** in-daemon now (no separate wrapper), but the ordering invariant is identical — orphan/teardown logic must re-evaluate liveness after the grace wait, not before enqueue.
- **Sister cite:** `src/live/wrapper/orphan.rs:201-259` (sleep@209 precedes compose@231-251); tests T-grace-recovery:576, T-still-gone-recheck:618.
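
The ordering in 1.1 can be sketched as below. This is a minimal illustration, not the sister project's or spt-core's code: `orphan_teardown`, `Teardown`, and the three closures are hypothetical stand-ins for the real liveness probe, sleep, and spool write.

```rust
use std::time::Duration;

/// Outcome of one orphan-teardown evaluation (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Teardown {
    /// Self came back during the grace wait; nothing was enqueued.
    Recovered,
    /// Self was still gone after the grace wait; the signoff was enqueued.
    SignoffSent,
}

/// Hypothetical sketch: `self_alive`, `wait`, and `enqueue_signoff` stand in
/// for the daemon's real liveness probe, sleep, and envelope write.
pub fn orphan_teardown(
    grace: Duration,
    mut self_alive: impl FnMut() -> bool,
    mut wait: impl FnMut(Duration),
    mut enqueue_signoff: impl FnMut(),
) -> Teardown {
    // 1. Complete the whole grace wait first.
    wait(grace);
    // 2. Bind `still_gone` only after the wait, before any envelope write.
    let still_gone = !self_alive();
    if !still_gone {
        return Teardown::Recovered;
    }
    // 3. Only now compose and enqueue INIT_SIGNOFF.
    enqueue_signoff();
    Teardown::SignoffSent
}
```

Because the enqueue is the last step, a recovery that lands anywhere inside the grace window leaves nothing in the spool to drain later.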

### 1.2 Poll-rewrite race & info.json mid-write reads
- **Failure:** `info.json` written by the wrapper mid-iteration while a list/classify command reads it → torn read, misclassification.
- **Invariant:** consult liveness via the supervisor (`is_wrapper_alive`-equivalent) before any grace gate; reads of state files must tolerate concurrent writes (atomic write + rename, or read-retry).
- **spt-core mapping:** the daemon owns both writer and reader → use in-process locking/snapshotting instead of racing on disk. Cross-node registry reads remain eventually-consistent and must tolerate staleness.
- **Sister cite:** `src/common/list_filter.rs:100-150`; `src/owl/poll.rs:141`.

### 1.3 Stale `index.lock` wedge from prior git crash
- **Failure:** a crashed git leaves a 0-byte `index.lock` in a Psyche-tracked worktree; every later commit blocks forever.
- **Invariant:** on daemon boot, sweep seed + all agent/project worktrees for stale locks (0 bytes, mtime > 60s) and remove; leave live locks alone.
- **spt-core mapping:** cross-node Psyche sync (ADR-0002/0003) replaces git-repo sync, so the *git* lock may disappear — but any equivalent lockfile in the new sync mechanism needs the same stale-sweep on boot.
- **Sister cite:** CHANGELOG v1.11.20 "Stale `index.lock`"; `src/common/git.rs`.
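
A boot-time sweep along the lines of 1.3 might look like the sketch below. The names `is_stale_lock` and `sweep_lock` are hypothetical; only the two thresholds (0 bytes, older than 60s) come from the invariant above.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Age threshold from the invariant: older than 60 seconds.
const STALE_AFTER: Duration = Duration::from_secs(60);

/// A lock is stale only if it is empty AND older than the threshold.
pub fn is_stale_lock(len: u64, age: Duration) -> bool {
    len == 0 && age > STALE_AFTER
}

/// Removes `lock` if it looks stale; returns whether it was removed.
/// A missing lock file is not an error.
pub fn sweep_lock(lock: &Path, now: SystemTime) -> io::Result<bool> {
    let meta = match fs::metadata(lock) {
        Ok(m) => m,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(e) => return Err(e),
    };
    // If the clock moved backwards the age is zero, so the lock counts as live.
    let age = now
        .duration_since(meta.modified()?)
        .unwrap_or(Duration::ZERO);
    if is_stale_lock(meta.len(), age) {
        fs::remove_file(lock)?;
        return Ok(true);
    }
    Ok(false)
}
```

Requiring both conditions is what leaves live locks alone: a lock that is fresh, or that has content, is never removed.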

### 1.4 Deferred spool rows must not leak to the event stream
- **Failure:** a hook spools a deferred (spool-only, no TCP wake) notice; startup `drain_all` flushes ALL rows including deferred → event emitted at wrong time/priority.
- **Invariant:** startup drain (and idle/timeout TCP-wake sites) use `drain_non_deferred` only; deferred rows are picked up by their intended consumer via `peek`. All drain sites must agree on which rows they flush.
- **spt-core mapping:** carries directly — the daemon's spool-drain has the same deferred-vs-immediate distinction.
- **Sister cite:** `src/owl/poll.rs:276-316`; `spool::drain_non_deferred_with_metadata`.
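
The deferred-vs-immediate split in 1.4 can be illustrated with an in-memory spool. This is a sketch under stated assumptions: the real spool is durable, and the `Row` and `Spool` types here are hypothetical. Only the method names `drain_non_deferred` and `peek` mirror the ones cited above.

```rust
/// One spooled message (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub body: String,
    /// Spool-only row: no TCP wake, owned by a specific consumer.
    pub deferred: bool,
}

/// In-memory stand-in for the durable spool.
#[derive(Default)]
pub struct Spool {
    rows: Vec<Row>,
}

impl Spool {
    pub fn push(&mut self, body: &str, deferred: bool) {
        self.rows.push(Row { body: body.to_string(), deferred });
    }

    /// Removes and returns only immediate rows; deferred rows stay spooled.
    /// Relative order is preserved on both sides.
    pub fn drain_non_deferred(&mut self) -> Vec<Row> {
        let (deferred, immediate): (Vec<Row>, Vec<Row>) =
            std::mem::take(&mut self.rows)
                .into_iter()
                .partition(|r| r.deferred);
        self.rows = deferred;
        immediate
    }

    /// Non-destructive view, used by the deferred rows' intended consumer.
    pub fn peek(&self) -> &[Row] {
        &self.rows
    }
}
```

If every event-stream drain site calls `drain_non_deferred`, a deferred row can only leave the spool through its intended consumer.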

### 1.5 Worker (working-perch) lifecycle path consistency
- **Failure:** subagent-start creates the perch at one path layout; later hooks read it at another → not found; stop-hook scan misses nested perches.
- **Invariant:** all Worker/Psyche child-perch path composition routes through one central resolver; no divergent path construction across hooks.
- **spt-core mapping:** `Worker` is a day-one endpoint type; the daemon owns the registry, so perch location is a registry lookup, not ad-hoc path math. Single source of truth for instance→location.
- **Sister cite:** `src/owl/hook_subagent_start.rs:122-168`; `hook_subagent_stop.rs:15-55`.

---

## 2. Identity & session-binding

### 2.1 Parent PID over ephemeral poll PID
- **Failure:** orphan check polls an ephemeral listener PID; it dies and is recycled (esp. Windows); a foreign process with the recycled PID reads as alive → false-positive teardown (or false-negative).
- **Invariant:** prefer the stable harness-session PID (`parent_pid`) over any ephemeral process PID for liveness; minimal `info.json` for supervisor-owned perches to avoid stale leaks.
- **spt-core mapping:** session binding (parent-process-tree anchor) still applies for harness-hosted topology. For spt-hosted sessions the broker holds the child directly → liveness is the broker's held-handle state, more reliable than PID polling.
- **Sister cite:** `src/live/wrapper/orphan.rs:141-161`; CHANGELOG v1.11.20.

### 2.2 Stdin session_id precedence over env
- **Failure:** subagent inherits a stale `OWL_SESSION_ID` env across `/clear`; hook gets two session_ids (fresh stdin, stale env) → wrong-agent binding.
- **Invariant:** stdin-provided session_id wins; env is fallback only.
- **spt-core mapping:** the harness-contract subcommand surface must define the same precedence for whatever identity fields hooks pass in.
- **Sister cite:** CHANGELOG v1.35.1 "IN-05"; `hook_subagent_start.rs:40-51`.
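
The precedence rule in 2.2 is small enough to pin down in one function. `resolve_session_id` is a hypothetical name, and treating blank values as absent is an assumption of this sketch rather than something the source states.

```rust
/// Stdin-provided id wins; the environment is consulted only as a fallback.
/// Empty or whitespace-only values are treated as absent (an assumption).
pub fn resolve_session_id(stdin_id: Option<&str>, env_id: Option<&str>) -> Option<String> {
    let clean = |v: Option<&str>| {
        v.map(str::trim)
            .filter(|s| !s.is_empty())
            .map(str::to_owned)
    };
    clean(stdin_id).or_else(|| clean(env_id))
}
```

A stale env value inherited across `/clear` is therefore ignored whenever the hook receives a fresh id on stdin.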

### 2.3 Binary-handoff argv schema must stay backward-compatible
- **Failure:** old binary spawns new binary with old argv arity; clap rejects before state rehydration → wrapper dies unlogged.
- **Invariant:** every newly-added handoff positional has a default; state-file rehydration happens *after* argv parse; defaults survive intermediate versions.
- **spt-core mapping:** CRITICAL — self-update (ADR-0004) makes handoff routine. The broker↔brain IPC and any brain-relaunch argv must be versioned and forward/backward tolerant (a newer brain talks to an older broker). This is the single most update-frequency-sensitive invariant.
- **Sister cite:** `src/live/wrapper/lifecycle.rs:17-106`; `src/cli.rs` defaults; CHANGELOG v1.11.10.

### 2.4 Generation `gen_start` always = now() on cold-start AND handoff
- **Failure:** stale gen_start from a rehydrated state file fires time-based discriminators on the new process.
- **Invariant:** wall-clock `gen_start` is set to `now()` on both cold-start and handoff; generation counter increments on every start/revive; session UUID captured fresh and carried so the resumed mind distinguishes "same gen continuing" vs "new gen born".
- **spt-core mapping:** carries to the daemon's per-instance generation tracking.
- **Restoration D3/D4 (ADR-0018):** the generation *counter* custody moved to the broker (D3-2 — it observes every brain spawn, planned or crash, and hands `{generation, start-reason}` at spawn; `gen_start` stays `now()`-fresh, never rehydrated). The brain→brain **`BrainState` *message*** (`{session_id, generation, next_seq, gen_start_ms}`) that previously carried continuity across a handoff is **retired from the production path in D4-2**: a brain the supervisor respawns cold-starts and reconstructs all session continuity by **querying the broker** (`Brain::resume_sessions` over the broker's cursor-of-record), never a frame. `BrainState` / `Brain::handoff` / `Brain::snapshot` remain `pub` and compiled **only for the integration tests** (handoff/idempotent/daemon_e2e/attach/brain_swap + `update.rs::apply_brain_only`, itself test-only-reached) — no non-test caller (grep-clean close-out). The falsifiable proof the production path needs no frame is the D4-2 hard-kill harness (`tests/resume.rs`): a brain dropped with no `snapshot()` taken still resumes gaplessly.
- **Sister cite:** `src/live/wrapper/lifecycle.rs:70`; `src/common/wrapper_state.rs`.

### 2.5 Daemon-hosted endpoints have no dedicated liveness PID
- **Failure:** the sister evaluates Psyche/perch liveness via a dedicated process PID — the wrapper's own pid in `info.json`, checked with `is_process_alive`. Under ADR-0004 the Psyche (and any spt-hosted Self) is a **loop inside the daemon**, not a separate process: it holds no dedicated pid, and its `claude`/summarizer subprocess is ephemeral (spawned per pulse/commune, then exits). If a daemon-hosted perch's `info.json` carries the **daemon's** pid, then *every* hosted endpoint shares one pid, and `is_process_alive(pid)` reads "alive" for a torn-down endpoint as long as the daemon runs — while `clean_stale_entries` (dead-pid deletion) can no longer distinguish a dead endpoint from a live one. The 2.1/5.1 liveness models do **not** cover this third category: the Psyche is neither harness-hosted (no `parent_pid` anchor) nor a broker-held PTY child.
- **Invariant:** for **daemon-hosted** perches (Psyche; spt-hosted Self), liveness is the **daemon's authoritative in-memory endpoint table + a `status` field** on `info.json` (`online|offline|…`), **never** `is_process_alive(info.pid)`. `info.pid` for a daemon-hosted perch is at most a *hosted-by-daemon* marker (the daemon pid), not a liveness signal; registry stale-clean for these rows keys on the daemon's endpoint table, not per-row pid. This reuses the pattern already specified for **Shells** (`info.json` carries daemon-managed `status`, capability resolved by `adapter_name` — CONTEXT "Shell… Not in the subnet registry") and extends it to daemon-hosted *agent* perches.
- **spt-core mapping:** the **M1/M2a interim** model keeps the Psyche/listener a real per-process owner (the `api listen` process), so its per-pid liveness (`deliver::is_online` → `info.read_pid` → `proc::is_process_alive`; `registry::clean_stale_entries`) is correct *interim*. **M3 daemon consolidation replaces it** with daemon-authoritative liveness for hosted perches. Keep the liveness check behind one resolver (mirrors `resolve_address` stale-clean) so the M3 swap is localized — do **not** let the per-pid assumption leak into new call sites.
- **Sister cite:** `src/live/wrapper/orphan.rs` (wrapper-pid liveness); `src/common/list_filter.rs:168-175` (pid-classify); spt-core `crates/spt-store/src/{proc.rs,registry.rs}` + `crates/spt-msg/src/deliver.rs::is_online`.

---

## 3. Lifecycle

### 3.1 Ephemeral perch cleanup on every `ring` exit path
- **Failure:** `ring` creates an ephemeral perch; early-exit paths (no-perch, empty-msg, timeout) skip cleanup → stale dirs accumulate.
- **Invariant:** every code path that creates an ephemeral perch cleans it up before exit. Exception: if the caller already had an active perch, that perch is not ephemeral and must not be cleaned up.
- **spt-core mapping:** `ring` semantics carry; the daemon owns ephemeral-perch lifecycle, so a single guaranteed-cleanup (drop guard / RAII) is achievable in-process.
- **Sister cite:** `src/owl/ring.rs:58-294`.
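
The drop-guard approach mentioned in the 3.1 mapping could be sketched as follows. `EphemeralPerch` is a hypothetical type, and using "directory already exists" as the test for "caller already had an active perch" is an assumption of this sketch; the daemon would presumably consult its registry instead.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Owns an ephemeral perch directory and removes it on drop, so every exit
/// path (early return, `?`, panic unwind) cleans up.
pub struct EphemeralPerch {
    path: PathBuf,
}

impl EphemeralPerch {
    /// Returns `None` when a perch already exists at `path`: that perch is
    /// not ephemeral and must not be cleaned up by this guard.
    pub fn create(path: &Path) -> io::Result<Option<Self>> {
        if path.exists() {
            return Ok(None);
        }
        fs::create_dir_all(path)?;
        Ok(Some(Self { path: path.to_path_buf() }))
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for EphemeralPerch {
    fn drop(&mut self) {
        // Best-effort: a failed cleanup must not panic during unwind.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

Holding the guard in a local for the duration of `ring` means the no-perch, empty-msg, and timeout exits need no cleanup code of their own.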

### 3.2 Stale signoff sentinel must not kill a fresh start
- **Failure:** a leftover `.claude/<id>-signoff.md` from a prior session is read by a fresh listener as a live signoff → immediate teardown.
- **Invariant:** on every listener/daemon spawn, sweep stale signoff sentinels; signoff files are write-once per generation.
- **spt-core mapping:** same sweep on daemon (re)start per hosted instance.
- **Sister cite:** CHANGELOG v1.11.20; `src/owl/cleanup.rs:97`.

### 3.3 Orphan teardown fires echo-commune BEFORE INIT_SIGNOFF
- **Failure:** teardown delivers INIT_SIGNOFF without first saving the final context delta → Psyche signoff lacks the context-save summary.
- **Invariant:** on orphan path, synchronously run the echo-commune (final delta) before composing INIT_SIGNOFF; skip only if the session_id is missing.
- **spt-core mapping:** the daemon runs psyche/pulse loops in-process; ordering invariant identical.
- **Sister cite:** `src/live/wrapper/orphan.rs:175-199`; tests A-H:333-565.

---

## 4. Wire / transport

### 4.1 Envelope HTML-entity codec ordering — `&amp;` decoded LAST
- **Failure:** decoding `&amp;`-entity before the others double-decodes nested entities (`&amp;amp;lt;` → wrong result).
- **Invariant:** ENCODE order amp→first … `<br>`→last; DECODE order `<br>`→first … amp→**last** (`&lt;`,`&gt;`,`&quot;`, then `&amp;`). One sole decode site (at the LLM/stdin boundary); the parser never decodes.
- **spt-core mapping:** `spt-proto` owns the envelope grammar (public SDK, semver + wire-version). This codec contract is a copy-verbatim commodity item (ADR-0001) and a public-API conformance test.
- **Sister cite:** `src/owl/poll.rs:1-73`; `src/common/envelope.rs`.
- **CR-linesafety `[REQ-HAZARD-ENVELOPE-CR-LINESAFE]`:** the EVENT is LINE-FRAMED, so the codec must neutralize raw `\r` too — `event_body_escape` folds CRLF/lone-CR to `\n` (→`<br>`) **before** framing. **Failure (field, 2026-06-08):** a cross-node `spt send` from Windows (`echo` → CRLF) carried a raw `\r` into the single-line envelope; the receiver terminal did a CR→column-0 overwrite (`</EVENT>` clobbered `<EVENT t`). `\r` was never line-representable here, so normalizing it is robustness, not an ADR-0001 wire divergence (decoder + amp-last untouched). Belt-and-suspenders: `spt send`/`ring` trim stdin like `notify`.
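
The codec ordering in 4.1 can be sketched with plain string replacement. This is an illustration, not the `spt-proto` implementation: `fold_cr`, `encode_body`, and `decode_body` are hypothetical names, and the entity set is limited to the ones listed above.

```rust
/// Folds CRLF and lone CR to LF so the single-line frame never carries `\r`.
fn fold_cr(s: &str) -> String {
    s.replace("\r\n", "\n").replace('\r', "\n")
}

/// ENCODE: `&` first, `<br>` last.
pub fn encode_body(raw: &str) -> String {
    fold_cr(raw)
        .replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
        .replace('\n', "<br>")
}

/// DECODE: `<br>` first, `&amp;` last.
pub fn decode_body(wire: &str) -> String {
    wire.replace("<br>", "\n")
        .replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&amp;", "&")
}
```

The order matters in both directions. A body containing the literal text `&lt;` encodes to `&amp;lt;`; decoding `&amp;` first would turn it into `&lt;` and then into `<`. Decoding `<br>` first is safe because any literal `<br>` in the body was already escaped to `&lt;br&gt;` at encode time.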

### 4.2 Two-slice envelope parser is panic-free and tolerant
- **Failure:** malformed envelope (unclosed/misordered/nested tags) panics or drops output.
- **Invariant:** tags case-sensitive, all optional; no tags → whole body to live slot; unclosed → None for that tag; out-of-order → both still extracted; nested unknown tags preserved verbatim; zero `unwrap` on parsed text.
- **spt-core mapping:** `spt-proto` parser; property-test the robustness rules.
- **Sister cite:** `src/common/envelope.rs:64-92`; tests 99-207.

### 4.3 Registry stale-entry cleanup precedes lookup
- **Failure:** sender resolves a dead process's stale TCP port → delivery to wrong/dead listener.
- **Invariant:** clean stale entries (dead PID) before/at lookup; spool fallback is the safe path on TCP miss.
- **spt-core mapping:** now spans the **subnet registry** (ADR-0003) — eventually-consistent across nodes. Cross-node staleness is expected; resolution policy (local → most-recent → `id@node`) must degrade to spool/relay fallback on stale hits, and never hard-fail on a stale remote entry.
- **Sister cite:** `src/common/registry.rs:62-78`; `src/owl/send.rs`.

### 4.4 Deferred rows survive poll drain
- **Failure:** poll `drain_all` flushes a deferred (spool-only) message meant for a hook consumer → message lost.
- **Invariant:** deferred rows are never flushed by the event-stream drain, which uses `drain_non_deferred_*` only; the intended consumer reads them via `peek_all`.
- **Sister cite:** CHANGELOG v1.11.20; `src/common/spool.rs`. (See also 1.4.)

### 4.5 Inbox legacy compat must not double-deliver
- **Failure:** message surfaced via both spool (durable) and legacy inbox files → duplicate or racing delivery.
- **Invariant:** spool is the sole read path at poll time; inbox is write-for-compat only and never read.
- **spt-core mapping:** clean-room — likely drop the legacy inbox entirely. If kept for any compat, preserve "never read at drain time."
- **Sister cite:** `src/common/inbox.rs`.

### 4.6 Addressable-id charset reserves the address delimiters
<!-- [doc->REQ-HAZARD-ID-CHARSET] -->
- **Failure:** a bare endpoint id that contains `:` or `@` (or a path separator / whitespace / control char) makes the canonical qualified address `[subnet:]id[@node]` (ADR-0006 / REQ-INST-10) ambiguous to parse, and lets a name smuggle into a perch directory path. Once permissive ids exist in the wild, tightening later needs a migration.
- **Invariant:** every addressable id/name is validated to `[A-Za-z0-9_-]` + Hiragana/Katakana/CJK only, length `1..=64`, **at every creation seam** (`ready` start, `api bind`, `api listen`, `api worker-start`). `:` and `@` are permanently reserved as address delimiters; reads of existing perches are never re-validated. Enforce now (pre-M3/M4) so no permissive id-data accumulates.
- **spt-core mapping:** `spt_proto::id::validate_endpoint_id`; called at the four creation seams. The existing Psyche (`<parent>-psyche`) / Worker (`<parent>-w<N>`) suffix scheme uses only `-` + alphanumerics, so composite ids validate.
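
A validator for the 4.6 charset might look like the sketch below. It is not the real `spt_proto::id::validate_endpoint_id`: the function name `validate_id_sketch`, the exact Unicode block ranges, and counting length in characters rather than bytes are all assumptions.

```rust
/// Allowed: ASCII alphanumerics, `_`, `-`, and (assumed ranges) Hiragana,
/// Katakana, and CJK Unified Ideographs. `:` and `@` are never allowed.
fn is_allowed(c: char) -> bool {
    c.is_ascii_alphanumeric()
        || c == '_'
        || c == '-'
        || matches!(
            c,
            '\u{3040}'..='\u{309F}'      // Hiragana
                | '\u{30A0}'..='\u{30FF}' // Katakana
                | '\u{4E00}'..='\u{9FFF}' // CJK Unified Ideographs
        )
}

/// Validates an id at a creation seam. Length is counted in characters
/// (an assumption; the source only says `1..=64`).
pub fn validate_id_sketch(id: &str) -> Result<(), String> {
    let len = id.chars().count();
    if !(1..=64).contains(&len) {
        return Err(format!("id length {len} outside 1..=64"));
    }
    match id.chars().find(|c| !is_allowed(*c)) {
        Some(bad) => Err(format!("id contains disallowed character {bad:?}")),
        None => Ok(()),
    }
}
```

Composite ids such as `alice-psyche` or `alice-w3` pass, while `a:b`, `a@node`, and anything with a path separator or whitespace are rejected.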

### 4.7 Concurrent SQLite openers must not fail with "database is locked"
<!-- [doc->REQ-HAZARD-REGISTRY-CONCURRENT] -->
- **Failure:** two endpoints on one machine open the same SQLite store at once (e.g. two `ReadyAgent::start` calls registering simultaneously) and one fails outright with `SQLITE_BUSY` / "database is locked" → spurious registration/spool failure. Surfaced as a parallel-test flake in `two_agents_exchange_message_tcp_and_spool`, but the bug is real concurrency, not test-only.
- **Invariant:** `busy_timeout` is set **before** any lock-taking statement on every connection. Switching `journal_mode=WAL` takes a brief exclusive lock; with the default 0ms timeout it fails immediately under contention, so the pragma order is load-bearing: `Connection::open` → `busy_timeout` → `journal_mode=WAL` → `CREATE TABLE …`. WAL alone is insufficient (concurrent *writers* still serialize; they must *wait*, not error).
- **spt-core mapping:** `spt_store::registry::open_registry` + `spt_store::spool::open_spool_at`; both set `busy_timeout=5000` first. Any future SQLite store (history Path B, instance registry) must follow the same ordering.

### 4.8 Registry merge ordered by epoch, never wall-clock (red-team #8)
<!-- [doc->REQ-HAZARD-REGISTRY-EPOCH-LEASE] -->
- **Failure:** the per-subnet registry replicates `endpoint_id → [instances]` eventually-consistently across nodes. Under a partition or clock skew, a lagging node re-announces a stale `Active` for an endpoint that has actually gone `Offline`. If the merge ordered updates by wall-clock (or "last write wins"), the stale `Active` overwrites the newer `Offline` and resolution routes a message to a dead/wrong instance.
- **Invariant:** the merge precedence key is a **per-node monotonic epoch counter** (`spt_store::epoch::EpochSource`, persisted, strictly increasing, NEVER wall-clock), compared version-vector style per `(endpoint_id, node)`: an incoming update wins **iff its epoch is strictly greater** than the stored one for that node; equal or lower is dropped as stale. So a newer `Offline` (higher epoch) can never be clobbered by a lagging `Active` (lower epoch), and an idempotent equal-epoch replay is a no-op. Wall-clock is at most a human tiebreaker hint inside a flagged conflict, never the ordering authority. The same epoch source unifies with the D6 sync-precedence concurrent-write detection (#7).
- **spt-core mapping:** `spt_net::net::registry::SubnetRegistry::merge_instance` (the lease) + `spt_store::epoch::EpochSource` (the counter). Cross-node replication of the merge wires at D4; the merge seam is identical for local and wire-delivered updates. Chaos/two-host verification = D9.
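
The strictly-greater rule in 4.8 can be sketched with a map keyed by `(endpoint_id, node)`. The `Registry`, `InstanceRow`, and `Status` types are hypothetical simplifications, not the real `SubnetRegistry`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Active,
    Offline,
}

/// One replicated row; `epoch` is the authoring node's monotonic counter.
#[derive(Debug, Clone, PartialEq)]
pub struct InstanceRow {
    pub epoch: u64,
    pub status: Status,
}

#[derive(Default)]
pub struct Registry {
    /// Keyed by (endpoint_id, node).
    rows: HashMap<(String, String), InstanceRow>,
}

impl Registry {
    /// Applies `incoming` iff its epoch is strictly greater than the stored
    /// one for the same (endpoint, node). Returns whether it was applied.
    pub fn merge(&mut self, endpoint: &str, node: &str, incoming: InstanceRow) -> bool {
        let key = (endpoint.to_string(), node.to_string());
        let stale = matches!(
            self.rows.get(&key),
            Some(stored) if incoming.epoch <= stored.epoch
        );
        if stale {
            return false; // equal or lower epoch: dropped, replay is a no-op
        }
        self.rows.insert(key, incoming);
        true
    }

    pub fn get(&self, endpoint: &str, node: &str) -> Option<&InstanceRow> {
        self.rows.get(&(endpoint.to_string(), node.to_string()))
    }
}
```

No timestamp appears anywhere in the comparison, so clock skew between nodes cannot reorder updates.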

### 4.9 SQLite stores must create their parent dir — SQLite won't
<!-- [doc->REQ-HAZARD-REGISTRY-DIR-CREATE] -->
- **Failure:** `Connection::open` creates the database FILE but never its parent DIRECTORY. On a fresh home (first boot, fresh CI `_work` dir) a registry op that runs before any perch-creating op (`create_dir_all` side effects) fails `SQLITE_CANTOPEN` — "unable to open database file …owlery\.registry". Timing-dependent: whichever code path touches the home first decides the outcome, so it surfaces as a parallel-test flake (bind-first tests losing the dir-creation race to perch-first tests). Bit the hfenduleam CI leg twice (2026-06-03/04, four spt-msg unit tests at once on the second strike) before being run to ground; a slow runner filesystem (AV scanning fresh dirs) widens the window but is not the cause.
- **Invariant:** every SQLite store's open path `create_dir_all`s its parent dir itself, best-effort, before `Connection::open` — never relying on another subsystem having materialized the home first. (Mirrors the spool, which always did this; the registry didn't.)
- **spt-core mapping:** `spt_store::registry::open_registry` (`create_dir_all(owlery)` before open). `spt_store::spool::open_spool_at` already creates its perch dir. Any future SQLite store must do the same — pair this with the 4.7 pragma ordering on every new store.
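
The directory step in 4.9 is independent of the SQLite binding and can be sketched with the standard library alone. `ensure_parent_dir` is a hypothetical helper; a store's open path would call it immediately before `Connection::open`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Creates the parent directory of `db_path` if it has one.
/// A bare file name (empty parent) needs nothing created.
pub fn ensure_parent_dir(db_path: &Path) -> io::Result<()> {
    match db_path.parent() {
        Some(parent) if !parent.as_os_str().is_empty() => fs::create_dir_all(parent),
        _ => Ok(()),
    }
}
```

`create_dir_all` succeeds when the directory already exists, so concurrent openers racing on a fresh home can all call it safely. A caller that wants the best-effort behaviour described above can ignore the result and let the subsequent open report the failure.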

### 4.10 Dead node identities leave immortal registry rows  `[REQ-HAZARD-REGISTRY-GHOST-ROWS]`
<!-- [doc->REQ-HAZARD-REGISTRY-GHOST-ROWS] -->
- **Failure:** the registry's only superseding mechanism is the per-`(endpoint_id, node)` epoch lease (4.8) — a row is replaced only by a newer row *from the same node*. When a node identity dies permanently (machine retired, or `node.key` regenerated so the "node" never speaks again), its rows are never superseded and never expire: they sit in the in-memory registries and the `identity/registry/<subnet>.json` snapshots forever. A bare-id send then resolves the same endpoint id on both the live and the dead identity and refuses with a **phantom `AcrossNodes` ambiguity** — unfixable by the user, because no qualifier reaches a node that no longer exists. Hit live in the M7 acceptance run (2026-06-06): gravity paired under two identities (09ef…, then 03854a… after its key universe flipped during the sudo experiments); the dead identity's `sergey` row made bare `spt send sergey` refuse on HFENDULEAM.
- **Invariant:** registry rows authored by a **silent** peer node decay: a node not *heard* (admitted inbound feed — the M7 D2 heard-map, REQ-SUBNET-1) within the eviction window (`registry_evict_after_ms`, default 300s ≈ 10 default pump cadences) has its rows **evicted** from every subnet registry, snapshots rewritten. Own rows never decay (the node always hears itself implicitly — it authors them each pump tick). Eviction is safe under the lease: v1 has **no transitive gossip**, so any future update for a node comes from that node itself, alive, re-inserting from its durable `EpochSource` within one cadence — there is no lagging third-party replay to mis-order against. A merely-offline node loses its rows after the window and reconverges on return; meanwhile resolution honestly reports it absent instead of poisoning bare-id sends.
- **spt-core mapping:** `spt_net::net::registry::SubnetRegistry::evict_nodes` (model) + `spt_daemon::registryhost::RegistryHost::evict_silent_peers` (heard-map TTL) driven from the registry pump tick (`peerloop`). Trust rows are NOT auto-evicted (trust is a user decision; a stale trust row only costs dead dials) — pruning those is a separate verb.
- **Source:** M7 acceptance run 2026-06-06 (DEFERRED.md "Ghost registry row eviction"); the AMBIGUOUS render fix rode along.
- **Mesh note (ADR-0017, 2026-06-08):** the subnet mesh **preserves** this invariant rather than superseding it. "No transitive gossip" sharpens to **no transitive *row* gossip** — the mesh relays only the member *roster* (discovery), while registry **rows stay own-authored and are fetched directly** from each member over a handshake. So "any future update for a node comes from that node itself, alive" still holds and the eviction lease is untouched. (The plan's rejected alternative — signed transitive *row* relay — would have broken this; roster-only relay was chosen precisely to keep it.)
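
The decay rule in 4.10 can be sketched as a retain pass over the rows. `Row` and `evict_silent_rows` are hypothetical, and treating a node with no heard-map entry as silent is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// One registry row, reduced to the fields the eviction needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub endpoint: String,
    pub node: String,
}

/// Drops rows authored by peers not heard within `evict_after_ms`.
/// Own rows never decay. Returns the number of rows evicted.
pub fn evict_silent_rows(
    rows: &mut Vec<Row>,
    last_heard_ms: &HashMap<String, u64>,
    own_node: &str,
    now_ms: u64,
    evict_after_ms: u64,
) -> usize {
    let before = rows.len();
    rows.retain(|row| {
        if row.node == own_node {
            return true; // the node always hears itself
        }
        match last_heard_ms.get(&row.node) {
            Some(&heard) => now_ms.saturating_sub(heard) <= evict_after_ms,
            None => false, // assumption: never heard counts as silent
        }
    });
    before - rows.len()
}
```

After the pass, a bare-id lookup sees only rows from nodes that are currently speaking, which removes the phantom ambiguity between a live and a dead identity.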

### 4.11 Advertisement-epoch reset strands a node  `[REQ-HAZARD-EPOCH-RESET]`
<!-- [doc->REQ-HAZARD-EPOCH-RESET] -->
- **Failure:** a node whose advertisement-epoch counter resets (the durable `EpochSource` file lost/recreated) re-advertises with LOW epochs; peers hold a higher last-seen epoch for that `(endpoint, node)` lease and drop every fresh row as **stale** — the node advertises into a void until its counter outruns its own history. Nothing renders the cause: the node looks healthy locally, peers simply never update.
- **Invariant (mitigation by construction, common case):** the common trigger — a full reinstall / identity regeneration — is covered by the **re-pair trust overwrite** (M8 decision 13, REQ-SUBNET-7): a completed ceremony presenting the same label + machine id evicts the superseded identity's trust AND registry rows on the seed-holder, and the peer-side epoch memory **dies with the deleted row** — the re-paired node's fresh epochs land on a clean lease. M8 acceptance 7 verifies this explicitly (the epoch sub-check).
- **Residual (documented, guard deferred):** the narrow slice — epoch file lost while the node *identity* is kept (manual state surgery, partial restore from backup) — has no guard; it waits for a field hit before one is designed (M8 decision 24). `REQ-HAZARD-EPOCH-RESET` is minted inactive (TRACEABILITY rule 5) as the tracking hook. If hit: symptoms are one node's endpoints frozen-stale on every peer while its own views are fresh; recovery today is re-pairing the node (rides the common-case eviction above).
- **spt-core mapping:** epoch mint = `spt_store::epoch::EpochSource` (`identity/epoch.json`); the lease = the per-`(endpoint, node)` epoch compare in `spt_net::net::registry`; the eviction that clears peer-side epoch memory = `registryhost::repair_evict_superseded` + `RegistryHost::consume_repair_evictions`.
- **Source:** minted at M8 ratification (decision 24), recognized as a class during the 2026-06-07 pump diagnosis / re-pair overwrite design — not yet field-hit in its residual form.

---

## 5. Platform-specific

### 5.1 Windows PID recycling false positives
- **Failure:** recycled PID reads alive for the wrong process → orphan misclassification.
- **Invariant:** anchor liveness on the stable parent/harness PID; minimal `info.json` for supervisor-owned perches; mtime grace window (≥60s) masks transient mismatches.
- **spt-core mapping:** broker-held handles supersede PID polling for spt-hosted sessions; keep the grace window for harness-hosted.
- **Sister cite:** `src/live/wrapper/orphan.rs:141-161`; `src/common/list_filter.rs:168-175`.

### 5.2 Windows EBUSY on atomic rename
- **Failure:** `fs::rename` fails while a handle is (recently) held → registry/marketplace update fails.
- **Invariant:** tmp-write + atomic-rename with retry/backoff; best-effort side-fail; tolerate transient EBUSY.
- **spt-core mapping:** all on-disk state writes (registry, trust store, spool checkpoints) use this pattern. Self-update binary swap on Windows especially.
- **Sister cite:** CHANGELOG "EBUSY"; `src/common/owlery.rs` atomic_write.
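
The tmp-write plus rename-with-retry pattern in 5.2 could look like the sketch below. `atomic_write` here is a hypothetical function, not the sister project's; the `.tmp` naming, the 10ms starting delay, and retrying on any rename error (rather than only sharing violations) are assumptions.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;
use std::thread;
use std::time::Duration;

/// Writes `bytes` to a sibling temp file, then renames it over `dest`,
/// retrying the rename with exponential backoff.
pub fn atomic_write(dest: &Path, bytes: &[u8], attempts: u32) -> io::Result<()> {
    let tmp = dest.with_extension("tmp"); // hypothetical temp naming
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(bytes)?;
        f.sync_all()?;
    } // temp handle is closed before the rename

    let attempts = attempts.max(1);
    let mut delay = Duration::from_millis(10);
    let mut last_err = None;
    for attempt in 0..attempts {
        match fs::rename(&tmp, dest) {
            Ok(()) => return Ok(()),
            Err(e) => last_err = Some(e),
        }
        if attempt + 1 < attempts {
            thread::sleep(delay);
            delay *= 2;
        }
    }
    // Best-effort side-fail: drop the temp file, report the last error.
    let _ = fs::remove_file(&tmp);
    Err(last_err.expect("loop ran at least once"))
}
```

Readers either see the old file or the new one, never a partial write, and a transiently held handle on Windows costs a short delay instead of a failed update.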

### 5.3 Git/subprocess timeout stamping
- **Failure:** a hung subprocess (git on slow net) blocks the supervisor indefinitely.
- **Invariant:** every metadata-producing subprocess has a timeout; timeout yields `None` + rate-limited stderr, never a hang.
- **spt-core mapping:** generalize to all manifest-declared harness invocations (delegated commands, adapter updates) — timeouts mandatory.
- **Sister cite:** `src/common/git.rs`.

### 5.4 Windows UNC prefix in serialized paths
- **Failure:** canonicalized `\\?\C:\...` serializes to `//?/C:/...` and fails `read_to_string`.
- **Invariant:** convert backslashes to forward slashes, then strip the leading `//?/` (the serialized form of the `\\?\` UNC prefix); serialized path attrs must be directly consumable.
- **spt-core mapping:** any path crossing the wire (file-drop EVENTs, off-node file transfer per ADR-0003) needs canonical normalization at the `spt-proto` boundary.
- **Sister cite:** `src/common/owlery.rs:377-384`.
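
The normalization in 5.4 can be sketched as a pure string function. `wire_path` is a hypothetical name, and the sketch covers only the drive-letter form shown above; the `\\?\UNC\server\share` form is not handled here.

```rust
/// Converts backslashes to forward slashes, then strips a leading `//?/`.
/// Not handled in this sketch: the `\\?\UNC\...` network-share form.
pub fn wire_path(canonical: &str) -> String {
    let slashed = canonical.replace('\\', "/");
    match slashed.strip_prefix("//?/") {
        Some(rest) => rest.to_string(),
        None => slashed,
    }
}
```

For example, `\\?\C:\Users\me\file.txt` becomes `C:/Users/me/file.txt`, while a path that never had the prefix only has its separators converted.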

### 5.5 ConPTY withholds output until DSR is answered  `[REQ-HAZARD-CONPTY-DSR]`
- **Failure:** a broker reading a ConPTY master sees only the 4-byte startup query `ESC [ 6 n` and then nothing — the child looks hung/silent but is producing output normally. ConPTY blocks all child stdout until the terminal answers the cursor-position query.
- **Invariant:** every ConPTY reader auto-answers DSR (`ESC [ 6 n` → write `ESC [ 1;1 R`, or a real cursor position) on the PTY writer. Secondary: a ConPTY master does not EOF while the writer is held, so read loops drain on a thread and never gate exit on a blocking `read()`.
- **spt-core mapping:** `spt-term` broker PTY reader (ADR-0004). Brand-new to spt-core — not in the sister project (it never hosted ConPTY directly).
- **Source:** Spike #1 (`docs/spikes/SPIKE-01-broker-handoff.md`); reproduced with both a Rust child and `cmd.exe`.
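
The DSR auto-answer in 5.5 can be sketched as a scan over each chunk read from the PTY. `answer_dsr` is a hypothetical helper, and the sketch assumes the 4-byte query arrives within a single chunk; a query split across two reads would need a small carry-over buffer.

```rust
use std::io::{self, Write};

/// Cursor-position query sent by ConPTY: ESC [ 6 n.
const DSR_QUERY: &[u8] = b"\x1b[6n";
/// Fixed reply claiming row 1, column 1: ESC [ 1 ; 1 R.
const DSR_REPLY: &[u8] = b"\x1b[1;1R";

/// Writes one reply to `pty_writer` per DSR query found in `chunk`.
/// Returns the number of queries answered.
pub fn answer_dsr<W: Write>(chunk: &[u8], pty_writer: &mut W) -> io::Result<usize> {
    // The query cannot overlap itself, so counting windows is exact.
    let hits = chunk
        .windows(DSR_QUERY.len())
        .filter(|w| *w == DSR_QUERY)
        .count();
    for _ in 0..hits {
        pty_writer.write_all(DSR_REPLY)?;
    }
    if hits > 0 {
        pty_writer.flush()?;
    }
    Ok(hits)
}
```

A reader thread would call this on every chunk before forwarding the bytes, so the child's output is released as soon as the startup query is seen.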

<!-- [doc->REQ-HAZARD-DETACHED-PIPE-INHERIT] -->
### 5.6 Windows detached children inherit a captured caller's pipe  `[REQ-HAZARD-DETACHED-PIPE-INHERIT]`
- **Failure:** a caller captures an `spt` invocation's output through a pipe (`Command::output()`, a harness hook reading the command). That `spt` process detach-spawns a **long-lived** child (the daemon via `ensure_running`; a shell binary via `spt shell spawn`). On Windows `CreateProcess` runs with `bInheritHandles = TRUE`, and the spt process's std handles — the caller's pipe write-ends — are inheritable by construction, so the immortal child inherits them even when its *own* stdio is `Stdio::null()`. The caller's pipe read never sees EOF: the capturing caller **hangs forever** (unix is immune — pipe fds are `CLOEXEC`). Paid twice: daemon spawn (guarded at D4a-era `spawn_detached`), then again at M5-D3e when the mock-shell E2E hung `spt shell spawn` for hours.
- **Invariant:** every detach-spawn of a long-lived child runs with **`bInheritHandles = FALSE`** (`spt-daemon::daemon::detached_no_inherit`) — zero handles flow, whatever the pipe's depth in the ancestry. Stripping `HANDLE_FLAG_INHERIT` from the spawner's *std* handles is NOT sufficient: a grandparent capture's pipe sits in the handle table as a stray inheritable handle and still flows through every `bInheritHandles = TRUE` hop (the first guard shipped that way and was wedged by exactly this — a daemon spawned three layers deep held the pwsh-level pipe of the CI/test harness).
- **spt-core mapping:** `spt-daemon::daemon::spawn_detached` (the daemon) and `spt-daemon::shellhost::launch_shell` (the relay-receipt shell binary). Any future long-lived detached spawn (manifest-template children included) must use the same no-inherit spawn.
- **Source:** spt-core, M5-D3e (`shell_e2e.rs` hang, 2026-06-04, twice — once per guard generation); Rust `Command` restricts *its own* created stdio handles but a parent's inheritable handle table still flows.

### 5.7 Elevated commands spawn the daemon with the wrong token  `[REQ-HAZARD-ELEVATED-DAEMON-SPAWN]`
<!-- [doc->REQ-HAZARD-ELEVATED-DAEMON-SPAWN] -->
- **Failure:** membership-implies-reachability made *every* `spt` invocation a potential daemon spawner (`ensure_running`), including the elevation-gated ones (`subnet create`/`join`, REQ-SUBNET-4). The spawned daemon inherits the spawner's token. **Windows:** an elevated `subnet create` auto-starts an ELEVATED daemon whose named pipes deny unelevated clients — every subsequent unelevated `spt` reads "not running", tries to spawn its own daemon, and dies on bind Access-denied; the user had to taskkill (hit live, M7 acceptance 2026-06-06). **Linux:** a sudo'd command spawns a root daemon and/or root-owned state — and because sudo flips `$HOME`, the daemon can mint a *different node identity* in root's universe (the very key-flip that produced the 4.10 ghost rows).
- **Invariant:** the daemon **always runs unelevated in the invoking user's universe**, regardless of which command spawns it. Two enforcement points sharing one seam: (a) `spawn_detached` de-elevates the child — Windows: the UAC **linked token** (`TokenLinkedToken` → `DuplicateTokenEx` → `CreateProcessWithTokenW`; inherits no handles, so 5.6 holds by construction); Linux: drop to `SUDO_UID`/`SUDO_GID` with `$HOME`/`$USER`/`$LOGNAME` reset to the invoking user's (passwd lookup); (b) a `Daemon::run` entry guard catches a *directly* elevated `spt daemon` — Linux drops privileges in-process before touching any state; Windows respawns de-elevated and exits. When no unelevated identity exists to drop to (UAC disabled, genuine root login, SYSTEM), the daemon runs as-is with a loud warning — a consistent universe, never a torn one. Elevated one-shot *clients* talking to an unelevated daemon are fine (downward connects work); the daemon side is the invariant.
- **spt-core mapping:** `spt-daemon::deelevate` (the OS-split seam) consumed by `daemon::spawn_detached` + the `Daemon::run` entry guard. The fuller Linux elevation model (install symlink + default-account election) is deferred (DEFERRED.md, M8).
- **Source:** M7 acceptance run 2026-06-06 (DEFERRED.md "Non-admin daemon spawn"); interim field rule was "bring the daemon up unelevated FIRST".
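
The Linux half of the entry guard can be read as a small pure decision, sketched below under stated assumptions. `DropPlan` and `plan_drop` are hypothetical names (not the real `spt-daemon::deelevate` API), and the actual `setgid`/`setuid` calls and the `$HOME`/`$USER`/`$LOGNAME` reset are deliberately left out.

```rust
/// What the daemon entry guard should do about its current privilege level.
#[derive(Debug, PartialEq, Eq)]
pub enum DropPlan {
    /// Not running as root: nothing to drop.
    NotElevated,
    /// Root via sudo: drop to the invoking user's ids before touching state.
    DropTo { uid: u32, gid: u32 },
    /// Root with no unelevated identity to return to: run as-is, warn loudly.
    RunAsIsWithWarning,
}

/// Pure decision kernel (hypothetical). `euid` is the effective uid;
/// `sudo_uid` / `sudo_gid` are the raw `SUDO_UID` / `SUDO_GID` values, if set.
pub fn plan_drop(euid: u32, sudo_uid: Option<&str>, sudo_gid: Option<&str>) -> DropPlan {
    if euid != 0 {
        return DropPlan::NotElevated;
    }
    let uid = sudo_uid.and_then(|s| s.parse::<u32>().ok());
    let gid = sudo_gid.and_then(|s| s.parse::<u32>().ok());
    match (uid, gid) {
        // A target uid of 0 would not be a drop at all.
        (Some(uid), Some(gid)) if uid != 0 => DropPlan::DropTo { uid, gid },
        // Genuine root login, or unparsable values: consistent universe, loud warning.
        _ => DropPlan::RunAsIsWithWarning,
    }
}
```

Keeping the decision pure means the "no identity to drop to" branch can be unit-tested without root.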

<!-- [doc->REQ-HAZARD-CHILD-CONSOLE-FLASH] -->
### 5.8 Console children of the console-less daemon flash visible windows  `[REQ-HAZARD-CHILD-CONSOLE-FLASH]`
- **Failure:** the daemon runs DETACHED (no console, 5.6/`detached_no_inherit`). Any console-subsystem child it spawns (`git`, `taskkill`, manifest hook commands) gets a **fresh conhost with a visible window** — piped/null stdio does NOT prevent it. Field shape: the 60s sync pump's two git calls (`for-each-ref` + `rev-parse`) flashed two blank windows per minute on the user's desktop (2026-06-06).
- **Invariant:** every short-lived console child spawned from daemon-reachable code sets `creation_flags(0x0800_0000)` (`CREATE_NO_WINDOW`). Long-lived detached children use `detached_no_inherit` (already `DETACHED_PROCESS | CREATE_NO_WINDOW`); de-elevated spawns use `CREATE_NEW_CONSOLE + SW_HIDE` (5.7 — `CreateProcessWithTokenW` rejects `CREATE_NO_WINDOW`, error 87).
- **Test seam caveat:** window-absence is unobservable from a consoled test runner — the child inherits the runner's console and never creates a window, flag or no flag. Unit coverage asserts the flagged spawn still works (the error-87 "flag combo breaks spawn" regression class); window-absence was verified live by process-watch capture.
- **spt-core mapping:** `spt-store::gitrun::run_git` (every BranchStore/ContextStore git call), `spt-daemon::shellhost::kill_shell_pid` (taskkill), `spt-runtime::run_bounded_command` (manifest hook commands), `spt-runtime::ManifestRuntime::command_for` (the one shared builder behind `spawn_session` + `run_bounded_stdin` — the notif pump's `spawn_notif_command` and the live agent's psyche/echo/turn spawns), `spt-daemon::shellwake` (already guarded). The flag lives in each shared builder, not per call site, so the invariant holds for every ManifestRuntime spawn by construction.
- **Source:** spt-core field bug, 2026-06-06 — two blank windows flashing every 60 seconds on a desktop workstation, caught by process-spawn watcher (git.exe parent=spt daemon, conhost.exe child each).
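
A minimal sketch of the "flag lives in the shared builder" rule, using only `std`. `quiet_command` is a hypothetical name rather than one of the spt-core builders listed above; the `0x0800_0000` value and the `creation_flags` call are the ones the invariant names.

```rust
use std::process::{Command, Stdio};

/// `CREATE_NO_WINDOW` from the Win32 process creation flags.
#[cfg(windows)]
const CREATE_NO_WINDOW: u32 = 0x0800_0000;

/// Hypothetical shared builder for short-lived console children.
/// Call sites add args and stdio; the window flag is already set here,
/// so no call site can forget it.
pub fn quiet_command(program: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.stdin(Stdio::null());
    #[cfg(windows)]
    {
        use std::os::windows::process::CommandExt;
        cmd.creation_flags(CREATE_NO_WINDOW);
    }
    cmd
}
```

A caller would write something like `quiet_command("git").args(["rev-parse", "HEAD"]).output()`. As the test seam caveat says, a unit test can only confirm that the flagged spawn still works, not that no window appeared.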

### 5.9 `Instant - Duration` underflow-panics on a freshly-booted host  `[REQ-HAZARD-INSTANT-UNDERFLOW]`
- **Failure:** `Instant::now() - Duration::from_secs(N)` panics with `overflow when subtracting duration from instant` when the monotonic clock reading is smaller than `N`, which on the affected platform means the host booted less than `N` ago. The peer pump primed its cadence legs with `Instant::now() - 86_400s` to mean "everything due now"; on a Windows runner with sub-24h uptime the pump thread panicked at startup, so the subnet never converged (CI `pump_and_dispatch_self_drive_the_subnet` failed, run 27082417706). The failure is *environment-conditional* (green on any host up longer than the offset, red below it), so it slips past local dev and only bites a fresh CI box or a just-rebooted machine.
- **Invariant:** NEVER compute an instant in the past by subtracting from `Instant::now()`. Represent "never run / due now" as `Option<Instant> = None` and gate on forward `now.duration_since(past)` only (`peerloop::due`). No backward instant arithmetic anywhere in scheduling.
- **Test seam caveat:** the convergence E2E only reproduces on a sub-offset-uptime host (it passed everywhere with >24h uptime). The deterministic guard is the `due(None, ..)`/`due(Some(now), ..)` unit on the extracted gate — it asserts first-tick-due with zero instant subtraction, independent of host uptime.
- **spt-core mapping:** `spt-daemon::peerloop::due` (the sole cadence gate behind `due_reg`/`due_notif`/`due_sync`/`due_upd`); cadence legs are `Option<Instant>` seeded `None`.
- **Source:** spt-core CI failure, 2026-06-07 — Windows runner `hfenduleam` (just booted) panicked the peer pump at the v0.1.1 release gate.
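
The gate described above might look like the following sketch. The `Option<Instant>` first argument follows the `due(None, ..)` / `due(Some(now), ..)` shape in the test seam caveat; the remaining parameters are assumptions, not the real `peerloop::due` signature.

```rust
use std::time::{Duration, Instant};

/// Returns true when a cadence leg should run.
/// `None` means "never run", which is always due. No instant is ever
/// computed in the past, so host uptime cannot affect the result.
pub fn due(last_run: Option<Instant>, now: Instant, period: Duration) -> bool {
    match last_run {
        None => true,
        // Forward measurement only: how long since the last run.
        Some(past) => now.saturating_duration_since(past) >= period,
    }
}
```

Seeding each leg with `None` gives first-tick-due behaviour without the `Instant::now() - offset` priming that caused the panic.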

### 5.10 `sudo spt` dead-ends on a user-local install (secure_path)  `[REQ-HAZARD-SUDO-SECURE-PATH]`
- **Failure:** the elevation-gated commands (`subnet create` / `subnet join` / `show-code`) refuse when unelevated and tell the user to "run as administrator / root". The user types the obvious `sudo spt subnet create FOO` → `sudo: spt: command not found`. `spt` is a user-local install (`~/.local/bin`, `~/.cargo/bin`), and sudo's `secure_path` (a `/etc/sudoers` default) does NOT include those dirs, so a bare command name doesn't resolve under sudo. The guidance is a trap: it names an action that cannot work for the common install shape. Field-hit on KITSUBITO at the v0.1.1 ship.
- **Invariant:** elevation guidance on Unix emits the binary's **absolute path** under sudo — `sudo /home/u/.local/bin/spt subnet create FOO` — reconstructed from `current_exe()` + the real argv and shell-quoted. An absolute program path is executed directly; `secure_path` only governs bare-name PATH lookup, so the absolute form always resolves. On an interactive Unix TTY the command auto-elevates (re-execs itself under sudo, the elevated child does the work and `main` de-elevates back); non-interactive or sudo-absent falls back to printing the runnable hint. Never emit a bare-name elevation instruction.
- **Companion UX:** the post-de-elevation `DEELEVATED: running as uid N` line is internal state-safety noise — omit it from the user-facing CLI path (it confused the same field user). The detached daemon's own de-elevation log line is fine (it lands in the daemon log, not the terminal).
- **Test seam caveat:** the sudo re-exec needs a real `sudo` + TTY (not hermetic). The deterministic guard is the pure `elevation::sudo_argv` / `print_hint_command` (assert an absolute exe path, never a bare name, + shell-quoting on the printed line) and the `decide_elevation_path` matrix (which picks inline-sudo only on an interactive Unix TTY); the exec leg is manual/kitsubito-verified.
- **spt-core mapping:** `spt::elevation::{sudo_argv, print_hint_command, decide_elevation_path}` (pure — generalized from the M12-W4 self-elevation seam, 5.11), `spt::cli::{try_auto_elevate, with_elevation_hint}` wired into `cmd_subnet_create` / `cmd_subnet_join` / `cmd_subnet_show_code`; `spt::main` de-elevation drop silenced.
- **Source:** spt-core field report, 2026-06-07 — `reavus@KITSUBITO`, `spt` in `~/.local/bin`; the absolute-path `sudo` invocation was confirmed working before the fix landed.
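
A sketch of the printed-hint floor, assuming POSIX single-quote quoting is acceptable for the hint line. The function name echoes `print_hint_command`, but the signature, the `Option` return, and the `sh_quote` helper are illustrative assumptions.

```rust
use std::path::Path;

/// POSIX single-quote quoting: wrap in '...' and encode each embedded ' as '\''.
/// Plain words made of a conservative safe set are left unquoted for readability.
fn sh_quote(arg: &str) -> String {
    let safe = !arg.is_empty()
        && arg
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || "/._-=:@%+,".contains(c));
    if safe {
        arg.to_string()
    } else {
        format!("'{}'", arg.replace('\'', r"'\''"))
    }
}

/// Builds `sudo <absolute exe> <args...>` as one printable line.
/// Returns `None` for a non-absolute exe, so a bare name can never be emitted.
pub fn print_hint_command(exe: &Path, args: &[String]) -> Option<String> {
    if !exe.is_absolute() {
        return None;
    }
    let mut parts = vec!["sudo".to_string(), sh_quote(&exe.to_string_lossy())];
    parts.extend(args.iter().map(|a| sh_quote(a)));
    Some(parts.join(" "))
}
```

In real use `exe` would come from `std::env::current_exe()` and `args` from the process argv minus the program name.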

### 5.11 Self-elevating re-launch must re-run verbatim, never widen / inject / loop  `[REQ-HAZARD-SELF-ELEVATE]`
- **Failure class:** a privilege-gated command (`subnet create` / `join` / `show-code`) self-elevates by re-launching itself with privilege (Windows UAC `runas`, Linux `pkexec` / a terminal-emulator `sudo`, or inline `sudo`). A careless re-launch is a security hole: widening the privilege scope (adding args), resolving the binary by a bare name (a PATH/`secure_path` hijack runs an attacker's `spt`), interpolating a crafted arg into a shell string (`sh -c "… $id …"` injects a second command), or re-elevating the already-elevated child (an infinite UAC/polkit loop). The user's UAC/polkit/sudo prompt is the ONLY consent gate — the mechanism must never bypass or widen it.
- **Invariant:** self-elevation re-runs the **EXACT** original invocation with the binary's **ABSOLUTE** exe path — never adding/altering args, never a PATH-resolved bare name, never a shell-interpolated string. Every launcher passes an **argv array** (`Command::new(prog).args([...])`, never `sh -c`); the Windows `ShellExecuteW` params string (which is inherently one string) MSVC-quotes each verbatim arg so `CommandLineToArgvW` round-trips it as a single token. The elevated child drops state back to the user (composes with the 5.7 de-elevation) and **never re-elevates**: `decide_elevation_path` returns `AlreadyElevated` whenever the process is `Elevated`, on every OS (loop-safety). The unprivileged parent never pipes/captures the elevated child's stdout across the privilege boundary — the child is self-contained (on Windows it self-pauses a fresh console via `GetConsoleProcessList` so its output stays legible). The print-hint floor prints the absolute-path command too.
- **Test seam caveat:** the real launch needs a UAC/polkit/sudo prompt (not hermetic) — manual-verify. The deterministic guards are the pure `decide_elevation_path` matrix (loop-safety: `AlreadyElevated` on every os; the os×env path order) and the argv builders (`sudo_argv` / `pkexec_argv` / `terminal_argv` assert absolute-exe + verbatim args + array; `windows_runas_params` asserts MSVC-quoting with no `cmd /c` interpolation; the crafted-arg test asserts a shell-metachar arg stays one element / one quoted token).
- **spt-core mapping:** `spt::elevation::{decide_elevation_path, sudo_argv, pkexec_argv, terminal_argv, windows_runas_params, print_hint_command, ElevatePath}` (pure), `spt::cli::{try_auto_elevate, launch_uac_window, pause_elevated_console_if_fresh, program_on_path, first_terminal_emulator}` (impure launchers) wired into `cmd_subnet_create` / `cmd_subnet_join` / `cmd_subnet_show_code`. Companions: 5.10 (the Unix abs-path-under-sudo facet) and 5.7 (the elevated child's de-elevation drop, which this composes with).
- **Source:** M12-W4 design (subnet QR + self-elevating window), doyle ruling `M12-W4-RULING.md` Q6 — a privilege-escalation feature carries a mandatory hazard REQ.
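
The two deterministic guards can be sketched as below. This is a reduced model: the real `ElevatePath` has more launch paths (UAC, `pkexec`, terminal emulator) than the three variants here, and the `Privilege` type and all parameter lists are assumptions made for illustration.

```rust
use std::ffi::OsString;
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Privilege {
    Elevated,
    Unelevated,
}

/// Reduced stand-in for the real enum; only `AlreadyElevated` is named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElevatePath {
    AlreadyElevated,
    InlineSudo,
    PrintHint,
}

/// Loop-safety first: an elevated process never picks a launch path.
pub fn decide_elevation_path(
    privilege: Privilege,
    is_unix: bool,
    interactive_tty: bool,
    sudo_present: bool,
) -> ElevatePath {
    if privilege == Privilege::Elevated {
        return ElevatePath::AlreadyElevated;
    }
    if is_unix && interactive_tty && sudo_present {
        ElevatePath::InlineSudo
    } else {
        ElevatePath::PrintHint
    }
}

/// argv array for the re-launch: absolute exe, then the original args verbatim.
/// No shell string is ever built, so a crafted arg stays a single element.
pub fn sudo_argv(exe: &Path, original_args: &[OsString]) -> Option<Vec<OsString>> {
    if !exe.is_absolute() {
        return None; // refuse a PATH-resolved bare name
    }
    let mut argv = Vec::with_capacity(original_args.len() + 2);
    argv.push(OsString::from("sudo"));
    argv.push(exe.as_os_str().to_os_string());
    argv.extend(original_args.iter().cloned());
    Some(argv)
}
```

The resulting vector would be handed to `Command::new(&argv[0]).args(&argv[1..])`, which is what keeps an arg such as `x; rm -rf /` inert.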

<!-- [doc->REQ-HAZARD-WIN-PTY-PROGRAM-RESOLVE] -->
### 5.12 Native-PTY spawn of a bare program runs the wrong (non-PE) file on Windows  `[REQ-HAZARD-WIN-PTY-PROGRAM-RESOLVE]`
- **Failure:** `portable-pty`'s ConPTY spawn resolves a bare program name with a `which` that takes the FIRST `PATH` match. A node/npm CLI installs as BOTH an extensionless shebang shim (`ccs`, for Git Bash) and a Windows launcher (`ccs.cmd`) in the same dir; portable-pty picks the extensionless `ccs`, and `CreateProcessW` then tries to execute that non-PE file and fails with **os error 193** ("%1 is not a valid Win32 application"). Live failure: `spt endpoint run claude-spt:ccs` → `CreateProcessW C:\nvm4w\nodejs\ccs` 193 (operator, 2026-06-16). The same failure bites any harness/shell whose `[session.self]`/`[shell].spawn` names a `.cmd`/`.bat`/`.ps1`-backed command, because `CreateProcessW` cannot execute a batch or PowerShell script directly.
- **Invariant:** spt-term resolves the program ITSELF before handing it to `CommandBuilder`, bypassing portable-pty's `which`. A bare name is searched over `PATH` × `PATHEXT` (whose default order already prefers `.EXE`/`.COM` over `.BAT`/`.CMD`), then an extensionless fallback. A non-PE target is wrapped in its interpreter: `.cmd`/`.bat` → `cmd.exe /d /c <path>`, `.ps1` → `powershell -NoProfile -File <path>` (the wrap args precede the caller's args); a real executable spawns directly; an unresolvable name passes through unchanged (never makes a working case worse). Unix is a passthrough — `execve` honours a shebang on an extensionless script. Applied at the ONE `CommandBuilder` chokepoint (`PtySession::spawn_program_in`), so every broker harness + shell spawn is covered. *Caveat:* the `cmd.exe /d /c` wrap inherits cmd's argument-quoting rules for paths/args containing spaces or cmd metacharacters — adequate for the common install-path case; a fully robust cmd-quoting pass is a follow-on if it bites.
- **spt-core mapping:** `spt_term::winprog::{resolve_for_pty, resolve_in}` (the pure PATHEXT-precedence kernel + the Windows env wiring), wired into `spt_term::pty::PtySession::spawn_program_in`. Unit: `resolve_in` precedence (`.cmd`-over-shim, `.exe`-direct, explicit-extension, path-order, passthrough) [`winprog.rs`].
- **Source:** field diagnosis 2026-06-16 (operator dogfood, `claude-spt:ccs` bringup) — doyle.
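
A sketch of the precedence kernel with the filesystem probe injected, so it can be tested on any OS. The `Resolved` type, the `exists` parameter, and the choice to try every `PATH` × `PATHEXT` candidate before any extensionless one are assumptions; the real `resolve_in` signature is not shown in this document.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq, Eq)]
pub enum Resolved {
    /// A real executable: spawn it directly.
    Direct(PathBuf),
    /// A script: spawn `program` with `prefix_args` ahead of the caller's args.
    Wrapped { program: String, prefix_args: Vec<String> },
    /// Nothing found: hand the name through unchanged.
    Passthrough(String),
}

fn wrap_if_script(path: PathBuf) -> Resolved {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    let shown = path.to_string_lossy().into_owned();
    match ext.as_deref() {
        Some("cmd") | Some("bat") => Resolved::Wrapped {
            program: "cmd.exe".to_string(),
            prefix_args: vec!["/d".to_string(), "/c".to_string(), shown],
        },
        Some("ps1") => Resolved::Wrapped {
            program: "powershell".to_string(),
            prefix_args: vec!["-NoProfile".to_string(), "-File".to_string(), shown],
        },
        _ => Resolved::Direct(path),
    }
}

/// `pathext` entries include the leading dot, e.g. ".EXE".
pub fn resolve_in(
    name: &str,
    path_dirs: &[PathBuf],
    pathext: &[&str],
    exists: &dyn Fn(&Path) -> bool,
) -> Resolved {
    // PATH x PATHEXT first, so `ccs.cmd` beats the extensionless `ccs` shim.
    for dir in path_dirs {
        for ext in pathext {
            let candidate = dir.join(format!("{name}{ext}"));
            if exists(&candidate) {
                return wrap_if_script(candidate);
            }
        }
    }
    // Extensionless fallback; this also catches a name with an explicit extension.
    for dir in path_dirs {
        let candidate = dir.join(name);
        if exists(&candidate) {
            return wrap_if_script(candidate);
        }
    }
    Resolved::Passthrough(name.to_string())
}
```

On Windows the `exists` argument would be a thin closure over `Path::is_file`; in a unit test it can be a lookup into a fixed list of fake files.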

---

## 6. Documented regressions (non-obvious invariants)

### 6.1 No flat/nested perch siblings; resolver-routed paths
- **Failure:** mixed flat + nested perch layouts confuse which perch is live; cascade-wipe risk.
- **Invariant:** one path resolver; never create divergent siblings.
- **spt-core mapping:** clean greenfield layout from day one (no migration window) — pick one structure, route everything through the registry. Storage layout deferred to design phase but this single-source-of-truth rule is binding.
- **Sister cite:** `src/common/perch_path.rs`; CHANGELOG Phase 25.4.

### 6.2 Soft-cleanup preserves state, removes `ready`
- **Failure:** hard-deleting a perch on cleanup loses spool (incl. stored signoff) needed for offline recovery.
- **Invariant:** soft-stop removes only the `ready`/online marker; preserves info + spool + dir. Hard-delete only on explicit operator action.
- **spt-core mapping:** instance offline-state recovery depends on this; carries to the daemon's stop path.
- **Sister cite:** `src/owl/stop.rs`.

### 6.3 Cascade-wipe guard: never delete a parent hosting non-empty children
- **Failure:** `doctor --fix` deletes a top-level perch that still hosts in-flight nested Worker/Psyche perches.
- **Invariant:** before hard-delete, check for non-empty nested children; if present, soft-clean only and surface the path.
- **spt-core mapping:** any destructive maintenance command must check for live child instances first.
- **Sister cite:** CHANGELOG v1.11.20 Phase 35.1.
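
A minimal sketch of the pre-delete check, assuming for illustration that every non-empty subdirectory counts as a live nested child. `CleanupAction`, `has_non_empty_children`, and `plan_cleanup` are hypothetical names; a real layout would need to tell child perches apart from other subdirectories.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
pub enum CleanupAction {
    /// Remove only the online marker and surface the path.
    SoftClean,
    /// Safe to remove the whole directory.
    HardDelete,
}

/// True if any immediate subdirectory of `perch` contains at least one entry.
pub fn has_non_empty_children(perch: &Path) -> io::Result<bool> {
    for entry in fs::read_dir(perch)? {
        let entry = entry?;
        if entry.file_type()?.is_dir() && fs::read_dir(entry.path())?.next().is_some() {
            return Ok(true);
        }
    }
    Ok(false)
}

/// Decide before deleting; the destructive step itself is left to the caller.
pub fn plan_cleanup(perch: &Path) -> io::Result<CleanupAction> {
    if has_non_empty_children(perch)? {
        Ok(CleanupAction::SoftClean)
    } else {
        Ok(CleanupAction::HardDelete)
    }
}
```

Splitting the decision from the deletion keeps the guard testable against a throwaway directory.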

### 6.4 Drop files are single-writer (supervisor-owned), read-only for the mind
- **Failure:** the Psyche LLM deletes a commune drop file the wrapper is concurrently reading → race, lost commune.
- **Invariant:** drop files (`<id>-commune.md`) are supervisor-owned single-writer; the LLM is read-only; only the supervisor (or explicit operator) deletes them.
- **spt-core mapping:** the daemon is the single writer; the harness-invoked mind never mutates drop files. Bake the ownership into the runtime contract.
- **Sister cite:** CHANGELOG v1.11.7; `src/live/context.rs:615` (removed delete).

### 6.5 Direct-write precedence guard against stale LLM overwrites
- **Failure:** the LLM emits an older snapshot that clobbers a fresher direct write to a context file.
- **Invariant:** every context write carries a source+timestamp precedence marker; LLM writes within a protection window after a recent direct write are suppressed (logged); direct writes always proceed.
- **spt-core mapping:** cross-node Psyche sync (ADR-0003) makes this multi-writer across machines — the precedence marker must include node identity **plus a per-node version vector** (entries from each node's monotonic `EpochSource`; wall-clock never orders). Distributed rule (ADR-0013, M4-D6): dominate→accept, dominated→drop, **concurrent→surface as durable replicated conflict artifacts + Psyche-reconcile on the active instance's node — never silent newest-wins, never lose either version**. The freshness rule (newest-and-newer-than-mine, per the cross-instance context-freshness feature) is the same guard read as vector dominance. Highest-value carryover for the sync design.
- **Sister cite:** CHANGELOG v1.11.6; `src/owl/echo_commune.rs`.
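
The distributed rule (dominate, dominated, concurrent) reduces to a version-vector comparison. The sketch below assumes a vector is a map from node id to a monotonic counter, with missing entries read as 0; `VersionVector`, `Verdict`, and `judge` are illustrative names, and treating an identical vector as `Drop` is an assumption the text does not settle.

```rust
use std::collections::BTreeMap;

/// Node id -> that node's monotonic counter. Wall-clock time never appears.
pub type VersionVector = BTreeMap<String, u64>;

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// Incoming dominates local: accept it.
    Accept,
    /// Incoming is dominated by (or equal to) local: drop it.
    Drop,
    /// Neither dominates: surface both versions as a conflict.
    Conflict,
}

pub fn judge(local: &VersionVector, incoming: &VersionVector) -> Verdict {
    let mut incoming_ahead = false;
    let mut local_ahead = false;
    for node in local.keys().chain(incoming.keys()) {
        let l = local.get(node).copied().unwrap_or(0);
        let i = incoming.get(node).copied().unwrap_or(0);
        if i > l {
            incoming_ahead = true;
        }
        if l > i {
            local_ahead = true;
        }
    }
    match (incoming_ahead, local_ahead) {
        (true, false) => Verdict::Accept,
        (true, true) => Verdict::Conflict,
        _ => Verdict::Drop,
    }
}
```

The `Conflict` arm is where newest-wins would otherwise sneak in; under the rule above it has to preserve both versions.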

### 6.6 Surfaced context conflicts preserve both versions until dominated
- **Failure:** a cross-node concurrent context write (version vectors, neither dominates) gets auto-picked or partially dropped — half a mind silently lost.
- **Invariant:** a surfaced concurrent pair is durably preserved (both versions) until a strictly dominating write clears it; no merge/reconcile failure path may discard an unmerged version. Resolution is the Psyche reconcile turn (ADR-0013), whose merged write `join(vA,vB)+bump` dominates both parents — only that dominance clears the artifacts.
- **spt-core mapping:** `ContextStore::record_conflict` (tracked `.conflicts/` artifacts, content-hash named, idempotent, replicate like context) / `list_conflicts` / `clear_conflicts` (dominating-write-only). Local working file stays untouched while a conflict is pending.
- **Origin:** ADR-0013 design invariant (red-team #7's "wall-clock loses concurrent writes" closed M4-D6), not a sister bug — registered ahead of the wire path so D6c is born conformant.
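
Why the reconcile write clears the artifacts can be checked with a small sketch of `join(vA,vB)+bump`, assuming a version vector is a map from node id to counter. `join_and_bump` and `dominates` are hypothetical helper names, not the `ContextStore` API.

```rust
use std::collections::BTreeMap;

pub type VersionVector = BTreeMap<String, u64>;

/// True when `a` is >= `b` on every entry and strictly greater on at least one.
pub fn dominates(a: &VersionVector, b: &VersionVector) -> bool {
    let ge = b.iter().all(|(n, &bv)| a.get(n).copied().unwrap_or(0) >= bv);
    let gt = a.iter().any(|(n, &av)| av > b.get(n).copied().unwrap_or(0));
    ge && gt
}

/// Entry-wise maximum of both parents, then one bump on the reconciling node.
pub fn join_and_bump(a: &VersionVector, b: &VersionVector, self_node: &str) -> VersionVector {
    let mut merged = a.clone();
    for (node, &v) in b {
        let slot = merged.entry(node.clone()).or_insert(0);
        *slot = (*slot).max(v);
    }
    *merged.entry(self_node.to_string()).or_insert(0) += 1;
    merged
}
```

Because the join is at least as large as each parent everywhere and the bump adds one, the merged vector strictly dominates both, which is the only condition under which the conflict artifacts may be cleared.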

### 6.7 Broker and brain MUST be separate processes (in-process collapse silently breaks no-endpoint-drop update) `[REQ-HAZARD-BROKER-PROCESS-ISOLATION]`
- **Failure:** the daemon hosts the broker as a background *thread* in the single `spt daemon` process (`daemon.rs:165-170`, `Arc<Broker>` + `thread::spawn(serve)`) instead of a separate process. A brain restart onto a swapped binary then cannot happen without killing the broker thread — closing every PTY, orphaning every harness child, dropping every socket. So `spt update apply` degrades to an in-process `Brain::handoff` no-op: the binary swaps on disk but the running daemon keeps executing the old code until an unrelated restart/logon. The no-endpoint-drop self-update pillar (REQ-UPD-3, ADR-0004) is silently unrealized. Observed live 2026-06-09: `enlyzeam` ran 0.3.0 with 0.3.2 on disk for ~a day, still reproducing the bug the update fixed.
- **Invariant:** the broker runs as its own long-lived process that survives every brain restart; the brain restarts onto the new binary and re-attaches via the versioned IPC. A routine (brain-only) update must leave every hosted endpoint untouched at the *process* level — not merely re-subscribe a brain within the same process. The evidence for REQ-UPD-3 / REQ-DAEMON-2 must prove process-level survival (a PTY child + a live QUIC conn survive a brain-process restart onto a swapped binary — SPIKE-01/03 productionized as `int`), NOT the in-process handoff shape that masks this regression.
- **spt-core mapping:** restoration is ADR-0018 (next milestone). The current `int` tags on REQ-DAEMON-2 / REQ-UPD-3 are regression-masked and re-point at restoration; the broker becomes the always-up per-machine anchor (seed-lock + liveness + brain supervisor). Two-process supervision, generation custody, durable-deadline loop timing, broker-cursor-of-record, and readiness-gated auto-rollback all hang off this.
- **Origin:** unintended spec/impl drift from ADR-0004 (the broker *process* was specced + spiked but built in-process), discovered during the v0.3.2 fleet update verify. Full audit + decisions: `docs/BROKER-BRAIN-SPLIT-RESTORATION.md` (verified) + ADR-0018.
<!-- [doc->REQ-HAZARD-BROKER-PROCESS-ISOLATION] -->
- **D1 (restoration skeleton, ADR-0018 Q2/Q3):** the process boundary is restored — `spt daemon run` is the broker process and spawns a supervised `spt daemon brain` child (`brainproc.rs`); the broker survives the brain dying and respawns it (proven in production topology by `crates/spt/tests/brain_split.rs`). The logic loops still run broker-side (D2 migrates them); the `int` process-level survival E2E + the in-process re-point land at D7.
- **Closed out (2026-06-11, v0.4.0–v0.4.2):** the two-process model shipped (v0.4.0); the D7 `int` E2E (`brain_survive.rs`) + the N-1 gate prove process-level survival onto swapped bytes and re-pointed REQ-DAEMON-2 / REQ-UPD-3. The v0.4.1 fleet-verify proved this Windows-seamless (hfenduleam: brain pid rolls, broker held, `exe_hash` flips, no manual bounce) but exposed a **Linux** respawn-path gap — the resident broker respawned the brain via per-spawn `current_exe()`, which on Linux follows the `apply` rename to `.old-N` and ran OLD bytes under an `applied` record (`[REQ-HAZARD-BRAIN-RESPAWN-PATH]`, 6.11), fixed in v0.4.2 (respawn from the canonical path captured at broker start + a promotion bytes-gate). Seamless update is now proven on **both OSes** — Windows live (hfenduleam ×2: 0.4.0→0.4.1→0.4.2, broker pid held) and Linux via the CI-gated in-place-rename E2E (`brain_respawn_rename.rs`, kitsubito runner). The fleet runs v0.4.2 on fixed brokers after the project's final two manual bounces. The process-isolation invariant holds; 6.11 is its Linux-respawn-path corollary.

### 6.8 No irreversible durable-state migration before update ready-promotion `[REQ-HAZARD-ROLLBACK-STATE-COMPAT]`
- **Failure:** the readiness-gated auto-rollback (ADR-0018 Q7) spawns the *previous* binary against durable state the *new* brain already wrote. The first release that migrates a durable-state schema in place would make the old binary unable to read it — silently bricking rollback exactly when it is needed (a logic-bricking update that can no longer fall back).
- **Invariant:** a brain must not irreversibly migrate durable state before it is ready-promoted; equivalently, every pre-ready write must remain readable by the N-1 brain. Schema migrations are gated behind ready-promotion (or written in an N-1-tolerant additive form).
- **spt-core mapping:** lands with ADR-0018's auto-rollback. Free to assert now (a 2026-06-09 source audit confirmed zero state-migration code exists); the invariant cannot be introduced retroactively once a migration has shipped.
- **D5 conformance (2026-06-10):** the new durable timing state `<spt_home>/deadline-<key>.json` (restoration D5-1) is **additive** — a rolled-back pre-D5 binary does not know the file and simply ignores it (re-phasing on its own flat-sleep cadence, the pre-D5 behavior). No existing-file schema migration, no irreversible pre-ready write → the new file is rollback-N-1-safe by construction. The thing for a future D6 guard to gate is a *migration* of this file's shape, not its introduction.
- **D6 guard (2026-06-10, restoration D6-3):** the invariant is now **asserted**, not just noted. The pre-ready durable writes are **enumerated in one place** — `spt-daemon::PRE_READY_DURABLE_FILES` (`rollback_compat.rs`): `deadline-<key>.json` (D5, `DeadlineAnchor`), `applied-state.json` (D6-1, the two-phase `AppliedRecord`), and the generation-stamped `brain.ready` breadcrumb (D6-1b, `{pid, generation}`). A **tripwire unit test** pins each one's additive / N-1-readable contract (load-bearing field names present; an unknown extra field still deserializes), so a *non-additive* pre-ready change (renamed/removed field, or a `deny_unknown_fields`/non-tolerant shape) trips the test and forces the migration **behind ready-promotion** (or into an additive form). **Both new D6 durable files are additive → N-1-safe by construction:** the two-phase `applied-state.json` is a *new* file a rolled-back pre-D6 binary does not know — it ignores it and falls back to the legacy `applied.json`/`last-outcome.json` (which D6 keeps writing alongside for the convergence query); `brain.ready` only gained a `generation` field its sole reader tolerates. It is a tripwire, **not** a migration framework (activate-don't-pre-fail) — the day a real migration is needed, this guard is the wire it trips.
- **Origin:** verification amendment `[V1]` (agent `doyle`) on `docs/BROKER-BRAIN-SPLIT-RESTORATION.md`.
- **Closed out (2026-06-11):** the readiness-gated auto-rollback shipped (v0.4.0) and the pre-ready durable-file registry + tripwire guard (D6-3) hold. v0.4.2 added a promotion **bytes-gate** that turns a wrong-bytes respawn into an auto-rollback rather than a false `applied` record, strengthening the rollback path the same release exercised across the fleet. No in-place schema migration has shipped, so the invariant remains **asserted, not yet exercised by a real migration** (correct: activate-don't-pre-fail); the tripwire trips the day a real migration lands.
<!-- [doc->REQ-HAZARD-ROLLBACK-STATE-COMPAT] -->

### 6.9 Resume-mode brain: a blocking spawn/command wait silently discards OTHER sessions' output
- **Failure:** a resume-mode brain (per-session `session_cursors` populated by `resume_sessions`) drives `Brain` over **blocking** `read_event` calls. Any command that loops `read_event` until its own reply — `spawn_session_pid` waiting for `Spawned`, `net_status`, `sessions`, etc. — calls `read_event` on *every* interleaved frame, so an OUTPUT frame for a **different** session is **cursor-processed** (its `session_cursors` entry snaps forward; the broker already counted it delivered via `delivered_through` on the live-send) and then **discarded** by the waiting loop's `_ => continue`. Cursor advanced + content dropped = that chunk is gone for the downstream consumer and the broker will **not** re-send it (resume reads from the delivered cursor, ADR-0018 D4). One session's spawn/command starves another session's output.
- **Invariant:** the daemon-hosted multi-session event loop must not consume a session's OUTPUT inside another session's blocking wait. The **live-agent adapter milestone must restructure the brain event pump** so command/`spawn` is non-blocking (a single demux loop owns `read_event` and routes every frame to its session's consumer), OR a blocking wait must re-queue/route the frames it reads for other sessions rather than dropping them.
- **spt-core mapping:** **unreachable today** — the supervised daemon brain hosts no PTY sessions and spawns none; a single-session seat (legacy, empty map) has no "other session" to starve. Surfaces the moment daemon-hosted sessions land (the live-agent adapter), which must rebuild the blocking `read_event` loop regardless (N interactive sessions cannot share one blocking reader). Recorded so that redesign inherits the constraint rather than rediscovering it. No machinery now (would be untested dead code, activate-don't-pre-fail).
- **Origin:** surfaced by the D4-2b resume-harness CI flake root-cause (agents `todlando` + `doyle`, 2026-06-10); sibling of `[REQ-HAZARD-BROKER-PROCESS-ISOLATION]` 6.7.
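
The second option in the invariant (a blocking wait that routes what it reads instead of dropping it) could look like this sketch. `Frame`, `Demux`, and both methods are hypothetical; the real event types and `read_event` are not modelled, and a frame iterator stands in for the blocking reader.

```rust
use std::collections::{HashMap, VecDeque};

pub enum Frame {
    Spawned { session: u32 },
    Output { session: u32, bytes: Vec<u8> },
}

/// Holds output read during someone else's wait until its consumer asks for it.
#[derive(Default)]
pub struct Demux {
    pending: HashMap<u32, VecDeque<Vec<u8>>>,
}

impl Demux {
    /// Waits for `Spawned` for `session`. Every OUTPUT frame seen on the way
    /// is queued for its own session rather than discarded.
    pub fn wait_spawned(&mut self, session: u32, frames: &mut impl Iterator<Item = Frame>) -> bool {
        for frame in frames {
            match frame {
                Frame::Spawned { session: s } if s == session => return true,
                Frame::Output { session: s, bytes } => {
                    self.pending.entry(s).or_default().push_back(bytes);
                }
                // Another waiter's reply; a full demux would route this too.
                Frame::Spawned { .. } => {}
            }
        }
        false
    }

    /// Drains whatever was queued for `session`, in arrival order.
    pub fn take_output(&mut self, session: u32) -> Vec<Vec<u8>> {
        self.pending
            .remove(&session)
            .map(|q| q.into_iter().collect())
            .unwrap_or_default()
    }
}
```

The property to test is that a chunk whose cursor has advanced is still retrievable by its own session after the wait returns.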

### 6.10 Phase-significant loop timing must be a durable absolute-deadline grid, not phase-relative sleep `[REQ-HAZARD-BROKER-PROCESS-ISOLATION]`
- **Failure:** a periodic loop that sleeps a flat `period` each iteration (`pulse_tick` then `sleep(pulse_period)`) is **phase-relative** — every brain restart silently re-phases the grid to the restart instant. Under the seamless-update model (the supervisor respawns the brain onto a swapped binary, ADR-0018 D3-3), a routine update would shift the cadence of every phase-significant loop, and continuity cannot ride a brain→brain frame (the outgoing brain is gone before the new one starts — the same constraint that moved session continuity to the broker in D4).
- **Invariant:** phase-significant periodic timing lives as durable absolute-deadline state on disk (`(anchor, interval)`), rehydrated on every brain start, with fires **derived functionally** (`next_fire = anchor + interval·⌈max(0,now−anchor)/interval⌉`) and **no per-fire write**. An **Update** restart re-reads the anchor and keeps deriving (phase preserved, lands mid-grid); a **Crash**/**Cold** restart re-bases the anchor to `now` (phase reset acceptable — the loop is idempotent catch-up). The update-vs-crash decision is the D3 spawn-time `StartReason`. **One-shot** (alarm) deadlines persist their absolute `target-time` at creation and **never reset** on any restart ("remind me at 3pm" is a commitment) — the asymmetry vs the periodic crash-reset is the rule. **[V4]** Only phase-significant loops convert; idempotent pump cadences (stagger-from-due-now) need none — converting them would re-add the per-loop writes Q4 minimizes.
- **spt-core mapping:** ADR-0018 Q4/V3/V4, restoration D5. Mechanism in `spt-daemon::deadline` (`DeadlineAnchor` periodic + `OneShotDeadline` rule-only pure helper); the pulse loop (`lifecycle::run_pulse_loop`) consumes it. The one-shot **machinery** (a durable in-daemon alarm scheduler) is the deferred alarm port (`docs/DEFERRED.md`) — the daemon has no one-shot consumer today, so building the timer now would ship untested dead code (activate-don't-pre-fail); D5 fixes the *rule* as a tested-unwired helper, the port builds the *scheduler*.
- **Origin:** ADR-0018 Q4 + verification amendments `[V3]`/`[V4]` (agent `doyle`); D5 plan vet (agents `todlando` + `doyle`, 2026-06-10).
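
The functional derivation in the invariant can be exercised directly. The sketch assumes times are whole seconds on some durable wall-clock scale; `next_fire` and `rehydrate_anchor` are illustrative names, while the `StartReason` variants follow the Update / Crash / Cold split described above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartReason {
    Update,
    Crash,
    Cold,
}

/// next_fire = anchor + interval * ceil(max(0, now - anchor) / interval).
/// Pure: nothing is written per fire, so the grid survives any restart
/// that re-reads the same anchor.
pub fn next_fire(anchor: u64, interval: u64, now: u64) -> u64 {
    assert!(interval > 0, "interval must be non-zero");
    let elapsed = now.saturating_sub(anchor); // max(0, now - anchor)
    let steps = elapsed / interval + u64::from(elapsed % interval != 0); // ceiling
    anchor + interval * steps
}

/// An update keeps the stored anchor (phase preserved); a crash or cold
/// start re-bases it to `now` (phase reset is acceptable).
pub fn rehydrate_anchor(stored: u64, reason: StartReason, now: u64) -> u64 {
    match reason {
        StartReason::Update => stored,
        StartReason::Crash | StartReason::Cold => now,
    }
}
```

With `anchor = 1000` and `interval = 60`, a brain that restarts at `now = 1061` lands mid-grid and derives the next fire at 1120, the same instant the previous brain would have computed.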

### 6.11 Brain respawn must exec the APPLIED bytes, not the renamed old binary (Linux `current_exe` follows the apply-rename; readiness ≠ new-bytes) `[REQ-HAZARD-BRAIN-RESPAWN-PATH]`
- **Failure:** the broker respawns the brain candidate from `std::env::current_exe()` resolved **per spawn** (`brainproc.rs:817`). `spt update apply` swaps the binary by renaming the running file `spt` → `spt.old-N` and writing the new bytes at `spt`. On **Linux**, `current_exe()` = `readlink(/proc/self/exe)` is **inode-tracking** and follows the rename to `.old-N`, so the resident broker respawns the brain onto the **OLD** bytes — the brain comes up ready (readiness passes), the trial **promotes**, and the daemon records `applied:N` while still running the previous version. New code does not run; the record is optimistically wrong (the enlyzeam-class record/reality divergence, now provable via `exe_hash`). **Windows** dodged it — `GetModuleFileName` returns the path string captured at process start, so the swap lands new bytes — which is why v0.4.1 went green on hfenduleam and red on kitsubito. Observed live 2026-06-11 (kitsubito ran v0.4.0 bytes under an `applied:8` record after a brain-only `apply`).
- **Invariant:** the candidate-binary default is the canonical exe path **captured once at broker start** (before any `apply` can rename under the process), never a per-spawn `current_exe()` — giving Linux the path-at-start semantics Windows already had. AND promotion is **bytes-gated**: a trial promotes only if the candidate's stamped `brain.ready` `exe_hash` equals the staged artifact's hash for this platform; a mismatch is a failed trial → auto-rollback + loud notif (readiness alone is not proof the new bytes run). If either hash is absent the gate degrades to readiness-only (N-1-safe for pre-metadata releases / a missing breadcrumb) but emits `PROMOTE_BYTES_UNVERIFIED` so a disarmed gate stays field-diagnosable.
- **spt-core mapping:** v0.4.2 fix (`V042-PLAN.md`). Half 1 = `spawn_brain_supervisor` canonical-exe capture threaded into `spawn_brain_child`'s `None` default; Half 2 = the promotion bytes-gate in `supervise_brain`'s `Promoted` arm (`TrialEnv::ready_exe_hash` + `staged_artifact_hash`). The rollback path (`Some(.old-N)` selection) is unchanged. Sibling of 6.7 — the broker *process* is correct; the bytes it respawned the brain ONTO were not.
- **Origin:** ADR-0018 Q3 silently assumed `current_exe()` path-string semantics; surfaced by the v0.4.1 fleet-roll `exe_hash` bytes assert (agents `todlando` + `doyle` + `deployah`, 2026-06-11). ADR-0018 Q3 amended.
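
Both halves of the invariant fit in a short sketch. `canonical_exe`, `bytes_gate`, and `PromoteDecision` are hypothetical names; hashes are assumed to be already-normalized strings, and the `PromoteUnverified` arm stands for the readiness-only degrade that emits `PROMOTE_BYTES_UNVERIFIED`.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::sync::OnceLock;

static CANONICAL_EXE: OnceLock<PathBuf> = OnceLock::new();

/// Half 1: resolve the exe path once and reuse it for every respawn.
/// Must first be called at broker start, before any `apply` can rename the file.
pub fn canonical_exe() -> io::Result<&'static Path> {
    if let Some(path) = CANONICAL_EXE.get() {
        return Ok(path.as_path());
    }
    let exe = std::env::current_exe()?;
    Ok(CANONICAL_EXE.get_or_init(|| exe).as_path())
}

#[derive(Debug, PartialEq, Eq)]
pub enum PromoteDecision {
    /// Hashes match: the new bytes are provably running.
    Promote,
    /// A hash is missing: fall back to readiness-only and flag it.
    PromoteUnverified,
    /// Hashes differ: failed trial, roll back.
    Rollback,
}

/// Half 2: readiness alone is not proof; compare the running hash to the staged one.
pub fn bytes_gate(ready_exe_hash: Option<&str>, staged_artifact_hash: Option<&str>) -> PromoteDecision {
    match (ready_exe_hash, staged_artifact_hash) {
        (Some(ready), Some(staged)) if ready == staged => PromoteDecision::Promote,
        (Some(_), Some(_)) => PromoteDecision::Rollback,
        _ => PromoteDecision::PromoteUnverified,
    }
}
```

The cached path only gives path-at-start semantics if the first call happens before the rename, which is why the capture belongs in broker startup rather than in the respawn path.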
<!-- [doc->REQ-HAZARD-BRAIN-RESPAWN-PATH] -->

---

## 7. Boundary & delivery integrity (added 2026-05-31 — Stage A red-team)

These were absent from the sister-project harvest; codex surfaced them as load-bearing gaps for spt-core's new daemon/network surface.

### 7.1 Local `api` mutation auth  `[REQ-HAZARD-LOCAL-API-AUTH]`
- **Failure:** any local process calls `spt api bind|state|session-end|history-log|poll` and binds, ends, injects, or spoofs the state of an endpoint it does not own. Local untrusted processes are explicitly in scope (shells, third-party adapters).
- **Invariant:** every `api` *mutation* is authenticated to an endpoint/session — per-endpoint link token or OS-credential binding. An unrelated local process cannot bind, inject, end, or spoof state.
- **spt-core mapping:** the `api` subcommand surface (PRD R-API-*) + broker IPC.
- **Source:** codex Stage A #13 (`docs/reviews/STAGE-A-codex-redteam.md`).

### 7.2 Idempotent delivery across brain restart  `[REQ-HAZARD-RESTART-IDEMPOTENT]`
- **Failure:** broker queues, spool rows, PTY injection, remote streams, and file transfers cross the broker↔brain boundary; a brain restart duplicates or drops a side effect.
- **Invariant:** every side effect crossing the boundary carries a durable ID + replay rule; replay after restart is exactly-once / idempotent. Crash the brain before/after spool write, before/after PTY write, mid-transfer, mid-registry update — no dup, no drop.
- **spt-core mapping:** broker↔brain IPC (ADR-0004), self-update handoff.
- **Source:** codex Stage A #14.

### 7.3 Psyche outbound capture + sanitization  `[REQ-HAZARD-PSYCHE-OUTBOUND-PROXY]`
- **Failure:** the Psyche's sole outbound channel is its **stdout** (`<EVENT type="reply|notify">` intents — ADR-0012). Two ways to break it: (a) a **null-stdout / detached** live-Psyche driver silently **discards every reply and notify**; (b) the daemon relays a Psyche-supplied `from=`/target **unchanged**, letting a sandboxed Psyche **spoof identity** or address arbitrary endpoints.
- **Invariant:** the live-Psyche turn driver **MUST capture stdout** (a bounded, stdin-fed, stdout-captured invocation — **never** `Stdio::null()`); the daemon **MUST strip** every Psyche-supplied `from=`/target/routing attribute, **re-stamp `from=<self_id>`**, and **constrain routing** — `reply` → the inbound message's structural sender (its `from`) only, `notify` → the agent's own user/subnet only. Body validated per 4.1.
- **spt-core mapping:** the live-Psyche turn driver + daemon outbound relay (ADR-0012); `spt-proto::event` type taxonomy (`+reply`/`+notify`). The interim `runtime::spawn_session` `Stdio::null()` path is **not** the live-Psyche driver.
- **Source:** grill-with-docs 2026-06-03 (uncovered by prior design passes); sister `src/live/wrapper/claude.rs` (sandbox `["Read","Write","Edit"]` + `parse_markers`).
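
A minimal sketch of the relay rule in the invariant, under stated assumptions: `RawIntent`, `TurnContext`, `Outbound`, and `sanitize` are hypothetical names, not spt-core types, and "own user/subnet" is collapsed to a single `own_user` string. Body validation (4.1) is out of scope here.

```rust
/// The two outbound intents named above (`<EVENT type="reply|notify">`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Intent {
    Reply,
    Notify,
}

/// One parsed stdout intent, exactly as the Psyche wrote it (untrusted).
/// Hypothetical shape: the field names are illustrative.
pub struct RawIntent {
    pub intent: Intent,
    pub from: Option<String>,
    pub target: Option<String>,
    pub body: String,
}

/// Facts the daemon already holds for this turn (trusted).
pub struct TurnContext<'a> {
    pub self_id: &'a str,
    pub inbound_from: &'a str,
    pub own_user: &'a str,
}

/// What the daemon actually relays.
#[derive(Debug, PartialEq, Eq)]
pub struct Outbound {
    pub from: String,
    pub to: String,
    pub body: String,
}

pub fn sanitize(raw: RawIntent, ctx: &TurnContext<'_>) -> Outbound {
    // Psyche-supplied `from` and `target` are dropped without being read.
    let RawIntent { intent, body, .. } = raw;
    // Routing is derived from the intent type and trusted context only.
    let to = match intent {
        Intent::Reply => ctx.inbound_from,
        Intent::Notify => ctx.own_user,
    };
    Outbound {
        from: ctx.self_id.to_string(), // re-stamped, never relayed
        to: to.to_string(),
        body,
    }
}
```

The point of the shape is that the untrusted routing fields never reach the output type at all, so a later refactor cannot accidentally start honoring them.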

### 7.4 Per-agent pulse/psyche/echo scheduling must not serialize across agents  `[REQ-HAZARD-DAEMON-SCHED-NONBLOCKING]`
- **Failure:** echo-commune (`run_bounded_stdin`) and the live-Psyche turn driver (D7.5) are **bounded LLM calls that block their calling thread** until the child answers or the timeout fires. Today each agent drives its own pulse from its **own process** (`spt/src/api/{live,startup}.rs`), so blocking is isolated. When the daemon hosts **N per-agent loops** (ADR-0004 target; the `run_pulse_loop` fan-out is currently definition + test only, not yet wired), a **single serial driver** that calls these invocations inline lets one agent's slow/hung LLM call **stall every other agent's heartbeat, commune, and reply**. The per-call timeout bounds the worst case *only* when the work is isolated; a serial driver makes the stalls additive, so with N agents a single round can cost up to N × the timeout.
- **Invariant:** each agent's bounded LLM-bearing work (echo-commune summarizer, Psyche turn) runs on its **own thread / off the shared scheduler** — no single-threaded driver iterating all agents may call a blocking invocation inline. One agent's slow/timed-out call must not delay another agent's next tick beyond tolerance.
- **spt-core mapping:** the daemon's multi-agent pulse/psyche hosting (ADR-0004, "all Psyche/pulse loops" consolidated); `run_pulse_loop` fan-out; echo-commune + the D7.5 Psyche driver.
- **Source:** grill-with-docs 2026-06-03 (forward invariant — the multi-agent fan-out is not yet wired). Distinct from ADR-0002 SERIOUS #6 (crash blast-radius, not scheduling latency).
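
As an illustration of the invariant, a sketch that gives each agent's blocking call its own thread and collects results over a channel. `fan_out` and `BlockingWork` are hypothetical names; a real host would presumably reuse long-lived per-agent threads rather than spawn per call.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Stand-in for one agent's bounded, blocking LLM-bearing call.
pub type BlockingWork = Box<dyn FnOnce() -> String + Send + 'static>;

/// Run every agent's blocking work on its own thread.
/// Results arrive in completion order, so a slow agent only delays itself.
pub fn fan_out(agents: Vec<(String, BlockingWork)>) -> Receiver<(String, String)> {
    let (tx, rx) = mpsc::channel();
    for (agent, work) in agents {
        let tx = tx.clone();
        thread::spawn(move || {
            let result = work();
            // The receiver may already be gone; that is not an error here.
            let _ = tx.send((agent, result));
        });
    }
    rx
}
```

The scheduler side then reads with `recv_timeout`, so its own tick cadence never depends on any single agent's call returning.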

### 7.5 WAN-inbound origin is transport truth, never payload  `[REQ-HAZARD-WAN-ORIGIN-AUTH]`
<!-- [doc->REQ-HAZARD-WAN-ORIGIN-AUTH] -->
- **Failure:** the ADR-0009 access whitelist gates **unsolicited wire inbound by origin node**. If the gate's subject is read from record bytes (an `origin_node`/`from`/`node` field a sender wrote), any sender forges any origin and the whitelist is decoration — same spoof class as 7.3's Psyche-supplied `from=`, now on the cross-node surface.
- **Invariant:** the origin the gate (and detection/UX — "node X is driving") consumes is the **QUIC handshake-proven remote node id** (iroh `EndpointId` == Ed25519 node pubkey, the REQ-NET-1 identity binding) read from the **broker's conn/stream table** (`NetStreamInfo::remote_id_hex`), never from payload. Wire records carry **no origin field by design**; a forged one decodes as an ignored unknown field and influences nothing. `from` inside a record is reply-routing metadata only — never an authorization subject.
- **spt-core mapping:** `spt_net::net::wanmsg::WanMessage` (no origin field) + `spt_daemon::wan::receive_wan` (origin parameter sourced from the stream table) + every future wire-inbound consumer (D5b attach, D5c transfer, D8 notifs).
- **Source:** M4-D5a design (ADR-0009 consequence: the gate must bind to the transport identity).
- **Mesh note (ADR-0017, 2026-06-08):** preserved verbatim. The mesh never relays third-party rows, so a record's author is **always** the QUIC-handshake-proven connection origin. The new **membership proof** (seed-proof) is itself channel-bound to both handshake-proven pubkeys — the authorization subject stays transport truth, never payload.
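
A small sketch of what "transport truth, never payload" means at the gate. `WireRecord`, `StreamInfo`, and `admit` are hypothetical stand-ins (only the `remote_id_hex` field name is borrowed from the text above); the forged `from` is deliberately accepted as input and then ignored.

```rust
use std::collections::HashSet;

/// Hypothetical wire record. `from` is reply-routing metadata only.
pub struct WireRecord {
    pub from: String,
    pub body: String,
}

/// Hypothetical row from the broker's conn/stream table:
/// the handshake-proven remote node id.
pub struct StreamInfo {
    pub remote_id_hex: String,
}

/// The gate's subject is the transport identity. The record is passed in
/// only to make explicit that nothing inside it is consulted.
pub fn admit(whitelist: &HashSet<String>, stream: &StreamInfo, _record: &WireRecord) -> bool {
    whitelist.contains(&stream.remote_id_hex)
}
```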

### 7.6 Pump brain-IPC reads must be deadline-bounded (a blocked read wedges the whole pump)  `[REQ-HAZARD-PUMP-IPC-DEADLINE]`
<!-- [doc->REQ-HAZARD-PUMP-IPC-DEADLINE] -->
- **Failure:** the peer pump is a SINGLE thread driving every leg (registry/notif/sync/update) against every peer over ONE brain-IPC client. Its reply reads (`net_open_stream`, `net_stream_send`, `net_dial`, and the sync/update pull `read_event` loops) were `loop { read_event() }` with no deadline. When a peer's QUIC path black-holes, the broker's stream-open/send awaits the dead peer and never sends the reply, so the brain's `read_frame` blocks FOREVER and the pump freezes mid-round. The heartbeat (loop-top) stops and the stall warning fires honestly, but `supervise_pump` cannot rescue it: the supervisor catches a panic / error / clean return, and a BLOCKED thread never returns. Observed twice — 2026-06-07, then a **2.2h wedge on hfenduleam (2026-06-11)** where the broker stayed responsive to `daemon status` while the pump thread sat dead.
- **Invariant:** in PUMP mode the brain carrier is **SPLIT at construction** — a dedicated `pump-ipc-reader` thread does blocking `read_frame` on the `RecvHalf` and forwards each framed result down a channel; the main thread writes on the `SendHalf` and reads with `Receiver::recv_timeout`. Every IPC reply read is bounded by a per-call **total-wait** deadline (`PUMP_PEER_IO_TIMEOUT` = 30s, > any legitimate round-trip, < the 60s QUIC idle; re-armed on stream progress for the streaming pull legs so a healthy long sync is never killed — only a ≥30s silence). **Mechanism note (Windows):** the carrier is a reader-thread + channel, NOT a non-blocking socket + poll — interprocess 2.4.2 on Windows named pipes has no portable read timeout (`set_recv_timeout` → `no_timeouts()`) and its `set_nonblocking` uses deprecated `PIPE_NOWAIT`, which corrupts mid-stream (proven by the mesh E2E). Blocking reads work on both OSes; the channel supplies the deadline. A read that exceeds the deadline returns `io::ErrorKind::TimedOut`, treated as a **POISONED client** (a late reply could bind the wrong stream id in a later call — gotcha-#1-adjacent): it BUBBLES out of the round → `run_peer_pump` returns Err → `supervise_pump` restarts the pump with a fresh brain client + conn cache + reset WorkerLasts (the §V4 stagger re-primes every leg). NEVER a per-peer retry. An ordinary error (the broker REPLIED with one) stays a per-peer abort + conn drop + redial. 
- **Leak watch (accepted, not fixed):** a restart abandons the old reader thread parked in `read_frame` (split halves share one OS handle → on a named pipe the broker may never signal the disconnect). The leak is bounded to ONE thread per actual wedge (the post-restart conn cache is fresh: the dead peer is re-dialed, not re-wedged), is diagnosable by the reader's spawn/exit log lines, and is fully cured by the broker-side B-half (§7.8, `REQ-HAZARD-BROKER-QUIC-DEADLINE`), shipped in v0.8.3: with the broker bounding its QUIC await, the brain's read returns promptly (an ordinary error), so the reader thread is never left parked.
- **spt-core mapping:** `Brain::cold_start_pump` (splits the carrier + arms the deadline) / `BrainConn::Split` (the `SendHalf` + reader-thread channel) / `call_deadline` / `read_event_until` / `read_frame_until` (the `recv_timeout` dispatch), `pump::run_peer_pump` (connects in pump mode) + `pump::peer_outcome` (the tier-split), the `request_sync`/`request_update` pull loops (deadline re-armed on progress). The broker-side half — the broker must never make a brain wait unbounded on a QUIC op (bound the `net_dial`/`open_stream`/`send_stream` handlers) — shipped in v0.8.3 as §7.8 (`REQ-HAZARD-BROKER-QUIC-DEADLINE`).
- **Source:** field diagnosis 2026-06-11 (the 2.2h hfenduleam wedge); doyle ruling A-now / B-deferred. The stall warning (M8 decision 23) was the band-aid; this is the fix. The B-half landed v0.8.3 (§7.8) after the 2026-06-16 recurrence.
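
The reader-thread + channel mechanism can be sketched with the standard library alone. This is an illustration, not the spt-core carrier: `spawn_reader` is a hypothetical name, the 4-byte little-endian length prefix is an assumed framing, and `read_frame_until` only borrows the name used above.

```rust
use std::io::{self, Read};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

pub type Frame = io::Result<Vec<u8>>;

/// Dedicated reader: the blocking read lives here, never on the pump thread.
/// Assumed framing: 4-byte little-endian length, then the payload.
pub fn spawn_reader<R: Read + Send + 'static>(mut recv_half: R) -> Receiver<Frame> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || loop {
        let mut len = [0u8; 4];
        let frame = recv_half.read_exact(&mut len).and_then(|_| {
            let mut buf = vec![0u8; u32::from_le_bytes(len) as usize];
            recv_half.read_exact(&mut buf).map(|_| buf)
        });
        let failed = frame.is_err();
        // Stop on a read error or when the pump side has gone away.
        if tx.send(frame).is_err() || failed {
            break;
        }
    });
    rx
}

/// Pump-side read: the channel supplies the deadline the pipe cannot.
pub fn read_frame_until(rx: &Receiver<Frame>, deadline: Duration) -> Frame {
    match rx.recv_timeout(deadline) {
        Ok(frame) => frame,
        // Deadline elapsed: the caller must treat the client as poisoned.
        Err(RecvTimeoutError::Timeout) => Err(io::Error::new(
            io::ErrorKind::TimedOut,
            "brain IPC read deadline elapsed",
        )),
        Err(RecvTimeoutError::Disconnected) => Err(io::Error::new(
            io::ErrorKind::BrokenPipe,
            "reader thread exited",
        )),
    }
}
```

Note that a timed-out read leaves the reader thread parked inside `read_exact`, which is exactly the accepted leak described above.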

### 7.7 A slow/dead/hostile remote VIEWER must never stall the controller, child, or drain  `[REQ-HAZARD-VIEWER-ISOLATION]`
<!-- [doc->REQ-HAZARD-VIEWER-ISOLATION] -->
- **Failure:** the W2.5 controller/viewer model lets ANY number of read-only `--view` attachers ride one session's broker `OutputLog`. The single drain thread fans each output chunk to every attacher. If a viewer's socket is fanned out with a **blocking** write under the log lock (the controller's authoritative path), one wedged viewer (a slow terminal, a black-holed WAN peer, a hostile non-reader) stalls the drain — freezing the controller's stream and backing up the PTY child. A single watcher must never be able to degrade or hang the driver.
- **Invariant:** the drain writes the **controller** on the authoritative blocking bounded path (it alone advances `delivered_through`), but each **viewer** gets an **isolated bounded SPSC queue + a dedicated writer thread**; the drain `try_send`s under the log lock and **evicts** any viewer whose queue is `Full` (fell behind the live stream) or `Disconnected` (its writer died on a dead socket) — the drain thread **never touches a viewer socket**, so no viewer write can backpressure it. A **soft cap** (`MAX_VIEWERS`) bounds the writer-thread count (a viewer attach beyond it is refused). Viewer eviction never perturbs the controller stream, the `delivered_through` cursor, or the child. Ring **replay-at-attach** is owned by the writer thread (not the bounded live queue), so a viewer attaching to a busy session is not spuriously evicted.
- **spt-core mapping:** `OutputLog::append` (controller blocking + `viewer_send_evicts` `try_send` fan-out), `OutputLog::add_viewer` (bounded `sync_channel(VIEWER_CHANNEL_DEPTH)` + `viewer_writer` thread), `MAX_VIEWERS` soft cap, `ViewerSink`. Unit: `viewer_overflow_or_disconnect_evicts_never_blocks`. Int: `wedged_viewer_does_not_stall_controller` (a non-reading viewer is evicted while the controller keeps receiving past a 200KB burst).
- **Source:** M12 W2.5 controller/viewer model (doyle ruling 2026-06-14, Q1).
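
The viewer half of the invariant reduces to a non-blocking `try_send` plus eviction. A sketch under assumptions: `Viewers` and its methods are hypothetical, viewer ids are plain integers, and the writer thread that would drain each `Receiver` into a socket is left out.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Per-viewer bounded queues. The drain only ever touches these senders,
/// never a viewer socket.
pub struct Viewers {
    sinks: Vec<(u32, SyncSender<Vec<u8>>)>,
}

impl Viewers {
    pub fn new() -> Self {
        Self { sinks: Vec::new() }
    }

    /// Register a viewer; the returned receiver belongs to its writer thread.
    pub fn add_viewer(&mut self, id: u32, depth: usize) -> Receiver<Vec<u8>> {
        let (tx, rx) = sync_channel(depth);
        self.sinks.push((id, tx));
        rx
    }

    /// Fan one chunk out without ever blocking. Returns the evicted ids.
    pub fn fan_out(&mut self, chunk: &[u8]) -> Vec<u32> {
        let mut evicted = Vec::new();
        self.sinks.retain(|(id, tx)| match tx.try_send(chunk.to_vec()) {
            Ok(()) => true,
            // Full: the viewer fell behind. Disconnected: its writer died.
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                evicted.push(*id);
                false
            }
        });
        evicted
    }
}
```

Because `fan_out` cannot block, it is safe to call while holding the log lock, which is the property the integration test above checks end to end.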

### 7.8 The broker must never make a brain wait UNBOUNDED on a QUIC op (the pump-IPC-deadline B-half)  `[REQ-HAZARD-BROKER-QUIC-DEADLINE]`
<!-- [doc->REQ-HAZARD-BROKER-QUIC-DEADLINE] -->
- **Failure:** the broker's brain-facing QUIC handlers (`dispatch_net_dial` / `dispatch_net_stream_open` / `dispatch_net_stream_send`) call into `NetHost::dial` / `open_stream` / `send_stream`, whose iroh awaits (`endpoint.connect` + `prove_membership`; `open_bi`; `write_all`/`finish`) had NO bound of their own. A dead/black-holed roster peer (its process gone, or a mixed-pair that accepts the conn but never answers the seed-proof) makes the broker await its QUIC path FOREVER, so the brain escapes only via its OWN 30s read-deadline (the 7.6 A-half) — a 30s stall + a full pump restart EACH round, and the supervised restart re-dials the SAME dead peer and re-wedges. Root cause of the 2.2h hfenduleam wedge (2026-06-11) and its recurrence (2026-06-16).
- **Invariant:** every brain-waiting QUIC op is wrapped in a broker-side deadline (`NetHost::bounded_block_on` → `tokio::time::timeout`, `BROKER_QUIC_OP_TIMEOUT_MS` = 10s). On elapse the future is DROPPED (cancelling the in-flight connect/stream op, so nothing is half-registered) and a non-`TimedOut` `io::Error` is returned, which the broker REPLIES as an ordinary error frame. The bound (10s) sits comfortably above any legitimate LAN/relay round-trip and 20s below the brain's 30s `PUMP_PEER_IO_TIMEOUT`, so the BROKER fires FIRST — the brain reconstructs `ErrorKind::Other` (its `net_dial`/`net_open_stream`/`net_stream_send` map a broker error reply to `io::Error::other`), NOT its own read-deadline `TimedOut`, so `pump::peer_outcome` takes the ordinary per-peer arm (drop conn + redial next tick), the round CONTINUES and the heartbeat keeps advancing. **Never the brain's read-deadline:** that `TimedOut` is the 7.6 poison → supervised-restart path this fix exists precisely to AVOID. **Exactly-once preserved:** a timed-out journaled op fails INSIDE its `apply_once` closure (the QUIC call is made there), so no phantom `conn_id`/`stream_id` is recorded (`effect()?` propagates before `applied.insert`) and a fresh tick re-dials cleanly — no dedupe into a dead conn. **Happy path unchanged:** a live peer completes with zero added latency; the bound only bites a non-responsive peer.
- **spt-core mapping:** `NetHost::bounded_block_on` (the timeout wrapper) wrapping `NetHost::dial` / `open_stream` (QUIC branch) / `send_stream`; `BROKER_QUIC_OP_TIMEOUT_MS` + `set_quic_op_timeout` (test override, off `NetConfig` — mirrors `set_roster_exchange`). Unit: `bounded_block_on_cuts_a_never_completing_op_with_an_ordinary_error` (a never-completing op → prompt non-`TimedOut` error; a ready op untouched). Int: `dial_to_a_black_holing_peer_fails_with_a_bounded_ordinary_error` (the broker REPLIES an ordinary error within the bound + exactly-once-on-timeout: journal un-applied, conn table empty, clean redial) and `pump_survives_a_black_holing_peer_heartbeat_advances_no_restart` (the production pump against a dead peer — heartbeat monotonic-advances, `run_peer_pump` exits Ok, no supervised restart). The `peer_outcome` ordinary-vs-`TimedOut` tier-split itself is unit-covered at 7.6.
- **Source:** the 7.6 B-half — deferred 2026-06-11 (doyle ruling A-now / B-deferred, DEFERRED.md) and shipped in v0.8.3 after the 2026-06-16 hfenduleam recurrence.
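
The reason the broker's error must not be `TimedOut` is easiest to see in the tier-split itself. A sketch, assuming a three-way outcome enum and a helper `broker_timeout_error` that are both hypothetical (only the `peer_outcome` name and the `io::Error::other` mapping come from the text above):

```rust
use std::io;

#[derive(Debug, PartialEq, Eq)]
pub enum PeerOutcome {
    /// The leg completed.
    Ok,
    /// Ordinary error: drop this peer's conn, redial next tick, round continues.
    DropAndRedial,
    /// Read deadline elapsed: client is poisoned, bubble out for a supervised restart.
    PoisonedRestartPump,
}

pub fn peer_outcome(result: &io::Result<()>) -> PeerOutcome {
    match result {
        Ok(()) => PeerOutcome::Ok,
        Err(e) if e.kind() == io::ErrorKind::TimedOut => PeerOutcome::PoisonedRestartPump,
        Err(_) => PeerOutcome::DropAndRedial,
    }
}

/// How a broker-side bound might surface once it reaches the brain:
/// deliberately `ErrorKind::Other`, never `TimedOut`.
pub fn broker_timeout_error(op: &str, bound_ms: u64) -> io::Error {
    io::Error::other(format!("{op}: no answer from peer within {bound_ms} ms"))
}
```

With the 10s broker bound firing before the 30s brain deadline, a black-holing peer lands in `DropAndRedial`, and `PoisonedRestartPump` is reserved for a broker that genuinely stops answering.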

### 7.9 A daemon-state wire change needs a deliberate BROKER restart (the broker is resident across a brain self-update)  `[REQ-HAZARD-BROKER-SEED-WIRE-SKEW]`
<!-- [doc->REQ-HAZARD-BROKER-SEED-WIRE-SKEW] -->
- **Failure:** the broker serves the seed-control channel and is RESIDENT across a brain-only self-update (ADR-0004's no-terminate-during-update pillar forbids auto-killing it — 6.7). A self-update that changes a daemon-state WIRE FORMAT — e.g. the v0.9.0 adapter-agnostic `Seed` (the `adapter` field dropped) — therefore lands a NEW-version CLI talking to the STILL-RESIDENT OLD broker. The old broker cannot deserialize the new `Seed` (its formerly-required `adapter` is absent), so it drops the seed-control conn without acking; the CLI's `put_seed` ack-read hits EOF and surfaces a raw `UnexpectedEof` "failed to fill whole buffer" — a cryptic footgun that hides the real cause (perri PREP-4 FINDING 1: v0.9.0 CLI ↔ stale 0.8.x broker).
- **Invariant:** (a) spt-core surfaces an ACTIONABLE diagnostic on the seed-ack EOF — naming the stale-broker cause + the fix (`spt daemon stop`; the broker restarts on the next `spt api` call) — never the bare io error (scoped to `UnexpectedEof` on the seed-ack path, so it never mis-fires on an unrelated error like a refused connect). (b) A daemon-state wire change requires a DELIBERATE full broker restart; this is NOT automatic — ADR-0004 forbids auto-killing the resident broker, and a brain-only update keeps the old broker. (c) FORWARD discipline: daemon-state / `Seed` schema changes stay ADDITIVE + serde-default, so a resident OLD broker tolerates a NEW CLI across a brain-only update. (This would NOT have rescued 0.9.0 itself — the old broker's `adapter` was a REQUIRED field, so no additive tolerance could read the new bytes; for a wire change that removes/renames a required field, the operative rule is the broker restart.)
- **spt-core mapping:** `startup::seed_fail_message` (the `UnexpectedEof` → actionable-hint branch) fired from `cmd_seed`'s `put_seed` error arm; the broker's seed-control residency (`seedmap::serve_seed_control`, held across brain restarts — 6.7). Unit: `seed_fail_eof_gives_actionable_stale_broker_hint` (EOF → the `spt daemon stop` hint naming the stale broker; a non-EOF kind → the plain message, no mis-fire).
- **Source:** perri PREP-4 FINDING 1 (v0.9.0 dogfood) → v0.9.1.
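
Part (a) of the invariant is a narrow match on the error kind. A sketch only: the function name is borrowed from the mapping above, but the signature and the message wording here are assumptions.

```rust
use std::io;

/// Turn a seed-ack failure into a user-facing message.
/// Only `UnexpectedEof` gets the stale-broker hint, so an unrelated error
/// (for example a refused connect) is reported plainly.
pub fn seed_fail_message(err: &io::Error) -> String {
    if err.kind() == io::ErrorKind::UnexpectedEof {
        format!(
            "seed failed: the resident broker closed the connection without acking ({err}). \
             It is probably an older version that cannot read this seed format. \
             Run `spt daemon stop`; the broker restarts on the next `spt api` call."
        )
    } else {
        format!("seed failed: {err}")
    }
}
```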

### 7.10 A VIEW is independent from the endpoint — closing the launching tab must NOT reap the daemon-hosted harness  `[REQ-HAZARD-VIEWER-CLOSE-DETACH]`
<!-- [doc->REQ-HAZARD-VIEWER-CLOSE-DETACH] -->
- **Failure (Windows):** `spt endpoint run` autostarts the daemon (`ensure_running` → `detached_no_inherit`), which INHERITS the launching terminal's Windows Job Object. Windows Terminal / VS Code place the shell AND every descendant in a Job with `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE`; closing the tab drops the job's last handle → the OS reaps the daemon + its broker-spawned ConPTY harness subtree. The `rc` pump detaching should end ONLY the viewport; the harness must keep running + stay re-attachable. (ConPTY isolation itself is already correct — portable-pty builds the pseudoconsole in the daemon; the leaking lifetime binding is the Job Object, not the console.)
- **Invariant:** both daemon spawn paths request `CREATE_BREAKAWAY_FROM_JOB` (0x0100_0000), best-effort with an in-job FALLBACK on `ERROR_ACCESS_DENIED`(5) / `ERROR_INVALID_PARAMETER`(87) so a breakaway-denying job NEVER regresses the spawn. **A real job CAN deny breakaway** (proven: the cargo/CI runner's own ancestor job ACCESS_DENIES it), so efficacy against the real terminal is UNKNOWN until measured. **int stage = OPERATOR MANUAL ACCEPTANCE, not CI** (doyle ruling 2026-06-18): the runner sits inside a breakaway-denying job and every test-created job nests inside it → a faithful "harness survives tab-close" test is a guaranteed FALSE-RED; the two units (escape-where-permitted self-skip + fallback-where-denied) are the CI evidence; `required_stages` stays `[doc,impl,unit]`. The daemon-OWNED harness Job is the **L4 daemon-stop reap** backstop ONLY — NOT tab-close survival (a job nested in the terminal's kill-on-close job dies with it). **Backstop candidate (design-only, build ONLY if the operator measures breakaway DENIED + daemon dies):** re-parent the cold-start daemon spawn OUT of the terminal job via a job-neutral creator — WMI `Win32_Process.Create` (owned by WmiPrvSE, outside the terminal job; synchronous, returns pid) preferred over a `schtasks` one-shot.
- **spt-core mapping:** `daemon.rs::detached_no_inherit` + `deelevate.rs::create_with_token` (the breakaway flag + fallback). Units: `detached_no_inherit_falls_back_under_a_breakaway_denying_job` (no-regression — the shared runner exercises the fallback directly), `breakaway_spawn_escapes_a_kill_on_close_job` (the OS escape mechanism — self-SKIPS where the ancestor job forbids breakaway). Unix: the daemon's `setsid` session-detach already keeps a closing terminal's SIGHUP off its children (guard test, no code).
- **Source:** v0.12.0 real-harness defect (operator) → v0.12.1 L1 @5ae68f8; doyle design+int ruling 2026-06-18.
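
The best-effort breakaway with in-job fallback can be sketched as follows. Assumptions: `spawn_detached` and `should_fall_back` are hypothetical names, the other creation flags the real spawn paths use are passed in as `base_flags`, and the Windows-only function is `cfg`-gated so the decision helper stays testable anywhere.

```rust
use std::io;

/// Win32 process creation flag: start the child outside the caller's job.
pub const CREATE_BREAKAWAY_FROM_JOB: u32 = 0x0100_0000;
const ERROR_ACCESS_DENIED: i32 = 5;
const ERROR_INVALID_PARAMETER: i32 = 87;

/// True when a failed breakaway spawn should be retried inside the job.
pub fn should_fall_back(err: &io::Error) -> bool {
    matches!(
        err.raw_os_error(),
        Some(ERROR_ACCESS_DENIED) | Some(ERROR_INVALID_PARAMETER)
    )
}

#[cfg(windows)]
pub fn spawn_detached(program: &str, base_flags: u32) -> io::Result<std::process::Child> {
    use std::os::windows::process::CommandExt;
    use std::process::Command;

    // First attempt: ask to leave the terminal's job.
    let breakaway = Command::new(program)
        .creation_flags(base_flags | CREATE_BREAKAWAY_FROM_JOB)
        .spawn();
    match breakaway {
        // The job denies breakaway: spawn in-job so the launch never regresses.
        Err(e) if should_fall_back(&e) => Command::new(program).creation_flags(base_flags).spawn(),
        other => other,
    }
}
```

Any other spawn error is returned unchanged, so the fallback cannot mask an unrelated failure such as a missing executable.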

### 7.11 A dead PTY child + a dropped operator pump must NOT wedge the broker for other clients  `[REQ-HAZARD-ATTACH-WEDGE]`
- **Failure (hypothesized, v0.12.0):** a legitimately dead PTY child (crash/kill) plus an `rc` pump dropped without a clean detach (closed tab) was thought to make the broker's loopback forward `write_all` block forever on a full 64 KB duplex, park a worker in the 2-worker net runtime, saturate both, and stall every new attach / `endpoint run` (with `daemon stop` unable to join).
- **Invariant — DISPOSITION = PROVE-DON'T-CHANGE** (doyle GATE-PASS @e883f45, 2026-06-18): the post-L0 code ALREADY prevents the wedge; NO fail-fast / worker-count code was added. (1) `serve_attach` forwards fire-and-forget (`net_stream_send` `op_id=None`) and the broker-side `send_stream` is already deadline-bounded (`bounded_block_on`, `BROKER_QUIC_OP_TIMEOUT_MS` = 10s — hazard 7.8), not forever. (2) the loopback duplex is drained broker-INTERNALLY by the operator row's OWN read pump (`nethost.rs` `RecvHalf::Loopback`), which for an ordinary attach stream (`retentive_cap == 0`) NEVER parks → `peer_w` never backs up on a dead `rc` (a dead `rc` is just a dropped IPC subscriber against a bounded, EVICTING ring). (3) `bounded_block_on` = `runtime.handle().block_on` → parks the BROKER DISPATCH thread, not a net worker, so worker-pool exhaustion cannot occur. A dead spt-hosted endpoint is OFFLINED within one reconcile tick on abrupt child death (the broker exit-waiter `wait()` reaps the session even on `taskkill /F` → `reconcile_hosted_liveness` clears the `status=online` latch).
- **spt-core mapping:** int `crates/spt/tests/attach_wedge_e2e.rs` (REAL detached daemon + dummy-harness fixture): serve the victim (rc sees its tick), abruptly kill rc (dropped pump) + kill the PTY child → a NEW endpoint still comes online + is served (no wedge), the dead endpoint is offlined within a tick (`LIVENESS_RECONCILE_OFFLINE`), `daemon stop` bounded. Leans on 7.8 (the QUIC-op deadline) + 7.7 (viewer isolation).
- **Source:** v0.12.0 real-harness finding (operator) → v0.12.1 L2 @e883f45; doyle GATE-PASS 2026-06-18.

### 7.12 Controller output must NOT be written inline on the drain thread — a backed-up controller wedges the session  `[REQ-HAZARD-INJECT-CONTROL-COEXIST]`
- **Failure (v0.12.x, operator dogfooding + doyle /diagnose 2026-06-19):** `OutputLog::append` fanned each live chunk to the CONTROLLER via a SYNCHRONOUS, blocking `write_frame` held INLINE on the session's single drain thread while `Mutex<OutputLog>` was locked (viewers already had a dedicated writer thread + bounded evicting channel). A backed-up controller socket — a slow operator, or the full 64 KB `rc` loopback duplex under heavy TUI redraw — parked the drain thread WITH THE LOG LOCK HELD: output + keystroke-echo stalled and every attach / resize / `KIND_SESSIONS` that needs the log lock blocked FOREVER, while the broker process stayed alive. Fires on NORMAL interactive `rc` use under heavy output (not only message injection); REOPENED the wedge facet of 7.11 (dead-child backpressure only).
- **Invariant:** controller delivery runs OFF the drain thread — a dedicated controller writer thread + bounded channel (the 7.7 viewer_writer pattern) does the blocking socket write; the drain hands each chunk off with a BOUNDED, OFF-LOCK send (deadline → detach + `clear_controller`, never park forever, never silently evict a LIVE operator's authoritative view). The controller is AUTHORITATIVE (unlike a viewer): its writer advances the `delivered_through` resume cursor (atomic, only-on-success, monotonic). `clear_controller` is the ONE unlatch path (clears `driven_by`, drops the sink) — W5's reconcile self-heal reuses it.
- **spt-core mapping:** impl `broker.rs` `OutputLog::append`/`controller_writer`/`become_controller`/`ControllerJob::deliver`/`clear_controller`; int `crates/spt-daemon/tests/inject_control_wedge.rs` (a fully-backed-up controller keeps `KIND_SESSIONS` answering = no wedge); unit the bounded-deliver + monotonic-cursor kernels. The input-side parks (write_input `write_all` on a full buffer; DSR-answer writer-mutex contention) are BENIGN on Windows ConPTY (absorbs a large inject), real only on Unix forkpty → bounded by W2's atomic-write substrate.
- **Source:** v0.12.x operator wedge → v0.13.0 W1 (doyle GATE-PASS 2026-06-19).
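
Two kernels from the invariant, sketched with the standard library: a bounded drain-side handoff and a monotonic cursor advance. Both function names and the `Handoff` enum are hypothetical, and the 1 ms polling loop is a crude stand-in for whatever bounded send the real channel provides.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{SyncSender, TrySendError};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum Handoff {
    /// The controller writer thread now owns the chunk.
    Queued,
    /// Deadline elapsed or writer gone: the caller should detach the controller.
    Detach,
}

/// Drain-side handoff, meant to be called with the log lock released.
/// Waits at most `deadline`, then gives up instead of parking forever.
pub fn hand_off(tx: &SyncSender<Vec<u8>>, mut chunk: Vec<u8>, deadline: Duration) -> Handoff {
    let give_up = Instant::now() + deadline;
    loop {
        match tx.try_send(chunk) {
            Ok(()) => return Handoff::Queued,
            Err(TrySendError::Disconnected(_)) => return Handoff::Detach,
            Err(TrySendError::Full(back)) => {
                if Instant::now() >= give_up {
                    return Handoff::Detach;
                }
                chunk = back;
                thread::sleep(Duration::from_millis(1));
            }
        }
    }
}

/// Writer-side cursor advance, called only after a successful socket write.
/// `fetch_max` keeps the cursor monotonic; returns the cursor's new value.
pub fn advance_cursor(delivered_through: &AtomicU64, written_through: u64) -> u64 {
    delivered_through
        .fetch_max(written_through, Ordering::AcqRel)
        .max(written_through)
}
```

Unlike the viewer path, a full queue is not an immediate eviction here: the controller gets the whole deadline before it is detached.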

### 7.13 `spt rc` must SWAP Windows legacy-console Backspace/Ctrl+Backspace (`^H`↔DEL); do NOT enable VT console input  `[REQ-HAZARD-RC-INPUT-KEY-ENCODING]`
- **Failure (operator dogfooding):** `spt rc` is a raw verbatim stdin byte pump; the legacy Windows console delivers Backspace as `^H` (0x08), and Claude Code maps `^H` → backward-kill-word, so every Backspace deletes a whole word. Confirmed bytes (operator HITL capture, real Windows Terminal): Backspace=0x08, Ctrl+Backspace=0x7f.
- **REJECTED fix (dc07c39, reverted):** enabling `ENABLE_VIRTUAL_TERMINAL_INPUT` on the stdin console BACKFIRED — on Windows Terminal that flag yields **win32-input-mode**, not legacy xterm VT. Every key then arrives as a 6-field `ESC[Vk;Sc;Uc;Kd;Cs;Rc_` key-down/up record: (1) the ctrl-b detach broke (0x02 is wrapped, so `parse_stdin_chunk` never sees the raw `DETACH_PREFIX`), and (2) win32-input-mode was forwarded into Claude Code's own ConPTY input negotiation (garbage). The VT lever is wrong for Windows Terminal. (Lesson: verify a terminal-input fix on the REAL terminal via the operator re-capture — unit/clippy never exercise the live key path.)
- **Invariant:** stay a raw byte pump and NARROW-normalize in the forward path — **SWAP** the two delete bytes: `0x08` → `0x7f` (VT DEL, char-delete) AND `0x7f` → `0x08` (^H, kill-word), AFTER the detach state machine consumes any `0x02`, leaving `0x02` and every other byte untouched. Because CC reads `^H`=word-delete and DEL=char-delete, swapping the *input* bytes gives Backspace=char-delete and Ctrl+Backspace=word-delete — the native Win11 convention (the operator rejected losing word-delete; the earlier one-way `0x08`→`0x7f` map dropped it). cfg(windows) only (Unix is already VT; a real `ctrl+h`/`ctrl+?` there must pass through). Accepted minor loss: a real `ctrl+h` (0x08) and a real DEL (0x7f) are swapped on Windows — both are delete keys, so the practical effect is the Backspace/Ctrl+Backspace native mapping.
- **spt-core mapping:** the Backspace/Ctrl+Backspace SWAP intent persists; the `normalize_key_byte` byte-swap was SUPERSEDED on Windows by **7.16** (`REQ-RC-KEY-VT-TRANSLATE`, v0.13.0 bug 2) — the agnostic key-event→xterm-VT `translate_key_event` now emits Backspace → `0x7f` and Ctrl+Backspace → `0x08` natively (carrying the relocated `[impl]`/`[unit]` evidence), so `normalize_key_byte` + its unit were removed. The live keypress is HITL operator re-capture.
- **Source:** v0.13.0 W7 (operator dogfooding; doyle /diagnose + byte confirm + VT-backfire re-capture 2026-06-19; one-way→swap upgrade 2026-06-19; superseded-on-Windows by the bug-2 translator 2026-06-19).
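
For reference, the narrow swap described in the invariant is a two-arm byte map. This is an illustration of the rule, not the removed `normalize_key_byte`; the names are hypothetical, and the caller is assumed to apply it on Windows only, after the detach state machine has consumed any `0x02`.

```rust
/// Swap the two delete bytes and leave every other byte alone.
pub fn swap_delete_byte(b: u8) -> u8 {
    match b {
        0x08 => 0x7f, // console Backspace (^H) becomes DEL: char-delete
        0x7f => 0x08, // console Ctrl+Backspace (DEL) becomes ^H: kill-word
        other => other,
    }
}

/// Apply the swap to a forwarded chunk.
pub fn swap_delete_bytes(chunk: &[u8]) -> Vec<u8> {
    chunk.iter().copied().map(swap_delete_byte).collect()
}
```

The map is its own inverse, which is why a real `ctrl+h` and a real DEL end up exchanged on Windows (the accepted minor loss).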

### 7.14 The effect journal must NOT hold its lock across the PTY write (or fsync every keystroke) — interactive input stutters then wedges  `[REQ-HAZARD-EFFECT-JOURNAL-PTY-WEDGE]`
- **Failure (operator dogfooding + doyle /diagnose, MEASURED on the real Windows box, 2026-06-19, post-W1 escape):** `EffectJournal::apply_once` held its global `inner` mutex ACROSS `write_line(PENDING)` → `effect()` → `write_line(DONE)`, and `write_line` does `flush()+sync_all()` (a full fsync). Every operator keystroke is a `PtyWrite` effect (via `send_effect` → `dispatch_input` op_id branch), so each paid TWO fsyncs serialized under a GLOBAL lock with the blocking PTY write INSIDE the lock. Two facets, one root: **(A) stutter** — measured fsync on `%LOCALAPPDATA%\spt-core` median 6.5 ms / spikes 198 ms, ×2 per keystroke → choppy, worsens with volume; **(B) HARD WEDGE** — a blocking `PtyWrite` (ConPTY/forkpty input buffer not draining) held the lock indefinitely, so the single-threaded dispatch could open NO attaches → every `spt rc --view/--take` died with `brain IPC read deadline elapsed` (broker control-plane KIND queries still answered = a different thread). DISTINCT from 7.12 (the OUTPUT drain); this is the INPUT/effect-journal path 7.12 never touched. **Refutes** 7.12's "input parks are Windows-benign (ConPTY absorbs)" deferral — on the real box the input path wedges regardless, because the wedge is the LOCK HOLD, not the PTY write blocking per se.
- **Invariant:** `apply_once` RELEASES the inner lock across `effect()` — reserve the key (fsync `PENDING` for DURABLE kinds) under the lock, RELEASE, run the effect OFF the lock, then re-acquire to finalize (fsync `DONE` for durable, mark applied). Crash-idempotency comes from the per-key reservation + the applied-set, not from holding the lock across the effect (the brain-crash guarantee is unchanged: the broker survives the brain, so reserve→effect→finalize runs uninterrupted on the broker thread; a replay hits the applied-set and dedups). `EffectKind::PtyWrite` is **EPHEMERAL** — NO journal lines, NO fsync, in-memory dedup only (a keystroke lost to a broker crash is retyped; PTY state is never rebuilt from keystroke replay) — while durable kinds (spool/registry/net) keep their fsync'd markers. The PtyWrite itself must additionally be bounded/fail-fast and must not hold the writer mutex across a blocking write (the Unix-forkpty park bound; ConPTY absorbs = benign).
- **spt-core mapping:** impl `effect.rs` `apply_once` (reserve/release/finalize) + `EffectKind::is_durable`; unit (`effect.rs`) a barrier proving the lock is NOT held across `effect()` (a different key's `apply_once` completes while one effect blocks) + `PtyWrite` writes no journal line while a durable kind (`NetSend`) does (the no-fsync proxy); int `crates/spt-daemon/tests/inject_control_wedge.rs` (real broker+PTY, a stalled consumer + sustained blocking input effect: a concurrent `spt rc` attach stays serviceable AND actually receives PTY bytes — the assertion W1's gate lacked). Unix park-(b) folded here, gated on gravity-linux. doyle GATE-GUARD: assert structurally, NEVER on fsync wall-clock (env-dependent, flaky); the stutter is fixed as a consequence of the no-fsync path, proven by the structural unit.
- **Source:** v0.13.0 W1b (operator dogfooding post-W1 escape; doyle /diagnose + measurement 2026-06-19).
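
The reserve / release / finalize shape can be sketched in memory. Assumptions are loud here: `Journal`, `Applied`, and the `InFlight` answer for a concurrent duplicate are hypothetical, effects return a plain `u32`, and the fsync'd `PENDING`/`DONE` markers for durable kinds are reduced to comments.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

#[derive(Default)]
struct Inner {
    pending: HashSet<u64>,
    applied: HashMap<u64, u32>,
}

#[derive(Default)]
pub struct Journal {
    inner: Mutex<Inner>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Applied {
    Fresh(u32),
    Replayed(u32),
    /// Hypothetical rule: the same key is already being applied elsewhere.
    InFlight,
}

impl Journal {
    pub fn apply_once(
        &self,
        op_id: u64,
        effect: impl FnOnce() -> Result<u32, String>,
    ) -> Result<Applied, String> {
        {
            // Reserve under the lock (a durable kind would fsync PENDING here).
            let mut guard = self.inner.lock().unwrap();
            if let Some(value) = guard.applied.get(&op_id) {
                return Ok(Applied::Replayed(*value));
            }
            if !guard.pending.insert(op_id) {
                return Ok(Applied::InFlight);
            }
        } // Lock released before the effect runs.

        let result = effect();

        // Re-acquire to finalize (a durable kind would fsync DONE here).
        let mut guard = self.inner.lock().unwrap();
        guard.pending.remove(&op_id);
        let value = result?; // A failed effect records nothing, so a retry is fresh.
        guard.applied.insert(op_id, value);
        Ok(Applied::Fresh(value))
    }
}
```

Because the lock is released across `effect()`, another key's `apply_once` can complete while one effect is still blocked, which is the structural property the unit test above asserts.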

### 7.15 An OFFLINE spt-hosted endpoint must NOT render phantom `ONLINE+CONTROLLED` — clear `driven_by` when its session is gone  `[REQ-HAZARD-DRIVEN-BY-SELFHEAL]`
- **Failure:** `driven_by` (the `info.json` `ONLINE+CONTROLLED` latch) is single-written by the broker via `clear_controller`/`stamp_driven_by`, which only fire on a controller change. When an spt-hosted endpoint's broker session is GONE (harness dead — the B2 case, 7-series sibling `REQ-HAZARD-HOSTED-LIVENESS-RECONCILE`), no controller event ever fires, so a stale `driven_by=Some(node)` persists: the picker renders a phantom "controlled by X" on an endpoint that is actually OFFLINE. The B2 reconcile already clears the `status=online` latch here but left `driven_by` untouched.
- **Invariant:** `reconcile_hosted_liveness`, when it offlines a sessionless controllable perch (the B2 keystone — no live broker session ⇒ dead harness), ALSO clears `driven_by` (`set_driven_by(perch, None)`). RACE-FREE and single-writer-safe: with NO live broker session there is no controller to re-stamp `driven_by` concurrently, so the brain may write it here without contending the broker. (The LIVE-session leg — a controller gone while its session survives — is NOT this hazard: a clean disconnect already self-heals via `detach_if`→`clear_controller`, W1 (7.12) bounds the active-output wedge, and the residual idle-wedged-REMOTE case is the deferred `REQ-HAZARD-DRIVEN-BY-IDLE-REMOTE-EVICT`, which needs BROKER-SIDE liveness eviction — a brain reconcile can NOT detect it, since the broker still reports `controller_by==Some` on an idle wedged controller. Repro-proven: `inject_control_wedge.rs` w5_a2.)
- **Watch-out (repro-proven, real broker):** `SessionInfo.controller_by==None` is AMBIGUOUS — `dispatch_spawn` pre-attaches the spawner as the LOCAL controller with `by=None`, so a live LOCALLY-driven session also reads `None`. It is therefore NOT a usable standalone `driven_by` clear trigger (would false-clear a live local session). The shipped Gap-B self-heal needs no controller signal at all (it keys on session ABSENCE).
- **spt-core mapping:** impl `livehost.rs` `reconcile_hosted_liveness` Gap-B clear + the additive `SessionInfo.controller_by` observability field (`msg.rs`, populated in `broker.rs` `KIND_SESSIONS`); unit `livehost.rs` `pull_liveness…` extended (offlined sessionless perch clears `driven_by`; live/relay perches untouched); int `crates/spt-daemon/tests/driven_by_selfheal.rs` `gap_b` (real broker: reconcile offlines AND clears `driven_by`) + the A1/A2 characterization (`inject_control_wedge.rs` w5_a1/w5_a2).
- **Source:** v0.13.0 W5 (repro-first, todlando; doyle-assigned 2026-06-19). A2 (idle wedged-remote leg) deferred to `REQ-HAZARD-DRIVEN-BY-IDLE-REMOTE-EVICT`.
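
The Gap-B clear above can be sketched as a single reconcile pass. This is a minimal model, not the `livehost.rs` implementation: the `Perch` struct, its fields, and the `live_sessions` set are hypothetical stand-ins for `info.json` and the broker's session listing.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified perch record (the real `info.json` schema is not shown here).
#[derive(Debug, Clone, PartialEq)]
struct Perch {
    online: bool,
    controllable: bool,
    driven_by: Option<String>,
}

/// Offline every controllable perch that has no live broker session and,
/// in the same pass, clear its `driven_by` latch.
/// `live_sessions` is an assumed stand-in for the broker's session listing.
fn reconcile_hosted_liveness(
    perches: &mut HashMap<String, Perch>,
    live_sessions: &HashSet<String>,
) {
    for (name, perch) in perches.iter_mut() {
        if !perch.controllable || live_sessions.contains(name) {
            // A perch with a live session is left untouched: the broker
            // remains the single writer of `driven_by` there.
            continue;
        }
        // Session ABSENCE is the trigger; `controller_by == None` is never consulted.
        perch.online = false;
        perch.driven_by = None;
    }
}
```

Keying on session absence rather than on a controller signal is what keeps the write race-free in this model: with no session there is no broker-side writer left to contend with.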

<!-- [doc->REQ-RC-KEY-VT-TRANSLATE] -->
### 7.16 `spt rc` translates Windows console KEY EVENTS to standard xterm VT (arrows/Home/End/F-keys reach the harness); supersedes the W7 byte-swap  `[REQ-RC-KEY-VT-TRANSLATE]`
- **Failure (operator dogfooding):** `spt rc` read raw stdin BYTES, but the Windows legacy console (no `ENABLE_VIRTUAL_TERMINAL_INPUT`) delivers arrows / Home / End / PgUp / PgDn / Insert / Delete / F-keys as console KEY_EVENTs, NOT stdin bytes — so the byte-pump saw nothing and those keys were DEAD (only byte-emitting keys like Backspace worked, via the 7.13 swap).
- **Invariant:** on Windows, read crossterm KEY EVENTS (the picker already does) and translate each to STANDARD xterm VT via the pure `translate_key_event` (copy a known-correct xterm table verbatim: arrows `ESC[A..D`, Home `ESC[H` / End `ESC[F`, `~` keys `ESC[<n>~`, modified `ESC[1;<m><final>` / `ESC[<n>;<m>~` with `m = 1 + Shift + 2·Alt + 4·Ctrl`, F1–F4 `ESC OP..S` / F5–F12 `ESC[<n>~`), forwarded through the SAME rc pump — the harness receives ordinary xterm VT (AGNOSTIC; NOT win32-input-mode, the 7.13 rejected lever). Press-only (drop Repeat/Release). Detach stays the `ctrl-b d` PREFIX, event-sourced (Ctrl+B arms; armed + plain `d` ⇒ Detach; armed + Ctrl+B ⇒ literal `0x02`; armed + other ⇒ `0x02` + translated). NON-tty stdin (piped / tests) falls back to the byte path (keeps the e2e byte-injection working). UNIX UNCHANGED (cfg-split; its raw-mode stream already delivers VT). SUPERSEDES 7.13's `normalize_key_byte` swap on Windows — Backspace → `0x7f` and Ctrl+Backspace → `0x08` are emitted NATIVELY by the translator.
- **spt-core mapping:** impl `rc.rs` `translate_key_event` + `csi_final`/`csi_tilde`/`f_key` + `key_event_step` (event detach SM) + `spawn_stdin_reader` cfg-split (`spawn_stdin_reader_events` on a tty / `spawn_stdin_reader_bytes` non-tty + Unix); unit the EXHAUSTIVE `translate_key_event` mapping + the event-detach SM (the Backspace/Ctrl+Backspace arm carries the relocated `REQ-HAZARD-RC-INPUT-KEY-ENCODING` evidence); NO int (live console = HITL, operator re-capture — REQ-RUN-PICKER/RC-1 precedent).
- **Source:** v0.13.0 bug 2 (operator ruling: proper agnostic translator, ship-blocker; doyle design + Option-B detach ruling 2026-06-19).
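
A reduced sketch of the translation table described in the invariant. The function names follow the mapping bullet, but their signatures are assumptions, and `Key`/`Mods` are hypothetical stand-ins for crossterm's key types. Backspace handling and the detach state machine are omitted.

```rust
/// Hypothetical key and modifier types standing in for crossterm's
/// `KeyCode` / `KeyModifiers`; only the navigation and function keys are modeled.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Key { Up, Down, Right, Left, Home, End, Insert, Delete, PageUp, PageDown, F(u8) }

#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct Mods { shift: bool, alt: bool, ctrl: bool }

/// xterm modifier parameter: m = 1 + Shift + 2*Alt + 4*Ctrl (1 means unmodified).
fn modifier_param(m: Mods) -> u8 {
    1 + (m.shift as u8) + 2 * (m.alt as u8) + 4 * (m.ctrl as u8)
}

/// `ESC [ <final>` when unmodified, `ESC [ 1 ; <m> <final>` when modified.
fn csi_final(fin: char, m: u8) -> String {
    if m == 1 { format!("\x1b[{}", fin) } else { format!("\x1b[1;{}{}", m, fin) }
}

/// `ESC [ <n> ~` when unmodified, `ESC [ <n> ; <m> ~` when modified.
fn csi_tilde(n: u8, m: u8) -> String {
    if m == 1 { format!("\x1b[{}~", n) } else { format!("\x1b[{};{}~", n, m) }
}

/// F1..F4 use SS3 (`ESC O P..S`) unmodified; F5..F12 use the tilde form.
fn f_key(n: u8, m: u8) -> Option<String> {
    match n {
        1..=4 => {
            let fin = (b'P' + (n - 1)) as char;
            Some(if m == 1 { format!("\x1bO{}", fin) } else { csi_final(fin, m) })
        }
        5 => Some(csi_tilde(15, m)),
        6..=10 => Some(csi_tilde(n + 11, m)),  // F6..F10  -> 17..21
        11 | 12 => Some(csi_tilde(n + 12, m)), // F11, F12 -> 23, 24
        _ => None,
    }
}

/// Pure translation of one key press to xterm VT bytes; `None` means "no mapping".
fn translate_key_event(key: Key, mods: Mods) -> Option<Vec<u8>> {
    let m = modifier_param(mods);
    let seq = match key {
        Key::Up => csi_final('A', m),
        Key::Down => csi_final('B', m),
        Key::Right => csi_final('C', m),
        Key::Left => csi_final('D', m),
        Key::Home => csi_final('H', m),
        Key::End => csi_final('F', m),
        Key::Insert => csi_tilde(2, m),
        Key::Delete => csi_tilde(3, m),
        Key::PageUp => csi_tilde(5, m),
        Key::PageDown => csi_tilde(6, m),
        Key::F(n) => f_key(n, m)?,
    };
    Some(seq.into_bytes())
}
```

Keeping the translator pure (no I/O, no console state) is what makes the exhaustive unit test practical while the live console path stays HITL.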

<!-- [doc->REQ-RC-WIN-PASTE] -->
### 7.18 `spt rc` paste is client-originated on Windows — read the LOCAL clipboard, inject a BRACKETED paste  `[REQ-RC-WIN-PASTE]`
- **Failure (operator dogfooding):** in an `spt rc` session neither ctrl+V nor right-click pasted (CC explicitly supports ctrl+V). `RawGuard` did only `enable_raw_mode` (no bracketed paste, no mouse capture, no clipboard interception); the Windows console delivers a paste as synthetic per-char KEY EVENTs (no crossterm `Event::Paste`), and ctrl+V translated to a bare `^V` forwarded to CC — but **CC runs daemon-side with NO access to the operator's LOCAL clipboard**, so remote paste is fundamentally CLIENT-ORIGINATED. A multi-line paste-as-keys also became a `\r` submit-storm.
- **Invariant:** on Windows (cfg-split; folds into the 7.16 event path), on a RIGHT-CLICK rc reads the LOCAL clipboard itself and forwards a BRACKETED paste: `wrap_bracketed_paste` = `ESC[200~` + content + `ESC[201~`. CC has bracketed-paste mode on (its TUI sets `ESC[?2004h`), so it treats the synthesized markers as a PASTE: content lands intact, NO submit-storm, harness-AGNOSTIC (standard xterm contract). `RawGuard` also enables mouse capture via `EnableMouseCapture` (disables console QuickEdit + enables `ENABLE_MOUSE_INPUT` so a right-click surfaces as `Event::Mouse`) on an interactive console only, restored on drop; the resulting flow is right-button-down → `read_clipboard` → bracketed paste. `read_clipboard` = the `clipboard-win` crate; empty/failed read ⇒ a clean no-op (never inject garbage, never panic). Content forwarded VERBATIM (literal pasted text, no per-char translation). UNIX UNCHANGED (its terminal pastes natively through the byte pump). **ctrl+V is NOT intercepted** (P1b amendment, HITL re-open): Windows Terminal CONSUMES ctrl+V as its own paste accelerator; it delivers only a Key `kind=RELEASE` (never a Press, so a Press-guarded arm could never fire) AND injects the clipboard as a char-by-char KEY FLOOD spt-core cannot intercept. ctrl+V now rides WT's native paste as keystrokes (multi-line may submit-storm, which is accepted; bracketed fidelity = right-click). The flood landing as keystrokes no longer wedges the broker (7.19).
- **spt-core mapping:** impl `rc.rs` `wrap_bracketed_paste` + `mouse_is_paste` + `clipboard_paste` + `read_clipboard` + `windows_mouse_wanted` + `RawGuard` (mouse capture/restore) + the `spawn_stdin_reader_events` right-mouse arm; unit the exact bracketed framing + content-verbatim + mouse classify (right-down ⇒ paste, all else ⇒ drop) + the injected-reader paste decision (non-empty ⇒ wrapped, empty/fail ⇒ no-op); NO int (live clipboard + console mouse = HITL, REQ-RUN-PICKER/RC-1 precedent).
- **Source:** v0.13.0 P1 (operator HITL; doyle design 2026-06-19). Depends on P0 (7.17). AMENDED v0.13.0 P1b: scope narrowed to right-click-only (the dead ctrl+V interception removed); scroll-forward is 7.20; the input-flood non-wedge is 7.19.
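
The paste decision above can be sketched with an injected clipboard reader, which is how a unit test could exercise it without a live clipboard. `MouseEvent` is a hypothetical reduction of crossterm's mouse event, and the closure parameter is an assumption. The real `read_clipboard` uses the `clipboard-win` crate, which is not called here.

```rust
const PASTE_START: &[u8] = b"\x1b[200~";
const PASTE_END: &[u8] = b"\x1b[201~";

/// Frame `content` as an xterm bracketed paste; the content is copied verbatim.
fn wrap_bracketed_paste(content: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(PASTE_START.len() + content.len() + PASTE_END.len());
    out.extend_from_slice(PASTE_START);
    out.extend_from_slice(content.as_bytes());
    out.extend_from_slice(PASTE_END);
    out
}

/// Hypothetical reduction of a console mouse event.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MouseEvent { RightDown, RightUp, LeftDown, Moved }

/// Only right-button-down means "paste"; every other event is dropped here.
fn mouse_is_paste(ev: MouseEvent) -> bool {
    ev == MouseEvent::RightDown
}

/// Empty or failed clipboard read is a clean no-op (`None`): nothing is injected.
/// `read_clipboard` is injected so the decision is testable without a console.
fn clipboard_paste(read_clipboard: impl FnOnce() -> Option<String>) -> Option<Vec<u8>> {
    match read_clipboard() {
        Some(text) if !text.is_empty() => Some(wrap_bracketed_paste(&text)),
        _ => None,
    }
}
```

A multi-line clipboard stays a single framed payload in this model, so the embedded newlines never reach the harness as individual submit keystrokes.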

<!-- [doc->REQ-HAZARD-INPUT-ACK-BACKPRESSURE] -->
### 7.19 An operator input FLOOD must not deadlock the broker via the applied-ack on the same conn  `[REQ-HAZARD-INPUT-ACK-BACKPRESSURE]`
- **Failure (operator HITL, the ctrl+V re-open):** a flood of operator input on one brain↔broker conn wedged the WHOLE broker PERMANENTLY (no new/existing attach; the controller stayed latched — the per-conn handler couldn't process the detach). `serve_attach` processes a whole `NetStreamData` batch of N `Input` records in its inner loop, calling `send_effect` N times WITHOUT returning to `read_event()`; the broker answers each with `send_frame(applied_envelope)` on the SAME conn. Brain not reading → the broker→brain return direction fills (~10 frames = the IPC pipe buffer) → `send_frame` BLOCKS → the handler stops reading → the brain's writes block → mutual full-duplex DEADLOCK. (Capture: 11 frames, `write_input` 11/11 — P0 holds; `ack send` START=11/END=10 — frame #11's ack never returns.) WT's ctrl+V paste-accelerator key-flood was the trigger; the deadlock is generic to ANY input flood.
- **Invariant:** the applied-ack is OPT-IN per request. `InputReq` carries `ack: bool` (serde default `true` via `default_true`, so a request that omits the field is still acked; N-1-safe). The fire-and-forward operator/rc path (`serve_attach`) sends `ack=false` via `Brain::send_effect_no_ack`; `dispatch_input` writes NO applied frame when `ack=false`, so the per-conn handler never writes back while servicing the flood → it always drains → no deadlock (cures ANY input flood). `shellchan` (one-at-a-time spool delivery, WAITS on `BrokerEvent::Applied`) keeps `send_effect` (`ack=true`). EXACTLY-ONCE preserved: the broker dedups by `(session, op_id)` at the applied-set regardless of the ack. **N-1 caveat:** an OLD resident broker (the self-update window) ignores `ack=false` → still acks → the deadlock persists until a broker restart (inherent broker-resident-wire-change class, see 7.9).
- **spt-core mapping:** impl `msg.rs` `InputReq.ack` (`default_true`) + `brain.rs` `send_effect_no_ack`/`send_effect_inner` + `attach.rs` `serve_attach` operator path → `send_effect_no_ack` + `broker.rs` `dispatch_input` gates `send_frame(applied)` on `req.ack`; unit the serde default + the ack-emitted-iff-true + the no-ack-path exactly-once dedup; int (keystone, repro-first) a flood of N>pipe-buffer input frames through `serve_attach` on one conn — PRE-FIX deadlocks, POST-FIX drains all N + the session stays live + a concurrent attach opens (real broker+brain, no mocks).
- **Source:** v0.13.0 P1b (operator HITL re-open; doyle /diagnose on the enhanced rc+broker capture 2026-06-19). The real ship-blocker behind the ctrl+V wedge.
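
A minimal in-memory model of the ack gate, assuming hypothetical field names (`pty`, `return_frames`) in place of the real PTY writer and the brain-facing connection. It shows only the two properties the invariant relies on: dedup happens regardless of `ack`, and nothing is written back when `ack` is false. Whether a duplicate with `ack=true` is re-acked is an assumption of this sketch.

```rust
use std::collections::HashSet;

/// Hypothetical reduction of the input request; `ack` defaults to true on the wire.
struct InputReq {
    session: u32,
    op_id: u64,
    bytes: Vec<u8>,
    ack: bool,
}

/// Stand-in broker state: the applied-set, a fake PTY, and the frames
/// that would be written back on the same connection.
#[derive(Default)]
struct Broker {
    applied: HashSet<(u32, u64)>,
    pty: Vec<u8>,
    return_frames: Vec<(u32, u64)>,
}

impl Broker {
    fn dispatch_input(&mut self, req: &InputReq) {
        // Exactly-once: a (session, op_id) pair is applied at most one time,
        // independent of whether the caller asked for an ack.
        if self.applied.insert((req.session, req.op_id)) {
            self.pty.extend_from_slice(&req.bytes);
        }
        // The applied frame is written only on request, so a no-ack flood
        // never makes the per-conn handler write toward a non-reading brain.
        if req.ack {
            self.return_frames.push((req.session, req.op_id));
        }
    }
}
```

In this model a flood of any size leaves `return_frames` empty, which is the property that removes the blocked return direction from the deadlock cycle.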

<!-- [doc->REQ-RC-MOUSE-FORWARD] -->
### 7.20 `spt rc` must forward the scroll wheel to the harness (our mouse capture steals WT's native scroll)  `[REQ-RC-MOUSE-FORWARD]`
- **Failure (operator HITL):** scroll stopped working in `spt rc`. P1's `EnableMouseCapture` (for right-click paste, 7.18) makes Windows Terminal forward ALL mouse — including the wheel — to rc instead of scrolling its own buffer, but the rc mouse handler dropped everything except right-button-down → scroll DIED (and WT's native scrollback is stolen under the capture).
- **Invariant:** on Windows, TRACK the harness's mouse-reporting mode from its OUTPUT — scan for DECSET `ESC[?1000h/1002h/1003h` (mouse on) + `ESC[?1006h` (SGR ext) and their `…l` (off) into a shared `MouseMode{enabled,sgr}` (the pump writes from the output render path, the stdin reader reads); the scan survives a sequence SPLIT across output chunks (a bounded carry buffer). The mouse handler: right-button-DOWN → bracketed clipboard paste (7.18, unchanged); `ScrollUp/Down` → an xterm SGR mouse report (`ESC[<64;col+1;row+1M` up / `ESC[<65;…M` down; 0-based crossterm → 1-based xterm), forwarded ONLY when `enabled && sgr` (else DROP — a legacy report the harness may misread is garbage); Moved/drag/left/middle DROP (scroll is the need; click-forward risks garbage, no click-to-position). UNIX UNCHANGED (no capture; the terminal scrolls natively).
- **spt-core mapping:** impl `rc.rs` `MouseMode` + `MouseModeScanner`/`parse_decset_private`/`apply_mouse_mode` (carry-buffer scan, fed from the pump's output path) + `scroll_dir` + `scroll_sgr` + the `spawn_stdin_reader_events` scroll arm (threaded via `Arc<MouseMode>`); unit `scroll_dir` classify + `scroll_sgr` exact bytes + the DECSET scan (set/reset, combined, mixed, and split-across-chunks); NO int (live console mouse = HITL).
- **Source:** v0.13.0 P1b (operator HITL; doyle design 2026-06-19). Bundled with 7.19.
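
One way the carry-buffer scan and the SGR report could fit together, as a single-threaded sketch. The `Parse` enum, `MAX_CARRY`, `feed`, and `forward_scroll` are hypothetical, and the signatures of the names taken from the mapping bullet are assumptions. The real `MouseMode` is shared across threads, which this model ignores.

```rust
/// Single-threaded stand-in for the shared `MouseMode{enabled,sgr}`.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct MouseMode { enabled: bool, sgr: bool }

enum Parse { Complete { params: Vec<u16>, set: bool, len: usize }, Incomplete, NotDecset }

/// Parse `ESC [ ? <n>(;<n>)* h|l` at the start of `b`.
fn parse_decset_private(b: &[u8]) -> Parse {
    const PREFIX: &[u8] = b"\x1b[?";
    for (k, want) in PREFIX.iter().enumerate() {
        match b.get(k) {
            None => return Parse::Incomplete,
            Some(got) if got != want => return Parse::NotDecset,
            Some(_) => {}
        }
    }
    let mut params = vec![0u16];
    for (off, &c) in b[PREFIX.len()..].iter().enumerate() {
        match c {
            b'0'..=b'9' => {
                let last = params.last_mut().expect("params starts non-empty");
                *last = last.saturating_mul(10).saturating_add(u16::from(c - b'0'));
            }
            b';' => params.push(0),
            b'h' | b'l' => {
                return Parse::Complete { params, set: c == b'h', len: PREFIX.len() + off + 1 }
            }
            _ => return Parse::NotDecset,
        }
    }
    Parse::Incomplete
}

fn apply_mouse_mode(mode: &mut MouseMode, param: u16, set: bool) {
    match param {
        1000 | 1002 | 1003 => mode.enabled = set,
        1006 => mode.sgr = set,
        _ => {}
    }
}

/// Upper bound on bytes carried between chunks (an assumed value).
const MAX_CARRY: usize = 32;

#[derive(Default)]
struct MouseModeScanner { carry: Vec<u8>, mode: MouseMode }

impl MouseModeScanner {
    /// Scan one output chunk; an unfinished sequence at the end is carried over.
    fn feed(&mut self, chunk: &[u8]) {
        let mut buf = std::mem::take(&mut self.carry);
        buf.extend_from_slice(chunk);
        let mut i = 0;
        while i < buf.len() {
            if buf[i] != 0x1b { i += 1; continue; }
            match parse_decset_private(&buf[i..]) {
                Parse::Complete { params, set, len } => {
                    for p in params { apply_mouse_mode(&mut self.mode, p, set); }
                    i += len;
                }
                Parse::Incomplete => {
                    // Keep the tail only while it stays within the bound.
                    if buf.len() - i <= MAX_CARRY { self.carry = buf[i..].to_vec(); }
                    return;
                }
                Parse::NotDecset => i += 1,
            }
        }
    }
}

/// SGR wheel report: button 64 = up, 65 = down; 0-based input, 1-based output.
fn scroll_sgr(up: bool, col0: u16, row0: u16) -> String {
    let button = if up { 64 } else { 65 };
    format!("\x1b[<{};{};{}M", button, u32::from(col0) + 1, u32::from(row0) + 1)
}

/// Forward only when the harness asked for mouse reports in SGR form.
fn forward_scroll(mode: MouseMode, up: bool, col0: u16, row0: u16) -> Option<String> {
    if mode.enabled && mode.sgr { Some(scroll_sgr(up, col0, row0)) } else { None }
}
```

The bound on the carry buffer means a malformed, never-terminated sequence is eventually discarded instead of growing without limit.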

<!-- [doc->REQ-HAZARD-CONTROLLER-WRITER-REORDER] -->
### 7.21 Exactly ONE `controller_writer` per brain↔broker connection — a superseded writer must write nothing further  `[REQ-HAZARD-CONTROLLER-WRITER-REORDER]`
- **Failure (doyle instrumented RACEDIAG repro, kitsubito):** on a brain-restart re-serve the handoff brain registered as controller on the SAME session TWICE over the SAME socket — `Brain::handoff` eagerly `subscribe(prior.next_seq=1)` → `become_controller(from_seq=1)`, spawning writer-A (writes seq 1); then `serve_attach` re-handled the replayed `Request{from_seq:0}` → `attach_as(sid,0)` → `become_controller(from_seq=0)`, spawning writer-B (writes 0,1). `become_controller` dropped the prior `ControllerSink` (its `tx`) but did NOT stop the prior writer — writer-A kept flushing its OWNED `initial` batch, and both writers held clones of one `SharedSend` (`Arc<Mutex<socket>>`) with no inter-thread ordering. When writer-A's seq 1 beat writer-B's seq 0, the strict consumer saw `output gap: got seq 1 want 0` → `attach_survives_target_brain_restart_exactly_once` panicked at `.expect("re-serve")` OR HUNG in `render_until` (serve thread died on the gap → `MARKER_TWO` never reached the wire). `prior.next_seq` is life1's CONSUMPTION cursor (life1 forwards each frame to the operator immediately on consume, so at crash it has forwarded exactly `[0, K)`); life2 resumes from that same `K`, so the boundary aligns and `[0, K)` need never be re-sent. The crash was NOT byte loss — it was the consumer running the strict **reject-gap legacy path** (handoff left `session_cursors` empty) which treats any out-of-order seq as fatal. PRE-EXISTING, surfaced by the v0.13.0 green-both-runners gate; P1b is innocent. Sibling flaky cluster: `inject_control_wedge::g2`, `broker::spawn_env_reaches_child`.
- **Invariant:** on a single brain↔broker connection exactly ONE `controller_writer` is ever the LIVE writer; a SUPERSEDED writer writes no further frames after the epoch bump it observes; and **every `controller_writer` emits a strictly ASCENDING seq stream** (sorted initial batch + ascending live frames). The CORRECTNESS guarantee that falls out: a snap-above consumer over any interleaving of ascending writers — where the surviving writer (`serve_attach`'s `attach_as(sid,0)`) offers the COMPLETE range `[0, end]` — delivers `[K, end]` with **no skip and no dup** (the first sighting of any seq `>M` is always preceded by a sighting of `M` on that same ascending writer, so `M` is delivered before the cursor can pass it). Enforced/relied-on three ways (NB: fix #1 — "drop handoff's eager subscribe" — was REVERTED: that subscribe is the standalone-resume mechanism the brain-only update engine + `handoff`/`idempotent`/`daemon_e2e` replay through with no `serve_attach`): (1) CORRECTNESS — `Brain::handoff` seeds `session_cursors` at `prior.next_seq` so the consumer runs dedup-below + snap-above (resume mode), never the reject-gap legacy trap; the ascending-merge property above makes this complete, not merely tolerant; (2) INVARIANT — `controller_writer`'s INITIAL-BATCH replay is EPOCH-GATED: `controller_epoch` is a shared `Arc<AtomicU64>`, the writer re-reads it UNDER `send.lock()` (atomically with `write_frame`) and returns the instant it is superseded, so a superseded writer can never flush its stale replay past the bump (W1-safe: never blocks the drain under `Mutex<OutputLog>`). 
The LIVE loop is deliberately NOT gated — new output only ever flows to the CURRENT controller's channel, so a superseded writer's channel holds only its pre-supersede backlog (deduped by snap-above) plus its TERMINAL `Displaced` kick, which the displaced controller MUST still receive; that loop ends naturally on `tx`-drop (gating it suppressed the loud-take `Displaced` — the cv-matrix hang); (3) EXPLICIT-RESUME / OPERATOR-STREAM BOUNDARY (the load-bearing fix — kitsubito RACEDIAG ~33% repro that the keystones missed) — `Brain::subscribe_with` (shared by `attach` AND `attach_as`) RESETS that session's dedup cursor to `from_seq` in resume mode. WHY it's load-bearing, not just the ground-truth re-read: the handoff's eager `subscribe(K)` makes `serve_attach`'s `brain` receive the replay frame at seq=K BEFORE the operator's `Request` is processed (`attached` still false); that early frame is dropped by the `if attached` forward gate but the snap-above cursor has already advanced past K, and `attach_as(sid,0)`'s re-subscribe used to leave the cursor advanced — so the broker's re-send of seq K arrived below it and was deduped → seq K never reached the operator viewport → a `no forward gap` panic at the operator render cursor (`render_until`), and SILENT content loss in the real `rc` consumer (dedup-below + snap-above). Resetting to `from_seq` on the `attach_as(0)` re-subscribe makes the broker's full re-send re-deliver from 0 (the operator dedups the overlap), so seq K is forwarded. The epoch gate (2) is sound (RACEDIAG: zero socket interleaving above K); the residual was purely this consumer-side boundary. Cold-start brains (empty map — e.g. the production dispatch serve brain) keep the legacy `next_seq` path untouched, so production is unaffected.
- **spt-core mapping:** impl `brain.rs` `Brain::handoff` (KEEP the eager `subscribe`; seed `session_cursors`) + `Brain::subscribe_with` (resume-mode dedup-cursor reset to `from_seq`, shared by `attach`/`attach_as` — the operator-stream boundary fix) + `broker.rs` `OutputLog.controller_epoch: Arc<AtomicU64>` / `become_controller` (atomic `fetch_add`, passes the new epoch + `Arc::clone` into the writer) / `controller_writer` (epoch gate read UNDER the lock on both loops) / `mark_controller_gone` + the `ControllerJob` epoch read; unit (white-box, `src/broker.rs`) the epoch-gated writer (a superseded writer flushes nothing — only the latest writer's monotonic stream reaches the wire) + `handoff` seeds `session_cursors`/resubscribes; int (keystone, `tests/broker.rs` + `tests/attach.rs`) deterministically force two `become_controller`-on-one-connection on a real broker+brain (no mocks) — PRE-FIX reorders/gaps, POST-FIX monotonic + byte-exact + session live, PLUS `attach_survives_target_brain_restart_exactly_once` green (doyle: 20× isolated single-threaded timeout-wrapped on Linux/kitsubito — the deterministic RED-on-revert carrier).
- **Source:** v0.13.0 P1c (operator-ruled root-fix before ship; doyle root-cause via instrumented repro 2026-06-20; design corrected across two gate rounds — fix #1 reverted, then the kitsubito RACEDIAG pinned the residual to the consumer-side operator-stream boundary, fixed by the `subscribe_with` cursor reset on `attach_as` re-subscribe). The last v0.13.0 ship-blocker.
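
The dedup-below + snap-above consumer and the `from_seq` reset can be modeled with one integer cursor. `ResumeCursor`, `accept`, `reset`, and `deliver` are hypothetical names for illustration; the real state lives in `session_cursors`, and the writers are assumed to emit contiguous ascending seqs.

```rust
/// Hypothetical per-session resume cursor: `next` is the lowest seq not yet delivered.
struct ResumeCursor { next: u64 }

impl ResumeCursor {
    fn resume_at(from_seq: u64) -> Self { Self { next: from_seq } }

    /// Returns true when the frame should be forwarded.
    fn accept(&mut self, seq: u64) -> bool {
        if seq < self.next {
            return false; // dedup-below: already delivered (or before the resume point)
        }
        self.next = seq + 1; // snap-above: advance past the frame just taken
        true
    }

    /// Re-subscribe boundary: restart delivery from `from_seq`.
    fn reset(&mut self, from_seq: u64) { self.next = from_seq; }
}

/// Apply the cursor to frames in wire order and return the delivered seqs.
fn deliver(cursor: &mut ResumeCursor, wire: &[u64]) -> Vec<u64> {
    wire.iter().copied().filter(|&s| cursor.accept(s)).collect()
}
```

With K = 1, a superseded writer offering `[1]` and a surviving writer offering `[0, 1, 2, 3]`, every interleaving that keeps each writer ascending delivers `[1, 2, 3]`. The reset matters when an early frame advanced the cursor before the operator attached: without it, the full re-send from 0 would have seq K deduped away.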

### 7.22 An idle delivery with no working translation binary must SPOOL, never raw-inject a pseudo-delivery reported as delivered  `[REQ-HAZARD-IDLE-SILENT-NONDELIVERY]`
- **Failure (F-019 post-mortem):** the v0.11.0 path raw-injected `payload+\r` into the spt-hosted PTY whenever no working translation binary handled an inbound message — none declared, spawn-failed, FAULTED (commit-deadline miss / binary death), or its inject-worker channel dropped — AND acked `delivered=true`. But a bare `payload+\r` does NOT submit on a modern TUI (Claude Code): the message was TYPED into the harness but never sent, a silent pseudo-delivery reported as success. That silent degrade-to-raw-inject is precisely what MASKED F-019 (`[REQ-INSTALL-11]`, above) through a multi-hour black-box hunt — every behavioral hypothesis was moot because the "delivery" never delivered.
- **Invariant:** spt-hosted idle delivery is translation-binary-ONLY (ADR-0022 amendment). `dispatch_endpoint_input` with no working binary replies `endpoint_injected_envelope(ep, delivered=false)` and writes NOTHING to the PTY; the caller (`try_broker_inject` → `cmd_send`) reads `delivered=false` and falls through to `deliver::send` = SPOOL (poll-fed, never lost), reporting an honest `QUEUED`, never a confident-but-false `SENT`. The failure is LOUD (`ENDPOINT_INJECT:<ep>: no working translation binary … -> SPOOLED, not injected`). The raw-inject fallback (`input.enqueue`) is removed from the no-binary, worker-dropped, AND post-fault paths; the operator-keystroke floor-flush on FAULT is UNCHANGED (operator input is never stranded). Out of scope (follow-up): broker-side auto-redrive of already-spooled inbound when a live-update binary spawns (ordering/exactly-once hazards; the poll substrate + subsequent sends cover re-delivery).
- **Boundary — what this does NOT cover (the STATE-vs-transient precision):** the guarantee is the steady FAILED STATE — once a binary IS faulted/absent/worker-gone, subsequent deliveries spool (`faulted` is MONOTONIC: set once, never respawns, so the state is reached deterministically). It does NOT cover the FAULT-TRANSIENT: a delivery that lands in the worker's commit window — BEFORE `event_rx` is dropped / `faulted` is set — can be optimistically enqueue-acked (`delivered=true` the instant `event_tx.send` succeeds) then DROPPED when the worker faults+returns. This is a SEPARATE, PRE-EXISTING hazard (raw-inject removal did not touch it — the old code dropped that queued event too; v0.14.3 makes nothing worse) and is tracked for **v0.15.0 under `[REQ-MSG-DELIVERY-AXES]`** (the spool-centric delivery redesign: ack-on-SPOOL replaces ack-on-enqueue, which closes the optimistic-ack drop naturally). The g2 int gate asserts the steady state via **bounded-retry-until-spool** (faulted-monotonic → converges) rather than a single-shot ack, so it is load-robust under the parallel CI suite.
- **spt-core mapping:** impl `broker.rs` `dispatch_endpoint_input` (no-working-binary → `delivered=false` + loud log, no `input.enqueue`; the `input` writer is no longer resolved from the sessions table) + `build_translation`/`fault_translation` (`None`/fault now MEAN spool); unit `msg.rs` `endpoint_injected_envelope_carries_delivered_both_ways` (the `delivered=false` spool signal round-trips); int (real broker+PTY) `tests/broker.rs` `endpoint_keyed_inject_without_binary_spools_not_pty` + `tests/inject_control_wedge.rs` `large_endpoint_inject_to_a_no_binary_session_spools_promptly_without_wedging` + `g2_no_commit_deadline_faults_binary_and_does_not_wedge_controller_input` (post-FAULT inbound EVENTUALLY spools via bounded-retry-until-spool, marker NEVER on PTY across attempts — load-robust; the in-window fault-transient is the carve-out above).
- **Source:** v0.14.3 (ADR-0022 amendment @1eaeef1, operator-ruled; doyle-scoped, the F-019 follow-up). CHANGE-5 (`cli.rs` honest report) verified a NO-OP — the spool fall-through already reported `QUEUED`, never `SENT`.
<!-- [doc->REQ-HAZARD-IDLE-SILENT-NONDELIVERY] -->
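
The steady-state rule (no working binary means spool, never a PTY write) can be sketched as follows. `Session`, its fields, `SendReport`, and the simplified `cmd_send` signature are hypothetical; the log text is illustrative only, and the fault-transient carve-out described in the boundary bullet is not modeled.

```rust
#[derive(Debug, PartialEq)]
enum SendReport { Sent, Queued }

/// Hypothetical session state: whether a translation binary is working,
/// what was handed to it, and a fake PTY that must stay untouched here.
#[derive(Default)]
struct Session {
    translation_working: bool,
    to_binary: Vec<String>,
    pty: Vec<u8>,
}

/// Returns the `delivered` flag. With no working binary it writes nothing
/// anywhere and reports `false` so the caller can spool.
fn dispatch_endpoint_input(session: &mut Session, payload: &str) -> bool {
    if !session.translation_working {
        // Loud failure (illustrative wording, not the real log line).
        eprintln!("no working translation binary: spooled, not injected");
        return false;
    }
    session.to_binary.push(payload.to_owned());
    true
}

/// Caller side: `delivered=false` falls through to the spool and an honest QUEUED.
fn cmd_send(session: &mut Session, spool: &mut Vec<String>, payload: &str) -> SendReport {
    if dispatch_endpoint_input(session, payload) {
        SendReport::Sent
    } else {
        spool.push(payload.to_owned());
        SendReport::Queued
    }
}
```

There is no code path in this model that appends to `pty`, which mirrors the removal of the raw-inject fallback from the no-binary paths.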

---

### F-019 diagnosis lesson — confirm an adapter binary actually SPAWNED before behavioral diagnosis  `[REQ-INSTALL-11]`
- **Failure:** a bare/relative adapter-shipped program path (`[message-idle-translation-binary].path = "cc-spt-idle-translate"`) is passed VERBATIM to the broker, which spawns it via `Command::new` against the daemon's cwd/PATH; the spawn FAILS (not on PATH), and the spt-hosted session silently degrades to raw inject. The binary's `{text}{delay}{key:enter}{commit}` choreography never runs → idle messages are typed into the harness but **never submitted**. `build_translation` DOES log `TRANSLATION_SPAWN_FAILED:<path>:<err>` (`broker.rs`), but it lands on the **DETACHED daemon's stderr, which nobody reads**, so it is "silent" in practice. This cost a multi-hour black-box hunt (byte encoding, win32-input-mode `?9001h`, bracketed-paste, focus, pacing: all moot, because the binary simply never ran).
- **Invariant:** every adapter-manifest program path resolves against the adapter install dir BEFORE PATH (REQ-INSTALL-11, `resolve_program_in_dir`) — at the harness session spawn (`harnesshost::launch_harness_brokered_in`: the idle-translation binary AND the session program), the W3d live-update RESPAWN (`broker::read_translation_path`), and the notif command (`with_install_dir`, both the daemon `notif.rs` and the api `reporting.rs` render paths). **Diagnosis rule:** before any deep behavioral analysis of a translation/adapter binary, CONFIRM it actually spawned — a resident process exists, and a deployed path edit changes behavior. `TRANSLATION_SPAWN_FAILED` on daemon stderr is the signal today, but its visibility is poor (surfacing it on the perch / a louder channel is a candidate follow-up).
- **Source:** F-019 (v0.14.2); root confirmed on real claude-spt (perri + doyle). The raw-inject FALLBACK itself (degenerate no-binary floor) is unchanged by F-019 — its removal (message stays SPOOLED + poll-fed, never raw PTY inject) rides a separate ADR-0022 amendment.
<!-- [doc->REQ-INSTALL-11] -->
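
A minimal sketch of install-dir-first resolution, assuming a simple rule: an absolute path is used as given, a bare or relative name is taken from the install dir when a file exists there, and anything else is returned unchanged for the usual PATH lookup. The signature and this exact precedence are assumptions; Windows extension handling is ignored.

```rust
use std::path::{Path, PathBuf};

/// Resolve `program` against `install_dir` before falling back to PATH.
/// Assumed behavior for illustration; not the shipped `resolve_program_in_dir`.
fn resolve_program_in_dir(program: &str, install_dir: &Path) -> PathBuf {
    let p = Path::new(program);
    if p.is_absolute() {
        return p.to_path_buf();
    }
    let candidate = install_dir.join(p);
    if candidate.is_file() {
        // Shipped next to the adapter: use the full path so the spawn
        // no longer depends on the daemon's cwd or PATH.
        candidate
    } else {
        // Not shipped: leave it bare so `Command::new` performs a PATH lookup.
        p.to_path_buf()
    }
}
```

Resolving to a full path before the spawn also makes the diagnosis rule easier to apply, since the path that was actually executed is unambiguous.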

---

## Conformance checklist (condensed)

| # | Invariant | spt-core surface |
|---|---|---|
| 1.1 | Grace wait precedes INIT_SIGNOFF | daemon teardown |
| 1.4/4.4 | Deferred rows excluded from event-stream drain | daemon spool drain |
| 2.1/5.1 | Stable PID/broker-handle over ephemeral PID | liveness detection |
| 2.3 | Handoff argv/IPC version-tolerant (newer brain ↔ older broker) | broker↔brain IPC, self-update |
| 2.4 | gen_start = now() on cold-start + handoff | per-instance generation |
| 3.1 | Ephemeral perch cleanup on all exit paths | `ring` (RAII guard) |
| 3.3 | Echo-commune before INIT_SIGNOFF | daemon psyche loop |
| 4.1 | Envelope decode order, `&amp;` last | spt-proto (public, wire-versioned) |
| 4.2 | Parser panic-free + tolerant | spt-proto |
| 4.3 | Stale registry entries → fallback, never hard-fail | subnet registry resolution |
| 4.6 | Addressable-id charset reserves `:`/`@` delimiters | `spt_proto::id` at creation seams |
| 4.10 | Silent-node registry rows evicted (heard-map TTL); own rows never decay | registry pump eviction |
| 5.2 | tmp-write + atomic-rename + retry (EBUSY) | all state writes, binary swap |
| 5.3 | Timeout every harness subprocess | manifest invocations |
| 5.4 | Strip UNC prefix on serialized paths | spt-proto path normalization |
| 5.5 | ConPTY reader answers DSR (`ESC[6n`) | spt-term broker PTY reader |
| 5.6 | Detached long-lived children spawn `bInheritHandles=FALSE` | daemon + shell spawn |
| 5.7 | Daemon always unelevated in the invoker's universe (de-elevated spawn + entry guard) | `deelevate` seam, `spawn_detached`, `Daemon::run` |
| 5.8 | `CREATE_NO_WINDOW` on every console child of the console-less daemon | `gitrun::run_git`, `kill_shell_pid`, `run_bounded_command`, `shellwake` |
| 5.12 | Native-PTY spawn PATHEXT-resolves a bare program + wraps non-PE targets (.cmd/.bat→`cmd /c`, .ps1→powershell), bypassing portable-pty's shim-first `which` (CreateProcessW os error 193) | `spt_term::winprog` → `PtySession::spawn_program_in` |
| 6.1 | Single path/registry source of truth | storage layout |
| 6.4 | Drop files supervisor-owned single-writer | runtime contract |
| 6.5 | Direct-write precedence marker (+ node id) | cross-node Psyche sync |
| 6.6 | Surfaced conflicts preserve both versions until dominated | context conflict artifacts (ADR-0013) |
| 6.7 | Broker + brain are separate processes (brain restart never drops a hosted endpoint at the process level) | daemon process topology, self-update (ADR-0018) |
| 6.8 | No irreversible durable-state migration before update ready-promotion (pre-ready writes stay N-1-readable) | auto-rollback / durable-state schema (ADR-0018) |
| 6.10 | Phase-significant loop timing is a durable absolute-deadline grid (no per-fire write; update preserves phase, crash resets, one-shot never resets) | durable loop timing / self-update (ADR-0018 Q4) |
| 6.11 | Brain respawn execs the applied bytes (canonical exe captured at broker start, not per-spawn current_exe) + promotion bytes-gate (exe_hash == artifact, else rollback) | daemon respawn path / self-update (ADR-0018 Q3) |
| 7.1 | Local `api` mutation authenticated to endpoint | api surface / broker IPC |
| 7.2 | Idempotent delivery across brain restart | broker↔brain IPC |
| 7.3 | Psyche outbound captured + `from=`/target stripped + reply-to-sender / notify-to-own-user | live-Psyche driver / daemon relay (ADR-0012) |
| 7.4 | Per-agent pulse/psyche/echo runs off the shared scheduler (no serial blocking across agents) | daemon multi-agent hosting (ADR-0004) |
| 7.5 | WAN-inbound origin = QUIC handshake identity from the broker's stream table, never payload bytes | wan receive funnel + every wire-inbound consumer (ADR-0009) |
| 7.6 | Pump brain-IPC reads deadline-bounded (30s total-wait); TimedOut → supervised restart, never per-peer retry | `Brain::cold_start_pump` / `BrainConn::Split` (reader-thread + `recv_timeout`), `pump::peer_outcome` |
| 7.7 | A wedged viewer is evicted, never stalls the controller/child/drain | broker `OutputLog` fan-out (controller/viewer) |
| 7.8 | Broker bounds every brain-waiting QUIC op (10s < the brain's 30s); a dead peer fails as an ORDINARY error the broker replies → per-peer redial, round continues, no pump restart (the 7.6 B-half) | `NetHost::bounded_block_on` wrapping `dial`/`open_stream`/`send_stream` |
| 7.9 | A daemon-state wire change (e.g. the v0.9.0 agnostic Seed) needs a deliberate broker restart — the resident old broker can't read new bytes; the seed-skew EOF surfaces an actionable `spt daemon stop` hint, not "failed to fill whole buffer"; forward: additive + serde-default daemon-state | `cmd_seed`/`seed_fail_message`, broker seed-control residency |
| 7.10 | A view is independent from the endpoint: the cold-started daemon is launched JOB-NEUTRAL (WMI `Win32_Process.Create` PRIMARY → WmiPrvSE child, outside any terminal job from birth; breakaway DEMOTED to a fallback rung because a job CAN deny it). A WMI/scheduler child does NOT inherit transient shell env, so `SPT_*` (esp. `SPT_HOME`) is forwarded via a `cmd /c set … & start /b` wrapper. Ladder (first-success-wins, both cold-start + `spt daemon start`): WMI → schtasks → breakaway → in-job. int IS CI-testable via the WMI rung (no nesting false-red — WMI escapes regardless of job policy); operator WT/VSCode tab-close = final non-gating confirmation. `detached_no_inherit` unchanged for `launch_shell`; elevated `deelevate` keeps L1 breakaway (WMI-reparent = follow-up) | `daemon.rs::launch_daemon_job_neutral` (ladder), `spawn_daemon_via_wmi`/`wrapped_daemon_command`, int `job_escape_e2e.rs` |
| 7.11 | A dead PTY child + a dropped operator pump does NOT wedge the broker — PROVE-DON'T-CHANGE: send_stream already QUIC-deadline-bounded (7.8), the loopback duplex is drained broker-internally (evict-not-park), block_on parks the dispatch thread not a net worker; dead endpoint offlined within a reconcile tick | int `attach_wedge_e2e.rs` |
| 7.12 | Controller output delivered OFF the drain thread (dedicated writer + bounded deadline-detach), never inline under the log lock — a backed-up controller can't wedge the session | broker `OutputLog::append`/`controller_writer`, int `inject_control_wedge.rs` |
| 7.13 | `spt rc` maps Windows legacy-console delete keys so Backspace→char-delete / Ctrl+Backspace→word-delete; does NOT enable VT console input (= win32-input-mode on WT, broke detach). SUPERSEDED on Windows by 7.16 (the byte-swap `normalize_key_byte` removed; behavior native in `translate_key_event`) | `rc.rs` → 7.16 `translate_key_event` |
| 7.14 | `EffectJournal::apply_once` releases its lock ACROSS `effect()` (reserve→release→run→finalize), never holding it across the blocking PTY write; `EffectKind::PtyWrite` is ephemeral (no per-keystroke fsync, in-memory dedup), durable kinds keep fsync — fixes interactive stutter + the hard input wedge (`brain IPC read deadline`) | `effect.rs` `apply_once`/`is_durable`, int `inject_control_wedge.rs` |
| 7.15 | `reconcile_hosted_liveness` clears stale `driven_by` when it offlines a sessionless controllable perch (no live broker session) — an OFFLINE endpoint never renders phantom `ONLINE+CONTROLLED`; race-free (no session ⇒ no concurrent broker re-stamp). `controller_by==None` is ambiguous (local controller reads None) so it is NOT a clear trigger; the idle wedged-remote leg is deferred (`REQ-HAZARD-DRIVEN-BY-IDLE-REMOTE-EVICT`) | `livehost.rs` `reconcile_hosted_liveness`, int `driven_by_selfheal.rs`/`inject_control_wedge.rs` |
| 7.16 | `spt rc` on Windows reads crossterm KEY EVENTS and translates to standard xterm VT (arrows/Home/End/PgUp/Dn/Ins/Del/F-keys + modifiers reach the harness; agnostic, NOT win32-input-mode); `ctrl-b d` detach preserved event-sourced; non-tty + Unix keep the byte path; supersedes the 7.13 byte-swap | `rc.rs` `translate_key_event`/`key_event_step`/`spawn_stdin_reader_events`, unit-only (live = HITL) |
| 7.17 | PTY **input** is single-writer: each session spawns ONE dedicated input-writer thread = the SOLE caller of the blocking `write_input`, fed by a bounded FIFO (`sync_channel`). Every caller (`dispatch_input`, `dispatch_endpoint_input`, the inject worker/floor flush) ENQUEUES (`try_send`) + returns at once — a paste burst that fills the harness input buffer parks only that thread, never the broker dispatch thread. Full queue ⇒ DROP excess + stamp the perch `input_backpressure` (heal-on-resume: cleared on the next accepted enqueue); the daemon NEVER wedges on a stuck harness. Completes the W1b-deferred fix (2) of 7.14. The OUTPUT-side single-writer (7.12) mirror, applied to input | broker `InputWriter`/`input_writer`/`flush_inject_floor`/`run_inject_worker`/`dispatch_input`, `spt_store::info::set_input_backpressure`, int `inject_control_wedge.rs` |
| 7.18 | `spt rc` paste is **client-originated** on Windows: CC runs daemon-side with no access to the operator's LOCAL clipboard. On a RIGHT-CLICK rc reads the local clipboard itself and injects a BRACKETED paste (`ESC[200~` + content + `ESC[201~`) — CC has bracketed-paste mode on (`ESC[?2004h`), so a multi-line paste lands intact with NO `\r` submit-storm, harness-agnostic, content VERBATIM. RawGuard captures the mouse (disables console QuickEdit so right-click reaches the app) on an interactive console only + restores on drop. **ctrl+V is NOT intercepted** (P1b): WT consumes it (Key RELEASE only, never Press) + injects the clipboard as a key flood — it rides WT's native paste as keystrokes (no wedge, 7.19; bracketed = right-click). cfg(windows) only — Unix pastes natively | `rc.rs` `wrap_bracketed_paste`/`mouse_is_paste`/`clipboard_paste`/`read_clipboard`/`RawGuard`/`spawn_stdin_reader_events`, unit-only (live clipboard+mouse = HITL) |
| 7.19 | An operator input FLOOD must not deadlock the broker: `serve_attach` sends N `Input` frames on one conn without reading, the broker acks each on the SAME conn → the return direction fills (~pipe buffer) → `send_frame` blocks the per-conn handler → mutual full-duplex DEADLOCK (permanent broker wedge, controller latched). FIX: opt-in ack — `InputReq.ack` (serde default true); the operator/rc path sends `ack=false` (`send_effect_no_ack`), `dispatch_input` skips the applied frame → the handler never writes back while draining the flood. `shellchan` keeps `ack=true` (its `Applied`-wait). Exactly-once unaffected (dedup at the applied-set). N-1: an old resident broker still acks until restart (7.9 class) | `msg.rs` `InputReq.ack`, `brain.rs` `send_effect_no_ack`, `attach.rs` `serve_attach`, `broker.rs` `dispatch_input`, int (flood repro) |
| 7.20 | `spt rc` must forward the SCROLL wheel to the harness — our mouse capture (for right-click paste, 7.18) steals WT's native scroll. Track the harness's mouse-reporting mode from its OUTPUT (DECSET `ESC[?1000/1002/1003h` + `1006h` SGR, and `…l` off; scan survives a split across output chunks) into `MouseMode{enabled,sgr}`; forward `ScrollUp/Down` as an xterm SGR report (`ESC[<64/65;col+1;row+1M`) ONLY when `enabled && sgr`, else drop; Moved/drag/clicks dropped. cfg(windows) only — Unix scrolls natively | `rc.rs` `MouseMode`/`MouseModeScanner`/`scroll_dir`/`scroll_sgr`/`spawn_stdin_reader_events`, unit-only (live mouse = HITL) |
| 7.21 | Exactly ONE LIVE `controller_writer` per brain↔broker connection + every writer emits an ASCENDING seq stream ⇒ a snap-above consumer over the surviving writer's COMPLETE `[0,end]` replay delivers `[K,end]` with no skip/dup. A brain-restart re-serve double-registered the controller (handoff's `subscribe(from_seq=K)` + the re-serve's `attach_as(sid,0)`) → two writers raced one socket → the consumer's strict reject-gap legacy path saw `got seq 1 want 0` (flaky `attach_survives_target_brain_restart_exactly_once`). FIX (fix #1 "drop handoff's subscribe" REVERTED — it's the standalone-resume mechanism): (1) `handoff` seeds `session_cursors` → dedup-below + snap-above (correctness, made complete by the ascending-merge property); (2) `controller_writer` epoch-gated via shared `Arc<AtomicU64>` `controller_epoch`, re-read UNDER `send.lock()` (single live writer); (3) `subscribe_with` resets the resume-mode dedup cursor to `from_seq` (shared by `attach`/`attach_as`) — the LOAD-BEARING fix for the operator-stream boundary: serve_attach consumes the handoff replay's seq K before `attached`, advancing the cursor; the `attach_as(0)` re-subscribe reset re-delivers it so the operator viewport stays gapless. Pre-existing; P1b innocent | `brain.rs` `handoff`/`subscribe_with`, `broker.rs` `become_controller`/`controller_writer`/`controller_epoch`, unit `src/broker.rs`, int `tests/broker.rs`+`attach.rs` (keystone + restart carrier, 20× on kitsubito) |
| 7.22 | Endpoint-stop / brain-death reaps a brain-LESS perch's orphan detached Psyche via the cmdline-scoped guard — the handle-reap (`stop_host`, REQ-HAZARD-UNHOST-PSYCHE-REAP) CANNOT (the owning brain is dead, so its `psyche_child` handle died with it) and the brain-START sweep (REQ-HAZARD-BRAIN-RESTART-PSYCHE-DUP) never fires for a perch being STOPPED rather than re-hosted. So the live-host runs the scoped reap after `stop_host` at the reconcile stop-side AND in `confirm_residency_or_unhost`. Fail-safe-decline (pid-alive AND basename==psyche-program AND cmdline contains `<id>-psyche`; any unreadable signal DECLINES — a missed dup is bounded, a wrong-kill catastrophic). The orphan-leak half of perri F-010xF-015 (the psyche own-copy is the other half, ADR-0025 amendment) | `livehost.rs` `reap_orphan_psyche_for`/`reap_stopped_endpoint_orphan_psyche`/`reconcile_once`/`confirm_residency_or_unhost`, unit `livehost.rs`, int W3e (perri step 3) |
<!-- [doc->REQ-HAZARD-STOP-PATH-PSYCHE-ORPHAN-REAP] -->
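
The single-writer input invariant in row 7.17 can be sketched with the standard library alone. This is a minimal illustration, not the broker's `InputWriter`: the `InputQueue` type, its method names, and the in-memory `AtomicBool` flag (standing in for the perch `input_backpressure` stamp) are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;

/// Hypothetical stand-in for the per-session input writer described in 7.17.
pub struct InputQueue {
    tx: SyncSender<Vec<u8>>,
    /// Stands in for the perch `input_backpressure` stamp (assumption).
    backpressure: AtomicBool,
}

impl InputQueue {
    /// Spawn the ONE writer thread. `write_input` may block for as long as
    /// the harness refuses input; only this thread ever calls it.
    pub fn spawn<F>(capacity: usize, mut write_input: F) -> Self
    where
        F: FnMut(&[u8]) + Send + 'static,
    {
        let (tx, rx) = sync_channel::<Vec<u8>>(capacity);
        thread::spawn(move || {
            // Drains in FIFO order; exits when every sender is dropped.
            for chunk in rx {
                write_input(&chunk);
            }
        });
        Self { tx, backpressure: AtomicBool::new(false) }
    }

    /// Never blocks the caller. Returns true when the bytes were accepted.
    pub fn enqueue(&self, bytes: Vec<u8>) -> bool {
        match self.tx.try_send(bytes) {
            Ok(()) => {
                // Heal-on-resume: an accepted enqueue clears the flag.
                self.backpressure.store(false, Ordering::SeqCst);
                true
            }
            Err(TrySendError::Full(_)) => {
                // Full queue: drop the excess and stamp backpressure.
                self.backpressure.store(true, Ordering::SeqCst);
                false
            }
            // Writer thread is gone; nothing to stamp in this sketch.
            Err(TrySendError::Disconnected(_)) => false,
        }
    }

    pub fn backpressured(&self) -> bool {
        self.backpressure.load(Ordering::SeqCst)
    }
}
```

Because callers only ever use `try_send`, a stuck harness parks the writer thread and nothing else; the cost is that input beyond the queue bound is dropped rather than delayed.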
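
Row 7.20 describes tracking the harness's mouse-reporting mode from its output and forwarding the wheel only as an SGR report. The sketch below is one way to satisfy that, under stated assumptions: the field names, the `feed` method, the `bool` direction argument, and the handling of `;`-separated parameters are all illustrative and are not taken from `rc.rs`.

```rust
/// Mirrors the `MouseMode{enabled,sgr}` shape named in 7.20.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct MouseMode {
    pub enabled: bool,
    pub sgr: bool,
}

enum Parsed {
    Complete { params: Vec<u32>, set: bool, len: usize },
    Incomplete,
    NoMatch,
}

/// Parse `ESC [ ? <digits[;digits...]> (h|l)` at the start of `buf`.
fn parse_private_mode(buf: &[u8]) -> Parsed {
    const PREFIX: &[u8] = b"\x1b[?";
    for (k, &want) in PREFIX.iter().enumerate() {
        match buf.get(k) {
            None => return Parsed::Incomplete,
            Some(&b) if b != want => return Parsed::NoMatch,
            Some(_) => {}
        }
    }
    let mut params = Vec::new();
    let mut cur: Option<u32> = None;
    for (k, &b) in buf.iter().enumerate().skip(PREFIX.len()) {
        match b {
            b'0'..=b'9' => {
                let digit = u32::from(b - b'0');
                cur = Some(cur.unwrap_or(0).saturating_mul(10).saturating_add(digit));
            }
            b';' => params.extend(cur.take()),
            b'h' | b'l' => {
                params.extend(cur.take());
                return Parsed::Complete { params, set: b == b'h', len: k + 1 };
            }
            _ => return Parsed::NoMatch,
        }
        // Bound the carried tail so junk cannot grow it without limit.
        if k >= 32 {
            return Parsed::NoMatch;
        }
    }
    Parsed::Incomplete
}

/// Keeps an unfinished sequence between calls so a DECSET split across
/// output chunks is still recognised.
#[derive(Default)]
pub struct MouseModeScanner {
    pending: Vec<u8>,
    pub mode: MouseMode,
}

impl MouseModeScanner {
    pub fn feed(&mut self, chunk: &[u8]) {
        self.pending.extend_from_slice(chunk);
        let buf = std::mem::take(&mut self.pending);
        let mut i = 0;
        while i < buf.len() {
            if buf[i] != 0x1b {
                i += 1;
                continue;
            }
            match parse_private_mode(&buf[i..]) {
                Parsed::Complete { params, set, len } => {
                    for p in params {
                        match p {
                            1000 | 1002 | 1003 => self.mode.enabled = set,
                            1006 => self.mode.sgr = set,
                            _ => {}
                        }
                    }
                    i += len;
                }
                Parsed::Incomplete => {
                    // Carry the partial sequence into the next chunk.
                    self.pending = buf[i..].to_vec();
                    return;
                }
                Parsed::NoMatch => i += 1,
            }
        }
    }
}

/// Build the xterm SGR wheel report, or None when it must be dropped.
/// `col` and `row` are zero-based; the report is one-based.
pub fn scroll_sgr(mode: MouseMode, up: bool, col: u16, row: u16) -> Option<String> {
    if !(mode.enabled && mode.sgr) {
        return None;
    }
    let button = if up { 64 } else { 65 };
    Some(format!("\x1b[<{};{};{}M", button, u32::from(col) + 1, u32::from(row) + 1))
}
```

Dropping the wheel event when SGR reporting is off keeps rc from injecting escape bytes that a harness not listening for mouse reports would treat as typed input.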
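
Part (2) of the 7.21 fix, the epoch gate re-read under `send.lock()`, can be illustrated as follows. This is a sketch only: `Conn`, the `Vec<u64>` standing in for the socket, and the method names are assumptions, and the seeding of `session_cursors` and the `subscribe_with` cursor reset are out of scope here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical connection state shared by every would-be writer.
pub struct Conn {
    /// Stands in for the socket; records the seqs actually written.
    pub send: Mutex<Vec<u64>>,
    pub controller_epoch: AtomicU64,
}

pub struct ControllerWriter {
    conn: Arc<Conn>,
    my_epoch: u64,
}

impl ControllerWriter {
    /// Registering a controller bumps the epoch, which makes every
    /// earlier writer on this connection stale.
    pub fn become_controller(conn: &Arc<Conn>) -> Self {
        let my_epoch = conn.controller_epoch.fetch_add(1, Ordering::SeqCst) + 1;
        Self { conn: Arc::clone(conn), my_epoch }
    }

    /// Returns false, writing nothing, once this writer is superseded.
    pub fn write_seq(&self, seq: u64) -> bool {
        let mut sock = self.conn.send.lock().unwrap();
        // Re-read the epoch UNDER the send lock: a check made before
        // taking the lock could pass and then race a newer writer.
        if self.conn.controller_epoch.load(Ordering::SeqCst) != self.my_epoch {
            return false;
        }
        sock.push(seq);
        true
    }
}
```

In this sketch a stale writer that already holds the lock may still finish the one frame in flight; every later attempt observes the new epoch and stops, so at most one writer stays live per connection.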
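
The fail-safe-decline rule in row 7.22 reduces to a pure predicate over three process signals, each of which may be unreadable. A minimal sketch, assuming a hypothetical `ProcSignals` snapshot type and function name (neither is from `livehost.rs`); how the signals are gathered from the OS is left out:

```rust
/// Hypothetical snapshot of what could be read about a candidate pid.
/// `None` means the signal was unreadable.
pub struct ProcSignals {
    pub pid_alive: Option<bool>,
    pub exe_basename: Option<String>,
    pub cmdline: Option<String>,
}

/// True only when ALL three checks positively pass. Any unreadable
/// signal declines: a missed duplicate is bounded, a wrong kill is not.
pub fn may_reap_orphan_psyche(sig: &ProcSignals, psyche_program: &str, perch_id: &str) -> bool {
    match (sig.pid_alive, sig.exe_basename.as_deref(), sig.cmdline.as_deref()) {
        (Some(true), Some(base), Some(cmd)) => {
            let marker = format!("{}-psyche", perch_id);
            base == psyche_program && cmd.contains(marker.as_str())
        }
        _ => false,
    }
}
```

Modelling each signal as an `Option` makes the decline path the default arm of the match, so a new unreadable case cannot accidentally authorise a kill.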