# Known Hazards

Hard-won edge cases harvested from the sister project (`claude_skill_owl`, ~80 commits / 12+ phases / multiple production incidents). Per ADR-0001, this is a **test checklist for the spt-core rebuild** — the clean-room rebuild must re-satisfy each invariant rather than re-discover the bug.

**Architecture-translation note.** The sister project runs poll listeners and Psyche wrappers as *separate processes*. spt-core consolidates both into the one `spt-daemon` (brain), with a stable broker beneath it (ADR-0004). Many hazards below were inter-process races in the sister project; in spt-core some become intra-daemon concerns (potentially easier) while others move to the daemon↔broker IPC boundary or the network boundary (potentially new failure surface). Each entry notes the mapping where it differs. Citations point at sister-project paths for reference, not at spt-core.

---

## 1. Race conditions & ordering

### 1.1 Phantom INIT_SIGNOFF after grace period
- **Failure:** orphan teardown enqueues INIT_SIGNOFF before the grace-period recheck; a transient Self recovery (binary handoff, brief stale poll) makes the recheck pass-as-alive, but the signoff was already spooled and drains on the next iteration → teardown despite a live Self.
- **Invariant:** grace-period wait MUST complete *before* composing/delivering INIT_SIGNOFF; the recheck must bind `still_gone` before any envelope write.
- **spt-core mapping:** in-daemon now (no separate wrapper), but the ordering invariant is identical — orphan/teardown logic must re-evaluate liveness after the grace wait, not before enqueue.
- **Sister cite:** `src/live/wrapper/orphan.rs:201-259` (sleep@209 precedes compose@231-251); tests T-grace-recovery:576, T-still-gone-recheck:618.
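
The ordering in the invariant above can be sketched as follows. This is a minimal illustration, not the sister project's code: `orphan_teardown`, `OrphanOutcome`, and the two callbacks are hypothetical names, and the liveness probe and envelope delivery are stubbed as closures.

```rust
use std::thread;
use std::time::Duration;

/// Outcome of one orphan check (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum OrphanOutcome {
    Recovered,
    TornDown,
}

/// Waits out the grace period, rechecks liveness, and only then delivers
/// the signoff. The callbacks stand in for the real probe and spool write.
pub fn orphan_teardown(
    grace: Duration,
    mut self_is_alive: impl FnMut() -> bool,
    mut deliver_signoff: impl FnMut(&str),
) -> OrphanOutcome {
    // 1. The grace wait completes first; nothing is composed or spooled yet.
    thread::sleep(grace);

    // 2. Bind the recheck result before any envelope write.
    let still_gone = !self_is_alive();
    if !still_gone {
        // A transient recovery leaves no spooled signoff behind.
        return OrphanOutcome::Recovered;
    }

    // 3. Only now compose and deliver INIT_SIGNOFF.
    deliver_signoff("INIT_SIGNOFF");
    OrphanOutcome::TornDown
}
```

Because the signoff is created only after `still_gone` is bound, there is no window in which a recovered Self can race an already-queued envelope.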

### 1.2 Poll-rewrite race & info.json mid-write reads
- **Failure:** `info.json` written by the wrapper mid-iteration while a list/classify command reads it → torn read, misclassification.
- **Invariant:** consult liveness via the supervisor (`is_wrapper_alive`-equivalent) before any grace gate; reads of state files must tolerate concurrent writes (atomic write + rename, or read-retry).
- **spt-core mapping:** the daemon owns both writer and reader → use in-process locking/snapshotting instead of racing on disk. Cross-node registry reads remain eventually-consistent and must tolerate staleness.
- **Sister cite:** `src/common/list_filter.rs:100-150`; `src/owl/poll.rs:141`.

### 1.3 Stale `index.lock` wedge from prior git crash
- **Failure:** a crashed git process leaves a 0-byte `index.lock` in a Psyche-tracked worktree; every later commit blocks forever.
- **Invariant:** on daemon boot, sweep the seed and all agent/project worktrees for stale locks (0 bytes, mtime older than 60s) and remove them; leave live locks alone.
- **spt-core mapping:** cross-node Psyche sync (ADR-0002/0003) replaces git-repo sync, so the *git* lock may disappear — but any equivalent lockfile in the new sync mechanism needs the same stale-sweep on boot.
- **Sister cite:** CHANGELOG v1.11.20 "Stale `index.lock`"; `src/common/git.rs`.

### 1.4 Deferred spool rows must not leak to the event stream
- **Failure:** a hook spools a deferred (spool-only, no TCP wake) notice; startup `drain_all` flushes ALL rows including deferred → event emitted at wrong time/priority.
- **Invariant:** startup drain (and idle/timeout TCP-wake sites) use `drain_non_deferred` only; deferred rows are picked up by their intended consumer via `peek`. All drain sites must agree on which rows they flush.
- **spt-core mapping:** carries directly — the daemon's spool-drain has the same deferred-vs-immediate distinction.
- **Sister cite:** `src/owl/poll.rs:276-316`; `spool::drain_non_deferred_with_metadata`.
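
A minimal sketch of the deferred-versus-immediate split, assuming an in-memory `Vec` as a stand-in for the real spool. `SpoolRow`, `Spool`, and `peek_deferred` are hypothetical names; only `drain_non_deferred` is taken from the text above, and its real signature may differ.

```rust
/// One spooled notice. `deferred` marks spool-only rows (no TCP wake).
#[derive(Debug, Clone, PartialEq)]
pub struct SpoolRow {
    pub id: u64,
    pub deferred: bool,
    pub body: String,
}

/// In-memory stand-in for the durable spool (assumption for this sketch).
#[derive(Default)]
pub struct Spool {
    rows: Vec<SpoolRow>,
}

impl Spool {
    pub fn push(&mut self, row: SpoolRow) {
        self.rows.push(row);
    }

    /// Removes and returns only the immediate rows, in insertion order.
    /// Deferred rows stay queued for their intended consumer.
    pub fn drain_non_deferred(&mut self) -> Vec<SpoolRow> {
        let (deferred, immediate): (Vec<SpoolRow>, Vec<SpoolRow>) =
            std::mem::take(&mut self.rows)
                .into_iter()
                .partition(|row| row.deferred);
        self.rows = deferred;
        immediate
    }

    /// Non-destructive view of the deferred rows.
    pub fn peek_deferred(&self) -> Vec<&SpoolRow> {
        self.rows.iter().filter(|row| row.deferred).collect()
    }
}
```

If every event-stream drain site calls `drain_non_deferred`, a deferred row can only leave the spool through the consumer that peeks it, which is the agreement the invariant asks for.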

### 1.5 Worker (working-perch) lifecycle path consistency
- **Failure:** subagent-start creates the perch at one path layout; later hooks read it at another → not found; stop-hook scan misses nested perches.
- **Invariant:** all Worker/Psyche child-perch path composition routes through one central resolver; no divergent path construction across hooks.
- **spt-core mapping:** `Worker` is a day-one endpoint type; the daemon owns the registry, so perch location is a registry lookup, not ad-hoc path math. Single source of truth for instance→location.
- **Sister cite:** `src/owl/hook_subagent_start.rs:122-168`; `hook_subagent_stop.rs:15-55`.

---

## 2. Identity & session-binding

### 2.1 Parent PID over ephemeral poll PID
- **Failure:** orphan check polls an ephemeral listener PID; it dies and is recycled (esp. Windows); a foreign process with the recycled PID reads as alive → false-positive teardown (or false-negative).
- **Invariant:** prefer the stable harness-session PID (`parent_pid`) over any ephemeral process PID for liveness; minimal `info.json` for supervisor-owned perches to avoid stale leaks.
- **spt-core mapping:** session binding (parent-process-tree anchor) still applies for harness-hosted topology. For spt-hosted sessions the broker holds the child directly → liveness is the broker's held-handle state, more reliable than PID polling.
- **Sister cite:** `src/live/wrapper/orphan.rs:141-161`; CHANGELOG v1.11.20.

### 2.2 Stdin session_id precedence over env
- **Failure:** subagent inherits a stale `OWL_SESSION_ID` env across `/clear`; hook gets two session_ids (fresh stdin, stale env) → wrong-agent binding.
- **Invariant:** stdin-provided session_id wins; env is fallback only.
- **spt-core mapping:** the harness-contract subcommand surface must define the same precedence for whatever identity fields hooks pass in.
- **Sister cite:** CHANGELOG v1.35.1 "IN-05"; `hook_subagent_start.rs:40-51`.
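
The precedence rule is small enough to pin down in one function. A minimal sketch, assuming the caller has already read both candidates; `resolve_session_id` is a hypothetical name, and treating a blank value as absent is an assumption not stated in the source.

```rust
/// Returns the trimmed value, or None when it is blank
/// (blank-as-absent is an assumption of this sketch).
fn non_empty(s: &str) -> Option<String> {
    let trimmed = s.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// The stdin-provided session id wins; the env value is fallback only.
pub fn resolve_session_id(stdin_id: Option<&str>, env_id: Option<&str>) -> Option<String> {
    stdin_id
        .and_then(non_empty)
        .or_else(|| env_id.and_then(non_empty))
}
```

Keeping the rule in one resolver means a hook cannot accidentally consult the environment first.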

### 2.3 Binary-handoff argv schema must stay backward-compatible
- **Failure:** old binary spawns new binary with old argv arity; clap rejects before state rehydration → wrapper dies unlogged.
- **Invariant:** every newly-added handoff positional has a default; state-file rehydration happens *after* argv parse; defaults survive intermediate versions.
- **spt-core mapping:** CRITICAL — self-update (ADR-0004) makes handoff routine. The broker↔brain IPC and any brain-relaunch argv must be versioned and forward/backward tolerant (a newer brain talks to an older broker). This is the single most update-frequency-sensitive invariant.
- **Sister cite:** `src/live/wrapper/lifecycle.rs:17-106`; `src/cli.rs` defaults; CHANGELOG v1.11.10.

### 2.4 Generation `gen_start` always = now() on cold-start AND handoff
- **Failure:** stale gen_start from a rehydrated state file fires time-based discriminators on the new process.
- **Invariant:** wall-clock `gen_start` is set to `now()` on both cold-start and handoff; generation counter increments on every start/revive; session UUID captured fresh and carried so the resumed mind distinguishes "same gen continuing" vs "new gen born".
- **spt-core mapping:** carries to the daemon's per-instance generation tracking.
- **Sister cite:** `src/live/wrapper/lifecycle.rs:70`; `src/common/wrapper_state.rs`.

### 2.5 Daemon-hosted endpoints have no dedicated liveness PID
- **Failure:** the sister evaluates Psyche/perch liveness via a dedicated process PID — the wrapper's own pid in `info.json`, checked with `is_process_alive`. Under ADR-0004 the Psyche (and any spt-hosted Self) is a **loop inside the daemon**, not a separate process: it holds no dedicated pid, and its `claude`/summarizer subprocess is ephemeral (spawned per pulse/commune, then exits). If a daemon-hosted perch's `info.json` carries the **daemon's** pid, then *every* hosted endpoint shares one pid, and `is_process_alive(pid)` reads "alive" for a torn-down endpoint as long as the daemon runs — while `clean_stale_entries` (dead-pid deletion) can no longer distinguish a dead endpoint from a live one. The 2.1/5.1 liveness models do **not** cover this third category: the Psyche is neither harness-hosted (no `parent_pid` anchor) nor a broker-held PTY child.
- **Invariant:** for **daemon-hosted** perches (Psyche; spt-hosted Self), liveness is the **daemon's authoritative in-memory endpoint table + a `status` field** on `info.json` (`online|offline|…`), **never** `is_process_alive(info.pid)`. `info.pid` for a daemon-hosted perch is at most a *hosted-by-daemon* marker (the daemon pid), not a liveness signal; registry stale-clean for these rows keys on the daemon's endpoint table, not per-row pid. This reuses the pattern already specified for **Shells** (`info.json` carries daemon-managed `status`, capability resolved by `adapter_name` — CONTEXT "Shell… Not in the subnet registry") and extends it to daemon-hosted *agent* perches.
- **spt-core mapping:** the **M1/M2a interim** model keeps the Psyche/listener a real per-process owner (the `api listen` process), so its per-pid liveness (`deliver::is_online` → `info.read_pid` → `proc::is_process_alive`; `registry::clean_stale_entries`) is correct *interim*. **M3 daemon consolidation replaces it** with daemon-authoritative liveness for hosted perches. Keep the liveness check behind one resolver (mirrors `resolve_address` stale-clean) so the M3 swap is localized — do **not** let the per-pid assumption leak into new call sites.
- **Sister cite:** `src/live/wrapper/orphan.rs` (wrapper-pid liveness); `src/common/list_filter.rs:168-175` (pid-classify); spt-core `crates/spt-store/src/{proc.rs,registry.rs}` + `crates/spt-msg/src/deliver.rs::is_online`.

---

## 3. Lifecycle

### 3.1 Ephemeral perch cleanup on every `ring` exit path
- **Failure:** `ring` creates an ephemeral perch; early-exit paths (no-perch, empty-msg, timeout) skip cleanup → stale dirs accumulate.
- **Invariant:** every code path that creates an ephemeral perch cleans it before exit; exception: if the caller already had an active perch, do not treat as ephemeral and do not clean up.
- **spt-core mapping:** `ring` semantics carry; the daemon owns ephemeral-perch lifecycle, so a single guaranteed-cleanup (drop guard / RAII) is achievable in-process.
- **Sister cite:** `src/owl/ring.rs:58-294`.
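
The drop-guard idea can be sketched as below. `EphemeralPerch` is a hypothetical type, and "the caller already had an active perch" is approximated as "the directory already exists", which is an assumption; the real check would presumably consult the registry.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Guard that removes an ephemeral perch directory when dropped,
/// but only if this guard created it.
pub struct EphemeralPerch {
    dir: PathBuf,
    owned: bool,
}

impl EphemeralPerch {
    /// Creates the perch if it is missing. A pre-existing directory is
    /// treated as the caller's active perch and is never cleaned up.
    pub fn acquire(dir: &Path) -> io::Result<Self> {
        let owned = !dir.exists();
        if owned {
            fs::create_dir_all(dir)?;
        }
        Ok(Self { dir: dir.to_path_buf(), owned })
    }

    pub fn path(&self) -> &Path {
        &self.dir
    }
}

impl Drop for EphemeralPerch {
    fn drop(&mut self) {
        if self.owned {
            // Best-effort: a cleanup failure must not mask the caller's result.
            let _ = fs::remove_dir_all(&self.dir);
        }
    }
}
```

With the guard held for the duration of `ring`, every early return (no-perch, empty-msg, timeout) runs the same cleanup without each path having to remember it.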

### 3.2 Stale signoff sentinel must not kill a fresh start
- **Failure:** a leftover `.claude/<id>-signoff.md` from a prior session is read by a fresh listener as a live signoff → immediate teardown.
- **Invariant:** on every listener/daemon spawn, sweep stale signoff sentinels; signoff files are write-once per generation.
- **spt-core mapping:** same sweep on daemon (re)start per hosted instance.
- **Sister cite:** CHANGELOG v1.11.20; `src/owl/cleanup.rs:97`.

### 3.3 Orphan teardown fires echo-commune BEFORE INIT_SIGNOFF
- **Failure:** teardown delivers INIT_SIGNOFF without first saving the final context delta → Psyche signoff lacks the context-save summary.
- **Invariant:** on orphan path, synchronously run the echo-commune (final delta) before composing INIT_SIGNOFF; skip only if the session_id is missing.
- **spt-core mapping:** the daemon runs psyche/pulse loops in-process; ordering invariant identical.
- **Sister cite:** `src/live/wrapper/orphan.rs:175-199`; tests A-H:333-565.

---

## 4. Wire / transport

### 4.1 Envelope HTML-entity codec ordering — `&amp;` decoded LAST
- **Failure:** decoding the `&amp;` entity before the others double-decodes nested entities: `&amp;lt;` (the encoding of a literal `&lt;`) becomes `&lt;` and is then decoded again to `<`.
- **Invariant:** ENCODE order amp→first … `<br>`→last; DECODE order `<br>`→first … amp→**last** (`&lt;`,`&gt;`,`&quot;`, then `&amp;`). One sole decode site (at the LLM/stdin boundary); the parser never decodes.
- **spt-core mapping:** `spt-proto` owns the envelope grammar (public SDK, semver + wire-version). This codec contract is a copy-verbatim commodity item (ADR-0001) and a public-API conformance test.
- **Sister cite:** `src/owl/poll.rs:1-73`; `src/common/envelope.rs`.
- **CR-linesafety `[REQ-HAZARD-ENVELOPE-CR-LINESAFE]`:** the EVENT is LINE-FRAMED, so the codec must neutralize raw `\r` too — `event_body_escape` folds CRLF/lone-CR to `\n` (→`<br>`) **before** framing. **Failure (field, 2026-06-08):** a cross-node `spt send` from Windows (`echo` → CRLF) carried a raw `\r` into the single-line envelope; the receiver terminal did a CR→column-0 overwrite (`</EVENT>` clobbered `<EVENT t`). `\r` was never line-representable here, so normalizing it is robustness, not an ADR-0001 wire divergence (decoder + amp-last untouched). Belt-and-suspenders: `spt send`/`ring` trim stdin like `notify`.
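
The encode and decode orders, plus the CR fold, can be sketched with plain sequential replacements. `event_body_escape` is named in the text above, but this body is only an illustration of the ordering: `event_body_unescape` is a hypothetical name, and the relative order of `&lt;`, `&gt;`, and `&quot;` between the first and last steps is an assumption.

```rust
/// Encodes a raw body into the single-line envelope form.
/// Order: CR fold, then `&` first, then the other entities, `<br>` last.
pub fn event_body_escape(raw: &str) -> String {
    // Fold CRLF and lone CR to LF so no raw `\r` reaches the line-framed event.
    let folded = raw.replace("\r\n", "\n").replace('\r', "\n");
    folded
        .replace('&', "&amp;") // amp first, so later entities are not re-escaped
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
        .replace('\n', "<br>") // <br> last, after literal `<` is already escaped
}

/// Decodes in the mirrored order: `<br>` first, `&amp;` last.
pub fn event_body_unescape(encoded: &str) -> String {
    encoded
        .replace("<br>", "\n") // <br> first
        .replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&amp;", "&") // amp last, so `&amp;lt;` yields `&lt;`, not `<`
}
```

A round trip of a body that itself contains `&lt;` or a literal `<br>` is a useful conformance case, since both only survive when `&amp;` is decoded last.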

### 4.2 Two-slice envelope parser is panic-free and tolerant
- **Failure:** malformed envelope (unclosed/misordered/nested tags) panics or drops output.
- **Invariant:** tags case-sensitive, all optional; no tags → whole body to live slot; unclosed → None for that tag; out-of-order → both still extracted; nested unknown tags preserved verbatim; zero `unwrap` on parsed text.
- **spt-core mapping:** `spt-proto` parser; property-test the robustness rules.
- **Sister cite:** `src/common/envelope.rs:64-92`; tests 99-207.

### 4.3 Registry stale-entry cleanup precedes lookup
- **Failure:** sender resolves a dead process's stale TCP port → delivery to wrong/dead listener.
- **Invariant:** clean stale entries (dead PID) before/at lookup; spool fallback is the safe path on TCP miss.
- **spt-core mapping:** now spans the **subnet registry** (ADR-0003) — eventually-consistent across nodes. Cross-node staleness is expected; resolution policy (local → most-recent → `id@node`) must degrade to spool/relay fallback on stale hits, and never hard-fail on a stale remote entry.
- **Sister cite:** `src/common/registry.rs:62-78`; `src/owl/send.rs`.

### 4.4 Deferred rows survive poll drain
- **Failure:** poll `drain_all` flushes a deferred (spool-only) message meant for a hook consumer → message lost.
- **Invariant:** deferred rows are never flushed by the event-stream drain: that drain uses only `drain_non_deferred_*`, and deferred rows are read only via `peek_all` by their intended consumer.
- **Sister cite:** CHANGELOG v1.11.20; `src/common/spool.rs`. (See also 1.4.)

### 4.5 Inbox legacy compat must not double-deliver
- **Failure:** message surfaced via both spool (durable) and legacy inbox files → duplicate or racing delivery.
- **Invariant:** spool is the sole read path at poll time; inbox is write-for-compat only and never read.
- **spt-core mapping:** clean-room — likely drop the legacy inbox entirely. If kept for any compat, preserve "never read at drain time."
- **Sister cite:** `src/common/inbox.rs`.

### 4.6 Addressable-id charset reserves the address delimiters
<!-- [doc->REQ-HAZARD-ID-CHARSET] -->
- **Failure:** a bare endpoint id that contains `:` or `@` (or a path separator / whitespace / control char) makes the canonical qualified address `[subnet:]id[@node]` (ADR-0006 / REQ-INST-10) ambiguous to parse, and lets a name smuggle into a perch directory path. Once permissive ids exist in the wild, tightening later needs a migration.
- **Invariant:** every addressable id/name is validated to `[A-Za-z0-9_-]` + Hiragana/Katakana/CJK only, length `1..=64`, **at every creation seam** (`ready` start, `api bind`, `api listen`, `api worker-start`). `:` and `@` are permanently reserved as address delimiters; reads of existing perches are never re-validated. Enforce now (pre-M3/M4) so no permissive id-data accumulates.
- **spt-core mapping:** `spt_proto::id::validate_endpoint_id`; called at the four creation seams. The existing Psyche (`<parent>-psyche`) / Worker (`<parent>-w<N>`) suffix scheme uses only `-` + alphanumerics, so composite ids validate.
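
A minimal sketch of the charset check. The function name follows the mapping above, but the signature, the `IdError` type, the exact Unicode block ranges (Hiragana U+3040..U+309F, Katakana U+30A0..U+30FF, CJK Unified Ideographs U+4E00..U+9FFF), and counting length in characters rather than bytes are all assumptions of this example.

```rust
/// Why an id was rejected (hypothetical error type for this sketch).
#[derive(Debug, PartialEq)]
pub enum IdError {
    Empty,
    TooLong,
    BadChar(char),
}

/// Allowed: ASCII alphanumerics, `_`, `-`, and the assumed CJK-family blocks.
fn is_allowed(c: char) -> bool {
    c.is_ascii_alphanumeric()
        || c == '_'
        || c == '-'
        || ('\u{3040}'..='\u{309F}').contains(&c) // Hiragana block
        || ('\u{30A0}'..='\u{30FF}').contains(&c) // Katakana block
        || ('\u{4E00}'..='\u{9FFF}').contains(&c) // CJK Unified Ideographs
}

/// Validates a bare endpoint id at a creation seam.
/// `:` and `@` fail here because they are reserved address delimiters.
pub fn validate_endpoint_id(id: &str) -> Result<(), IdError> {
    let len = id.chars().count(); // length in characters (assumption)
    if len == 0 {
        return Err(IdError::Empty);
    }
    if len > 64 {
        return Err(IdError::TooLong);
    }
    match id.chars().find(|c| !is_allowed(*c)) {
        Some(c) => Err(IdError::BadChar(c)),
        None => Ok(()),
    }
}
```

Because the check is an allow-list, path separators, whitespace, and control characters are rejected without being enumerated.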

### 4.7 Concurrent SQLite openers must not fail with "database is locked"
<!-- [doc->REQ-HAZARD-REGISTRY-CONCURRENT] -->
- **Failure:** two endpoints on one machine open the same SQLite store at once (e.g. two `ReadyAgent::start` calls registering simultaneously) and one fails outright with `SQLITE_BUSY` / "database is locked" → spurious registration/spool failure. Surfaced as a parallel-test flake in `two_agents_exchange_message_tcp_and_spool`, but the bug is real concurrency, not test-only.
- **Invariant:** `busy_timeout` is set **before** any lock-taking statement on every connection. Switching `journal_mode=WAL` takes a brief exclusive lock; with the default 0ms timeout it fails immediately under contention, so the pragma order is load-bearing: `Connection::open` → `busy_timeout` → `journal_mode=WAL` → `CREATE TABLE …`. WAL alone is insufficient (concurrent *writers* still serialize; they must *wait*, not error).
- **spt-core mapping:** `spt_store::registry::open_registry` + `spt_store::spool::open_spool_at`; both set `busy_timeout=5000` first. Any future SQLite store (history Path B, instance registry) must follow the same ordering.

### 4.8 Registry merge ordered by epoch, never wall-clock (red-team #8)
<!-- [doc->REQ-HAZARD-REGISTRY-EPOCH-LEASE] -->
- **Failure:** the per-subnet registry replicates `endpoint_id → [instances]` eventually-consistently across nodes. Under a partition or clock skew, a lagging node re-announces a stale `Active` for an endpoint that has actually gone `Offline`. If the merge ordered updates by wall-clock (or "last write wins"), the stale `Active` overwrites the newer `Offline` and resolution routes a message to a dead/wrong instance.
- **Invariant:** the merge precedence key is a **per-node monotonic epoch counter** (`spt_store::epoch::EpochSource`, persisted, strictly increasing, NEVER wall-clock), compared version-vector style per `(endpoint_id, node)`: an incoming update wins **iff its epoch is strictly greater** than the stored one for that node; equal or lower is dropped as stale. So a newer `Offline` (higher epoch) can never be clobbered by a lagging `Active` (lower epoch), and an idempotent equal-epoch replay is a no-op. Wall-clock is at most a human tiebreaker hint inside a flagged conflict, never the ordering authority. The same epoch source unifies with the D6 sync-precedence concurrent-write detection (#7).
- **spt-core mapping:** `spt_net::net::registry::SubnetRegistry::merge_instance` (the lease) + `spt_store::epoch::EpochSource` (the counter). Cross-node replication of the merge wires at D4; the merge seam is identical for local and wire-delivered updates. Chaos/two-host verification = D9.
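
The strictly-greater rule can be sketched as below. The type and method names echo the mapping above, but the fields, the `Status` enum, and the boolean return value are assumptions; the real `merge_instance` signature is not shown in this document.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Active,
    Offline,
}

/// One replicated row; `epoch` is the authoring node's monotonic counter.
#[derive(Debug, Clone, PartialEq)]
pub struct InstanceRow {
    pub epoch: u64,
    pub status: Status,
}

/// Rows keyed by `(endpoint_id, node)`, the lease granularity.
#[derive(Default)]
pub struct SubnetRegistry {
    rows: HashMap<(String, String), InstanceRow>,
}

impl SubnetRegistry {
    /// Applies `incoming` iff its epoch is strictly greater than the stored
    /// epoch for the same `(endpoint_id, node)`. Returns true when applied.
    pub fn merge_instance(&mut self, endpoint_id: &str, node: &str, incoming: InstanceRow) -> bool {
        let key = (endpoint_id.to_string(), node.to_string());
        let is_stale = self
            .rows
            .get(&key)
            .map_or(false, |stored| incoming.epoch <= stored.epoch);
        if is_stale {
            return false; // equal or lower epoch: dropped, replay is a no-op
        }
        self.rows.insert(key, incoming);
        true
    }

    pub fn get(&self, endpoint_id: &str, node: &str) -> Option<&InstanceRow> {
        self.rows.get(&(endpoint_id.to_string(), node.to_string()))
    }
}
```

No timestamp appears anywhere in the comparison, so clock skew between nodes cannot reorder updates.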

### 4.9 SQLite stores must create their parent dir — SQLite won't
<!-- [doc->REQ-HAZARD-REGISTRY-DIR-CREATE] -->
- **Failure:** `Connection::open` creates the database FILE but never its parent DIRECTORY. On a fresh home (first boot, fresh CI `_work` dir) a registry op that runs before any perch-creating op (`create_dir_all` side effects) fails `SQLITE_CANTOPEN` — "unable to open database file …owlery\.registry". Timing-dependent: whichever code path touches the home first decides the outcome, so it surfaces as a parallel-test flake (bind-first tests losing the dir-creation race to perch-first tests). Bit the hfenduleam CI leg twice (2026-06-03/04, four spt-msg unit tests at once on the second strike) before being run to ground; a slow runner filesystem (AV scanning fresh dirs) widens the window but is not the cause.
- **Invariant:** every SQLite store's open path `create_dir_all`s its parent dir itself, best-effort, before `Connection::open` — never relying on another subsystem having materialized the home first. (Mirrors the spool, which always did this; the registry didn't.)
- **spt-core mapping:** `spt_store::registry::open_registry` (`create_dir_all(owlery)` before open). `spt_store::spool::open_spool_at` already creates its perch dir. Any future SQLite store must do the same — pair this with the 4.7 pragma ordering on every new store.

### 4.10 Dead node identities leave immortal registry rows  `[REQ-HAZARD-REGISTRY-GHOST-ROWS]`
<!-- [doc->REQ-HAZARD-REGISTRY-GHOST-ROWS] -->
- **Failure:** the registry's only superseding mechanism is the per-`(endpoint_id, node)` epoch lease (4.8) — a row is replaced only by a newer row *from the same node*. When a node identity dies permanently (machine retired, or `node.key` regenerated so the "node" never speaks again), its rows are never superseded and never expire: they sit in the in-memory registries and the `identity/registry/<subnet>.json` snapshots forever. A bare-id send then resolves the same endpoint id on both the live and the dead identity and refuses with a **phantom `AcrossNodes` ambiguity** — unfixable by the user, because no qualifier reaches a node that no longer exists. Hit live in the M7 acceptance run (2026-06-06): gravity paired under two identities (09ef…, then 03854a… after its key universe flipped during the sudo experiments); the dead identity's `sergey` row made bare `spt send sergey` refuse on HFENDULEAM.
- **Invariant:** registry rows authored by a **silent** peer node decay: a node not *heard* (admitted inbound feed — the M7 D2 heard-map, REQ-SUBNET-1) within the eviction window (`registry_evict_after_ms`, default 300s ≈ 10 default pump cadences) has its rows **evicted** from every subnet registry, snapshots rewritten. Own rows never decay (the node always hears itself implicitly — it authors them each pump tick). Eviction is safe under the lease: v1 has **no transitive gossip**, so any future update for a node comes from that node itself, alive, re-inserting from its durable `EpochSource` within one cadence — there is no lagging third-party replay to mis-order against. A merely-offline node loses its rows after the window and reconverges on return; meanwhile resolution honestly reports it absent instead of poisoning bare-id sends.
- **spt-core mapping:** `spt_net::net::registry::SubnetRegistry::evict_nodes` (model) + `spt_daemon::registryhost::RegistryHost::evict_silent_peers` (heard-map TTL) driven from the registry pump tick (`peerloop`). Trust rows are NOT auto-evicted (trust is a user decision; a stale trust row only costs dead dials) — pruning those is a separate verb.
- **Source:** M7 acceptance run 2026-06-06 (DEFERRED.md "Ghost registry row eviction"); the AMBIGUOUS render fix rode along.
- **Mesh note (ADR-0017, 2026-06-08):** the subnet mesh **preserves** this invariant rather than superseding it. "No transitive gossip" sharpens to **no transitive *row* gossip** — the mesh relays only the member *roster* (discovery), while registry **rows stay own-authored and are fetched directly** from each member over a handshake. So "any future update for a node comes from that node itself, alive" still holds and the eviction lease is untouched. (The plan's rejected alternative — signed transitive *row* relay — would have broken this; roster-only relay was chosen precisely to keep it.)
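
The silent-peer decay can be sketched as a filter over rows keyed by authoring node. Everything here is illustrative: `Row`, the heard-map shape, and the millisecond timestamps are hypothetical, and evicting a node that was *never* heard is an assumption the source does not settle.

```rust
use std::collections::HashMap;

/// A registry row reduced to the two fields the eviction needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub endpoint_id: String,
    pub node: String,
}

/// Drops rows authored by peers not heard within `evict_after_ms`.
/// Rows authored by `own_node` never decay. Returns the number evicted.
pub fn evict_silent_peers(
    rows: &mut Vec<Row>,
    heard_at_ms: &HashMap<String, u64>,
    own_node: &str,
    now_ms: u64,
    evict_after_ms: u64,
) -> usize {
    let before = rows.len();
    rows.retain(|row| {
        if row.node == own_node {
            return true; // the node always hears itself
        }
        match heard_at_ms.get(&row.node) {
            Some(&heard) => now_ms.saturating_sub(heard) <= evict_after_ms,
            None => false, // never heard: treated as silent (assumption)
        }
    });
    before - rows.len()
}
```

After eviction a bare-id lookup sees only rows from nodes that are still speaking, so the dead identity can no longer produce a phantom ambiguity.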

### 4.11 Advertisement-epoch reset strands a node  `[REQ-HAZARD-EPOCH-RESET]`
<!-- [doc->REQ-HAZARD-EPOCH-RESET] -->
- **Failure:** a node whose advertisement-epoch counter resets (the durable `EpochSource` file lost/recreated) re-advertises with LOW epochs; peers hold a higher last-seen epoch for that `(endpoint, node)` lease and drop every fresh row as **stale** — the node advertises into a void until its counter outruns its own history. Nothing renders the cause: the node looks healthy locally, peers simply never update.
- **Invariant (mitigation by construction, common case):** the common trigger — a full reinstall / identity regeneration — is covered by the **re-pair trust overwrite** (M8 decision 13, REQ-SUBNET-7): a completed ceremony presenting the same label + machine id evicts the superseded identity's trust AND registry rows on the seed-holder, and the peer-side epoch memory **dies with the deleted row** — the re-paired node's fresh epochs land on a clean lease. M8 acceptance 7 verifies this explicitly (the epoch sub-check).
- **Residual (documented, guard deferred):** the narrow slice — epoch file lost while the node *identity* is kept (manual state surgery, partial restore from backup) — has no guard; it waits for a field hit before one is designed (M8 decision 24). `REQ-HAZARD-EPOCH-RESET` is minted inactive (TRACEABILITY rule 5) as the tracking hook. If hit: symptoms are one node's endpoints frozen-stale on every peer while its own views are fresh; recovery today is re-pairing the node (rides the common-case eviction above).
- **spt-core mapping:** epoch mint = `spt_store::epoch::EpochSource` (`identity/epoch.json`); the lease = the per-`(endpoint, node)` epoch compare in `spt_net::net::registry`; the eviction that clears peer-side epoch memory = `registryhost::repair_evict_superseded` + `RegistryHost::consume_repair_evictions`.
- **Source:** minted at M8 ratification (decision 24), recognized as a class during the 2026-06-07 pump diagnosis / re-pair overwrite design — not yet field-hit in its residual form.

---

## 5. Platform-specific

### 5.1 Windows PID recycling false positives
- **Failure:** recycled PID reads alive for the wrong process → orphan misclassification.
- **Invariant:** anchor liveness on the stable parent/harness PID; minimal `info.json` for supervisor-owned perches; mtime grace window (≥60s) masks transient mismatches.
- **spt-core mapping:** broker-held handles supersede PID polling for spt-hosted sessions; keep the grace window for harness-hosted.
- **Sister cite:** `src/live/wrapper/orphan.rs:141-161`; `src/common/list_filter.rs:168-175`.

### 5.2 Windows EBUSY on atomic rename
- **Failure:** `fs::rename` fails while a handle is (recently) held → registry/marketplace update fails.
- **Invariant:** tmp-write + atomic-rename with retry/backoff; best-effort side-fail; tolerate transient EBUSY.
- **spt-core mapping:** all on-disk state writes (registry, trust store, spool checkpoints) use this pattern. Self-update binary swap on Windows especially.
- **Sister cite:** CHANGELOG "EBUSY"; `src/common/owlery.rs` atomic_write.
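
A minimal sketch of tmp-write plus rename with retry and backoff, using only the standard library. The `.tmp` sibling name, the 10 ms starting delay, and retrying on every error kind are assumptions of this example; a real implementation would likely retry only on the transient sharing/busy errors.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;
use std::thread;
use std::time::Duration;

/// Writes `bytes` to a sibling temp file, then renames it over `path`,
/// retrying the rename with exponential backoff up to `max_attempts` times.
pub fn atomic_write(path: &Path, bytes: &[u8], max_attempts: u32) -> io::Result<()> {
    let tmp = path.with_extension("tmp"); // hypothetical temp naming
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?; // contents are durable before the rename
    }

    let mut delay = Duration::from_millis(10);
    let mut attempt = 1;
    loop {
        match fs::rename(&tmp, path) {
            Ok(()) => return Ok(()),
            Err(_) if attempt < max_attempts => {
                // Possibly a transient busy handle: back off and retry.
                thread::sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            Err(err) => {
                // Out of attempts: clean up the temp file, surface the error.
                let _ = fs::remove_file(&tmp);
                return Err(err);
            }
        }
    }
}
```

Readers only ever see the old file or the complete new one, which also covers the torn-read concern in 1.2.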

### 5.3 Git/subprocess timeout stamping
- **Failure:** a hung subprocess (git on slow net) blocks the supervisor indefinitely.
- **Invariant:** every metadata-producing subprocess has a timeout; timeout yields `None` + rate-limited stderr, never a hang.
- **spt-core mapping:** generalize to all manifest-declared harness invocations (delegated commands, adapter updates) — timeouts mandatory.
- **Sister cite:** `src/common/git.rs`.

### 5.4 Windows UNC prefix in serialized paths
- **Failure:** canonicalized `\\?\C:\...` serializes to `//?/C:/...` and fails `read_to_string`.
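- **Invariant:** after backslash→forward-slash conversion, strip the leading `//?/` (the converted `\\?\` prefix, which is strictly the Win32 extended-length prefix rather than a UNC share path); serialized path attrs must be directly consumable.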
- **spt-core mapping:** any path crossing the wire (file-drop EVENTs, off-node file transfer per ADR-0003) needs canonical normalization at the `spt-proto` boundary.
- **Sister cite:** `src/common/owlery.rs:377-384`.
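
The normalization is a two-step string transform. A minimal sketch with a hypothetical `serialize_path` helper; it deliberately does not handle the `\\?\UNC\server\share` form, which would need its own rule.

```rust
/// Converts backslashes to forward slashes, then strips a leading `//?/`
/// so the result can be handed straight to a file API.
pub fn serialize_path(raw: &str) -> String {
    let slashed = raw.replace('\\', "/");
    if let Some(rest) = slashed.strip_prefix("//?/") {
        return rest.to_string();
    }
    slashed
}
```

For example, `serialize_path(r"\\?\C:\Users\owl\info.json")` yields `C:/Users/owl/info.json`, while paths without the prefix only have their separators converted.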

### 5.5 ConPTY withholds output until DSR is answered  `[REQ-HAZARD-CONPTY-DSR]`
- **Failure:** a broker reading a ConPTY master sees only the 4-byte startup query `ESC [ 6 n` and then nothing — the child looks hung/silent but is producing output normally. ConPTY blocks all child stdout until the terminal answers the cursor-position query.
- **Invariant:** every ConPTY reader auto-answers DSR (`ESC [ 6 n` → write `ESC [ 1;1 R`, or a real cursor position) on the PTY writer. Secondary: a ConPTY master does not EOF while the writer is held, so read loops drain on a thread and never gate exit on a blocking `read()`.
- **spt-core mapping:** `spt-term` broker PTY reader (ADR-0004). Brand-new to spt-core — not in the sister project (it never hosted ConPTY directly).
- **Source:** Spike #1 (`docs/spikes/SPIKE-01-broker-handoff.md`); reproduced with both a Rust child and `cmd.exe`.
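
The auto-answer can be sketched as a scan over each chunk read from the PTY. `answer_dsr` is a hypothetical helper, the fixed `1;1` reply is the simpler of the two options named above, and a query split across two reads is not handled here; a real reader would keep a small carry-over buffer.

```rust
use std::io::{self, Write};

/// Cursor-position query sent by ConPTY: ESC [ 6 n.
const DSR_QUERY: &[u8] = b"\x1b[6n";
/// Fixed reply claiming row 1, column 1: ESC [ 1 ; 1 R.
const DSR_REPLY: &[u8] = b"\x1b[1;1R";

/// Writes one reply to `pty_writer` for every DSR query found in `chunk`.
/// Returns how many queries were answered.
pub fn answer_dsr<W: Write>(chunk: &[u8], pty_writer: &mut W) -> io::Result<usize> {
    // The query cannot overlap itself (ESC occurs only at its start),
    // so counting matching windows counts distinct queries.
    let queries = chunk
        .windows(DSR_QUERY.len())
        .filter(|window| *window == DSR_QUERY)
        .count();
    for _ in 0..queries {
        pty_writer.write_all(DSR_REPLY)?;
    }
    if queries > 0 {
        pty_writer.flush()?;
    }
    Ok(queries)
}
```

Calling this from the dedicated reader thread keeps the reply path independent of whatever consumes the child's output.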

<!-- [doc->REQ-HAZARD-DETACHED-PIPE-INHERIT] -->
### 5.6 Windows detached children inherit a captured caller's pipe  `[REQ-HAZARD-DETACHED-PIPE-INHERIT]`
- **Failure:** a caller captures an `spt` invocation's output through a pipe (`Command::output()`, a harness hook reading the command). That `spt` process detach-spawns a **long-lived** child (the daemon via `ensure_running`; a shell binary via `spt shell spawn`). On Windows `CreateProcess` runs with `bInheritHandles = TRUE`, and the spt process's std handles — the caller's pipe write-ends — are inheritable by construction, so the immortal child inherits them even when its *own* stdio is `Stdio::null()`. The caller's pipe read never sees EOF: the capturing caller **hangs forever** (unix is immune — pipe fds are `CLOEXEC`). Paid twice: daemon spawn (guarded at D4a-era `spawn_detached`), then again at M5-D3e when the mock-shell E2E hung `spt shell spawn` for hours.
- **Invariant:** every detach-spawn of a long-lived child runs with **`bInheritHandles = FALSE`** (`spt-daemon::daemon::detached_no_inherit`) — zero handles flow, whatever the pipe's depth in the ancestry. Stripping `HANDLE_FLAG_INHERIT` from the spawner's *std* handles is NOT sufficient: a grandparent capture's pipe sits in the handle table as a stray inheritable handle and still flows through every `bInheritHandles = TRUE` hop (the first guard shipped that way and was wedged by exactly this — a daemon spawned three layers deep held the pwsh-level pipe of the CI/test harness).
- **spt-core mapping:** `spt-daemon::daemon::spawn_detached` (the daemon) and `spt-daemon::shellhost::launch_shell` (the relay-receipt shell binary). Any future long-lived detached spawn (manifest-template children included) must use the same no-inherit spawn.
- **Source:** spt-core, M5-D3e (`shell_e2e.rs` hang, 2026-06-04, twice — once per guard generation); Rust `Command` restricts *its own* created stdio handles but a parent's inheritable handle table still flows.

### 5.7 Elevated commands spawn the daemon with the wrong token  `[REQ-HAZARD-ELEVATED-DAEMON-SPAWN]`
<!-- [doc->REQ-HAZARD-ELEVATED-DAEMON-SPAWN] -->
- **Failure:** membership-implies-reachability made *every* `spt` invocation a potential daemon spawner (`ensure_running`), including the elevation-gated ones (`subnet create`/`join`, REQ-SUBNET-4). The spawned daemon inherits the spawner's token. **Windows:** an elevated `subnet create` auto-starts an ELEVATED daemon whose named pipes deny unelevated clients — every subsequent unelevated `spt` reads "not running", tries to spawn its own daemon, and dies on bind Access-denied; the user had to taskkill (hit live, M7 acceptance 2026-06-06). **Linux:** a sudo'd command spawns a root daemon and/or root-owned state — and because sudo flips `$HOME`, the daemon can mint a *different node identity* in root's universe (the very key-flip that produced the 4.10 ghost rows).
- **Invariant:** the daemon **always runs unelevated in the invoking user's universe**, regardless of which command spawns it. Two enforcement points sharing one seam: (a) `spawn_detached` de-elevates the child — Windows: the UAC **linked token** (`TokenLinkedToken` → `DuplicateTokenEx` → `CreateProcessWithTokenW`; inherits no handles, so 5.6 holds by construction); Linux: drop to `SUDO_UID`/`SUDO_GID` with `$HOME`/`$USER`/`$LOGNAME` reset to the invoking user's (passwd lookup); (b) a `Daemon::run` entry guard catches a *directly* elevated `spt daemon` — Linux drops privileges in-process before touching any state; Windows respawns de-elevated and exits. When no unelevated identity exists to drop to (UAC disabled, genuine root login, SYSTEM), the daemon runs as-is with a loud warning — a consistent universe, never a torn one. Elevated one-shot *clients* talking to an unelevated daemon are fine (downward connects work); the daemon side is the invariant.
- **spt-core mapping:** `spt-daemon::deelevate` (the OS-split seam) consumed by `daemon::spawn_detached` + the `Daemon::run` entry guard. The fuller Linux elevation model (install symlink + default-account election) is deferred (DEFERRED.md, M8).
- **Source:** M7 acceptance run 2026-06-06 (DEFERRED.md "Non-admin daemon spawn"); interim field rule was "bring the daemon up unelevated FIRST".

<!-- [doc->REQ-HAZARD-CHILD-CONSOLE-FLASH] -->
### 5.8 Console children of the console-less daemon flash visible windows  `[REQ-HAZARD-CHILD-CONSOLE-FLASH]`
- **Failure:** the daemon runs DETACHED (no console, 5.6/`detached_no_inherit`). Any console-subsystem child it spawns (`git`, `taskkill`, manifest hook commands) gets a **fresh conhost with a visible window** — piped/null stdio does NOT prevent it. Field shape: the 60s sync pump's two git calls (`for-each-ref` + `rev-parse`) flashed two blank windows per minute on the user's desktop (2026-06-06).
- **Invariant:** every short-lived console child spawned from daemon-reachable code sets `creation_flags(0x0800_0000)` (`CREATE_NO_WINDOW`). Long-lived detached children use `detached_no_inherit` (already `DETACHED_PROCESS | CREATE_NO_WINDOW`); de-elevated spawns use `CREATE_NEW_CONSOLE + SW_HIDE` (5.7 — `CreateProcessWithTokenW` rejects `CREATE_NO_WINDOW`, error 87).
- **Test seam caveat:** window-absence is unobservable from a consoled test runner — the child inherits the runner's console and never creates a window, flag or no flag. Unit coverage asserts the flagged spawn still works (the error-87 "flag combo breaks spawn" regression class); window-absence was verified live by process-watch capture.
- **spt-core mapping:** `spt-store::gitrun::run_git` (every BranchStore/ContextStore git call), `spt-daemon::shellhost::kill_shell_pid` (taskkill), `spt-runtime::run_bounded_command` (manifest hook commands), `spt-runtime::ManifestRuntime::command_for` (the one shared builder behind `spawn_session` + `run_bounded_stdin` — the notif pump's `spawn_notif_command` and the live agent's psyche/echo/turn spawns), `spt-daemon::shellwake` (already guarded). The flag lives in each shared builder, not per call site, so the invariant holds for every ManifestRuntime spawn by construction.
- **Source:** spt-core field bug, 2026-06-06 — two blank windows flashing every 60 seconds on a desktop workstation, caught by process-spawn watcher (git.exe parent=spt daemon, conhost.exe child each).
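
A minimal sketch of the "flag lives in the shared builder" rule, using the standard library's Windows `CommandExt::creation_flags`. The `quiet_command` helper is a hypothetical name, not one of the builders listed above; it compiles on every platform and only applies the flag on Windows.

```rust
use std::process::Command;

/// Win32 `CREATE_NO_WINDOW` process-creation flag, as cited in the invariant.
pub const CREATE_NO_WINDOW: u32 = 0x0800_0000;

/// Hypothetical shared builder: every short-lived console child is built
/// here, so no individual call site can forget the flag.
pub fn quiet_command(program: &str) -> Command {
    #[allow(unused_mut)] // only mutated on Windows
    let mut cmd = Command::new(program);
    #[cfg(windows)]
    {
        use std::os::windows::process::CommandExt;
        // Suppress the fresh conhost window for this console child.
        cmd.creation_flags(CREATE_NO_WINDOW);
    }
    cmd
}
```

A caller would then write `quiet_command("git").args(["rev-parse", "HEAD"]).output()` instead of building a `Command` directly. As the test seam caveat says, a unit test can only confirm the flagged spawn still works, not that no window appeared.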

### 5.9 `Instant - Duration` underflow-panics on a freshly-booted host  `[REQ-HAZARD-INSTANT-UNDERFLOW]`
- **Failure:** `Instant::now() - Duration::from_secs(N)` panics with `overflow when subtracting duration from instant` when the process's monotonic clock is younger than `N`, i.e. the host booted less than `N` ago. The peer pump primed its cadence legs with `Instant::now() - 86_400s` to mean "everything due now"; on a Windows runner with sub-24h uptime the pump thread panicked at startup, so the subnet never converged (CI `pump_and_dispatch_self_drive_the_subnet` failed, run 27082417706). The failure is *environment-conditional* (green on any host up longer than the offset, red below it), so it slips past local dev and only bites a fresh CI box or a just-rebooted machine.
- **Invariant:** NEVER compute an instant in the past by subtracting from `Instant::now()`. Represent "never run / due now" as `Option<Instant> = None` and gate on forward `now.duration_since(past)` only (`peerloop::due`). No backward instant arithmetic anywhere in scheduling.
- **Test seam caveat:** the convergence E2E only reproduces on a sub-offset-uptime host (it passed everywhere with >24h uptime). The deterministic guard is the `due(None, ..)`/`due(Some(now), ..)` unit on the extracted gate — it asserts first-tick-due with zero instant subtraction, independent of host uptime.
- **spt-core mapping:** `spt-daemon::peerloop::due` (the sole cadence gate behind `due_reg`/`due_notif`/`due_sync`/`due_upd`); cadence legs are `Option<Instant>` seeded `None`.
- **Source:** spt-core CI failure, 2026-06-07 — Windows runner `hfenduleam` (just booted) panicked the peer pump at the v0.1.1 release gate.
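
The gate described above can be sketched in a few lines. This is an illustration under assumptions: the real `peerloop::due` signature is not shown in this document, so the parameter list here is hypothetical. The point is that "never run" is `None` and the only instant arithmetic is a forward, saturating difference.

```rust
use std::time::{Duration, Instant};

/// Is a cadence leg due? `last` is `None` until the leg has run once.
/// No instant is ever computed by subtracting from `now`.
pub fn due(last: Option<Instant>, now: Instant, every: Duration) -> bool {
    match last {
        // Never run: due immediately, independent of host uptime.
        None => true,
        // Forward difference only; saturates to zero if `past` is later than `now`.
        Some(past) => now.saturating_duration_since(past) >= every,
    }
}
```

Seeding a leg with `None` replaces the old `Instant::now() - 86_400s` priming, and the unit test for first-tick-due needs no particular uptime.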

### 5.10 `sudo spt` dead-ends on a user-local install (secure_path)  `[REQ-HAZARD-SUDO-SECURE-PATH]`
- **Failure:** the elevation-gated commands (`subnet create` / `subnet join` / `show-code`) refuse when unelevated and tell the user to "run as administrator / root". The user types the obvious `sudo spt subnet create FOO` → `sudo: spt: command not found`. `spt` is a user-local install (`~/.local/bin`, `~/.cargo/bin`), and sudo's `secure_path` (a `/etc/sudoers` default) does NOT include those dirs, so a bare command name doesn't resolve under sudo. The guidance is a trap: it names an action that cannot work for the common install shape. Field-hit on KITSUBITO at the v0.1.1 ship.
- **Invariant:** elevation guidance on Unix emits the binary's **absolute path** under sudo — `sudo /home/u/.local/bin/spt subnet create FOO` — reconstructed from `current_exe()` + the real argv and shell-quoted. An absolute program path is executed directly; `secure_path` only governs bare-name PATH lookup, so the absolute form always resolves. On an interactive Unix TTY the command auto-elevates (re-execs itself under sudo, the elevated child does the work and `main` de-elevates back); non-interactive or sudo-absent falls back to printing the runnable hint. Never emit a bare-name elevation instruction.
- **Companion UX:** the post-de-elevation `DEELEVATED: running as uid N` line is internal state-safety noise — omit it from the user-facing CLI path (it confused the same field user). The detached daemon's own de-elevation log line is fine (it lands in the daemon log, not the terminal).
- **Test seam caveat:** the sudo re-exec needs a real `sudo` + TTY (not hermetic). The deterministic guard is the pure `elevation::rerun_command` (asserts an absolute path under sudo, not a bare name, plus shell-quoting) and the `should_auto_elevate` truth table; the exec leg is manual, verified on KITSUBITO.
- **spt-core mapping:** `spt::elevation::{rerun_command, current_style, should_auto_elevate}` (pure), `spt::cli::{try_auto_elevate, with_elevation_hint}` wired into `cmd_subnet_create` / `cmd_subnet_join` / `cmd_subnet_show_code`; `spt::main` de-elevation drop silenced.
- **Source:** spt-core field report, 2026-06-07 — `reavus@KITSUBITO`, `spt` in `~/.local/bin`; the absolute-path `sudo` invocation was confirmed working before the fix landed.
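
A sketch of the pure hint builder, assuming POSIX single-quote shell quoting. The signature and the `shell_quote` helper are illustrative guesses, not the actual `elevation::rerun_command`; a caller would pass `std::env::current_exe()` and the real argv.

```rust
use std::path::Path;

/// Quote one argument for a POSIX shell. Plain tokens pass through; anything
/// else is single-quoted, with embedded single quotes written as '\''.
pub fn shell_quote(arg: &str) -> String {
    let safe = !arg.is_empty()
        && arg
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || "-_./=:@%+,".contains(c));
    if safe {
        arg.to_string()
    } else {
        format!("'{}'", arg.replace('\'', r"'\''"))
    }
}

/// Build the runnable elevation hint. Returns `None` rather than emit a
/// bare-name instruction when the executable path is not absolute.
pub fn rerun_command(exe: &Path, args: &[&str]) -> Option<String> {
    if !exe.is_absolute() {
        return None;
    }
    let mut parts = vec!["sudo".to_string(), shell_quote(&exe.to_string_lossy())];
    parts.extend(args.iter().map(|a| shell_quote(a)));
    Some(parts.join(" "))
}
```

Refusing a relative path in the builder itself makes "never emit a bare-name elevation instruction" a property the unit test can assert directly.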

---

## 6. Documented regressions (non-obvious invariants)

### 6.1 No flat/nested perch siblings; resolver-routed paths
- **Failure:** mixed flat + nested perch layouts confuse which perch is live; cascade-wipe risk.
- **Invariant:** one path resolver; never create divergent siblings.
- **spt-core mapping:** clean greenfield layout from day one (no migration window) — pick one structure, route everything through the registry. Storage layout deferred to design phase but this single-source-of-truth rule is binding.
- **Sister cite:** `src/common/perch_path.rs`; CHANGELOG Phase 25.4.

### 6.2 Soft-cleanup preserves state, removes `ready`
- **Failure:** hard-deleting a perch on cleanup loses spool (incl. stored signoff) needed for offline recovery.
- **Invariant:** soft-stop removes only the `ready`/online marker; preserves info + spool + dir. Hard-delete only on explicit operator action.
- **spt-core mapping:** instance offline-state recovery depends on this; carries to the daemon's stop path.
- **Sister cite:** `src/owl/stop.rs`.

### 6.3 Cascade-wipe guard: never delete a parent hosting non-empty children
- **Failure:** `doctor --fix` deletes a top-level perch that still hosts in-flight nested Worker/Psyche perches.
- **Invariant:** before hard-delete, check for non-empty nested children; if present, soft-clean only and surface the path.
- **spt-core mapping:** any destructive maintenance command must check for live child instances first.
- **Sister cite:** CHANGELOG v1.11.20 Phase 35.1.
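
The guard can be sketched against a plain directory tree. The layout is an assumption made for illustration (nested children as subdirectories of the perch, the online marker as a file named `ready`), and `guarded_delete` is a hypothetical name; spt-core's actual storage layout is deferred to the design phase.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Subdirectories of `parent` that still contain at least one entry.
pub fn non_empty_children(parent: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(parent)? {
        let path = entry?.path();
        if path.is_dir() && fs::read_dir(&path)?.next().is_some() {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}

#[derive(Debug, PartialEq, Eq)]
pub enum CleanupOutcome {
    HardDeleted,
    /// Parent kept; the blocking child paths are surfaced to the operator.
    SoftCleaned { blocking: Vec<PathBuf> },
}

/// Hard-delete `perch` only when it hosts no non-empty children; otherwise
/// remove just the `ready` marker (the soft-clean of 6.2) and report why.
pub fn guarded_delete(perch: &Path) -> io::Result<CleanupOutcome> {
    let blocking = non_empty_children(perch)?;
    if blocking.is_empty() {
        fs::remove_dir_all(perch)?;
        return Ok(CleanupOutcome::HardDeleted);
    }
    match fs::remove_file(perch.join("ready")) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    Ok(CleanupOutcome::SoftCleaned { blocking })
}
```

The check and the delete are not atomic here, so a real implementation would also need to stop a child from appearing between the two steps.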

### 6.4 Drop files are single-writer (supervisor-owned), read-only for the mind
- **Failure:** the Psyche LLM deletes a commune drop file the wrapper is concurrently reading → race, lost commune.
- **Invariant:** drop files (`<id>-commune.md`) are supervisor-owned single-writer; the LLM is read-only; only the supervisor (or explicit operator) deletes them.
- **spt-core mapping:** the daemon is the single writer; the harness-invoked mind never mutates drop files. Bake the ownership into the runtime contract.
- **Sister cite:** CHANGELOG v1.11.7; `src/live/context.rs:615` (removed delete).

### 6.5 Direct-write precedence guard against stale LLM overwrites
- **Failure:** the LLM emits an older snapshot that clobbers a fresher direct write to a context file.
- **Invariant:** every context write carries a source+timestamp precedence marker; LLM writes within a protection window after a recent direct write are suppressed (logged); direct writes always proceed.
- **spt-core mapping:** cross-node Psyche sync (ADR-0003) makes this multi-writer across machines — the precedence marker must include node identity **plus a per-node version vector** (entries from each node's monotonic `EpochSource`; wall-clock never orders). Distributed rule (ADR-0013, M4-D6): dominate→accept, dominated→drop, **concurrent→surface as durable replicated conflict artifacts + Psyche-reconcile on the active instance's node — never silent newest-wins, never lose either version**. The freshness rule (newest-and-newer-than-mine, per the cross-instance context-freshness feature) is the same guard read as vector dominance. Highest-value carryover for the sync design.
- **Sister cite:** CHANGELOG v1.11.6; `src/owl/echo_commune.rs`.

### 6.6 Surfaced context conflicts preserve both versions until dominated
- **Failure:** a cross-node concurrent context write (version vectors, neither dominates) gets auto-picked or partially dropped — half a mind silently lost.
- **Invariant:** a surfaced concurrent pair is durably preserved (both versions) until a strictly dominating write clears it; no merge/reconcile failure path may discard an unmerged version. Resolution is the Psyche reconcile turn (ADR-0013), whose merged write `join(vA,vB)+bump` dominates both parents — only that dominance clears the artifacts.
- **spt-core mapping:** `ContextStore::record_conflict` (tracked `.conflicts/` artifacts, content-hash named, idempotent, replicate like context) / `list_conflicts` / `clear_conflicts` (dominating-write-only). Local working file stays untouched while a conflict is pending.
- **Origin:** ADR-0013 design invariant (red-team #7's "wall-clock loses concurrent writes", closed in M4-D6), not a sister bug; registered ahead of the wire path so D6c is born conformant.
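
The dominance rule in 6.5 and the `join(vA,vB)+bump` resolution in 6.6 are standard version-vector operations. The sketch below shows them in isolation; the `VersionVector` alias and function names are hypothetical, and the counters stand in for each node's monotonic `EpochSource`.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// node id -> that node's monotonic counter (a missing entry counts as 0).
pub type VersionVector = BTreeMap<String, u64>;

/// `Some(Greater)`: `a` dominates (accept). `Some(Less)`: `a` is dominated
/// (drop). `Some(Equal)`: identical. `None`: concurrent (surface a conflict).
pub fn compare(a: &VersionVector, b: &VersionVector) -> Option<Ordering> {
    let mut a_ahead = false;
    let mut b_ahead = false;
    for node in a.keys().chain(b.keys()) {
        let x = a.get(node).copied().unwrap_or(0);
        let y = b.get(node).copied().unwrap_or(0);
        if x > y {
            a_ahead = true;
        }
        if y > x {
            b_ahead = true;
        }
    }
    match (a_ahead, b_ahead) {
        (false, false) => Some(Ordering::Equal),
        (true, false) => Some(Ordering::Greater),
        (false, true) => Some(Ordering::Less),
        (true, true) => None,
    }
}

/// Element-wise max of both parents, then bump the reconciling node's own
/// entry, so the merged write strictly dominates both.
pub fn join_and_bump(a: &VersionVector, b: &VersionVector, node: &str) -> VersionVector {
    let mut merged = a.clone();
    for (n, &v) in b {
        let slot = merged.entry(n.clone()).or_insert(0);
        *slot = (*slot).max(v);
    }
    *merged.entry(node.to_string()).or_insert(0) += 1;
    merged
}
```

Because the merged vector is strictly greater than each parent, a receiver that applies `compare` sees the reconcile write as dominating and may clear the conflict artifacts; nothing short of that clears them.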

### 6.7 Broker and brain MUST be separate processes (in-process collapse silently breaks no-endpoint-drop update) `[REQ-HAZARD-BROKER-PROCESS-ISOLATION]`
- **Failure:** the daemon hosts the broker as a background *thread* in the single `spt daemon` process (`daemon.rs:165-170`, `Arc<Broker>` + `thread::spawn(serve)`) instead of a separate process. A brain restart onto a swapped binary then cannot happen without killing the broker thread — closing every PTY, orphaning every harness child, dropping every socket. So `spt update apply` degrades to an in-process `Brain::handoff` no-op: the binary swaps on disk but the running daemon keeps executing the old code until an unrelated restart/logon. The no-endpoint-drop self-update pillar (REQ-UPD-3, ADR-0004) is silently unrealized. Observed live 2026-06-09: `enlyzeam` ran 0.3.0 with 0.3.2 on disk for ~a day, still reproducing the bug the update fixed.
- **Invariant:** the broker runs as its own long-lived process that survives every brain restart; the brain restarts onto the new binary and re-attaches via the versioned IPC. A routine (brain-only) update must leave every hosted endpoint untouched at the *process* level — not merely re-subscribe a brain within the same process. The evidence for REQ-UPD-3 / REQ-DAEMON-2 must prove process-level survival (a PTY child + a live QUIC conn survive a brain-process restart onto a swapped binary — SPIKE-01/03 productionized as `int`), NOT the in-process handoff shape that masks this regression.
- **spt-core mapping:** restoration is ADR-0018 (next milestone). The current `int` tags on REQ-DAEMON-2 / REQ-UPD-3 are regression-masked and re-point at restoration; the broker becomes the always-up per-machine anchor (seed-lock + liveness + brain supervisor). Two-process supervision, generation custody, durable-deadline loop timing, broker-cursor-of-record, and readiness-gated auto-rollback all hang off this.
- **Origin:** unintended spec/impl drift from ADR-0004 (the broker *process* was specced + spiked but built in-process), discovered during the v0.3.2 fleet update verify. Full audit + decisions: `docs/BROKER-BRAIN-SPLIT-RESTORATION.md` (verified) + ADR-0018.
<!-- [doc->REQ-HAZARD-BROKER-PROCESS-ISOLATION] -->
- **D1 (restoration skeleton, ADR-0018 Q2/Q3):** the process boundary is restored — `spt daemon run` is the broker process and spawns a supervised `spt daemon brain` child (`brainproc.rs`); the broker survives the brain dying and respawns it (proven in production topology by `crates/spt/tests/brain_split.rs`). The logic loops still run broker-side (D2 migrates them); the `int` process-level survival E2E + the in-process re-point land at D7.

### 6.8 No irreversible durable-state migration before update ready-promotion `[REQ-HAZARD-ROLLBACK-STATE-COMPAT]`
- **Failure:** the readiness-gated auto-rollback (ADR-0018 Q7) spawns the *previous* binary against durable state the *new* brain already wrote. The first release that migrates a durable-state schema in place would make the old binary unable to read it — silently bricking rollback exactly when it is needed (a logic-bricking update that can no longer fall back).
- **Invariant:** a brain must not irreversibly migrate durable state before it is ready-promoted; equivalently, every pre-ready write must remain readable by the N-1 brain. Schema migrations are gated behind ready-promotion (or written in an N-1-tolerant additive form).
- **spt-core mapping:** lands with ADR-0018's auto-rollback. Free to assert now (a 2026-06-09 source audit confirmed zero state-migration code exists); unmintable retroactively once a migration ships.
- **Origin:** verification amendment `[V1]` (agent `doyle`) on `docs/BROKER-BRAIN-SPLIT-RESTORATION.md`.
<!-- [doc->REQ-HAZARD-ROLLBACK-STATE-COMPAT] -->

---

## 7. Boundary & delivery integrity (added 2026-05-31 — Stage A red-team)

These were absent from the sister-project harvest; codex surfaced them as load-bearing gaps for spt-core's new daemon/network surface.

### 7.1 Local `api` mutation auth  `[REQ-HAZARD-LOCAL-API-AUTH]`
- **Failure:** any local process calls `spt api bind|state|session-end|history-log|poll` and binds, ends, injects, or spoofs the state of an endpoint it does not own. Local untrusted processes are explicitly in scope (shells, third-party adapters).
- **Invariant:** every `api` *mutation* is authenticated to an endpoint/session — per-endpoint link token or OS-credential binding. An unrelated local process cannot bind, inject, end, or spoof state.
- **spt-core mapping:** the `api` subcommand surface (PRD R-API-*) + broker IPC.
- **Source:** codex Stage A #13 (`docs/reviews/STAGE-A-codex-redteam.md`).

### 7.2 Idempotent delivery across brain restart  `[REQ-HAZARD-RESTART-IDEMPOTENT]`
- **Failure:** broker queues, spool rows, PTY injection, remote streams, and file transfers cross the broker↔brain boundary; a brain restart duplicates or drops a side effect.
- **Invariant:** every side effect crossing the boundary carries a durable ID + replay rule; replay after restart is exactly-once / idempotent. Crash the brain before/after spool write, before/after PTY write, mid-transfer, mid-registry update — no dup, no drop.
- **spt-core mapping:** broker↔brain IPC (ADR-0004), self-update handoff.
- **Source:** codex Stage A #14.
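
One common shape for "durable ID + replay rule" is receiver-side dedup: the sender replays everything unacknowledged after a restart, and the receiver applies each ID at most once. The sketch below is an in-memory illustration only; `Effect` and `Inbox` are hypothetical names, and the `applied` set stands in for storage that would have to be durable and updated atomically with the effect itself.

```rust
use std::collections::BTreeSet;

/// A side effect crossing the boundary, tagged with a durable ID.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Effect {
    pub id: u64,
    pub payload: String,
}

/// Receiver state. In a real system `applied` must survive restarts.
#[derive(Default)]
pub struct Inbox {
    applied: BTreeSet<u64>,
    pub delivered: Vec<String>,
}

impl Inbox {
    /// Apply `effect` unless its ID was already applied.
    /// Returns `true` if applied now, `false` for a replayed duplicate.
    pub fn deliver(&mut self, effect: &Effect) -> bool {
        if !self.applied.insert(effect.id) {
            return false;
        }
        self.delivered.push(effect.payload.clone());
        true
    }
}
```

With this shape, "crash before or after the write" reduces to replaying the outbox: duplicates are dropped by ID and missing effects are applied on the replay.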

### 7.3 Psyche outbound capture + sanitization  `[REQ-HAZARD-PSYCHE-OUTBOUND-PROXY]`
- **Failure:** the Psyche's sole outbound channel is its **stdout** (`<EVENT type="reply|notify">` intents — ADR-0012). Two ways to break it: (a) a **null-stdout / detached** live-Psyche driver silently **discards every reply and notify**; (b) the daemon relays a Psyche-supplied `from=`/target **unchanged**, letting a sandboxed Psyche **spoof identity** or address arbitrary endpoints.
- **Invariant:** the live-Psyche turn driver **MUST capture stdout** (a bounded, stdin-fed, stdout-captured invocation — **never** `Stdio::null()`); the daemon **MUST strip** every Psyche-supplied `from=`/target/routing attribute, **re-stamp `from=<self_id>`**, and **constrain routing** — `reply` → the `__REPLY_TO__` sender only, `notify` → the agent's own user/subnet only. Body validated per 4.1.
- **spt-core mapping:** the live-Psyche turn driver + daemon outbound relay (ADR-0012); `spt-proto::event` type taxonomy (`+reply`/`+notify`). The interim `runtime::spawn_session` `Stdio::null()` path is **not** the live-Psyche driver.
- **Source:** grill-with-docs 2026-06-03 (uncovered by prior design passes); sister `src/live/wrapper/claude.rs` (sandbox `["Read","Write","Edit"]` + `parse_markers`).
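
The strip-and-re-stamp half of the invariant can be sketched as a pure function. The types below are hypothetical stand-ins, not the `spt-proto::event` taxonomy; "notify to the agent's own user/subnet" is simplified to a single `own_user` target, and body validation (4.1) is omitted.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Intent {
    Reply,
    Notify,
}

/// An event parsed from Psyche stdout. `from` and `target` are untrusted.
pub struct PsycheEvent {
    pub kind: Intent,
    pub from: Option<String>,
    pub target: Option<String>,
    pub body: String,
}

/// Daemon-side facts about the current turn.
pub struct TurnContext<'a> {
    pub self_id: &'a str,
    /// The sender this turn is answering, if any.
    pub reply_to: Option<&'a str>,
    pub own_user: &'a str,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Outbound {
    pub from: String,
    pub to: String,
    pub body: String,
}

/// Discard Psyche-supplied routing, re-stamp `from`, and constrain the target.
/// A `reply` with no known sender is dropped rather than rerouted.
pub fn restamp(ev: PsycheEvent, ctx: &TurnContext) -> Option<Outbound> {
    // `from` and `target` are dropped here without ever being read.
    let PsycheEvent { kind, body, .. } = ev;
    let to = match kind {
        Intent::Reply => ctx.reply_to?.to_string(),
        Intent::Notify => ctx.own_user.to_string(),
    };
    Some(Outbound { from: ctx.self_id.to_string(), to, body })
}
```

Since the function never reads the supplied attributes, a spoofed `from=` or target cannot influence the relayed envelope.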

### 7.4 Per-agent pulse/psyche/echo scheduling must not serialize across agents  `[REQ-HAZARD-DAEMON-SCHED-NONBLOCKING]`
- **Failure:** echo-commune (`run_bounded_stdin`) and the live-Psyche turn driver (D7.5) are **bounded LLM calls that block their calling thread** until the child answers or the timeout fires. Today each agent drives its own pulse from its **own process** (`spt/src/api/{live,startup}.rs`), so blocking is isolated. When the daemon hosts **N per-agent loops** (ADR-0004 target; the `run_pulse_loop` fan-out is currently def+test only), a **single serial driver** that calls these invocations inline lets one agent's slow/hung LLM call **stall every other agent's heartbeat, commune, and reply** — the per-call timeout bounds the worst case *only* when the work is isolated; serial makes the stalls additive.
- **Invariant:** each agent's bounded LLM-bearing work (echo-commune summarizer, Psyche turn) runs on its **own thread / off the shared scheduler** — no single-threaded driver iterating all agents may call a blocking invocation inline. One agent's slow/timed-out call must not delay another agent's next tick beyond tolerance.
- **spt-core mapping:** the daemon's multi-agent pulse/psyche hosting (ADR-0004, "all Psyche/pulse loops" consolidated); `run_pulse_loop` fan-out; echo-commune + the D7.5 Psyche driver.
- **Source:** grill-with-docs 2026-06-03 (forward invariant — the multi-agent fan-out is not yet wired). Distinct from ADR-0002 SERIOUS #6 (crash blast-radius, not scheduling latency).
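
A minimal sketch of "own thread per agent" using only the standard library. `fan_out` and `BlockingCall` are hypothetical names, and the closures stand in for the bounded LLM invocations; the per-call timeout is assumed to live inside each call and is not shown.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// A blocking, bounded unit of per-agent work (e.g. one LLM-bearing call).
pub type BlockingCall = Box<dyn FnOnce() -> String + Send + 'static>;

/// Run each agent's call on its own thread and report `(agent, result)` on
/// one channel. No call runs inline on the caller, so a slow agent cannot
/// delay another agent's result.
pub fn fan_out(agents: Vec<(String, BlockingCall)>) -> Receiver<(String, String)> {
    let (tx, rx) = mpsc::channel();
    for (agent, call) in agents {
        let tx = tx.clone();
        thread::spawn(move || {
            let result = call();
            // The receiver may already be gone; ignoring the error is fine.
            let _ = tx.send((agent, result));
        });
    }
    // The original sender drops here, so the channel closes once every
    // worker has finished.
    rx
}
```

A scheduler loop would drain the receiver with `recv_timeout` and keep ticking other agents in the meantime, rather than calling any agent's invocation directly.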

### 7.5 WAN-inbound origin is transport truth, never payload  `[REQ-HAZARD-WAN-ORIGIN-AUTH]`
<!-- [doc->REQ-HAZARD-WAN-ORIGIN-AUTH] -->
- **Failure:** the ADR-0009 access whitelist gates **unsolicited wire inbound by origin node**. If the gate's subject is read from record bytes (an `origin_node`/`from`/`node` field a sender wrote), any sender forges any origin and the whitelist is decoration — same spoof class as 7.3's Psyche-supplied `from=`, now on the cross-node surface.
- **Invariant:** the origin the gate (and detection/UX — "node X is driving") consumes is the **QUIC handshake-proven remote node id** (iroh `EndpointId` == Ed25519 node pubkey, the REQ-NET-1 identity binding) read from the **broker's conn/stream table** (`NetStreamInfo::remote_id_hex`), never from payload. Wire records carry **no origin field by design**; a forged one decodes as an ignored unknown field and influences nothing. `from` inside a record is reply-routing metadata only — never an authorization subject.
- **spt-core mapping:** `spt_net::net::wanmsg::WanMessage` (no origin field) + `spt_daemon::wan::receive_wan` (origin parameter sourced from the stream table) + every future wire-inbound consumer (D5b attach, D5c transfer, D8 notifs).
- **Source:** M4-D5a design (ADR-0009 consequence: the gate must bind to the transport identity).
- **Mesh note (ADR-0017, 2026-06-08):** preserved verbatim. The mesh never relays third-party rows, so a record's author is **always** the QUIC-handshake-proven connection origin. The new **membership proof** (seed-proof) is itself channel-bound to both handshake-proven pubkeys — the authorization subject stays transport truth, never payload.
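
The gate's shape can be sketched so that forging an origin is impossible by construction: the record type has no origin field, and the authorization subject is taken only from the connection. `StreamInfo`, `WanRecord`, and `admit` are hypothetical stand-ins for illustration, not the `spt_net` / `spt_daemon` types.

```rust
use std::collections::HashSet;

/// Transport-side facts about a stream. `remote_id_hex` is assumed to be the
/// handshake-proven node id, filled in by the connection layer.
pub struct StreamInfo {
    pub remote_id_hex: String,
}

/// A decoded wire record. There is deliberately no origin field.
pub struct WanRecord {
    /// Reply-routing metadata only; never an authorization subject.
    pub from: String,
    pub body: String,
}

/// Admit or reject unsolicited inbound. The decision reads the stream's
/// identity and nothing from the record. Returns the proven origin on success.
pub fn admit<'a>(
    stream: &'a StreamInfo,
    whitelist: &HashSet<String>,
    _record: &WanRecord,
) -> Option<&'a str> {
    if whitelist.contains(&stream.remote_id_hex) {
        Some(stream.remote_id_hex.as_str())
    } else {
        None
    }
}
```

Downstream consumers (detection, "node X is driving" UX) would take the returned origin, so they also never see a payload-supplied identity.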

---

## Conformance checklist (condensed)

| # | Invariant | spt-core surface |
|---|---|---|
| 1.1 | Grace wait precedes INIT_SIGNOFF | daemon teardown |
| 1.4/4.4 | Deferred rows excluded from event-stream drain | daemon spool drain |
| 2.1/5.1 | Stable PID/broker-handle over ephemeral PID | liveness detection |
| 2.3 | Handoff argv/IPC version-tolerant (newer brain ↔ older broker) | broker↔brain IPC, self-update |
| 2.4 | gen_start = now() on cold-start + handoff | per-instance generation |
| 3.1 | Ephemeral perch cleanup on all exit paths | `ring` (RAII guard) |
| 3.3 | Echo-commune before INIT_SIGNOFF | daemon psyche loop |
| 4.1 | Envelope decode order, `&amp;` last | spt-proto (public, wire-versioned) |
| 4.2 | Parser panic-free + tolerant | spt-proto |
| 4.3 | Stale registry entries → fallback, never hard-fail | subnet registry resolution |
| 4.6 | Addressable-id charset reserves `:`/`@` delimiters | `spt_proto::id` at creation seams |
| 4.10 | Silent-node registry rows evicted (heard-map TTL); own rows never decay | registry pump eviction |
| 5.2 | tmp-write + atomic-rename + retry (EBUSY) | all state writes, binary swap |
| 5.3 | Timeout every harness subprocess | manifest invocations |
| 5.4 | Strip UNC prefix on serialized paths | spt-proto path normalization |
| 5.5 | ConPTY reader answers DSR (`ESC[6n`) | spt-term broker PTY reader |
| 5.6 | Detached long-lived children spawn `bInheritHandles=FALSE` | daemon + shell spawn |
| 5.7 | Daemon always unelevated in the invoker's universe (de-elevated spawn + entry guard) | `deelevate` seam, `spawn_detached`, `Daemon::run` |
| 5.8 | `CREATE_NO_WINDOW` on every console child of the console-less daemon | `gitrun::run_git`, `kill_shell_pid`, `run_bounded_command`, `ManifestRuntime::command_for`, `shellwake` |
| 5.9 | No backward `Instant` arithmetic; "never run / due now" is `Option<Instant> = None` | `peerloop::due` cadence gate |
| 5.10 | Unix elevation guidance emits the absolute binary path under sudo, never a bare name | `elevation::rerun_command`, `try_auto_elevate`, `with_elevation_hint` |
| 6.1 | Single path/registry source of truth | storage layout |
| 6.4 | Drop files supervisor-owned single-writer | runtime contract |
| 6.5 | Direct-write precedence marker (+ node id + per-node version vector) | cross-node Psyche sync |
| 6.6 | Surfaced conflicts preserve both versions until dominated | context conflict artifacts (ADR-0013) |
| 6.7 | Broker + brain are separate processes (brain restart never drops a hosted endpoint at the process level) | daemon process topology, self-update (ADR-0018) |
| 6.8 | No irreversible durable-state migration before update ready-promotion (pre-ready writes stay N-1-readable) | auto-rollback / durable-state schema (ADR-0018) |
| 7.1 | Local `api` mutation authenticated to endpoint | api surface / broker IPC |
| 7.2 | Idempotent delivery across brain restart | broker↔brain IPC |
| 7.3 | Psyche outbound captured + `from=`/target stripped + reply-to-sender / notify-to-own-user | live-Psyche driver / daemon relay (ADR-0012) |
| 7.4 | Per-agent pulse/psyche/echo runs off the shared scheduler (no serial blocking across agents) | daemon multi-agent hosting (ADR-0004) |
| 7.5 | WAN-inbound origin = QUIC handshake identity from the broker's stream table, never payload bytes | wan receive funnel + every wire-inbound consumer (ADR-0009) |
