---
name: v012-lifecycle-milestone
description: "v0.12.0 spt-hosted lifecycle & liveness reconciliation milestone (operator-mandated blocking, doyle dispatch) — 7 REQs, wave-by-wave, B2 keystone built"
metadata: 
  node_type: memory
  type: project
  originSessionId: bf7260cd-fc4e-463b-9c46-b2ed704b4bab
---

**v0.12.0 spt-hosted LIFECYCLE & LIVENESS RECONCILIATION** — operator-mandated BLOCKING (endpoint run unusable: attach hangs, daemon stop doesn't stop). doyle dispatched 2026-06-17 (2-agent diagnosis). Branch **v0.12.0-lifecycle** off main@27e0619 (v0.11.0). todlando builds keystone-first in waves, doyle gates PER-WAVE, then deployah → v0.12.0 MINOR.

**UNIFYING ROOT:** info.json `status=online` is a ONE-WAY LATCH: set at establish (startup.rs:361/468) and NEVER reconciled against real liveness (liveness.rs:80-93 is_perch_alive returns ONLINE for daemon-hosted perches). Every symptom falls out of this latch plus a stop/restart lifecycle that neither tears down nor rehydrates.

**v0.12.0 PUBLISHED 2026-06-18 (counter 25, MINOR, no retag).** Hashes (signed rel-primary-2026): linux `0458b5ae94a16207a61fddcfb10d1f087c95ea129c253d2af57454b8ca4df785` · win `6699be3d4256ccc64ab34ecb51bae6fe56aa4e4fcc80019db969f6176de3b6cd`. https://github.com/SaberMage/spt-releases/releases/tag/v0.12.0 . Path: PR#25 merge c19f6a1 (incl. reap-fix 6de905b) → bump c17550d (CHANGELOG = doyle VET PASS body verbatim; Cargo.lock: 11 first-party only) → main CI Windows green + Linux rerun green → tag → sign+publish. Linux flaked once on `broker::spawn_env_reaches_child`: a PTY output-vs-exit race (out="" means no output was captured, NOT an env drop, and NOT perri's F-013 class). Non-blocking follow-up: drain OUTPUT after KIND_EXIT in that test.

**doyle informed perri (delivered): lifecycle unblocks her /sptc:live (brain-restart no-wedge/no-dup, attach no-hang, daemon-stop reaps, no phantom revival, status-reachability with RELAY EXEMPT so her relay agents aren't wrongly offlined); F-013 NO_PERCH was v0.11.0, not this; her FINDING A pointer-adapter gap is still queued separately.** deployah publisher leg 19× clean (counter 7-25); 7 releases this revive session, v0.8.3→v0.12.0. **MILESTONE COMPLETE.**

ORIGINAL IN-FLIGHT NOTES below. deployah pushed the stack and opened PR#25. First CI RED on Linux kitsubito was a TEST-ONLY cross-platform bug (NOT my production code): the dup-reap unit hardcoded prog="sh", but Ubuntu's /bin/sh is a symlink to dash → exe_basename="dash" → basename gate failed. FIX @6de905b (doyle-dispatched, code-read PASS, ZERO prod change): (1) the orphan_reap test derives the expected prog from the psyche's OWN live exe_basename AFTER the readiness poll (not hardcoded sh/cmd), isolating the test to the ID-SPECIFIC cmdline gate; (2) harden the unix marker spawn as `sh -c 'sleep 30; : <marker>'` (two commands with a trailing builtin `:`, so the shell stays resident, there is no tail-exec replace, and /proc/cmdline keeps the marker) in BOTH the orphan_reap and proc.rs marker units; Windows spawns unchanged.
Earlier Windows red = global C: disk-full (1.85TB outside workspace), cleared by operator reclaim → both green this run. **LESSON: this Git Bash has NO `jq` — a CI Monitor using external `jq` silently never settles; use `gh pr view --json X --jq '...'` (gh's BUILT-IN jq) or `gh pr checks`.**
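The two halves of the @6de905b test fix (marker spawn that keeps the shell resident, and deriving the expected prog from the live exe instead of hardcoding `sh`) can be sketched as below. This is a Linux-only sketch under stated assumptions: `marker_script`, `spawn_marker`, `read_cmdline`, `exe_basename` and `wait_for_marker` are hypothetical helper names, not the spt_store::proc API, and the 10 ms poll slice is arbitrary.

```rust
use std::process::{Child, Command};
use std::time::{Duration, Instant};

/// Hypothetical helper: two commands with a trailing `:` builtin, so the shell
/// must stay resident to run it instead of exec'ing a lone `sleep` in its place.
fn marker_script(marker: &str) -> String {
    format!("sleep 30; : {marker}")
}

/// Spawns `sh -c 'sleep 30; : <marker>'`; sh's own argv keeps the marker.
fn spawn_marker(marker: &str) -> std::io::Result<Child> {
    Command::new("sh").arg("-c").arg(marker_script(marker)).spawn()
}

/// Reads /proc/<pid>/cmdline and joins the NUL-separated args with spaces.
/// Any failure yields None so callers can fail safe.
fn read_cmdline(pid: u32) -> Option<String> {
    let raw = std::fs::read(format!("/proc/{pid}/cmdline")).ok()?;
    let args: Vec<String> = raw
        .split(|b| *b == 0)
        .filter(|a| !a.is_empty())
        .map(|a| String::from_utf8_lossy(a).into_owned())
        .collect();
    Some(args.join(" "))
}

/// Basename of the live executable, e.g. "dash" when /bin/sh links to dash.
/// Deriving the expected prog from this avoids hardcoding "sh".
fn exe_basename(pid: u32) -> Option<String> {
    let exe = std::fs::read_link(format!("/proc/{pid}/exe")).ok()?;
    Some(exe.file_name()?.to_string_lossy().into_owned())
}

/// Readiness poll: the marker may not be visible until the child has exec'd.
fn wait_for_marker(pid: u32, marker: &str, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if read_cmdline(pid).is_some_and(|c| c.contains(marker)) {
            return true;
        }
        std::thread::sleep(Duration::from_millis(10));
    }
    false
}
```

In a test shaped like this, the value from `exe_basename(pid)` (taken after `wait_for_marker` succeeds) is what would be compared against the reap gate's prog, so the test exercises only the ID-specific cmdline check.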

**v0.12.0 STATE @464eaf7 (now @6de905b on PR#25) — ALL ITEMS GATE-PASS, READY TO RELEASE:** mandated-7 (B2·B1·H3·B3·Breap·B5, **B4 closed-SUBSUMED**) + psyche-reap + dup-reap + endpoint-purge @981dea2 + **ready-agent-resume @464eaf7 = doyle GATE-PASS 2026-06-18** (independently: read diff, confirmed append-dedup L99-100, ran new unit + int + positive anchor live_bind_firsthost in target-seam all green, traceable EXIT=0, FIX(2) SUBSUMED state-gate verified-at-code; int non-vacuous — diag shows carried session sess-resume-carry-42, state=ready_agent, hosted=0). todlando told CLEAR. **NEXT: drive deployah → v0.12.0 MINOR release, AWAITING operator GO** (publish is outward-facing/irreversible, all prior releases operator-gated; stack 11 ahead of main, unpushed). REQ-PICKER-ADAPTER-DESCRIPTION = deferred fast-follow (post-v0.12.0).
- **REQ-READY-AGENT-RESUME** @464eaf7 [doc,impl,unit,int]: offline ReadyAgent now shows in `endpoint run` picker Resume-from-history. ROOT: harness-hosted ready bind ReadyAgent::start_homed wrote info.json but never ledgered → 0 session rows → picker (gates on ledger rows) skipped it. **FIX(1)** start_homed appends Boot row on bind (mirror establish_perch:250, !sid.is_empty, best-effort). **FIX(2)=SUBSUMED (verified-at-code B4-style, ZERO code):** no-psyche discriminator for a ready endpoint is the livehost reconcile START-side STATE gate (livehost.rs:319 `info.state != LIVE_AGENT_STATE → continue`, skips a ready_agent perch) AHEAD of the adapter(325)/psyche_init(343) gates; cmd_endpoint_run type-agnostic; --resume carries session via {session_id}/[env] fill. unit start_ledgers_a_boot_session_row. int ready_bind_ledgers_and_reconcile_hosts_no_psyche (real `spt ready` bind w/ OWL_SESSION_ID ledgers carried session + reconcile_once hosts NO psyche EVEN with a live-capable psyche_init adapter resolved = inversion of live_bind_firsthost). NO `spt` subcommand → no xtask gen needed. **GOTCHA: targeted `cargo test -p spt --test contract_e2e` shows 2 FAIL (`mock_bin.exists()`) — contract_e2e needs the `mock-session` helper which lives in the `mock-adapter` PACKAGE, not spt; targeted --test won't build it. Pre-build `cargo build -p mock-adapter --bin mock-session`. NOT a regression.** Seam bar all green (clippy workspace, traceable EXIT=0, new unit+int, poll_envelope/contract/quickstart/live_bind_firsthost siblings).
- **REQ-ENDPOINT-PURGE** @981dea2 [doc(doyle@d8e7761),impl,unit,int]: `spt endpoint purge <id>` — EndpointCmd::Purge{id,--yes,--force} + cmd_endpoint_purge. Offline-only gate (--force=cmd_stop→bounded-wait-reconcile-unhost+psyche-reap→purge), SPT_ENDPOINT_ID self-guard, confirm unless --yes. Removes perch TREE (remove_dir_all owlery/<id>/ = nested {id}-psyche/{id}-w*/shells, all under it) + unregister_address + ContextStore::remove_endpoint + AccessStore::open + NEW VisibilityStore::remove_endpoint. Generalizes fork --delete-source behind rename's offline gate. **GOTCHA: xtask gen BUILDS spt respecting CARGO_TARGET_DIR but READS hardcoded target/debug/spt → under CARGO_TARGET_DIR=target-seam the gen reads a STALE binary, reference.md misses the new command. FIX: run `cargo run -p xtask -- gen` in DEFAULT target (where CI's xtask check reads).** unit purge_offline_gate_and_self_guard + int purge_removes_every_record (cli.rs bin tests — spt has NO lib, use `--bin spt`). Seam: clippy clean, traceable 247 EXIT=0, xtask check OK, 178 bin tests green.
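Both REQs above hinge on small gate decisions, sketched here as plain functions. Everything is hypothetical: `Perch`, `should_host`, `ledger_boot_row`, `PurgePlan` and `plan_purge` stand in for the real livehost reconcile, start_homed and cmd_endpoint_purge code; the adapter and psyche_init checks are collapsed to booleans, the SPT_ENDPOINT_ID self-guard arrives as a parameter, and the order of the purge checks is an assumption.

```rust
const LIVE_AGENT_STATE: &str = "live_agent";

/// Hypothetical, flattened view of a perch as the reconcile loop sees it.
struct Perch {
    state: &'static str,
    has_adapter: bool,
    has_psyche_init: bool,
}

/// Start-side order: the STATE gate runs first, so a ready_agent perch is
/// skipped before any adapter/psyche_init gate could resolve a psyche for it.
fn should_host(p: &Perch) -> bool {
    if p.state != LIVE_AGENT_STATE {
        return false;
    }
    p.has_adapter && p.has_psyche_init
}

#[derive(Debug, PartialEq)]
struct SessionRow {
    endpoint: String,
    session_id: String,
}

/// Best-effort Boot row on bind: skip an empty session id, dedup repeats.
fn ledger_boot_row(ledger: &mut Vec<SessionRow>, endpoint: &str, sid: &str) {
    if sid.is_empty() || ledger.iter().any(|r| r.endpoint == endpoint && r.session_id == sid) {
        return;
    }
    ledger.push(SessionRow { endpoint: endpoint.to_string(), session_id: sid.to_string() });
}

#[derive(Debug, PartialEq)]
struct PurgePlan {
    stop_first: bool,  // --force on an online endpoint: stop, wait, reap, then purge
    ask_confirm: bool, // prompt unless --yes
}

/// Purge gate: refuse self-purge, refuse an online endpoint without --force.
fn plan_purge(id: &str, online: bool, yes: bool, force: bool, self_id: Option<&str>) -> Result<PurgePlan, String> {
    if self_id == Some(id) {
        return Err(format!("refusing to purge {id}: it is the calling endpoint"));
    }
    if online && !force {
        return Err(format!("{id} is online; stop it first or pass --force"));
    }
    Ok(PurgePlan { stop_first: online, ask_confirm: !yes })
}
```

A caller would obtain `self_id` from `std::env::var("SPT_ENDPOINT_ID").ok()` and only then walk the removal list (perch tree, address, context, access, visibility).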

**WAVE 5: psyche-reap @7449f5e GATE-PASS. dup-reap @909ad01 GATE-PASS + B4 CLOSED SUBSUMED (same commit). SCOPE GREW: operator added 2 more to v0.12.0 — READY-AGENT-RESUME + `spt endpoint purge` — doyle DESIGNS+MINTS+DISPATCHES those (NOT mine to start). So v0.12.0 = mandated-7 (B4 subsumed) + dup-reap + [ready-resume + purge pending doyle design]. After all gate → deployah → MINOR.**
- **REQ-HAZARD-BRAIN-RESTART-PSYCHE-DUP** @909ad01 [impl,unit,int]: brain restart left a DUPLICATE psyche/endpoint (abrupt brain death → stop_host never runs, old psyche orphaned alive; new brain re-hosts + overwrites perch pid → old one untracked until daemon-stop). FIX = reap_orphan_psyches at brain-start (spawn_live_host, BEFORE first reconcile): triple-gate psyche_orphan_should_reap = alive AND normalize_basename(exe)==adapter psyche prog AND process_cmdline(pid).contains("<id>-psyche"). **ID-SPECIFIC is load-bearing (doyle): basename-only WRONG-KILLS a sibling, since all sibling agents share the `claude` basename and a recycled pid on a sibling's claude matches basename alone. cmdline-contains-full-"<id>-psyche" is sibling+prefix-safe and FAILS SAFE (any None→decline; a missed dup is Breap-bounded, a wrong-kill is catastrophic).** CAVEAT (documented): cmdline carries the id only if psyche_init bakes {id} (the norm); non-{id} = safe-miss. NO process start-time check (doyle: YAGNI). New spt_store::proc::{kill_pid scoped, process_cmdline (Win NtQueryInformationProcess class 60 UNICODE_STRING / Linux /proc/cmdline)}. unit orphan_reap_..._spares_a_same_basename_sibling + process_cmdline_reads_a_live_arg_marker (Win FFI in-test). int brain_restart_leaves_exactly_one_psyche (real daemon, SINGLE-pid brain kill→respawn→P1 reaped + exactly one P2).
- **B4 CLOSED SUBSUMED** (REQ-HAZARD-BRAIN-RESTART-LIFECYCLE-REHYDRATE, required_stages=[]): 5-axis evidence all-rebuilt-on-restart (config→with_config_in / pulse+psyche→spawn_live_host@230→reconcile_once / delivery→resume_sessions@797 / online-off→B2+B5 / shellwake→spawn_wake_host@219). Kept the id (not redefined) — the real residual = dup-reap (own id). doyle "win not miss".
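The dup-reap triple gate can be sketched as a pure predicate over already-collected process facts. Assumptions: `ProcFacts` is hypothetical, `normalize_basename` is assumed to lowercase and strip a trailing `.exe` (the real helper may differ), the sample cmdlines are invented, and the match is a plain substring check, which is prefix-safe (`ag1` vs `ag10`) but would also match an id that ends with another id (`xag1` contains `ag1`).

```rust
/// Hypothetical snapshot of what the probes returned for one pid.
/// None means the probe failed (permissions, race, unsupported platform).
struct ProcFacts {
    alive: bool,
    exe_basename: Option<String>,
    cmdline: Option<String>,
}

/// Assumed normalization: case-insensitive, ".exe" suffix ignored.
fn normalize_basename(name: &str) -> String {
    let lower = name.to_ascii_lowercase();
    lower.strip_suffix(".exe").unwrap_or(lower.as_str()).to_string()
}

/// Reap only if ALL three gates pass; any missing fact declines (fail safe).
fn psyche_orphan_should_reap(endpoint_id: &str, adapter_prog: &str, facts: &ProcFacts) -> bool {
    if !facts.alive {
        return false;
    }
    // Gate 2: necessary but NOT sufficient; every sibling agent shares the basename.
    let Some(exe) = facts.exe_basename.as_deref() else { return false };
    if normalize_basename(exe) != normalize_basename(adapter_prog) {
        return false;
    }
    // Gate 3: the ID-specific marker is what spares same-basename siblings.
    let Some(cmdline) = facts.cmdline.as_deref() else { return false };
    cmdline.contains(&format!("{endpoint_id}-psyche"))
}
```

The asymmetry is deliberate: every `None` path returns false, because a missed duplicate is later bounded by Breap while a wrong kill takes down a live sibling.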

**(prior wave-5 first half) psyche-reap @7449f5e GATE-PASS** — owned-Child reap (recycle-PROOF). B4 = axis-table SUBSUMED.
- **REQ-HAZARD-UNHOST-PSYCHE-REAP** @7449f5e [impl,unit,int]: on un-host (endpoint-stop/signoff/B2+B5-offline) stop_host joined the driver THREAD but never killed the detached {id}-psyche PROCESS → orphan til daemon-stop/Breap. FIX = OWNED-Child (doyle mandate, recycle-PROOF vs bare-pid hitting a recycled sibling on shared box): additive spawn_session_owned→OwnedSession{pid,Child} at every layer (spt-runtime→spt-live spawn_and_bind_owned/spawn_psyche_owned→BrainLifecycle::spawn_psyche_owned→HostedLife.psyche_child→stop_host child.kill()+wait()). Fire-and-forget callers untouched (spawn_session=owned-then-drop). kill_pid REMOVED (owned path = dead-code). **GOTCHA #1 STRUCK AGAIN (seam caught, not code-read): owning the handle made confirm_residency_or_unhost's is_process_alive(perch_pid) read a fast-EXITED psyche ALIVE on Win (held handle lingers til wait()) → masked F-010 → nonresident E2E RED → FIX: residency probes owned child via try_wait (LiveSet::psyche_running). bootrace(resident) green throughout; only fast-exit caught it.** Owned-Child also = the recycle-proof pattern for any long-lived hosted child (sibling of Breap's owned brain).
- **B4 REQ-HAZARD-BRAIN-RESTART-LIFECYCLE-REHYDRATE = SUBSUMED (axis table all-covered):** BrainLifecycle (lifecycle.rs:61-74) is PURE CONFIG. On bare brain restart every axis already rebuilt: config→host_one with_config_in; pulse+psyche-host→run_brain spawn_live_host(:230)→reconcile_once; PTY-delivery cursors→resume_sessions(:797); online/offline→B2+B5; shellwake→spawn_wake_host(:219). REQ-as-written ("no livehost / Psyche never re-hosted") is STALE (predates the reconcile). → close as covered (doyle "win not miss"). **NEW RESIDUAL surfaced (high-confidence reasoning, untested): brain restart ORPHANS old psyches — stop_host never runs on abrupt brain death (LiveSet+owned handles die with brain); Breap job/group only fires at DAEMON stop; new brain re-hosts FRESH + overwrites {id}-psyche perch pid → OLD psyche untracked+alive = DUPLICATE psyche/endpoint til daemon-stop. Likely perri's "brain kill+restart wedged everything." REAP gap, not rehydrate. FIX = on brain start reap pre-existing {id}-psyche orphan (UNOWNED kill_pid back WITH a use + recycle guard is_process_alive+exe_basename). doyle ruling pending: (a) close B4 + mint REQ-HAZARD-BRAIN-RESTART-PSYCHE-DUP, (b) redefine B4, (c) defer to Breap-bound. I lean a/b.**
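The owned-Child pattern from REQ-HAZARD-UNHOST-PSYCHE-REAP, together with the try_wait residency probe that fixed the GOTCHA, can be sketched with `std::process::Child`. `OwnedSession` here is a hypothetical stand-in (the real type threads through spt-runtime/spt-live), and the usage spawns Unix `sleep`/`true` purely for illustration. Holding the unwaited handle is also what makes it recycle-proof: the process is not reaped until `wait`, so its pid cannot be reused under us.

```rust
use std::io;
use std::process::{Child, Command};

/// Hypothetical stand-in for OwnedSession: the pid plus the Child handle
/// that owns the process, so stop never needs a bare-pid lookup.
struct OwnedSession {
    pid: u32,
    child: Child,
}

impl OwnedSession {
    fn spawn(cmd: &mut Command) -> io::Result<Self> {
        let child = cmd.spawn()?;
        Ok(Self { pid: child.id(), child })
    }

    /// Residency probe on the OWNED handle. A pid-based liveness check can read
    /// a fast-exited child as alive while this handle is held and unwaited
    /// (observed on Windows); try_wait asks about this exact child.
    fn psyche_running(&mut self) -> bool {
        matches!(self.child.try_wait(), Ok(None))
    }

    /// stop_host path: kill, then wait so the child is reaped and the handle released.
    fn stop(&mut self) -> io::Result<()> {
        if self.psyche_running() {
            // Ignored: the child may exit between the probe and the kill.
            let _ = self.child.kill();
        }
        self.child.wait().map(|_| ())
    }
}
```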

**PRIOR PROGRESS: B2✓ @03b6398. WAVE 2 (B1+H3) GATE-PASS+FIXED @78a13c2. WAVE 3 (B3+Breap) GATE-PASS @026e9b4+@0c1e59e. WAVE 4 (B5) GATE-PASS @9e338f1 (doyle INDEPENDENTLY ran the runtime sweep — clippy/traceable EXIT=0, B5 unit + ALL 6 livehost E2Es incl new bootgate green, bind-firsthost still hosts = no false-offline regression). WAVE 5 (LAST) RELEASED to todlando = B4 + NEW REQ-HAZARD-UNHOST-PSYCHE-REAP.** Then deployah → v0.12.0 MINOR.

**PSYCHE-IS-A-PROCESS RULING (operator Q, grounded CONTEXT.md+code):** the Psyche is a HARNESS PROCESS by design — a headless `claude --resume` session (CONTEXT.md:97; :203 own independent perch; :251 agent endpoint "something intelligent runs there"; code: manifest `[session.psyche_init/resume]` command, turn.rs:25-27 "persistent harness session"). What lives IN the brain as a THREAD is the pulse/psyche-driver LOOP (CONTEXT.md:34 "pulse/psyche loops"; = HostedLife.thread, joined by stop_host). spt-core moved away from legacy's separate WRAPPER (CONTEXT.md:173 "no separate wrapper"), NOT the psyche process. So the orphan fix is design-consistent: un-host must KILL the harness psyche process (NOT move it in-brain). → NEW REQ-HAZARD-UNHOST-PSYCHE-REAP (wave 5 w/ B4): stop_host kills the {id}-psyche pid (on the perch, residency reads it), scoped. [impl,unit,int].

**WAVE 4 (B5) design [impl,unit,int]:** lifted LIVENESS_RECONCILE_BOOT_GRACE skip in spawn_live_host → B2's reconcile_hosted_liveness runs from BOOT tick 1 (BEFORE reconcile_once) → sessionless controllable==Some(true) perch OFFLINED at boot → reconcile_once skips host → no phantom revival. doyle: REUSE the B2 fn, no 2nd liveness notion. WHY grace-drop is safe: an spt-hosted agent runs IN a broker PTY so its session PRECEDES the online mark → online+controllable+sessionless = ONLY a cold-start phantom (offline it) OR brain-restart survivor (in the session set, kept). None-skip + controllable gate preserved. 3-WAY MATRIX: (i)child-dead-midlife→B2, (ii)boot-revive→B5, (iii)dead-{id}-psyche→residency-confirm. NO E2E regression: existing livehost E2Es use `api listen` (Some(false), exempt) OR call reconcile_once in-proc w/o broker (None-skip); none host a sessionless-controllable perch. unit boot_gate_offlines_sessionless_controllable_then_reconcile_skips_host; int livehost_bootgate_e2e (real cold-start, stale online+controllable perch no session → offlined at boot + no {id}-psyche). **OPEN (doyle→operator scope call): stop_host stops the DRIVER thread but NOT the detached psyche PROCESS — a mid-life endpoint-death/stop leaves the psyche running until daemon-stop/Breap. Real+production-reachable, separate from B5/B4. Candidate new REQ (stop_host reads {id}-psyche pid + kills it). Not in B5/B4 unless operator greenlights.**
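The boot-order change can be sketched as two passes over a hypothetical in-memory perch list: the reused B2 liveness pass runs first on every tick, including tick 1, and the hosting pass only sees what survives. `Perch`, `Status`, `host_pass` and `tick` are hypothetical; `live` stands in for the broker session query, with None meaning the broker is unreachable, and the hosting pass is simplified to "host every online spt-hosted perch".

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Online,
    Offline,
}

struct Perch {
    id: &'static str,
    status: Status,
    /// Some(true) = spt-hosted, Some(false) = relay (exempt), None = legacy (exempt).
    controllable: Option<bool>,
}

/// B2 rule, reused at boot: offline an online, controllable==Some(true) perch
/// with no live broker session. Broker unreachable (None) means skip entirely.
fn reconcile_hosted_liveness(perches: &mut [Perch], live: Option<&HashSet<&str>>) -> Vec<&'static str> {
    let Some(live) = live else { return Vec::new() };
    let mut offlined = Vec::new();
    for p in perches.iter_mut() {
        if p.status == Status::Online && p.controllable == Some(true) && !live.contains(p.id) {
            p.status = Status::Offline;
            offlined.push(p.id);
        }
    }
    offlined
}

/// Simplified hosting pass: a psyche for every still-online spt-hosted perch.
fn host_pass(perches: &[Perch]) -> Vec<&'static str> {
    perches
        .iter()
        .filter(|p| p.status == Status::Online && p.controllable == Some(true))
        .map(|p| p.id)
        .collect()
}

/// One tick with the boot grace lifted: liveness FIRST, then hosting.
fn tick(perches: &mut [Perch], live: Option<&HashSet<&str>>) -> Vec<&'static str> {
    reconcile_hosted_liveness(perches, live);
    host_pass(perches)
}
```

With the old grace skip, tick 1 would run only `host_pass` and return the phantom too; that revival is exactly what B5 removes, while a brain-restart survivor stays hosted because its session is in the live set.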

**WAVE 3 design (B3+Breap), both [impl,unit]:** B3 = client-side barrier in seedmap::request_stop (after STOPPING ack, drop conn, poll ping() to failure / 10ms slices, 5s wedge-ceiling — doyle's option 2, NOT socket-unbind-before-ack). Breap = `reap::BrainReaper` rooted at the BRAIN SUBTREE (never the daemon → reap never self-terminates, never touches a sibling live agent on the shared box): Win = kill-on-job-close Job, brain assigned per (re)spawn in spawn_brain_supervisor, Psyches inherit; Unix = brain pre_exec setpgid(0,0) leads own group, supervisor records pgid, kill(-pgid,SIGKILL). Daemon::run graceful path raises supervisor stop flag FIRST (hoisted the "held for symmetry never raised" Arc) THEN reaps; straggler-brain backstop = self-exit on broker-loss.
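The B3 client-side barrier reduces to a bounded poll. A sketch, assuming the ping is passed in as a closure (the real seedmap::request_stop talks to the seed socket) and using the 10 ms slice and 5 s wedge ceiling from the design; `StopOutcome` and `wait_until_unreachable` are hypothetical names.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum StopOutcome {
    /// Ping failed: the daemon is really gone, so a following start cannot race it.
    Stopped,
    /// Still answering at the ceiling: report a wedge instead of hanging forever.
    Wedged,
}

/// Called after the STOPPING ack and after dropping our own connection.
fn wait_until_unreachable<F: FnMut() -> bool>(mut ping: F, slice: Duration, ceiling: Duration) -> StopOutcome {
    let deadline = Instant::now() + ceiling;
    loop {
        if !ping() {
            return StopOutcome::Stopped;
        }
        if Instant::now() >= deadline {
            return StopOutcome::Wedged;
        }
        std::thread::sleep(slice);
    }
}
```

Putting the barrier on the client keeps the daemon's ack path unchanged, which is the trade-off option 2 makes against unbinding the socket before acking.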

**WAVE 3 LESSONS (2 Windows-process gotchas, both reusable):** (1) **terminated process w/ open handle reads ALIVE** — is_process_alive(pid) (OpenProcess) returns true for a TerminateJobObject'd process while a Child handle is still held (kernel object lingers; same class as the Unix-zombie OpenProcess gap). For an OWNED child use try_wait, not is_process_alive(pid). (2) **UseShellExecute=true → job breakaway** — Windows PowerShell's [Diagnostics.Process]::Start defaults UseShellExecute=true → ShellExecute spawns the grandchild OUTSIDE the job → it survives the reap. Force UseShellExecute=$false (CreateProcess) so it inherits the job — mirrors the brain→Psyche spawn (null stdio + CreateProcess, no breakaway). doyle's "exercise the grandchild link, don't reason it" caught this — the child-only unit was green while the real orphan path escaped.

**CRITICAL LESSON: B1 path-(c) deadlock (cost ~3h, gate code-read MISSED it):** f85bf69's "primary dead-detector" `if let Ok(Some(_)) = session.try_wait()` in Broker::dispatch_subscribe DEADLOCKS every LIVE attach (incl. production rc). `PtySession::try_wait` locks the child mutex; the per-session exit-waiter thread holds that SAME mutex in a blocking `wait()` for the WHOLE life of a live child (broker.rs exit-waiter, pty.rs:203). Live child → try_wait blocks forever → no Subscribed reply → hang. Broker units passed (their children exit fast → lock freed); only the FULL attach suite (matrix/loopback/remote, 85-200s hangs) caught it. **Gating data-flow logic ("try_wait Some→Exit; None→live-safe") is BLIND to lock-contention/ordering deadlocks: you MUST run the real runtime sweep, not just touched units + a code-read.** Sibling of [[shared-seam-change-run-all-seam-tests]]. Path-(c) was also redundant: the exit-waiter already broadcasts Exit to all sinks on reap, and a post-removal subscribe gets "no such session". RED HERRINGS chased first: "concurrent cargo wedge" (real but not this), "input-race needs ready-banner" (phantom: silent-child tests pass in CI), "environmental QUIC-under-load" (wrong: cwd passed instantly). Operator's "fix it, don't tape it" forced the real root.
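The lock-contention shape behind the path-(c) deadlock can be reproduced in miniature. This is a hypothetical model, not broker.rs: a channel receive stands in for the exit-waiter's blocking `wait()`, `child_lock` for the child mutex, and `exit_code` for state the waiter publishes after the child exits. It shows one way to keep a subscribe path off the contended lock; the shipped fix simply removed path-(c) and relied on the waiter's Exit broadcast.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

struct Session {
    child_lock: Mutex<()>,         // held by the exit-waiter for the child's whole life
    exit_code: Mutex<Option<i32>>, // published by the waiter; never held across a blocking call
}

#[derive(Debug, PartialEq)]
enum SubscribeReply {
    Subscribed,
    Exited(i32),
}

/// Never touches child_lock: a lock()-based try_wait here would block until the child exits.
fn dispatch_subscribe(s: &Session) -> SubscribeReply {
    match *s.exit_code.lock().unwrap() {
        Some(code) => SubscribeReply::Exited(code),
        None => SubscribeReply::Subscribed,
    }
}

fn spawn_exit_waiter(s: Arc<Session>, child_exits: mpsc::Receiver<i32>, locked: mpsc::Sender<()>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let guard = s.child_lock.lock().unwrap();
        locked.send(()).unwrap();
        let code = child_exits.recv().unwrap(); // stands in for the blocking wait()
        drop(guard);
        *s.exit_code.lock().unwrap() = Some(code);
    })
}
```

The fast-exit broker units never hit this because their waiter released the lock almost immediately; only a child that stays alive exposes the contention.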

**7 REQs minted registry-first @e2d6391 (traceable EXIT=0). WAVE ORDER: B2 → B1+H3 → B3+Breap → B5 → B4.**
- **B2 REQ-HAZARD-HOSTED-LIVENESS-RECONCILE** (keystone): **BUILT @03b6398, WAVE-READY, awaiting doyle gate.** doyle ruled PULL-PRIMARY (live-status analog of v0.11.0 ROSTER-GHOST). New `reconcile_hosted_liveness(owlery, live_sessions)` in livehost.rs: queries broker KIND_SESSIONS each tick (`query_live_session_endpoints`) and marks offline any status=online live_agent perch with NO live broker session. Wired into spawn_live_host BEFORE reconcile_once, gated on past-10s-boot-grace + broker-reachable (None→skip). **CONTROLLABLE GATE (load-bearing): only controllable==Some(true) spt-hosted perches; relay Some(false) + legacy None are EXEMPT** (else it kills perri's /sptc:live agents). PUSH (ExitEvent→mark_offline) NOT added: confirmed the daemon brain has NO event-consumer loop (main=net_status heartbeat), so push has nowhere to land in the unattended case; correctness rides pull. unit pull_liveness_marks_sessionless_spt_hosted_offline_only + int pull_reconcile_offlines_perch_when_broker_session_dies (real broker session killed+reaped→latch cleared). [impl,unit,int].
- **B1 REQ-HAZARD-RC-ATTACH-FAILFAST** — rc to dead/silent session → infinite blank. SUB-MECHANISM PINNED: wall-f hang = broker STILL resolves a session (resolve_session rc.rs:209 returns Some — child alive-silent OR dead-not-reaped on Win tab-close) → pump blocks on first read that never comes. FIX (doyle): (a) gate attach on is_online/status [after B2], (b) fail-fast bound first-output/ack→clear msg, (c) broker EOF stream on dead-child→PumpEnd::BrokerGone (REQ-HAZARD-RC-EOF). PIN with repro test FIRST (deterministic = attach to alive-but-silent session). wave 2.
- **H3 REQ-ENDPOINT-STOP-OFFLINE** — cmd_stop (cli.rs:2994-3010) removes ready+unregister but NO set_status offline → stopped still alive=true. FIX: add set_status(perch, STATUS_OFFLINE), folds B2 setter. CONFIRMED at code. wave 2.
- **B3 REQ-HAZARD-DAEMON-STOP-BARRIER** — request_stop (seedmap.rs:240-255) returns on KIND_STOPPING ack (174-176) BEFORE seed socket unbinds → is_running ping (daemon.rs:375) wins → start sees ALREADY_RUNNING. FIX: unbind before ack OR wait-ping-fail. wave 3.
- **Breap REQ-HAZARD-DAEMON-STOP-REAP** — stop leaves ~8 orphaned psyche/spt.exe (psyches DETACHED runtime.rs:342-356; livehost stop flag brainproc.rs:227-230 NEVER raised). FIX: raise stop flag + kill children via Win job object / Unix process-group. folds B3. wave 3.
- **B5 REQ-HAZARD-LIVEHOST-BOOT-LIVENESS-GATE** — daemon start spawns psyche per online-latched perch (reconcile_once livehost.rs:285) w/o child-liveness check → revives phantoms. FIX: boot-gate on real session (shares B2 reconcile — no-session perch marked offline at boot not revived). Verify None-spt-hosted phantom caught by residency-confirm (matrix: i child-dead→B2, ii boot-revive→B5, iii dead-psyche/None→residency). wave 4.
- **B4 REQ-HAZARD-BRAIN-RESTART-LIFECYCLE-REHYDRATE** (deepest) — bare brain restart loses ALL BrainLifecycle (lifecycle.rs:58-130); resume_sessions re-subscribes PTY but no livehost → post-restart endpoints unhosted until full daemon kill. FIX: on brain startup rebuild BrainLifecycle per resumed live-capable session (load manifest→instantiate→start pulse). wave 5.
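The B1 fail-fast bound (b) and the broker-EOF case (c) can be sketched with a channel receive timeout. Assumptions: the attach pump is modeled as an `mpsc::Receiver<Vec<u8>>` of frames, a dropped sender stands in for the broker closing the stream (PumpEnd::BrokerGone in the REQ), and `AttachError` / `await_first_frame` are hypothetical names.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum AttachError {
    /// Nothing arrived within the bound: tell the user instead of showing a blank screen.
    NoFirstFrame(Duration),
    /// Stream closed: the broker side is gone (dead child, EOF).
    BrokerGone,
}

/// Bound the wait for the first frame instead of blocking on an unbounded read.
fn await_first_frame(rx: &Receiver<Vec<u8>>, bound: Duration) -> Result<Vec<u8>, AttachError> {
    match rx.recv_timeout(bound) {
        Ok(frame) => Ok(frame),
        Err(RecvTimeoutError::Timeout) => Err(AttachError::NoFirstFrame(bound)),
        Err(RecvTimeoutError::Disconnected) => Err(AttachError::BrokerGone),
    }
}
```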

**SEAM BAR (HARD, EVERY wave):** clippy --workspace -D warnings + traceable + FULL sweep daemon_lifecycle_real_brain + livehost(bootrace/nonresident/firsthost/bind-firsthost/psyche_fail) + broker + handoff + attach + brain_swap + resume — NOT just new tests. livehost E2Es need a SCRATCH target dir (live daemon holds target/debug/spt.exe — shared runner, no kill). doyle: "verify each agent-traced root at the code before building — flag wrong" (I confirmed B2 latch + push-has-no-consumer + H3; flagged the B2 push-vs-pull gap → doyle ruled pull-primary).
