# v0.12.1 — Lifecycle + Picker JIT Plan

**Why:** v0.12.0 shipped (counter 25) but real-harness testing (operator, `claude-spt`, Windows) showed the
lifecycle fixes do **not** deliver in the real daemon + broker + PTY path. v0.12.0 was gated on **mock adapters +
in-process `reconcile_once`** — green tests, broken reality. v0.12.1 reopens with **real-harness gating** (no mocks).

**Branch:** `v0.12.1-lifecycle` off `main@5f9ea23`. **Release:** v0.12.1 PATCH (deployah, after doyle gate-pass on all waves).
**Roles:** doyle designs/mints/gates · todlando executes · deployah releases.

**Meta-fix (binding for this sprint):** every lifecycle gate runs against the **real dummy-harness fixture + a real
detached daemon** — NOT `reconcile_once` in-proc, NOT a mock with a live-draining peer. doyle independently re-runs each.

---

## Wave 1 — Test infrastructure (unblocks all real-harness gates)

- **T1 — dummy-harness adapter fixture** (operator's idea). A `kind="harness"` adapter whose command is a trivial program
  that (a) binds its perch on startup (the harness contract), (b) prints a stdout line on an interval, (c) stays alive
  until killed. Drives the REAL `spt endpoint run` → `launch_harness_brokered_in` → broker PTY → `rc` attach path.
  Isolates "broker wedged / endpoint died" from "a real harness failed to launch." Becomes the permanent regression
  fixture the v0.12.0 mock tests never had. Lives in the test tree; runs against a scratch target dir (livehost-E2E pattern).
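
  A minimal sketch of what the fixture's program could look like, assuming the tick marker is the `DUMMY_HARNESS_TICK` string named in the L0 gate. `bind_perch_stub`, `run_ticks`, and the `DUMMY_HARNESS_BOUND` line are hypothetical names for illustration; the real perch-bind call is part of the harness contract and is not shown in this plan.

  ```rust
  use std::io::{self, Write};
  use std::time::Duration;

  /// Marker the attach gates look for (name taken from the L0 gate text).
  const TICK_MARKER: &str = "DUMMY_HARNESS_TICK";

  /// Hypothetical stand-in for step (a), binding the perch on startup.
  /// The real bind call is not part of this sketch; this only logs a line.
  fn bind_perch_stub(out: &mut impl Write) -> io::Result<()> {
      writeln!(out, "DUMMY_HARNESS_BOUND")?;
      out.flush()
  }

  /// Steps (b) and (c): emit one flushed line per interval.
  /// `max_ticks = None` means "run until killed"; tests pass `Some(n)`.
  fn run_ticks(
      out: &mut impl Write,
      interval: Duration,
      max_ticks: Option<u64>,
  ) -> io::Result<u64> {
      let mut n = 0u64;
      while max_ticks.map_or(true, |max| n < max) {
          writeln!(out, "{} {}", TICK_MARKER, n)?;
          // Flush every line so an attached viewer is not waiting on a buffer.
          out.flush()?;
          n += 1;
          std::thread::sleep(interval);
      }
      Ok(n)
  }
  ```

  A fixture binary would call `bind_perch_stub` and then `run_ticks(&mut stdout, Duration::from_secs(1), None)` from its `main`; keeping the loop bounded by `max_ticks` is only there so the loop itself can be unit-tested.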

## Wave 2 — Lifecycle core

- **L0 — DONE @cf5eab4 (gate-green local, doyle SEAM-GREEN pinged for independent re-run).** ROOT was NOT
  any of candidates a/b/c/d. It was a REPLAY-vs-FORWARD IPC deadlock in `serve_attach`, which received session output
  AND sent wire forwards on the SAME broker conn; `become_controller`'s sync inline ring replay blocked the
  single-threaded conn handler from reading the forwards. FIX (Option a, doyle ruling): `serve_attach` forwards on a SEPARATE
  broker conn. REQ activated [impl,unit,int]. Deeper root (controller sync-replay vs viewer async pattern) =
  Option (b), flagged POST-v0.12.1. See memory [[v0121-l0-attach-deadlock-root]]. **NEXT = L1.**

- **L0 (original spec) — REQ-HAZARD-ENDPOINT-RUN-ATTACH-OUTPUT (KEYSTONE — do FIRST).** Confirmed v0.12.1 Wave 1 via the real
  dummy-harness fixture: a clean `spt rc` attach to a LIVE, heartbeating, psyche-hosted `endpoint run` harness receives
  **0 bytes** over 10s of its flushed `[session.self]` stdout, with no death and no wedge. This IS the operator's central
  "attach shows no output" symptom, and it blocks the whole "view is independent" goal (re-attach shows nothing). Known-good
  (attach.rs loopback-attach E2Es) proves the broker drains+fans a `spawn_session` PTY child over the same transport →
  the gap is endpoint-run-specific (both paths share `dispatch_spawn`, broker.rs:706/835). ISOLATE path-vs-program first
  (run the attach.rs known-delivering child as the endpoint-run `[session.self]`); then root the mechanism — candidates:
  (a) `spawn_session_pid` SpawnReq stdio/env/cwd diff; (b) harness stdout write-blocks on a full ConPTY buffer (drain not
  reading THIS pty) → alive-but-0-bytes; (c) ConPTY reader-park (KH 7.6); (d) `rc` subscribe/`resolve_session` for an
  endpoint-run session reads the wrong/empty log. GATE (dummy harness): rc attach to a LIVE endpoint-run harness RECEIVES
  its DUMMY_HARNESS_TICK within a bounded window.
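
  The "bounded window" in the gate can be sketched as a deadline-driven drain. This is an illustration only: it assumes the test receives attach output as byte chunks on a `std::sync::mpsc` channel, and `AttachWait` / `wait_for_marker` are hypothetical names, not existing test helpers. The `TimedOut { bytes_seen: 0 }` outcome corresponds to the alive-but-0-bytes symptom described above.

  ```rust
  use std::sync::mpsc::{Receiver, RecvTimeoutError};
  use std::time::{Duration, Instant};

  /// Outcome of waiting for the harness tick on an attach stream.
  #[derive(Debug, PartialEq)]
  enum AttachWait {
      Delivered { bytes_seen: usize },
      TimedOut { bytes_seen: usize },
      StreamClosed { bytes_seen: usize },
  }

  /// Drain `chunks` until `marker` shows up in the accumulated output,
  /// the sender goes away, or `window` elapses.
  fn wait_for_marker(chunks: &Receiver<Vec<u8>>, marker: &str, window: Duration) -> AttachWait {
      let deadline = Instant::now() + window;
      let mut seen: Vec<u8> = Vec::new();
      loop {
          let remaining = deadline.saturating_duration_since(Instant::now());
          match chunks.recv_timeout(remaining) {
              Ok(chunk) => {
                  seen.extend_from_slice(&chunk);
                  // Search the whole buffer so a marker split across chunks still matches.
                  if String::from_utf8_lossy(&seen).contains(marker) {
                      return AttachWait::Delivered { bytes_seen: seen.len() };
                  }
              }
              Err(RecvTimeoutError::Timeout) => {
                  return AttachWait::TimedOut { bytes_seen: seen.len() };
              }
              Err(RecvTimeoutError::Disconnected) => {
                  return AttachWait::StreamClosed { bytes_seen: seen.len() };
              }
          }
      }
  }
  ```

  Returning the byte count alongside the outcome lets a failing gate distinguish "nothing arrived" from "output arrived but the tick was missing".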

- **L1 — design GATE-PASS @5ae68f8 (doc+impl+unit landed; int = OPERATOR MANUAL ACCEPTANCE, awaiting operator real-env test).**
  Landed: CREATE_BREAKAWAY_FROM_JOB on both daemon spawn paths (daemon.rs `detached_no_inherit` + deelevate.rs token spawn),
  best-effort with ACCESS_DENIED/INVALID_PARAMETER fallback → in-job (no spawn regression). Units: no-regression-fallback
  (CI-green) + escape-mechanism (self-skips under denying ancestor job). **EMPIRICAL FINDING:** the cargo/CI runner's OWN
  job FORBIDS breakaway (code 5) → the "survives tab-close" int gate can't run faithfully in CI (nesting=false-green), AND
  a real job CAN deny breakaway, so IF WT/VSCode denies it, breakaway alone won't fix it.
  **doyle RULING 2026-06-18 (design APPROVED, breakaway is the correct primary mechanism):**
  (1) int stage = **OPERATOR MANUAL ACCEPTANCE, not CI** — CI is structurally a guaranteed false-red (runner job denies
  breakaway, every test job nests); keep the two units as CI evidence, `required_stages` stays `[doc,impl,unit]`, int
  documented manual-accept in REQ + KNOWN-HAZARDS 7.10. DON'T chase a CI int test.
  (2) **DON'T build the daemon-out-of-job backstop speculatively** — gate it on the operator result.
  (3) daemon-OWNED harness Job = **L4 reap backstop ONLY** (reap at daemon-stop), NOT tab-close survival (a nested job dies
  with the terminal's kill-on-close job). The L4 fold is right; don't oversell it as tab-close protection.
  **BACKSTOP CANDIDATE (design-only, build ONLY if operator shows breakaway DENIED + daemon dies):** re-parent the cold-start
  daemon spawn OUT of the terminal job via a job-neutral creator — **WMI `Win32_Process.Create`** (owned by WmiPrvSE, outside
  the terminal job; synchronous, returns pid) preferred over a `schtasks` one-shot; escapes even where the job denies
  CREATE_BREAKAWAY_FROM_JOB.
  **OPERATOR ACTION OWED:** real Windows Terminal / VS Code → `endpoint run` → close the tab → `spt rc <id>` must be ALIVE +
  re-attachable. If it dies, the terminal denies breakaway → build the WMI backstop. doyle holds Wave 3 P2 + Wave 4 E1 until
  this result lands (it may change L1 scope). See memory [[v0121-l1-viewer-close-detach-findings]]. Original spec retained below.
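
  The best-effort breakaway with in-job fallback can be sketched as a pure decision around a one-attempt spawner. This is a sketch under assumptions: `spawn_breakaway_or_in_job` and `SpawnMode` are hypothetical names, and the closure stands in for the real spawn primitive. The numeric constants are the documented Win32 values for `CREATE_BREAKAWAY_FROM_JOB`, `ERROR_ACCESS_DENIED`, and `ERROR_INVALID_PARAMETER`.

  ```rust
  use std::io;

  // Documented Win32 values (process creation flag and system error codes).
  const CREATE_BREAKAWAY_FROM_JOB: u32 = 0x0100_0000;
  const ERROR_ACCESS_DENIED: i32 = 5;
  const ERROR_INVALID_PARAMETER: i32 = 87;

  #[derive(Debug, PartialEq)]
  enum SpawnMode {
      Breakaway,
      InJob,
  }

  /// Try once with the breakaway flag; if the job denies it, retry without.
  /// `spawn_with(extra_flags)` is a hypothetical one-attempt spawner returning a pid.
  fn spawn_breakaway_or_in_job(
      mut spawn_with: impl FnMut(u32) -> io::Result<u32>,
  ) -> io::Result<(SpawnMode, u32)> {
      match spawn_with(CREATE_BREAKAWAY_FROM_JOB) {
          Ok(pid) => Ok((SpawnMode::Breakaway, pid)),
          Err(e)
              if matches!(
                  e.raw_os_error(),
                  Some(ERROR_ACCESS_DENIED) | Some(ERROR_INVALID_PARAMETER)
              ) =>
          {
              // The enclosing job forbids breakaway: fall back to an in-job
              // spawn so there is no spawn regression.
              spawn_with(0).map(|pid| (SpawnMode::InJob, pid))
          }
          // Any other failure is a real spawn error and is not retried.
          Err(e) => Err(e),
      }
  }
  ```

  Returning the mode alongside the pid is what would let a caller log whether the daemon actually escaped the job or is still subject to the tab-close caveat.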

- **L1.5 — JOB-NEUTRAL DAEMON LAUNCH (PRIMARY); breakaway DEMOTED to fallback. NEXT BUILD (todlando), BEFORE Wave 3/4. Completes L1.**
  doyle re-scope (operator-approved 2026-06-18) + 5 gate conditions. ROOT reframed: the harness is ALREADY the daemon's
  child (KIND_SPAWN), never the terminal's — the ONLY thing in the terminal Job is the COLD-STARTED daemon, because
  `spawn_detached` → `detached_no_inherit` runs FROM the terminal-child CLI, so the daemon INHERITS the terminal's Job
  Object (DETACHED_PROCESS detaches the CONSOLE, not the JOB; a child inherits the parent's job regardless). CREATE_BREAKAWAY
  tried to claw back out but a job CAN deny it (the L1 finding). FIX: never put the daemon in a terminal job — launch it via
  a JOB-NEUTRAL creator so it's WmiPrvSE/Task-Scheduler-owned, OUTSIDE any terminal job from birth (which is why autostarted
  Task-Scheduler daemons never had this bug). The harness, as the daemon's child, rides along. NO breakaway needed in the happy path.

  **LAUNCHER LADDER (first-success-wins), wired into `spawn_detached` so BOTH cold-start (`ensure_running`/
  `ensure_daemon_announced`) AND `spt daemon start` use it (condition 5):**
  1. **WMI (powershell Invoke-CimMethod)** — `Win32_Process.Create` → daemon = WmiPrvSE's child, job-escaped. Use the
     ABSOLUTE `C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe` (NEVER bare `powershell` — KH 5.12 /
     [[ccs-win-pty-program-resolve]]; PATH/PATHEXT not trusted for a privilege-relevant launch) (condition 1).
     `powershell.exe -NoProfile -NonInteractive -Command "Invoke-CimMethod -ClassName Win32_Process -MethodName Create
     -Arguments @{CommandLine='<abs spt exe> daemon run'}"`. The CommandLine value is a SINGLE-QUOTED PS literal with embedded
     `'` doubled to `''` (F-009 no-split/no-inject; the abs exe path may contain spaces) (condition 3). SUCCESS REQUIRES
     BOTH `ReturnValue == 0` AND a parsed `ProcessId` from the result (format the output, e.g. `... | Select-Object ProcessId,ReturnValue | Format-List`;
     `-ExpandProperty` accepts only ONE property, so it cannot emit both); ANY non-zero ReturnValue / ProcessId parse-miss / powershell-spawn-fail → FALL THROUGH to
     rung 2, NEVER a silent "launched" (condition 2). Readiness still via the existing `brain.ready` poll (the caller already
     polls — pid is for logging/return, not readiness). Latency (PS cold ~hundreds ms) is fine — cold-start only.
  2. **schtasks one-shot** — `schtasks /create /sc once /tn <uniq> /tr "<abs spt exe> daemon run" /st <near-future>` then
     `/run /tn <uniq>` then `/delete /tn <uniq> /f` → Task-Scheduler-owned, job-neutral. pid not returned → Ok sentinel,
     readiness via `brain.ready`.
  3. **CREATE_BREAKAWAY_FROM_JOB** — the L1 landed code, REORDERED below WMI/schtasks (escapes only where the job permits).
  4. **in-job last-resort** — current behavior; log `DETACH_IN_JOB` + the tab-close caveat.
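
  Two pieces of rung 1 are pure string handling and can be sketched in isolation: the single-quote escaping for the `CommandLine` literal (condition 3) and the success check on the result (condition 2). This is a sketch under assumptions: `ps_single_quote`, `wmi_arguments`, and `parse_wmi_result` are hypothetical names, and the parser assumes the PowerShell output arrives as `Name : Value` lines, which is one possible formatting and not something this plan fixes.

  ```rust
  /// Quote `s` as a PowerShell single-quoted literal: wrap in '...' and
  /// double every embedded single quote.
  fn ps_single_quote(s: &str) -> String {
      format!("'{}'", s.replace('\'', "''"))
  }

  /// Build the `-Arguments` hashtable fragment for `Invoke-CimMethod`.
  fn wmi_arguments(command_line: &str) -> String {
      format!("@{{CommandLine={}}}", ps_single_quote(command_line))
  }

  /// Success needs BOTH `ReturnValue == 0` AND a parsed, non-zero `ProcessId`.
  /// Anything else is an error so the caller falls through to the next rung.
  fn parse_wmi_result(stdout: &str) -> Result<u32, String> {
      let mut return_value: Option<u32> = None;
      let mut process_id: Option<u32> = None;
      for line in stdout.lines() {
          if let Some((key, value)) = line.split_once(':') {
              let parsed = value.trim().parse::<u32>().ok();
              match key.trim() {
                  "ReturnValue" => return_value = parsed,
                  "ProcessId" => process_id = parsed,
                  _ => {}
              }
          }
      }
      match (return_value, process_id) {
          (Some(0), Some(pid)) if pid != 0 => Ok(pid),
          (Some(0), _) => Err("ReturnValue 0 but no usable ProcessId".to_string()),
          (Some(code), _) => Err(format!("Win32_Process.Create failed: ReturnValue {}", code)),
          (None, _) => Err("no ReturnValue in output".to_string()),
      }
  }
  ```

  Treating a missing or zero `ProcessId` as a failure is a choice made for this sketch; it keeps a half-parsed result from ever being reported as "launched".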

  **IMPL (daemon.rs):** extract the one-attempt primitive `create_process_detached(program, args, extra_flags) -> io::Result<u32>`
  from `detached_no_inherit`'s inner `spawn_with`; KEEP `detached_no_inherit` (breakaway-then-in-job) UNCHANGED for its OTHER
  caller `shellhost::launch_shell` (a daemon-spawned shell is already job-neutral after this fix — do NOT put launch_shell on
  the WMI ladder). New `launch_daemon_job_neutral(exe, args)` drives the ladder; map each rung to its launcher; log
  `DAEMON_LAUNCH_VIA_{WMI|SCHTASKS|BREAKAWAY|IN_JOB}`. New `spawn_daemon_via_wmi` + `spawn_daemon_via_schtasks`. Wire the
  ladder into `spawn_detached` (replace the `detached_no_inherit(&exe,&args)?` at daemon.rs:608). The elevated `deelevate`
  path keeps its L1 breakaway for now — note an elevated-case WMI-reparent as a FOLLOW-UP (rarer; the operator case is
  unelevated WT/VSCode). DIRECT-COM WMI = a logged FOLLOW-UP TODO/REQ, not this gate.

  **PURE DECISION SEAM (the unit, condition 2):** `enum LaunchRung { Wmi, Schtasks, Breakaway, InJob }`,
  `fn launch_ladder() -> [LaunchRung; 4]`, and a driver `drive_ladder(ladder, try_one: impl FnMut(LaunchRung) -> io::Result<u32>)
  -> io::Result<(LaunchRung, u32)>` that returns the FIRST Ok + which rung, else the last Err. Unit: rung-1 Ok → picks Wmi;
  rung-1 Err → rung-2 Ok → picks Schtasks (the error→rung map); all Err → last Err; ladder order is exactly [Wmi,Schtasks,
  Breakaway,InJob]. (Real `try_one` calls the real launchers; the unit injects a fake to stay hermetic.)
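
  A minimal sketch of the seam as described above, using the names from this section. Two details are assumptions of the sketch: the ladder is passed as a slice for simplicity, and an empty ladder returns an `InvalidInput` error, a case the spec does not cover.

  ```rust
  use std::io;

  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  enum LaunchRung {
      Wmi,
      Schtasks,
      Breakaway,
      InJob,
  }

  /// Fixed rung order: job-neutral launchers first, in-job last.
  fn launch_ladder() -> [LaunchRung; 4] {
      [
          LaunchRung::Wmi,
          LaunchRung::Schtasks,
          LaunchRung::Breakaway,
          LaunchRung::InJob,
      ]
  }

  /// Try each rung in order; return the first success and which rung produced it,
  /// otherwise the error from the last rung tried.
  fn drive_ladder(
      ladder: &[LaunchRung],
      mut try_one: impl FnMut(LaunchRung) -> io::Result<u32>,
  ) -> io::Result<(LaunchRung, u32)> {
      // Assumption: an empty ladder is a caller bug and reported as InvalidInput.
      let mut last_err = io::Error::new(io::ErrorKind::InvalidInput, "empty launch ladder");
      for &rung in ladder {
          match try_one(rung) {
              Ok(pid) => return Ok((rung, pid)),
              Err(e) => last_err = e,
          }
      }
      Err(last_err)
  }
  ```

  In a unit, `try_one` is a closure returning canned results per rung, which keeps the test hermetic; the production caller would pass a closure that dispatches to the real launchers.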

  **INT (NOW CI-TESTABLE — no nesting false-red, condition 4):** `crates/spt/tests/job_escape_e2e.rs`. A small launcher HELPER
  (a tiny bin, or `spt` sub-invocation) is placed INSIDE a freshly-created `JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE` job and
  cold-starts the daemon via the FORCED rung-1 WMI path; the test then CLOSES the job handle (kill fires → helper dies) and
  asserts the daemon pid is STILL ALIVE + reachable + `spt rc <id>` re-attaches. MUST force + ASSERT rung-1 fired (a green
  that silently ran breakaway = false pass — assert the `DAEMON_LAUNCH_VIA_WMI` log / returned rung). WmiPrvSE is outside our
  job → the daemon survives. Scope the reap to processes this test created (the runner is shared).
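
  The "assert rung-1 fired" check could be done by scanning the captured daemon log for the last `DAEMON_LAUNCH_VIA_*` marker. A small sketch, assuming the marker appears verbatim in the log text; `last_launch_rung` is a hypothetical helper and the surrounding log line format is not specified here.

  ```rust
  const LAUNCH_PREFIX: &str = "DAEMON_LAUNCH_VIA_";

  /// Return the rung suffix of the LAST launch marker in `log`
  /// (e.g. "WMI", "SCHTASKS", "BREAKAWAY", "IN_JOB"), if any.
  fn last_launch_rung(log: &str) -> Option<&str> {
      let start = log.rfind(LAUNCH_PREFIX)? + LAUNCH_PREFIX.len();
      let rest = &log[start..];
      // The suffix is the run of uppercase letters and underscores after the prefix.
      let end = rest
          .find(|c: char| !(c.is_ascii_uppercase() || c == '_'))
          .unwrap_or(rest.len());
      if end == 0 {
          None
      } else {
          Some(&rest[..end])
      }
  }
  ```

  The int test would then require `last_launch_rung(&log) == Some("WMI")`, so a run that silently fell through to breakaway fails instead of passing green.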

  **REQ/DOCS:** REQ-HAZARD-VIEWER-CLOSE-DETACH → activate `+int` (becomes [doc,impl,unit,int]); update the disposition
  (job-neutral PRIMARY, breakaway FALLBACK). Update KNOWN-HAZARDS 7.10. Unix unchanged (setsid already keeps SIGHUP off;
  keep the guard test, no code). **GATE (doyle):** CI int (job-escape via the REAL WMI rung) + traceable + clippy. Operator
  real-env (WT+VSCode tab-close) = FINAL confirmation, NON-gating (WMI escape is known-good regardless of terminal job policy).
  Ping doyle on green.

- **L1 (original spec) — REQ-HAZARD-VIEWER-CLOSE-DETACH (PRIMARY).** Closing the tab/window where `spt endpoint run` was invoked must
  detach only the `spt rc` pump; the daemon-hosted harness keeps running and is re-attachable.
  ROOT: the daemon never breaks away from the launching terminal's Windows Job Object (`KILL_ON_JOB_CLOSE`); no
  `CREATE_BREAKAWAY_FROM_JOB` anywhere → tab close reaps the daemon's freshly-spawned ConPTY harness subtree. ConPTY
  isolation itself is already correct (portable-pty makes the pseudoconsole in the daemon; no console signal/handle leak).
  FIX: add `CREATE_BREAKAWAY_FROM_JOB` to both daemon spawn paths (`daemon.rs:707` `detached_no_inherit`,
  `deelevate.rs:519` elevated) **+** pin each broker-spawned harness into a **daemon-owned Job Object** (mirror Breap,
  `reap.rs`) as backstop (survives even if a terminal sets `SILENT_BREAKAWAY_OK=false`). `pty.rs` unchanged. Unix: verify
  the daemon's session-detach already covers terminal-close (SIGHUP scope) — likely no change, add a guard test.
  GATE: spawn daemon under a parent-held `KILL_ON_JOB_CLOSE` job → `endpoint run` a dummy harness → close the parent job →
  assert the harness pid **stays alive** AND `spt rc <id>` re-attaches AND a brand-new endpoint launches.

- **L2/L3/L4 — PROVEN @attach_wedge_e2e (PROVE-DON'T-CHANGE, GREEN 2026-06-18).** The post-L0 code ALREADY prevents
  the wedge — REQ-HAZARD-ATTACH-WEDGE activated `[int]`, NO impl/unit change. The original "loopback write_all blocks
  forever → parks the 2-worker net runtime" root is STALE: (1) serve_attach forwards fire-and-forget
  (`net_stream_send(op_id=None)`) and the broker-side `send_stream` is ALREADY bounded by `bounded_block_on`
  (BROKER-QUIC-DEADLINE, 10s), not forever; (2) the loopback duplex is drained broker-INTERNALLY by the operator row's
  own read pump (`nethost.rs` `RecvHalf::Loopback`), which for an ordinary attach stream (`retentive_cap==0`) NEVER parks
  → `peer_w` never backs up on a dead rc (a dead rc = a dropped IPC subscriber against a bounded evicting ring); (3)
  `bounded_block_on` = `runtime.handle().block_on` → parks the BROKER DISPATCH thread, not a net worker. The int gate
  (`crates/spt/tests/attach_wedge_e2e.rs`, real detached daemon + dummy harness): SERVE the victim (rc sees its tick),
  abruptly kill rc (undrained pump) + kill the PTY child → a NEW endpoint still comes online + is served (L2 no wedge),
  the dead endpoint is OFFLINED within one reconcile tick (L3 — broker exit-waiter reaps the dead session,
  `reconcile_hosted_liveness` clears the latch; brain log `LIVENESS_RECONCILE_OFFLINE:wedge1`), `daemon stop` bounded (L4).
  doyle independent re-run pending. Original spec retained below.

- **L2 (original spec) — REQ-HAZARD-ATTACH-WEDGE (ROBUSTNESS).** Even a *legitimately* dead PTY child (real crash/kill) + an undrained
  operator pump must NOT wedge the broker.
  ROOT: loopback attach output is a blocking `write_all` into a bounded 64 KB tokio duplex (`nethost.rs:1040,1090`); a
  dead operator (closed tab) stops draining → `write_all` blocks forever (the "loopback never hangs" assumption at
  `nethost.rs:1103` is false) → parks workers in the **2-worker** net runtime (`nethost.rs:640`) → both saturate → every
  new attach/`endpoint run` stalls after `PUMP_IPC_READER: spawned` → 30 s `FIRST_EVENT_GRACE` → "dead or wedged";
  `daemon stop` can't join the stuck workers. Distinct from the removed B1 path-(c) mutex deadlock.
  FIX: make loopback sends fail-fast — a full-buffer / `BrokenPipe` loopback write is an ordinary per-stream error that
  ENDS `serve_attach`; one dead stream can never hold a runtime worker. (Defense-in-depth: raise worker count — but the
  real fix is non-blocking-on-dead-peer.)
  GATE (dummy harness): kill the child abruptly + drop the operator pump without a clean detach → assert a **new**
  endpoint is still served, `brain.sessions()` returns promptly, `daemon stop` completes bounded.
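
  The fail-fast idea can be illustrated without tokio. The sketch below uses a bounded `std::sync::mpsc::sync_channel` as a stand-in for the bounded loopback duplex (an assumption made to keep the example dependency-free; the real code path is async), and `send_fail_fast` is a hypothetical name. A full buffer or a vanished peer becomes an ordinary error for that one stream and never a blocked worker.

  ```rust
  use std::io;
  use std::sync::mpsc::{SyncSender, TrySendError};

  /// Non-blocking send: never waits for a peer that has stopped draining.
  fn send_fail_fast(tx: &SyncSender<Vec<u8>>, chunk: Vec<u8>) -> io::Result<()> {
      match tx.try_send(chunk) {
          Ok(()) => Ok(()),
          // Buffer full: the peer is not draining. Report it instead of parking.
          Err(TrySendError::Full(_)) => Err(io::Error::new(
              io::ErrorKind::WouldBlock,
              "loopback buffer full (peer not draining)",
          )),
          // Receiver dropped: the peer is gone.
          Err(TrySendError::Disconnected(_)) => {
              Err(io::Error::new(io::ErrorKind::BrokenPipe, "loopback peer gone"))
          }
      }
  }
  ```

  The caller would treat either error as the end of that attach stream and return, which is the "one dead stream can never hold a runtime worker" property the gate checks.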

- **L3 — status=online persistence (sub-item, folded into L2's gate).** Three dead endpoints stayed `status=online`.
  B2 (`reconcile_hosted_liveness`) should offline a controllable spt-hosted perch whose broker session is gone — confirm
  whether abrupt Windows child death actually reaps the broker session (so B2 sees it absent) and whether the reconcile
  tick fires. Resolve as part of L2; GATE asserts the dead endpoint is marked **offline within one reconcile tick**.
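
  The rule the gate asserts can be written as a pure function over the roster and the set of live broker sessions. This is a sketch with hypothetical types (`Perch`, `Status`, `reconcile_tick`); it does not claim to match the internals of `reconcile_hosted_liveness`, only the intended outcome of one tick.

  ```rust
  use std::collections::HashSet;

  #[derive(Debug, Clone, PartialEq)]
  enum Status {
      Online,
      Offline,
  }

  #[derive(Debug, Clone, PartialEq)]
  struct Perch {
      id: String,
      status: Status,
      /// True when this node hosts the perch's broker session.
      spt_hosted: bool,
  }

  /// One reconcile tick: any spt-hosted perch that is marked online but has
  /// no live broker session is flipped to offline. Returns the ids changed.
  fn reconcile_tick(perches: &mut [Perch], live_sessions: &HashSet<String>) -> Vec<String> {
      let mut offlined = Vec::new();
      for perch in perches.iter_mut() {
          let session_gone = !live_sessions.contains(&perch.id);
          if perch.spt_hosted && perch.status == Status::Online && session_gone {
              perch.status = Status::Offline;
              offlined.push(perch.id.clone());
          }
      }
      offlined
  }
  ```

  The function only helps if `live_sessions` is accurate, which is exactly the open question above: whether abrupt child death on Windows removes the broker session so the tick can see it as absent.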

- **L4 — `daemon stop` ends everything (operator item #1, folded).** `daemon stop` did not end all spt processes / take
  everything offline. Largely downstream of L1+L2 (un-wedged broker can stop cleanly; harness in a daemon-owned job is
  reaped by Breap). GATE: after L1+L2, `daemon stop` terminates the daemon + brain + all hosted harness/psyche processes,
  bounded, and the roster goes offline.

## Wave 3 — Picker (operator-raised; ROOTS TBD — investigate before fixing)

- **P1 — REQ-PICKER-HISTORY-FRESH.** `spt endpoint run` picker does **not** show project history for fresh endpoints.
  Investigate the project-history loader (v0.10.0 PICKER-2, `picker/data.rs`) — real bug vs "fresh = no history yet"
  semantics. Then fix.
- **P2 — REQ-PICKER-ONLINE-ACTION.** Picker shows **"Start now"** for endpoints that are already online. Investigate the
  4-state status mapping (v0.10.0 PICKER-1, `picker/model.rs`) — is it reading live/online state correctly, or rendering
  stale/wedged broker state? Then fix (online → "Attach", not "Start now").

## Wave 4 — endpoint list cleanup (operator decision 2026-06-17, dropped — now minted)

- **E1 — REQ-ENDPOINT-LIST-MERGE-LOCAL.** Remove the `--local` flag; the bare `spt endpoint list` ALWAYS merges this
  node's local (unadvertised) perches into the view. Rationale: `spt whoami` is a thin alias — a just-online agent
  running `whoami` must see its own perch. Drop the flag + its `--detail` conflict test + the v0.10.0 hint line +
  `cmd_list_local`; fix the `whoami` alias path. Run `cargo run -p xtask -- gen` (docs-drift, DEFAULT target).

## Release

- **v0.12.1 PATCH.** deployah drives after doyle gate-pass on all waves. CHANGELOG: cross-check **each bullet's scope vs
  the commit range** (the v0.12.0 lesson — [[changelog-scope-vs-commit-range]]); name the specific delta. Post-publish:
  inform operator + perri (real-harness lifecycle now actually works).

## Open / carried

- Non-blocking test-hardening: drain OUTPUT after `KIND_EXIT` in `broker::spawn_env_reaches_child` (latent flake) — fold
  into Wave 1/2 test work if cheap, else carry.
- Machine cleanup: clear the operator's 3 phantom endpoints (wall-a/b/c) + orphan psyche — scoped, preserve daemon +
  the `doyle` live perch + CI.
