# v0.13.0 W5 — `driven_by` self-heal (JIT plan, repro-first)

> Grounded 2026-06-19 (todlando) after W2 close. doyle assigned W5 (holds W6/perri + Layer F gravity).
> BINDING: repro-first on a REAL broker (not theory). Tests via spt-test-engineer (ONE, background,
> no Monitor, ≤4 iter). Gates = clippy `--workspace --all-targets -D warnings` + `traceable-reqs check`
> EXIT=0 + real `cargo build -p spt --bin spt`. doyle gates at int.

## The bug (W5)
An endpoint can latch `ONLINE+CONTROLLED` (`info.json` `driven_by = Some(node)`) and never clear it
when a controller's detach is LOST — the picker then shows a phantom "controlled by X" forever.

## Grounding (code-read, confirmed)
- `driven_by` lives in `info.json` (`spt-store/src/info.rs:100`); single-writer = the broker via
  `OutputLog::stamp_driven_by` (`broker.rs:563`) → `set_driven_by(perch, self.controller_by())`.
- `clear_controller` (`broker.rs:382`) ALREADY re-stamps `driven_by`→None when it takes the controller.
  The W1 deadline-evict `mark_controller_gone` (`broker.rs:392`) calls it. **So a STALLED controller
  (active drain + send_timeout) already self-heals.** W5 is the paths with NO drain to trigger evict.
- **Gap A — dead controller, idle session:** controller process dies, but the session is idle (no
  output → no drain → `mark_controller_gone` never fires) → `driven_by` stuck `Some`.
- **Gap B — session/harness gone:** `reconcile_hosted_liveness` (`livehost.rs:511`) marks the perch
  offline when the broker hosts no session for it, but does NOT clear `driven_by` → an OFFLINE
  endpoint can still render CONTROLLED.
- `SessionInfo` (`msg.rs:570`, the `KIND_SESSIONS` reply rows) does NOT expose controller-presence, so
  the reconcile can't currently tell "live session but no controller" (Gap A) apart from a live,
  genuinely-controlled session.

## Plan
1. **Repro-first (RED on a real broker):** extend a daemon int test — set `driven_by` on a hosted
   endpoint, drop the controller connection WITHOUT a clean detach AND without driving output (so the
   drain-evict path is NOT what clears it), run the reconcile tick, assert `driven_by` STILL `Some`
   pre-fix (the gap), and that a picker/`endpoint list` render shows phantom CONTROLLED. (Author via
   spt-test-engineer; reuse the `attach_wedge_e2e` / `inject_control_wedge` real-broker rigs.)
2. **Expose controller-presence (Gap A):** add `SessionInfo.controller_by: Option<String>` —
   ADDITIVE, `#[serde(default)]` (KH-2.3 N-1 window, mirror `resume_seq`'s pattern). Populate from
   `OutputLog::controller_by()` when the broker builds the `KIND_SESSIONS` reply.
3. **Self-heal reconcile:** extend `reconcile_hosted_liveness` (or a sibling `reconcile_driven_by`)
   to clear `driven_by` (via `set_driven_by(perch, None)`) when, for a self-perch with `driven_by`
   set: (B) the endpoint has NO live broker session (already the offline case — clear alongside
   `mark_offline`), OR (A) it HAS a live session whose `SessionInfo.controller_by` is `None`. Skip the
   pass when the broker is unreachable (don't mass-clear on a transient — same guard as
   `query_live_session_endpoints`). Single-writer note: the broker owns `driven_by`, but the reconcile
   runs in the brain/daemon loop. Confirm the writer-race posture (the reconcile clearing vs a
   concurrent `stamp_driven_by`) is safe, or route the clear through the broker. **Check this in the
   repro**; prefer the broker as the clearer if there's any race (precedent: B2 /
   REQ-HAZARD-HOSTED-LIVENESS-RECONCILE, broker is single writer).
4. **Compose with W1:** `clear_controller` is the in-broker unlatch; W5 makes `info.json` follow it
   for the lost-detach paths. Don't duplicate the stalled-controller path (already healed).
5. **REQ:** check the registry for a W5 req (likely mint `REQ-...DRIVEN-BY-SELF-HEAL` or fold into
   REQ-REACH-1 / a REQ-HAZARD). Activate `[impl,unit,int]` on landing; KNOWN-HAZARDS entry (sibling of
   the B2 liveness-reconcile hazard). int = the repro flipped GREEN (driven_by clears for both gaps;
   no false-clear of a genuinely-controlled live session).
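
A minimal sketch of the step-3 decision, assuming the reconcile can be reduced to a pure function over what the broker reported this tick. Every name here (`SessionRow`, `BrokerView`, `HealAction`, `decide_heal`) is hypothetical and only models the real `SessionInfo` / `reconcile_hosted_liveness`. It decides what to clear, not who performs the write, so the single-writer question from step 3 stays open.

```rust
/// Hypothetical, simplified model of one `KIND_SESSIONS` reply row.
/// In the real `SessionInfo` the new field would be additive and `#[serde(default)]`.
#[derive(Debug, Clone)]
struct SessionRow {
    endpoint: String,
    controller_by: Option<String>,
}

/// What the reconcile learned from the broker on this tick (hypothetical).
#[derive(Debug)]
enum BrokerView {
    /// Broker could not be queried: treat as "unknown", never as "no sessions".
    Unreachable,
    /// Broker answered with its live session rows.
    Sessions(Vec<SessionRow>),
}

/// Outcome of the self-heal check for one self-perch.
#[derive(Debug, PartialEq, Eq)]
enum HealAction {
    /// Leave `driven_by` untouched.
    Keep,
    /// Gap A: live session, but the broker reports no controller.
    ClearGapA,
    /// Gap B: no live session for the endpoint (the offline case).
    ClearGapB,
}

/// Decide whether a latched `driven_by` should be cleared for `perch`.
fn decide_heal(perch: &str, driven_by: Option<&str>, view: &BrokerView) -> HealAction {
    // Nothing latched, so there is nothing to heal.
    if driven_by.is_none() {
        return HealAction::Keep;
    }
    let rows = match view {
        // Transient broker outage: skip the pass rather than mass-clear.
        BrokerView::Unreachable => return HealAction::Keep,
        BrokerView::Sessions(rows) => rows,
    };
    match rows.iter().find(|r| r.endpoint == perch) {
        None => HealAction::ClearGapB,
        Some(row) if row.controller_by.is_none() => HealAction::ClearGapA,
        // Genuinely controlled live session: must NOT be false-healed.
        Some(_) => HealAction::Keep,
    }
}
```

Keeping the decision pure would let the unit tier cover the Keep/Clear table cheaply, while the real-broker repro remains the int gate. This sketch ignores the N-1 "field absent" case.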

## Watch-outs
- Do NOT clear `driven_by` for a genuinely-live controlled session (false-heal) — the controller_by
  presence check is the guard; the repro must include a still-controlled session that must NOT clear.
- N-1: an older broker omits `controller_by` → newer brain reads `None`. That would FALSE-CLEAR a
  controlled session under N-1. Guard: only clear on `None` when the session ALSO shows no other
  controller signal, OR gate the Gap-A clear behind "field present" — decide in the repro. Gap-B
  (no session) is N-1-safe (no field needed).
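
One way to picture the "field present" gate is a tri-state instead of a bare `Option`. This is a sketch under assumptions: `ControllerField` and `gap_a_may_clear` are hypothetical names, and how the wire format would actually expose presence is left to the repro. A plain `Option<String>` with `#[serde(default)]` collapses the first two variants into `None`, which is exactly the false-clear risk described above.

```rust
/// Hypothetical tri-state for `controller_by` as seen by the newer brain.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ControllerField {
    /// Field missing from the row: an N-1 broker that predates it.
    Absent,
    /// Field present and empty: the broker asserts there is no controller.
    PresentNone,
    /// Field present with the controlling node.
    PresentSome(String),
}

/// Gap-A clear is allowed only when the broker positively reports "no controller".
/// An absent field is treated as "unknown", so an N-1 broker never triggers a clear.
fn gap_a_may_clear(field: &ControllerField) -> bool {
    matches!(field, ControllerField::PresentNone)
}
```

Under this gate the Gap-A heal simply does nothing against an N-1 broker, while the Gap-B clear still works there.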

## Sequencing
Independent of W4/W6. doyle gates at int. After W5: confirm gravity-linux CI green for the W2 g1
byte-receipt + Layer F + W1b F3 at the v0.13.0 PR (doyle tracks).
