---
name: w5-driven-by-selfheal-characterization
description: W5 REQ-HAZARD-DRIVEN-BY-SELFHEAL — empirical finding that controller_by==None is NOT a sufficient Gap-A signal for the production wedged-controller case; fix must be broker-side eviction.
metadata:
  type: project
---

W5 (REQ-HAZARD-DRIVEN-BY-SELFHEAL) — the two latch mechanisms, characterized empirically on a REAL broker (2026-06-19, inject_control_wedge.rs `w5_a1_*` + `w5_a2_*`).

**Finding (load-bearing for the FIX design):** `SessionInfo.controller_by == None` is NOT a sufficient Gap-A signal for the production wedged-pump case.

- **A1** (local/empty slot): a freshly-spawned session pre-attaches the LOCAL spawner as controller with `by=None`, so `OutputLog::controller_by()` returns `None` for a live, locally-driven session, the same read as "no controller at all". But a local-only controller never latches `driven_by` (`driven_by` is a remote-drive state), and the Gap-A clear is gated on `driven_by==Some`, so the ambiguity is harmless: `controller_by==None` IS the correct Gap-A signal for the restart/lost-slot case.
- **A2** (abandoned REMOTE controller, IDLE session — the REQ's real target): a real loopback remote attach (`request_attach`/`serve_attach` → `become_controller(Some(origin))`) occupies the slot with `Some(origin)` and latches `driven_by=Some(origin)`. Abandon the operator's remote WITHOUT a clean EOF/detach and keep the session IDLE (no output → the W1 drain-evict `mark_controller_gone` never fires) → the slot STAYS `Some(origin)` and `driven_by` stays latched. **VERDICT: controller_by stays Some — does NOT self-clear.**

**Implication:** a reconcile that only checks `controller_by==None` catches A1 and Gap B (no session), but MISSES A2. The A2 lost-detach clear must be BROKER-SIDE eviction (a controller-liveness probe / reconcile-driven `clear_controller`), not merely a reconcile `controller_by` read. See [[effect-journal-wedge-tests]] for the sibling W1/W1b wedge gates and the real loopback-attach rig these reuse.
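
**Sketch of the A1/A2/Gap-B decision table (a model, not the real reconcile):** the block below is a minimal, self-contained illustration of why a `controller_by==None` read alone misses A2. `SessionView`, `naive_should_clear`, and `evict_if_dead` are hypothetical names invented for this sketch, as is the `controller_alive` flag, which stands in for whatever liveness verdict a broker-side probe would produce. None of them are the project's actual types or signatures.

```rust
/// Hypothetical stand-in for the per-session view a reconcile reads.
/// `controller_by` is `None` both for "no controller" and for a local
/// controller attached with `by=None` (the A1 ambiguity).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionView {
    pub controller_by: Option<String>,
}

/// Hypothetical naive reconcile: clear the `driven_by` latch only when the
/// session is gone (Gap B) or the controller slot reads `None` (Gap A).
/// The clear is gated on `driven_by` being latched at all.
pub fn naive_should_clear(driven_by: Option<&str>, session: Option<&SessionView>) -> bool {
    if driven_by.is_none() {
        return false; // nothing latched, nothing to clear (the A1 case)
    }
    match session {
        None => true,                          // Gap B: no session
        Some(s) => s.controller_by.is_none(),  // Gap A: slot empty
    }
}

/// Hypothetical broker-side eviction: if the slot is occupied but the
/// controller is judged dead, clear the slot so a later read sees `None`.
/// `controller_alive` is an assumed input, not a real probe API.
pub fn evict_if_dead(session: &mut SessionView, controller_alive: bool) {
    if session.controller_by.is_some() && !controller_alive {
        session.controller_by = None;
    }
}
```

In this model the A2 shape (`driven_by=Some(origin)`, `controller_by=Some(origin)`, controller abandoned) makes `naive_should_clear` return `false` until `evict_if_dead` has emptied the slot, which mirrors the claim that the clear has to originate broker-side.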

**Reconciles with [[w5-driven-by-selfheal]] (no contradiction):** that prior memory says a DIRECTLY-OBSERVED controller-conn socket-drop self-heals (per-conn `detach_if`→`clear_controller`→`stamp_driven_by`). A2 is the OTHER shape: the controller slot is owned by the TARGET's `serve_attach` `brain` conn (which calls `attach_as(Some(origin))`), and serve_attach keeps blocking on `brain.read_event()`. Abandoning the OPERATOR's remote stream does NOT drop that target conn, so the broker never observes the disconnect → no `detach_if` → slot stays Some(origin). That is exactly the production lost-detach / wedged-pump case the REQ targets, and why the in-process "plain socket drop" cannot reproduce it.

**Rig notes:** broker `stamp_driven_by` is best-effort — `set_driven_by` ERRORS when the perch info.json is absent, so an A2-style test MUST pre-seed `<owlery>/<endpoint>/info.json` (via `InfoJson::new` + `write_info` at `resolve_perch_path(endpoint, Infer)` under the test SPT_HOME) or `driven_by` silently stays None and the verdict is unobservable. `controller_by` is read straight from `Brain::sessions()` (the field the reconcile consumes). A parallel agent's `driven_by_selfheal.rs` holds the complementary RED gates proving the reconcile LEAVES the latch (Gap A + Gap B against `reconcile_hosted_liveness`).
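
**Sketch of the silent-None trap (std-only model, not the real perch API):** the block below illustrates why an unseeded `info.json` makes the verdict unobservable. Every function here is a hypothetical stand-in written for this note: the real `set_driven_by`, `stamp_driven_by`, `InfoJson::new`, and `write_info` have their own signatures, and the JSON shape and directory layout used here are assumptions.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for `set_driven_by`: it only updates an existing
/// info.json and errors when the file is absent.
pub fn set_driven_by(perch_dir: &Path, origin: &str) -> io::Result<()> {
    let info = perch_dir.join("info.json");
    if !info.exists() {
        return Err(io::Error::new(io::ErrorKind::NotFound, "info.json absent"));
    }
    fs::write(&info, format!("{{\"driven_by\":\"{}\"}}", origin))
}

/// Hypothetical best-effort stamp: the error is swallowed, so a missing
/// info.json leaves no trace for the caller.
pub fn stamp_driven_by(perch_dir: &Path, origin: &str) {
    let _ = set_driven_by(perch_dir, origin);
}

/// Hypothetical pre-seed step a test would run before the attach.
pub fn seed_info(perch_dir: &Path) -> io::Result<()> {
    fs::create_dir_all(perch_dir)?;
    fs::write(perch_dir.join("info.json"), "{}")
}

/// Reads the latched origin back, or `None` if the file or field is missing.
/// Deliberately naive string parsing to keep the sketch dependency-free.
pub fn read_driven_by(perch_dir: &Path) -> Option<String> {
    let text = fs::read_to_string(perch_dir.join("info.json")).ok()?;
    let rest = text.split("\"driven_by\":\"").nth(1)?;
    let value = rest.split('"').next()?;
    Some(value.to_string())
}
```

Without the seed step, `stamp_driven_by` followed by `read_driven_by` yields `None` in this model, which is indistinguishable from "never latched"; with the seed step the stamp becomes observable.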

**`w5_a2` is a FLAKY NEIGHBOR in the full-file run (confirmed 2026-06-19).** It passes in ISOLATION (~4s), but in a full `cargo test --test inject_control_wedge -- --test-threads=1` run it intermittently FAILS or HANGS. The test deliberately abandons the operator remote, runs `serve_attach` on a thread it never signals, and then calls `server.join()` on that thread. Two failure shapes follow: on a run where the abandoned serve thread doesn't return (no child output / no clean EOF), the test PARKS on the join; or the race-prone `cby_attached=Some(Some(_))` precondition flips and its hard assert fails. When triaging a full-file failure that lands on `w5_a2`, re-run it standalone before suspecting your own change. A clean way to validate a NEW test in this file is `--skip w5_a2_abandoned` (9/9 green there). This is intrinsic to the A2 characterization staging, NOT a regression from neighboring tests.
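
**Sketch of a bounded wait (illustrative only, not what `w5_a2` does today):** the PARK-on-join shape comes from an unbounded `JoinHandle::join`. The std-only helper below shows one way a test could wait on a worker thread with a deadline instead. `run_with_deadline` is a hypothetical name for this note, and whether it suits the A2 staging is an open question, since a timed-out worker is simply left detached.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `work` on a new thread and waits at most `timeout` for its result.
/// Returns `Some(result)` if the worker finished in time, `None` if the
/// deadline passed or the worker panicked before sending. On timeout the
/// worker thread is NOT stopped; it keeps running detached.
pub fn run_with_deadline<T, F>(work: F, timeout: Duration) -> Option<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails only if the receiver already gave up; ignore that.
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout).ok()
}
```

A caller would treat `None` as "serve thread did not return" and fail or skip with a clear message, rather than hanging the whole `--test-threads=1` run.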
