# V0.13.0 — viewer-drain decouple (b4 root) — JIT plan

**Status:** b4 fix IMPLEMENTED + verified + PUSHED → `origin/viewer-drain-decouple-b4` @ **677c3e7** (off delivery-control@18416fd). Local gates ALL GREEN: broker unit tests 10/10 (2 new b4 tests + regressions), workspace clippy -D warnings clean, traceable-reqs exit 0. AWAITING doyle's post-b4 forkpty re-measure (cherry-pick 677c3e7 onto PR #27 + v5 tally). Expect a_journaled GREEN, p0_paste still RED (next), g2 TBD (may close indirectly). NEXT items: p0_paste skip-to-live (gated, below) + g2 (re-measure post-b4).

## ⚠ v0.13.0 IS THREE ITEMS, NOT ONE (doyle's 3-gate counters, run 27898894089)

The 3 RED gates have THREE DISTINCT roots. b4 closes ONLY a_journaled. Do NOT ship
b4 and expect a clean board.

1. **a_journaled** — c1=0, EVICT=0, journaled_ops=13139, got_output=FALSE = **b4
   drain-throttle** (controller deliver blocks the drain inline; viewer never fanned).
   ✅ **THE b4 FIX (this plan) CLOSES THIS.**
2. **p0_paste** — EVICT FIRED ("dropped 1 viewer at seq 35584, drain_appends=35585"),
   pumped=35658, attach_received_output=FALSE, backpressured=true, BOTH platforms.
   = the drain ran FREE; the VIEWER (serve_attach's broker subscription) overflowed
   its 256-slot channel + was EVICTED because serve_attach forwards
   (read_event→b64decode→re-encode AttachRecord→net_stream_send) SLOWER than the
   drain fans out. **SEPARATE root. b4 does NOT touch it.** Triage lean (gate pending
   doyle): eviction-of-a-hopelessly-behind-viewer is correct session-protection; the
   bug is it's SILENT + PERMANENT. RIGHT product = SKIP-TO-LIVE — serve_attach
   re-subscribes from the current ring floor on eviction (like tail -f: see live, not
   die replaying an uncatchable backlog).
   **GATED (doyle): skip-to-live is REAL, not relax.** Eviction itself = correct
   (keep it); SILENT+PERMANENT eviction = the bug. VIEWER-only → B2-SAFE (viewers do
   NOT advance delivered_through / not authoritative; no cursor exposure). Design:
   (1) explicit broker→viewer EVICTION SIGNAL (unambiguous marker, DISTINCT from
   session-exit EOF — serve must not tear down a live attach on it) sent to the
   evicted viewer's conn; (2) serve_attach re-subscribes from the CURRENT ring floor
   (skip-to-live) on that signal; (3) HARD constraint: NO evict→resubscribe busy-loop
   — rate-limit / forward-progress so under max-flood the viewer sees intermittent
   LIVE bursts, never a CPU spin. Do NOT lead with serve stall/gap-detection (fragile
   fallback only). SEPARATE fix, AFTER b4.
3. **g2** — assert :1196 `assert!(delivered)` = the broker did NOT ACCEPT the injected
   endpoint event within 8s (NOT a got_output failure). forkpty-only. =
   inject-acceptance (dispatch_endpoint_input / inject path) starved by the output-flood
   CPU storm. **Distinct.** b4 frees drain-thread CPU (deletes the 5s inline poll), so
   b4 MAY close g2 indirectly — **RE-MEASURE g2 AFTER b4 lands before designing a
   separate fix.** If still red → real inject-acceptance-must-not-starve fix.

**v0.13.0 close = b4 (this plan) + p0_paste skip-to-live (direction gated by doyle,
see item 2) + g2 (re-measure post-b4, then decide).** This JIT plan covers item 1
(b4); items 2 & 3 are captured here but not yet designed in detail.
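
A rough sketch of the item-2 skip-to-live shape, for orientation only; it is NOT a design. Every name here (`ViewerMsg`, `Action`, `SkipToLive`, `min_gap`) is hypothetical and does not exist in the codebase. It shows the three design points: an eviction marker distinct from session-exit, resubscribe from the current ring floor, and a rate limit so evict→resubscribe cannot spin.

```rust
use std::time::{Duration, Instant};

/// Hypothetical broker-to-viewer messages. `Evicted` is deliberately a
/// different variant from `SessionExit` so serve never tears down a live
/// attach on an eviction.
#[derive(Debug, PartialEq)]
enum ViewerMsg {
    Output(u64),
    Evicted,
    SessionExit,
}

/// What the serve side should do next (hypothetical).
#[derive(Debug, PartialEq)]
enum Action {
    Forward(u64),
    ResubscribeFrom(u64),
    Wait(Duration),
    TearDown,
}

/// Rate-limited skip-to-live state: at most one resubscribe per `min_gap`.
struct SkipToLive {
    min_gap: Duration,
    last_resub: Option<Instant>,
}

impl SkipToLive {
    fn on_msg(&mut self, msg: ViewerMsg, ring_floor: u64, now: Instant) -> Action {
        match msg {
            ViewerMsg::Output(seq) => Action::Forward(seq),
            ViewerMsg::SessionExit => Action::TearDown,
            ViewerMsg::Evicted => {
                // Time still owed before the next resubscribe is allowed.
                let wait = match self.last_resub {
                    Some(t) => self
                        .min_gap
                        .saturating_sub(now.saturating_duration_since(t)),
                    None => Duration::ZERO,
                };
                if wait.is_zero() {
                    self.last_resub = Some(now);
                    Action::ResubscribeFrom(ring_floor)
                } else {
                    Action::Wait(wait)
                }
            }
        }
    }
}
```

Under max-flood this yields intermittent live bursts separated by at least `min_gap`, never a busy-loop. The `min_gap` value and whether the wait happens serve-side or broker-side are open.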

## The bug (b4 — confirmed by doyle's code read + todlando's verification)

The 3 inject_control_wedge gates fail WARM on forkpty (a_journaled/g2 Linux-only;
p0_paste both platforms) with subscribed=TRUE, got_output=FALSE. Localized via the
v4 counters: **c1=0, c2=0, c3=1, EVICT=0**.

Root: the session drain closure (`broker.rs:1450-1457`) runs `job.deliver()`
**inline**. `ControllerJob::deliver` (`broker.rs:669-685`) is a `try_send`
sleep-poll loop: on a FULL controller channel it sleeps `CONTROLLER_SEND_POLL`
and re-polls up to `CONTROLLER_WRITE_DEADLINE` (5s), **on the drain thread**. So a
controller that drains slower than the PTY floods → its `CONTROLLER_CHANNEL_DEPTH`
(4096) channel fills → `deliver()` throttles the drain → `append()` (and its viewer
`try_send` fan-out) doesn't run → **the viewer starves**. EVICT=0 because the
throttled drain is too slow to even fill the viewer's 256-slot channel.
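
A minimal model of that coupling, assuming `std::sync::mpsc` as a stand-in for the broker's channel type (this plan does not confirm which channel the broker uses). `deliver_blocking` and `stall_on_full_channel` are hypothetical names; the real loop is `ControllerJob::deliver`. The point it illustrates: with the channel full and nobody reading, the calling thread is parked for the whole deadline, so nothing else on that thread (append, viewer fan-out) can run.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;
use std::time::{Duration, Instant};

/// Model of the pre-fix inline delivery: retry `try_send` until a slot frees
/// or `deadline` elapses. The calling (drain) thread sleeps between attempts.
fn deliver_blocking(
    tx: &SyncSender<u64>,
    seq: u64,
    poll: Duration,
    deadline: Duration,
) -> Result<(), ()> {
    let start = Instant::now();
    loop {
        match tx.try_send(seq) {
            Ok(()) => return Ok(()),
            Err(TrySendError::Disconnected(_)) => return Err(()),
            Err(TrySendError::Full(_)) => {
                if start.elapsed() >= deadline {
                    return Err(()); // deadline hit: caller would evict
                }
                thread::sleep(poll); // the drain thread is stalled here
            }
        }
    }
}

/// Measures how long one delivery stalls the caller when the controller
/// channel is full and never read. Returns (stall duration, delivered?).
fn stall_on_full_channel(poll: Duration, deadline: Duration) -> (Duration, bool) {
    let (tx, _rx) = sync_channel::<u64>(1); // `_rx` stays alive, never reads
    tx.try_send(0).unwrap(); // fill the only slot
    let start = Instant::now();
    let delivered = deliver_blocking(&tx, 1, poll, deadline).is_ok();
    (start.elapsed(), delivered)
}
```

In the near-full steady state a slot frees just before the deadline each time, so the loop returns `Ok` and the stall repeats on every chunk.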

No-recovery at 18.97s = steady-state-near-full: the controller socket drains JUST
enough that each `deliver()` gets a slot within 5s → returns Ok → controller is
NEVER deadline-evicted → drain throttled the WHOLE run (not 5s episodes).

Controller identity in a_journaled: after the driver's `w.attach(sid,0)` (Control,
by=None) it `become_controller`s over the spawner (same-identity local re-take) →
the **driver** is the throttling controller, and its PACED read (the 20ms+5ms
anti-CPU-storm pacing, Unix-only) is slow enough under the forkpty `yes` flood to
keep its channel full. forkpty-only because forkpty floods harder than Windows
ConPTY's paced output.

Regression verdict: NOT a new class. Pre-W1 a full controller wedged the drain
PERMANENTLY; W1 (8b5583e) bounded it to 5s episodes but STILL couples drain
throughput to the controller. The NEW gates are the first to ASSERT the
viewer-not-starved-by-a-busy-session property — a LEGITIMATE product property
(rc --view of a noisy session must work) → REAL FIX, not pathological-expectation.

## Fix direction — RESOLVED design (doyle-gated reshape, sent for final gate)

doyle's constraint 2 collapsed the design tension: a slow-but-alive controller does
NOT need gapless live delivery — it degrades to the EXISTING ring-fall-behind +
resume-from-floor (RESTORATION-D4 clamp / snap-above render cursor), the same as the
reconnect case. So the controller becomes viewer-like (try_send drop-on-full) + a
staleness-evict. No ring-replay-on-recovery machinery needed.

THE RESHAPE: controller backpressure becomes a SINGLE non-blocking `try_send`,
identical to viewers; `ControllerJob::deliver()`'s 5s sleep-poll is DELETED. A single
try_send never blocks, so doing it UNDER the log lock is safe (viewers already do
exactly that in append). So move the controller send INTO append() (under the lock,
beside viewer fan-out); the drain closure loses the deliver() call entirely.

- `ControllerSink` += `last_ok: Instant` (stateful staleness in the sink under the
  log, survives across chunks; set = now at `become_controller`).
- `OutputLog::append` (under log lock), controller arm:
    match c.tx.try_send(CtrlMsg::Output(seq, frame)) {
      Ok(())               => c.last_ok = now,                                  // healthy
      Err(Full(_))         => if now - c.last_ok >= CONTROLLER_WRITE_DEADLINE { evict = Some(epoch) }
                              else { /* DROP — controller falls behind the ring */ },
      Err(Disconnected(_)) => evict = Some(epoch),
    }
  append returns `Option<evict-epoch>` (replaces the ControllerJob return). Drain
  off-lock: `if let Some(epoch) = evict { mark_controller_gone(epoch) }`. NO deliver()
  loop, NO sleep; the drain never blocks.
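
The controller arm above, as a compilable sketch. Assumptions: `std::sync::mpsc::sync_channel` stands in for the real channel, `offer_to_controller` is a hypothetical free function (the plan puts this logic inside `OutputLog::append`), and `now` is passed in so the staleness rule can be exercised without sleeping. The struct fields mirror the ones named in this plan; the real `ControllerSink` may carry more.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::time::{Duration, Instant};

const CONTROLLER_WRITE_DEADLINE: Duration = Duration::from_secs(5);

#[derive(Debug)]
enum CtrlMsg {
    Output(u64, Vec<u8>),
}

struct ControllerSink {
    tx: SyncSender<CtrlMsg>,
    epoch: u64,
    /// Time of the last successful enqueue; set to "now" at become_controller.
    last_ok: Instant,
}

/// One non-blocking delivery attempt. Returns `Some(epoch)` when the
/// controller should be evicted, `None` otherwise (sent or dropped).
fn offer_to_controller(
    c: &mut ControllerSink,
    seq: u64,
    frame: &[u8],
    now: Instant,
) -> Option<u64> {
    match c.tx.try_send(CtrlMsg::Output(seq, frame.to_vec())) {
        Ok(()) => {
            c.last_ok = now; // healthy: reset the staleness window
            None
        }
        Err(TrySendError::Full(_)) => {
            // Continuous-Full for a whole deadline means truly stalled.
            if now.saturating_duration_since(c.last_ok) >= CONTROLLER_WRITE_DEADLINE {
                Some(c.epoch)
            } else {
                None // drop the frame: the controller falls behind the ring
            }
        }
        Err(TrySendError::Disconnected(_)) => Some(c.epoch),
    }
}
```

The caller never sleeps: a full channel costs one failed `try_send`, and a controller that accepts even one frame per deadline window keeps resetting `last_ok` and is not evicted.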

### Maps to doyle's 3 gate constraints (B2 gapless-handoff is load-bearing)

1. `delivered_through` advances ONLY on a real successful socket write — UNCHANGED:
   `controller_writer` advances it on write; a DROPPED frame never enters the channel
   → never written → never advances. Zero phantom-advance.
2. Slow-but-alive controller drops frames under flood → on-wire forward-jumps →
   operator snap-above cursor / RESTORATION-D4 clamp = existing fall-behind-ring case.
   No hard gap, no drain backpressure.
3. Truly-stalled controller: `last_ok` stops advancing → CONTROLLER_WRITE_DEADLINE of
   continuous-Full → evict. Bounded-wedge preserved; only the inline sleep dies.

### CONSTRAINT (a) — CONTIGUOUS delivered_through (doyle's gate; REQUIRED, prevents a B2 re-break)

THE FLAW in the bare reshape: `advanced_cursor` (broker.rs:263) is a HIGH-WATERMARK
(`current.max(seq+1)`). With drop-on-full, the drain drops seq N (Full) then later
enqueues N+1 (slot freed). `controller_writer` writes …N-1 (cursor=N), then N+1 →
`advanced_cursor` yields `max(N, (N+1)+1) = N+2`. The cursor JUMPS past the dropped N. Since
`delivered_through` IS the `resume_seq` a COLD brain reads (broker.rs:235-236), the
resumed brain skips N → incoherent screen → non-gapless / not-exactly-once resume =
the B2 load-bearing invariant (broker.rs:8-30) RE-BROKEN. "A dropped frame never
advances the cursor" is true PER-FRAME, but the NEXT successful write (seq N+1) jumps
the watermark past the gap.

FIX (a): make the cursor CONTIGUOUS. `controller_writer` advances `delivered_through`
ONLY when the frame it writes has `seq == cursor` (the next expected seq); a gap
(`seq > cursor`, from a drop) FREEZES the cursor at last-contiguous (the frame STILL
goes to the socket — the live operator sees it; only the durable resume point freezes).
A fallen-behind controller then resumes from the frozen cursor → ring replay catches
it up, or D4 floor-clamp if the frozen seq aged out of the ring + the controller was
evicted = exactly constraint-2 degradation, now CORRECT. Backward-compatible: today's
no-drop path has no gaps, so the advance is byte-identical. Implement in
`controller_writer` (broker.rs:705+): before/at the `advance_delivered` call, gate on
`seq == expected-next`; freeze (skip advance) on `seq > cursor`. Read `advance_delivered`
(688-703) + `advanced_cursor` (263) to do it minimally (likely a contiguity check
rather than the high-watermark `advanced_cursor`).
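
The two advance rules side by side, as a sketch. `advance_high_watermark` follows the `current.max(seq+1)` formula quoted above; `advance_contiguous` and `replay` are hypothetical helpers, not the real `advance_delivered` / `controller_writer` code. `replay` folds a list of written seqs through a rule, starting from cursor 0.

```rust
/// High-watermark rule, as quoted for `advanced_cursor`: max(current, seq + 1).
fn advance_high_watermark(current: u64, seq: u64) -> u64 {
    current.max(seq + 1)
}

/// Contiguous rule: advance only when `seq` is exactly the next expected
/// seq; on a gap (seq > current) the cursor freezes at last-contiguous.
fn advance_contiguous(current: u64, seq: u64) -> u64 {
    if seq == current {
        current + 1
    } else {
        current
    }
}

/// Folds the seqs actually written to the socket through an advance rule.
fn replay(written: &[u64], advance: fn(u64, u64) -> u64) -> u64 {
    written.iter().fold(0, |cursor, &seq| advance(cursor, seq))
}
```

With seq 3 dropped (written = 0, 1, 2, 4, 5) the high-watermark rule ends at 6, so a cold resume would skip 3; the contiguous rule ends at 3, so the resume replays from the gap. With no drops both rules give the same cursor.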

FINAL GATED RESHAPE = items 1-3 above (try_send/append/last_ok/evict, deliver-poll
deleted) + KH-unchanged (epoch/one-writer/monotonic) + (a) contiguous cursor. All of
doyle's 3 constraints + B2 hold. doyle adding a per-chunk deliver/try_send tally to v5
to empirically show the fix closes the gate.

RESOLVED (doyle's call): deadline semantic = "CONTROLLER_WRITE_DEADLINE since last
SUCCESSFUL send" (continuous-full window). Accepted — strictly better (a controller
making progress isn't evicted).

KH guardrails (preserved): REQ-HAZARD-INJECT-CONTROL-COEXIST,
REQ-HAZARD-CONTROLLER-WRITER-REORDER. Epoch gate unchanged (mark_controller_gone still
epoch-checked); one-live-writer-per-conn unchanged; monotonic on-wire seq unchanged.
Read before coding: `controller_writer` (broker.rs:705+), `advance_delivered` (688-703),
`become_controller` (358), `ControllerJob`/`deliver` (649-686), `append` (302+).

### Traceability (CLAUDE.md binding)

Likely a NEW req (e.g. REQ-HAZARD-VIEWER-STARVE-UNDER-CONTROLLER-BACKPRESSURE) or
fold into REQ-HAZARD-INJECT-CONTROL-COEXIST. Add to traceable-reqs.toml FIRST,
then satisfy (impl + unit). The 3 gates are the int evidence.

## Open items before coding (await doyle)

1. EMPIRICAL b4 confirm (doyle cutting): bump `CONTROLLER_CHANNEL_DEPTH` huge OR
   make the gate controller drain fast (un-paced pump) → if a_journaled greens, b4
   is THE trigger and the decouple fix is justified.
2. 3-gate counters from run 27898894089 (a_journaled / p0_paste / g2, c1/c2/c3 +
   EVICT). p0_paste fails BOTH platforms — does it ALSO fill the controller on
   Windows, or is it a 2nd mechanism? Splits whether one fix covers all three.
3. W1-vs-parent bisect (doyle) — confirm pre-W1 = permanent wedge (validates
   "not a regression").

## Already shipped this session (test-infra, both verified + keepable)

- `pump-carrier-fix` branch (origin), off v0.13.0-delivery-control@18416fd:
  - **7a1cc68** — both rc-attach gate attachers cold_start→cold_start_pump (kills
    the `subscribed=false` measurement artifact; facet-B itself already fixed by
    W1b e404aeb + P0 input-writer isolation).
  - **7c09819** — env-tunable registry busy_timeout (SPT_REGISTRY_BUSY_TIMEOUT_MS,
    default 5000; CI exports 30000) — ends the Phase-A SQLITE_BUSY load flake.
  - Both validated locally; traceable-reqs check exit 0. doyle cherry-picked onto
    PR #27 validation branch (his 25f60c1 p0_paste + 9dd7873 registry + e54bc51 CI env).
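
For reference, the shape of the 7c09819 knob as a sketch; this is not the shipped code. The env var name and the 5000 ms default come from the bullet above. The function names and the fall-back-to-default behaviour on an unparseable value are assumptions.

```rust
use std::time::Duration;

const DEFAULT_BUSY_TIMEOUT_MS: u64 = 5000;

/// Parses a raw env value. Unset or unparseable falls back to the default
/// (the fallback on bad input is an assumption, not confirmed behaviour).
fn busy_timeout_from(raw: Option<&str>) -> Duration {
    let ms = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_BUSY_TIMEOUT_MS);
    Duration::from_millis(ms)
}

/// Reads SPT_REGISTRY_BUSY_TIMEOUT_MS from the process environment.
fn registry_busy_timeout() -> Duration {
    busy_timeout_from(std::env::var("SPT_REGISTRY_BUSY_TIMEOUT_MS").ok().as_deref())
}
```

Splitting the parse from the env read keeps the default and fallback testable without mutating process environment.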

## Confirmed FIXED (not the blocker)

facet-B (W1b e404aeb + P0 + no-ack, all in HEAD; operator's 06-19 wedge was a
pre-W1b build 8b5583e). Carrier artifact (pump fix). Registry SQLITE_BUSY (knob).

## v4 diagnostic record (do-not-merge)

`wedge-trace-v4` (origin) carries the SPT_WEDGE_TRACE c1/c2/c3 + EVICT counters
(broker.rs append, attach.rs serve, nethost.rs StreamLog::append). Diagnostic only.

## End goal

Cut v0.13.0. Path = land the gated viewer-drain decouple fix → 3 gates green warm
on forkpty (kitsubito CI) → fold pump-carrier-fix + decouple to delivery-control →
doyle folds to delivery-control → deployah cuts the release. g2/InjectFloor
(commit-deadline/floor-release) is a SEPARATE item, still parked.
