# v0.13.0 — spt-hosted message delivery + the control/wedge bug cluster (JIT plan)

> Milestone born from operator dogfooding of v0.12.1/v0.12.2 (grill-with-docs, 2026-06-19).
> Scope decision: **one milestone** — folds the in-flight v0.12.2 work (window-regression fix +
> Markdown-output sweep, already gated on branch `v0.12.2-cli-fixes`) into v0.13.0.
> **Binding lesson (v0.12.0/v0.12.1):** several "fixes" shipped green-but-broken because they were
> gated on theory / mocks, not the REAL claude-spt harness. EVERY fix here is gated against a REAL
> dummy-harness fixture + real daemon. No prove-don't-change shortcuts on a hazard that recurred.

## The unifying invariant (the spine)

**The broker must accept INJECTED keystrokes (the v0.11.0 raw direct-inject today; the
manifest-declared translation-binary tomorrow) AND a live `spt rc` controller into the SAME
broker-held PTY without (a) the operator losing control, (b) the endpoint latching
`ONLINE+CONTROLLED`, or (c) the broker wedging.** The injection inlet is permanent —
spt-claude-code *requires* keystroke injection — so this is root-caused + fixed at the
PTY-injection layer, IN STEP with the delivery redesign that formalizes the inlet.

## Recovered design (CCS session 16531b26, 2026-06-17 — Q1→"A+" locked, rest deferred)

Idle message delivery to spt-hosted endpoints, redesigned:
- **`[message-idle-translation-binary]`** — opt-in adapter manifest key. **= the binary PATH**
  (operator: *binary owns the choreography*). spt-core **manages its lifecycle** (up with the
  endpoint, down when the endpoint goes down).
- The binary (i) **hosts the poll listener + subscribes to its output**, (ii) **drives parsed
  output + keystrokes into the broker PTY** via an spt-core **send-keys API** supporting
  **arbitrary keys + pre/post delays**. CC choreography (operator-specified):
  `ctrl+s (stash) → 50ms → payload + \r → 50ms → ctrl+s (restore)`.
- **Unified substrate (Q1=A):** the daemon's **poll feed** is the ONE idle substrate for both
  topologies; consumer differs — harness-hosted = the Monitor child; spt-hosted = the translation
  binary. **spt-core prefers the poll listener for a perch IF it exists** (so spt-hosted can use a
  listener AND keep `spt rc`). Idle-only; **busy = hook-injection** (adapter, mid-turn).
- Today's hardcoded direct-PTY-inject (payload+`\r`, no choreography) = the **degenerate special
  case** of a translation-binary — and the cause of the control-loss (it fights the controller).
- **Grounding gap (current):** `api bind` registers NO listener port → a listener-less spt-hosted
  perch SPOOLS inbound messages (only spooling+adapter-poll works today). The redesign closes this.
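
A minimal sketch of the CC choreography expressed as send-keys data, with today's raw inject as the degenerate case. The `Step` enum, `cc_choreography`, `raw_inject`, and `render` are hypothetical names for illustration; `key_to_bytes` borrows its name from the Layer B ledger entry, but its signature here is an assumption, and only the two keys used below are mapped.

```rust
use std::time::Duration;

/// One step of a send-keys sequence (hypothetical shape, illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Step {
    Key(&'static str),
    Text(String),
    Delay(Duration),
}

/// Map a named key to the bytes a terminal sends for it.
/// Assumption: only the keys used in this sketch are covered.
fn key_to_bytes(name: &str) -> Option<&'static [u8]> {
    match name {
        "ctrl+s" => Some(&b"\x13"[..]), // ASCII DC3
        "enter" => Some(&b"\r"[..]),
        _ => None,
    }
}

/// ctrl+s (stash) -> 50ms -> payload + \r -> 50ms -> ctrl+s (restore).
fn cc_choreography(payload: &str) -> Vec<Step> {
    let pause = Duration::from_millis(50);
    vec![
        Step::Key("ctrl+s"),
        Step::Delay(pause),
        Step::Text(payload.to_string()),
        Step::Key("enter"),
        Step::Delay(pause),
        Step::Key("ctrl+s"),
    ]
}

/// The hardcoded direct inject (payload + \r, no choreography).
fn raw_inject(payload: &str) -> Vec<Step> {
    vec![Step::Text(payload.to_string()), Step::Key("enter")]
}

/// Flatten a sequence into the bytes that would reach the PTY, plus the
/// total delay the sequence asks for. Unknown keys are skipped.
fn render(steps: &[Step]) -> (Vec<u8>, Duration) {
    let mut bytes = Vec::new();
    let mut total = Duration::ZERO;
    for step in steps {
        match step {
            Step::Key(name) => {
                if let Some(b) = key_to_bytes(name) {
                    bytes.extend_from_slice(b);
                }
            }
            Step::Text(t) => bytes.extend_from_slice(t.as_bytes()),
            Step::Delay(d) => total += *d,
        }
    }
    (bytes, total)
}
```

Rendering `raw_inject` and `cc_choreography` for the same payload makes the "degenerate special case" claim concrete: the raw path is the choreography minus the stash/restore keys and the delays.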

### RESOLVED (grilled 2026-06-19 → **ADR-0022**). Final contract:
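- **`[message-idle-translation-binary]`** = the binary path, opt-in per manifest (grilled as a bare
  scalar; the W2 contract decisions below lock the shape as a TABLE with a `path` scalar). NEW
  primitive (not collapsed into `[inject]`/`notif_command`; shares the poll-feed substrate).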
- Binary = **pure stdin→stdout filter**; **spt-core owns lifecycle + every PTY write**.
- **stdin** (spt-core→binary, JSON-lines): `{type:"init",endpoint_id,node}` first ·
  `{type:"event",envelope:"<EVENT…>"}` per message · `{type:"input"}` content-free ping on operator
  keystroke (binary tracks user-idle for its own idle-gated buffering).
- **stdout** (binary→spt-core, JSON-lines): `{key:…}` · `{delay_ms:…}` · `{text:…}`.
- spt-core applies the sequence to the PTY **ATOMICALLY** — controller input buffered during the
  sequence, flushed after (so ctrl+s stash/restore can't be clobbered). This is the
  injection-coexists-with-controller mechanism (shared with W1).
- spt-core PREFERS a perch's poll listener if present. Idle-only; busy = adapter hook-injection.
- v0.11.0 raw inject = degenerate `{text:payload}{key:enter}` case.
- CONTEXT.md:39 reworded (W0); busy/idle split per ADR-0022.
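
The atomic-apply rule (controller input buffered during the sequence, flushed after) can be modelled as a small in-memory sketch. The struct layout and method names are assumptions made for illustration, and the `pty` vector merely stands in for the broker-held PTY writer; the real broker code may look nothing like this.

```rust
/// Hypothetical model of the inject floor: while it is held, controller bytes
/// are buffered instead of reaching the PTY; on commit they flush in order.
#[derive(Default)]
struct InjectFloor {
    held: bool,
    buffered: Vec<u8>,
    pty: Vec<u8>, // stand-in for the broker-held PTY writer
}

impl InjectFloor {
    /// Open the floor for one event's key sequence.
    fn open(&mut self) {
        self.held = true;
    }

    /// Operator keystrokes: written through when idle, buffered when held.
    fn controller_input(&mut self, bytes: &[u8]) {
        if self.held {
            self.buffered.extend_from_slice(bytes);
        } else {
            self.pty.extend_from_slice(bytes);
        }
    }

    /// Bytes produced by the translation sequence always go straight through.
    fn inject(&mut self, bytes: &[u8]) {
        self.pty.extend_from_slice(bytes);
    }

    /// Close the floor: release it, then flush the buffered controller bytes.
    fn commit(&mut self) {
        self.held = false;
        let pending = std::mem::take(&mut self.buffered);
        self.pty.extend_from_slice(&pending);
    }
}
```

In this model a keystroke typed between the stash `ctrl+s` and the restore `ctrl+s` lands after the restore, which is the property that keeps the stash/restore pair from being clobbered.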

## Waves

### W0 — Design formalization (doyle; gate: operator sign-off)
- Resolve the deferred forks above. Author **ADR-0022 (spt-hosted idle delivery redesign)** — it
  qualifies (hard-to-reverse delivery-model change; surprising = "why a separate binary"; real
  trade-off = binary-owns vs manifest-declares, chose binary). Reword CONTEXT.md:39 +
  delivery section. Add the `[message-idle-translation-binary]` row to the inject-seam family.
- Mint the milestone REQs (registry-first, stages=[] until each wave builds).

### W1 — KEYSTONE: broker-never-wedges + control-loss root cause (todlando build, doyle gate)
- **Repro FIRST** on the REAL dummy-harness fixture: an `spt rc`-attached spt-hosted session,
  inject a message (keystrokes) → instrument to find the EXACT blocking call (operator's signal:
  one keystroke succeeds, the next wedges → single-threaded broker parks on a blocking PTY/loopback
  write after injection-induced output). Do NOT fix on theory — nail the mechanism.
- Fix the data-plane: injected keystrokes + controller input + PTY-output drain must never
  deadlock the broker (candidates: non-blocking/fail-fast PTY write, split input/output, bounded-
  evicting). **REOPEN REQ-HAZARD-ATTACH-WEDGE** (v0.12.1 prove-don't-change was insufficient — the
  wedge recurs via the injection trigger, NOT just dead-child).
- int gate: the repro fixture asserts — after injection, the controller keeps control, no wedge,
  a new attach still serves, `daemon stop` bounded.
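
One of the listed candidates (non-blocking/fail-fast PTY write) can be sketched with a bounded std channel in front of a dedicated writer thread. This is an illustration of the fail-fast idea only, not the chosen fix; `WriteOutcome`, `pty_input_queue`, and `try_write` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// What the broker learns immediately, without ever parking on the write.
#[derive(Debug, PartialEq)]
enum WriteOutcome {
    Queued,
    Backpressure, // queue full: the caller decides to drop, evict, or retry later
    WriterGone,   // the writer side has exited
}

/// Bounded queue between the broker loop and a thread that owns the PTY writer.
fn pty_input_queue(capacity: usize) -> (SyncSender<Vec<u8>>, Receiver<Vec<u8>>) {
    sync_channel(capacity)
}

/// Fail-fast enqueue: `try_send` never blocks, so a stalled PTY consumer
/// shows up as `Backpressure` instead of a wedged broker thread.
fn try_write(tx: &SyncSender<Vec<u8>>, bytes: &[u8]) -> WriteOutcome {
    match tx.try_send(bytes.to_vec()) {
        Ok(()) => WriteOutcome::Queued,
        Err(TrySendError::Full(_)) => WriteOutcome::Backpressure,
        Err(TrySendError::Disconnected(_)) => WriteOutcome::WriterGone,
    }
}
```

The design point is that the single-threaded broker only ever calls `try_write`; the blocking `write_all` happens on the thread holding the `Receiver`, so a backed-up PTY can stall that thread without stalling dispatch.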

### W1b — PRIORITY: the effect-journal PTY-input wedge (W1 escape; todlando build, doyle gate)
- **W1 escape, found in operator dogfood 2026-06-19.** W1 fixed the OUTPUT drain (real). The
  INPUT/effect-journal path is the untouched wedge. REQ-HAZARD-EFFECT-JOURNAL-PTY-WEDGE.
- ROOT (measured): `apply_once` (effect.rs:168-188) holds one global mutex across PENDING-fsync →
  `effect()` (the blocking PtyWrite) → DONE-fsync; `write_line` does sync_all() twice per effect.
  (A) every keystroke = 2 fsync serialized (measured 6.5ms median / 198ms spike) → stutter; (B) a
  blocked PtyWrite holds the lock forever → dispatch can't open attaches → `--view/--take` →
  'brain IPC read deadline elapsed' (permanent). Refutes the W2-deferred 'ConPTY-benign' ruling.
- This IS W2 Layer C's substrate (own-every-PTY-write-atomically + the park bound) — build it as
  the FOUNDATION of Layer C, before the translation choreography rides on top.
- Minimum fix = (1) don't hold the lock across effect() + (3) drop per-keystroke fsync (PtyWrite is
  ephemeral; in-memory applied-set dedup suffices, broker-survives-brain is the anchor); keep
  fsync for durable kinds. Beyond the minimum: (2) bound/fail-fast PtyWrite (the deferred Unix park bound).
- int gate: extend inject_control_wedge.rs to a REAL backed-up PTY consumer + a REAL rc-client
  attach that stays serviceable AND actually receives PTY bytes — the assertion W1's gate lacked
  (it only checked KIND_SESSIONS liveness, never real attach delivery -> how this escaped).

#### W1b BUILD STATUS (todlando, in progress) — PRIORITY pivot from W2
- PIVOTED from W2 Layer C per doyle (this IS Layer C's substrate). W2 banked clean: A@c73eb6a,
  B@611205c, D-plumbing@7166f52, Layer-C InjectFloor kernel@79f5a40 (sits ON TOP of this fix).
- ROOT (confirmed in effect.rs): `apply_once` holds `inner.lock()` across `write_line(PENDING)`
  (fsync) -> `effect()` (the blocking PtyWrite) -> `write_line(DONE)` (fsync). Lines 168-188;
  `write_line` 235-239 = flush()+sync_all().
- FIX (mine, impl) — (1)+(3) minimum, doyle-gated: add `EffectKind::is_durable()` (PtyWrite=false,
  others=true). `apply_once`: reserve key (+ fsync PENDING for DURABLE) under lock -> RELEASE lock
  -> run effect() OFF lock -> re-acquire -> finalize (mark applied; + fsync DONE for durable). A
  concurrent in-flight same-key (pending) re-drive dedups. PtyWrite = NO journal lines / NO fsync
  (in-memory dedup only; ephemeral — lost keystroke retyped, broker-survives-brain is the anchor).
  Durable kinds (NetSend/NetDial/Registry/Spool) keep exact current behavior.
- FIX (2) park bound (pty.rs write_input holds writer mutex across blocking write_all; DSR-answer
  off the mutex) — Unix-forkpty only (Windows ConPTY absorbs = benign), gate gravity-linux. doyle:
  minimum is (1)+(3); (2) rides if clean else gravity follow-up within W1b.
- TESTS via spt-test-engineer (dispatched @a4f3a273, RED-capture-first): structural units in
  effect.rs (barrier proves lock-not-held-across-effect; PtyWrite writes NO journal line vs NetSend
  does = fsync proxy) + int in inject_control_wedge.rs (real broker+PTY, stalled consumer, concurrent
  attach STILL serviced + real byte receipt; Unix park-(b) folded; cfg-split Windows-benign).
  doyle GATE-GUARD: structural NOT fsync wall-clock (flaky). REPRO must be RED pre-fix.
- ORDER (blocking): subagent captures RED on UNFIXED code -> THEN I apply effect.rs fix (effect.rs +
  pty.rs are OFF-LIMITS until RED captured, else the repro can't reproduce) -> re-run GREEN -> activate
  REQ-HAZARD-EFFECT-JOURNAL-PTY-WEDGE [impl,unit,int] + KNOWN-HAZARDS entry -> doyle gate -> resume W2 C.
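
The (1)+(3) shape described above (reserve under lock, release, run the effect off-lock, re-acquire, finalize) can be sketched as follows. `apply_once`, `EffectKind::is_durable`, `PtyWrite`, and `NetSend` are names from this plan, but the signatures, the `Journal`/`Inner` types, and the in-memory `journal` vector (standing in for fsynced journal lines) are assumptions, not the contents of effect.rs.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

#[derive(Clone, Copy, Debug, PartialEq)]
enum EffectKind {
    PtyWrite,
    NetSend,
}

impl EffectKind {
    /// PtyWrite is ephemeral; every other kind is journaled durably.
    fn is_durable(self) -> bool {
        !matches!(self, EffectKind::PtyWrite)
    }
}

#[derive(Default)]
struct Inner {
    pending: HashSet<u64>,
    applied: HashSet<u64>,
    journal: Vec<String>, // stand-in for fsynced PENDING/DONE lines
}

#[derive(Default)]
struct Journal {
    inner: Mutex<Inner>,
}

impl Journal {
    /// Returns true if this call ran the effect, false if it was deduped.
    fn apply_once(&self, key: u64, kind: EffectKind, effect: impl FnOnce()) -> bool {
        {
            let mut g = self.inner.lock().unwrap();
            // Dedup against both finished and in-flight applications of the key.
            if g.applied.contains(&key) || !g.pending.insert(key) {
                return false;
            }
            if kind.is_durable() {
                g.journal.push(format!("PENDING {key}"));
            }
        } // lock released here, BEFORE the possibly blocking effect

        effect();

        let mut g = self.inner.lock().unwrap();
        g.pending.remove(&key);
        g.applied.insert(key);
        if kind.is_durable() {
            g.journal.push(format!("DONE {key}"));
        }
        true
    }
}
```

A structural check in the spirit of the gate guard (no wall-clock fsync timing) is to call `try_lock` from inside the effect closure: it succeeds only if the lock is not held across `effect()`.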

### W2 — KEYSTONE: delivery redesign (todlando build, doyle gate; depends W0)
- `[message-idle-translation-binary]` manifest key + validate; spt-core lifecycle-manages the
  binary (up/down with endpoint). The send-keys API (arbitrary keys + delays). Poll-feed substrate
  + listener-preference rule. spt-hosted gets real inbound delivery (closes the spool-only gap).
- int gate: a real dummy translation-binary delivers an inbound message into an rc-attached
  spt-hosted session via the choreography WITHOUT breaking control (composes with W1's invariant).

#### W2 BUILD STATUS (todlando, in progress) — locked decisions + layer ledger
**Contract decisions (doyle-gated, BINDING):**
- **Manifest shape = TABLE** `[message-idle-translation-binary]` with a `path` scalar
  (doyle OPT-B). NOT a bare top-level scalar (silent-absorb footgun). spt-core does
  NOT `deny_unknown_fields` → future keys degrade gracefully. Known keys: `path`.
- **Atomic-sequence boundary = explicit `{commit:true}` terminator** (doyle OPT-1).
  The binary emits it after the last keystroke of its response to an `event`.
- **`{commit}` is PER-EVENT**: each `{type:event}` dispatch OPENS the inject floor
  (buffers controller input), its matching `{commit}` CLOSES it (flush buffer →
  release floor). Concurrent events QUEUE — never interleave two uncommitted
  sequences. `{type:input}` pings keep flowing to the binary during buffering (idle
  tracking); the operator's real keystrokes are the BUFFERED bytes.
- **MANDATORY BOUNDED BUFFER (doyle, non-negotiable, = W1 'never park forever'):**
  `INJECT_COMMIT_DEADLINE` (~5s, reuse W1's order). No `{commit}` within deadline →
  flush buffer + release floor + log warning + FAULT the binary (terminate, do NOT
  respawn → degrade to plain pass-through, NEVER a dead operator). MUST be gated by
  a test: an event sequence WITHOUT commit must not wedge controller input.
- **doyle gate reminders (assert at int):** (1) `{type:input}` ping is CONTENT-FREE
  — no PTY input content leaks to the binary's stdin (privacy). (2) lifecycle kill
  on session-down is BOUNDED + reaped — no zombie translation child.
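
The bounded-buffer rule can be sketched as a drain loop over the binary's output channel with one overall deadline per event. This is a reduced model under stated assumptions: `KeyCmd` here carries only `Text` and `Commit`, and `SeqOutcome` and `drain_until_commit` are hypothetical names. On `Faulted` the caller would flush the buffered controller bytes, release the floor, log, and terminate the binary without respawn.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Reduced command set for the sketch (the real set also has key and delay).
#[derive(Debug, PartialEq)]
enum KeyCmd {
    Text(String),
    Commit,
}

/// Result of draining one event's sequence; carries the bytes written so far.
#[derive(Debug, PartialEq)]
enum SeqOutcome {
    Committed(Vec<u8>),
    Faulted(Vec<u8>),
}

/// Drain commands until `Commit`, never waiting past `deadline` in total.
/// A timeout or a dead sender (binary death) both yield `Faulted`.
fn drain_until_commit(rx: &Receiver<KeyCmd>, deadline: Duration) -> SeqOutcome {
    let start = Instant::now();
    let mut written = Vec::new();
    loop {
        // The deadline bounds the whole sequence, not each individual command.
        let remaining = deadline.saturating_sub(start.elapsed());
        match rx.recv_timeout(remaining) {
            Ok(KeyCmd::Text(t)) => written.extend_from_slice(t.as_bytes()),
            Ok(KeyCmd::Commit) => return SeqOutcome::Committed(written),
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                return SeqOutcome::Faulted(written)
            }
        }
    }
}
```

Computing `remaining` from a single start instant is what prevents a binary that trickles output just under the timeout from holding the floor indefinitely.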

**Layer ledger:**
- A. Manifest table key + validate + schema regen — **DONE @c73eb6a** (REQ doc stage).
- B. `spt-daemon/translation.rs` — ToBinary(init/event/input) + KeyCmd(key/text/
  delay_ms/commit, untagged) + `key_to_bytes` send-keys map + `TranslationChild`
  (piped stdin, stdout-reader thread→Sender<KeyCmd>, Drop=bounded reap 500ms) —
  **DONE @611205c**. 3 unit tests (wire round-trips, content-free ping, send-keys).
- D-plumbing. `SpawnReq.translation_binary: Option<String>` (additive) threaded from
  manifest via harnesshost — **DONE @7166f52** (field on msg.rs; struct-literal sites
  swept to `None`; harnesshost fills from `manifest.message_idle_translation_binary.path`).
- C. **DONE @f9fe5bb** — broker atomic apply: `Translation` per-session handle (floor +
  `event_tx` queue + shared `Arc<TranslationChild>` + `faulted` flag) on `HostedSession`;
  `run_inject_worker` thread (one per translation-bearing session) drains queued events
  one-at-a-time: open floor → feed binary `{type:event}` → drain key/text(`write_input`)/
  delay(clamped sleep)/commit → `flush_inject_floor` (buffered controller bytes off-lock
  AFTER the sequence) → release. `inject_commit_deadline()` guard (5s; env-override
  `SPT_INJECT_COMMIT_DEADLINE_MS` for the fast int) → FAULT on miss/binary-death
  (`fault_translation` terminate, no respawn → raw-inject degrade). `dispatch_input`:
  content-free `{type:input}` ping per keystroke + buffer-if-held (op_id path buffers
  INSIDE the journal effect to keep ack+dedup). `TranslationChild.terminate()` made
  `&self`/Arc-callable for the FAULT-reap.
- E. **DONE @f9fe5bb** — `dispatch_endpoint_input` routes the whole `<EVENT>` to a live
  (non-faulted) worker via `event_tx`; raw inject = degenerate fallback (none/faulted/
  worker-gone). (Poll-listener preference rule = a later wave; not blocking W2 close.)
- F. **DEFERRED to gravity-linux follow-up** (cfg(unix) only; can't validate on Windows;
  doyle ruling: minimum is C-core, F rides else gravity follow-up). Unix park (b)/(c)
  bound: bounded/fail-fast `write_input` + DSR off the writer mutex (spt-term/pty.rs);
  the int's cfg(unix) park-(b) assert tracks it. Shared with W1b F3.
- G. **IN PROGRESS (spt-test-engineer a8dba5b0 / perch todlando-w172)** — int gate: REAL
  dummy translation-binary fixture (`src/bin/xlate_choreo_fixture.rs`) → delivers inbound
  into an rc-attached spt-hosted session via ctrl+s→delay→text→enter→delay→ctrl+s→commit
  WITHOUT breaking control (+ REAL byte receipt, the W1b-grade assert) + no-commit
  deadline FAULT (env-shrunk) + content-free ping + no-zombie. doyle gates here.
- Tests authored via spt-test-engineer subagent. REQ stages bump impl/unit/int as
  layers land; activate full set before declaring W2 done. traceable-reqs check + CI
  workspace clippy (`cargo clippy --workspace --all-targets -- -D warnings`) + real
  bin rebuild (`cargo build -p spt --bin spt`) at the W2 gate.

### W3 — P1: fresh endpoint shows under its project (todlando build, doyle gate)
- ROOT (found): `info.cwd` is NEVER set on bind. Thread cwd into `establish_perch` + set `rec.cwd`:
  `cmd_bind` (spt-hosted) reads its own `current_dir` (the broker spawned it in `project_cwd`);
  `bind_from_seed` (harness-hosted) passes `seed.cwd` (already captured, currently discarded).
- int gate: a REAL `endpoint run` perch has `info.cwd` set AND appears under its project tab
  (the v0.12.1 unit test exercised `merge_origin_project` with a *provided* origin and never asserted that cwd is set).

### W4 — Picker UX (todlando build, doyle gate)
- Skip the first screen → open directly on **Pick existing**; `n` jumps to Create new.
- **Auto-attach** after Start-new AND Resume-from-history (currently neither attaches nor shows
  any stdout); add an `h` shortcut to run headless (no attach).
- `controlled by` shows the **node NAME** (node_label_display), not the raw hex (D).
- Clean up the Start-new output: drop the Rust `pid=Some(142748)` leak + the "harness binds its
  perch on startup" internals; user-friendly, not process-log.

### W5 — `driven_by` self-heal (todlando build, doyle gate)
- `ONLINE+CONTROLLED` must clear even if a detach is lost (don't rely on the detach IPC — same
  lesson as B2/REQ-HAZARD-HOSTED-LIVENESS-RECONCILE): reconcile clears `driven_by` when the
  endpoint has no live controller/session. Composes with W1 (wedge no longer blocks detach).
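
A minimal sketch of the self-heal rule, assuming a record that tracks `driven_by` alongside a count of live controllers. `EndpointRecord`, `live_controllers`, and `reconcile_driven_by` are hypothetical; the real reconcile would derive liveness from the session table rather than from a stored counter.

```rust
/// Hypothetical endpoint record; only the fields the rule needs.
#[derive(Debug, Default)]
struct EndpointRecord {
    driven_by: Option<String>,
    live_controllers: usize,
}

/// Clear a stale `driven_by` when no live controller backs it.
/// Returns true if the record was healed on this pass.
fn reconcile_driven_by(rec: &mut EndpointRecord) -> bool {
    if rec.driven_by.is_some() && rec.live_controllers == 0 {
        rec.driven_by = None;
        true
    } else {
        false
    }
}
```

Because the rule keys off observed liveness rather than the detach IPC, a lost detach is corrected on the next reconcile pass instead of latching `ONLINE+CONTROLLED`.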

### W6 — perri hand-off (doyle, after W2)
- Direct perri to the UPDATED integration checklist + docs to build the spt-claude-code
  **translation binary** (CC choreography: ctrl+s→50ms→payload+\r→50ms→ctrl+s). Per the standing
  route: doyle designs (W0 ADR) → perri builds the adapter binary → doyle gates → perri validates.

### Side task (do early, docs-only) — correct v0.12.1 changelog
- v0.12.1 notes claim "a freshly-created agent appears under its own project right away" — REFUTED
  (P1 never worked). Amend CHANGELOG.md (main) + `gh release edit v0.12.1` notes, no retag
  (v0.12.0-precedent). v0.13.0 delivers it for real (W3).

## Sequencing notes
- W0 gates W2 (design before build). W1 (wedge/control) is independent + the scariest — can start
  immediately (repro-first). W3/W4/W5 are independent bug/UX fixes, parallelizable. W2+W6 are the
  feature spine. The folded v0.12.2 work (window fix @15fdf58, MD-sweep @2435a74, ATX @b3bb387)
  rides the v0.13.0 release; its changelog bullets merge in.
- Version: **v0.13.0** (MINOR — new delivery feature + breaking-ish behavior). Confirm at release.
