---
name: p1b-input-ack-deadlock-tests
description: REQ-HAZARD-INPUT-ACK-BACKPRESSURE (v0.13.0 P1b) test patterns — ack-gate via ordering on a real broker, exactly-once on the no-ack path, and the flood keystone's platform divergence + non-draining-conn self-stall trap.
metadata:
  type: project
---

v0.13.0 P1b ack-deadlock fix (`REQ-HAZARD-INPUT-ACK-BACKPRESSURE`): `InputReq` gains
`ack: bool` (serde `default = "default_true"`); `serve_attach` controller arm calls
`Brain::send_effect_no_ack` (ack=false); `dispatch_input` gates
`send_frame(applied_envelope)` on `if req.ack`. Cures an input FLOOD on one
brain↔broker conn that pre-fix filled the broker→brain return direction with applied
acks and full-duplex-deadlocked the per-conn handler. Sibling of [[p0-pty-input-writer-wedge-tests]]
and [[effect-journal-wedge-tests]] (same forkpty-vs-ConPTY banked stance).

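**UNIT (serde, in `msg.rs` `#[cfg(test)]`):** an absent `ack` field on the wire MUST
decode to `ack=true` (N-1 compat: an old brain omits the field and must still be acked).
Decode a JSON object carrying only `session_id`, `data_b64`, `op_id` via
`serde_json::from_value` and assert `req.ack`. And `ack:false` must serialize
EXPLICITLY (assert `wire.get("ack") == Some(&json!(false))`) so the receiver's default
can't re-inflate an omitted field to `true`. Non-vacuous: swapping
`default = "default_true"` for a bare `#[serde(default)]` decodes absent to bool's
`false`; removing the attribute outright makes the decode fail on the missing field.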

**ACK-GATE (the crux unit, real broker + real PTY, `tests/broker.rs`):** prove "no
applied for the no-ack op" by ORDERING on the single-threaded FIFO conn — send the
no-ack op FIRST (op_id=Some, ack=false), then a KNOWN-ACKED probe (ack=true). The
FIRST `KIND_APPLIED` frame observed must carry the PROBE's op_id, never the no-ack
op_id (a broken gate would emit the no-ack applied first). Both lines still echo (the
no-ack write is delivered; only the ack frame is suppressed). PASSED + verified
non-vacuous. Uses `AppliedEvent`/`KIND_APPLIED` from `spt_daemon::msg`.
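
The ordering argument can be sketched with a toy, stdlib-only model. This is an illustration under stated assumptions, not the real test: `Frame`, `Op`, `dispatch`, `first_applied`, and `no_ack_then_probe` are hypothetical names, and the real test drives a real broker and PTY and matches on `KIND_APPLIED` frames.

```rust
// Toy model of the ordering proof. None of these names are the spt_daemon API.
#[derive(Debug, Clone, PartialEq)]
enum Frame {
    Output(Vec<u8>), // stand-in for the PTY echo of the written bytes
    Applied(u64),    // stand-in for a KIND_APPLIED frame carrying an op_id
}

struct Op {
    op_id: u64,
    data: Vec<u8>,
    ack: bool,
}

/// Processes ops strictly in order, as a single-threaded FIFO conn would.
/// `gate_enabled = false` models the broken gate that acks every op.
fn dispatch(ops: &[Op], gate_enabled: bool) -> Vec<Frame> {
    let mut frames = Vec::new();
    for op in ops {
        // The write is always delivered, acked or not.
        frames.push(Frame::Output(op.data.clone()));
        if op.ack || !gate_enabled {
            frames.push(Frame::Applied(op.op_id));
        }
    }
    frames
}

/// Returns the op_id carried by the first applied frame, if any.
fn first_applied(frames: &[Frame]) -> Option<u64> {
    frames.iter().find_map(|f| match f {
        Frame::Applied(id) => Some(*id),
        _ => None,
    })
}

/// The test's send order: the no-ack op first, then the known-acked probe.
fn no_ack_then_probe() -> Vec<Op> {
    vec![
        Op { op_id: 1, data: b"no-ack\n".to_vec(), ack: false },
        Op { op_id: 2, data: b"probe\n".to_vec(), ack: true },
    ]
}
```

With the gate on, `first_applied(&dispatch(&no_ack_then_probe(), true))` is the probe's id; with the gate off it is the no-ack op's id, which is what makes the assertion non-vacuous.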

**EXACTLY-ONCE no-ack (`tests/broker.rs`):** `Broker::bind_in(name, tempdir/effects.log)`,
replay SAME `(sid, op_id)` ack=false ×3; assert `broker.journal().is_applied((sid,op))`
+ `broker.journal().applied_count()==1` (the AUTHORITATIVE anchor — mirrors
`idempotent.rs`). Echo count is corroboration only (cooked-mode PTY re-echoes).
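
A minimal in-memory sketch of why the journal, not the echo count, is the anchor. `ToyJournal`, `record`, and `replay_no_ack` are hypothetical; only the method names `is_applied` and `applied_count` come from the note above, and their signatures here are assumptions (the real journal is backed by `effects.log`).

```rust
use std::collections::HashSet;

/// Toy stand-in for the broker's effect journal (in memory, no effects.log).
#[derive(Default)]
struct ToyJournal {
    applied: HashSet<(String, u64)>,
}

impl ToyJournal {
    /// Records the key; returns true only the first time it is seen.
    fn record(&mut self, sid: &str, op_id: u64) -> bool {
        self.applied.insert((sid.to_string(), op_id))
    }

    fn is_applied(&self, sid: &str, op_id: u64) -> bool {
        self.applied.contains(&(sid.to_string(), op_id))
    }

    fn applied_count(&self) -> usize {
        self.applied.len()
    }
}

/// Replays the same no-ack op `times` times and returns how many writes
/// would have been passed through to the PTY.
fn replay_no_ack(journal: &mut ToyJournal, sid: &str, op_id: u64, times: usize) -> usize {
    let mut writes = 0;
    for _ in 0..times {
        if journal.record(sid, op_id) {
            writes += 1; // only a first-seen op is written through
        }
    }
    writes
}
```

Replaying `("s1", 7)` three times leaves `applied_count()` at 1 and yields a single write, regardless of how many times a cooked-mode PTY echoes that write back.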

**CRITICAL RUNNER TRAP — blocking `read_frame` on Windows named pipes:** the broker.rs
harness's `read_frame` (Whole conn) IGNORES any deadline and BLOCKS once the broker
stops pushing frames. Any read loop that may not receive more output HANGS the test.
FIX: drive send+read on a WORKER thread feeding an mpsc, and bound the MAIN thread with
`rx.recv_timeout(Duration::from_secs(20))` (the `probe()` discipline). NEVER add a
trailing fixed-duration "settle" loop that calls `read_frame`: it blocks forever once
the echo child idles. This cost a 20-minute wedge before the watchdog rewrite.
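
The watchdog shape, as a stdlib-only sketch. `with_watchdog` is a hypothetical helper, not the harness's `probe()`; the closure stands in for the send + `read_frame` loop that may block.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `work` on a worker thread and waits at most `limit` for its result.
/// On timeout the worker is left detached (never joined), so a blocked read
/// inside `work` cannot hang the caller.
fn with_watchdog<T, F>(limit: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; ignore the send error.
        let _ = tx.send(work());
    });
    // Err covers both the timeout and a worker that died without sending.
    rx.recv_timeout(limit).ok()
}
```

A `None` is the test's failure verdict; the main thread never calls the blocking read itself, so the worst case is bounded by `limit` rather than by the pipe.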

**KEYSTONE INT (`tests/input_ack_deadlock.rs`, real broker+brain+serve_attach):** flood
N controller `Input` records through `serve_attach`'s controller arm (the route the fix
changed) on ONE loopback attach via `send_attach_input(brain, stream, bytes, op)`;
prove the broker stays serviceable with direct `probe(KIND_SESSIONS/KIND_NET_STATUS)`
on a FRESH conn + `flood_sent` (all N sent without stalling). GREEN gate:
flood_sent + broker_alive + sessions_answered.

  KEYSTONE GOTCHAS (each cost a hang/false-signal, all resolved):
  - **Non-draining flooder self-stalls.** A flood operator that NEVER reads its own
    conn deadlocks on its OWN conn's full-duplex backup (the broker pushes net-stream
    frames it isn't reading) REGARDLESS of the fix. Make the flooder a
    `cold_start_pump` (Split) so its bg reader drains — request-reply
    (net_dial_loopback/net_open_stream) works on a pump (reads from the event queue).
    `net_stream_subscribe` is FIRE-AND-FORGET (just `self.send`, no reply) — `.is_ok()`
    means "sent," NOT "serviced," so it's a useless liveness signal; use the direct
    `probe` instead.
  - **Use a QUIET child** (`sleep 600` unix / `waitfor /t 600 Sig` windows — NO stdout,
    ignores stdin). An echo/flood child's output backs up the non-draining controller
    conn → confounds the ack-gate with the W1 output-drain hazard.
  - **TWO concurrent loopback dials on one NetHost race the inbound stream demux** in
    this in-process rig → a second loopback viewer attach's subscribe looks unserviced
    even when the broker is fine. Keep the loopback-attach byte-receipt leg as a
    CAPTURED diagnostic; the direct `probe` is the gate.
  - **Teardown:** capture verdicts off watchdog'd channels, then `drop(flooder)`/
    `drop(attacher)` — do NOT `.join()` (a controller serve_attach on a Whole `target`
    can park; joining re-introduces the hang). Kill the child first, 750ms settle.

  PLATFORM DIVERGENCE (banked, same as W1b/P0): at the spec's FLOOD_N=64 the pre-fix
  deadlock is forkpty-ONLY (gravity-linux small pipe buffer fills at ~10 frames →
  flood_sent=false RED). Windows named pipes buffer more, so 64 small applied frames
  don't fill → benign on Windows, test asserts the POSITIVE invariant only. EMPIRICALLY
  VERIFIED 2026-06-19: at FLOOD_N≈2000 the full-duplex backup DOES force flood_sent=false
  on Windows on revert (cross-platform RED) — BUT at that N the loopback net send-window
  throttles even the FIXED path in this rig (flood_sent=false with the fix too), so 64
  is the clean cross-platform GREEN and gravity-linux carries the deterministic RED.

**Gate results (2026-06-19, Windows 11):** 2 serde units + 2 broker tests + 1 keystone
all GREEN. `cargo clippy --workspace --all-targets -- -D warnings` = 0. `traceable-reqs
check` EXIT=0, REQ shows `+doc +impl +unit +int`. Keystone ~32s (watchdogs); run via
`cargo test -p spt-daemon --test input_ack_deadlock -- --test-threads=1` (or nextest).