# Deferred Features

Intended features explicitly cut from v1 scope but committed for later. Not backlog noise — these are decided directions parked for sequencing reasons. Keep this list short and real.

| Feature | Cut from | Why deferred | Trigger to revisit |
|---|---|---|---|
| Scrollback on-disk spillover | terminal wrapper v1 | In-memory ring covers the common case; spillover adds a persistence/rotation story | First long-running session that overflows the ring usefully, or any "scroll back further than the buffer" user need |
| Sidecar adapter process (long-running, wire-protocol) | harness contract v1 | Manifest + `spt.exe` subcommand surface covers v1 harnesses; sidecar only earns its keep for streaming / in-memory cross-event state | A harness outgrows manifest+hooks (needs streaming or stateful coordination) |
| Manifest `include` / composition (option C) | manifest v1 | Single flat manifest per harness is enough until a harness wants to override only part of a section | First sub-project that needs to inherit spt-core defaults but override one section |
| Concrete Shell types (GameRobot, in-session-inject) — **OS-notification shell carved out: ships M5** as the dogfood proof (user decision 2026-06-04, M5-PLAN §Scope decisions) | Shell concept v1 | Model is locked; concrete surfaces are large, platform-specific, and not needed for core | First real use case for a driven surface (e.g. the 2D-world experiment) |
| PresenceChannel implementation (broker, dispatch/bind/thread) | endpoint types v1 | Depends on presence gossip + Shells + multi-instance routing; only the seams ship v1. **M5 ships presence *resolution* (gossip + most-recently-active API) — its future substrate; the endpoint stays deferred** (user decision 2026-06-04). <!-- [doc->REQ-PRES-1] --> **M5-D6 seam note:** the substrate now exists — the gossiped datum (`Instance.last_active_ms`, riding the registry's epoch lease + replication) and the one MRA API (`spt-daemon::presence`). A future PresenceChannel's dispatch/bind/thread styles consume exactly this datum plus the broker PresenceLog seam (REQ-EP-4); nothing about the endpoint type needs a new gossip channel. | Once Shells exist and the subnet registry is stable |
| OS-level input-activity presence signal (7a-A) | presence v1 | Per-platform, privacy-loaded; agent-interaction heartbeat covers the v1 goal. **Note:** distinct from the v1 broker *sensing* user input on a PTY it already holds (spt-hosted) — that is in v1 and is NOT OS-wide input monitoring | Opt-in enhancement after agent-interaction presence proves out |
| Instantiate-anywhere + consent gate | Instances V1-mid | Remote launch over network is additive once remote-drive of running instances works. Consent *model* is now designed (CONTEXT §Consent & security gates); **the framework (grant store + escalation + pre-consent flags) ships M5 with this capability id reserved-but-refusing** (user decision 2026-06-04) — the capability lands into it later. Defers with it: the remote-fork arm, cross-node shell spawn, `shell_wake_spawn_anywhere` *behavior* (grant shape ships M5). <!-- [doc->REQ-INST-6] --> **M5-D5b note:** remote `spt suspend/wake <id@node>` shipped (the rest-op wire family, remote-drive trust class) — it *moves existing instances between rest states only*; the remote-**fork** arm (creating an instance on a node that has none) stays here, riding instantiate-anywhere's consent gate. A wake addressed to a node with no local instance refuses with `WAKE_NO_REACHABLE_INSTANCE` naming this deferral. | Revisit-trigger met (remote-drive shipped M4); held back by user decision — capability drops into the M5 consent framework when picked up |
| Remote command execution on another node | off-node reach-back v1 | Highest-risk capability; needs the consent/security gate. Workarounds exist (message a local agent; interact with local agent directly). **M5 ships the framework with this capability id reserved-but-refusing** (user decision 2026-06-04) | Designed alongside instantiate-anywhere's consent gate |
| Shell-binary (and harness-binary) sandboxing | Shell concept v1 | A capability toolset bounds what the *agent* can ask, not what the *binary* can do with its OS perms. Baseline stance: running any adapter/shell binary = a disclosed, accepted trust risk for all spt-core users | If/when untrusted third-party shell/harness binaries become common enough to warrant OS-level sandboxing |
| Concrete Shell adapters (e.g. GameRobot) + the shell binaries themselves | Shell concept v1 | The Shell *model*, manifest schema, perch layout, and `api` surface are now specified; concrete shells are large, platform-specific, and not needed for core | First real driven-surface use case |
| REQ-CONSENT int evidence (interactive escalation E2E) | M5 closeout (D9b rule-5) | The escalation prompt''s end-to-end leg needs a real harness session answering a real consent prompt — loopback-faking it would prove nothing; the grant store + gate + CLI are impl/unit-proven | The downstream rebuilt spt plugin (v1 acceptance) — its first gated spawn IS the int evidence |
| REQ-INSTALL-4 int evidence (`spt adapter add --github` against a real repo) | M5 closeout (D9b rule-5) | Needs a real clone target; impl/unit cover manifest-first validation + registration. The standalone `SaberMage/spt-shell-notify` repo now makes this trivially possible | First milestone that touches adapter lifecycle again — one E2E cloning the notify repo |
| OS-service registration at install (REQ-INSTALL-1's third leg: systemd user service / Windows service or scheduled task for the always-on guarantee) | installer v0.1 (M6-D2, grill decision 3) | Daemon auto-start on any `spt` invocation covers dev-stage use; the minimal non-interactive script stays non-OS-entangled (also serves REQ-INSTALL-2's repackaging stance). Honest gap: a node is unreachable after reboot until something invokes `spt` | First always-on deployment need (headless/server node, or the GUI milestone's background expectations) |
| Tier 2 agent-docs: MCP doc/resource server + `spt <cmd> --help --json` structured-help mode (REQ-DOCS-4 legs) | docs v0.1 (M6-D3/D5, DOCS-STRATEGY §v0.1 grill) | Tier 1 covers the integration surface: the schemars-derived manifest schema + llms.txt/llms-full.txt + the generated CLI reference; `--help --json` is low marginal value over that reference, the MCP server is the standout *later* fit | First integrating dev-agent that outgrows llms.txt + schema (or the `spt-claude-code` work surfacing a concrete MCP-resource need) |
| Subnet attachment verbs: `spt subnet detach <NAME> [--auto]` / `attach <NAME> [--auto]` (daemon keeps running, stops/starts advertising + connecting for that subnet; `--auto` persists the startup default), `spt subnet leave <NAME>` (elevation-gated), maybe `spt subnet disband <NAME>` (elevation + current TOTP) | M7 grill (user spec 2026-06-06; CONTEXT §subnet attachment) | The attached/detached *term* + the all-attached banner ship M7; the per-subnet serve-state machinery (selective advertise/connect, persisted flags) is its own deliverable | M8 CLI nounification, or first multi-subnet user needing selective participation |
| Linux elevation model (user-ratified direction 2026-06-06): (1) install symlinks the binary into a sudo-reachable path (e.g. `/usr/local/bin`) so `sudo spt` resolves; (2) first `sudo spt` detects elevation and prompts once for the DEFAULT USER ACCOUNT — thereafter any `sudo spt` daemon launch always runs the daemon (and state) under that account, never root (kills the `sudo HOME=$HOME` dance and the root-owned-state hazard) | M7 acceptance run (gravity: `sudo: spt: command not found`, then the HOME-universe split) | Needs installer + first-run-state design; Windows elevation has no equivalent problem (same user profile) | M8, with the installer's next touch |
| Windows firewall registration for the installed binary: a DETACHED daemon never shows the first-bind firewall prompt, so inbound UDP (mDNS + QUIC meet/pairing) is silently dropped — found as the M7 acceptance NO_SEED_HOLDER (dev/test paths had accumulated allow rules; the install path had none). Options: an elevated install leg adds the rule (`New-NetFirewallRule`), and/or the daemon self-detects blocked inbound and renders it as the "no connection" state in `subnet status` + the coming-online banner | M7 acceptance run (2026-06-06) | User-scope installer is deliberately non-OS-entangled (REQ-INSTALL-2); needs the elevated-install / OS-service design anyway | M8, with the OS-service + Linux-elevation install work |
| ~~**Non-admin daemon spawn (both OSes)**~~ **SHIPPED 2026-06-06** (KNOWN-HAZARDS 5.7, REQ-HAZARD-ELEVATED-DAEMON-SPAWN): de-elevated spawn at `spawn_detached` (Windows: UAC linked token via `CreateProcessWithTokenW`; Linux: drop child to SUDO_UID/GID) + `Daemon::run` entry guard + the unix `spt` main-entry sudo drop (whole process re-anchored to the invoker's universe, elevation kept PROVEN for the gated commands). Remaining Linux follow-up = the install-symlink + default-account election row below (M8) | M7 acceptance run (2026-06-06) | The membership-implies-reachability activation made elevated commands daemon-spawners — correct goal, wrong token | DONE (real elevated-spawn verification on both OSes rides the next acceptance pass) |
| ~~**Ghost registry row eviction**~~ **SHIPPED 2026-06-06** (KNOWN-HAZARDS 4.10, REQ-HAZARD-REGISTRY-GHOST-ROWS): silent-peer decay — rows from a node unheard for `registry_evict_after_ms` (default 300s ≈ 10 pump cadences) evict on the registry pump tick, snapshots rewritten; own rows never decay. STALE trust rows for a dead identity still need a manual prune verb (below) | M7 acceptance run (2026-06-06) | Needs a decay/eviction policy (e.g. drop rows unrefreshed for N pump cadences; trust-row pruning verb) | DONE (fleet cleanup: 09ef831e ghost rows on both BIGNET nodes decay once the new build deploys) |
| Trust-row prune verb (`spt subnet prune <node>` or similar): registry GHOST rows now decay (4.10), but the dead identity's TRUST rows persist by design (trust is a user decision) — they cost dead dials every pump tick and clutter status views. BIGNET carries two such rows (09ef831e, both sides) | Ghost-eviction fix (2026-06-06) | Trust mutation = security surface; wants the elevation gate + the M8 noun shapes | M8 CLI nounification |
| Elevated spt-hosted endpoints — a consented elevation satellite: the broker lives inside the always-unelevated daemon (KH 5.7), so spt-hosted children are unelevated by construction; there is NO silent unelevated→elevated path on either OS (that is UAC/sudo's whole point). An agent needing admin powers is harness-hosted-elevated today (works: TCP listeners + talk-down pipes). The spt-hosted form needs a per-grant CONSENTED elevation: UAC prompt / polkit spawns a small elevated PTY-host satellite holding only the elevated children, linked back to the unelevated daemon — rides the REQ-CONSENT-2 escalation framework (allow-once/always) + the reserved-but-refusing capability pattern | Elevated-endpoint design grill (user 2026-06-06) | Real elevation-granting UX + a second privileged process = its own deliverable; the consent framework it rides already ships | First spt-hosted agent that needs admin powers (or the instantiate-anywhere milestone) |
| ~~AMBIGUOUS render~~ **SHIPPED 2026-06-06** (rode the ghost-eviction fix): resolution refusals render valid copy-paste targets at the wansend boundary (`render_refusal`) — labels preferred, key-prefix on label collision, `subnet:id` for cross-subnet | M7 acceptance run (user 2026-06-06) | Pure render fix at the wansend/CLI boundary; carries into the M8 nounification sweep | DONE |
| Post-join address seeding: the pairing ceremony holds a live, authenticated connection to the peer, but the peer pumps then re-find each other from scratch via id-only discovery (fresh node ids must propagate to n0 DNS first) — observed live as ~1 min to first `status --nodes` convergence after a join. Seed the pump's address cache from the ceremony connection (both sides) for instant first sync | M7 acceptance run (2026-06-06) | Convergence is correct, just slow on first contact; needs an addr-cache seam the pumps consult before the resolver | M8 polish |
| Clock source priority for TOTP/rendezvous: query a well-known world clock (NTP) FIRST, fall back to local time — machines with skewed local clocks currently derive wrong steps/rendezvous tokens silently | M7 acceptance grill (user 2026-06-06) | Needs an NTP client dep + offset caching + offline fallback semantics; ±1 window covers small skew today | First field report of clock-skew join failures, or M8 polish |
| CLI nounification (M8 candidate; ratified shape from the 2026-06-06 grill): `spt endpoint` namespace absorbs fork/suspend/wake/shutdown/rename/stop/digest + `access` (per-endpoint store — NOT subnet-scoped) + `description` (ex-`resources` blurb; bare = show, `set` = author); `spt endpoint list [--local\|--subnet <name>]` = merged listing, default all-subnets grouped by subnet, SELF endpoint pinned distinctly at top (bare `spt endpoint` = this list); agent HOT PATH stays flat (`send`/`ring`/`ready`/`whoami`/`how-to`); `notify` → `spt subnet notify`; `notif` stays top-level (user-recall, cross-subnet); `spt daemon stop\|status` noun (hidden `daemon` → `daemon run`); agent-endpoint `shutdown` keeps its name | M7 grill (2026-06-06) | Second surface-breaking reorg mid-acceptance; adapter window still open immediately after M7 closes | M8 planning — mint REQs first (rule 3) |
| ~~Node label for an endpoint-less node (REQ-SUBNET-3 gap)~~ **SHIPPED 2026-06-08** (option 2 — node-level presence datum): a peer with **zero endpoints** rendered as a bare key-prefix in `status --nodes` because `node_label` only rode endpoint `Instance` rows. FIX: a node-LEVEL carrier — `SubnetRegistry.node_labels` (node→label) + a `NodeLabelUpdate` feed record riding the existing registry replication stream as an untagged `RegistryFeedRecord` variant (instance bytes unchanged → mixed-version fleet safe, old peers skip the new lines). Merged under the same epoch lease, evicted with the node, gated identically to instance feeds. The pump emits one label record per served subnet regardless of endpoint count; `node_status_rows` reads it to NAME an endpoint-less peer without counting it as an endpoint (`[0/0]` holds, no liveness effect). Chose option 2 over option 1 (trust-pin label at pairing) to avoid touching the SPAKE2 ceremony + un-dormanting warn-on-change (which couples with the REQ-SUBNET-7 machine_id re-pair overwrite) — option 2 is also live-updating on hostname change | M8 criterion-5 acceptance run (HFENDULEAM↔kitsubito, 2026-06-07; HF at `[0/0]` showed bare uuid to kitsubito) | Cosmetic; deferred per user decision 2026-06-07, fixed 2026-06-08 (user picked option 2) | DONE (real-hardware verify rides the next acceptance/deploy pass) |
| **Instant + never-seen peer hostname via pairing-time capture (REQ-SUBNET-3 follow-up; the deferred "option 1")**: the shipped node-level label (option 2) is GOSSIP-ONLY, so a peer's name is learned only from a pump round while it is alive. Two residual gaps remain after the eviction fix below: (a) ~1 gossip cadence to appear after a join (the joiner trust-pins by pubkey with `label=None`; the hostname arrives next pump round — enlyzeam↔kitsubito converged to `ENLYZEAM` after a cadence, fine but not instant); (b) a peer that stops WITHIN a cadence of pairing is never named (its label never gossiped). FIX: capture the peer hostname **at pairing → the persistent trust-pin label**. Needs (1) BIDIRECTIONAL pairing intro — the joiner→responder `NodeIntro` (label+machine_id) already exists; add responder→joiner so BOTH sides capture each other's hostname; (2) wire `trust.record`'s `label` = peer hostname at both pin sites (`pairing/wire.rs` ~324 responder, ~435 initiator, both currently `None`); (3) handle the warn-on-change coupling with the REQ-SUBNET-7 machine_id re-pair overwrite (a legit machine reinstall = same machine_id, new key → repin, not a silent fail). **NOTE — the OFFLINE-loses-name half was a real bug, FIXED 2026-06-08** (not this deferred row): a *learned* node_label was being silence-decayed by `evict_nodes`, so an offline trusted member lost its name; `evict_nodes` now keeps labels (silence ≠ forget) and only an explicit `spt subnet prune` drops them (`evict_node_labels`). This row is the remaining INSTANT + NEVER-SEEN half | enlyzeam/kitsubito SPT_DEV diagnosis (user 2026-06-08) | The instant/never-seen half of REQ-SUBNET-3; pairs with the "post-join address seeding" row (same ceremony-connection seam, seed label + address both sides) — touches SPAKE2, so its own focused milestone | Next session — mint REQ for the trust-pin-label capture + bidirectional intro |
| ~~Subnet-scoped liveness probe (REQ-SUBNET-5 detach gap)~~ **SHIPPED 2026-06-08** (REQ-SUBNET-5 int stage): `subnet status --nodes` showed a DETACHED peer as **online** because liveness fell to `Probe` → a raw transport dial by node-pubkey, which succeeded on the peer's still-bound daemon endpoint (one ALPN `spt-core/net/0` serves all subnets, so the dial was subnet-blind). FIX: a new `ServeProbeRecord` (spt-net `serveprobe`) + daemon `serveprobe` handler/requester ask the peer "serving subnet X?", answered from its own `AttachmentStore::is_detached` ∧ membership (the same signal the pump/responder gate on); the dispatcher classifies it by the `serve_probe` first-line field; `wansend::probe_node_serving` dials (proves reachable) THEN asks, so a detached-but-alive peer reads offline within a cadence. Probe LATENCY had shipped separately 2026-06-07 (`run_bounded` 2.5s + "Checking remote nodes…" notice). int evidence: `dispatcher_serves_a_subnet_serve_probe` (loopback E2E) | M8 criterion-5 acceptance run (HFENDULEAM↔kitsubito, 2026-06-07) | Real liveness-model fix on accepted M7 hybrid-liveness; deferred to keep acceptance moving (user decision 2026-06-07) | DONE (criterion-5 detach-flip now GREEN; real-hardware re-verify rides the next acceptance/deploy pass) |
| **Subnet full-mesh visibility (PLANNED MILESTONE — `SUBNET-MESH-PLAN.md`)**: non-directly-paired members never see each other — a subnet is currently the PAIRING GRAPH, not a mesh (A↔B, C↔B; B offline → A and C invisible; even with B online, B doesn't relay). Root: pairwise TOFU trust (`trust.rs`) + own-rows-only gossip / no transitive relay (`peerloop.rs`, KH 4.10/7.5) + no member roster in the pairing seed transfer. FIX = member-AUTHENTICATED subnet data (seed-MAC and/or per-node signatures) unlocking transitive trust (seed-holder = member = trusted) + transitive gossip relay + roster propagation; MUST re-solve eviction/epoch-lease under relay (supersedes KH 4.10's "no transitive gossip" safety assumption). ADR-level; mint REQ-MESH-* first. Folds in the deferred pairing-time hostname capture + post-join address seeding (same roster/ceremony seam) | A→B→C invisibility test (user 2026-06-08) — confirmed two-gap design limitation, not a code defect | **DESIGN RATIFIED 2026-06-08** (grill): symmetric **seed-proof membership** + **roster-only relay** (rows stay own-authored, no third-party relay → KH 4.10/7.5 PRESERVED, not superseded). Full design = **ADR-0017**; reqs = **REQ-MESH-1..6** (inactive); plan = `SUBNET-MESH-PLAN.md`; glossary = CONTEXT §Pairing & trust. Folds in the pairing-time hostname-capture + post-join address-seeding rows (roster fields). Next: plan-phase over REQ-MESH-* | Next milestone — design done 2026-06-08; build pending plan-phase |
| `xtask debug-converge` — automated convergence watcher for debug rollout (the REQ-UPD-6 first-slice item ADR-0016 names): watch every expected debug-pinned reachable lab node until all report the target version applied, else a per-node timeout table (Offline / StagedAwaitingConsent / BlockedByBrokerResources / Rejected{reason}). Activates REQ-UPD-6's `int` stage. Full build plan: `docs/DEBUG-CONVERGE-PLAN.md` | REQ-UPD-6 first slice (2026-06-06) | The first slice ships manual "Apply and observe" (`DEBUG-ROLLOUT.md`); the watcher's one genuinely new surface is a node status-query wire leg (no-fetch `{channel, applied_version, last_outcome}` over the existing update handshake) — assembly otherwise | First multi-node debug rollout where hand-walking nodes is the bottleneck, or M8 update polish |
| **Origin-source update bootstrap (`spt update fetch` — pull the latest signed release from GitHub into the existing verify→stage→apply pipeline)**: today update DISCOVERY is **peer-only** (REQ-UPD-1: a daemon pulls offers from roster peers via `request_update`; `propagate.rs`/`peerloop.rs`). There is NO origin fetch, so the **first node in a fleet — or any isolated/solo node — cannot get a new release without a peer that already staged it** (the fleet converges by gossip, but only *after* a maintainer seeds the first node via the release-publish path). The installer DOES fetch from GitHub (`github.com/SaberMage/spt-releases/releases/latest/download/<asset>` + `SHA256SUMS`, `installer/install.sh:57-73`) but only at install time, outside the daemon's signed-release gate, and never re-runs. FIX = a new **origin-fetch leg** that downloads the per-platform artifact + its `<asset>.release.json` (the `SignedRelease` metadata) from the release origin and feeds them through the **SAME** `plan_verified` gate (`verify_metadata` two-key signature/channel/expiry/rollback + `verify_artifact` SHA-256, `update.rs:157`/`release.rs`), then `cache.stage()` — after which the EXISTING consent-notif / `spt update apply` flow is unchanged. So it reuses ~all the machinery; it only adds an HTTPS GET from the origin as an alternate SOURCE to the P2P `request_update` pull. Surface: `spt update fetch [--channel <ch>] [--tag vX.Y.Z]` (default: latest on the node's pinned channel); optionally a daemon-config `origin_check` cadence so a lone node self-bootstraps alongside P2P, then re-propagates to peers normally. **Key design note:** the daemon has **zero HTTP surface today** — all transport is iroh QUIC; `reqwest` is only a transitive dep (via iroh), not reachable. This leg needs a deliberate HTTPS-client decision (add `reqwest`/a minimal client as a direct dep, vs shell out to `curl`/`Invoke-WebRequest` like the installer — fragile). Honor the channel pin (`release-keys.json`) + the monotonic rollback floor; configurable repo (mirror `SPT_INSTALL_REPO`). The two-key signed-release anchor keeps the GitHub transport **untrusted-but-verified** (strictly stronger than the installer's HTTPS+SHA256SUMS-only first fetch). Mint **REQ-UPD-7** (origin-source fetch) first. | Update subsystem v1 (ADR-0016, REQ-UPD family) — P2P-pull only by design | v1 update flow is peer-propagation by design (seed one node, the fleet converges by gossip); origin-fetch is additive and the maintainer seed path covered bootstrap so far. No HTTP client is wired into the daemon yet | First isolated / first-in-fleet node that must update with no peer to pull from; or making `spt update` self-sufficient for solo / single-node users (user flagged the gap 2026-06-08) |
| **Test broker socket-bind hardening (CI-flake removal)**: `applyhost` tests' `served_broker` helper does `Broker::bind(name).expect("bind broker")` (`crates/spt-daemon/src/applyhost.rs:293`) where `name` = `spt-daemon-d7b-{pid}-{seq}.sock`. Flakes intermittently on the **kitsubito** self-hosted Linux runner (likely AF_UNIX path collision / stale socket file under the runner's reused workspace — Windows named pipes don't hit it). Cost the v0.3.2 release a full red CI run + a `--failed` rerun (2026-06-09); same code passed clean on rerun → flake, not a logic bug. FIX = make the test bind deterministic: unlink-before-bind, OR abstract-namespace socket (Linux `@`-prefixed, no filesystem entry), OR per-test `TempDir` isolation so each bind has a guaranteed-fresh path. Applies to every `served_broker`/socket-name test helper, not just applyhost | Triaged during v0.3.2 release (2026-06-09) — flake, shipped over it | Non-blocking: a real flake that taxes CI (one red cycle + rerun per hit) but never gates a correct build; hardening is test-infra work, not product. Held to keep the v0.3.2 release moving | **Restoration D1/D7** (`RESTORATION-PLAN.md`) — D1/D7 add real process-spawn tests on the kitsubito runner; fix the bind determinism there rather than fighting flakes through the milestone (M8 is complete) |
| **`spt update fetch` single→set shape-upgrade at same version (REQ-UPD-8 transitional trap)**: `cmd_update_fetch` sets the rollback floor = `staged_version` (the monotonic counter), and the gate is a plain `candidate > floor`. A node that fetched+staged version N as a platform **single** (pre-0.3.2 fetch behavior) can never re-fetch version N as a multi-platform **set** — same counter → `Rollback{current:N, candidate:N}`. Bites the **seed-node bootstrap**: the first node to origin-`fetch` a release applies the single for its OWN platform, then can't re-stage the same version as a set → has no Windows/other-platform artifact to SERVE peers over P2P. Surfaced live in the v0.3.2 cross-platform rig verify (2026-06-09): kitsubito staged+applied the v6 single, then its v6-set re-fetch was refused; unblock was a manual `releases/` cache-clear before re-fetch. Counter vs semver is working as designed (counter = trust spine, semver = cosmetic) — the gap is purely shape-blind: the floor compares version only, ignoring single-vs-set. FIX options: (a) allow a same-version re-fetch when staged shape is `single` and an available `set` is strictly-richer (shape-upgrade allowance), or (b) seed-server flow stages the SET before applying its own single, or (c) a `--force`/`--restage` flag. NOTE — only bites the one migration hop onto 0.3.2; once a node is on 0.3.2 with a fresh cache, future releases are vN+1 and stage clean. Mint a REQ before fixing | v0.3.2 cross-platform rig verify (2026-06-09; user-flagged via session watch) | Transitional-only: the workaround (cache-clear before staging the set) is reliable and was used to complete the v0.3.2 fleet roll; a code fix only saves the manual clear on same-version shape upgrades, which won't recur after the 0.3.2 hop | A future release that again changes staged-artifact SHAPE at a same/non-incrementing counter, or making `spt update fetch` self-sufficient for solo seed-nodes (pairs with the origin-source bootstrap row above) |
| **On-disk `current_exe` reconcile after auto-rollback (the D6-2 reboot residual)**: readiness-gated auto-rollback (ADR-0018 Q7, restoration D6-2) is **record-driven selection** — after a rollback the broker's supervisor keeps spawning the last-known-good `.old-N` binary, chosen from the durable `RolledBack` record, so the rescue survives a reboot for free **with no file rename at the failure instant** (the brittle path the design rejected). But the bytes on disk at `current_exe` are STILL the quarantined bad version. The supervisor rescues the **brain** (it never execs `current_exe` while a `RolledBack` record stands), but an OS-service / autostart / `ensure_running` relaunch execs `current_exe` **as the broker** — and **if the bad binary panics before broker bind, the node is down until manual intervention** (record-driven selection cannot save a binary that won't boot). FIX = a calm-path reconcile: on the next clean `apply` (or a broker self-reconcile once stable), rename the good bytes over `current_exe` so the on-disk seat matches the running selection. Deliberately NOT a failure-time rename | Restoration D6-2 (ADR-0018 Q7, open-call-3, 2026-06-10) — record-driven selection rescues the brain; the on-disk seat reconcile defers | The reboot-survives-via-record path is correct NOW for the common case (broker boots, supervisor re-selects the good binary); the residual only bites when the broker process itself re-execs the quarantined `current_exe` and that binary won't even bind. Failure-time rename is the brittle path the design rejected; the calm-path reconcile needs the next clean-apply seam | First field rollback where the broker (not just the brain) restarts onto the quarantined on-disk bytes — or the D7 process-survival proof exercising a broker re-exec after rollback |
| **Quarantine operator override (`spt update apply --force` — the note-i escape hatch)**: D6-2's quarantine guard refuses re-applying a version that auto-rolled-back on this node (the `RolledBack` record is the marker; clears naturally when N+1 stages). But a version whose failure was **environmental** (a transient boot condition, not bad bytes) leaves the node **permanently allergic** to a genuinely-good version with no escape but staging a newer one. FIX = an explicit operator override (`spt update apply --force`, or a `spt update unquarantine <version>`) that clears the `RolledBack` marker and re-opens the apply for that version — a deliberate, operator-gated act, never automatic | Restoration D6-2 (ADR-0018 Q7, note-i, 2026-06-10) — quarantine ships; the override defers | The natural N+1 clear covers the common "the next release fixes it" path; the override only matters when the SAME version must be retried after an environmental fix. Operator-gated re-apply is a small CLI + record-clear, deferrable to the alarm/adapter era | First node permanently allergic to a genuinely-good version (environmental rollback), or the M8 update-CLI polish |
| **✅ RESOLVED — shipped v0.8.3 (REQ-HAZARD-BROKER-QUIC-DEADLINE, KNOWN-HAZARDS 7.8).** ~~Broker-side bound on brain-waiting QUIC ops (the pump-IPC-deadline B-half)~~: the brain-side deadline (REQ-HAZARD-PUMP-IPC-DEADLINE, KNOWN-HAZARDS 7.6) stops the single-threaded pump wedging when a peer black-holes, by bounding the pump's brain-IPC reads and escalating a TimedOut to a supervised restart. But the ROOT cause is broker-side: the broker's `net_open_stream` / `net_stream_send` / `net_dial` handlers make the brain wait on a QUIC operation to a dead peer with no bound of their own, so the brain only escapes via its own deadline (a 30s stall + a full pump restart each time). FIX = the broker must never make a brain wait unbounded on a QUIC op — bound those handlers broker-side (a QUIC-level timeout that returns an Error event promptly), so a black-holed peer fails the single `peer_step` cleanly (ordinary per-peer abort + redial) without poisoning the client or forcing a pump restart. The broker is RESIDENT by design (CONTEXT §"broker update frequency: rare"), so this cannot reach the deployed fleet without a broker-update path — hence deferred to the next planned broker-update batch rather than riding the brain-only v0.4.1 roll the A-half ships on. **Added motivation (Windows, KH 7.6):** the A-half's reader-thread carrier leaks ONE `pump-ipc-reader` thread per wedge event — on a supervised restart the old reader is left blocked in `read_frame` holding its `RecvHalf`, and interprocess split halves share one OS handle (closes only when BOTH drop), so on a named pipe the broker may never signal the disconnect and the old reader can park indefinitely (especially while the broker itself is wedged on the dead-peer QUIC await — the very thing B fixes). Bounded (one thread per actual wedge, not per restart round — the post-restart conn cache is fresh, so the dead peer is re-dialed, not re-wedged on a cached dead conn), but B is the real cure: bounding the broker's QUIC await makes the brain's read return promptly, so the reader thread is never left parked | Pump-stall diagnosis 2026-06-11 (doyle ruling A-now / B-deferred) | The brain-side A-half fixes the fleet NOW via the brain-only update machinery (v0.4.1 candidate) and makes the wedge self-heal in 5s instead of 2.2h; B is the deeper correctness fix but only reachable through the rare broker-update path. Mint a REQ (broker-side QUIC-op bound) before building | Next planned broker-update batch (folds in with any other resident-broker change) |
| **Durable in-daemon alarm scheduler (one-shot deadline machinery — the Q4/V3 deferral)**: the broker/brain restoration (ADR-0018 Q4) fixes the *rule* for one-shot scheduled events — persist the absolute `target-time` at creation; every brain start (update *or* crash) reads it and fires-if-due; **never reset on crash** (a user's "remind me at 3pm" is a commitment that must outlive any restart). But it deliberately does **not** build the machinery: the spt-core daemon has **no one-shot consumer today** — `alarm` exists only as an event *shape* (`spt-proto`) + relay handling (`psyrelay.rs:93`) + a test fixture; the actual "wait until target, fire the TIMED PULSE" timer lives in the **legacy owl listener in-memory** (BROKER-BRAIN-SPLIT-RESTORATION §7). Building the one-shot deadline machinery in the restoration milestone would ship **untested dead code** (violates activate-don't-pre-fail). FIX = port alarms into the daemon as a durable scheduler that rides the Q4 one-shot rule (persist target-time at creation, fire-if-due on every brain start, survive both crash and update). The periodic-loop half of Q4 (anchor+interval derive) DOES ship in restoration D5; only the one-shot machinery defers here. | Restoration milestone D5 (ADR-0018 Q4/V3, 2026-06-09) — rule fixed, machinery deferred | The daemon has no one-shot consumer to exercise it; the periodic-deadline mechanism (the phase-significant pulse loop) ships in D5 and proves out the disk-anchored-deadline pattern the alarm port will reuse | The alarm port — porting the legacy in-memory alarm timer into the daemon as a durable scheduler (first real one-shot consumer); reuses the D5 disk-deadline pattern + the Q4 one-shot rule |
| **Autonomous session-digest freshness + Option-C re-home (M9 ADR-0008 amendment deferral)**: M9 re-founded the digest as an ON-DEMAND projection of normalized session logs (`spt-term::projection`; `spt-daemon::digest::project_endpoint_digest`) — snapshot-pull and the structured-delta-stream CONTRACT ship, deltas driven by pulls / `api digest-entry` pushes. Two pieces deferred, both because they have **no consumer yet** (Shell=M10, GUI/frontend pane, Gateway agent-window — none exist): (a) the **autonomous file-watch→publish freshness nudge** (a watcher that re-projects + publishes when the log grows on its own, vs today's pull/push-driven publishes) — CONTEXT.md "session digest" anticipates this as "an incremental normalize story (design-phase concern)"; (b) the **Option-C opt-in persistence re-home** (the byte-feed engine drove per-turn persistence off `pty_digest.persist`, now deleted; re-homing it onto a `[history]`/digest config seam now = a manifest surface with nothing reading it). Building either now = a speculative realtime subsystem with no reader (the M9-PLAN §9 "no new realtime subsystem" watch-item). | M9 Wave 3 / G3 (doyle ruling 2026-06-13) — snapshot-pull is the G3 bar, delta-stream contract ships | Both ride a live consumer: the digest works topology-independently TODAY (pull projects real content; a subscriber follows and gets push-driven deltas). Autonomous freshness only matters when a frontend pane streams it continuously; Option-C only matters when something reads the coarse activity log. Deferring honors CONTEXT.md's anticipated staging, not a scope cut | The **consuming-frontend milestone** that streams the digest live (Shell M10 drive/sensory pane, the GUI "latest session output" pane, or the Gateway agent-window) — wire the file-watch nudge + re-home Option-C onto a digest config seam then, with a real reader to exercise both |
| **Two-origin `owl_message` context-tap subtype refinement (M10 ADR-0019 deferral)**: the two-origin merge's owl-message producer (`spt::api::startup` `deliver` closure) taps the live-agent **`emit` chokepoint**, so EVERY delivered frame — including resurfaced **notif** frames and shell-context rows — is recorded as a single coarse `owl_message` context-injection entry (`context_kind:"owl_message"`). The data is not wrong (a notif IS injected context), only coarsely labeled; the digest is a glanceable data model this milestone, with no consumer that distinguishes the subtypes yet. Splitting the tap into honest subtypes (true peer message vs notif vs shell-context) means classifying the frame at the delivery seam. | M10 Wave 3 / gate (doyle ruling 2026-06-13, flag 2 ACCEPTED with this tracked condition) | Same logic as the M9 freshness deferral above: refinement only matters when a surface reads + renders the context-injection category by subtype (GUI collapse/expand, or the echo-reads-digest delta loop). Coarse-but-honest labeling is correct for a data-model-only milestone | The **consuming-frontend / echo-reads-digest milestone** — split the `emit`-seam tap by frame type (peer message vs notif vs shell-context) when a reader renders the subtypes |
| **W2.5 — attached-presence + Kick-and-attach (M12 picker blue tri-state)**: M12 W2's picker ships online/offline status only; the **blue ■ "attached" tri-state** + the **"Kick `<node>` and attach"** confirm option were SCHEDULED (operator 2026-06-14, NOT dropped) as a dedicated slice built AFTER W2 lands, then wired back into the picker. Scope: (1) a **broker attach-presence query** — "who is attached to endpoint E's PTY, by which node/surface" — derive from the `presence.rs` PresenceLog conn connect/disconnect if sound, else a new query op (today `presence.rs` has snapshots/most_recently_active/resolve but no per-PTY attach owner); (2) a **force-detach/kick wire op** with its OWN KNOWN-HAZARDS treatment — the PTY-ownership invariant (KH attach-lifecycle: detach never kills the session) + a who-may-kick auth gate — this hazard-zone surface is the reason it is its own slice, not a picker-wave rider; (3) picker: the blue tri-state + the Kick option (status becomes online/attached/offline; Kick is EXCLUSIVE to already-attached endpoints). | M12 W2 design ruling (doyle/operator 2026-06-14, M12-W2-RULING.md Q1) | A new broker MUTATION surface in the PTY-ownership hazard zone earns its own focused JIT plan + design-check + gate; bundling it into the picker UX wave would mix a hazard-zone broker op with front-end work. W2 ships the picker over surfaces that exist | **Immediately after W2 lands** (W2.5) — own JIT plan + doyle design-check (like W2/W3) + own gate; mint REQ-* first (rule 3) |
| **Cross-node shell TUNNEL relay (REQ-SHELL-4 cross-node-on-LAN carry)**: W3 built + proved the opaque byte tunnel **same-node only** (broker-homed `TunnelHub` loopback stream pair + the local control socket; `tunnel_e2e` round-trips `<EVENT`-looking bytes byte-exact). The **cross-node** path does NOT exist: `StreamFamily` (dispatch.rs) has no `Tunnel` variant, `classify_first_line` no tunnel arm, `SHELL_LINK` carries only relink/cmd/drive (no tunnel action), and every tunnel surface (`tunnel_ensure/send/recv/resolve/clear`, `serve_tunnel_control`, `tunnel_owner_stream`) is broker-homed / local-socket = same-node. cmd+drive ride the existing `ShellLink` request/reply substrate cross-node; the byte tunnel has no such substrate. FIX (a proper impl wave, ~W3-sized): new `StreamFamily::Tunnel` + a `classify_first_line` arm + a dispatch serve arm bridging an inbound Iroh **bidi** stream ↔ the bound shell's broker `TunnelHub` pair + an owner-side cross-node open (a `SHELL_LINK_TUNNEL` action or a dedicated open) + a resident tunnel-echo mock peer + the `[twohost]` rung (`open_stream` over the real link, byte-exact opaque round-trip, link-break closes). REQ-SHELL-4 int stays satisfied by the same-node `tunnel_e2e` — this is an additional cross-node int rung, NOT yet an obligation (activate-don't-pre-fail). | M11-W5 (doyle ruling 2026-06-16) — W5 surfaced that the M11-PLAN T5.1 "tunnel rides the same Iroh substrate" assumption is false for the byte tunnel; building a W3-sized relay in the M11-closing rig+docs wave would balloon scope | The tunnel's real consumer (usbip URB byte stream) is itself deferred/non-goal — NOTHING in-tree needs cross-node tunnel bytes now, so the relay would be speculative infra ahead of a deferred consumer (YAGNI). Same-node mechanism is proven; cross-node cmd/drive are proven; M11 closes at a clean boundary | First **cross-node tunnel consumer** (usbip URB across nodes, or any in-tree consumer of opaque cross-node bytes) — build the relay + the `[twohost]` rung as its own impl wave then |
| **Per-connection contiguous-sent cursor — structural guard against a decreasing-floor controller re-take (`REQ-HAZARD-CONTROLLER-WRITER-REORDER` follow-up)**: v0.13.0 P1c fixes the controller-writer reorder while KEEPING handoff's eager subscribe (fix #1 "drop the subscribe" was reverted — it's the standalone-resume mechanism), so the decreasing-floor double-take (handoff subscribe@K + serve_attach subscribe@0) is still PRESENT and made safe by: `controller_writer` epoch-gated under `send.lock()` (single live writer per connection) + `handoff` seeds `session_cursors` so the consumer snap-aboves, which is COMPLETE because every writer emits ascending and the surviving writer offers the full `[0,end]` range (no skip/dup). doyle proposed making the invariant **structurally un-reintroducible**: have `become_controller` refuse/clamp a re-take whose `from_seq` falls below the connection's already-delivered floor. That guard as literally stated is UNSAFE — `delivered_through` is **session-wide** (the `Arc<AtomicU64>` on `OutputLog`, shared by all controllers/viewers, advanced monotonic-MAX) and `resume_seq` reads straight off it, so (a) a NORMAL fresh-operator `from_seq=0` attach to a producing session sits below it and needs the full ring replay (the dedup-below+snap-above path exists for exactly this) → a guard/assert there starves every normal attach, and (b) monotonic-MAX literally can't distinguish the hazard (a `seq1`-without-`seq0` write reads as `2`, falsely implying `0` was delivered). The structurally-correct guard needs a **per-connection contiguous-sent cursor** (a true "highest contiguous seq this socket has actually received"), which does not exist today — bigger than P1c. The precise shape it would refuse: a future double-take on one connection whose prior writer advanced the cursor NON-CONTIGUOUSLY could commit one pre-epoch-bump frame above the new writer's floor → reorder. | v0.13.0 P1c (doyle proposed, todlando gated, doyle concurred 2026-06-20) | The actual bug is fixed at the one caller + bounded to already-committed-only for any other + the consumer snap-aboves; a guard on the session-wide cursor would FALSE-FIRE on normal attach/resume (no-false-fire scope demands the per-connection cursor first). Activate-don't-pre-fail: `REQ-HAZARD-CONTROLLER-WRITER-REORDER` covers the shipped fix; this structural-guard slice is seeded inactive | First broker change that adds a per-connection delivered-cursor (or the next time the controller-writer/attach seam is reworked) — build the per-connection contiguous cursor + the `become_controller` refuse-decreasing-floor guard then; mint/activate a `REQ-HAZARD-CONTROLLER-REtake-FLOOR`-class req first (rule 3) |
| **Adapter-manifest-enabled rc VT/mouse mode (`spt rc` scroll vs right-click-paste xor resolution)**: v0.13.0 P1/P1b's Windows `EnableMouseCapture` (for client-originated right-click bracketed paste, 7.18) makes Windows Terminal hand ALL wheel events to rc instead of scrolling its own scrollback; rc forwards the wheel to the harness as SGR reports (7.20) but ONLY when the harness has mouse-reporting on (`enabled && sgr`) — and **Claude Code's REPL never enables mouse tracking** (it has no internal scroll; it relies on the terminal's scrollback), so every wheel event is dropped → scroll dies. Windows console modes can't give both at once: it's QuickEdit (native wheel-scroll + raw right-click = paste submit-storm) **xor** mouse-input (bracketed right-click paste + dead wheel). INTERIM SHIPPED: **Shift+scroll** (WT reserves it for local scrollback even under app mouse-capture) — operator-confirmed working, documented, zero code. FIX (the follow-up the operator asked for): an **adapter-manifest flag** declaring whether the harness actually CONSUMES mouse reports (a mouse-tracking TUI with internal scroll) — when declared, rc keeps full capture + the SGR scroll-forward (the current path, correct for such a harness); when NOT declared (default — CC's case), rc SKIPS `EnableMouseCapture` so WT native wheel-scroll works, trading away client-originated right-click paste (ctrl+V + Shift+scroll cover the gap). Resolves the capture-vs-native-scroll xor PER-HARNESS via the manifest instead of globally. | v0.13.0 P1b (operator ruling 2026-06-20, HITL: scroll-forward to CC is moot — CC has nothing to scroll; Shift+scroll is "good enough for now") | Shift+scroll is a clean, working interim for the CC case (no internal-scroll harness) so there's no live gap; the manifest flag is a real design slice (a new `[rc]`/mouse manifest field + validation + the rc capture-decision branch + an adapter declaring it to test the consume-path) that earns its own JIT plan, not a v0.13.0 closeout rider | **Post-v0.13.0** — when building the next adapter-manifest surface, or the first harness that genuinely consumes mouse reports (internal-scroll TUI). Mint a `REQ-RC-VT-MODE`-class requirement first (rule 3), then the manifest field + rc capture-decision branch + an int adapter that declares it |