# M3b Plan — `spt-daemon` broker/brain process

> **Just-in-time, lightweight** — same pattern as `M0/M1/M2a/M2b/M3a-PLAN.md`. The
> ordered task layer between `ROADMAP.md`, `M3-PLAN.md` (the M3 umbrella), and
> `traceable-reqs.toml` (the requirement checklist), scoped to **M3b only**.
> Branch: `dev-freeform`. Authoritative architecture: ADR-0004 (broker/brain split
> + the §B ownership table + the closed §E spike gaps) and `M3-PLAN.md` §M3b.

> **Upstream is done:** Phase 0 spike-gate ✅ (Spikes #3/#4/#5/#6 all PASS — ADR-0004
> §E closed on both OSes) and **M3a ✅ delivered** — `spt-term` provides the
> single-surface PTY *mechanism* (SessionSurface + ConPTY/forkpty backend + DSR
> drain + injection + bounded byte-stream + digest parser primitive). **M3b builds
> the *process* that hosts many of those surfaces** behind a versioned local IPC,
> consolidating the interim no-daemon model.

## Goal

Replace the interim no-daemon model (M1/M2a/M2b: a long-lived `spt api listen`
process holds the perch + drives a relay/pulse loop; a seed *file* bridges
binding; per-pid `is_process_alive` is liveness) with the **real architecture**
(ADR-0004): one per-machine `spt-daemon`, split into a **broker** (stable kernel
holding the un-transferable resources) and a **brain** (restartable logic), so a
routine self-update swaps the brain with **zero endpoint interruption**.

**M3b done =** the daemon hosts the M2b LiveAgent lifecycle as in-process loops
(Psyche-as-loop, durable pulse — no separate listener/wrapper process, only a thin
stateless in-session relay residue) · the broker hosts `spt-term` PTYs + harness
children + sockets behind a **versioned local IPC** that a newer brain speaks to an
older broker · **a brain kill/restart leaves the child + its live output stream
intact and gapless** (Spike #1 shape, made real) · liveness for daemon-hosted
perches is the daemon's `status`, never `is_process_alive(info.pid)` · every
broker↔brain side-effect boundary is **exactly-once** across a brain crash (Spike
#6) · any `api` invocation **auto-starts** the daemon · spt-hosted startup is
spawn-session→`api bind` with an **in-memory** seed (no file) · the **digest
daemon-half** completes REQ-TERM-4 (`int`) · `cargo test --workspace` + clippy
`-D warnings` + `traceable-reqs check` green on the CI matrix (ubuntu + windows).

## Scope

### In — the `spt-daemon` crate (above `spt-live`, below `spt`)

- **broker** (stable kernel, `REQ-DAEMON-2`) — owns **only** the resources a live
  consumer would lose continuity on if the brain restarted (ADR-0004 §B normative
  table): PTY master handles (**from `spt-term::PtySession`**), spawned harness
  child processes, accepted local client sockets, listening sockets. No logic.
  Almost never restarts.
- **brain** (userspace, `REQ-DAEMON-2`) — all logic: routing, registry, manifest
  parse, **the re-hosted M2b lifecycle**. Restarts freely; rehydrates from disk +
  re-attaches to broker-held handles over IPC. `gen_start = now()` on cold-start
  **and** handoff (`REQ-HAZARD-GEN-START-NOW` / 2.4).
- **versioned local IPC** (`REQ-HAZARD-HANDOFF-ARGV-COMPAT` / 2.3) — named pipe
  (Windows) / Unix-domain socket, behind one transport trait (mirrors how
  `SessionSurface` abstracts the PTY OS-split). A version handshake + a
  forward-compatible frame schema so a **newer brain talks to an older broker**.
- **consolidation** (`REQ-DAEMON-1`) — the `spt api listen` poll-loop + the pulse
  loop + the Psyche loop move **into the brain** as scheduled in-process loops; the
  only residue is the thin **stateless in-session relay** for harness-hosted
  sessions (topology 1), freely killable. Durable, configurable pulse period
  replacing the interim 5s constant.
- **daemon-authoritative liveness** (`REQ-HAZARD-DAEMON-HOSTED-LIVENESS` / 2.5) —
  the daemon's in-memory endpoint table + a `status` field on `info.json`
  supersede `is_process_alive(info.pid)` for **daemon-hosted** perches; localized
  behind the single liveness resolver so external/interim perches still probe.
- **idempotent broker↔brain boundary** (`REQ-HAZARD-RESTART-IDEMPOTENT` / 7.2,
  Spike #6) — durable IDs + dedup-at-effect at every side-effect crossing (spool
  write, PTY write, registry update), with a **broker-owned recovery anchor**;
  exactly-once across a brain crash at before-intent / before-effect / after-effect.
- **auto-start** (`REQ-DAEMON-3`) — any `api` invocation starts the daemon if
  absent (`listen` is the reliable anchor); ensure-running lives in the `api` layer.
- **spt-hosted startup** (`REQ-START-3`) — daemon spawn-session into a broker PTY,
  then `api bind`; **in-memory** seed map replaces the `spt-store::seed` file bridge.
- **orphan-watch** — carry the M2b graceful ordering (grace 1.1 / echo 3.3 /
  stale-sweep 3.2) into the **supervised-crash** path the always-on daemon now
  enables (M2b delivered graceful-only).
- **digest daemon-half** (`REQ-TERM-4` `int`, ADR-0008) — manifest-sourcing of the
  `input`/`agent`/`tool` patterns (the `pty_digest` manifest seam), running the
  M3a `DigestParser` over the broker PTY; `spt digest <id>` snapshot pull, the
  broker-pushed structured **delta-stream** on the R-TERM-3 substrate, and opt-in
  Path-B persistence. Local addressing only (the qualified `[subnet:]id@node` form
  is M4); `REQ-DAEMON-4` (honor every KNOWN-HAZARDS invariant) is the meta-gate.
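
The versioned-IPC bullet above asks for a length-framed codec whose schema lets an older broker accept frames from a newer brain. A minimal sketch of one way to get that property follows. The wire layout, the `Frame` type, `write_frame`/`read_frame`, `negotiate`, and the `MAX_BODY` cap are all hypothetical illustrations, not the ADR-0004 format: the reader trusts an explicit payload length and skips any trailing bytes it does not understand.

```rust
use std::io::{self, Read, Write};

// Assumed layout (illustration only, not the ADR-0004 wire format):
// [u32 LE body_len][u16 LE schema_version][u8 kind][u32 LE payload_len][payload][trailing...]
const HEADER_LEN: usize = 2 + 1 + 4;
const MAX_BODY: usize = 1 << 20; // hypothetical cap so a bad length cannot force a huge allocation

#[derive(Debug, PartialEq)]
pub struct Frame {
    pub schema_version: u16,
    pub kind: u8,
    pub payload: Vec<u8>,
}

fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Encode a frame. `trailing` stands in for fields a newer brain appends
/// that an older broker has never heard of.
pub fn write_frame<W: Write>(w: &mut W, f: &Frame, trailing: &[u8]) -> io::Result<()> {
    let body_len = HEADER_LEN + f.payload.len() + trailing.len();
    if body_len > MAX_BODY {
        return Err(invalid("frame too large"));
    }
    w.write_all(&(body_len as u32).to_le_bytes())?;
    w.write_all(&f.schema_version.to_le_bytes())?;
    w.write_all(&[f.kind])?;
    w.write_all(&(f.payload.len() as u32).to_le_bytes())?;
    w.write_all(&f.payload)?;
    w.write_all(trailing)
}

/// Decode a frame, ignoring any bytes after the declared payload.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Frame> {
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let body_len = u32::from_le_bytes(len) as usize;
    if body_len < HEADER_LEN || body_len > MAX_BODY {
        return Err(invalid("bad frame length"));
    }
    let mut body = vec![0u8; body_len];
    r.read_exact(&mut body)?;
    let schema_version = u16::from_le_bytes([body[0], body[1]]);
    let kind = body[2];
    let payload_len = u32::from_le_bytes([body[3], body[4], body[5], body[6]]) as usize;
    let end = HEADER_LEN
        .checked_add(payload_len)
        .filter(|e| *e <= body_len)
        .ok_or_else(|| invalid("payload overruns frame"))?;
    // Bytes in body[end..] are unknown trailing fields: consumed, then dropped.
    Ok(Frame { schema_version, kind, payload: body[HEADER_LEN..end].to_vec() })
}

/// Handshake helper: highest version inside both (min, max) ranges, if any.
pub fn negotiate(ours: (u16, u16), theirs: (u16, u16)) -> Option<u16> {
    let lo = ours.0.max(theirs.0);
    let hi = ours.1.min(theirs.1);
    if lo <= hi { Some(hi) } else { None }
}
```

Because the outer length covers the trailing bytes, the reader stays aligned on the next frame even when it discards fields it cannot parse. That is the property the B0 forward-compat unit test would exercise.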

### Out

- networking / P2P, multi-subnet, notifications, **cross-node** digest addressing —
  **M4** (`spt-net`). The QUIC-stream broker-ownership *implementation* is M4
  (only its *shape* was spiked, #3).
- self-update (update-class taxonomy, signing, ripple) — **M3c**.
- whole-daemon live FD-passing (zero-interruption even for *broker* updates) —
  explicit future polish, not v1 (ADR-0004 consequences).

## Clean-room posture

- **`spt-daemon` is brand-new.** The sister ran poll listeners + Psyche wrappers as
  separate processes; ADR-0004 deliberately consolidates them. Clean-room the crate.
- **Re-host, don't rewrite.** The M2b lifecycle is already pure, unit-tested
  functions in `spt-live` (`spawn_psyche`, `resume_psyche`, `run_echo_commune`,
  `ingest_drops`, `graceful_signoff`, `sweep_stale_signoff`, `pulse::tick`,
  `fetch_history`, `write_resume_commune`, `compose_psyche_prompt`). The brain
  *calls* these as loops; the ordering invariants (3.3 / 1.1 / 3.2) stay where they
  are. **Reuse `spt-term::PtySession` for the broker's PTYs** — do not re-implement.
- **Spike findings are binding inputs:** #1 (no-EOF-while-writer-held + DSR; the
  broker-survives-brain-restart shape), #3 (live-stream survival shape), #6
  (broker-owned recovery anchor + dedup-at-effect keyed by durable ID).
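
To make the Spike #6 input concrete, here is a minimal sketch of dedup-at-effect keyed by a durable ID. `EffectId`, `EffectLedger`, `Outcome`, and `replay` are hypothetical names, and the ledger is held in memory only; a real recovery anchor would have to be owned and persisted by the broker and recorded atomically with the effect.

```rust
use std::collections::HashSet;

/// Hypothetical durable ID, minted by the brain before it records intent.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct EffectId(pub u64);

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Applied,
    Duplicate,
}

/// Stand-in for the broker-owned recovery anchor: the set of IDs whose
/// effect has already landed. In-memory here purely for illustration.
#[derive(Default)]
pub struct EffectLedger {
    applied: HashSet<EffectId>,
    /// Stand-in for the real side effect (PTY write, spool write, ...).
    pub sink: Vec<Vec<u8>>,
}

impl EffectLedger {
    /// Perform the effect at most once per ID.
    pub fn apply(&mut self, id: EffectId, bytes: &[u8]) -> Outcome {
        // `insert` returns false when the ID was already present.
        if !self.applied.insert(id) {
            return Outcome::Duplicate;
        }
        self.sink.push(bytes.to_vec());
        Outcome::Applied
    }
}

/// What a restarted brain might do: re-submit every intent it logged.
/// Returns how many effects actually landed during this replay.
pub fn replay(ledger: &mut EffectLedger, intents: &[(EffectId, Vec<u8>)]) -> usize {
    let mut landed = 0;
    for (id, bytes) in intents {
        if ledger.apply(*id, bytes) == Outcome::Applied {
            landed += 1;
        }
    }
    landed
}
```

Under this shape a crash before-effect is healed by the replay (the effect lands once), and a crash after-effect is absorbed by the ledger (the replay is a no-op). A crash before-intent leaves nothing to replay, so whatever produced the request has to re-submit it.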

## New requirements to register first (TRACEABILITY rule 3)

**Assessment — likely none.** The M3b reqs already exist: `REQ-DAEMON-1/2/3/4`,
`REQ-START-3`, `REQ-HAZARD-HANDOFF-ARGV-COMPAT`, `REQ-HAZARD-GEN-START-NOW`,
`REQ-HAZARD-DAEMON-HOSTED-LIVENESS`, `REQ-HAZARD-RESTART-IDEMPOTENT`, and
`REQ-TERM-4` (`int` stage). They currently sit at `required_stages = []`; activate each
one as its task lands. **If** the versioned-IPC frame contract or the in-session relay turns out to
need its own conformance id during B0/B3, register it in `traceable-reqs.toml`
first, then satisfy it.

## Sequencing rationale

Scaffold + the **versioned IPC transport** (the load-bearing seam everything rides)
→ the **broker kernel** (owns resources, holds no logic) → the **brain + gapless
handoff** (the milestone's hardest proof: kill brain, child + stream survive) →
**consolidation** (fold the interim loops in) → **daemon liveness** (localized swap)
→ **idempotent boundary** (durable IDs at every effect) → **auto-start + in-memory
startup** (retire the seed file) → **orphan-watch** → **digest daemon-half** →
activation + E2E. Each task tags evidence and activates its reqs in the same commit;
`traceable-reqs check` must be green and **both OSes** validated before the next task
starts (the M3a discipline: Windows local + Linux via gravity-linux).

## Tasks — `spt-daemon` (the broker/brain process)

| # | Task | Reqs / hazards | Acceptance |
|---|------|----------------|------------|
| B0 | **Scaffold `crates/spt-daemon`** (above spt-live, below spt; deps spt-store/msg/runtime/term/live) + the **versioned local-IPC transport**: a `DaemonTransport` trait over named-pipe (win) / UDS (unix), a length-framed message codec, and a **version handshake** with a forward-compatible frame schema | REQ-DAEMON-2 (foundation), REQ-HAZARD-HANDOFF-ARGV-COMPAT | crate compiles; layering `…→spt-live→spt-daemon→spt` acyclic; a brain frame with an unknown trailing field is accepted by an older broker (forward-compat unit test); `check` green |
| B1 | **Broker kernel**: spawn a child under a `spt-term::PtySession`; hold the PTY master + child + a listening/client socket; serve them over IPC (subscribe-to-output, write-input, resize, lifecycle). **No logic in the broker.** | REQ-DAEMON-2 (broker half) | a broker hosts one real PTY child on both OSes, streams its output + accepts injected input over IPC; the ADR-0004 §B ownership table is the only state it holds (review-asserted) |
| B2 | **Brain + gapless handoff**: brain connects over the versioned IPC, subscribes to the broker's PTY output, sends input; **a brain kill + restart re-attaches with no lost/dup output and the child untouched** (Spike #1 made real). `gen_start = now()` on cold-start **and** handoff | REQ-HAZARD-GEN-START-NOW, REQ-DAEMON-2 (brain half) | regression test: spawn child → brain attached → kill+restart brain ≥10× → child alive throughout, output stream contiguous across each restart, `gen_start` advances each handoff |
| B3 | **Consolidation**: fold the `api listen` poll-loop + pulse + Psyche loops into the brain as scheduled in-process loops (re-hosting the spt-live seams); leave only the thin **stateless in-session relay** for harness-hosted sessions. Durable, configurable pulse period | REQ-DAEMON-1 | a LiveAgent runs fully daemon-hosted (spawn→Psyche-loop→commune-ingest→pulse) with **no** separate listener/wrapper process; the relay is killable+respawnable without losing state; period is config-driven |
| B4 | **Daemon-authoritative liveness**: the brain's endpoint table + `info.json` `status` supersede `is_process_alive(info.pid)` for daemon-hosted perches; localized behind the single liveness resolver (`is_online`/registry stale-clean) so external/interim perches still probe | REQ-HAZARD-DAEMON-HOSTED-LIVENESS | a daemon-hosted Psyche reads alive/offline from the daemon `status`, not its pid; killing the summarizer subprocess does **not** flip the perch offline (regression test); external perches still pid-probe |
| B5 | **Idempotent broker↔brain boundary**: durable IDs + dedup-at-effect at every side-effect crossing (spool/PTY/registry), broker-owned recovery anchor; replay rules on brain restart | REQ-HAZARD-RESTART-IDEMPOTENT (Spike #6) | crash the brain before-intent / before-effect / after-effect at each boundary → exactly-once (no dup, no drop) across all three, value-set verified |
| B6 | **Auto-start + spt-hosted startup**: any `api` invocation starts the daemon if absent (`listen` anchor); spt-hosted startup = daemon spawn-session into a broker PTY then `api bind`, with an **in-memory** seed map replacing the `spt-store::seed` file bridge | REQ-DAEMON-3, REQ-START-3 | a cold `api` call transparently spins up the daemon; an spt-hosted session binds with **no seed file written** (the file path is dead); the harness-hosted seed is in-memory + consumed by `parent_pid` |
| B7 | **Orphan-watch**: carry the M2b graceful ordering (grace 1.1 / echo 3.3 / stale-sweep 3.2) into the supervised-crash path the always-on daemon enables | REQ-DAEMON-4 (partial) | a supervised crash of a hosted session triggers the same echo-before-signoff + grace + stale-sweep ordering as graceful teardown (regression test) |
| B8 | **Digest daemon-half**: manifest `pty_digest` seam (adapter-supplied `input`/`agent`/`tool` patterns) → run the M3a `DigestParser` over the broker PTY; `spt digest <id>` snapshot pull + broker-pushed structured **delta-stream** on the R-TERM-3 substrate + opt-in Path-B persistence. Local addressing only | REQ-TERM-4 (`int`), ADR-0008 | `spt digest <id>` returns the live structured buffer for a daemon-hosted session; a subscriber receives only deltas; opt-in persistence appends to the history store; cross-node addressing explicitly deferred to M4 |
| B9 | **Activation sweep + E2E**: activate REQ-DAEMON-1/2/3/4 + REQ-START-3 + the 4 hazards + REQ-TERM-4(`int`); a daemon E2E (spawn→Psyche-loop→commune→**brain-restart-survives**→graceful-signoff) on the CI matrix; amend ROADMAP (M3b delivered) + CONTEXT; author the **M3c plan** stub | — | `check` green with M3b reqs activated; clippy `-D warnings`; CI matrix (ubuntu+windows) green; both-OS validated |
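
B2's acceptance ("output stream contiguous across each restart") needs some way for a restarted brain to resume exactly where the previous one stopped. One possible shape, sketched under the assumption that the broker tags output with absolute byte offsets and keeps a bounded backlog: `OutputLog` and its methods are hypothetical, and this is not the `spt-term` bounded byte-stream API.

```rust
use std::collections::VecDeque;

/// Hypothetical broker-side backlog: every byte the child emits gets an
/// absolute offset, and only the most recent `cap` bytes are retained.
pub struct OutputLog {
    start: u64, // absolute offset of the oldest retained byte
    buf: VecDeque<u8>,
    cap: usize,
}

impl OutputLog {
    pub fn new(cap: usize) -> Self {
        Self { start: 0, buf: VecDeque::new(), cap }
    }

    /// Called as PTY output arrives; evicts the oldest bytes past `cap`.
    pub fn append(&mut self, bytes: &[u8]) {
        self.buf.extend(bytes.iter().copied());
        while self.buf.len() > self.cap {
            self.buf.pop_front();
            self.start += 1;
        }
    }

    /// Absolute offset one past the newest byte.
    pub fn end(&self) -> u64 {
        self.start + self.buf.len() as u64
    }

    /// Everything from `from` onward. `None` means the request cannot be
    /// served gaplessly: `from` was already evicted or lies past the end.
    pub fn read_from(&self, from: u64) -> Option<Vec<u8>> {
        if from < self.start || from > self.end() {
            return None;
        }
        let skip = (from - self.start) as usize;
        Some(self.buf.iter().skip(skip).copied().collect())
    }
}
```

A brain that durably records the offset it has consumed can call `read_from(offset)` after a restart: no byte is delivered twice and none is skipped, provided the backlog is large enough to cover the restart window. A `None` result makes an unavoidable gap explicit instead of silent, which is the condition the ≥10× restart regression would want to assert never occurs.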

## M3b requirement-activation map

Activate as each task lands (default `["impl","unit"]`; daemon E2E supplies `int`):
- **REQ-DAEMON-2** (B0–B2), **REQ-HAZARD-HANDOFF-ARGV-COMPAT** (B0),
  **REQ-HAZARD-GEN-START-NOW** (B2), **REQ-DAEMON-1** (B3),
  **REQ-HAZARD-DAEMON-HOSTED-LIVENESS** (B4),
  **REQ-HAZARD-RESTART-IDEMPOTENT** (B5), **REQ-DAEMON-3** + **REQ-START-3** (B6),
  **REQ-DAEMON-4** (B7/B9 — the honor-all-hazards meta-gate),
  **REQ-TERM-4 `int`** (B8). The brain-restart-survives daemon E2E (B9) is the
  cross-cutting `int` evidence for REQ-DAEMON-1/2.

**Stay `[]` (M3c/M4):** all `REQ-UPD-*`, the M4 grill reqs (`REQ-INST-9..14`,
`REQ-PAIR-5/6/7`, `REQ-NOTIF-1/2`, `REQ-SEC-1`), cross-node digest addressing.

## Workspace change

Add `crates/spt-daemon` (above `spt-live`, below `spt`). Layering (R-ARCH-1, acyclic):

```
spt-proto → spt-store → spt-msg → {spt-net, spt-term, spt-runtime} → spt-live → spt-daemon → spt
```

`spt-daemon` is **not** public SDK (R-ARCH-2 stays proto/runtime/msg) — it is the
internal supervisor; the `spt` binary becomes a thin CLI over it.

## Key seams (from the interim model M3b replaces)

| Concern | Interim locus | M3b move |
|---|---|---|
| listen relay loop | `spt/src/api/startup.rs` (`cmd_listen`) | daemon brain loop |
| pulse driver | `spt-live/src/pulse.rs` (`tick`, 5s constant) | daemon-scheduled, configurable |
| liveness | `spt-store/src/proc.rs` (`is_process_alive`) via `deliver::is_online` + `registry::clean_stale_entries` | daemon `status` behind the one resolver |
| seed bridge | `spt-store/src/seed.rs` (file by `parent_pid`) | in-memory daemon seed map |
| Psyche/echo/signoff | `spt-live` seams (already pure fns) | brain calls the same fns |
| broker PTY | `spt-term::PtySession` (M3a) | broker holds it, serves over IPC |
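
The liveness row is the subtlest of these moves, so a small sketch of "daemon `status` behind the one resolver" may help. `PerchInfo`, `Status`, and this `is_online` signature are hypothetical stand-ins, not the real `info.json` schema or the `deliver::is_online` API; the pid probe is passed in as a closure so the sketch stays self-contained.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Status {
    Alive,
    Offline,
}

/// Hypothetical view of a perch record.
pub struct PerchInfo {
    pub pid: u32,
    /// `Some` for daemon-hosted perches (the daemon is authoritative);
    /// `None` for external/interim perches, which still pid-probe.
    pub daemon_status: Option<Status>,
}

/// Single liveness resolver: callers never decide which source to trust.
/// `pid_probe` stands in for `is_process_alive`.
pub fn is_online(info: &PerchInfo, pid_probe: impl Fn(u32) -> bool) -> bool {
    match info.daemon_status {
        // Daemon-hosted: the pid is irrelevant, even if that process died.
        Some(status) => status == Status::Alive,
        // External/interim: fall back to the per-pid probe.
        None => pid_probe(info.pid),
    }
}
```

Keeping the branch inside one function is what makes B4 a localized swap: a daemon-hosted perch whose recorded pid has exited still resolves as online while the daemon reports it alive, and external perches behave exactly as before.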

## Risks carried into M3b

- **The two-process IPC is the hard part.** Abstract the OS split (named pipe vs
  UDS) behind one transport trait from B0, exactly as M3a abstracted ConPTY vs
  forkpty behind `SessionSurface`. Version the frame schema from the first byte —
  retrofitting forward-compat after B3 is expensive.
- **Broker scope creep.** The broker holds *only* the ADR-0004 §B resources and
  **no logic**. A task that finds itself putting routing/registry/parse in the
  broker has crossed the line: stop and surface it. (The §B table grew to include
  Iroh streams; those rows are M4 work, not M3b.)
- **Gapless handoff (B2) is the milestone's load-bearing proof.** Spike #1 proved
  the shape on ConPTY; B2 makes it real in the daemon on **both** OSes. Treat the
  brain-restart regression as a gate, not polish.
- **Idempotency (B5) must thread durable IDs through every effect.** Design it in from
  the start against Spike #6's two constraints (broker-owned recovery anchor;
  dedup-at-effect keyed by durable ID); do not bolt it on afterward.
- **Digest scope (B8).** Local addressing only; the `[subnet:]id@node` qualified
  form + cross-node fetch are M4. Resist pulling multi-subnet forward.
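
For the first risk, a sketch of what "one transport trait from B0" could look like. Only the `DaemonTransport` name comes from this plan; its two methods, the `Loopback` type, and `loopback_pair` are assumptions. The in-memory loopback exists so brain and broker logic can be tested without either OS backend; the named-pipe and UDS implementations are deliberately omitted.

```rust
use std::io;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Assumed minimal surface: move whole, already-encoded frames.
/// A named-pipe (Windows) and a UDS (Unix) type would each implement this.
pub trait DaemonTransport {
    fn send(&mut self, frame: &[u8]) -> io::Result<()>;
    fn recv(&mut self) -> io::Result<Vec<u8>>;
}

/// In-memory endpoint for tests; not a real OS transport.
pub struct Loopback {
    tx: Sender<Vec<u8>>,
    rx: Receiver<Vec<u8>>,
}

/// Build two connected endpoints (think: brain side, broker side).
pub fn loopback_pair() -> (Loopback, Loopback) {
    let (a_tx, b_rx) = channel();
    let (b_tx, a_rx) = channel();
    (Loopback { tx: a_tx, rx: a_rx }, Loopback { tx: b_tx, rx: b_rx })
}

impl DaemonTransport for Loopback {
    fn send(&mut self, frame: &[u8]) -> io::Result<()> {
        self.tx
            .send(frame.to_vec())
            .map_err(|_| io::Error::new(io::ErrorKind::BrokenPipe, "peer closed"))
    }

    /// Blocks until a frame arrives or the peer endpoint is dropped.
    fn recv(&mut self) -> io::Result<Vec<u8>> {
        self.rx
            .recv()
            .map_err(|_| io::Error::new(io::ErrorKind::UnexpectedEof, "peer closed"))
    }
}

/// Code written against the trait never names the OS backend.
pub fn echo_once<T: DaemonTransport>(t: &mut T) -> io::Result<()> {
    let frame = t.recv()?;
    t.send(&frame)
}
```

Dropping one endpoint makes the peer's `recv` fail, which gives tests a cheap stand-in for a brain kill as seen from the broker side.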
