# M1 Plan — Local messaging + binary (`spt-msg` + `spt`)

> **Just-in-time, lightweight** — same pattern as `M0-PLAN.md`. The ordered task
> layer between ROADMAP.md (milestone sequence) and `traceable-reqs.toml`
> (requirement checklist), scoped to **M1 only**. Honors ROADMAP §13 ("lightweight
> but structured, GSD too heavy"). Branch: `dev-freeform`.

## Goal

Ship the **killer quickstart (local)**: two agents exchange a message, end-to-end
through the real crate boundaries. Delivers `spt-msg` (delivery + routing +
send/ring/ready) and the `spt` binary + CLI, layered on the M0 crates
(`spt-proto` envelope/payload, `spt-store` spool/registry/perch/info).

**M1 done =** `spt-msg` + `spt` compile · `cargo test --workspace` green · an E2E
test proves two local ready-agents exchange a message (live TCP path **and**
offline spool path) · `traceable-reqs check` green with M1 reqs *activated* · CI
stays green.

## Scope

In: localhost TCP delivery with spool fallback, target→address routing via the
registry, reply routing, ready-agent lifecycle (register perch + listener + drain
spool), `ring` (send-and-wait), deferred (hook-channel) send, the `spt` binary +
CLI (`send`/`ring`/`ready`/`poll`/`list`/`stop`/`whoami`).

Out (later milestones): cross-node / WAN delivery, mDNS, pairing (all M4 — M1 is
**loopback only**); harness manifest + Psyche/live lifecycle + `api` surface (M2);
PTY/terminal + the consolidated daemon (M3). **Interim-model note:** M1 ready
agents run a standalone per-agent TCP listener; M3's `spt-daemon` consolidates all
listeners into one supervisor (CONTEXT "spt-daemon"). M1 builds the messaging
mechanism; M3 owns where it runs.

## Sequencing rationale

`spt-msg` before the binary (the binary is a thin CLI over the crate). Within
`spt-msg`: TCP framing/listener → delivery (TCP-first, spool-fallback) → ready
lifecycle (needs listener + delivery + spool drain) → ring (needs delivery +
ephemeral perch) → deferred send. Then the binary: skeleton/CLI → wire each
subcommand → list/stop/whoami. Then the E2E proof + activation sweep. Each task
tags evidence + activates its reqs in the same commit; `check` green before the
next.

## New requirements to register first (TRACEABILITY rule 3)

No PRD `R-*` covers local message delivery (it was foundational/implicit). Register
before satisfying — precedent: `REQ-NODE-IDENTITY` (M0 T5).

- **`REQ-MSG-1`** — Local message delivery: TCP-first to a registered address with
  spool fallback when the target is offline/unreachable; target id → address via
  the registry (stale-clean first); reply routing (`__REPLY_TO__`).
- **`REQ-MSG-2`** — `spt` binary CLI surface: `send` / `ring` / `ready` / `poll` /
  `list` / `stop` / `whoami`, stable arg shapes + exit codes.
- **`REQ-MSG-3`** — Ready-agent lifecycle: register perch (info.json + listener +
  registry address) on `ready`, drain spooled backlog on startup, clean teardown.

## Tasks — `spt-msg`

| # | Task | Source | Reqs / hazards | Acceptance |
|---|------|--------|----------------|------------|
| T1 | TCP wire framing + loopback listener socket (length-framed envelope lines) | copy `protocol.rs` + `poll.rs` socket setup | REQ-MSG-1 | listener binds an ephemeral localhost port; one framed envelope round-trips client→listener |
| T2 | Delivery: TCP-first, spool-fallback + registry routing + reply routing | copy `send.rs` `deliver_message`/`run`/`run_reply` | REQ-MSG-1; REQ-HAZARD-WINDOWS-PID-RECYCLE, -INBOX-NO-DOUBLE | online target receives via TCP; offline target spools; recycled-pid address never misdelivers; no double-delivery |
| T3 | Ready-agent lifecycle: `ready` registers perch (info.json T12 + listener + `register_address` T11), drains spool backlog (`drain_non_deferred` T9) on startup | copy `poll.rs` startup + `setup.rs` | REQ-MSG-3; REQ-HAZARD-SOFT-CLEANUP (6.2) | `ready` is resolvable + reachable; backlog drains on start; soft-cleanup removes only the ready marker, preserves spool |
| T4 | `ring`: send + block for reply; ephemeral reply-perch created + cleaned on **every** exit path | copy `ring.rs` | REQ-MSG-1; REQ-HAZARD-EPHEMERAL-CLEANUP (3.1) | ring returns the reply; the ephemeral perch is gone on success, timeout, AND error |
| T5 | Deferred send: spool-only, no listener wake (hook channel) | copy `send.rs` `run_deferred` | REQ-MSG-1 (+ DEFERRED-SURVIVE-DRAIN, covered M0) | deferred send writes a deferred row; event-stream drain skips it; hook drain delivers it |
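
The T1 acceptance (one framed envelope round-trips client→listener on an ephemeral localhost port) can be sketched with the standard library alone. This is an illustration, not the shipped format: the 4-byte big-endian length prefix, `MAX_FRAME`, and the names `write_frame` / `read_frame` / `round_trip` are all assumptions, and the copied wire format remains authoritative.

```rust
use std::io::{self, Read, Write};
use std::net::{Ipv4Addr, TcpListener, TcpStream};
use std::thread;

/// Upper bound on one frame (assumed value); rejects a corrupt length prefix.
const MAX_FRAME: usize = 1 << 20;

/// Write one frame: 4-byte big-endian length, then the payload bytes.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)?;
    w.flush()
}

/// Read one frame written by `write_frame`.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}

/// Bind a loopback listener on an OS-chosen port, send one frame to it,
/// and return the bytes the listener side read.
pub fn round_trip(payload: &[u8]) -> io::Result<Vec<u8>> {
    // Port 0 asks the OS for an ephemeral port; loopback only, as in M1.
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    let addr = listener.local_addr()?;

    let server = thread::spawn(move || -> io::Result<Vec<u8>> {
        let (mut conn, _peer) = listener.accept()?;
        read_frame(&mut conn)
    });

    let mut client = TcpStream::connect(addr)?;
    write_frame(&mut client, payload)?;
    server.join().expect("listener thread panicked")
}
```

Calling `round_trip(b"hello")` should hand back the same bytes. The address from `local_addr()` is what a ready agent would publish to the registry.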

## Tasks — `spt` binary + CLI

| # | Task | Source | Reqs | Acceptance |
|---|------|--------|------|------------|
| T6 | `spt` binary skeleton + CLI parser (subcommands, `--help`, exit codes) | adapt `main.rs` + `cli.rs` (clean-room arg layer; likely `clap`) | REQ-MSG-2 | `spt --help` lists subcommands; unknown args exit non-zero; binary is a workspace member |
| T7 | Wire CLI → `spt-msg`: `send` / `ring` / `ready` / `poll` | adapt `cli.rs` dispatch | REQ-MSG-2 | each subcommand drives the matching `spt-msg` entry point |
| T8 | `list` / `stop` / `whoami` subcommands (enumerate perches, teardown, self-id) | copy/adapt `list.rs` / `stop.rs` / `whoami.rs` | REQ-MSG-2; REQ-HAZARD-SOFT-CLEANUP | `list` shows live perches; `stop` tears down cleanly; `whoami` resolves own perch |
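
A rough, dependency-free sketch of the T6 argument layer follows. The plan says the real parser is likely `clap`; this hand-rolled `parse` is only a stand-in to show the subcommand set and the "unknown args exit non-zero" rule. The argument shapes (`send <target> <body>`, `ready <id>`, and so on), the `Command` enum, and the exit-code values are hypothetical; only the subcommand names and `poll --once` come from this plan.

```rust
/// Hypothetical parsed form of the `spt` subcommands.
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    Send { target: String, body: String },
    Ring { target: String, body: String },
    Ready { id: String },
    Poll { once: bool },
    List,
    Stop { id: String },
    Whoami,
    Help,
}

/// Assumed exit codes: 0 for success, 2 for a usage error.
pub const EXIT_OK: i32 = 0;
pub const EXIT_USAGE: i32 = 2;

/// Parse the arguments that follow the program name.
pub fn parse(args: &[&str]) -> Result<Command, String> {
    match args {
        ["send", target, body] => Ok(Command::Send {
            target: (*target).to_owned(),
            body: (*body).to_owned(),
        }),
        ["ring", target, body] => Ok(Command::Ring {
            target: (*target).to_owned(),
            body: (*body).to_owned(),
        }),
        ["ready", id] => Ok(Command::Ready { id: (*id).to_owned() }),
        ["poll"] => Ok(Command::Poll { once: false }),
        ["poll", "--once"] => Ok(Command::Poll { once: true }),
        ["list"] => Ok(Command::List),
        ["stop", id] => Ok(Command::Stop { id: (*id).to_owned() }),
        ["whoami"] => Ok(Command::Whoami),
        ["--help"] | ["-h"] | [] => Ok(Command::Help),
        other => Err(format!("unrecognized arguments: {}", other.join(" "))),
    }
}

/// Map a parse result to the process exit code the binary would use.
pub fn exit_code(parsed: &Result<Command, String>) -> i32 {
    match parsed {
        Ok(_) => EXIT_OK,
        Err(_) => EXIT_USAGE,
    }
}
```

The binary's `main` would collect `std::env::args().skip(1)`, call `parse`, dispatch to the matching `spt-msg` entry point, and exit with the code; keeping the parse step pure makes the arg shapes unit-testable without spawning a process.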

## Tasks — integration + infra

| # | Task | Reqs | Acceptance |
|---|------|------|------------|
| T9 | **Killer-quickstart E2E test:** spawn two ready agents, A sends to B (TCP path), B sends to offline C (spool path), C starts and drains | REQ-MSG-1/2/3 (`int` stage) | a `tests/` integration test proves the message arrives both online and via spool |
| T10 | Activation sweep: register REQ-MSG-1/2/3, activate M1 reqs + hazards, green `check`, CI green | — | `check` green with all M1 reqs activated; CI matrix + gate pass |

## M1 requirement-activation map

Activate as each task lands (default `["impl","unit"]`; messaging-delivery reqs add
`int` at T9 since the proof is cross-process — two listeners + spool on one box):

- **New:** REQ-MSG-1 (`impl`+`unit`+`int`), REQ-MSG-2 (`impl`+`unit`), REQ-MSG-3 (`impl`+`unit`+`int`).
- **Hazards:** REQ-HAZARD-EPHEMERAL-CLEANUP (3.1), REQ-HAZARD-INBOX-NO-DOUBLE (4.5),
  REQ-HAZARD-SOFT-CLEANUP (6.2), REQ-HAZARD-WINDOWS-PID-RECYCLE.
- **Strengthen (stay as-is):** REQ-INST-7 remains *partial* (local routing only;
  cross-node subnet resolution still M4). R-ARCH-1 gains a third crate (`spt-msg`)
  — already active, no change needed.

**Stay `[]` (later milestones):** all R-NET/R-PAIR (M4), R-DAEMON/R-API/R-SEAM/R-START
(M2), R-TERM (M3), R-UPD (M3), R-REACH (M4), and the live/Psyche + broker hazards
(CONPTY-DSR, RESTART-IDEMPOTENT, GRACE/ECHO/SIGNOFF, DROP-FILE-SINGLE-WRITER; M2/M3).

## Workspace change

Add `crates/spt-msg` to `members`. Layering stays acyclic (R-ARCH-1):
`spt-proto → spt-store → spt-msg`; the `spt` binary is the top consumer.

## Risks carried into M1

- **Listener model is interim.** Standalone per-agent listeners now; M3 folds them
  into `spt-daemon`. Keep the listener surface thin so the M3 consolidation is a
  move, not a rewrite. (KNOWN-HAZARDS 1.1 maps the eventual single-writer model.)
- **Loopback port churn + stale registry.** A crashed listener leaves a stale
  address; M0's `resolve_address` (clean-stale-before-lookup) is the guard —
  exercise it under the recycled-pid hazard in T2.
- **Ephemeral ring perch leak** (3.1) is the classic ring bug — test cleanup on the
  timeout and error paths, not just success.
- None of these risks invalidates the architecture: the FATALs were ADR-0004 daemon /
  ADR-0005 pairing (M3/M4). M1 is a local-only mechanism on the proven M0 substrate.
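
One way to get the ring-perch cleanup on the success, timeout, and error paths is to tie the perch to a value whose `Drop` removes it. The sketch below assumes the ephemeral perch is a directory on disk; `EphemeralPerch`, `RingError`, and `ring_with` are hypothetical names, and the reply wait is passed in as a closure so the cleanup can be exercised without a network.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard: owns an ephemeral reply-perch directory and removes
/// it when dropped, whichever way the enclosing scope exits.
pub struct EphemeralPerch {
    dir: PathBuf,
}

impl EphemeralPerch {
    pub fn create(dir: PathBuf) -> io::Result<Self> {
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    pub fn path(&self) -> &Path {
        &self.dir
    }
}

impl Drop for EphemeralPerch {
    fn drop(&mut self) {
        // Best effort: Drop cannot return an error, so a failed removal is ignored.
        let _ = fs::remove_dir_all(&self.dir);
    }
}

/// Hypothetical ring failure modes.
#[derive(Debug, PartialEq, Eq)]
pub enum RingError {
    Timeout,
    Io(String),
}

/// Create the ephemeral perch, run the caller-supplied wait, and return its
/// result. The guard is dropped on every return path, and also if the
/// closure panics and the stack unwinds.
pub fn ring_with<F>(perch_dir: PathBuf, wait_for_reply: F) -> Result<String, RingError>
where
    F: FnOnce(&Path) -> Result<String, RingError>,
{
    let perch = EphemeralPerch::create(perch_dir).map_err(|e| RingError::Io(e.to_string()))?;
    let outcome = wait_for_reply(perch.path());
    outcome
}
```

A test can then assert that the directory is absent after an `Ok`, after `Err(RingError::Timeout)`, and after an I/O error. The guard does not cover a hard kill of the process, which is where the soft-cleanup and stale-registry handling still matter.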

## Execution log / deviations (M1 COMPLETE)

Built T0–T10; `spt-msg` + `spt` ship. 101 tests (+2 E2E) · `clippy -D warnings`
clean workspace-wide · `traceable-reqs check` green, 100 reqs / 0 findings.

- **Copy scope narrower than planned.** Only the stable wire format
  (`protocol.rs` → `wire.rs`) was copied verbatim (ADR-0001); `listener` /
  `deliver` / `ready` / `ring` were **clean-roomed** because the sister API
  (`owlery::*`, `(target, owlery)` spool sigs, `types::*`) diverges hard from the
  M0 `spt-store` surface (path-aware `*_at` spool, `perch` resolver,
  `ParentHint`). Re-pointing was a rewrite, not a copy — consistent with the
  ADR-0001 "clean-room everything else" rule.
- **Crate API is pure orchestration.** No `stdin`/`process::exit`/printing in
  `spt-msg`; entry points take a body and return a structured outcome
  (`SendOutcome`/`RingOutcome`/`DeliveryPath`). The `spt` binary owns all I/O +
  exit codes. Reply routing is `send()` with `from`=replier (the `__REPLY_TO__`
  line is the only seam) — no separate wire path.
- **`ready` and `poll` share one listener loop** in M1. With no daemon, a perch
  is reachable only while its process runs; `poll --once` is the single-cycle
  variant. M3's `spt-daemon` decouples registration from the live process
  (documented at the call site).
- **T6–T8 landed in one commit** — the binary is cohesive; an unwired skeleton
  would be clippy-flagged dead code.
- **`send_deferred` takes no `owlery` arg** (the M0 spool is path-addressed).
- **Recycled-pid guard is structural** (connect-must-succeed), not a pid-equality
  check — a recycled process isn't on the old ephemeral port, so connect refuses
  and the message spools (REQ-HAZARD-WINDOWS-PID-RECYCLE).
- **Activation:** REQ-MSG-1/3 `[impl,unit,int]`, REQ-MSG-2 `[impl,unit]`; hazards
  EPHEMERAL-CLEANUP / INBOX-NO-DOUBLE / SOFT-CLEANUP / WINDOWS-PID-RECYCLE
  `[impl,unit]`. REQ-INST-7 stays partial (local routing only; cross-node = M4).
  R-ARCH-1 grew by two crates, `spt-msg` and the `spt` binary (build-evidenced; no stage change).
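
The pure-orchestration shape and the structural recycled-pid guard described above can be sketched together: try to connect to the registered address, and if the connect fails for any reason, spool instead and report which path was taken. `DeliveryPath` is named in this log, but its variants here, the `deliver` signature, the 500 ms timeout, and the append-to-file spool are assumptions for illustration; the real spool is the path-addressed M0 `spt-store` API.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::net::{SocketAddr, TcpStream};
use std::path::Path;
use std::time::Duration;

/// Which path a message took (variant names are assumed).
#[derive(Debug, PartialEq, Eq)]
pub enum DeliveryPath {
    Tcp,
    Spooled,
}

/// TCP-first, spool-fallback delivery. No printing and no process exit:
/// the caller (the binary) decides what to show and which code to return.
pub fn deliver(
    addr: Option<SocketAddr>,
    spool_file: &Path,
    body: &str,
) -> io::Result<DeliveryPath> {
    if let Some(addr) = addr {
        // Structural guard: a stale address (crashed listener, recycled pid)
        // has nothing listening on the old ephemeral port, so connect fails
        // and the message falls through to the spool.
        if let Ok(mut conn) = TcpStream::connect_timeout(&addr, Duration::from_millis(500)) {
            let sent = conn.write_all(body.as_bytes()).is_ok() && conn.flush().is_ok();
            if sent {
                return Ok(DeliveryPath::Tcp);
            }
        }
    }

    // Offline, unknown, or unreachable target: append to the spool file.
    if let Some(parent) = spool_file.parent() {
        fs::create_dir_all(parent)?;
    }
    let mut spool = OpenOptions::new().create(true).append(true).open(spool_file)?;
    writeln!(spool, "{body}")?;
    Ok(DeliveryPath::Spooled)
}
```

A successful write only shows that the bytes reached the local socket buffer, not that the peer processed them; a real implementation that must avoid double delivery would need a stricter rule for when a failed TCP attempt may fall back to the spool.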
