# Pump worker seam — deepen the peer pump behind a worker interface (task plan)

> **STATUS: EXECUTED (2026-06-10). P1–P5 landed; doyle G1/G2/G3 all ACK with
> zero deviations; gates green every commit; behaviour-neutral (the
> `tests/pump.rs` oracle passed unmodified throughout). Executor: todlando.
> Reviewer: doyle (three gates, §Review gates).** Architecture-review
> rev 2 card 1 (top recommendation), grilled to a frozen interface in one
> session: 8 design forks ruled by the operator (granularity, round context,
> failure mode, clock shape, due-ownership, repair-queue scope, dep wiring,
> naming/layout) — all rulings inlined below. CONTEXT.md already carries the
> minted terms (**peer pump**, **pump worker**, §Networking → "The peer pump");
> the definitions there are normative for this plan.

> Working doc for the post-restoration architecture track. This is a
> **behaviour-neutral refactor**: zero new daemon capability, zero wire/IPC
> change, zero toml `required_stages` change. `crates/spt-daemon/src/peerloop.rs`
> (1036L) hard-wires schedule + cadence + call order for the four cadenced legs
> (registry / notif / sync / update) inside one `run_peer_pump`; pump
> choreography is E2E-only (the 8 existing unit tests cover kernels + supervisor
> only). The deepening: each leg becomes a **pump worker** module behind one
> small worker seam; the pump becomes a generic scheduler + per-peer fan-out
> whose choreography is unit-testable with scripted workers and no real time.
> Rationale precedent: D6 `TrialEnv` (injected env, scripted fake + production
> adapter) and D5 pure kernels (pure fn + thin shell).

## Goal

The peer pump's *when* (scheduling) and *toward whom* (fan-out) are separated
from the workers' *what*, such that:

- **The worker seam is real.** Four production adapters (registry, notif, sync,
  update) plus scripted test workers satisfy one `PumpWorker` trait — two-plus
  adapters, a real seam by the house definition.
- **Choreography is a unit table.** Six facts, all asserted with scripted
  workers, no sleeps, no loopback E2E:
  - first tick primes all workers
  - a wake forces one worker only
  - `pre_round` runs once per due round, before any `peer_step`
  - a failed `peer_step` aborts the remaining due workers for that peer and
    drops its conn
  - mark-after-round stagger
  - detached-subnet skip
- **Behaviour is provably unchanged.** Every existing unit test survives (pure
  fns are relocated, not rewritten); the loopback E2E (`tests/peerloop.rs`,
  renamed `tests/pump.rs` in P1) passes unmodified in substance; every existing
  `[impl->…]`/`[unit->…]` tag relocates with its code; `traceable-reqs check`
  EXIT=0 at every commit.

## Frozen design (operator-ruled, do not re-litigate)

1. **Two-hook worker** (granularity ruling):

   ```rust
   use std::io;
   use std::time::Duration;

   // `RoundCtx` and `PeerIo` are the shell-owned types described in items 3 and 5.
   trait PumpWorker {
       fn cadence(&self) -> Duration;
       fn poll_wake(&mut self) -> bool { false }   // exactly-once marker take
       fn pre_round(&mut self, ctx: &RoundCtx) {}  // node-local, once per due round
       fn peer_step(&mut self, io: &mut PeerIo, subnet: &str, peer_hex: &str,
                    ctx: &RoundCtx) -> io::Result<()>;
   }
   ```

2. **Due-ownership split**: side effects at the worker edge, kernel pure. The
   worker exposes `cadence()` + `poll_wake()` (the marker take — registry's
   `take_advertise_now`, sync's `take_freshness_pull`); the shell owns
   `WorkerLasts` (in-memory `Option<Instant>` per worker) and feeds the
   **pure kernel** `due(last, wake, now, cadence)` — the existing `due()`
   extended with the wake flag. `due()` comes OFF the trait.
3. **Pump-owned `RoundCtx`**: subnets / attachment / roster-backed `SyncPolicy`
   loaded ONCE per round by the shell, passed `&RoundCtx` to every worker. The
   single-read invariant (push targets and sync gate see the SAME roster) stays
   a structural fact. `RoundCtx` carries ONLY per-round shared loads — never
   long-lived deps.
4. **Construction capture for deps**: each worker is built with what it owns —
   `Arc<RegistryHost>` (registry worker: sweep/advertise; sync worker:
   `snapshot` bootstrap refs; update worker: canonical-epoch notif mint), its
   `PathBuf` slice out of `PumpPaths`, its config knob (`full_auto_update`).
5. **`PeerIo { brain, conn_id, ops }`**: the shell owns the ONE brain IPC
   handle, the conn cache + `ensure_conn`/`dial_seeded`, and the SOLE open-op
   `EpochSource` (`pump-ops.json`). Workers mint open-ops only through
   `PeerIo` — a worker-owned `EpochSource` over the shared counter is the
   KH epoch-lease double-mint and is forbidden (now also in CONTEXT.md).
6. **Abort-on-first-fail preserved** (failure ruling): a failed `peer_step`
   aborts the remaining due workers for that peer and drops its conn (redial
   next tick) — current behaviour exactly. Per-worker isolation is explicitly
   NOT this change.
7. **Repair-evictions verbatim** (scope ruling): `consume_repair_evictions`
   moves INTO the registry worker's `pre_round`, order preserved exactly —
   repair-evictions → `evict_silent_peers` → `fire_due_rotations` →
   `advertise_local` (REQ-SUBNET-7 · KH 4.10 · REQ-MESH-4 orderings). Card 2
   (rows-vs-replication) re-seams the queue itself LATER, inside one module
   instead of two. Adverts computed in `pre_round` are held as registry-worker
   state for its own `peer_step` (per-subnet filter unchanged).
8. **Layout + names** (naming ruling): `crates/spt-daemon/src/pump/{mod,
   registry,notif,sync,update}.rs`; `peerloop.rs` retires into `pump/mod.rs`.
   The loop is *the* pump; the legs are workers — fix the module-header drift
   ("three cadenced pumps") in the move.
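
Item 2's pure kernel can be sketched roughly as below. This is a minimal sketch, not the body that lives in the pump module: the argument order follows `due(last, wake, now, cadence)` as ruled above, but the body itself is an assumption, including the use of `saturating_duration_since` as one way to keep the subtraction forward-only (KH 5.9).

```rust
use std::time::{Duration, Instant};

/// Pure due-kernel sketch: no clock reads, no I/O, no marker takes.
/// `last`    = when the worker last ran (`None` = never ran, so due immediately).
/// `wake`    = the result of the worker's `poll_wake()`, taken by the shell beforehand.
/// `now`     = the shell's clock read for this tick.
/// `cadence` = the worker's `cadence()`.
pub fn due(last: Option<Instant>, wake: bool, now: Instant, cadence: Duration) -> bool {
    if wake {
        // A wake forces a round regardless of how much cadence has elapsed.
        return true;
    }
    match last {
        // Never ran: the first tick primes the worker.
        None => true,
        // Forward-only arithmetic: yields zero if `prev` is later than `now`,
        // so there is no `Instant - Duration` and no underflow panic.
        Some(prev) => now.saturating_duration_since(prev) >= cadence,
    }
}
```

Because `now` and `wake` arrive as plain arguments, a unit table can drive the kernel with synthetic instants and no sleeps.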

### Constraints standing (violations are gate-FAIL findings)

- **ADR-0018 `[V4]`**: pump cadences stay in-memory stagger-from-due-now —
  `None` = due immediately, mark-after-round with a POST-round re-taken
  `Instant::now()`, ran = attempted not succeeded. **No per-loop timing
  writes, no deadline-grid conversion.** `WorkerLasts` is RAM only.
- **KH 5.9 / REQ-HAZARD-INSTANT-UNDERFLOW**: only forward `duration_since`;
  the kernel keeps the `Option<Instant>` shape — never `Instant - Duration`.
- **KH 7.4**: LLM-bearing work is never a pump worker. The trait rustdoc must
  say so (the seam must not become the invitation).
- **KH epoch-lease class**: consent-notif ids mint from the registry's
  CANONICAL epoch (`registry.with_epoch`), never from `ops` — the update
  worker's construction-captured registry Arc preserves this by structure;
  the existing comment block moves with the code.
- **Shell-owned, NOT worker-owned**: heartbeat (loop-liveness fact, per-tick
  not per-round — REQ-DAEMON-5), `supervise_pump` + backoff (untouched), the
  200ms `TICK`, conn cache, `EpochSource`.
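
The ADR-0018 `[V4]` and KH 5.9 constraints could look like this in the shell's bookkeeping. It is a rough sketch under assumptions: `WorkerLasts` and `mark_ran` are named in this plan, but the index-per-worker layout, the `last` accessor and the `remaining` helper are hypothetical and exist here only to show RAM-only state and forward-only arithmetic.

```rust
use std::time::{Duration, Instant};

/// In-memory last-ran stamps, one slot per worker. RAM only: nothing is persisted.
pub struct WorkerLasts {
    lasts: Vec<Option<Instant>>,
}

impl WorkerLasts {
    /// Every slot starts as `None`, which the kernel reads as "due immediately".
    pub fn new(workers: usize) -> Self {
        Self { lasts: vec![None; workers] }
    }

    /// The stamp the shell feeds to the kernel as `last`.
    pub fn last(&self, worker: usize) -> Option<Instant> {
        self.lasts[worker]
    }

    /// Stamp a worker as "ran", meaning attempted, not necessarily succeeded.
    /// `post_round_now` should be a clock read re-taken AFTER the round, so a
    /// slow round pushes the next due time later (the stagger).
    pub fn mark_ran(&mut self, worker: usize, post_round_now: Instant) {
        self.lasts[worker] = Some(post_round_now);
    }

    /// Hypothetical helper: time left until the worker is due again.
    /// Uses only forward `duration_since`-style arithmetic on the `Instant`.
    pub fn remaining(&self, worker: usize, now: Instant, cadence: Duration) -> Duration {
        match self.lasts[worker] {
            None => Duration::ZERO,
            Some(prev) => cadence.saturating_sub(now.saturating_duration_since(prev)),
        }
    }
}
```

Note that the only subtraction performed is between two `Duration` values; the `Instant` is never moved backwards.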

## What is already satisfied (don't re-build)

- `due()` is already pure + unit-tested (extend with `wake`, keep the test).
- `supervise_pump` / `next_backoff` / `write_heartbeat` / `dial_seeded` /
  `push_targets` / `fire_due_rotations` / marker-take tests — all survive
  verbatim; only file location changes.
- The loopback E2E (`tests/peerloop.rs`, `tests/pump.rs` after P1) is the
  behaviour oracle: it must pass before and after with no substantive edit
  (rename/path churn only).

## Per-commit discipline

Each sub-task is its own atomic commit. Gates every commit: `cargo build` ·
`cargo test` · `cargo clippy` · `cargo build --no-default-features` ·
`traceable-reqs check` (EXIT=0) · `xtask check`. Tags relocate IN the commit
that moves their code — a moved function's `[impl->…]`/`[unit->…]` tag travels
in the same diff, never a follow-up. Push to a dev-freeform branch; CI both
runners before merge to main. No tag/release rides this plan.

---

## P1 — Mechanical move: `peerloop.rs` → `pump/mod.rs`

Pure relocation, zero logic change. `src/peerloop.rs` → `src/pump/mod.rs`;
`tests/peerloop.rs` → `tests/pump.rs`; callers updated (`daemon.rs` spawn
site; any `crate::peerloop::` path). Fix the module-header drift while the
header moves ("three cadenced pumps" → the CONTEXT.md pump/worker language).
Note: prose line-refs elsewhere (`ADR-0018` cites `peerloop.rs:805`) are
historical citations in a decision record — do NOT rewrite the ADR; the
CONTEXT.md entry is the living pointer.

Evidence: no tag changes (tags move inside the file move, `git mv` + path
fixups). Green suite proves the move was mechanical.

## P2 — The seam: trait + kernel + `RoundCtx` + `PeerIo`, registry worker first

The keystone commit — the interface as actually cut (Gate G1 reviews this).

- `PumpWorker` trait (§Frozen-1) with the KH-7.4 rustdoc line.
- Kernel: `due(last, wake, now, cadence)` (extend existing fn + its unit
  table: a wake forces a round on a leg that is not otherwise due; a wake on
  an already-due leg yields one round, not two).
- `RoundCtx` (subnets, attachment, policy) — shell loads once per round.
- `PeerIo` (brain, conn_id, ops) — shell-owned, passed `&mut`.
- `pump/registry.rs`: the registry worker — `poll_wake` = advertise-marker
  take; `pre_round` = the four-step ordered sweep (verbatim, §Frozen-7);
  `peer_step` = the per-subnet advert push. Shell drives registry through the
  seam; notif/sync/update stay inline this commit (mixed-mode shell is
  temporary scaffolding, removed by P4).
- Unit: kernel table; registry `pre_round` order (scripted store dirs,
  assert eviction-before-advertise observable order); pre_round-once-per-round.

Evidence: existing tags relocate with moved code (`[impl->REQ-CONV-2]` marker
take → worker; `[impl->REQ-SUBNET-7]`, `[impl->REQ-HAZARD-REGISTRY-GHOST-ROWS]`,
`[impl->REQ-MESH-4]` → `pump/registry.rs` pre_round). New kernel-table asserts
add `[unit->REQ-CONV-2]`-grade evidence on already-active REQs (additive, no
toml change).
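
The "exactly-once marker take" behind `poll_wake` can be illustrated with a small in-memory stand-in. Everything named here is hypothetical: `WakeMarker` and `AdvertiseWake` are invented for the sketch, and the real `take_advertise_now` may be backed by something quite different. The point is only the shape: the side effect happens at the worker edge, and the kernel sees a plain `bool`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical in-memory wake marker (an assumption for this sketch).
#[derive(Clone, Default)]
pub struct WakeMarker(Arc<AtomicBool>);

impl WakeMarker {
    /// Request an out-of-cadence round. Setting twice still means one wake.
    pub fn set(&self) {
        self.0.store(true, Ordering::Release);
    }

    /// Exactly-once take: atomically reads and clears the marker, so it
    /// returns `true` at most once per pending wake.
    pub fn take(&self) -> bool {
        self.0.swap(false, Ordering::AcqRel)
    }
}

/// Hypothetical worker fragment showing where the take would sit.
pub struct AdvertiseWake {
    marker: WakeMarker,
}

impl AdvertiseWake {
    pub fn new(marker: WakeMarker) -> Self {
        Self { marker }
    }

    /// Mirrors the role of `PumpWorker::poll_wake`: the returned bool is what
    /// the shell would pass to the pure kernel as its `wake` argument.
    pub fn poll_wake(&mut self) -> bool {
        self.marker.take()
    }
}
```

With a take of this shape, a marker set during an already-due round is consumed by that round's `poll_wake`, which is how "one round, not two" falls out.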

## P3 — Notif + sync workers

- `pump/notif.rs`: `peer_step` = spool emit + push (`[impl->REQ-NOTIF-1]`
  moves here).
- `pump/sync.rs`: `poll_wake` = freshness-pull take (`[impl->REQ-INST-3]`);
  `peer_step` = select_refs ∪ registry-derived bootstrap refs (construction-
  captured registry Arc) → `request_sync` via `PeerIo.ops` → `hooks.on_pull`.
  `PumpHooks` stays a sync-worker construction dep (the conflict surface,
  ADR-0013 posture unchanged).

Evidence: tag relocation only.

## P4 — Update worker; shell becomes pure scheduler + fan-out

- `pump/update.rs`: the REQ-UPD-1/4/6 leg verbatim — VerifyPolicy floor
  arithmetic, `request_update`, consent-notif production via the
  construction-captured registry Arc (`registry.with_epoch` — the epoch-lease
  comment block moves intact), `record_last_outcome` writes.
- Shell de-scaffolded: `run_peer_pump` = heartbeat → poll wakes → kernel →
  pre_rounds → per-subnet × per-peer (`ensure_conn` → each due worker's
  `peer_step`, abort-on-first-fail) → mark-after-round. No leg logic left
  inline.

Evidence: tag relocation (`[impl->REQ-UPD-6]`, `[impl->REQ-HAZARD-REGISTRY-
EPOCH-LEASE]` → `pump/update.rs`).
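
The per-peer half of the de-scaffolded shell might be shaped like the sketch below. It is illustrative only: `Target`, `ConnCache`, `Step` and `fan_out` are hypothetical names, each due worker's `peer_step` is reduced to a closure, and the conn cache is a set of peer ids rather than real connections. What it tries to show is the ordering: detached skip before any conn work, then `ensure_conn`, then due workers in order with abort-on-first-fail and conn drop.

```rust
use std::collections::HashSet;
use std::io;

/// Illustrative fan-out target; `detached` marks a subnet this node is not attached to.
pub struct Target {
    pub subnet: String,
    pub peer_hex: String,
    pub detached: bool,
}

/// Hypothetical conn cache: only remembers which peers currently have a conn.
#[derive(Default)]
pub struct ConnCache {
    live: HashSet<String>,
    pub dials: usize,
}

impl ConnCache {
    /// Dial only if no conn is cached for this peer.
    pub fn ensure_conn(&mut self, peer_hex: &str) {
        if self.live.insert(peer_hex.to_string()) {
            self.dials += 1;
        }
    }
    pub fn drop_conn(&mut self, peer_hex: &str) {
        self.live.remove(peer_hex);
    }
    pub fn is_live(&self, peer_hex: &str) -> bool {
        self.live.contains(peer_hex)
    }
}

/// A due worker's per-peer step, reduced to a closure over (subnet, peer_hex).
pub type Step<'a> = Box<dyn FnMut(&str, &str) -> io::Result<()> + 'a>;

pub fn fan_out(targets: &[Target], due_steps: &mut [Step<'_>], conns: &mut ConnCache) {
    for t in targets {
        if t.detached {
            continue; // skipped BEFORE any conn is established
        }
        conns.ensure_conn(&t.peer_hex);
        for step in due_steps.iter_mut() {
            if step(t.subnet.as_str(), t.peer_hex.as_str()).is_err() {
                conns.drop_conn(&t.peer_hex); // redial on a later tick
                break; // remaining due workers are skipped for this peer only
            }
        }
    }
}
```

A failure on one peer leaves the other peers' steps untouched, which matches the "for that peer" scope of the failure ruling.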

## P5 — Choreography unit table + doc reconciliation

- Scripted-worker tests (a recording fake implementing `PumpWorker`): the six
  choreography facts (§Goal bullet 2) as one table-style module. Run the shell
  body with TICK-free direct invocation (extract the per-round body into a
  testable fn if the loop shape resists — shell stays thin either way).
- Doc sweep: module rustdoc tree (`pump/mod.rs` = the pump contract,
  per-worker headers), `xtask check` drift-clean. CONTEXT.md already done
  (this session).

Evidence: new `[unit->…]` adds on already-active REQs where a choreography
fact IS the REQ's property (wake-forces-one-round → REQ-CONV-2/REQ-INST-3;
detached-skip → REQ-SUBNET-5). No toml change.
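
A recording fake for the choreography table could be as small as the sketch below. It is an assumption-laden illustration, not the test module that landed: the trait is trimmed to the two round hooks (no `RoundCtx`, no `PeerIo`), and `Recorder` and `run_round` are hypothetical names standing in for the scripted worker and the extracted per-round body.

```rust
use std::cell::RefCell;
use std::io;
use std::rc::Rc;

/// Trimmed stand-in for the seam: only the two round hooks.
pub trait PumpWorker {
    fn pre_round(&mut self) {}
    fn peer_step(&mut self, peer_hex: &str) -> io::Result<()>;
}

/// Recording fake: appends every hook call to a shared log and fails on one
/// scripted peer, if any.
pub struct Recorder {
    name: &'static str,
    fail_on: Option<&'static str>,
    log: Rc<RefCell<Vec<String>>>,
}

impl Recorder {
    pub fn new(
        name: &'static str,
        fail_on: Option<&'static str>,
        log: Rc<RefCell<Vec<String>>>,
    ) -> Self {
        Self { name, fail_on, log }
    }
}

impl PumpWorker for Recorder {
    fn pre_round(&mut self) {
        self.log.borrow_mut().push(format!("{}.pre_round", self.name));
    }

    fn peer_step(&mut self, peer_hex: &str) -> io::Result<()> {
        self.log
            .borrow_mut()
            .push(format!("{}.peer_step({})", self.name, peer_hex));
        if self.fail_on == Some(peer_hex) {
            return Err(io::Error::new(io::ErrorKind::Other, "scripted failure"));
        }
        Ok(())
    }
}

/// Hypothetical extracted round body: every due worker's `pre_round` first,
/// then per-peer steps with abort-on-first-fail. No TICK, no sleeps.
pub fn run_round(workers: &mut [Box<dyn PumpWorker>], peers: &[&str]) {
    for w in workers.iter_mut() {
        w.pre_round();
    }
    for &peer in peers {
        for w in workers.iter_mut() {
            if w.peer_step(peer).is_err() {
                break;
            }
        }
    }
}
```

Each choreography fact then becomes one row: a set of scripted workers, a peer list, and the exact expected call log.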

---

## Sequencing

P1 (mechanical move — isolates rename noise from logic diffs) → P2 (seam +
registry worker — the interface cut, **Gate G1**) → P3 (notif + sync) → P4
(update + shell de-scaffold, **Gate G2**) → P5 (choreography table + docs,
**Gate G3**, then CI both runners → merge).

## Review gates (doyle check-ins — executor pings via owl, blocks until ACK)

- **G1 (after P2):** the seam as cut. Reviews: trait signature vs frozen
  design, kernel purity (V4/5.9), registry pre_round order verbatim,
  RoundCtx contents (shared loads only), mixed-mode shell sanity.
- **G2 (after P4):** behaviour-neutrality of the full extraction. Reviews:
  side-by-side leg diff (old inline vs worker — logic identical), epoch-lease
  structure (update worker mints via registry only; `ops` solely via PeerIo),
  abort-on-first-fail + conn-drop placement, mark-after-round timing,
  shell-owned heartbeat unchanged, E2E green.
- **G3 (after P5, pre-merge):** choreography table coverage vs the six facts,
  tag-relocation audit (`traceable-reqs check` + grep for orphaned tags),
  doc drift, CI both runners green.

Deviation rule: any departure from §Frozen design or §Constraints is a STOP —
ping doyle with the friction before coding around it. Mid-task discoveries
that don't touch the frozen rulings (naming, private helpers, test plumbing)
are executor's discretion.

## Traceability — zero toml changes

No new REQ (architecture refactor, no new requirement); no `required_stages`
change; every existing tag relocates in the commit that moves its code; new
unit tags are additive evidence on already-active REQs only. `traceable-reqs
check` EXIT=0 at every commit is the per-commit gate, and the G3 audit
re-walks the tag map end-to-end.

## Risks / watch-items

- **Behaviour drift hides in the small stuff**: store-load freshness (per
  ROUND, not per peer — keep), NotifStore opened per peer inside the due
  branch (keep), adverts filtered per subnet (keep), `mark_ran` with the
  POST-round now (keep — the stagger), detached-skip BEFORE conn establishment
  (keep). G2 walks each.
- **The mixed-mode shell (P2–P3) is scaffolding** — ugly is fine, lingering is
  not; P4 must remove it fully.
- **Tag orphaning in moves** — `traceable-reqs check` catches missing, not
  mis-PLACED; G3's grep audit covers tags that landed on the wrong moved
  block.
- **`tests/peerloop.rs` rename** may collide with CI test-name filters
  (`[twohost]` rungs filter by name on the runners): check the workflow
  files for `--test peerloop` references during P1.
- **Don't expand scope**: no per-worker failure isolation, no card-2 repair
  queue re-seam, no deadline conversion, no new pump consumers. The adapter
  era (`spt-claude-code`) lands AFTER this seam — that's the point.

## Immediate next step

On pickup, todlando starts **P1** (the mechanical move), then pings doyle at
G1 after P2 lands. Plan questions before starting → ping doyle.
