# Restoration D5 — durable absolute-deadline loop timing (task plan)

> Working doc for RESTORATION-PLAN.md **D5** (ADR-0018 Q4, V3, V4). D1 (process
> split), D2 (loop relocation), D3 (supervision anchor + real brain-process
> update trigger), D4 (multi-session cold-start resume; `BrainState` message
> retired) are DONE + cross-OS CI-green on main (@3055eb7). The brain the
> supervisor respawns — for an **update** cycle (D3-3) **or** a **crash**
> recovery (D1) — comes up via `Brain::cold_start`; the outgoing brain is gone
> before the new one starts, so anything time-significant that must survive a
> swap cannot ride a brain→brain frame. D4 solved **session continuity** from the
> persistent side (the broker). D5 solves **loop-timing continuity** from the
> persistent side (the disk).
>
> **Discriminator already in hand (D3-2).** The broker stamps
> `--generation`/`--start-reason` at spawn; `StartReason::{Cold,Crash,Update}`
> (`brainproc.rs:63`) already flows into `run_brain(generation, reason)`. Its own
> rustdoc says it: *"the reason is Q4's update-vs-crash discriminator that D5 will
> use to decide whether to preserve or reset phase-significant loop timing. D3-2
> surfaces them; D5 consumes the reason."* D5 is where `reason` stops being
> carried and starts being **used**.

## Goal (D5-close invariant)

Phase-significant periodic timing lives as durable **absolute-deadline** state on
disk, rehydrated on every brain start, so the cadence is **continuous across a
seamless update** and **safely reset across a crash** — with **no per-fire
writes**. Concretely at D5-close:

- A phase-significant periodic loop (the **pulse loop**) persists `(anchor,
  interval)` **once** per fresh/crash start and derives every fire functionally:
  `next_fire = anchor + interval × ⌈max(0, now − anchor) / interval⌉`. No write
  on the hot path.
- An **update** restart re-reads the existing `(anchor, interval)` and **keeps
  deriving** — phase preserved, the loop lands back on the same grid mid-stride
  (a seamless swap is seamless in timing too).
- A **crash** restart is a **fresh** start — rewrite `anchor = now`; phase reset
  is acceptable (a crash already disrupted; the loop is idempotent catch-up).
- A **cold** (first-ever) start writes a fresh anchor exactly like a crash.
- The brain picks preserve-vs-reset purely from the **D3 `StartReason`** — no new
  IPC field, no broker round-trip.
- **The idempotent pump cadences are NOT converted [V4].** They already
  "stagger from everything due now" (catch-up / idempotent / restart-safe);
  converting them would reintroduce the per-loop writes Q4 exists to minimize.
- **The one-shot (alarm) deadline RULE is fixed and documented [V3]; the
  machinery is deferred** — the daemon has no one-shot consumer today (alarms are
  legacy-listener-in-memory, §7), so building the timer now ships untested dead
  code. The rule is locked here; the timer ships with the alarm port
  (activate-don't-pre-fail). Already tracked in `docs/DEFERRED.md`.

## What is already satisfied / true (don't re-build, don't mis-scope)

- **The discriminator is wired (D3-2).** `StartReason` parses lenient (unknown →
  `Cold`), round-trips through argv, and reaches `run_brain`. D5 does **not** add
  a versioned field — it consumes the one D3 already shipped. **The D7 N-1 harness
  does NOT grow in D5** (the anchor is brain-local disk state, never on the IPC
  wire; the only D5 versioned surface is the disk-file shape, governed by the
  rollback-state-compat invariant below, not by KH-2.3 wire compat).
- **The pulse loop is `def+test`-only, NOT resident in the daemon brain yet.**
  `run_pulse_loop` (`lifecycle.rs:243`) has exactly one caller — its own unit test
  (`lifecycle.rs:652`). `run_brain` (`brainproc.rs:147`) does **not** drive it;
  the pulse/Psyche loops arrive with the live-agent adapter (D2 ledger:
  *"Psyche / pulse loops — legacy listener, not hosted in the spt-core daemon
  yet"*; REQ-HAZARD-PER-AGENT-SCHEDULING note: *"run_pulse_loop is def+test only
  and each agent drives its own pulse from its own process"*). **Therefore D5,
  exactly like D4, is forward-correct capability + a real harness proof, not a
  field-exercised path** — the disk-anchored-deadline mechanism is built and
  exercised by a real spawn/kill/respawn harness now, so it is genuine machinery
  the adapter inherits, not dead code waiting on the adapter.
- **The durable-state pattern already exists.** `EffectJournal`
  (`effect.rs`) and `DaemonConfig` (`config.rs:182`,
  `spt_store::perch::spt_home().join("daemon.json")`) are the precedent: a JSON
  file under `spt_home()`, parent-dir-created on open, corrupt-degrades-safe. D5's
  anchor file follows it verbatim — no new storage machinery.
- **The pulse period is already config-driven (REQ-DAEMON-1).**
  `DaemonConfig.pulse_period` is the `interval`; D5 reads it, it does not
  re-introduce a constant.
- **The loop is already freely killable/respawnable.** `run_pulse_loop`'s rustdoc:
  *"holds no state beyond the cursor on disk … a fresh loop re-reads the drop dirs
  and gate."* D5 adds **timing** durability to a loop whose **work** is already
  restart-safe — it does not make the loop stateful, it makes its *phase* durable.
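
The lenient parse described above (unknown → `Cold`, round-trip through argv) might look roughly like the sketch below. The variant names come from this plan; the lowercase token spellings and the method names `parse_lenient` / `as_arg` are assumptions for illustration, not the signatures in `brainproc.rs`.

```rust
/// Why the broker spawned this brain (variant names are from the plan).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartReason {
    Cold,
    Crash,
    Update,
}

impl StartReason {
    /// Lenient parse: any unrecognized token degrades to `Cold`.
    /// The token spellings here are hypothetical.
    pub fn parse_lenient(token: &str) -> Self {
        match token {
            "update" => StartReason::Update,
            "crash" => StartReason::Crash,
            _ => StartReason::Cold,
        }
    }

    /// Inverse of `parse_lenient` for the known variants (argv round-trip).
    pub fn as_arg(self) -> &'static str {
        match self {
            StartReason::Cold => "cold",
            StartReason::Crash => "crash",
            StartReason::Update => "update",
        }
    }
}
```

Degrading an unknown token to `Cold` lands on the fresh-anchor branch of the D5 rule, so a reason the brain cannot read resets phase rather than preserving a phase it cannot vouch for.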

## Per-commit discipline

Each sub-task is its own atomic commit with evidence tagged in-commit. Gates on
every commit: `cargo build` · `cargo test` · `cargo clippy` · `cargo build
--no-default-features` · `traceable-reqs check` (EXIT=0) · `xtask check`. Push to
a dev-freeform branch → **CI green on both runners** before any tag (the v0.3.0 lesson).

---

## D5-1 — Durable absolute-deadline primitive · Q4, V4  ✅ DONE (8206ca2)

The persistent side gains a tiny, pure-derive deadline anchor, built and
unit-proven in isolation before any loop consumes it.

- **New module `deadline.rs`** (`spt-daemon`). A `DeadlineAnchor { anchor_ms:
  u64, interval_ms: u64 }` persisted as JSON under `spt_home()`. Open =
  parent-dir-created, **corrupt-degrades-to-fresh** (a garbled file is treated as
  absent → fresh anchor, never a hard fail — same posture as `EffectJournal`
  recover and `DaemonConfig` load).
- **[doyle amendment 6 — parameterize the anchor key (forward-collision guard,
  same latent-gap class as the D4-2b amd-5 shared-cursor drop).]** The anchor file
  is **keyed**, not a singleton: `DeadlineAnchor::open(key)` →
  `deadline-<key>.json`. A singleton `deadline-pulse.json` collides with D5's own
  forward story — the cited forward consumer is the live-agent adapter's
  **per-agent** pulse driver (*"each agent drives its own pulse"*,
  REQ-HAZARD-PER-AGENT-SCHEDULING / KH 7.4). Two agents rehydrating ONE file would
  cross-clobber: agent B's Crash/Cold rewrites `anchor=now` over agent A's phase;
  B's Update re-reads A's anchor as its own. The daemon pulse uses key `"pulse"`
  (file `deadline-pulse.json` — identical name to the singleton it replaces, the
  only loop that exists today), the adapter era passes the **agent id**. Same
  code, parameterized path — no new machinery, and it serves KH 7.4's per-agent
  non-serialization posture for free.
- **The rehydration rule, keyed on `StartReason` (the one decision point):**
  - `Update` → **load** the existing anchor; if present, **keep** it (preserve
    phase). *Defensive fallback:* if no file exists (e.g. updating **from** a
    pre-D5 binary that never wrote one), treat as fresh — write `anchor = now`.
    Never fail an update on a missing anchor.
  - `Crash` → **rewrite** `anchor = now` (fresh phase; reset acceptable).
  - `Cold` → **write** `anchor = now` (first-ever start; identical to crash).
  - `interval_ms` is always (re)written from the live `DaemonConfig.pulse_period`
    — a config change to the period takes effect on the next start, and the
    derive uses the current interval, not a stale persisted one. (Anchor persists;
    interval tracks config.)
- **Pure derive (no I/O, the hot path):** `next_fire(now_ms) -> u64 = anchor_ms +
  interval_ms × ⌈max(0, now_ms − anchor_ms) / interval_ms⌉`. **No per-fire
  write** — the file is written once at start and never on a tick.
  - **[boundary — doyle watch-for]** Decide and **test** the on-grid case
    explicitly: when `(now − anchor)` is an exact multiple of `interval`,
    `⌈0⌉`-style ceil yields `next_fire == now` (fire-now), then the post-fire
    advance must move strictly forward by one interval so a single grid instant
    cannot double-fire. The loop computes `next_fire(now)`, sleeps to it, fires,
    then advances `now` past that fire before the next `next_fire`. Assert:
    exact-grid `now` fires once; `now` between grid points sleeps to the next; a long
    jump (`now` ≫ anchor) snaps to the next future grid point (catch-up collapses
    missed ticks to one — idempotent, no fire-storm).
  - **Saturating arithmetic throughout** (the `peerloop` cadence-panic lesson,
    REQ-HAZARD line 615: an `Instant`/u64 subtraction underflow panicked the pump
    thread on a low-uptime runner). `max(0, now − anchor)` is a saturating sub;
    the ceil-div guards `interval_ms == 0`. **[doyle minor — pick ONE degrade,
    unit it]**: `interval_ms == 0` → fall back to the config default if it is
    `> 0`, else `1ms`; **unit-test that exact rule** rather than leaving an
    alternative in a comment.
- **No wall-clock dependency in tests.** `next_fire` takes `now_ms` as a param
  (pure), so the unit tests pass explicit clock values — no `Date.now()`-style
  flake, deterministic on every runner.
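
A minimal sketch of the pure derive and the zero-interval degrade described in the list above. It is written as free functions with explicit parameters so it stands alone; the plan's `next_fire(now_ms)` is a method on `DeadlineAnchor`, and the helper name `effective_interval_ms` is an assumption.

```rust
/// Zero-interval degrade: a zero interval falls back to the config default
/// if that is > 0, else to 1 ms.
pub fn effective_interval_ms(interval_ms: u64, config_default_ms: u64) -> u64 {
    if interval_ms > 0 {
        interval_ms
    } else if config_default_ms > 0 {
        config_default_ms
    } else {
        1
    }
}

/// Pure derive: the first grid point at or after `now_ms`.
/// No I/O, and every operation saturates, so no input can panic.
pub fn next_fire(anchor_ms: u64, interval_ms: u64, now_ms: u64) -> u64 {
    // Callers are expected to pass an already-degraded interval; the clamp
    // only guarantees the division below can never divide by zero.
    let interval = interval_ms.max(1);
    // max(0, now - anchor) as a saturating subtraction.
    let elapsed = now_ms.saturating_sub(anchor_ms);
    // ceil(elapsed / interval) without computing elapsed + interval - 1,
    // which could overflow.
    let steps = elapsed / interval + u64::from(elapsed % interval != 0);
    anchor_ms.saturating_add(interval.saturating_mul(steps))
}
```

With `anchor_ms = 1000` and `interval_ms = 250`: `now_ms = 1620` gives 1750 (between grid points), `now_ms = 1500` gives 1500 (on-grid, fire-now), and `now_ms = 900` gives 1000 (clock before the anchor saturates to zero elapsed).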

Evidence: `[impl->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` (durable phase custody on
the persistent side) · `[unit->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` (the pure
derive: on-grid fire-once, between-grid next, long-jump catch-up-to-one,
zero-interval guard, saturating no-underflow; **and the rehydration rule per
`StartReason`**: update-keeps-anchor, crash-rewrites-now, cold-writes-now,
update-with-no-file-falls-back-fresh, corrupt-file-degrades-fresh; **and amd-6**:
two keys → two independent `deadline-<key>.json` files, no cross-clobber (agent
B's Crash-rewrite leaves agent A's anchor untouched)). REQ already
`[doc,impl,unit]` active; **no toml change**.
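
The rehydration cases listed in the evidence can be read as one pure decision over what was loaded from disk. The sketch below assumes the loader has already mapped a missing or corrupt file to `None`. The function names `rehydrate` and `anchor_path` are hypothetical, and JSON encoding is left out.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartReason {
    Cold,
    Crash,
    Update,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DeadlineAnchor {
    pub anchor_ms: u64,
    pub interval_ms: u64,
}

/// Keyed file path: one `deadline-<key>.json` per loop (amendment 6).
pub fn anchor_path(home: &Path, key: &str) -> PathBuf {
    home.join(format!("deadline-{}.json", key))
}

/// The one decision point. `existing` is what the loader read from disk;
/// a missing or corrupt file is represented as `None`.
pub fn rehydrate(
    reason: StartReason,
    existing: Option<DeadlineAnchor>,
    now_ms: u64,
    pulse_period_ms: u64,
) -> DeadlineAnchor {
    let anchor_ms = match (reason, existing) {
        // Update with a readable file: keep the phase.
        (StartReason::Update, Some(prev)) => prev.anchor_ms,
        // Update without a file, Crash, and Cold all start a fresh phase.
        _ => now_ms,
    };
    // The interval always tracks live config, never the persisted value.
    DeadlineAnchor {
        anchor_ms,
        interval_ms: pulse_period_ms,
    }
}
```

Keeping the decision free of I/O lets each `StartReason` case be asserted with explicit clock values, in the same way as the derive.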

---

## D5-2 — Convert the pulse loop to disk-anchored deadline · Q4, V4  ✅ DONE (4f107b6)

The one phase-significant resident loop consumes the anchor. **No other loop is
touched.**

- **`run_pulse_loop` sleeps to `next_fire`, not a flat `pulse_period`.** Today
  (`lifecycle.rs:249-257`) it `pulse_tick()`s then `sleep_interruptible(
  pulse_period)` — phase-relative, so every restart silently re-phases the grid
  to the restart instant. Convert it to: load/rehydrate the `DeadlineAnchor`
  (D5-1) at loop entry, then each iteration sleep **until `next_fire(now)`** (via
  the existing `sleep_interruptible` slice machinery so a stop still wakes it
  promptly), `pulse_tick()`, advance past the fire, repeat.
- **[doyle minor — double-fire guard is STRUCTURAL in the loop, not only asserted
  in tests.]** After a fire at grid point `fired_t`, derive the next deadline from
  **`fired_t` itself** (`next_fire(fired_t + 1)`, i.e. the `k+1` grid point),
  **never from a freshly re-sampled `now` alone** — a fast tick that re-samples
  `now` within the same millisecond re-yields `fired_t` and double-fires. Make
  "advance from the fired grid point" an **invariant of the loop body**, so the
  guard holds regardless of clock resolution (the D5-1 pure-derive on-grid test
  proves the math; this makes the *loop* honor it structurally).
- **Thread `StartReason` to the loop.** `run_pulse_loop` gains the start reason
  (or a pre-built `DeadlineAnchor`) so it applies the D5-1 rule. Because the only
  current caller is the unit test, this is a clean signature change — no
  production caller to migrate (contrast D4's byte-for-byte legacy-path
  preservation; here there is no legacy production path to preserve). The
  forward caller (the live-agent adapter's per-agent pulse driver) will pass the
  `StartReason` it received from the broker spawn stamp.
- **`sleep_interruptible` stays as-is** — it is the responsive-to-stop sleep
  primitive; D5 changes *how long* to sleep (to-deadline), not *how* to sleep
  (sliced, stop-aware). The 10ms-slice kill responsiveness is unchanged.
- **Idempotent pump cadences explicitly untouched [V4].** Do **not** convert
  `peerloop`'s due/stagger cadences. State the negative in the commit message and
  a code comment at the pump cadence site so a future reader does not "finish the
  job" — they are catch-up/idempotent and converting them re-adds the per-loop
  writes Q4 minimized. (The plan-level negative is in RESTORATION-PLAN.md
  "Immediate next-session start" item 4; D5-2 carries it into the code.)

Evidence: `[impl->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` (the pulse loop derives
its cadence from the durable anchor) · `[impl->REQ-DAEMON-1]` (the consolidated
daemon pulse loop's timing is now swap-durable) ·
`[unit->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` (**the real harness, forward-correct and not pre-failed**:
drive a real `run_pulse_loop` against a temp `spt_home`, observe N ticks on a
grid; **stop + restart with `StartReason::Update` → the loop resumes on the SAME
grid (phase preserved, next fire lands at `anchor + k·interval`, not
restart+interval)**; **stop + restart with `StartReason::Crash` → the anchor is
rewritten to the restart instant (phase reset, grid re-based)**; assert **no
per-fire write** — **[doyle minor]** byte-compare the anchor file **content**
unchanged across N ticks, **not** mtime (mtime granularity is
filesystem-dependent → flake bait)).
REQs already active — **no toml change**. The **int** process-level survival
re-point stays **D7**.
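
The D5-2 loop shape (sleep to the deadline, fire, advance from the fired grid point) can be sketched with an injected clock so it runs deterministically. This is an illustration under assumptions, not the body of `run_pulse_loop`: `run_ticks`, the `now` closure, and the `sleep_until` closure are hypothetical stand-ins for the real clock and `sleep_interruptible`, and pushing to a `Vec` stands in for `pulse_tick()`.

```rust
/// First grid point at or after `now_ms`; saturating, interval clamped to >= 1.
fn next_fire(anchor_ms: u64, interval_ms: u64, now_ms: u64) -> u64 {
    let interval = interval_ms.max(1);
    let elapsed = now_ms.saturating_sub(anchor_ms);
    let steps = elapsed / interval + u64::from(elapsed % interval != 0);
    anchor_ms.saturating_add(interval.saturating_mul(steps))
}

/// Runs `ticks` fires and returns the grid instants that fired.
pub fn run_ticks(
    anchor_ms: u64,
    interval_ms: u64,
    ticks: usize,
    mut now: impl FnMut() -> u64,
    mut sleep_until: impl FnMut(u64),
) -> Vec<u64> {
    let mut fired = Vec::with_capacity(ticks);
    let mut deadline = next_fire(anchor_ms, interval_ms, now());
    for _ in 0..ticks {
        sleep_until(deadline);
        fired.push(deadline); // stands in for pulse_tick()

        // Structural guard: the next deadline is derived from the fired grid
        // point itself, so a clock that has not visibly advanced cannot
        // re-yield the same instant.
        let after_fire = next_fire(anchor_ms, interval_ms, deadline.saturating_add(1));
        // A slow tick may have run past later grid points; taking the max
        // collapses those missed ticks into a single future fire.
        let after_now = next_fire(anchor_ms, interval_ms, now());
        deadline = after_fire.max(after_now);
    }
    fired
}
```

Even with a clock frozen at the anchor (`|| 1000` with a no-op sleep, anchor 1000, interval 250), the fires come out as 1000, 1250, 1500, which is the property the structural guard is meant to give regardless of clock resolution.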

---

## D5-3 — Lock the one-shot rule [V3] + rollback-state-compat note + docs · Q4, V3, V1-adjacent  ✅ DONE (c4af6cf)

D5 fixes the one-shot *rule* without building the *machinery*, and records the
durable-state invariant the D6 rollback path will depend on.

- **One-shot (alarm) rule — locked as a PURE HELPER, no machinery [V3].**
  **(Open call resolved by doyle → pure helper, 3 reasons: V3's target is
  untested dead code so a tested-unwired derive is the D4 `resume_sessions`
  posture exactly; the D5-3 evidence tag already presumes a helper and docs-only
  would orphan it; falsifiability now as executable asserts. The
  never-reset-on-crash asymmetry vs the periodic rule is exactly the kind of thing that drifts
  if the alarm port re-derives it from prose.)** The rule: persist the absolute
  `target-time` at **creation**; **every** brain start reads it and
  **fires-if-due**; **never reset** on any `StartReason` ("remind me at 3pm" is a
  commitment that outlives any restart — unlike the periodic crash-reset).
  - **Land it as `OneShotDeadline::fire_if_due(now) -> bool` in `deadline.rs`,
    beside `DeadlineAnchor`.** Constraints to keep V3's conservative lean honest:
    **pure derive + minimal persist shape ONLY** (`target_ms`); **`load` ignores
    `StartReason` entirely — that asymmetry IS the rule**; **NO scheduler / timer
    / thread / wiring** — no consumer drives it (the daemon has none today: alarm
    is an event *shape* in `spt-proto` + `psyrelay.rs:93` relay handling + a test
    fixture; the real timer is legacy-owl-listener-in-memory,
    BROKER-BRAIN-SPLIT-RESTORATION §7).
  - **The `docs/DEFERRED.md` row stays the machinery record** (*"Durable in-daemon
    alarm scheduler — the Q4/V3 deferral"*) — cross-referenced from the helper's
    rustdoc. No new DEFERRED row; the helper is the *rule*, the row is the
    *scheduler* the alarm port will build (reusing this helper + the D5 disk
    pattern).
- **[V1-adjacent] Anchor file is rollback-N-1-safe; record it now.** D6's
  auto-rollback may spawn the **old** (pre-D5) binary against durable state the new
  brain wrote, including `deadline-pulse.json`. The old binary simply does not
  know the file → ignores it → re-phases on its own flat-sleep cadence (the
  pre-D5 behavior). The new file is **purely additive durable state** — no schema
  migration of an existing file, no irreversible pre-ready write. This satisfies
  the D6 rollback-state-compat invariant **for free**; record it as a one-line
  note (KNOWN-HAZARDS 6.8 sibling / the D6 plan input) so the D6 invariant guard
  knows `deadline-pulse.json` is already conformant and a future *migration* of
  it is the thing to gate.
- **Docs.** Add the absolute-deadline mechanism to KNOWN-HAZARDS (new entry or
  the Q4 sibling of the existing generation/handoff notes): periodic
  phase-significant timing is disk-anchored + derived; update preserves phase, crash
  resets, one-shot never resets; **no per-fire write**. Dual-audience per
  DOCS-STRATEGY. `xtask check` must stay drift-clean.
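
A minimal sketch of the one-shot rule as a pure helper, assuming only the `target_ms` persist shape named above. The constructor `new` is an assumption. The type deliberately has no `StartReason` anywhere in its API, which is how the sketch expresses "never reset".

```rust
/// One-shot rule: the absolute target is fixed at creation and never reset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OneShotDeadline {
    pub target_ms: u64,
}

impl OneShotDeadline {
    /// Created once; `target_ms` is the absolute time to fire at.
    pub fn new(target_ms: u64) -> Self {
        Self { target_ms }
    }

    /// Pure derive: due once the clock has reached the target.
    /// No `StartReason` parameter exists, so no restart kind can move it.
    pub fn fire_if_due(&self, now_ms: u64) -> bool {
        now_ms >= self.target_ms
    }
}
```

The helper is stateless, so it keeps answering `true` after the target has passed; recording that a fire was delivered belongs to the deferred scheduler, not to this rule.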

Evidence: `[unit->REQ-HAZARD-BROKER-PROCESS-ISOLATION]` (the one-shot **rule**
proven as a pure-derive helper even though no consumer drives it — fire-if-due on
start, never-reset-on-crash, distinct from the periodic crash-reset; this proves
the rule without shipping the scheduler) + a doc note (KNOWN-HAZARDS + DEFERRED
cross-ref). No new REQ; no toml change.

> *D5-3 open call — RESOLVED (doyle, 2026-06-10): **pure helper**.*
> `OneShotDeadline::fire_if_due(now)` in `deadline.rs` — tested, unwired, the D4
> `resume_sessions` posture (real machinery, real test, no live wiring). Pure
> derive + minimal `target_ms` persist only; `load` ignores `StartReason`; no
> scheduler/timer/thread. The DEFERRED.md row remains the scheduler record,
> cross-ref'd from the helper rustdoc.

---

## Sequencing

D5-1 (the durable primitive + pure derive + rehydration rule, unit-proven in
isolation) → **D5-2** (the pulse loop consumes it; the real spawn/kill/respawn
harness proves update-preserves / crash-resets / no-per-fire-write) → **D5-3**
(lock the V3 one-shot rule, record the rollback-N-1-safe note, docs). D5-1 before
D5-2 because the loop derives from the primitive; D5-3 last because it asserts the
end-state and the cross-milestone (D6) invariant the first two establish.

## N-1 compat — does NOT grow in D5

D5 adds **no new IPC wire field**. The `StartReason` discriminator is D3-2's
already-shipped field; the `DeadlineAnchor` is brain-local disk state. The D7
old-broker × new-brain verb-surface harness therefore gains **no new assertion**
in D5 (it grew by one field in D3-2 and one in D4-1; D5 contributes none). The
only D5 versioned surface is the **disk-file shape**, governed by the additive /
corrupt-degrades-fresh / rollback-N-1-safe rules in D5-1/D5-3 — not by KH-2.3 wire
compat. Record this explicitly so the D7 close-out does not hunt for a D5 field
that doesn't exist.

## Traceability — no toml activation in D5

| REQ | State entering D5 | D5 adds | Activation note |
|-----|-------------------|---------|-----------------|
| `REQ-HAZARD-BROKER-PROCESS-ISOLATION` | `[doc,impl,unit]` | impl + unit (durable deadline primitive, pulse-loop conversion, one-shot rule helper) | already active; **int → D7** |
| `REQ-DAEMON-1` | `[impl,unit,int]` | impl (pulse-loop timing now swap-durable) | already active; no stage change |

No `required_stages` change lands in D5 (rule 5): every D5 REQ is already active.
`traceable-reqs check` stays green at every commit. The D6 rollback-state-compat
hazard (`REQ-HAZARD-ROLLBACK-STATE-COMPAT`) is **doc-activated at D6**, not D5 —
D5 only leaves the note that `deadline-pulse.json` is conformant input for it.

## Risks / watch-items (doyle's, baked in)

- **[V4] Do NOT convert the idempotent pump cadences.** The single biggest
  mis-scope risk: a reader "completes" the deadline conversion by touching
  `peerloop`. They are catch-up/idempotent/restart-safe and converting them
  re-adds the per-loop writes Q4 minimized. D5-2 carries the negative as a code
  comment at the pump cadence site, not just a plan line.
- **Forward-correct, not field-exercised.** The pulse loop is `def+test`-only in
  the spt-core daemon (not driven by `run_brain`); it arrives with the live-agent
  adapter. Keep the D5-2 harness **real** — a real `run_pulse_loop` over a temp
  `spt_home`, real stop/restart with a real `StartReason`, real assertion that the
  anchor file is unwritten across ticks — so the capability is genuinely tested,
  not a mocked stub (the D4 posture, verbatim).
- **The on-grid boundary (double-fire).** `next_fire` at an exact grid instant
  must fire **once** and advance strictly forward. Test the exact-multiple case,
  not just between-grid. Collapse missed ticks to one fire on a long jump (no
  fire-storm after a long downtime) — idempotent catch-up, the pulse work re-reads
  state anyway.
- **No underflow panic (peerloop lesson, REQ-HAZARD line 615).** `now − anchor`
  is a **saturating** sub; guard `interval == 0`. A cadence-arithmetic panic
  already cost a CI red on a low-uptime runner, so the derive must be
  host-uptime-independent and the tests must pass explicit clock values.
- **Crash-reset is correct, not a bug.** Resetting the periodic phase on a crash
  is the Q4 rule; do not "preserve phase on crash too" — a crash carries no
  durable intent about phase, only the one-shot deadline does (and that one never
  resets). Keep the two rules distinct.
- **Rollback-N-1-safety of the new file.** `deadline-pulse.json` must stay
  ignorable by a rolled-back pre-D5 binary (it is, by construction — additive
  file). Don't migrate or repurpose an existing durable file; a new additive file
  is the safe shape. Recorded for D6.
- **Config period change semantics.** `interval` tracks live `pulse_period` on
  each start (re-written), `anchor` persists. A period change re-phases the grid
  on the next start under the new interval — acceptable and intentional (a config
  change is an operator act, not a swap). Note it so it is not read as a phase bug.
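
To make the crash-reset and period-change watch-items concrete, the sketch below computes the first fire after a restart at one instant under three scenarios. `first_fire_after_restart` and its `keep_phase` flag are hypothetical names for illustration; `keep_phase` is `true` for an update that found an anchor file and `false` for a crash or cold start.

```rust
/// First grid point at or after `now_ms`; saturating, interval clamped to >= 1.
fn next_fire(anchor_ms: u64, interval_ms: u64, now_ms: u64) -> u64 {
    let interval = interval_ms.max(1);
    let elapsed = now_ms.saturating_sub(anchor_ms);
    let steps = elapsed / interval + u64::from(elapsed % interval != 0);
    anchor_ms.saturating_add(interval.saturating_mul(steps))
}

/// First fire after a restart at `restart_ms`.
/// The anchor is kept or rewritten per the restart kind; the interval is
/// always the live config period.
pub fn first_fire_after_restart(
    keep_phase: bool,
    old_anchor_ms: u64,
    live_period_ms: u64,
    restart_ms: u64,
) -> u64 {
    let anchor_ms = if keep_phase { old_anchor_ms } else { restart_ms };
    next_fire(anchor_ms, live_period_ms, restart_ms)
}
```

With an old anchor of 1000 and a restart at 1620: an update under the unchanged 250 ms period fires next at 1750 (still on the old grid); an update after the period was changed to 400 ms fires at 1800 (same anchor, new grid); a crash fires at 1620 (grid re-based to the restart instant).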

## Immediate next step

Start **D5-1**: add `deadline.rs` with `DeadlineAnchor` (persist `(anchor,
interval)` under `spt_home()`, corrupt-degrades-fresh), the pure
`next_fire(now_ms)` derive (saturating, zero-interval-guarded, on-grid-fire-once),
and the `StartReason`-keyed rehydration rule (update-keeps / crash-rewrites /
cold-writes / update-no-file-falls-back-fresh). Unit-prove the derive + the rule
in isolation with explicit clock values **before** D5-2 wires it into
`run_pulse_loop`. Do **not** touch the `peerloop` cadences.
