# Digest extractor & thread-spanning — JIT milestone plan

> **STATUS: PLANNED (2026-06-13).** Product of the digest grill (operator + `doyle`).
> Grounded against the live tree post-M9 + **ADR-0019** (the governing decision) +
> the rewritten CONTEXT.md "live activity buffer (session digest)" entry. This
> project does **not** use GSD — this is a hand-authored JIT plan, the heir to
> `M9-PLAN.md`. Executes on operator dispatch (likely → todlando), `doyle` gates.
>
> **Roadmap placement: M10** (operator-set 2026-06-13) — ahead of Shell (now M11)
> and remote-attach (now M12); it is an `spt-claude-code` enabler.

---

## 1. Goal

Make the session digest something a **real, first adapter** (`spt-claude-code`) can
actually drive: give it its own adapter-declared extractor seam, follow a live
agent across `/clear` session-hops, and merge in the context spt itself feeds the
agent. This reverses M9's "no manifest seam" stance (ADR-0019).

**End state:** an adapter declares `[digest]`, runs `spt adapter digest-proof` to
confirm its extractor before shipping, and `spt endpoint digest <id>` renders a
time-ordered timeline that (a) spans a `/clear` with a visible boundary marker and
(b) interleaves the agent's activity with spt's injected context — with no silent
drops.

## 2. Requirements (registered; activate at execution start — rule 5)

| REQ | Scope | Stages to activate |
|-----|-------|--------------------|
| **REQ-TERM-5** | `[digest]` extractor seam + register validation + presentation knobs + `digest-proof` + runtime diagnostics | `doc, impl, unit, int` |
| **REQ-TERM-6** | thread-spanning: session ledger + `api boundary` append + last-K enumeration + boundary marker | `impl, unit, int` |
| **REQ-TERM-7** | two-origin merge: spt context-injection entries → `digest.log`, ts-interleave, context category (data model only) | `impl, unit, int` |
| REQ-TERM-4 | carried (projection + pull + delta-stream + `api digest-entry`) — no re-point; new REQs layer on | unchanged |

**Task 0 (first commit of execution, shared with T1):** set the three
`required_stages` above in `traceable-reqs.toml`. The tree is green now only because
those stage lists are empty; activation and first evidence land in the same commit,
so no commit is ever left activated-but-unevidenced.

## 3. Design (settled — see ADR-0019, do not re-litigate)

The full decision record is **ADR-0019**. Load-bearing points the tasks depend on:

- **Imperative extractor, no DSL.** `[digest]` declares a command (native log →
  `{role,text,tool,ts}`). Defaults to the `[history]` source files (reference its
  locate pattern); own-source escape hatch. `api digest-entry` push stays the
  fallback. Real CC logs (9 record types, `assistant.content` a mixed
  thinking/text/tool_use list, one line → many entries) are why a declarative map
  is refused.
- **One source, two extractors.** `[history]`/echo stays opaque + single-session;
  `[digest]` is contract-typed + session-spanning. They no longer share a
  normalizer — the dual-consumer conflict is the whole reason for this milestone.
- **Presentation is adapter-defaulted, consumer-overridable** (window depth,
  arg-truncation, sprint-collapse). The fixed "~3 turns" is dropped.
- **Thread-spanning** via a per-endpoint **session ledger** (`<perch>/sessions.log`,
  heir to legacy `psyches/tracked/agents/<id>/sessions.log`); `api boundary`
  appends the rotated `session_id`; the digest enumerates the last K sessions.
- **Two-origin merge.** spt appends its own context-injection entries
  (`psyche_download | echo_mirror | owl_message`) to `digest.log`; the projection
  interleaves them with extracted activity records by `ts`.
- **Diagnostics are mandatory** (the silent drop is the gap that bit us): a shared
  engine behind `spt adapter digest-proof` (author-time) and a runtime warn/count.
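
The "adapter-defaulted, consumer-overridable" presentation rule above could be modeled as a field-by-field fallback. This is a minimal sketch only: `Knobs`, `KnobOverrides`, `resolve`, and the field names and types are assumptions made for illustration, not the manifest's actual schema.

```rust
/// Hypothetical presentation knobs. Field names and types are illustrative,
/// not the real `[digest]` keys.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Knobs {
    pub window_depth: usize,
    pub arg_truncate: usize,
    pub sprint_collapse: bool,
}

/// Consumer-side overrides: `None` means "keep the adapter default".
#[derive(Debug, Clone, Copy, Default)]
pub struct KnobOverrides {
    pub window_depth: Option<usize>,
    pub arg_truncate: Option<usize>,
    pub sprint_collapse: Option<bool>,
}

impl Knobs {
    /// Apply consumer overrides on top of the adapter defaults, one field at a time.
    pub fn resolve(self, over: KnobOverrides) -> Knobs {
        Knobs {
            window_depth: over.window_depth.unwrap_or(self.window_depth),
            arg_truncate: over.arg_truncate.unwrap_or(self.arg_truncate),
            sprint_collapse: over.sprint_collapse.unwrap_or(self.sprint_collapse),
        }
    }
}
```

In this shape a pull or subscribe request would carry a `KnobOverrides`, and an all-`None` value leaves the adapter defaults untouched.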

## 4. Task breakdown

### Wave 1 — the `[digest]` seam + extractor (REQ-TERM-5)
- **T1 — manifest `[digest]` section.** `spt-runtime`: struct + serde + `validate()`
  (extractor command present; source pattern resolvable or own-source given; knob
  types). Add optional `ts` to the `spt-term` digest-record contract. `[doc/impl/unit]`
- **T2 — extractor execution.** `spt-daemon::digest`: `project_endpoint_digest`
  resolves the `[digest]` extractor over the source file(s); apply adapter-default
  presentation knobs with consumer override threaded from pull/subscribe. `[impl/unit]`
- **T3 — diagnostics + proof tool.** Shared skip-diagnostic engine (count + first
  mismatch + reason); `spt adapter digest-proof <adapter> [--sample <log>]` (prints
  parsed records, rendered digest, every dropped line + why); daemon emits the same
  at runtime. Remove the silent `from_str().ok()` drop. `[impl/unit]`
- **T-doc — MANIFEST.md `[digest]` section.** `[doc]`
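
The skip-diagnostic idea in T3 (count, first mismatch, reason) can be sketched as follows. Everything here is a stand-in: `Record`, `SkipReport`, `extract_with_diagnostics`, and the tab-separated toy parser are hypothetical, and the real extractor reads native harness logs rather than this format. The point is only that each parse failure is accounted for instead of being discarded with `.ok()`.

```rust
/// Reduced stand-in for the `{role,text,tool,ts}` record contract.
#[derive(Debug, PartialEq)]
pub struct Record {
    pub role: String,
    pub text: String,
}

/// Drop accounting: how many lines were skipped and where the first one was.
#[derive(Debug, Default, PartialEq)]
pub struct SkipReport {
    pub skipped: usize,
    /// (1-based line number, reason) of the first dropped line.
    pub first: Option<(usize, String)>,
}

/// Toy parser standing in for the real extractor: expects `role<TAB>text`.
fn parse_line(line: &str) -> Result<Record, String> {
    let (role, text) = line
        .split_once('\t')
        .ok_or_else(|| "missing tab separator".to_string())?;
    if role.is_empty() {
        return Err("empty role".to_string());
    }
    Ok(Record { role: role.to_string(), text: text.to_string() })
}

/// Parse every line, keeping the records and recording each drop
/// rather than silently swallowing the error.
pub fn extract_with_diagnostics(input: &str) -> (Vec<Record>, SkipReport) {
    let mut records = Vec::new();
    let mut report = SkipReport::default();
    for (idx, line) in input.lines().enumerate() {
        if line.trim().is_empty() {
            continue; // blank lines are not counted as drops
        }
        match parse_line(line) {
            Ok(rec) => records.push(rec),
            Err(reason) => {
                report.skipped += 1;
                if report.first.is_none() {
                    report.first = Some((idx + 1, reason));
                }
            }
        }
    }
    (records, report)
}
```

Under this sketch, an author-time proof tool and the daemon's runtime warning could both consume the same `SkipReport`; collecting every dropped line (as `digest-proof` is meant to print) would be a small extension of the `Err` arm.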

### Wave 2 — thread-spanning (REQ-TERM-6)
- **T4 — session ledger.** `<perch>/sessions.log` JSONL `{ts, session_id, trigger:
  boot|clear|compact}`; writer appends at `establish_perch` first bind; reader
  enumerates last K (config-bounded). `[impl/unit]`
- **T5 — boundary append.** Extend `cmd_boundary` / `resurface_at_boundary` to append
  the rotated `session_id` to the ledger (on top of the existing rebind). `[impl/unit]`
- **T6 — spanning projection + marker.** Digest source enumerates last K sessions,
  runs the extractor per file, concatenates oldest→newest; insert
  `DigestEntry::Boundary{kind, ts}` at each transition; window bridges the hop. `[impl/unit]`
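
The last-K enumeration and boundary insertion described in T4 to T6 might look roughly like this. It is a sketch under assumptions: `LedgerRow`, `Trigger`, `last_k`, `span`, and the `Activity` variant are hypothetical names, and the per-session extractor is passed in as a closure so the example stays self-contained. Only `DigestEntry::Boundary{kind, ts}` comes from the plan itself.

```rust
/// How a session started, mirroring the plan's `boot|clear|compact`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Trigger {
    Boot,
    Clear,
    Compact,
}

/// One ledger row, mirroring the plan's `{ts, session_id, trigger}`.
#[derive(Debug, Clone, PartialEq)]
pub struct LedgerRow {
    pub ts: u64,
    pub session_id: String,
    pub trigger: Trigger,
}

#[derive(Debug, PartialEq)]
pub enum DigestEntry {
    /// Hypothetical activity variant, reduced to a session id and text.
    Activity { session_id: String, text: String },
    Boundary { kind: Trigger, ts: u64 },
}

/// Last `k` ledger rows, oldest first (the ledger is append-only).
pub fn last_k(ledger: &[LedgerRow], k: usize) -> &[LedgerRow] {
    &ledger[ledger.len().saturating_sub(k)..]
}

/// Concatenate per-session entries oldest to newest, inserting a boundary
/// marker at each transition between sessions.
pub fn span<F>(ledger: &[LedgerRow], k: usize, mut extract: F) -> Vec<DigestEntry>
where
    F: FnMut(&str) -> Vec<String>,
{
    let mut out = Vec::new();
    for (i, row) in last_k(ledger, k).iter().enumerate() {
        if i > 0 {
            // The marker carries the trigger and ts of the session being entered.
            out.push(DigestEntry::Boundary { kind: row.trigger, ts: row.ts });
        }
        for text in extract(&row.session_id) {
            out.push(DigestEntry::Activity { session_id: row.session_id.clone(), text });
        }
    }
    out
}
```

Because the marker is emitted between sessions rather than at the start of the window, a window that reaches back across a `/clear` keeps the earlier session's entries instead of resetting to empty.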

### Wave 3 — two-origin merge (REQ-TERM-7)
- **T7 — context category + merge.** `digest.log` becomes a tagged union (activity
  record vs context-injection record `{kind, body, ts}`); projection interleaves
  extracted activity + digest.log entries by `ts`. `[impl/unit]`
- **T8 — injection taps.** Append a context entry where spt injects: psyche download
  (`spt-live::psyche`), echo mirror (`spt-live::echo`), owl message (spool/`ingest`).
  Adapters never emit context records — spt owns them. `[impl/unit]`
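
The ts-interleave in T7 amounts to a two-way merge of streams that are each already time-ordered. The sketch below assumes exactly that, plus integer timestamps and a tie rule (context before activity on equal `ts`) that the plan does not specify. `Entry`, `ContextKind`, and `interleave` are hypothetical names.

```rust
/// The three context-injection kinds named in the plan.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ContextKind {
    PsycheDownload,
    EchoMirror,
    OwlMessage,
}

/// Hypothetical tagged union for one projected timeline entry.
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    Activity { role: String, text: String, ts: u64 },
    Context { kind: ContextKind, body: String, ts: u64 },
}

impl Entry {
    pub fn ts(&self) -> u64 {
        match self {
            Entry::Activity { ts, .. } | Entry::Context { ts, .. } => *ts,
        }
    }
}

/// Merge two streams that are each already sorted by `ts`.
/// Assumed tie rule: on equal timestamps the context entry goes first.
pub fn interleave(activity: Vec<Entry>, context: Vec<Entry>) -> Vec<Entry> {
    let mut out = Vec::with_capacity(activity.len() + context.len());
    let mut a = activity.into_iter().peekable();
    let mut c = context.into_iter().peekable();
    loop {
        let take_context = match (a.peek(), c.peek()) {
            (Some(x), Some(y)) => y.ts() <= x.ts(),
            (None, Some(_)) => true,
            (Some(_), None) => false,
            (None, None) => break,
        };
        let next = if take_context { c.next() } else { a.next() };
        out.extend(next);
    }
    out
}
```

Whatever tie rule is chosen, fixing it in one place like this keeps the merge deterministic, which matters for the `ts` skew watch-item.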

### Wave 4 — integration, docs, gate
- **T9 — E2E (REQ-TERM-5/6/7 `int`).** One harness source → echo (opaque) **and**
  digest (contract) from the same files; a `/clear` spanned with a boundary marker;
  spt context entries interleaved by ts. The realest fixture (corrected dual-consumer
  proof + Candidate 03). `[int x3]`
- **T-docs — sweep.** MANIFEST `[digest]` (T-doc), **harness integration checklist**
  updated, CHANGELOG `[Unreleased]` (note: ADR-0008 source-mechanism **superseded by
  ADR-0019**; the v0.5.0 note is **not** retro-edited; the supersession is recorded in
  the changelog of the release that ships it), CONTEXT.md (done), ADR-0019 (done). `[doc]`

## 5. Commit choreography
Atomic per task; `traceable-reqs check` EXIT=0 + CI-targeted local sweep green at
each commit. Task 0 (activate stages) rides T1's commit so first evidence and
activation land together. No commit left activated-but-unevidenced.

## 6. Gate checklist (doyle, pre-merge)
- [ ] REQ-TERM-5/6/7 each satisfy their activated stages (`doc` for REQ-TERM-5 only; `impl/unit/int` for all three); `traceable-reqs check` EXIT=0; no orphan tags.
- [ ] `[digest]` section validated at register (bad extractor / bad knob caught at `adapter add`, not spawn).
- [ ] `spt adapter digest-proof` runs against a **real CC session JSONL** sample and surfaces a deliberately-broken extractor's drops (no silent empty).
- [ ] Spanning E2E: digest bridges a `/clear`, boundary marker present, window not reset-to-empty.
- [ ] Two-origin E2E: a psyche/echo/owl injection appears interleaved by ts.
- [ ] Full suite: workspace 0-fail, `clippy -D`, `--no-default-features`, xtask check, `traceable-reqs check` EXIT=0.
- [ ] Docs: MANIFEST `[digest]`, harness integration checklist, CHANGELOG supersession note (NOT a v0.5.0 retro-edit), CONTEXT + ADR-0019 consistent.

## 7. Deferred (no consumer yet — same logic as M9)
- GUI collapse/expand of context-injection entries.
- The echo-commune-reads-the-digest delta loop (the consumer that motivated two-origin).
- Autonomous file-watch digest freshness (still deferred from M9).
- Candidate 04 (Psyche M2b→M3b hosting) — orthogonal; belongs to the daemon/live-agent milestone.

## 8. Risks / watch-items
- **`ts` source skew.** Adapter records carry the harness's `timestamp`; spt stamps
  its own injections. Clock-source differences could misorder the merge — pin both to
  the same monotonic/wall basis or document the ordering rule. Flag at gate.
- **Ledger growth / bound.** `sessions.log` must bound or rotate (legacy's seal-on-boot
  failure grew it unboundedly — see the sister project's `doyle-sessions-seal` post-mortem). Cap K + the on-disk ledger.
- **`digest.log` dual-schema.** Activity and context records share the file; the
  tagged-union discriminator must be unambiguous and forward-compat (unknown kind → skip, like the contract).
- **Don't expand scope** — data model only for two-origin; no GUI, no echo loop, no new realtime subsystem.
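
For the ledger-growth watch-item, one simple way to cap the on-disk ledger is to keep only the newest rows whenever it is rewritten. The sketch works on the file body as a string so that it stays self-contained. `trim_ledger` and its `cap` parameter are hypothetical, and when the trim would run (on boot, on append, on rotation) is left open.

```rust
/// Keep only the newest `cap` non-blank lines of a JSONL ledger body.
/// Rows are assumed to be appended oldest first, one per line.
pub fn trim_ledger(body: &str, cap: usize) -> String {
    let lines: Vec<&str> = body.lines().filter(|l| !l.trim().is_empty()).collect();
    let start = lines.len().saturating_sub(cap);
    let mut out = lines[start..].join("\n");
    if !out.is_empty() {
        out.push('\n'); // keep the trailing newline JSONL appenders expect
    }
    out
}
```

A cap somewhat larger than K would leave headroom for the reader while still bounding the file.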
