---
status: root_cause_found
trigger: |
  Outdated psyche context delivered by SessionStart hook to gen38 doyle. Phase 28 was completed
  and committed before the prior Claude session was cleared via /clear, but no commune capturing
  Phase 28 completion was written. Stored psyche still shows Phase 28 as "PLANNED" with plan 04
  tests "in progress", missing commits e3e38d6..5f64c98 (UAT, code review, v1.9.12 bump, review
  fixes). User is ready to discuss Phase 29 but SessionStart hook injected stale context.
created: 2026-05-13
updated: 2026-05-13
---

# Debug Session: psyche-stale-after-clear

## Symptoms

- **Expected**: After /clear, next SessionStart delivers psyche context reflecting Phase 28
  completion marker (status flipped PLANNED → COMPLETE).
- **Actual**: SessionStart injected psyche context still showing Phase 28 PLANNED, last commune
  #6 at 2026-05-13T00:34:27 captured Phase 28 plan 04 tests in progress. Pulses #7-9 fired
  (00:54, 00:59, 01:31) but no commune. Latest commit 5f64c98 at ~01:11 (matches new gen38
  perch start time per `started` field) — phase complete before /clear.
- **Errors**: None surfaced in user-facing output. Wrapper appears alive (perch listed ACTIVE).
- **Timeline**: Phase 28 final commits ~01:11. User ran /clear in same Claude Code session
  5-10 min later. New SessionStart fired (this session) → stale psyche injected.
- **Reproduction**: Land batch of commits, then run /clear in same CC session before Psyche
  wrapper next commune fires.

## Current Focus

hypothesis: |
  CYCLE 3: The wrapper HANDOFF (Phase 18.4 binary-upgrade rehydrate via wrapper-state.json)
  at 01:11:12 caused loss of an in-flight queued echo-commune that was successfully fired
  at 00:59:57 (exit=0, 2089 stdout bytes, 17 excerpt lines). The user fired an explicit
  $LIVE commune 5-10 min BEFORE /clear, and a system echo-commune fired at 00:59:41
  per the wrapper log — neither produced a psyche.md entry. The next pulse drain at
  01:31:12 returned only 40 bytes (bare PULSE_TRIGGER, no echo-commune content).

  Three suspects, ranked:
  (c) Spool drain in `owl poll` returns early after first message instead of draining the
      whole psyche-inbox spool — so PULSE_TRIGGER consumed but queued echo-commune lost.
  (a) Wrapper handoff rehydrate from wrapper-state.json does NOT preserve the SQLite spool
      cursor / queued messages — fresh process loses uncommitted inbox state.
  (b) Echo-commune delivery target was tied to the old wrapper PID/listener that died at
      handoff at 01:11:12; new wrapper (pid=8068) never received the queued payload.

next_action: |
  Read in this order (file paths only, no inlining):
    1. src/live/wrapper/handoff.rs OR perform_wrapper_handoff (whatever name) +
       wrapper-state.json serialization — list every field preserved across handoff.
       Specifically check: SQLite spool cursor, in-flight inbox rows, listener TCP socket
       handoff, perch info.json continuity.
    2. src/owl/poll.rs spool drain loop — does it `break` after the first message returned,
       or does it loop until spool empty? Trace `check_message_blocking_until` →
       drain path → exit code 0 with N bytes.
    3. src/live/wrapper/echo_fire.rs ECHO_COMMUNE delivery path — what perch name / address
       is the echo commune delivered to? Is it stored by perch-id (survives handoff)
       or by listener PID (dies on handoff)?
    4. Inspect doyle-psyche perch spool DB rows around 01:11:12 timestamps — were the
       00:59:57 messages present pre-handoff, present post-handoff, or missing entirely?

  Then form a definitive answer ranking suspects (c) vs (a) vs (b), produce
  ROOT CAUSE FOUND with specific file:line citations.

## Evidence

- timestamp: 2026-05-13T01:11
  observation: Latest commit 5f64c98 (review fix log) — Phase 28 fully complete + pushed.
  source: `git log --oneline -30`

- timestamp: 2026-05-13T01:11:12
  observation: Perch `doyle` started at 01:11:12 PDT, parent_pid=11364, session_id=0c440482.
  source: `$OWL list` info.json

- timestamp: 2026-05-13T01:31:36
  observation: psyche.md DOES contain commune #7 — written 20+ min AFTER /clear (after this
  new Claude session was started). Captures "Phase 28 SHIPPED" + next step "/gsd-discuss-phase 29".
  source: `cat $LOCALAPPDATA/spt/psyches/tracked/doyle.md`
  significance: The wrapper eventually DID flush the right content — but only at 01:31:36,
  well after the SessionStart hook injection at 01:11:12 had already delivered stale content
  to the current Claude session.

- timestamp: 2026-05-13T01:51:41 (psyche perch info.json `started`)
  observation: Wrapper had a restart between session creation and now.
  source: `cat $LOCALAPPDATA/spt/owlery/doyle-psyche/info.json`

- last_commune_epoch: 1778662301 = 2026-05-13 01:51:41 (validates commune #7 fire and
  pulse #10 sentinel rebump after).

- timestamp: 2026-05-13 (orchestrator cycle 2)
  observation: `src/owl/cleanup.rs:67-69` Psyche perches ARE skipped on cleanup-session
  (regression-tested via `cleanup_skips_psyche_perch_with_matching_session_id`).
  source: orchestrator code-read

- timestamp: 2026-05-13 (orchestrator cycle 2)
  observation: `src/owl/cleanup.rs:76-84` Listener perches matching the ending session_id
  are SOFT-STOPPED — `ready` sentinel removed, `info.json` preserved. Regression test
  `cleanup_still_soft_stops_listener_perch_with_matching_session_id` asserts this is the
  intended behavior. Self's listener does NOT survive /clear under current code.
  source: orchestrator code-read

- timestamp: 2026-05-13 (cycle 3 — wrapper log hard evidence, $LOCALAPPDATA/spt/psyches/tracked/doyle.log)
  observation: |
    [00:59:41] [ECHO] gate open: sentinel stale — firing echo commune
    [00:59:57] ECHO_COMMUNE exit=0 stdout_bytes=2089 excerpt_bytes=34085 excerpt_lines=17 excerpt_source_error=ok
    [01:11:12] poll exited code=2 stderr=HANDOFF_DEFER:doyle-psyche
    [01:11:12] HANDOFF wrapper: target=cache/cplugs/spt/1.9.12/owl.exe, new pid=8068
    [01:11:12] wrapper started gen=38 handoff_hydrated=true
    [01:11:12] HANDOFF: skipping init_session, rehydrated from wrapper-state.json
    [01:11:12] poll iteration 1 starting
    [01:31:12] poll exited code=0 stderr=
    [01:31:12] poll returned 40 bytes
    [01:31:12] [MSG] from=PULSE_TRIGGER (2026-05-13 01:31:12 PDT): ""
    [01:31:20] [ECHO] gate open: firing echo commune
    [01:31:36] ECHO_COMMUNE exit=0 — finally captures Phase 28 SHIPPED
  significance: |
    00:59:57 echo-commune SUCCEEDED (exit=0, 2089 stdout bytes, 17 excerpt lines).
    User also fired an explicit $LIVE commune within ~5-10 min before /clear (00:59-01:06).
    NEITHER produced a psyche.md entry. The 01:31:12 poll drain returned only 40 bytes
    (bare PULSE_TRIGGER timestamp/empty body) — the queued echo-commune content
    from 00:59:57 was GONE. Wrapper HANDOFF ran in the gap at 01:11:12.

- timestamp: 2026-05-13 (cycle 3 — pulse log analysis)
  observation: Pulse #8 fired at 00:59:31, 10s BEFORE the echo at 00:59:41. So the echo
    landed in psyche-inbox spool awaiting next pulse drain. Next drain at 01:31:12
    returned 40 bytes (only PULSE_TRIGGER). Queued echo content lost.
  source: pulse log + wrapper log cross-reference

## Eliminated

- **Stale wrapper-state.json reading (SessionStart payload)**: SessionStart hook reads psyche.md
  via `download_payload`, NOT wrapper-state.json. The .md is authoritative on-disk.

- **Wrapper crashed/wedged**: Wrapper alive — commune #7 + pulse #10 written after /clear.

- **Psyche LLM ignored the work**: When invoked at 01:31:36, Psyche correctly captured
  Phase 28 completion. The semantic flush is fine — the upstream data delivery is broken.

- **Echo-commune ENOENT on jsonl**: Not the issue — echo-commune at 01:31 happened after
  session_id patch and still captured pre-/clear content correctly.

- **Orphan-detect → INIT_SIGNOFF derailed wrapper**: No INIT_SIGNOFF in log around /clear.

- **HANDOFF drops in-flight spool**: REJECTED by user 2026-05-13. HANDOFF only fires on
  version-bump deploy. User does not commune mid-deploy. Original incident HANDOFF at
  01:11:12 was after user's pre-/clear commune already failed to fire — handoff cannot
  be blamed. The lost 00:59:57 echo content was lost to the same TCP-wake bug, not handoff.

## Eliminated (cycle 2 — user pushback)

- **Initial root cause (no sync flush on /clear)**: REJECTED. Both perches expected to
  survive /clear; communes expected to wake Psyche fast-path. Real bug must explain
  20-min latency on a wake that should be near-instant.

## Eliminated (cycle 3 — user pushback)

- **15-min ECHO_COMMUNE_WINDOW gate as PRIMARY root cause**: REJECTED. The 00:59:57 fire
  PASSED the gate and executed successfully (exit=0, 2089 stdout bytes). The bug is
  downstream of the gate — the commune fired but its payload never reached the psyche
  resume session. User also fired explicit $LIVE commune in the same window which also
  produced no psyche.md entry. The gate is secondary at best; the data-loss across the
  01:11:12 wrapper HANDOFF is primary.

## Cycle 4 — Live trace confirms TCP wake is the sole RC

Hands-on trace 2026-05-13 04:02-04:09 on gen39 doyle while wrapper pid=35948 was alive
and LISTENING on 127.0.0.1:54619 (per registry + netstat):

| t        | event                                                                      |
|----------|----------------------------------------------------------------------------|
| 04:02:20 | `$LIVE commune doyle <<<"TEST-COMMUNE-TRACE-..."` → `SENT:doyle-psyche`    |
| 04:08:07 | `$OWL send doyle-psyche doyle <<<"PROBE-OWLSEND-..."` → `SENT:doyle-psyche`|
| (gap)    | wrapper log mtime frozen at 03:49:02 — poll iter 6 blocking                |
| 04:09:02 | pulse timer fires → poll exits → 331 bytes drained (BOTH msgs + PULSE)     |
| 04:09:45 | claude resume exit=0 (43s wall) → psyche.md +441 bytes                     |

Both messages reached the registry-correct target and returned `SENT`, but neither
woke the poll. They drained only at the next 20-min pulse boundary. ~6m42s and
~55s observed latencies are bound by pulse cadence, not TCP.

This contradicts cycle-2's "TCP wake mechanics fine" elimination. **Psyche-wrapper's
inner poll is pulse-timer-only — it does NOT TCP-wake on inbound messages.**

## ROOT CAUSE (cycle 4, confirmed)

Psyche-wrapper inner poll is pulse-timer-driven only. Inbound TCP messages to the
psyche perch sit in spool until next pulse-cadence expiry (default 20 min), regardless
of liveness or success. This single bug explains:

- Original /clear incident: user commune ~5-10 min pre-/clear sat in spool. /clear
  created new Self SessionStart which read psyche.md BEFORE next pulse drained the
  queued commune. Psyche.md was stale relative to user's mental model because the
  upstream queue had not been drained yet.
- 00:59:57 echo-commune "lost content": fired successfully, posted to psyche perch
  spool, awaited next pulse. Sat in spool through HANDOFF and post-HANDOFF iter 1.
  Drained at 01:31:12 — but log shows only PULSE_TRIGGER. Per cycle 4 user clarification,
  HANDOFF is NOT the culprit; this is the same pulse-cadence delivery delay, possibly
  combined with spool drain accounting issue (separate investigation if reproduces).

## Confounding factor observed but orthogonal

Anthropic monthly quota exhausted between 02:48 and 04:09 — every `claude -p --resume`
returned `You've hit your org's monthly usage limit` (exit=1). Compounded staleness
during the quota window but does NOT cause the TCP-wake bug.

## Resolution

status: root_cause_found
specialist: rust
fix_target: psyche-wrapper inner poll must TCP-wake on inbound messages (mirror Self
            listener's `check_message_blocking_until` OS-level poll readiness path).
files_to_inspect:
  - src/live/wrapper/mod.rs (pulse cadence loop)
  - src/live/wrapper/scheduler.rs (pulse timing)
  - src/owl/poll.rs (Self listener's TCP wake reference impl)
  - src/live/wrapper/echo_fire.rs (downstream context)

