---
status: root_cause_found
trigger: |
  doyle.log shows the psyche-wrapper stuck in an endless INIT_SIGNOFF loop.
  After Self exited at 21:53:23 (user-initiated signoff with FINAL COMMUNE
  payload), the wrapper polled the message, did NOT recognise it as an
  INIT_SIGNOFF, fed it to claude --resume normally, and continued looping.
  From 21:53:59 onward, orphan-detect fires every iteration (Self.ready
  removed), sends a fresh INIT_SIGNOFF, waits 5s, then re-enters the poll
  loop because (a) check_orphan() always returns false and (b) the next
  iteration's poll-drained INIT_SIGNOFF line also fails the line-prefix
  check. Net result: 415 INIT_SIGNOFF occurrences in a single log over
  hours of looping; wrapper never exits.
created: 2026-05-13
updated: 2026-05-13
---

# Debug Session: psyche-wrapper-init-signoff-loop

## Symptoms

- **Expected**: First INIT_SIGNOFF delivered to the psyche perch causes the
  wrapper to fire `final_session` (one last claude --resume to flush context)
  then break the poll loop and exit. Subsequent orphan-detects must NOT fire
  if the wrapper has already exited.
- **Actual**: 415 INIT_SIGNOFF entries in `doyle.log` over a multi-hour window.
  Pattern repeats every ~7-9s: orphan detected → 5s grace → INIT_SIGNOFF
  delivered to own psyche inbox → poll drains it → `[PSYCHE] resume (exit=0)`
  (NOT `[PSYCHE] final`) → next `poll iteration N+1 starting`. Wrapper never
  exits; `final_session` never logged once across the entire 2086-line log.
- **Errors**: None surfaced. Resume calls return exit=0 with empty bodies
  because the LLM session has nothing meaningful to say to a repeated
  shutdown notice in the post-Self window. No exit=1 quota messages appear
  during the loop.
- **Timeline**:
  - 21:53:23 — Self runs `live signoff`. INIT_SIGNOFF + FINAL COMMUNE
    delivered to doyle-psyche spool. Wrapper drains 2493 bytes, logs
    `[MSG] from=<EVENT type="msg" from="">INIT_SIGNOFF (: ""`, then feeds
    payload to `[PSYCHE] resume (exit=0)`. NOT routed to `final_session`.
  - 21:53:59 — Self.ready gone (Self process terminated post-signoff).
    `check_orphan` fires → 5s grace → INIT_SIGNOFF delivered → loop continues
    because `check_orphan` always returns `false` and the caller discards the
    return value anyway.
  - 21:54:04 … 22:19:xx — loop repeats every ~7-9s indefinitely. Each
    iteration the spool now contains the wrapper's own orphan-fired
    INIT_SIGNOFF which also fails the line-prefix check.
- **Reproduction**: Any user-initiated `live signoff` after commit `aba13d9`
  (poll stream-emit envelope) will hit this. The orphan loop only emerges if
  Self also dies (so .ready is removed). Both happen on `live signoff`.

## Current Focus

hypothesis: |
  Two reinforcing defects make the wrapper structurally unable to exit
  once orphaned:

  (1) Envelope-vs-prefix mismatch. Commit `aba13d9` (feat(27-02): convert
      poll::run delivery arms to stream-default emit) wrapped every poll
      delivery as `<EVENT type="msg" from="...">BODY</EVENT>`. The wrapper's
      INIT_SIGNOFF detection at src/live/wrapper/mod.rs:187-194 still uses
      `line.trim_start().starts_with("INIT_SIGNOFF")` — which is false for
      a line that starts with `<EVENT`. So `final_session` is never reached
      and the loop is never broken via the INIT_SIGNOFF path. By contrast
      `drain_stale_init_signoffs` at lifecycle.rs:89 uses `msg.contains(...)`
      and works correctly; only the live-loop detection is broken.

  (2) Orphan path can't trigger exit either.
      - src/live/wrapper/orphan.rs:72 hard-codes `false` (with the comment
        "Don't break the loop -- let the next poll pick up INIT_SIGNOFF").
        That comment is correct ONLY IF the next-poll path actually breaks
        on INIT_SIGNOFF — which (1) shows it does not.
      - src/live/wrapper/mod.rs:94 discards the return value entirely
        (`self.check_orphan();` — return type bool is unused). So even
        flipping orphan.rs to return `true` would not help without also
        changing the caller.

  Together, (1) and (2) mean: when Self dies, the wrapper is in a tight
  orphan-detect → INIT_SIGNOFF → resume-it-as-a-regular-message loop with
  no exit valve.

next_action: |
  Fix recommendation (apply via /gsd-quick — atomic commits per project rule):

  Primary fix — restore INIT_SIGNOFF detection over the `<EVENT>` envelope:
    src/live/wrapper/mod.rs:187-194
    Change predicate from
      msg.lines().any(|line| line.trim_start().starts_with("INIT_SIGNOFF"))
    to a tolerant variant that matches both raw and EVENT-wrapped lines,
    e.g.
      msg.contains("INIT_SIGNOFF")
    (matches the same idiom already used by drain_stale_init_signoffs at
    lifecycle.rs:89) OR a stricter line scan that strips the
    `<EVENT type="msg" from="...">` prefix before the starts_with check.
    contains() is the lowest-risk change and aligns with the existing
    convention in this module.

  Secondary fix — make orphan path able to exit when it should:
    src/live/wrapper/orphan.rs:67-72
    When `still_gone` is true after the 5s grace, return `true` (not `false`).
    src/live/wrapper/mod.rs:94
    Wrap the call: `if self.check_orphan() { break; }`.

  Either fix in isolation breaks the loop. Land BOTH so the system has
  defense in depth (a future regression in either path still terminates).

  Regression tests to add:
  - tests/live_wrapper_signoff.rs (or existing wrapper unit-test module):
    Feed a mocked `<EVENT type="msg" from="">INIT_SIGNOFF (ts): body</EVENT>`
    string through the predicate logic and assert it matches.
  - Unit test for `check_orphan` return value: once Self.ready is removed
    and the grace period has elapsed, `check_orphan` returns true.
  - Existing `poll_psyche_argv_contains_once_flag` test (db08bf5) stays.

  Manual verification:
  - Build release, redeploy via DEPLOY.ps1, kick a live agent, run
    `live signoff <id>` with a tiny commune body, watch the psyche log:
    expect a single `INIT_SIGNOFF detected, firing final context save`
    followed by `Final psyche invocation complete, wrapper exiting`,
    THEN the wrapper process exits. No second INIT_SIGNOFF cycle.
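
A minimal sketch of the two predicate options named in the primary fix, plus the regression test described above. The function names `is_init_signoff_loose` and `is_init_signoff_strict` are hypothetical, not taken from the codebase. The strict variant assumes the `from` attribute never contains a literal `>` (the emit site escapes it via `from_esc`, but the exact escaping is not shown here).

```rust
/// Loose check: true if the drained poll output mentions INIT_SIGNOFF
/// anywhere, whether raw or inside an `<EVENT ...>` envelope.
/// Mirrors the `msg.contains(...)` idiom cited at lifecycle.rs:89.
pub fn is_init_signoff_loose(msg: &str) -> bool {
    msg.contains("INIT_SIGNOFF")
}

/// Stricter check: strip an optional `<EVENT ...>` opening tag from each
/// line, then require the remaining body to start with INIT_SIGNOFF.
/// ASSUMPTION: the opening tag ends at the first '>' on the line.
pub fn is_init_signoff_strict(msg: &str) -> bool {
    msg.lines().any(|line| {
        let line = line.trim_start();
        let body = match line.strip_prefix("<EVENT") {
            // Enveloped line: the body starts just past the tag's closing '>'.
            Some(rest) => match rest.find('>') {
                Some(idx) => &rest[idx + 1..],
                None => return false, // malformed envelope, no match
            },
            // Raw line (pre-envelope format): check it as-is.
            None => line,
        };
        body.trim_start().starts_with("INIT_SIGNOFF")
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    const WRAPPED: &str =
        "<EVENT type=\"msg\" from=\"\">INIT_SIGNOFF (ts): body</EVENT>\n";

    #[test]
    fn both_predicates_match_the_wrapped_signoff() {
        assert!(is_init_signoff_loose(WRAPPED));
        assert!(is_init_signoff_strict(WRAPPED));
    }

    #[test]
    fn only_the_loose_predicate_matches_a_mention_in_the_body() {
        let chatter =
            "<EVENT type=\"msg\" from=\"a\">do not INIT_SIGNOFF yet</EVENT>";
        assert!(is_init_signoff_loose(chatter));
        assert!(!is_init_signoff_strict(chatter));
    }
}
```

The second test shows the trade-off: `contains` is the smaller change, but it also fires on any message whose body merely mentions the token, while the strict scan only fires when the body begins with it.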

## Evidence

- timestamp: 2026-05-13T21:53:23
  observation: |
    User-initiated signoff. Wrapper drains 2493-byte payload. The [MSG] line
    is logged as `[MSG] from=<EVENT type="msg" from="">INIT_SIGNOFF (: ""`.
    Critically, the wrapper then logs `[PSYCHE] resume (exit=0)` and a normal
    `auto-commit: git_commit_context after resume_session_checked` — NOT
    `INIT_SIGNOFF detected, firing final context save` and NOT
    `[PSYCHE] final` and NOT `Final psyche invocation complete, wrapper exiting`.
    Grep for `final_session|firing final|Final psyche` over the entire 2086-line
    log returns ZERO matches.
  source: |
    C:\Users\decid\AppData\Local\spt\logs_latest\doyle.log lines 744-751;
    Grep("final_session|firing final|final \\(exit|Final psyche",
         doyle.log) → "No matches found"

- timestamp: 2026-05-13T21:53:59
  observation: |
    Self.ready removed. `check_orphan` fires correctly:
      [21:53:59] Self parent session is gone (orphan detected via parent_pid),
                 triggering INIT_SIGNOFF
      [21:54:04] Self parent confirmed gone after grace period, proceeding
                 with INIT_SIGNOFF
    But the very next line is `[21:54:04] poll iteration 56 starting`,
    confirming the orphan-detect did NOT break the loop. This is structurally
    guaranteed: src/live/wrapper/orphan.rs:72 returns `false`
    (`false // Don't break the loop`), and src/live/wrapper/mod.rs:94
    (`self.check_orphan();`) discards the return value, so even a `true`
    result would not break the loop.
  source: doyle.log lines 751-755; orphan.rs line 72; mod.rs line 94.

- timestamp: 2026-05-13T21:54:04 onward
  observation: |
    Loop body verifiable from lines 757-829 (and 415 INIT_SIGNOFF occurrences
    total across the log):
      [N+0]   orphan detected → 5s grace → INIT_SIGNOFF posted to spool
      [N+5]   poll drains 129 bytes (the just-posted self-signoff message)
      [N+5]   [MSG] from=<EVENT type="msg" from="">INIT_SIGNOFF (: ""
      [N+5]   feeds it to claude --resume
      [N+12]  [PSYCHE] resume (exit=0): (empty)
      [N+12]  auto-commit
      [N+13]  poll iteration N+1 starting
      (repeat)
  source: doyle.log lines 757-829 and Grep("INIT_SIGNOFF", doyle.log)=415.

- timestamp: 2026-05-13 (code read)
  observation: |
    Envelope is emitted unconditionally by src/owl/poll.rs:548-572
    (`emit_event_line`):
      println!("<EVENT type=\"msg\" from=\"{}\">{}</EVENT>", from_esc, body_esc);
    This was introduced by commit aba13d9 (feat(27-02): convert poll::run
    delivery arms to stream-default emit, 2026-05-11). Every line drained
    by the wrapper's poll subprocess now begins with `<EVENT`, NOT the body.
  source: src/owl/poll.rs:548-572; `git log -- src/owl/poll.rs`.

- timestamp: 2026-05-13 (code read)
  observation: |
    Existing precedent for the fix: src/live/wrapper/lifecycle.rs:89 uses
    `msg.contains("INIT_SIGNOFF")` in drain_stale_init_signoffs and works
    correctly under the envelope. Only the live-loop site at mod.rs:189
    uses the brittle `line.trim_start().starts_with` form.
  source: src/live/wrapper/lifecycle.rs:84-100 vs src/live/wrapper/mod.rs:187-194.

- timestamp: 2026-05-13 (code read — secondary defect)
  observation: |
    `check_orphan` is documented to "return true after a 5s grace period"
    (orphan.rs:10 docstring), but orphan.rs:72 returns `false` unconditionally,
    and mod.rs:94 discards the return value with `self.check_orphan();`.
    Doc, implementation, and call site are mutually inconsistent.
  source: src/live/wrapper/orphan.rs:8-73; src/live/wrapper/mod.rs:92-94.
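
One way to keep doc, implementation, and call site from drifting apart again is Rust's `#[must_use]` attribute, which makes the compiler warn when a call like `self.check_orphan();` drops the result. The sketch below is illustrative only: the `Wrapper` struct, its fields, and `run` are hypothetical stand-ins, and the marker-file probe is a simplification (the log shows the real detection also goes through `parent_pid`).

```rust
use std::path::PathBuf;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the wrapper state that `check_orphan` reads.
struct Wrapper {
    /// Path to the parent's readiness marker (e.g. Self.ready).
    ready_marker: PathBuf,
    /// Grace period before an orphan verdict is confirmed.
    grace: Duration,
}

impl Wrapper {
    /// Returns true only when the marker is missing both before and after
    /// the grace period, i.e. the parent is confirmed gone.
    #[must_use = "a confirmed orphan must break the poll loop"]
    fn check_orphan(&self) -> bool {
        if self.ready_marker.exists() {
            return false; // parent still alive, nothing to do
        }
        thread::sleep(self.grace);
        // Re-check after the grace period to rule out a transient gap.
        !self.ready_marker.exists()
    }

    /// Poll-loop skeleton: returns how many full iterations ran before
    /// the loop stopped (capped at `max_iterations`).
    fn run(&self, max_iterations: usize) -> usize {
        for iteration in 0..max_iterations {
            // The return value is consumed here; writing
            // `self.check_orphan();` instead would trigger the warning.
            if self.check_orphan() {
                return iteration;
            }
            // ... poll the spool and resume the session here ...
        }
        max_iterations
    }
}
```

With the attribute in place, a discarded call produces an `unused_must_use` warning at build time, so the call-site half of the defect would surface at build time.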

## Eliminated

- **TCP-wake / --once regression (commits 14d1840, db08bf5)**: NOT the
  proximate cause of the INIT_SIGNOFF loop. Those commits make the inner
  poll exit on first message instead of buffering until pulse expiry —
  which actually surfaces this bug FASTER (the wrapper now drains and
  reacts to its own INIT_SIGNOFF within seconds, accelerating the loop)
  but does not introduce it. The envelope+predicate mismatch (aba13d9)
  is the structural defect; --once just turned a slow leak into a fast one.

- **claude usage-limit / quota window**: orthogonal. The wrapper logs
  `[PSYCHE] resume (exit=0): (empty)` (not exit=1 quota messages) during
  the 21:54+ loop, indicating the LLM session is genuinely returning empty
  responses to redundant shutdown notices, not hitting quota. The loop is
  driven by structural flow, not failure recovery.

- **Handoff path (Phase 18.4)**: no `HANDOFF_DEFER` or `HANDOFF wrapper`
  log lines anywhere in the loop window. Wrapper-state.json rehydration
  is not involved.

- **drain_stale_init_signoffs at startup**: lifecycle.rs:84-100 works
  correctly (`msg.contains("INIT_SIGNOFF")`). The bug is only the live
  loop's predicate.

## ROOT CAUSE (confirmed)

Two reinforcing defects in src/live/wrapper/ make the wrapper unable to exit
when an INIT_SIGNOFF is delivered or when Self dies:

1. **Predicate vs envelope mismatch** — src/live/wrapper/mod.rs:189
   `line.trim_start().starts_with("INIT_SIGNOFF")` never matches because
   commit aba13d9 wraps poll-emitted lines as
   `<EVENT type="msg" from="...">INIT_SIGNOFF (...)</EVENT>`. The line
   starts with `<EVENT`, not `INIT_SIGNOFF`. Result: `final_session()`
   is never invoked → loop never breaks via INIT_SIGNOFF path.

2. **Orphan path can't trigger exit** — orphan.rs:72 hard-codes `false`
   AND mod.rs:94 discards the return value. Result: even when orphan
   detection correctly confirms Self is gone, the loop continues.

Together, the wrapper has no working exit valve for the orphan / signoff
case. The 415 INIT_SIGNOFF log entries are the visible symptom.
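
The interaction of the two defects can be illustrated with a toy model of the poll loop. This is a sketch under stated assumptions, not the wrapper's code: `Fixes`, `Exit`, `simulate`, and the iteration cap are all hypothetical, the parent is assumed alive for the first iteration and gone afterwards, and the orphan check is placed ahead of the poll because mod.rs:94 precedes mod.rs:187-194.

```rust
/// Which of the two fixes are applied in this toy model.
#[derive(Clone, Copy)]
struct Fixes {
    /// Fix 1: the signoff predicate sees through the <EVENT> envelope.
    envelope_aware_predicate: bool,
    /// Fix 2: a confirmed orphan breaks the loop.
    orphan_breaks_loop: bool,
}

/// Why the simulated wrapper stopped.
#[derive(Debug, PartialEq)]
enum Exit {
    FinalSession,
    OrphanBreak,
    /// Hit the iteration cap: the model's stand-in for "never exits".
    StillLooping,
}

const SIGNOFF: &str =
    "<EVENT type=\"msg\" from=\"\">INIT_SIGNOFF (ts): body</EVENT>";

fn detects_signoff(line: &str, envelope_aware: bool) -> bool {
    if envelope_aware {
        line.contains("INIT_SIGNOFF")
    } else {
        // The pre-fix predicate: fails on lines that start with `<EVENT`.
        line.trim_start().starts_with("INIT_SIGNOFF")
    }
}

fn simulate(fixes: Fixes, max_iterations: usize) -> Exit {
    // The user-initiated signoff is already waiting in the spool.
    let mut spool: Vec<&str> = vec![SIGNOFF];
    for iteration in 0..max_iterations {
        // ASSUMPTION: the parent dies right after the first iteration.
        let parent_gone = iteration > 0;
        if parent_gone {
            // Orphan detect posts a fresh INIT_SIGNOFF to the own inbox.
            spool.push(SIGNOFF);
            if fixes.orphan_breaks_loop {
                return Exit::OrphanBreak;
            }
        }
        // Poll drains the spool; a recognised signoff ends the loop.
        let drained: Vec<&str> = spool.drain(..).collect();
        if drained
            .iter()
            .any(|line| detects_signoff(line, fixes.envelope_aware_predicate))
        {
            return Exit::FinalSession;
        }
        // Otherwise the message is resumed as a regular message and the
        // loop continues.
    }
    Exit::StillLooping
}
```

In this model, `simulate` with both flags off runs until the cap, while enabling either flag alone is enough to terminate: the predicate fix exits through `FinalSession` on the first iteration, and the orphan fix exits through `OrphanBreak` on the second.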

## Resolution

status: root_cause_found
specialist: rust
fix_target: |
  Two atomic commits via /gsd-quick:

  Commit A — fix INIT_SIGNOFF detection over EVENT envelope:
    src/live/wrapper/mod.rs:187-194
      replace `line.trim_start().starts_with("INIT_SIGNOFF")` predicate
      with `msg.contains("INIT_SIGNOFF")` (matches lifecycle.rs:89 idiom).
    Add regression test: feed
      `<EVENT type="msg" from="">INIT_SIGNOFF (ts): body</EVENT>\n`
    through the predicate and assert match.

  Commit B — wire orphan-detect to exit on confirmed orphan:
    src/live/wrapper/orphan.rs:67-72
      return `true` from `check_orphan` when `still_gone` is true.
    src/live/wrapper/mod.rs:94
      `if self.check_orphan() { break; }`.
    Update docstring/inline comment to match.
    Add unit test: with Self.ready removed and grace period elapsed,
    `check_orphan` returns true.

files_to_inspect:
  - src/live/wrapper/mod.rs (loop body, lines 87-256; specifically 94, 187-194)
  - src/live/wrapper/orphan.rs (return value bug, line 72; docstring line 10)
  - src/live/wrapper/lifecycle.rs (correct predicate precedent, lines 84-100)
  - src/live/wrapper/claude.rs (final_session, lines 297-374; confirmed to
    return true, so once it is entered the loop will break)
  - src/owl/poll.rs (emit_event_line envelope source, lines 548-572)

verification_plan:
  - cargo test (existing + new regression tests)
  - cargo build --release
  - DEPLOY.ps1 bump patch
  - Manual: spawn live agent, `live signoff <id>` with small commune,
    verify single `Final psyche invocation complete, wrapper exiting`
    in psyche log followed by clean process exit; no second
    `INIT_SIGNOFF detected` line.
  - Verify no other live agent in logs_latest shows a similar loop
    (deployah.log, mica.log, dunsen.log spot-check for INIT_SIGNOFF count).
