---
status: resolved
trigger: "Theoretical edge case: a stale `.claude/{id}-signoff.md` retained from a prior generation (gen42 wrapper's final_session failed → file retained per D8 → wrapper exited ungracefully → listener gone) will be picked up by the NEXT generation's (gen43) listener on its first scan_drop_files iteration. Listener composes a file_drop signoff EVENT, delivers to gen43's psyche, prints STOP, soft-stops the perch, and exits. Wrapper consumes envelope, runs final_session with the stale gen42 body, tears down. Net effect: gen43 dies milliseconds after spawn — user sees doyle 'live' briefly then gone. UX disaster. User's desired fix: surface gen42's signoff body into gen43's bootstrap context via the existing SC-6 `## Pending Signoff (uncommitted)` append in psyche-download, THEN unlink the stale file before the poll loop starts. Investigation only — no fix yet."
created: 2026-05-15T00:00:00Z
updated: 2026-05-15T00:00:00Z
---

## Current Focus

hypothesis: CONFIRMED end-to-end. Three causally-linked structural facts produce the bug:
  (1) `src/owl/poll.rs::run` runs `scan_drop_files` at the **top of every loop iteration** (lines 253-287) with NO mtime gate, NO startup-time check, NO warm-up delay. The very first iteration scans `.claude/{id}-signoff.md` unconditionally.
  (2) `src/live/start.rs::run` (lines 58-279) spawns the wrapper and then enters `poll::run` synchronously at line 278 with NO `.claude/{id}-signoff.md` cleanup, body extraction, or surfacing between perch creation (lines 152-191) and poll loop entry (line 278).
  (3) The SessionStart hook (`owl.exe plugin-session-start` → `resume::run_with_input` → `inject_active_perch_context`) only emits `<psyche-context>` for ACTIVE live perches whose `parent_pid` matches the current process (resume.rs:184-200). On a cold-start gen43 scenario, gen42 is dead and gen43 has not yet spawned → no active doyle perch at SessionStart fire-time → `download_payload_for_injection` is NOT invoked → pending-signoff body is NOT surfaced into Claude's bootstrap context via the SessionStart path.
  Net effect: the user's stated fix ordering ("SessionStart hook surfaces pending body → unlink → poll loop starts clean") does NOT hold for the cold-start `$LIVE start` path. The body has no natural injection point in cold-start flow without new code.

test: read code paths verified via direct file inspection. Specific evidence catalogued in `## Evidence` below.

expecting: User greenlight to proceed with Candidate A fix (surface + unlink inside `$LIVE start::run`, after the perch-ensure block and before `poll::run` entry). Surface area ≈ 20 LOC for the fix plus 1 integration test of ≈ 40 LOC, ≈ 60 LOC total. Commune left alone per user guidance + asymmetry analysis.

next_action: present Root Cause Report (this file + final assistant message); await user greenlight to apply Candidate A fix. Do NOT apply until greenlight. Concurrent `wrapper-signoff-doesnt-break-loop` debug session's territory (`src/live/wrapper/mod.rs`) is NOT touched by this fix — the recommended surface lives entirely in `src/live/start.rs` and `tests/file_drop_integration.rs`.

## Symptoms

expected: When gen42's psyche wrapper's final_session subprocess fails (rate limit, crash, kill, timeout — any non-zero exit), per D8 the `.claude/{id}-signoff.md` file is RETAINED for manual recovery / next-generation visibility. The next time the user runs `$LIVE start {id}` to spawn gen43, gen43 should (a) become aware of gen42's pending signoff (via the SC-6 `## Pending Signoff (uncommitted)` section in psyche-download), (b) NOT self-signoff on its first poll iteration. Gen43 should stay alive and operational; the user should see the pending-signoff content in gen43's bootstrap context.
actual: (CONFIRMED via code reading) Gen43's listener's first poll iteration scans `.claude/` for drop files, finds the stale `.claude/{id}-signoff.md`, composes a `<EVENT type="file_drop" kind="signoff" path="..." from="{id}">`, delivers it to `{id}-psyche`, prints `STOP:{id} (signoff dropped)`, calls `soft_stop_perch`, and exits via `std::process::exit(0)`. The wrapper independently consumes the envelope, runs `final_session` with gen42's stale body, deletes the file, tears down. Gen43 process dies milliseconds after spawn.
errors: none expected at the OS level — no panic, no exception. The bug is silent: gen43's perch directory is torn down cleanly. The user-visible symptom is "I just started doyle and it's already gone" with no indication that a stale gen42 signoff was the cause.
reproduction: (1) Start `$LIVE start doyle`, wait for it to be live. (2) Drop a signoff for doyle (e.g., echo content into `.claude/doyle-signoff.md` or run the normal signoff path). (3) Force gen42's final_session to fail (kill the wrapper's psyche subprocess mid-final, or simulate a rate-limit). (4) Kill the wrapper process ungracefully, making sure the listener-side file_drop signoff branch has not completed successfully either. (5) Verify `.claude/doyle-signoff.md` still exists on disk. (6) Run `$LIVE start doyle` again to spawn gen43. (7) Observe gen43 logs: expect gen43's listener to immediately run scan_drop_files, find the stale signoff, deliver the EVENT, print STOP, and exit. Gen43's wrapper then consumes the envelope and tears down.
started: Phase 30 introduced the file_drop signoff path (commits d6bd054, 25dd749, 3b27264). Pre-Phase-30, init_signoff was inbox-message-driven and required a live wrapper to compose the signoff envelope — there was no "drop file on disk and let the next listener pick it up" path, so this stale-file class of bug did not exist. Phase 30's D8 retain-on-error decision combined with the listener-side scan_drop_files polling created the new failure mode.

## Eliminated

- hypothesis: "The SessionStart hook will surface the pending signoff body into gen43's bootstrap context, and an unlink-before-poll fix can rely on that surfacing as the user described."
  evidence: `src/owl/resume.rs::inject_active_perch_context` (lines 140-249) filters perches with TWO gates before calling `download_payload_for_injection`: (a) `pid_alive` check at lines 184-190 (Numeric pid must be alive), (b) `parent_pid` match at lines 192-200 (perch's parent_pid must equal current process's parent_pid). On a cold-start gen43 scenario, gen42 is dead AND gen43 has not yet spawned. The doyle perch's info.json (if it exists from gen42) shows a dead pid, which fails the `pid_alive` gate. Even if the perch's pid were alive (it won't be in this scenario), there's still no perch for gen43 to inject context FOR. So SessionStart fires with NO active doyle perch → no `<psyche-context id="doyle">` block emitted → the SC-6 `## Pending Signoff (uncommitted)` append never reaches Claude's bootstrap context via this path.
  timestamp: 2026-05-15

- hypothesis: "The listener spawn is independent of the SessionStart hook — maybe SessionStart spawns the listener, and we can sequence the fix in the hook."
  evidence: `src/live/start.rs::run` line 278 calls `crate::owl::poll::run(...)` directly, inline, synchronously, in the same process. The SessionStart hook (`owl.exe plugin-session-start` → `resume::run`) does NOT spawn any listener — it only emits stdout decorations (`<psyche-context>`, `<owl-active-perch>`, `<owl_orphan_warning>`, `<spacetime-reorientation>`). The listener spawn is exclusively driven by `$LIVE start` (and likely `$LIVE revive`, `$LIVE reconnect`, and `$OWL poll <id> listen` direct invocations). Therefore the fix cannot live in the SessionStart hook for the cold-start path.
  timestamp: 2026-05-15

- hypothesis: "mtime-gating in `scan_drop_files` (skip files older than listener start_time) is a viable alternative to surface+unlink."
  evidence: It IS technically viable (5-line change in `scan_drop_files` + capture listener start_time alongside `listener_cwd` at poll.rs:234), BUT it conflicts with the user's explicit requirement: "User explicitly OK with the eventual fix DELETING the stale `-signoff.md` IF the body's content is FIRST surfaced into Self's `$LIVE psyche-download` output". mtime-gating silently drops the body — gen43 has no awareness that gen42 had a pending signoff. This loses the recovery story the user wants preserved. Eliminated on user-intent grounds, not technical grounds.
  timestamp: 2026-05-15

- hypothesis: "The bug also affects `.claude/{id}-commune.md` the same way — the fix should symmetrically delete both."
  evidence: Per the asymmetry analysis (commune vs. signoff), commune semantics are informational (Self → Psyche message-passing), not terminal. A stale gen42 commune auto-firing on gen43 spawn is **arguably defensible** as a recovery path: gen42 user's commune intent is preserved, Psyche subprocess re-runs against it, no data loss. The user explicitly flagged this asymmetry: "commune may legitimately be a recovery path. Surface the asymmetry but do not propose a commune fix unless the analysis shows the commune case is equally broken." The commune case is NOT equally broken — gen43 stays alive after a commune fire (the signoff branch's `soft_stop_perch + exit(0)` at poll.rs:281-284 is signoff-only; commune just delivers the EVENT and continues the loop). Eliminated as part of the fix scope; commune retained as recovery path.
  timestamp: 2026-05-15
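
The eliminated mtime-gating alternative can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the function name `scan_drop_files_since` and the `listener_start` parameter are hypothetical, and the return type uses `PathBuf` for simplicity. It shows why the approach was rejected: a file older than the listener start is skipped, stays on disk, and its body is never surfaced.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical mtime-gated scan: returns `(kind, path)` only for drop files
/// whose modification time is at or after `listener_start`.
pub fn scan_drop_files_since(
    cwd: &Path,
    self_id: &str,
    listener_start: SystemTime,
) -> Vec<(String, PathBuf)> {
    let claude_dir = cwd.join(".claude");
    let mut drops = Vec::new();
    // Same ordering as described in the notes: commune first, signoff second.
    for kind in ["commune", "signoff"] {
        let path = claude_dir.join(format!("{}-{}.md", self_id, kind));
        // A missing file or an unreadable mtime is treated as "no drop".
        let Ok(modified) = fs::metadata(&path).and_then(|m| m.modified()) else {
            continue;
        };
        if modified >= listener_start {
            drops.push((kind.to_string(), path));
        }
        // Otherwise the stale file is silently ignored: it remains on disk
        // and nothing tells the new generation that it exists.
    }
    drops
}
```

Note that this gate would apply to the commune file as well, which is a second behavioral difference from the signoff-scoped fix.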

## Evidence

- timestamp: 2026-05-15
  checked: src/owl/poll.rs:236-287 — poll loop entry + file_drop scan arm
  found: The `loop {` starts at line 237. The FIRST statements inside the loop (lines 240-251) handle pulse-file mtime refresh. The SECOND block (lines 253-287) is the Phase 30 file-drop scan. It runs `let drops = scan_drop_files(cwd, id);` on EVERY iteration including iter 1. For each `(kind, abs_fs)` it composes a file_drop envelope via `compose_file_drop_event`, delivers to `{id}-psyche` via `send::deliver_body` (panic-safe wrapped), then for `kind == "signoff"` runs the teardown sequence: `output::owl_status` STOP message → `fs::remove_file(&ready_path)` → `poll_listener.close()` → `crate::owl::stop::soft_stop_perch(&perch, id)` → `std::process::exit(0)`. NO mtime gate, NO startup-time filter, NO cooldown, NO warm-up delay. The capture of `listener_cwd` at line 234 (one-time pre-loop) is the only `cwd`-related setup; it does not capture a start_time.
  implication: Iter 1 of any listener that enters `poll::run` with a pre-existing `.claude/{id}-signoff.md` on disk will unconditionally compose+deliver the signoff EVENT and self-exit. The bug is structural and the failure mode is deterministic.

- timestamp: 2026-05-15
  checked: src/owl/poll.rs:724-749 — `scan_drop_files` implementation
  found: signature `pub(crate) fn scan_drop_files(cwd: &std::path::Path, self_id: &str) -> Vec<(String, String)>`. Body iterates over `["commune", "signoff"]` (commune first, signoff second per the doc comment ordering invariant), builds `claude_dir.join(format!("{}-{}.md", self_id, kind))`, and emits `(kind, abs_fs)` IFF `path.exists()`. There is no mtime check, no metadata stat, no comparison against any process-start timestamp. The function is pure "does this file exist right now?"
  implication: There is no built-in protection against stale files. The function trusts that any file on disk represents a fresh, intentional drop.

- timestamp: 2026-05-15
  checked: src/owl/poll.rs:765-772 — `compose_file_drop_event`
  found: emits `<EVENT type="file_drop" kind="{kind}" path="{abs_fs}" from="{from}"></EVENT>` with `event_attr_escape` on each attribute value. Empty body — the wrapper reads the file at consume time (D7).
  implication: confirms the wire form. The wrapper-side consumer expects a path attribute and reads the file body itself, so the listener-side fix that **deletes the file before the wrapper consumes** would cause the wrapper's consume to find no file. This is part of why the cleanest fix is to delete BEFORE the listener enters the poll loop, not after the listener composes the EVENT.

- timestamp: 2026-05-15
  checked: src/live/start.rs:57-279 — `$LIVE start` end-to-end
  found: Function signature `pub fn run(id: &str, period: Option<u64>)`. Sequence:
    - Lines 65-128: collision checks (wrapper PID, stale psyche perch, reconnection detection).
    - Lines 130-145: generation tracking, log relocation, status writeback, activity bump.
    - Lines 147-150: emit `LIVE-START:{id}` status to stderr (live-orange).
    - Lines 152-186: pre-create Self perch (reconnect-aware), drain queued messages if reconnecting.
    - Lines 188-191: ensure psyche perch + ready file exist.
    - Lines 203-274: build wrapper argv, spawn wrapper as detached process (unix #[cfg] block 215-247, windows #[cfg] block 249-274).
    - Line 277-278: `crate::owl::poll::run(id, "listen", false, true, false, 500, None, None, false);` — enter poll loop inline, blocks until perch teardown.
  No mention of `.claude/{id}-signoff.md`, no call to `download_payload`, no call to `psyche-download`, no `fs::remove_file` for any `.claude/` path, no `<psyche-context>` emit, no `<owl_pending_signoff>` emit. The path from entry to `poll::run` is silent on the stale-signoff-file question.
  implication: The fix surface for the cold-start `$LIVE start` path lives between line 191 (perches ensured) and line 277 (just before `poll::run`). A 15-25 LOC insertion would be sufficient: read file body if exists → print body wrapped in `<owl_pending_signoff>` to stdout (so Claude's `$LIVE start doyle` tool-call output captures it) → unlink → enter poll loop. The `id` parameter is in scope (line 58), `std::env::current_dir()` is reliably available (start.rs already used it at line 157 for the reconnect path).

- timestamp: 2026-05-15
  checked: plugin/spt/hooks/hooks.json — SessionStart hook wiring
  found: SessionStart hook is `$CLAUDE_PLUGIN_ROOT/owl.exe plugin-session-start` (single command). No separate session-resume script; the hook calls into `owl.exe`'s `plugin-session-start` subcommand directly.
  implication: confirms the hook → `plugin_session_start::run` → `resume::run_with_input` call chain. SessionStart fires once per Claude Code session boot, NOT once per `$LIVE start`.

- timestamp: 2026-05-15
  checked: src/owl/plugin_session_start.rs:1-50 — top-level SessionStart handler
  found: `pub fn run()` at line 10 reads stdin, does early checks (skip-resume / re-orientation), then at line 45 calls `super::resume::run_with_input(&input)`. The re-orientation path (line 41 `return; // Re-orientation injected, skip resume`) handles `/clear` and `/compact` sources via `inject_reorientation_if_needed` which separately calls `super::resume::inject_reorientation(&id, is_live)` at line 164.
  implication: confirms two SessionStart code paths: (a) primary `run_with_input` (fresh session boot, `/resume`, etc.), and (b) re-orientation `inject_reorientation` (`/clear`, `/compact`). Both ultimately call `download_payload_for_injection` for live perches. Both are gated by an EXISTING live perch — neither runs on cold-start gen43 where doyle perch is dead/missing.

- timestamp: 2026-05-15
  checked: src/owl/resume.rs:139-249 — `inject_active_perch_context`
  found: Lines 144-148: read owlery dir; bail silently on error. Lines 149-204: iterate perches with filters — non-directory skip (151-153), missing info.json skip (155-159), unparseable info.json skip (161-164), psyche/spine/touch skip (167-172), worker-perch skip (176-181). Then the pid_alive gate at lines 183-190: `types::is_process_alive(*p)` for Numeric pid, `true` for `Busy(_)`. Then parent_pid match at lines 192-200: if both `my_ppid` and `perch_pp` are Some, require equality. Lines 218-231 emit `<psyche-context>` ONLY for `is_live` perches where `download_payload_for_injection` returns Some.
  implication: The cold-start gen43 case fails the pid_alive gate (gen42 wrapper's pid is dead — info.json's recorded pid points to a terminated process). `download_payload_for_injection` is never invoked, the pending-signoff body is never surfaced via this path. The user's stated fix ordering breaks at this point.

- timestamp: 2026-05-15
  checked: src/live/context.rs:211-330 — `download_payload`, `download_payload_for_injection`, `append_pending_sections`
  found: `download_payload` (line 219) composes memformat XML + context.md + pending sections; `append_pending_sections` (line 304) reads `cwd/.claude/{self_id}-{commune,signoff}.md`, appends a `## Pending {Kind} (uncommitted)` header + body for each existing file (commune first, signoff second). `download_payload_for_injection` (line 364) wraps `download_payload` with `strip_pulse_log`. Both helpers are usable IF a caller invokes them with the correct `self_id`.
  implication: The SC-6 surface for surfacing the pending body EXISTS and is correct. It's the *invocation* that's missing on the cold-start path. A fix that calls `download_payload_for_injection(id)` from `$LIVE start::run` (or just reads `.claude/{id}-signoff.md` directly and prints it) would close the surfacing gap.

- timestamp: 2026-05-15
  checked: src/live/wrapper/lifecycle.rs:81-111 — `drain_stale_init_signoffs`
  found: existing precedent for "drain stale signals from prior generation": at wrapper boot, `drain_stale_init_signoffs` drains the Psyche's SPOOL (not the `.claude/` file) of any messages containing `init_signoff`, discarding them and re-spooling unrelated messages. Called from `mod.rs:527`. The doc comment at line 82 explicitly states: "Prevents stale INIT_SIGNOFFs from a prior generation from immediately terminating a newly spawned wrapper."
  implication: This exact-shape protection (stale-generation defense) EXISTS for the spool path but is MISSING for the Phase 30 file-drop path. The fix is conceptually adding a "drain stale file-drop signoff" defense parallel to the existing `drain_stale_init_signoffs`. Naming convention suggestion: the new helper could be called `drain_stale_signoff_file` or `surface_and_clear_stale_signoff_file`.

- timestamp: 2026-05-15
  checked: src/live/wrapper/mod.rs (selective grep for signoff/commune/file_drop entry points)
  found: wrapper has both an `init_signoff` predicate (mod.rs:654, `msg.to_ascii_lowercase().contains("init_signoff")`) and a `file_drop` arm (mod.rs:367-393, `handle_file_drop_arm`). The disjointness is explicit in test `file_drop_signoff_wire_form_disjoint_from_init_signoff`. The file_drop signoff branch dispatches via `process_file_drop` → `final_session` (per the concurrent debug session `wrapper-signoff-doesnt-break-loop`, which is the OTHER live debug session and out of bounds for this investigation).
  implication: this investigation's fix MUST live outside `src/live/wrapper/mod.rs` to avoid concurrent-edit conflicts with the other debug session. The recommended fix in `src/live/start.rs` satisfies that constraint.

- timestamp: 2026-05-15
  checked: src/owl/poll.rs:737-749 (scan_drop_files) ordering invariant + signoff-only teardown at poll.rs:273-285
  found: scan emits commune-first, signoff-second. Signoff branch (kind == "signoff") triggers `soft_stop_perch + exit(0)` (lines 273-285). Commune branch does NOT — it just emits the EVENT and continues the loop iteration.
  implication: confirms the commune asymmetry. A stale commune fires once per gen43 boot (Psyche receives the gen42 commune body, processes it, replies via normal psyche cycle) but gen43 stays alive. A stale signoff kills gen43 instantly. The fix can be signoff-scoped without touching commune.

- timestamp: 2026-05-15
  checked: poll.rs:230-234 — `listener_cwd` capture
  found: `let listener_cwd = std::env::current_dir().ok();` captured ONCE outside the loop, reused on every iteration. No start_time captured alongside.
  implication: an alternative fix (Candidate B variant: mtime-gating) would require also capturing `start_time = std::time::SystemTime::now()` at this point, and modifying `scan_drop_files` to take and apply that filter. Rejected per the user's body-surfacing requirement (see Eliminated section).
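
The scan and wire form catalogued above can be approximated in a few lines. This is a sketch reconstructed from the Evidence descriptions, not a copy of `poll.rs`: the forward-slash normalization of `abs_fs` and the escaping rules in `attr_escape` are assumptions (the real `event_attr_escape` may differ).

```rust
use std::path::Path;

/// Existence-only scan: emits `(kind, abs_fs)` for every drop file present,
/// with no mtime or start-time check.
pub fn scan_drop_files(cwd: &Path, self_id: &str) -> Vec<(String, String)> {
    let claude_dir = cwd.join(".claude");
    let mut drops = Vec::new();
    // Ordering invariant from the notes: commune first, signoff second.
    for kind in ["commune", "signoff"] {
        let path = claude_dir.join(format!("{}-{}.md", self_id, kind));
        if path.exists() {
            // Assumption: abs_fs is a forward-slash path string.
            let abs_fs = path.to_string_lossy().replace('\\', "/");
            drops.push((kind.to_string(), abs_fs));
        }
    }
    drops
}

/// Assumed attribute escaping; stands in for the real `event_attr_escape`.
fn attr_escape(value: &str) -> String {
    value
        .replace('&', "&amp;")
        .replace('"', "&quot;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Empty-body EVENT; the consumer reads the file at `path` itself.
pub fn compose_file_drop_event(kind: &str, abs_fs: &str, from: &str) -> String {
    format!(
        "<EVENT type=\"file_drop\" kind=\"{}\" path=\"{}\" from=\"{}\"></EVENT>",
        attr_escape(kind),
        attr_escape(abs_fs),
        attr_escape(from)
    )
}
```

Because the scan only asks whether the file exists, a file left behind by a previous generation is indistinguishable from a fresh drop.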

## Specialist Review

(none — specialist dispatch deferred until fix-apply phase; this is investigation-only.)

## Resolution

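Status (at end of investigation): ROOT CAUSE CONFIRMED, FIX DESIGN PROPOSED, NOT YET APPLIED. The fix was applied afterwards; see `## Resolution Applied (post-investigation)`.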

### Root Cause

Three structural facts compose the failure:

1. **`scan_drop_files` runs on every poll iteration with no staleness filter** — see Evidence entries for poll.rs:236-287, poll.rs:724-749. Iter 1 fires unconditionally if `.claude/{id}-signoff.md` exists.

2. **`$LIVE start::run` has no stale-signoff cleanup or surfacing between wrapper spawn and poll-loop entry** — see Evidence entry for start.rs:57-279.

3. **The SessionStart hook's `<psyche-context>` injection is gated on an active live perch with a matching parent_pid** — see Evidence entries for resume.rs:139-249. On cold-start gen43, gen42's perch is dead → no active perch → no injection → user's proposed "rely on SessionStart to surface body first" ordering does NOT hold.

The bug is real, deterministic, and structural. It is the Phase 30 file-drop path's missing analog to the wrapper's existing `drain_stale_init_signoffs` (lifecycle.rs:84) spool-drain defense.

### Commune Asymmetry

The same `scan_drop_files` path WILL auto-fire a stale `.claude/{id}-commune.md` on gen43's iter 1, but the commune branch only delivers the EVENT (no soft_stop_perch, no exit). Gen43 stays alive; Psyche receives the gen42 commune body and processes it. This is defensible as a recovery path: gen42 user's intended commune is preserved, no data loss, no UX disaster. Per user guidance and asymmetry analysis, **commune is left alone**. The fix is signoff-scoped.
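
The asymmetry can be modeled with a small simulation. This is an illustrative sketch only: `AfterDelivery`, `after_delivery`, and `simulate_iteration` are hypothetical names, and the real listener performs delivery, `soft_stop_perch`, and `exit(0)` rather than returning a flag.

```rust
/// What the listener does after delivering a file_drop EVENT of a given kind.
#[derive(Debug, PartialEq, Eq)]
pub enum AfterDelivery {
    /// Commune: EVENT delivered, the poll loop keeps running.
    ContinueLoop,
    /// Signoff: EVENT delivered, then teardown and process exit.
    TeardownAndExit,
}

pub fn after_delivery(kind: &str) -> AfterDelivery {
    if kind == "signoff" {
        AfterDelivery::TeardownAndExit
    } else {
        AfterDelivery::ContinueLoop
    }
}

/// Simulates one poll iteration over the scan results (in scan order).
/// Returns the kinds that were delivered and whether the listener survives.
pub fn simulate_iteration(drops: &[&str]) -> (Vec<String>, bool) {
    let mut delivered = Vec::new();
    for kind in drops {
        delivered.push(kind.to_string());
        if after_delivery(kind) == AfterDelivery::TeardownAndExit {
            return (delivered, false);
        }
    }
    (delivered, true)
}
```

With only a stale commune present the listener survives the iteration; as soon as a stale signoff is in the scan result it does not, which is why the fix can stay signoff-scoped.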

### Ordering Verification

| Path | Spawns listener? | Surfaces pending signoff body? |
|---|---|---|
| Claude Code SessionStart hook (`plugin-session-start` → `resume::run_with_input`) | No | Only IF an active live perch with matching parent_pid exists. NOT on cold-start gen43. |
| `$LIVE start` (`src/live/start.rs::run`) | Yes (synchronous `poll::run` at line 278) | No (gap — this is where the fix goes) |
| `$LIVE revive` / `$LIVE reconnect` (TODO audit) | Likely yes | Likely no — needs verification during fix-apply |
| `$OWL poll <id> listen --live` (direct) | Yes | No |

The fix must live in the listener-spawn paths, not in the SessionStart hook, because the SessionStart hook cannot see the not-yet-spawned gen43 perch.

### Edge Cases

1. **Gen42 listener killed mid-scan, before wrapper consume**: file retained, gen43 fires instantly. ✅ Covered by proposed fix.
2. **`.claude/` doesn't exist on gen43 start**: `scan_drop_files` no-ops via `path.exists()` → no issue; the proposed fix is also a no-op (read fails → no surface, no unlink, proceed normally).
3. **User drops `.claude/{id}-signoff.md` DURING gen43 startup but before unlink**: race window is very short (bounded by the span between perch creation at start.rs:191 and `poll::run` entry at line 278, with the wrapper spawn in between). If hit, the fresh drop is surfaced and cleared like a stale one instead of firing. Practically un-hittable.
4. **Both stale commune and stale signoff present**: signoff-scoped fix deletes signoff only. Commune fires on iter 1 as recovery path. Gen43 stays alive, Psyche processes gen42's commune. Acceptable per asymmetry analysis.
5. **Cold-start where gen42 perch info.json is also stale-but-unlinked-from-pid**: covered — proposed fix doesn't depend on info.json state, only on the `.claude/{id}-signoff.md` file's existence in cwd.
6. **`$LIVE revive` / `$LIVE reconnect` reuse the same listener-spawn path**: needs audit during fix-apply. If they call into `start::run`, they inherit the fix automatically. If they have a separate spawn path, the fix should be refactored into a shared helper called by all spawn sites.

### Proposed Fix Design (Candidate A — Surface + Unlink in `$LIVE start`)

**Location**: `src/live/start.rs::run`, inserted between the perch-ensure block (around line 191) and the wrapper-spawn block (around line 203). Specifically: after `let _ = fs::File::create(owlery::ready_file(&psyche_id));` and before `let exe = std::env::current_exe().expect(...)`.

**Sketch** (~20 LOC):
```rust
// Phase 30 stale-signoff defense (mirrors the wrapper's drain_stale_init_signoffs).
// If a prior generation's wrapper failed final_session and retained .claude/{id}-signoff.md
// per D8, the listener's first scan_drop_files iteration would fire it and kill gen43
// instantly. Surface the body into the $LIVE start stdout (so the user-visible tool
// output captures gen42's pending signoff), then delete the file so iter 1 sees a
// clean .claude/ directory.
if let Ok(cwd) = std::env::current_dir() {
    let signoff_path = cwd.join(".claude").join(format!("{}-signoff.md", id));
    if let Ok(body) = fs::read_to_string(&signoff_path) {
        let trimmed = body.trim_end();
        if !trimmed.is_empty() {
            output::live_status(
                output::S_READY,
                &format!("STALE-SIGNOFF-RECOVERED:{} (body surfaced to stdout, file cleared)", id),
            );
            println!(
                "<owl_pending_signoff id=\"{}\" cleared_from=\"{}\">\n{}\n</owl_pending_signoff>",
                id,
                owlery::to_forward_slash(&signoff_path),
                trimmed,
            );
        }
        let _ = fs::remove_file(&signoff_path);
    }
}
```

**Test scope** (~40 LOC integration test in `tests/file_drop_integration.rs`, subprocess-based per parallel-test safety pattern documented in context.rs:296-303):
- Test name: `stale_signoff_md_does_not_kill_fresh_live_start`.
- Setup: tempdir cwd, pre-create `.claude/doyle-signoff.md` with body `"gen42 signoff body marker"`.
- Run: subprocess `$LIVE start doyle --period 60` (background), wait briefly.
- Assert: `.claude/doyle-signoff.md` no longer exists.
- Assert: stdout contains `<owl_pending_signoff id="doyle"` AND `gen42 signoff body marker`.
- Assert: listener perch ready file still exists after ≥2 poll-interval ticks (gen43 stayed alive).
- Teardown: explicit `$LIVE stop doyle` (or kill subprocess).

**Surface area total**: ~20 LOC fix + ~40 LOC test = ~60 LOC. Single-file fix (start.rs) + single-file test addition (file_drop_integration.rs). No changes to `src/live/wrapper/mod.rs` — concurrent debug session's territory is untouched.

**Open question for fix-apply phase**:
- Audit `$LIVE revive` and `$LIVE reconnect` for the same hazard. If they spawn a listener WITHOUT going through `start::run`, extract the surface+unlink into a shared helper (e.g. `start::drain_stale_signoff_file(id)`) and call it from all three sites. Probably worth a 5-line refactor.
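
One possible shape for such a shared helper is sketched below. It is a standalone illustration, not the project's implementation: the `Option<String>` return value is an assumption added so the behavior can be checked in isolation, the lossy UTF-8 decoding is a design choice of this sketch, and the status output and `cleared_from` attribute from the sketch above are omitted.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Surfaces a stale `.claude/{id}-signoff.md` body to stdout, then removes
/// the file. Returns the surfaced body, or `None` if nothing was surfaced.
pub fn drain_stale_signoff_file(id: &str, cwd: &Path) -> Option<String> {
    let signoff_path = cwd.join(".claude").join(format!("{}-signoff.md", id));
    let bytes = match fs::read(&signoff_path) {
        Ok(bytes) => bytes,
        // No file (or no `.claude/` directory): nothing to drain.
        Err(err) if err.kind() == ErrorKind::NotFound => return None,
        Err(err) => {
            eprintln!("stale signoff unreadable at {}: {}", signoff_path.display(), err);
            return None;
        }
    };
    // Lossy decoding so a non-UTF-8 body is still surfaced and cleared.
    let body = String::from_utf8_lossy(&bytes);
    let trimmed = body.trim_end();
    let surfaced = if trimmed.is_empty() {
        None
    } else {
        println!(
            "<owl_pending_signoff id=\"{}\">\n{}\n</owl_pending_signoff>",
            id, trimmed
        );
        Some(trimmed.to_string())
    };
    // Remove even an empty file, since its mere existence triggers the scan.
    if let Err(err) = fs::remove_file(&signoff_path) {
        eprintln!("could not remove {}: {}", signoff_path.display(), err);
    }
    surfaced
}
```

Each listener-spawn site would call the helper once before entering the poll loop; the commune file is deliberately not touched.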

### Status

Investigation complete. Awaiting user greenlight before applying any code changes. Per user instructions ("Investigation only", "Do NOT apply the fix. Await greenlight."), no Edits/Writes outside this debug-session file. The wrapper-signoff-doesnt-break-loop concurrent debug session's territory (`src/live/wrapper/mod.rs`) is NOT touched by this proposed fix.

## Resolution Applied (post-investigation)

**Date:** 2026-05-15
**Approach:** Candidate A with shared helper extraction (see "Proposed Fix Design" above).
**Verification doc:** `.planning/phases/30-commune-signoff-file-drop-flow-change/30-VERIFICATION.md` — `## Post-Passed Stale-Signoff Cold-Start Defense` section.

**Commits:**

- `7d93022` test(30): RED — stale .claude/{id}-signoff.md must not kill fresh $LIVE start
  Adds regression test `stale_signoff_md_does_not_kill_fresh_live_start` in `tests/file_drop_integration.rs`. Subprocess-based per parallel-test safety pattern.
- `ad3f9f0` fix(30): GREEN — surface + clear stale .claude/{id}-signoff.md on listener spawn
  Adds `pub(crate) fn drain_stale_signoff_file(id: &str, cwd: &Path)` in `src/live/start.rs`. Invoked from `start::run` BEFORE `poll::run` entry (covers $LIVE start + revive via stop::run_revive + reconnect inner branch) and from `live_start_result` (MCP/structured-result variant, which does not enter `poll::run` itself; defense-in-depth). Surfaces body to stdout wrapped in `<owl_pending_signoff>` markers, then removes the file with `fs::remove_file`. Tolerates IO errors with stderr-logged degraded state.
- `2d10447` docs(30): record stale-signoff cold-start defense
  Appends `## Post-Passed Stale-Signoff Cold-Start Defense` to 30-VERIFICATION.md alongside the wrapper-loop-break addendum. Phase 30 retains `status: passed` (6/6 SC).

**Spawn-path audit conclusion:** all live-agent listener-spawn paths route through `start::run`:
- `$LIVE start` → `start::run` directly.
- `$LIVE revive` → `stop::run_revive` → delegates to `start::run`.
- Reconnect → inner branch of `start::run` itself.
- `live_start_result` (MCP variant) does NOT enter `poll::run` itself but is covered as defense-in-depth.

Shared helper extracted (`drain_stale_signoff_file`) so both `start::run` and `live_start_result` invoke the same contract.

**Test verification:**

- `cargo test --lib -- --test-threads=1` → 356 passed / 0 failed / 1 ignored.
- `cargo test --test file_drop_integration -- --test-threads=1` → 5 passed (incl. new test) / 0 failed / 4 ignored.
- `signoff_listener_exits_zero` (legit live-signoff path): still PASSES — unaffected.
- `signoff_dispatch_signals_break_loop` (concurrent wrapper-loop-break fix from `wrapper-signoff-doesnt-break-loop`): still PASSES — no overlap.
