---
status: resolved
trigger: "Poll subprocess (owl.exe poll <id> [listen --live|--psyche]) exits with code=1, empty stdout, empty stderr after exactly ~5 minutes of idle time. Wrapper interprets empty stdout as terminate signal and shuts down. Psyche prematurely dies after ~5 min idle. Self's `live start` background poll behaves the same way."
created: 2026-04-16T04:45:00-07:00
updated: 2026-04-16T05:18:00-07:00
resolved_by: quick-260416-aaa-fix-poll-process-job-inheritance-killing-wrapper
---

## Current Focus

hypothesis: CONFIRMED — **Windows handle/job-object inheritance** from Claude Code's Bash tool transitively into both `live start` (Self's inline poll) and the Psyche wrapper, then into the wrapper's spawned poll subprocess. After ~5 minutes Claude Code's tool runtime calls `TerminateProcess(handle, 1)` on processes still holding inherited pipe write handles or sitting in its job object — producing exit code 1 with empty stdout and empty stderr. SAME root cause family as quick-260416-uz8 (echo-commune grandchild handle leak), different spawn site.
test: Reproduced in isolation: standalone `owl.exe poll tester3 listen --psyche --pulse-interval 600 --setup` from PowerShell ran 600s and exited cleanly with PULSE_TRIGGER. Reproduced 5-min kill: only happens via wrapper invocation chain (Bash tool → live start → wrapper → poll). Comparison: PowerShell-spawned tester2 with no pulse-interval has been alive 17+ min with no exit.
expecting: Apply CreateProcessW + bInheritHandles=FALSE + CREATE_BREAKAWAY_FROM_JOB at TWO spawn points: (1) `src/live/start.rs:225-258` where `live start` spawns the wrapper, AND (2) `src/live/wrapper/mod.rs:170-205` where the wrapper spawns the poll subprocess. Plus harden wrapper loop at `src/live/wrapper/mod.rs:89` so empty stdout no longer interpreted as terminate.
next_action: Implement both fixes in tandem: (A) raw CreateProcessW for wrapper spawn from live start (port helper from hook_idle.rs); (B) raw CreateProcessW for poll subprocess from wrapper, with explicit pipe-handle setup so stdout still captured; (C) wrapper-loop hardening to require explicit termination tags.

## Symptoms

expected: Poll runs the loop. On `--pulse-interval N` expiry, prints `PULSE_TRIGGER (timestamp)` to stdout and exits 0. On idle without pulse-interval, runs forever until message/teardown/poison. Wrapper feeds the message to claude --resume.
actual: Wrapper log records `poll exited code=1 stderr=` with empty stdout after exactly 5min 20s of idle. Wrapper sees empty stdout, interprets as terminate signal (`src/live/wrapper/mod.rs:90`), runs cleanup, exits the wrapper process. Psyche dies. Same pattern for Self's `live start` poll loop (background task `b2zv8nqp8` exited code=1 with no further output after the initial 2 status lines).
errors: NONE in stderr. NONE in stdout. Just exit code 1.
reproduction: Launch live agent via Bash tool with `run_in_background:true`, let it sit idle for 5+ minutes with no incoming messages. Wrapper log shows the empty-exit pattern. Standalone PowerShell-spawned poll DOES NOT reproduce — confirms wrapper-/parent-runtime-specific.
started: Currently observable. Same root-cause family as quick-260416-uz8 fix.

## User hypotheses (to verify/refute)

1. ~~**H1 — Silent panic in poll loop body.**~~ REFUTED. Standalone repro from PowerShell ran for 10+ min and exited cleanly with PULSE_TRIGGER. Code paths verified panic-free.
2. ~~**H2 — `println!` BrokenPipe panic.**~~ REFUTED. Standalone repro printed PULSE_TRIGGER cleanly.
3. ~~**H3 — pulse_deadline arithmetic.**~~ REFUTED. tester4 ran with EXACTLY the doyle wrapper invocation (`poll tester4 listen --psyche --pulse-interval 300 --setup`) — exited at 300.1s, code=0, stdout="PULSE_TRIGGER (...)".
4. ~~**H4 — `process::exit(1)` somewhere unfound.**~~ REFUTED via standalone repros.
5. **H5 — External signal (TerminateProcess from Claude Code).** **CONFIRMED.** Exit code 1 + empty stdout + empty stderr is the signature of `TerminateProcess(handle, 1)` per `src/common/process.rs:98,125,229,239`. NO internal code path can produce this signature. The wrapper's poll subprocess is killed externally.
6. ~~**H6 — D-09 spool timeout.**~~ REFUTED. D-09 prints "TIMEOUT:" message to stderr; observed stderr is empty.
7. ~~**H7 — period-spool intersection.**~~ REFUTED.
8. ~~**H8 — race between pulse_deadline and check_message_blocking.**~~ REFUTED via standalone reproductions.
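
The H5 signature (exit code 1, empty stdout, empty stderr) can be checked mechanically. The following is a minimal sketch, assuming the wrapper has the exit code and both captured streams available as strings. `PollExit` and `classify_poll_exit` are hypothetical names for illustration, not existing code in this repo.

```rust
/// How one poll subprocess run looks from the wrapper's side.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq, Eq)]
pub enum PollExit {
    /// Exit code 0 with output, e.g. a PULSE_TRIGGER line.
    Clean,
    /// Exit code 1 with empty stdout AND empty stderr:
    /// the signature of an external TerminateProcess(handle, 1).
    LikelyExternalKill,
    /// Non-zero exit that printed a diagnostic to stderr first
    /// (e.g. the DUPLICATE or bind-failure paths in poll.rs).
    InternalError,
    /// Anything that fits none of the patterns above.
    Other,
}

/// Classify a poll exit from its exit code and captured streams.
pub fn classify_poll_exit(code: Option<i32>, stdout: &str, stderr: &str) -> PollExit {
    let out_empty = stdout.trim().is_empty();
    let err_empty = stderr.trim().is_empty();
    match code {
        Some(0) if !out_empty => PollExit::Clean,
        Some(1) if out_empty && err_empty => PollExit::LikelyExternalKill,
        Some(c) if c != 0 && !err_empty => PollExit::InternalError,
        _ => PollExit::Other,
    }
}
```

Under this classification, doyle's iter 4 falls into `LikelyExternalKill`, while the tester1/tester3/tester4 runs fall into `Clean`.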

## Eliminated

- All internal code paths inside `src/owl/poll.rs`, verified via three standalone reproductions: tester1 (60s pulse), tester3 (600s pulse), and tester4 (300s pulse) each exited code=0 with PULSE_TRIGGER. The 5-min kill is EXTERNAL.

## Evidence

- timestamp: 2026-04-16T04:45:00-07:00
  source: ~/.claude/spacetime/logs_latest/doyle.log:144-149
  finding: Wrapper iter 4 ran 04:31:04 → 04:36:24 = exactly 5min 20s. Poll exited code=1 with empty stdout AND empty stderr. Wrapper interpreted empty stdout as exit signal, called cleanup, killed itself.

- timestamp: 2026-04-16T04:45:00-07:00
  source: ~/.claude/spacetime/logs_latest/doyle.log all iterations + deployah.log
  finding: deployah ran with default period 1200s, ALL idle iterations exited code=0 with stdout=40 bytes (= "PULSE_TRIGGER (timestamp)\n"). doyle's iter 4 (period ~300s) was the only iteration that exited code=1. Differentiating factor: doyle's wrapper had been spawned from a Claude Code session via Bash tool / live start; deployah's was older and may have detached from its parent task. Both Self's `live start doyle` background task `b2zv8nqp8` AND the wrapper's poll subprocess died at the same wall-clock moment 04:36:24, suggesting a SHARED external trigger.

- timestamp: 2026-04-16T04:45:00-07:00
  source: src/owl/poll.rs grep for exit(1)
  finding: Only two `exit(1)` sites: line 46 (DUPLICATE) and line 160 (bind failure). Both print to stderr first. Neither matches "empty stderr + empty stdout".

- timestamp: 2026-04-16T04:45:00-07:00
  source: src/common/process.rs lines 98, 125, 229, 239
  finding: All Windows TerminateProcess invocations pass `TerminateProcess(handle, 1)` — exit code 1. So an externally-terminated process exits with EXACTLY code=1, no stdout, no stderr. **PERFECT match for the observed signature.**

- timestamp: 2026-04-16T05:00:00-07:00
  source: standalone repro tester1 (`poll tester1 listen --psyche --pulse-interval 60 --setup`)
  finding: Started 04:47:07, exited 04:48:08 (60.6s), code=0, stdout="✓ READY:tester1...\nPULSE_TRIGGER (2026-04-16 04:48:08 PDT)". CLEAN behavior.

- timestamp: 2026-04-16T05:02:08-07:00
  source: standalone repro tester4 (`poll tester4 listen --psyche --pulse-interval 300 --setup`) — IDENTICAL ARGS to doyle's wrapper invocation
  finding: Started 04:57:08, exited 05:02:08 (300.1s), code=0, stdout="✓ READY:tester4...\nPULSE_TRIGGER (2026-04-16 05:02:08 PDT)". CLEAN behavior. Crosses the 5-min mark without dying. **Proves the bug is NOT in poll.rs — bug is environmental.**

- timestamp: 2026-04-16T05:05:22-07:00
  source: standalone repro tester3 (`poll tester3 listen --psyche --pulse-interval 600 --setup`)
  finding: Started 04:55:22, exited 05:05:22 (600.5s), code=0, stdout="...\nPULSE_TRIGGER (2026-04-16 05:05:22 PDT)". 10-MINUTE clean run from PowerShell — well past any 5-min boundary.

- timestamp: 2026-04-16T05:02:00-07:00
  source: standalone repro tester2 (`poll tester2 listen --live --setup` -- NO pulse-interval, like Self)
  finding: Started 04:48:15, still running at 05:02:22 (14+ min), no exit. CONFIRMS standalone Self-style poll runs indefinitely as designed. Bug ONLY manifests when invoked through Claude Code Bash → live start → wrapper chain.

- timestamp: 2026-04-16T05:00:00-07:00
  source: src/live/start.rs:225-256 (wrapper spawn from live start)
  finding: Uses Rust `Command::spawn` with `creation_flags(CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW)`. **CRITICAL OMISSIONS**: (a) NO `CREATE_BREAKAWAY_FROM_JOB` flag — wrapper inherits Claude Code's job-object membership; (b) Rust default `bInheritHandles=TRUE` — wrapper inherits all of Claude Code's inheritable handles including pipe write ends. Same anti-pattern fixed in `src/owl/hook_idle.rs:118-254` for echo-commune grandchild via raw `CreateProcessW`. Same fix needed here.

- timestamp: 2026-04-16T05:00:00-07:00
  source: src/live/wrapper/mod.rs:170-205 (poll subprocess spawn from wrapper)
  finding: Uses Rust `Command::spawn` (via `output()`) with `Stdio::piped()` for stdout/stderr and only `creation_flags(CREATE_NO_WINDOW)`. NO breakaway, NO bInheritHandles=FALSE. Inherits everything from wrapper, which itself inherited from Claude Code. The poll subprocess holds Claude Code's pipe handles AND sits in Claude Code's job object. After 5 min Claude Code likely terminates orphaned descendants. Need raw CreateProcessW with explicit pipe handles for stdout/stderr capture (since wrapper still needs to read poll output) but bInheritHandles=FALSE for everything else.

- timestamp: 2026-04-16T05:00:00-07:00
  source: src/owl/hook_idle.rs:118-254 (existing fix template for analogous bug)
  finding: Existing helper `spawn_detached_no_inherit` uses raw CreateProcessW with bInheritHandles=FALSE + CREATE_BREAKAWAY_FROM_JOB. For wrapper spawn (no stdio capture needed, like echo-commune) the helper can be reused almost verbatim. For poll subprocess spawn (stdout MUST be captured by wrapper) we need a variant that creates ANONYMOUS pipes via CreatePipe, keeps the read ends non-inheritable in the parent, marks only the write ends inheritable, and passes exactly those write handles to the child via STARTUPINFOEX with PROC_THREAD_ATTRIBUTE_HANDLE_LIST. This is the canonical Windows pattern for selective handle inheritance. Note that the API requires bInheritHandles=TRUE when a handle list is supplied; the list is what restricts inheritance to the named handles. Alternative simpler approach: have the poll subprocess write its result to a TEMP FILE that the wrapper then reads, which eliminates pipe handles entirely.

## Root Cause

**Windows handle/job-object inheritance from Claude Code's Bash tool runtime through the live-start spawn chain.**

Sequence:
1. User (or agent) launches `$LIVE start doyle` via Claude Code's Bash tool with `run_in_background:true`.
2. Claude Code spawns the Bash subprocess with handle inheritance enabled (presumed; the same `bInheritHandles=TRUE` behavior that Rust's `Command::spawn` applies by default) AND likely places it in a job object for tracking. The Bash subprocess receives Claude Code's stdout/stderr pipe write-end handles and joins the job.
3. `live start doyle` runs inside the Bash session. It spawns the Psyche wrapper via `Command::spawn` with `DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP | CREATE_NO_WINDOW` — **NO `CREATE_BREAKAWAY_FROM_JOB`** — and **default `bInheritHandles=TRUE`**. Wrapper inherits pipe handles + job membership from `live start`.
4. The wrapper enters its loop. Each iteration spawns `owl.exe poll <psyche_id> ...` via `Command.output()` with `Stdio::piped()` for capture. This spawn ALSO uses default `bInheritHandles=TRUE`, so the poll subprocess inherits Claude Code's pipe write handles (transitively through wrapper→live-start→bash→Claude-Code) AND joins Claude Code's job object.
5. `live start` (Self) ALSO directly enters `crate::owl::poll::run(...)` inline at `src/live/start.rs:276` — no spawn intermediary, but Self IS in the job and holds the pipe handles.
6. After ~5 minutes of inactivity (no stdout writes), Claude Code's tool runtime times out the Bash subprocess and **calls `TerminateProcess(handle, 1)` on the entire process tree / job object**. This kills:
   - Self's poll loop (inside `live start` process) → background task `b2zv8nqp8` exits code=1
   - The wrapper's currently-running poll subprocess → wrapper sees `code=1 stderr=<empty> stdout=<empty>`
7. The wrapper itself MAY survive (because `DETACHED_PROCESS` partially detaches it from console, and the wrapper's parent `live start` exited long ago). But the wrapper's NEXT poll subprocess will still be inheriting the same handles when spawned, so the cycle would repeat — except the wrapper interprets the empty exit as a teardown signal and exits voluntarily (`src/live/wrapper/mod.rs:89-92`).

This is the SAME bug family as quick-260416-uz8 (echo-commune grandchild handle leak) — different spawn point, same root cause: **Rust `Command::spawn` on Windows defaults to `bInheritHandles=TRUE` and joins parent's job object**, which causes any descendant of a Claude Code tool invocation to be force-terminated when the parent tool times out.
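
The flag-level difference can be sketched as follows. The constants are the Win32 process creation flag values; the function names are hypothetical and exist only for this illustration. The `#[cfg(windows)]` function shows the shape of the vulnerable spawn, not the actual code in `src/live/start.rs`.

```rust
// Win32 process creation flags (values from winbase.h).
pub const DETACHED_PROCESS: u32 = 0x0000_0008;
pub const CREATE_NEW_PROCESS_GROUP: u32 = 0x0000_0200;
pub const CREATE_BREAKAWAY_FROM_JOB: u32 = 0x0100_0000;
pub const CREATE_NO_WINDOW: u32 = 0x0800_0000;

/// Flags described in the evidence for the pre-fix wrapper spawn:
/// detached, but still a member of the parent's job object.
pub fn wrapper_flags_before() -> u32 {
    CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW
}

/// Same flags plus the job breakaway request.
pub fn wrapper_flags_after() -> u32 {
    wrapper_flags_before() | CREATE_BREAKAWAY_FROM_JOB
}

/// True if a spawn with these flags asks to leave the parent's job.
pub fn escapes_job(flags: u32) -> bool {
    flags & CREATE_BREAKAWAY_FROM_JOB != 0
}

/// Shape of the vulnerable spawn (hypothetical helper, Windows only).
/// Per the evidence above, `Command::spawn` passes bInheritHandles=TRUE,
/// so the child receives every inheritable handle the parent holds.
#[cfg(windows)]
pub fn spawn_wrapper_inheriting(
    exe: &std::path::Path,
    args: &[&str],
) -> std::io::Result<std::process::Child> {
    use std::os::windows::process::CommandExt;
    std::process::Command::new(exe)
        .args(args)
        .creation_flags(wrapper_flags_before())
        .spawn()
}
```

Adding `CREATE_BREAKAWAY_FROM_JOB` through `creation_flags` would address only the job half of the problem, and only if the job permits breakaway. Handle inheritance stays on with `Command::spawn`, which is why the fix goes through raw `CreateProcessW`.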

## Fix Plan

**Two-commit atomic fix:**

### Commit 1: Eliminate handle/job inheritance in the live-start → wrapper → poll chain

**File 1:** `src/live/start.rs` — wrapper spawn (lines 225-273).
- Extract the existing `spawn_detached_no_inherit` helper from `src/owl/hook_idle.rs` into a shared `src/common/win_spawn.rs` module (or `src/common/process.rs`).
- Replace the Rust `Command::spawn` call for the wrapper with a call to that helper, passing the wrapper args.
- Result: wrapper runs with `bInheritHandles=FALSE + CREATE_BREAKAWAY_FROM_JOB + DETACHED_PROCESS + CREATE_NEW_PROCESS_GROUP + CREATE_NO_WINDOW`. Fully detached from Claude Code's job and pipe handles.
- Side effect: the env vars `OWL` and `LIVE` set on the wrapper Command will need explicit transfer — the existing helper takes only args, no env. Either (a) wrapper resolves these via `current_exe()` itself (already does via lifecycle.rs), or (b) extend helper to accept env pairs and serialize to a CREATE_UNICODE_ENVIRONMENT block.
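
For choice (b), the environment block format is simple enough to sketch. This is a minimal illustration, assuming env pairs arrive as string slices; `build_env_block` is a hypothetical name and not part of the existing helper.

```rust
/// Build a UTF-16 environment block for CreateProcessW with
/// CREATE_UNICODE_ENVIRONMENT: a sequence of "NAME=VALUE\0" entries
/// terminated by one extra "\0". (Hypothetical helper for this sketch.)
pub fn build_env_block(pairs: &[(&str, &str)]) -> Vec<u16> {
    // Windows expects the entries sorted by name, case-insensitively.
    let mut sorted: Vec<(&str, &str)> = pairs.to_vec();
    sorted.sort_by_key(|(name, _)| name.to_uppercase());

    let mut block: Vec<u16> = Vec::new();
    for (name, value) in sorted {
        block.extend(name.encode_utf16());
        block.push(u16::from(b'='));
        block.extend(value.encode_utf16());
        block.push(0); // terminator for this entry
    }
    if block.is_empty() {
        block.push(0); // an empty block is still two NULs
    }
    block.push(0); // terminator for the whole block
    block
}
```

An explicit block replaces the child's entire environment, so a real helper would need to merge `OWL` and `LIVE` into a copy of the current environment rather than pass those two alone. That cost is one argument in favor of choice (a).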

**File 2:** `src/live/wrapper/mod.rs` — poll subprocess spawn (lines 170-205, `poll_psyche`).
- This call MUST capture stdout (the whole point of the wrapper-poll dance). The straightforward approach:
  - Option A: Use raw CreateProcessW with STARTUPINFOEX + PROC_THREAD_ATTRIBUTE_HANDLE_LIST to selectively inherit ONLY the pipe write ends for stdout/stderr. The API requires `bInheritHandles=TRUE` with this attribute; the handle list is what blocks every other handle. Complex but textbook Windows pattern.
  - **Option B (simpler): switch the wrapper-poll IPC from pipe to file.** Pass `--output-file <path>` to poll, have poll write its result there, wrapper polls the file for completion (via a sentinel newline or separate `--done-file`). Spawn with the existing `spawn_detached_no_inherit` helper so the poll process has zero inherited handles. No piped stdio at all. Eliminates the bug class entirely for this spawn point.
  - Option C: Spawn poll with the helper (no inheritance), and have poll deliver its result via the existing message-passing mechanism (e.g., write to a status file in the perch dir). Cleanest semantically.

For minimal surface area, **Option C** is preferred: extend poll to write its result line(s) to `owlery/<id>/poll-result-<pid>.tmp` then exit. Wrapper reads that file when child exits. This reuses existing perch dir conventions.

If Option C feels heavy for this fix, fall back to **Option B**: poll accepts a new `--output-file` arg, opens that file write-only, writes its result lines directly to the file, then exits. Redirecting stdout via Rust's `std::io::set_output_capture` won't work (it is an unstable internal API), so use a direct file write instead of `println!`. Wrapper reads the file post-exit.
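
The file-based handoff shared by Options B and C can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, the file name follows the `poll-result-<pid>.tmp` convention proposed above, and the staging-then-rename step is an added precaution so the wrapper never reads a half-written file.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Result file location inside the perch dir (naming per the plan above).
pub fn poll_result_path(perch_dir: &Path, pid: u32) -> PathBuf {
    perch_dir.join(format!("poll-result-{pid}.tmp"))
}

/// Poll side: write the result to a staging file, then rename it into
/// place so a reader sees either the whole result or no file at all.
pub fn write_poll_result(perch_dir: &Path, pid: u32, result: &str) -> io::Result<PathBuf> {
    let final_path = poll_result_path(perch_dir, pid);
    let staging = final_path.with_extension("partial");
    {
        let mut file = fs::File::create(&staging)?;
        file.write_all(result.as_bytes())?;
        file.sync_all()?;
    }
    fs::rename(&staging, &final_path)?;
    Ok(final_path)
}

/// Wrapper side: call after the child has exited. Returns Ok(None) when
/// no result file exists (for example, the child was killed before
/// writing), otherwise returns the contents and removes the file.
pub fn take_poll_result(perch_dir: &Path, pid: u32) -> io::Result<Option<String>> {
    let path = poll_result_path(perch_dir, pid);
    match fs::read_to_string(&path) {
        Ok(text) => {
            fs::remove_file(&path)?;
            Ok(Some(text))
        }
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

With this shape, a missing result file gives the wrapper a distinct "child died without reporting" signal, separate from a legitimately empty result.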

For the fastest possible patch (Commit 1 alone): swap `Command::spawn` in wrapper for a raw CreateProcessW spawn that uses STARTUPINFOEX with the explicit handle-list pattern, so only the stdout/stderr pipe write ends reach the child (`bInheritHandles=TRUE` is required when a handle list is passed; nothing outside the list is inherited). Implementation reference: `windows-sys` crate's `InitializeProcThreadAttributeList` + `UpdateProcThreadAttribute` with `PROC_THREAD_ATTRIBUTE_HANDLE_LIST`. ~30 lines of unsafe.

### Commit 2: Wrapper-loop hardening — never treat empty output as terminate

**File:** `src/live/wrapper/mod.rs` — main loop (lines 89-92).
- Current: `if msg.trim().is_empty() { self.log("poll returned empty stdout, exiting loop"); break; }`
- New: Empty output is a SOFT failure. Log it and CONTINUE the loop (re-arm the poll). Only break on:
  - Explicit `TERMINATED:<id>` or `STOPPED:<id>` status lines parsed out of the captured combined output (status from poll's stderr).
  - Three consecutive empty exits in a row (defensive against runaway loop if the binary is genuinely broken).
  - `ready` file removed (already checked at top of loop).
  - INIT_SIGNOFF processed.
- Bonus: parse stderr from poll output for `TERMINATED:`/`STOPPED:` tags before deciding break vs continue. Currently the log captures stderr but doesn't act on it.
- This commit alone HARDENS the wrapper but does not fix the root cause. Combined with Commit 1, the wrapper survives even if the poll subprocess is killed externally for any reason.
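
The hardened break/continue decision can be sketched as a pure function. This is a minimal illustration, not the code in `src/live/wrapper/mod.rs`: `LoopAction` and `decide` are hypothetical names, INIT_SIGNOFF handling is left out, and the ready-file state is passed in as a boolean.

```rust
/// Defensive backstop: give up after this many empty exits in a row.
pub const MAX_CONSECUTIVE_EMPTY: u32 = 3;

/// What the wrapper loop should do after one poll exit.
#[derive(Debug, PartialEq, Eq)]
pub enum LoopAction {
    /// Non-empty message: hand it to the resume step.
    Deliver,
    /// Soft failure: log it and re-arm the poll.
    Rearm,
    /// Leave the loop, with the reason for the log.
    Break(&'static str),
}

/// Decide the next step from the captured output of one poll run.
/// `consecutive_empty` is loop state owned by the caller.
pub fn decide(
    stdout: &str,
    stderr: &str,
    ready_file_exists: bool,
    consecutive_empty: &mut u32,
) -> LoopAction {
    // Explicit termination tags win, whichever stream carried them.
    let tagged = stdout
        .lines()
        .chain(stderr.lines())
        .map(str::trim_start)
        .any(|l| l.starts_with("TERMINATED:") || l.starts_with("STOPPED:"));
    if tagged {
        return LoopAction::Break("explicit termination tag");
    }

    if stdout.trim().is_empty() {
        // A legitimate teardown removes the ready file, so re-check it.
        if !ready_file_exists {
            return LoopAction::Break("ready file removed");
        }
        *consecutive_empty += 1;
        if *consecutive_empty >= MAX_CONSECUTIVE_EMPTY {
            return LoopAction::Break("three consecutive empty exits");
        }
        return LoopAction::Rearm;
    }

    // Real output resets the empty-exit counter.
    *consecutive_empty = 0;
    LoopAction::Deliver
}
```

Keeping the decision free of I/O means the three-strikes backstop and the tag parsing can be unit-tested without spawning a poll subprocess.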

### Validation plan

1. `cargo check --release` clean.
2. `cargo test --release` all pass.
3. Run DEPLOY.ps1 -Bump patch (per quick task vbf convention).
4. Restart Claude Code, start fresh `$LIVE start <id>` (with no `--period` so default 1200s).
5. Leave idle for 6+ minutes WITHOUT any messages. Verify wrapper log shows iter completing past 5min without `code=1 stderr=` empty exit.
6. Send a message during the next iteration; verify it's delivered.
7. Send `$LIVE stop <id> --all` and verify clean teardown.

## Atomic Commits

- Commit A (root fix): `fix(quick-260416-XXX): stop wrapper/Self poll TerminateProcess(...,1) at 5min via bInheritHandles=FALSE + CREATE_BREAKAWAY_FROM_JOB`
- Commit B (defensive hardening): `fix(quick-260416-YYY): wrapper loop no longer treats empty poll stdout as terminate signal`

## Notes for tooling

- Use the same Windows API helper pattern from `src/owl/hook_idle.rs::spawn_detached_no_inherit`. Consider promoting it to `src/common/process.rs::spawn_detached_no_inherit` and `spawn_with_pipe_no_inherit` to share between hook_idle.rs, live/start.rs, and live/wrapper/mod.rs.
- `b9strsytj` — current Self task PID 76548, confirmed alive 16+ min as of 05:00. Once Commit 1 lands and is deployed, this Self process should be stopped/restarted to pick up the fix; otherwise it will continue to be a 5-min-fragile process until next session.


## Resolution

**root_cause:** Windows handle/job-object inheritance from Claude Code's Bash tool transitively into the live-start → wrapper → poll spawn chain. Rust `std::process::Command::spawn` on Windows hard-codes `bInheritHandles=TRUE` and inherits the parent's job-object membership. After ~5 minutes the host calls `TerminateProcess(handle, 1)` on every descendant — producing the observed exit code 1 with empty stdout and empty stderr. Same root cause family as quick-260416-uz8 (echo-commune handle leak), different spawn site.

**fix:** Two atomic commits in quick-260416-aaa:
- Commit A: New `src/common/win_spawn.rs` (Windows-only) with two helpers — `spawn_detached_no_inherit` (used by `live/start.rs::run` and `live_start_result` for the wrapper spawn) and `spawn_capture_no_inherit` (used by `live/wrapper/mod.rs::poll_psyche` for the poll subprocess spawn, preserving stdout/stderr capture via STARTUPINFOEX + PROC_THREAD_ATTRIBUTE_HANDLE_LIST). Both helpers pass `bInheritHandles=FALSE` (or selective inheritance for the capture helper) and `CREATE_BREAKAWAY_FROM_JOB` so the descendant escapes the host's job object.
- Commit B: Hardened the wrapper main loop in `src/live/wrapper/mod.rs` so a single empty-stdout exit from poll no longer ends the loop. Defensive backstop after 3 consecutive empty exits, with re-check of the ready file after each empty exit (legitimate teardown removes the ready file).

**validation:**
- All 122 tests pass (cargo test --release).
- Standalone smoke test of new binary: `poll smoke1 listen --psyche --pulse-interval 5 --setup` → exit=0, 5.2s, stdout="PULSE_TRIGGER (...)".
- Pre-fix repro analysis: standalone tester1 (60s pulse), tester3 (600s pulse), tester4 (300s pulse exact-match to doyle's broken iter 4), and tester2 (no pulse, 26 min) all completed cleanly from PowerShell, ruling out anything internal to `poll.rs` and confirming the bug is environmental (handle/job inheritance).

**end-to-end validation pending:** requires a fresh Claude Code session to reload the binary into the plugin cache, then `$LIVE start <id>` from a Bash background task, then 6+ min idle, observing the wrapper survives. Out of scope for this fix's session (would require killing the active live agent driving the work).
