---
status: resolved
trigger: |
  binary handoff failed at line 555 of C:\Users\decid\AppData\Local\spt\logs_latest\todlando.log. this is a regression. when did the behavior change? i've never seen a "HANDOFF_DEFER" flag before. --- diagnose and fix.
created: 2026-05-23
updated: 2026-05-23
resolved: 2026-05-23
fix_commits:
  - f105536  # fix(handoff): pulse_psyche clap default restores wrapper handoff argv backward-compat (corrects [1.11.9])
  - 2328747  # chore: bump spt plugin to v1.11.10
fix_quick_task: 260523-7zy-fix-v1-11-9-handoff-argv-backward-compat
---

> **RESOLVED 2026-05-23** via quick task `260523-7zy`. Fix landed as commits `f105536` (one-line clap `default_value = "0"` on `pulse_psyche` at `src/cli.rs:196` + new `psyche_wrapper_5arg_argv_backward_compat` regression test in `tests/handoff_integration.rs`) and `2328747` (v1.11.10 bump). Operator follow-up: run `docs/DEPLOY.ps1` (no `-Bump` flag, version pinned), then `$LIVE revive todlando` to bring the dead live agent back on the patched binary.

> **Status reverted from `root_cause_found` to `investigating` (2026-05-23):** First debugger's verdict ("user unfamiliarity, no defect, handoff succeeded") was incorrect. Operator clarified that the **wrapper exited**. Liveness check confirms all three pids dead:
> - listener `todlando` pid 71520 — DEAD
> - psyche `todlando-psyche` pid 31564 — DEAD
> - new wrapper pid 69036 (spawned at line 558) — DEAD
> - shared `parent_pid` 62240 in both info.json files belongs to `claude.exe.old.1779509296487` — the parent Claude Code shell, **not** a wrapper. There is no live wrapper for `todlando-psyche` right now.
>
> `info.json` files were written at 05:15:02 PDT and never updated since — proves the new processes came up briefly (long enough to claim the perches), then died without restart. This is a real regression in the wrapper handoff path, masked by the perch state writes happening before the crash.

# Debug: binary-handoff-defer-todlando

## Symptoms

- **Expected behavior**: Binary handoff completes cleanly (running listener self-migrates to new binary). No failures during deploy-time plugin version flip.
- **Actual behavior**: At line 555 of `C:\Users\decid\AppData\Local\spt\logs_latest\todlando.log`, poll exited with code 2 and emitted a `HANDOFF_DEFER:todlando-psyche` stderr flag the user has never seen before. The user reports this as a regression.
- **Error messages**:
  - `[05:15:02] poll exited code=2 stderr=HANDOFF_DEFER:todlando-psyche`
  - `[05:15:02] poll signaled HANDOFF_DEFER (exit=2) — running wrapper handoff`
  - `[05:15:02] HANDOFF wrapper (from defer): target=C:\Users\decid\.claude\plugins\cache\cplugs\spt\1.11.9\owl.exe`
  - `[05:15:02] HANDOFF wrapper: new pid=69036, exiting`
- **Timeline**: Reported as regression; "never seen HANDOFF_DEFER before." HANDOFF_DEFER protocol itself is deliberate (Phase 18.5 Bug #12, 2026-05-13). The *crash* of the new wrapper IS a regression — introduced by commit `3616ed1` (2026-05-23) which shipped in v1.11.9.
- **Reproduction**: Deploy v1.11.9 (the affected release; v1.11.10 carries the fix) over a live agent running v ≤ 1.11.8 with an active Psyche wrapper. The v1.11.8 wrapper's `perform_wrapper_handoff` spawns the v1.11.9 binary with a 5-element argv (subcommand name plus 4 positionals); v1.11.9's clap expects 6 elements because of the added `pulse_psyche`. clap rejects the invocation and the process exits before writing a single log line, so the new wrapper dies. The listener then orphan-detects on its next iteration and dies too.

## Current Focus

- **hypothesis**: Confirmed. Commit `3616ed1 feat(live): default --period 480 + new --pulse-psyche flag` (2026-05-23 04:53 PDT, shipped in v1.11.9 via `fc868c3`) added a required positional argument `pulse_psyche: String` to `Commands::PsycheWrapper` in `src/cli.rs:188-197` with NO `#[arg(default_value = ...)]`. The same commit updated `perform_wrapper_handoff` in `src/live/wrapper/mod.rs:1682-1689` from `wrapper_args: [&str; 5]` to `[&str; 6]`. v1.11.8 wrappers (which were running on user's box prior to deploy) still spawn the new binary with the 5-arg shape. v1.11.9's clap parser rejects the 5-arg invocation with the standard clap error `error: the following required arguments were not provided: <PULSE_PSYCHE>` and non-zero exit. Because `spawn_detached_no_inherit` attaches no stdio, the stderr is dropped to the void; because the process exits before reaching `WrapperState::new` (which is where the first `log()` call happens), no log line is written. Net visible behavior: a brand-new pid appears in `info.json` for a single fsync window (written by the spawning side just before the crash) and then vanishes.
- **test**: Diff `3616ed1` for `cli.rs` PsycheWrapper signature → confirmed pre-3616ed1 = 4 positionals (5 incl. subcommand name), post = 5 positionals (6 incl. subcommand) — REQUIRED, no default. Liveness check on PIDs 69036/71520/31564 → all dead. `wrapper-state.json` at `owlery\todlando-psyche\` still on disk (114 bytes, gen=30, mtime 5:15:02 — written by old wrapper, never consumed by new). `load_and_delete` was never called → confirms `WrapperState::new` was never entered → confirms crash happened earlier (in clap parse).
- **expecting**: A clap `default_value = "0"` on the `pulse_psyche` arg restores argv backward-compat. State-file backward-compat already works (`#[serde(default)]` on `WrapperHandoffState.pulse_psyche` per same commit's deliberate decision). All other invariants preserved (state-file wins over argv per D-07; cold-start argv path unchanged for the listener handoff which has stable argv shape).
- **next_action**: Apply fix via `/gsd:quick` — single one-line clap attribute addition in `src/cli.rs:196`, plus a regression test in `tests/handoff_integration.rs` that asserts `_psyche-wrapper id 480 30 sess` (5-arg, no `pulse_psyche`) clap-parses successfully and defaults `pulse_psyche` to `"0"`. After fix lands and is deployed, the next handoff from v1.11.9 (or later) to a future version with argv changes should be tested via the existing fake-handoff harness.
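
The required-versus-defaulted positional behavior described above can be modeled without clap. The following is a minimal std-only sketch, not the code in `src/cli.rs`: `WrapperArgs`, `parse_wrapper_args`, and the error strings are hypothetical, and the `gen` field is spelled `generation` here only to keep the sketch edition-neutral. Passing `None` as the default models the v1.11.9 shape (required fifth positional); passing `Some("0")` models the proposed `default_value = "0"`.

```rust
/// Hypothetical stand-in for the `PsycheWrapper` positionals (not the real clap struct).
#[derive(Debug, PartialEq)]
struct WrapperArgs {
    self_id: String,
    period: u64,
    generation: u32,
    session_name: String,
    pulse_psyche: String,
}

/// Parses the positionals that follow the `_psyche-wrapper` subcommand name.
/// `pulse_default = None` treats the fifth positional as required;
/// `pulse_default = Some(..)` fills it in when the caller omits it.
fn parse_wrapper_args(argv: &[&str], pulse_default: Option<&str>) -> Result<WrapperArgs, String> {
    if argv.len() < 4 || argv.len() > 5 {
        return Err(format!("expected 4 or 5 positionals, got {}", argv.len()));
    }
    let pulse_psyche = match (argv.get(4), pulse_default) {
        (Some(value), _) => value.to_string(),
        (None, Some(default)) => default.to_string(),
        (None, None) => return Err("missing required positional: pulse_psyche".to_string()),
    };
    Ok(WrapperArgs {
        self_id: argv[0].to_string(),
        period: argv[1].parse::<u64>().map_err(|e| format!("bad period: {e}"))?,
        generation: argv[2].parse::<u32>().map_err(|e| format!("bad gen: {e}"))?,
        session_name: argv[3].to_string(),
        pulse_psyche,
    })
}
```

With the legacy shape `["todlando-psyche", "480", "30", "sess"]`, the strict variant returns an error while the defaulted variant yields `pulse_psyche == "0"`, which mirrors the gap between the state-file compat that was handled and the argv compat that was not.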

## Evidence

- timestamp: 2026-05-23 — Log line 555: `poll exited code=2 stderr=HANDOFF_DEFER:todlando-psyche`. Log line 556 confirms orchestrator handled defer: "poll signaled HANDOFF_DEFER (exit=2) — running wrapper handoff". Lines 557-558 show new wrapper pid spawned and old process exiting — suggesting the "defer" is a coordinated handoff, not a failure.
- timestamp: 2026-05-23 — grep finds `HANDOFF_DEFER` references in: `src/owl/poll.rs`, `src/live/wrapper/mod.rs`, `src/common/handoff.rs`, `tests/handoff_integration.rs`, plus `.planning/quick/260513-63n-wire-tcp-wake-into-psyche-wrapper-inner-/*` (a quick-PLAN/RESEARCH/SUMMARY trio from 260513 mentioning HANDOFF_DEFER) and `.planning/STATE.md`. Strong signal this is a deliberate code path tied to a recent "wire-tcp-wake-into-psyche-wrapper" change, not a regression.
- timestamp: 2026-05-23 — `git log -S "HANDOFF_DEFER" --all --oneline -- src/` returns four commits, all from 2026-05-13 to 2026-05-15 (well before v1.11.9 bump): `62298fe feat(18.5-01): extend spawn_and_wait_inherit_stdio with env overrides + add handoff child/wrapper env constants`, `3193431 fix(18.5-01): rewrite poll.rs handoff branch — argv rewrite + duplicate bypass + wrapper defer`, `dfda1f7 feat(18.5-02): wrapper consumes poll exit code 2 for handoff defer`, `8182e2d fix(handoff): GREEN — Fix D sentinel-exit-and-respawn breaks multi-deploy chain growth` (2026-05-15 20:25). Definitive: feature, not regression — for the protocol. The CRASH is a separate regression.
- timestamp: 2026-05-23 — Source confirms protocol. `src/owl/poll.rs:493-503`: inner poll under `OWL_UNDER_WRAPPER=1` emits `HANDOFF_DEFER:<id>` and exits 2, **preserving perch state** (no ready-file removal, no listener close). `src/live/wrapper/mod.rs:1184-1214` consumes exit=2 and calls `perform_wrapper_handoff(&target)` which writes `wrapper-state.json` before spawning the new wrapper. `src/common/handoff.rs:36-40` documents the contract.
- timestamp: 2026-05-23 — **VERSION FORENSICS.** `installed_plugins.json` shows `spt@cplugs` was at v1.11.9 as of 2026-05-23 12:15:02Z (= 05:15:02 PDT, the exact moment of defer). Directory listing of `C:\Users\decid\.claude\plugins\cache\cplugs\spt\`: v1.11.8 dir mtime 04:29:38 PDT, v1.11.9 dir mtime 05:15:02 PDT. So the OLD wrapper running before the deploy was v1.11.8, and the deploy at 05:15:02 flipped `installed_plugins.json` from 1.11.8 → 1.11.9.
- timestamp: 2026-05-23 — **ARGV REGRESSION.** `git show 3616ed1 -- src/cli.rs` confirms the pre-commit `PsycheWrapper` had 4 positionals (`self_id, period, gen, session_name`); post-commit added `pulse_psyche: String` as a **required** 5th positional with NO `#[arg(default_value = ...)]`. Same commit's `src/live/wrapper/mod.rs` diff shows `wrapper_args: [&str; 5]` → `[&str; 6]`. Net consequence: v1.11.8 wrapper's `perform_wrapper_handoff` builds `["_psyche-wrapper", id, period, gen, session_name]` (5 args) and execs the v1.11.9 binary, which clap-rejects with "the following required arguments were not provided: <PULSE_PSYCHE>" and exit code != 0.
- timestamp: 2026-05-23 — **CRASH-BEFORE-LOG.** `todlando.log` line 558 is the last line ("HANDOFF wrapper: new pid=69036, exiting"). pid 69036 wrote ZERO log lines. The first `log()` call in `WrapperState::new` is on the `wrapper-state.json` rehydrate path; `state.run()`'s first log line is the iteration counter. Neither fired. clap's failure path calls `process::exit(2)` during argument parsing, before `main()` hands control to any dispatch code. This is consistent with `owlery\todlando-psyche\wrapper-state.json` still being on disk (114 bytes, gen=30) and never re-published (no mtime update past 05:15:02); that is, `load_and_delete` was never invoked. Confirms the new wrapper never reached `WrapperState::new`.
- timestamp: 2026-05-23 — **STATE-FILE PATH IS CORRECT.** `wrapper-state.json` is written by `perform_wrapper_handoff` at `owlery::perch_dir(psyche_id).join("wrapper-state.json")` (flat layout); the reader at `lifecycle::new` uses `wrapper_state_path_resolved(self_id, psyche_id)` (nested-first with flat fallback, prefer fresher mtime per WR-02). The flat file at `owlery\todlando-psyche\wrapper-state.json` mtime 5:15:02 would correctly be picked over the stale nested one (mtime 5/22 01:55:06). State-path is NOT the bug.
- timestamp: 2026-05-23 — **LISTENER ARGV STABLE.** `build_handoff_child_argv` in `src/owl/poll.rs:1313-1325` produces `["poll", id, "listen", "--live" | "--psyche"]` — 3-4 args, shape stable across versions. v1.11.8 listener self-relaying to v1.11.9 listener works correctly; listener pid 71520 booted fine, wrote `info.json` at 5:15:02, then died later for a downstream reason (orphan detection — its parent wrapper is dead).
- timestamp: 2026-05-23 — **STATE-FILE BACKWARD-COMPAT IS HANDLED.** Commit 3616ed1 added `#[serde(default)]` to `WrapperHandoffState.pulse_psyche`, deliberately handling backward-compat for state files written by older wrappers. The commit author thought about state-file compat but missed the symmetric concern for argv compat. Argv backward-compat is the gap.
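
The defer signal recorded in the evidence above (exit code 2 plus a `HANDOFF_DEFER:<id>` line on stderr) can be sketched as a small classifier. This is an illustrative std-only sketch, not the code in `src/live/wrapper/mod.rs`: `PollOutcome` and `classify_poll_exit` are hypothetical names, and requiring both the exit code and the stderr flag is an assumption, since the notes only state that the wrapper consumes exit=2.

```rust
/// Hypothetical summary of what the wrapper does after the inner poll exits.
#[derive(Debug, PartialEq)]
enum PollOutcome {
    /// exit=2 with a `HANDOFF_DEFER:<id>` flag: run the wrapper handoff.
    HandoffDefer { id: String },
    /// Anything else is passed through with its exit code.
    Other { code: i32 },
}

const HANDOFF_DEFER_PREFIX: &str = "HANDOFF_DEFER:";

/// Classifies a poll exit from its exit code and captured stderr text.
/// Assumption: the flag must appear on its own line and carry a non-empty id.
fn classify_poll_exit(code: i32, stderr: &str) -> PollOutcome {
    if code == 2 {
        for line in stderr.lines() {
            if let Some(id) = line.trim().strip_prefix(HANDOFF_DEFER_PREFIX) {
                if !id.is_empty() {
                    return PollOutcome::HandoffDefer { id: id.to_string() };
                }
            }
        }
    }
    PollOutcome::Other { code }
}
```

Applied to the logged values (`code=2`, `stderr=HANDOFF_DEFER:todlando-psyche`), the classifier selects the handoff branch, which matches log line 556. The crash under investigation happens after this point, in the newly spawned binary.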

## Eliminated

- Regression introduced in v1.11.9 in the `HANDOFF_DEFER` protocol itself — HANDOFF_DEFER predates the v1.11.9 bump by ~10 days. v1.11.9 was the *target* of this handoff, not the source of the protocol.
- Poll always exits 2 — exit 2 only fires under `OWL_UNDER_WRAPPER=1` AND when `handoff_available()` returns Some.
- `wrapper-state.json` path mismatch (nested vs flat) — the writer uses flat (correct per the writer-stays-flat migration) and the reader uses the fresher-of-(nested, flat) resolver, so both sides resolve to the same flat file. Not the bug.
- `wrapper-state.json` serde schema mismatch — `#[serde(default)]` on `pulse_psyche` makes a v1.11.8-written state file readable by v1.11.9. Not the bug.
- DEPLOY.ps1 killed the new wrapper — the deploy script's `Stop-Process` only targets the OLD version dir's owl.exe processes (by image path lock); the v1.11.9 owl.exe just-spawned wouldn't match. Not the bug.
- Stdio-relay defect — wrapper handoff is `spawn_detached_no_inherit` (no relay, no chain), per Phase 18.4 D-06. Not the bug.
- Self-orphan false-positive — wrapper crashed BEFORE running any orphan check (didn't reach `WrapperState::new`). Not the bug.
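
The reader-side rule that eliminated the path-mismatch theory (nested-first, flat fallback, prefer the fresher mtime) can be sketched as follows. This is a hedged illustration, not `wrapper_state_path_resolved` itself: `Layout`, `pick_layout`, `mtime`, and `resolve_state_path` are hypothetical names, and sending ties to the nested layout is an assumption drawn from the "nested-first" wording.

```rust
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Which on-disk layout a candidate state file came from (hypothetical type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Layout {
    Nested,
    Flat,
}

/// Picks a layout from the two candidates' modification times.
/// A missing file is `None`. Ties go to nested (assumption).
fn pick_layout(nested: Option<SystemTime>, flat: Option<SystemTime>) -> Option<Layout> {
    match (nested, flat) {
        (Some(n), Some(f)) => Some(if f > n { Layout::Flat } else { Layout::Nested }),
        (Some(_), None) => Some(Layout::Nested),
        (None, Some(_)) => Some(Layout::Flat),
        (None, None) => None,
    }
}

/// Returns the file's mtime, or `None` if it is missing or unreadable.
fn mtime(path: &Path) -> Option<SystemTime> {
    std::fs::metadata(path).and_then(|m| m.modified()).ok()
}

/// Resolves which of the two candidate paths a reader should load.
fn resolve_state_path(nested: &Path, flat: &Path) -> Option<PathBuf> {
    pick_layout(mtime(nested), mtime(flat)).map(|layout| match layout {
        Layout::Nested => nested.to_path_buf(),
        Layout::Flat => flat.to_path_buf(),
    })
}
```

Under this rule the flat file written at 05:15:02 beats the stale nested file from 5/22 01:55:06, which is why the state path was ruled out.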

## Eliminated — overturned by operator clarification

- ~~Handoff stalled — new perch `info.json` written at same wall-clock second proves new listener took over.~~ **OVERTURNED**: info.json write proves new processes briefly *started*, not that they survived. Liveness check on 2026-05-23 shows pids 69036/71520/31564 all dead; the only live process related is `claude.exe.old.1779509296487` (pid 62240) — the parent Claude shell, not a wrapper.

## Resolution

**root_cause**: Commit `3616ed1` (shipped in v1.11.9) added `pulse_psyche: String` as a **required** clap positional on `Commands::PsycheWrapper` without a `default_value`, while older wrappers (≤ v1.11.8) build the wrapper-handoff argv with only 4 positionals (no `pulse_psyche` slot). When a v1.11.8 wrapper detects the v1.11.9 binary as a handoff target and execs `_psyche-wrapper <id> <period> <gen> <session_name>`, v1.11.9's clap rejects the invocation with `error: the following required arguments were not provided: <PULSE_PSYCHE>` and exits non-zero. Because `spawn_detached_no_inherit` discards stdio, the clap error message is invisible; because the crash happens before user code runs, no log line is written; because `info.json` was already (re)written by the still-alive inner-poll path before the wrapper crash, the perch *appears* claimed for one fsync window. The wrapper dies; on the next listener iteration the listener orphan-detects (its wrapper-managed parent is gone) and itself dies; both perches end up dead with stale info.json and a still-on-disk wrapper-state.json that was never consumed.

**fix (applied in `f105536`)**:

Add `#[arg(default_value = "0")]` to `pulse_psyche` in `src/cli.rs:196`:

```rust
PsycheWrapper {
    self_id: String,
    period: u64,
    gen: u32,
    session_name: String,
    /// 260523-648: pulse-psyche gate passed as positional `"1"`/`"0"`.
    /// default_value handles argv backward-compat: a pre-3616ed1 wrapper
    /// spawning this binary with only 4 positionals (no pulse_psyche slot)
    /// will land here with pulse_psyche="0". The on-disk wrapper-state.json
    /// then wins per D-07 (its #[serde(default)] populates pulse_psyche=false
    /// for state files written by v1.11.8 wrappers).
    #[arg(default_value = "0")]
    pulse_psyche: String,
},
```

Total surface: 1 line + 1 doc comment. No behavior change for v1.11.9 → v1.11.9 handoffs (the new wrapper still passes 6-arg shape). v1.11.8 → v1.11.9 handoffs now succeed: clap fills `pulse_psyche="0"`, dispatch arm parses it as `pulse_psyche = false`, `WrapperState::new` runs, `load_and_delete` reads `wrapper-state.json` and gets `pulse_psyche=false` from `#[serde(default)]`, state-file value wins per D-07 lifecycle.rs:42-48 — result: rehydrated wrapper has `pulse_psyche=false`, identical to what a fresh `$LIVE start` would produce by default.
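
The precedence described in the paragraph above (state-file value wins over argv; argv is only the cold-start fallback) can be sketched in a few lines. This is an assumption-laden illustration, not the code at `lifecycle.rs:42-48`: `parse_pulse_flag` and `effective_pulse_psyche` are hypothetical names, and treating every value other than `"1"` as `false` is an assumption about the dispatch arm.

```rust
/// Interprets the positional `"1"`/`"0"` flag.
/// Assumption: anything other than "1" is treated as false.
fn parse_pulse_flag(raw: &str) -> bool {
    raw == "1"
}

/// Resolves the effective `pulse_psyche` value for a starting wrapper.
/// `state_file_value` is `Some(..)` when a handoff state file was loaded
/// (with a missing field already defaulted to false), else `None`.
fn effective_pulse_psyche(argv_value: &str, state_file_value: Option<bool>) -> bool {
    match state_file_value {
        // Handoff path: the rehydrated state wins over whatever argv carried.
        Some(from_state) => from_state,
        // Cold start: no state file, so argv decides.
        None => parse_pulse_flag(argv_value),
    }
}
```

For the v1.11.8 to v1.11.9 handoff this gives `effective_pulse_psyche("0", Some(false)) == false`: the clap default and the serde default agree, so the argv default never changes behavior on the handoff path.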

**Regression test (added in `f105536` as `psyche_wrapper_5arg_argv_backward_compat`)**:

`tests/handoff_integration.rs`: assert that the patched binary's clap parser accepts the 5-arg legacy form `_psyche-wrapper <id> <period> <gen> <session_name>` and that the missing positional defaults to `"0"`. Implementation options: invoke the binary in a subprocess with the 5-arg form and `--help`-style introspection, or call the clap parser directly via `Commands::try_parse_from(...)` and assert `pulse_psyche == "0"`.

**Existing state to clean up manually**:
- `wrapper-state.json` at `C:\Users\decid\AppData\Local\spt\owlery\todlando-psyche\wrapper-state.json` (gen=30, never consumed) — would be picked up by the next live wrapper invocation. Leaving it preserves the session_uuid for rehydration when the user manually re-runs `$LIVE revive todlando`. No cleanup required.
- Stale `info.json` files for `todlando` (pid 71520 dead) and `todlando-psyche` (pid 31564 dead) — these will be naturally refreshed on next `$LIVE revive todlando`. No cleanup required.
- Orphan working-perch dirs `todlando-w78/w79/w80` — pre-existing, not related to this bug. Leave alone.

**Fix routing per `CLAUDE.md` GSD enforcement**: apply via `/gsd-quick`. Single atomic commit shape:
```
fix(handoff): GREEN — pulse_psyche clap default restores wrapper handoff argv backward-compat
```
Touched files: `src/cli.rs` (1 attribute), `tests/handoff_integration.rs` (1 new test). Build and deploy via `docs/DEPLOY.ps1` (bump patch — v1.11.10) so any *next* deploy from v1.11.9 → v1.11.10 carries the same argv shape (no change), AND any *currently-running* v1.11.8 wrapper still alive on a user's machine will now hand off correctly into v1.11.10. (Self's todlando wrapper is already dead; the user will need to `$LIVE revive todlando` once to bring it back on v1.11.10.)
