---
slug: echo-gate-poll-exit-sentinel-triad
status: partially-resolved
created: 2026-05-24T10:49:18Z
updated: 2026-05-25T18:00:00Z
trigger: user-report
goal: find_and_fix
tdd_mode: false
---

# Debug Session: ECHO gate / poll-exit / sentinel detection triad

## Symptoms

User reports three related symptoms in the echo-commune / ECHO gate / poll-exit machinery. All evidence is in real log files (lines quoted exact in user report).

### Symptom 1 — ECHO gate opens but Psyche never awoken with commune output

- **Log:** `C:\Users\decid\AppData\Local\spt\logs_latest\ling.log`
- **Bad cases:** line 127, line 152 — ECHO gate fires, no Psyche wake follows
- **Good case for comparison:** line 68 — ECHO gate fires AND Psyche IS awoken

### Symptom 2 — poll-exit error path: 3 loops then wrapper self-terminates

- **Log A:** `C:\Users\decid\AppData\Local\spt\logs_latest\ling.log` line 164
- **Log B:** `C:\Users\decid\AppData\Local\spt\logs_latest\todlando.log` line 300

### Symptom 3 — ECHO gate repeatedly reports "no sentinel" during active session

- **Log:** `C:\Users\decid\AppData\Local\spt\logs_latest\todlando.log`
- **Range:** line 232 to end of file (~80+ minutes wall time)

## Evidence

### Symptom 1 — bad vs good frame headers

GOOD (ling.log:63 → wake at line 68):
```
ECHO_COMMUNE ... self_session=0690a7a8-9b1d-... exit=0 stdout_bytes=2768 excerpt_bytes=43967 excerpt_lines=21 ...
```
→ next poll: `poll returned 1957 bytes` → `[MSG] from=<EVENT type="echo_commune"` → Psyche resume.

BAD A (ling.log:130 → no wake):
```
ECHO_COMMUNE ... self_session=077ed7d9-1905-... exit=0 stdout_bytes=1213 excerpt_bytes=4557 excerpt_lines=2 ...
```
→ next poll: `poll returned 40 bytes` (heartbeat) → `[PULSE] pulse-psyche=off — skipping resume`.

BAD B (ling.log:155 → no wake):
```
ECHO_COMMUNE ... self_session=0690a7a8-9b1d-... exit=0 stdout_bytes=1269 excerpt_bytes=0 excerpt_lines=0 ...
```
→ next poll: 40-byte heartbeat → no resume.

### Symptom 2 — DUPLICATE storm signature

ling.log:164 (`[00:26:42]` — three identical exits in the SAME SECOND):
```
poll exited code=1 stderr=⚠ DUPLICATE:ling-psyche already has an active listener (pid 17256)
poll returned empty stdout (consecutive_empty=1/3, re-checking ready file and re-arming)
poll exited code=1 stderr=⚠ DUPLICATE:ling-psyche already has an active listener (pid 17256)
poll returned empty stdout (consecutive_empty=2/3, re-checking ready file and re-arming)
poll exited code=1 stderr=⚠ DUPLICATE:ling-psyche already has an active listener (pid 17256)
poll returned empty stdout (consecutive_empty=3/3, ...)
poll returned empty stdout too many times in a row, exiting loop as safety backstop
wrapper loop exited, cleaning up
```

todlando.log:300 — identical pattern with pid 70464.

### Symptom 3 — sentinel-absent loop

After 00:21:38 (successful resume + auto-commit), wrapper enters 80+ minutes of heartbeats (every ~8 min):
```
[ECHO] gate idle: no sentinel present (awaiting next Stop)
[PULSE] pulse-psyche=off — skipping resume; echo gate fires next iteration
```
No `.more-done` sentinel ever materialises until the wrapper dies via Symptom 2 at 01:49:41.

## Hypotheses

### Cycle 1 (REJECTED BY USER 2026-05-24)

S1-H1 (Haiku emitted empty COMMUNE because excerpt was tiny) — REJECTED. User: "stdout_bytes=1213 is NOT empty; user was actively interacting."
S2-H2 (PID-recycle collision after handoff) — REJECTED. User: "PID recycling has never been a problem on Windows for months."
S3-H3 (cross-session perch attribution) — REJECTED. User: "owl_id resolution has been stable."

### Cycle 2 — Hypotheses after user pushback (preserved for history; see Cycle 3 evidence below for resolution status)

#### Symptom 1 — H1-v2 (HIGH-CONFIDENCE: Self session_uuid drift)

(See cycle-2 entries earlier in the doc preserved through cycle 3.)

## Cycle 3 — Investigation against the running system

### Evidence collected during cycle 3

Read `C:/Users/decid/AppData/Local/spt/owlery/ling/info.json` (post-handoff,
2026-05-24 00:18 boot time):

```json
{"owl_id":"ling","started":"2026-05-24 03:36:37 PDT","mode":"listen",
 "pid":65764,"session_id":"077ed7d9-1905-44f2-a183-69f9f69abf16",
 "state":"live","parent_pid":64000,
 "cwd":"C:/Users/decid/Documents/projects/claude_skill_owl",
 "project_history":[...]}
```

Read `C:/Users/decid/AppData/Local/spt/owlery/ling/nested/ling-psyche/info.json`:

```json
{"owl_id":"ling-psyche","started":"2026-05-24 00:18:42 PDT","mode":"listen",
 "pid":17256,"session_id":"0690a7a8-9b1d-43b2-bce5-4d3919737936",
 "state":"psyche","parent_pid":20844}
```

Cross-referenced both UUIDs against `~/.claude/projects/`:

| UUID | jsonl location |
|---|---|
| `0690a7a8-9b1d-...` | `~/.claude/projects/C--Users-decid-Documents-projects-claude-skill-owl/0690a7a8-...jsonl` (USER session) |
| `077ed7d9-1905-...` | `~/.claude/projects/C--Users-decid-Documents-projects-claude-skill-owl/077ed7d9-...jsonl` (USER session) |
| `df846631-b34d-...` | `~/.claude/projects/C--Users-decid-AppData-Local-spt-psyches-tracked/df846631-...jsonl` (PSYCHE session — recorded at ling.log:7 init) |

### Cycle-3 conclusion on Symptom 1

**Bug 1 as the user described it — "wrong-perch read: nested psyche
info.json instead of self's info.json (or vice versa)" — is NOT
present in the current code.** The wrapper at
`src/live/wrapper/echo_fire.rs:195` correctly reads
`owlery::info_file(&self.self_id)` which resolves to the FLAT Self perch
`owlery/ling/info.json` (verified via `src/common/owlery.rs:204` →
`perch_dir(id).join("info.json")`).

The two UUIDs flapping in the frame headers (`0690a7a8` and `077ed7d9`)
are BOTH foreground Claude Code session UUIDs — neither is the Psyche's
session UUID (`df846631`). The PSYCHE's nested info.json carries
`0690a7a8` (the wrapper's frozen-at-spawn `OWL_SESSION_ID`, leaked per
the documented behavior in `src/owl/cleanup.rs::run` lines 76-86); the
SELF's flat info.json carried `077ed7d9` at BAD A frame time and
`0690a7a8` at BAD B frame time.

The flap is consistent with two concurrent foreground Claude windows on
the same project both racing `refresh_info_json_session_id` patches
against the same Self perch info.json (Self's stored `parent_pid` field
serves as the match key in the `find_perch_by_session` →
`find_perch_by_parent_pid` fallback at
`src/common/hook_output.rs:144-153`). When the user opens / closes
multiple Claude windows in the same project cwd, each window's
SessionStart hook patches Self's `info.session_id` to its own UUID,
producing the observed oscillation.

### Cycle-3 conclusion on Symptom 2 (DUPLICATE storm — FIXED)

H2-v2 confirmed — the nested psyche perch info.json is rewritten by the
wrapper's `poll_psyche` subprocess on every iteration with a
~120-200s-lifetime PID. After the subprocess exits the recently-released
PID often gets recycled to another live process, the next iteration's
DUPLICATE guard trips on the recycled-pid alive-check, three consecutive
false positives trigger `MAX_CONSECUTIVE_EMPTY_EXITS=3`, wrapper exits.

**Fix shipped:** commit on main HEAD —
`fix(poll): wrapper-owned psyche skips DUPLICATE guard + writes minimal info.json`

Two behavioral changes in `src/owl/poll.rs::run` scoped to
`psyche && parent.is_some()` (the wrapper-owned psyche poll case spawned
by `src/live/wrapper/mod.rs::poll_psyche`):

1. Bypass the DUPLICATE LISTENER GUARD entirely (the wrapper is the sole
   owner of its nested psyche perch — there is no legitimate competing
   listener to detect).
2. Write a schema-minimal `{"pid": <int>}` info.json instead of the full
   `InfoJson`. Stripping `session_id` / `owl_id` / `parent_pid` / `state`
   stops the cross-session `OWL_SESSION_ID` inheritance leak documented
   in `src/owl/cleanup.rs::run` (lines 76-86).

`src/common/types.rs::get_pid_from_info` and `get_parent_pid_from_info`
were migrated from typed `InfoJson` deserialization to
`serde_json::Value::get(...)` so the new minimal shape parses cleanly.
Legacy full-shape info.json files continue to parse identically.

### Cycle-3 conclusion on Symptom 1 / Bug 1 user-directed wrong-perch fix

**No code change applied for the user-directed "wrong perch read/write"
hypothesis** — neither the grep audit
(`grep -rn "info_file" src/owl src/live src/common`) nor the on-disk
evidence supports a regression where one perch's info.json is read /
written in place of another's. Every production reader of
`info_file(self_id)` resolves to the FLAT Self path; every production
reader of psyche info.json resolves through `resolve_info_file` with
`ParentHint::Explicit(self_id)` to the NESTED path. Each is correct.

The observed `self_session` UUID flap is best explained by two foreground
Claude windows racing `refresh_info_json_session_id`, not by a wrong-perch
bug. A possible follow-up fix lives at
`src/common/hook_output.rs::find_perch_by_session` — its parent_pid
fallback unconditionally `patch_session_id`s any matching perch, which
makes Self's `info.session_id` non-monotonic across concurrent Claude
windows. Fixing it would require either (a) refusing to patch when the
existing session_id is still active in another Claude window, or
(b) widening Self's session_id field to a set of "currently-attached
sessions." Either is structural and out of scope for this debug session.

**Important secondary effect of Bug 2 fix:** with the psyche perch's
info.json now carrying no session_id, the cleanup.rs `state == Psyche`
skip becomes redundant-but-harmless (the `session_id` substring search
the function uses can no longer match a wrapper-owned psyche perch).
Bug 3 (signoff stamp) is structurally moot for the same reason.

### Sub-question from user

> "Why would it be checking for psyche's PID presence during a point in
> time where it has no business spawning the psyche?"

Answer: the wrapper is NOT spawning the psyche LLM here — the inner
`owl poll <psyche> listen --psyche --parent <self> --once` subprocess is
spawned EVERY wrapper iteration as the message-listener for the psyche
perch, completely independent of whether the psyche claude session itself
is being resumed. The "psyche respawning" framing is wrong — this is just
the wrapper's normal per-iteration message-drain cycle, which happens to
re-enter `poll::run` and therefore re-run the DUPLICATE guard each time.
The fix (skip DUPLICATE guard for own psyche) closes the symptom by
removing the spurious guard rather than by suppressing the per-iteration
spawn (which is structurally required for message delivery).

### Symptom 3 status post-Bug-2 fix

S3 (sentinel-absent loop) was NOT directly addressed by the Bug 2 fix.
Per cycle-2 hypothesis H3-v2, the most likely cause was tied to S1 (Self
session_id drift) — and since the structural source of that drift
(concurrent Claude windows patching info.json) was not addressed, S3
remains possible in future sessions. However, Bug 2's wrapper-survival
fix means the wrapper no longer dies after ~95 minutes of sentinel-absent
loops, so even if S3 recurs the wrapper will continue to fire on the next
legitimate Stop event.

## Resolution

| Bug | Status | Commit |
|---|---|---|
| Bug 1 (wrong-perch read/write) | NOT REPRODUCED — see Cycle-3 conclusion above | — |
| Bug 2 (DUPLICATE storm + schema-shrink) | FIXED | `fix(poll): wrapper-owned psyche skips DUPLICATE guard + writes minimal info.json` |
| Bug 3 (signoff stamp) | STRUCTURALLY MOOT post-Bug-2 | — (covered by Bug 2) |

Session status set to `partially-resolved`: Bug 2 ships; Bug 1's
structural follow-up (multi-Claude-window race on
`patch_session_id`) is documented above and deferred per user
instruction (no version bump, no deploy this round).

## Cycle 4 — "fix did not work" re-report (2026-05-25): DEPLOY GAP, not regression

**User re-report:** `doyle.log:75` — wrapper hit the identical DUPLICATE
storm and self-terminated. "Previous fix apparently did not work."

**Verdict: the Bug-2 fix is correct and was never the problem in Cycle 4 —
doyle ran a binary built two hours BEFORE the fix commit.** Pure deploy
timing artifact. No code change required.

### Evidence

Source confirmed correct (fix present, not regressed):
- `src/owl/poll.rs:140` — `let wrapper_owned_psyche = psyche && parent.is_some();`
  and line 141 guard excludes it. Intact.
- `src/live/wrapper/mod.rs:1527-1540` — inner poll argv passes
  `poll <psyche> listen --psyche --parent <self> ...`, so at the failing
  call site `psyche=true && parent=Some` → `wrapper_owned_psyche=true` →
  guard bypassed. Unit test `poll_psyche_argv_contains_once_flag`
  (mod.rs:2372) asserts `--parent` at args[4]. (H2 "argv missing --parent"
  REJECTED.)

Binary timeline (the actual root cause):

| Artifact | Built / committed | Contains Bug-2 fix? |
|---|---|---|
| fix commit `6342852` | 2026-05-24 **05:47:51** | — (source of fix) |
| deployed `spt/1.11.15/owl.exe` | 2026-05-24 **03:36:32** | **NO** — 2h older than fix |
| deployed `spt/1.11.16/owl.exe` | 2026-05-25 **19:04:18** | YES (`6342852` is ancestor of HEAD) |

doyle gen=7 launched 2026-05-25 **17:15:58**, stormed + died **17:40:44** —
both BEFORE 1.11.16 was built (19:04). So doyle ran **1.11.15**, whose
binary predates the fix by two hours. The fix had been committed but the
binary was not rebuilt/redeployed until 1.11.16 today, *after* doyle was
already dead.

Active version now (`installed_plugins.json`) = **1.11.16** → the next
doyle launch picks up the guard-bypass. Bug 2 is closed in the running
binary as of the 1.11.16 deploy.

### Cycle-4 conclusion on Symptom 2 / Bug 2

| Bug | Status | Resolution |
|---|---|---|
| Bug 2 (DUPLICATE storm) | RESOLVED IN DEPLOYED BINARY | Fix `6342852` correct; reached the runtime only via the 1.11.16 deploy (2026-05-25 19:04). 1.11.15 (doyle's binary) predated the fix. |

**No further code change.** Empirical confirmation pending: relaunch a live
agent on 1.11.16 and confirm the wrapper survives a DUPLICATE-pid poll exit
without incrementing `consecutive_empty`.
