---
status: resolved
trigger: "/spt:live no-arg regression — agent emits old numbered-pick + sleep 10 timer instead of AskUserQuestion / pick-spec flow"
created: 2026-05-11T05:50:00Z
updated: 2026-05-11T05:55:00Z
---

## Current Focus

hypothesis: Failing session loaded SKILL.md from `cache/cplugs/spt/1.9.7/skills/live/` — a stale cache directory containing the **pre-c69a7bc** ID Recollection prose. `installed_plugins.json` pins `installPath` to 1.9.8, but Claude Code's command/skill resolver landed on 1.9.7 anyway (likely the trampoline → cached resolver fallback). Deploy script never wiped 1.9.7's `skills/`, only its `owl.exe` survived in keep-set logic.
test: Read the failing-session JSONL and inspect the actual injected skill body; confirm the literal Base directory path and the embedded ID Recollection text.
expecting: System prompt shows `Base directory: ...1.9.7\skills\live` AND the old "Read .claude/LIVE_AGENT_IDS.json" + "sleep 10 [ID SELECT TIMER]" prose verbatim.
next_action: Confirmed both. Root cause = stale 1.9.7/skills cache being preferred by CC resolver.

## Symptoms

expected: `/spt:live` (no arg) calls `$LIVE pick-spec`, dispatches on `kind`, fires `AskUserQuestion` for `pick`/`prompt-new`/Cancel paths. No `sleep` timer.
actual: Agent runs `$LIVE list`, then **emits prose** `"Active in project: doyle (live). Available IDs (offline, reusable): 1. todlando 2. deployah 3. doyle2. Pick number, or new ID. Auto-select todlando in 10s."`, then runs `sleep 10 && echo TIMER_FIRED` in background, auto-picks `todlando`, runs `$LIVE start todlando`. Never invokes `$LIVE pick-spec`. Never fires `AskUserQuestion`.
errors: (no error — wrong behavior)
reproduction: Fresh CC session opened in `claude_skill_owl` repo after v1.9.9 deploy + `/reload-plugins`. Failing session JSONL: `C:/Users/decid/.claude/projects/C--Users-decid-Documents-projects-claude-skill-owl/6ce0463d-a7c5-4733-8b8b-a8a965f1db32.jsonl`, session start `2026-05-11T05:21:16.887Z`, CC version `2.1.138`.
started: After v1.9.9 deploy via `docs/DEPLOY.ps1 -Bump patch` at ~2026-05-10 20:38-20:39 PDT (= 03:38-03:39 UTC on 5/11). Phase 26-06 atomic commits a03a780/3ec1f53/6264ced/bf13aeb landed in repo at 20:32-20:33 PDT.

## Eliminated

- hypothesis: Agent hallucinated the old flow (H4 in trigger).
  evidence: JSONL contains the literal pre-c69a7bc prose in the *skill body injected as user message*. `"ID Recollection\r\n\r\nWhen the user says /spt:live with **no ID**:\r\n1. Read .claude/LIVE_AGENT_IDS.json from the project root..."` — verbatim match to `git show c69a7bc^:plugin/spt/skills/live/SKILL.md` lines 102-110. Zero `pick-spec` mentions in the entire 137KB session log. Agent was following instructions, not hallucinating.
  timestamp: 2026-05-11T05:51:00Z

- hypothesis: Project-level `.claude/settings.json` in `traceable-reqs` is overriding plugin resolution (H2 sub-hypothesis).
  evidence: (a) `traceable-reqs/.claude/` exists but contains ONLY `LIVE_AGENT_IDS.json` — no `settings.json`. (b) The failing session was NOT in `traceable-reqs` — JSONL `cwd: C:\Users\decid\Documents\projects\claude_skill_owl`. The original trigger description misidentified the session repo.
  timestamp: 2026-05-11T05:52:00Z

- hypothesis: Session pre-dates Phase 26 deploy and held an old cached SKILL.md in memory (H3).
  evidence: Session start timestamp `2026-05-11T05:21:16Z` (= 2026-05-10 22:21 PDT) is ~1h42m AFTER the 1.9.9 deploy (20:39 PDT). Session was fresh. But the SKILL.md *content* injected was pre-c69a7bc — so the SKILL.md was stale, not the session. H3 wrong about mechanism, but the symptom (loading old content) is real.
  timestamp: 2026-05-11T05:52:30Z

- hypothesis: All on-disk SKILL.md copies have the new flow, so the old prose must come from somewhere unconventional (premise of H2).
  evidence: Premise is **false**. 1.9.7 cache directory at `cache/cplugs/spt/1.9.7/` still existed at session start time. Its `skills/` subdirectory has since been removed (current `ls` fails), but the JSONL "Base directory" line proves it existed at 05:21:16Z. The original `grep -rln "Available IDs (offline, reusable)"` returning zero hits was a snapshot taken AFTER `skills/` was deleted.
  timestamp: 2026-05-11T05:53:00Z

## Evidence

- timestamp: 2026-05-11T05:48:00Z
  checked: `cat C:/Users/decid/.claude/plugins/installed_plugins.json`
  found: `spt@cplugs` entry has `version: "1.9.8"`, `installPath: "C:\\Users\\decid\\.claude\\plugins\\cache\\cplugs\\spt\\1.9.8"`, `gitCommitSha: 64b8ccdea511`, `lastUpdated: 2026-05-11T01:57:23.957Z`. **Pointer did NOT advance to 1.9.9 despite the deploy.**
  implication: `claude plugin install spt@cplugs` (the script's final step) returned "already installed" without rewriting the version. CC's CLI does not re-resolve when the same `spt@cplugs` marketplace ref already has any version installed. This is a known no-op on the `claude` side, not a deploy bug per se.

- timestamp: 2026-05-11T05:49:00Z
  checked: `grep "Base directory for this skill" <session-jsonl>`
  found: Skill body injected at message 9 (response to `/spt:live` command) begins `"Base directory for this skill: C:\\Users\\decid\\.claude\\plugins\\cache\\cplugs\\spt\\1.9.7\\skills\\live\n\n# /spt:live\r\n..."`. The entire body is the pre-c69a7bc SKILL.md text, including `1. Read \`.claude/LIVE_AGENT_IDS.json\`...4. Start \`sleep 10\` background task with description \`[ID SELECT TIMER]\`...`.
  implication: **Claude Code resolved the spt:live skill against 1.9.7's `skills/live/SKILL.md`, NOT 1.9.8 (the version pinned in installed_plugins.json) and NOT 1.9.9 (latest cached).** This is the proximate root cause.

- timestamp: 2026-05-11T05:50:00Z
  checked: `stat C:/Users/decid/.claude/plugins/cache/cplugs/spt/1.9.7/` and contents listing
  found: 1.9.7 dir birth `2026-04-30 16:31`, last Modify `2026-05-10 20:39:28` (= deploy time). Current contents: ONLY `owl.exe` survives. `skills/` and `hooks/` were removed during deploy. No SKILL.md exists at the path the failing session's prompt declares as Base directory.
  implication: At deploy time, the keep-set was `{1.9.9, 1.9.8, prevInstallPath-leaf}`. 1.9.7 was a prune candidate — `Remove-Item -Recurse -Force` ran against it but **could not fully delete** because `owl.exe` was likely in use (in-flight handoff target). The cleanup partially succeeded: `skills/` + `hooks/` got removed, `owl.exe` did not. So the directory existed but had no skills at session start AS WELL. **However**, CC may have started its skill resolution before the prune completed, OR CC's command-name → skill-path resolver retains a stale path mapping from before the prune.

- timestamp: 2026-05-11T05:51:30Z
  checked: `git show c69a7bc^:plugin/spt/skills/live/SKILL.md` (pre-rewrite version)
  found: ID Recollection section (lines 102-110): `1. Read \`.claude/LIVE_AGENT_IDS.json\` from the project root...2. Check active IDs via \`/spt:list-live\` 3. Present unused IDs as a numbered list 4. Start \`sleep 10\` background task with description \`[ID SELECT TIMER]\` 5. **Race**: user responds before timer -- use their choice; timer fires -- auto-select topmost unused ID`. Exact match (post-paraphrasing) to the agent's observed output.
  implication: Confirms the agent's "Available IDs (offline, reusable)... Auto-select todlando in 10s... sleep 10" is the LLM's natural paraphrase of these 5 numbered steps from the pre-c69a7bc SKILL.md. Not a hallucination.

- timestamp: 2026-05-11T05:52:00Z
  checked: 1.9.8 and 1.9.9 SKILL.md content via `wc -l` and `head`
  found: 1.9.8 SKILL.md = 187 lines, has c69a7bc rewrite (pick-spec/AskUserQuestion present) but lacks 26-06 atomic patches (no "Cancel handling" subsection, no "forced picker rule", D7 row uses old wording). 1.9.9 SKILL.md = 205 lines, has all 26-06 patches.
  implication: Even if the resolver had picked 1.9.8, the behavior would be ~90% correct (would call pick-spec, would fire AskUserQuestion). Cancel-handling edge cases would be wrong but the catastrophic `sleep 10` race would not appear. The only way to get the observed failure is loading from 1.9.7 (or an even earlier cache).

- timestamp: 2026-05-11T05:53:00Z
  checked: Marketplace clone `~/.claude/plugins/marketplaces/cplugs/plugins/spt/skills/live/SKILL.md`
  found: Contains full 26-06 patched content (205 lines, AskUserQuestion + pick-spec + Cancel handling). Marketplace is correct.
  implication: Source-of-truth is good. The bug is exclusively in how CC selects WHICH cache version to load skills from.

- timestamp: 2026-05-11T05:54:00Z
  checked: DEPLOY.ps1 keep-set logic (lines 494-535)
  found: Keep-set = `{$Version (new), $prevVersion (from installed_plugins.json), $prevInstallPath-leaf}`. When deploying 1.9.9 with `installed_plugins.json` showing prev=1.9.8 installPath=`...\1.9.8`, keep-set = `{1.9.9, 1.9.8, 1.9.8}` = `{1.9.9, 1.9.8}`. 1.9.7 is a prune candidate. `Remove-Item -Recurse -Force` runs against it.
  implication: 1.9.7 SHOULD have been fully removed. The fact that `owl.exe` survives indicates the recursive delete hit a sharing violation (running process). Per DEPLOY.ps1 line 528: `"Could not remove ... (likely sharing violation -- owl.exe still executing from it; next deploy will retry)"`. **Deploy script's partial-cleanup-on-locked-file behavior left a half-pruned 1.9.7 directory that CC then picked up as still-valid skill source.**

- timestamp: 2026-05-11T05:54:30Z
  checked: `~/.claude/settings.json` env block
  found: `OWL` and `LIVE` env vars hardcoded to `cplugs/spt/1.9.8/owl.exe`. These were written by a SessionStart hook in some earlier session — but the failing session's SessionStart wrote `HANDOFF: trampoline -> ...\1.9.8\owl.exe` (per JSONL line 3). The binary path is 1.9.8, but the SKILL path is 1.9.7. **They diverged.**
  implication: CC's command-name resolver for `/spt:live` is NOT consulting `installed_plugins.json` `installPath`. It's using some independent mechanism — most likely directory-scan-then-version-sort, or a stale internal cache mapping `spt:live` → 1.9.7 path captured before 1.9.8/1.9.9 even existed.

## Resolution

root_cause: **Three-way version desynchronization caused by partial cleanup of obsolete cache version.**

  1. The marketplace clone, source, and 1.9.9 cache all correctly contain the post-26-06 SKILL.md with `AskUserQuestion`/`pick-spec` flow.
  2. `installed_plugins.json` still pins `spt@cplugs` to 1.9.8 because `claude plugin install spt@cplugs` is a no-op when ANY version of `spt@cplugs` is already installed — it does not rewrite the version pointer. The `lastUpdated` field gets touched, but `version`/`installPath`/`gitCommitSha` do not.
  3. The 1.9.7 cache directory survived the deploy's targeted prune: `Remove-Item -Recurse -Force` failed on the in-use `owl.exe` (sharing violation), but successfully deleted `skills/` and `hooks/`. The deploy script logs this as a warning and continues. Result: a "half-pruned" 1.9.7 directory.
  4. Despite `installed_plugins.json` pointing to 1.9.8, **CC's internal skill-name resolver loaded `spt:live`'s SKILL.md from the 1.9.7 path** at session start `2026-05-11T05:21:16Z`. The exact resolver mechanism is opaque (CC 2.1.138 internal), but the JSONL "Base directory for this skill" line proves it. Almost certainly an in-memory or on-disk skill-path cache CC built before the deploy and never invalidated by `/reload-plugins`.
  5. At that moment 1.9.7's `skills/live/SKILL.md` must have still existed (CC successfully loaded it). The `skills/` directory was deleted *during* the deploy at 20:39 PDT, but the file may have been read into a CC cache at session-start time and the directory deleted afterward — OR (more likely) CC had a stale skill-path → file-content cache populated from a prior session and never re-checked the filesystem.

  The cascading mistakes:
  - `claude plugin install spt@cplugs` no-op on "already installed" hides version drift from the user.
  - Deploy script's partial-cleanup-on-sharing-violation leaves stale `skills/` content discoverable for any CC session that built its skill cache before the deploy.
  - `/reload-plugins` apparently does not invalidate the skill-name → path cache (it touched `installed_plugins.json` lastUpdated at 01:57:23Z but did not flip version, and did not redirect skill resolution).

  Diagnosis: H1 (stale pointer) is partially correct; H3 (stale in-memory skill) is the right mechanism but with a stale on-disk cache as the source, not a stale session.

fix: **Immediate (this regression):**
  1. Force-flip `installed_plugins.json` to 1.9.9. Edit `C:\Users\decid\.claude\plugins\installed_plugins.json` → set spt@cplugs `version: "1.9.9"`, `installPath: "C:\\Users\\decid\\.claude\\plugins\\cache\\cplugs\\spt\\1.9.9"`, `gitCommitSha: <hash of current main>`, `lastUpdated: <now>`.
  2. Fully remove the 1.9.7 directory: `Remove-Item -Recurse -Force C:\Users\decid\.claude\plugins\cache\cplugs\spt\1.9.7`. If owl.exe sharing violation re-occurs, terminate any running owl.exe processes whose path resolves into 1.9.7 first (`taskkill /F /IM owl.exe` won't be selective; safer: enumerate via `Get-Process | Where-Object Path -like "*\1.9.7\*"`).
  3. In CC: run `/reload-plugins` AGAIN in the affected session (and any other live sessions). If the resolver-cache theory is correct, this should pick up 1.9.9's SKILL.md.
  4. If `/reload-plugins` still doesn't fix it, close + restart the Claude Code app (cold start invalidates any in-process skill cache).

  **Permanent (preventing recurrence) — deploy-script hardening:**
  - Before prune, kill any owl.exe processes running from prune candidates. PowerShell snippet: `Get-Process owl -ErrorAction SilentlyContinue | Where-Object { $_.Path -like "$CacheRoot\*" -and (Split-Path -Leaf (Split-Path -Parent $_.Path)) -in $pruneCandidates.Name } | Stop-Process -Force`.
  - After prune, verify each pruned directory is gone; if any remain (sharing violation), at minimum delete the `skills/` and `hooks/` subdirs of the keep-set's *previous* version too — so even a stale resolver pointing into an older version finds nothing rather than stale content. Better: rename the half-pruned dir to `.orphan-<timestamp>` so the directory name no longer matches CC's semver scan.
  - Replace the `claude plugin install spt@cplugs` no-op with an explicit installed_plugins.json patch that writes the new version, installPath, gitCommitSha, and lastUpdated atomically. The deploy already builds `$plugCacheVer` and `$Sha`; use them. (Re-introducing the JSON-patch step that DEPLOY comment line 578-581 says was previously replaced — but this time without PowerShell's JavaScriptSerializer, e.g. using `ConvertFrom-Json` / `ConvertTo-Json -Depth 10`.)
  - Bump the plugin version source-of-truth check: after deploy, verify `installed_plugins.json[spt@cplugs].version == $Version`. If not, fail loud — don't let drift hide.

verification: Manual — run `/spt:live` in a fresh CC session after applying the immediate fix and verify it calls `$LIVE pick-spec` and fires `AskUserQuestion` (or `ROOT CAUSE FOUND` if user instructs no-fix).

files_changed: [] (diagnosis-only; no code changed by debugger)

## Specialist Hint

`general` — root cause is filesystem state + deploy-script behavior, not language-specific. Fix involves PowerShell deploy hardening and a manual JSON edit. No Rust/TypeScript/Swift expertise needed.
