---
slug: sync-setup-data-loss
status: resolved
resolution: "Issues 1-2 fixed in Phase 35.2 (data-loss + idempotency); Issues 3-6 fixed in Phase 35.3 (error Display, doctor partial-state Warn row, exit-1 doc, inline recovery doc). Verified 2026-05-29."
resolved_in: [35.2, 35.3]
trigger: |
  Six interconnected failures in the `psyche-sync-setup` skill when attaching a second machine
  to an existing agent repository. Reported via https://paste.rs/47EE5.

  Issue 1 (HIGH — Data Loss Risk): Setup uses force-push semantics for agent branches while
  rejecting `main` with normal semantics. Agent branches reported as `[new branch]`
  (indicating force behavior) despite remote copies existing at different SHAs. Reporter's
  local agents had last-run timestamps from the first machine (GRAVITY-RUNNER), meaning the
  remote `a-adder` / `a-slammie` heads held GRAVITY-RUNNER's most recent writes — and they
  were just overwritten.

  Issue 2 (MEDIUM — Idempotency): Each machine generates an independent empty-seed commit on
  `main`, causing a non-fast-forward rejection when the second machine attempts to push. Setup
  creates the divergence it then refuses to resolve.

  Issue 3 (MEDIUM — Recovery Path): No documented recovery for exit code 1; skill docs
  cover codes 0/2/3/4/5 only. Manual workaround requires `git update-ref` against the bare repo.

  Issue 4 (LOW — Error Format): Error messages leak Rust `Debug` output
  (`GitFailed(Nonzero { stderr: "..." })`) rather than user-facing display formatting.

  Issue 5 (LOW — Documentation Gap): Exit code 1 absent from skill docs.

  Issue 6 (LOW — Silent Partial State): Doctor reports "sync: not configured" even after
  five successful branch pushes, obscuring partial success state.

  Reporter recommends: address Issue 1 first (data loss), then Issue 2 (safe second-machine
  attach), then bundle 3–6 into a UX pass.
created: 2026-05-28
updated: 2026-05-28
priority: HIGH
---

# Debug Session: sync-setup-data-loss

## Symptoms

- **Expected:** `psyche-sync-setup` on a second machine should non-destructively attach to the existing remote agent repo, preserving prior-machine head commits on agent branches.
- **Actual (as reported):** Force-push overwrites remote agent branches (`a-adder`, `a-slammie`, …) with local-machine SHAs, destroying first-machine writes. The `main` push is rejected as non-FF because each machine has its own independent empty-seed commit.
- **Errors:**
  - Branch push reports `[new branch]` despite remote presence at divergent SHAs (force semantics)
  - `main` push rejected non-fast-forward
  - Error surface leaks Rust Debug repr: `GitFailed(Nonzero { stderr: "..." })`
  - Exit code 1 from skill — undocumented (docs cover 0/2/3/4/5)
- **Timeline:** Reproduces on second-machine attach to a pre-existing agent storage repo. Not a regression of a previously working flow; appears to be an initial-implementation defect.
- **Reproduction:**
  1. Run `/spt:psyche-sync-setup` on machine A — creates remote, pushes branches.
  2. Operate agents on machine A so remote agent-branch heads advance.
  3. Run `/spt:psyche-sync-setup` on machine B with existing remote — observe force-overwrite of agent branches + main non-FF rejection.

## Current Focus

reasoning_checkpoint:
  hypothesis: |
    `accept_flow` in `src/common/sync.rs` performs zero reconciliation against pre-existing remote state before its single seeding push. `ensure_seed` in `src/common/tracked.rs` produces non-deterministic per-machine root commits on `main`. Together these create the data-loss surface (Issue 1) and the main-divergence non-FF (Issue 2). No `--force` is in use anywhere — the reporter's force-push framing is incorrect at the flag level but correct about the impact surface.
  confirming_evidence:
    - "sync.rs:561-568 — single `git push --all origin` with no --force and no prior fetch/clone"
    - "sync.rs:489-515 — gh repo create idempotency on 'already exists' falls through with NO read of existing remote refs"
    - "tracked.rs:238-250 — commit-tree bootstrap takes no -p parent and no pinned GIT_AUTHOR_DATE/GIT_COMMITTER_DATE, yielding a fresh wall-clock SHA per machine"
    - "tracked.rs:418-431 — agent branches `a-{id}` are created from local seed's `main`, rooting them in the machine-local empty bootstrap"
    - "Grep of src/ for force/+refs returns zero matches in push paths — only force_kill_process (unrelated)"
  falsification_test: |
    If reconciliation existed: there would be a `git fetch origin` call before the `push --all` in accept_flow, or an `origin/{branch}` rev-parse comparison loop. None exists.
    If determinism existed: the `commit-tree` invocation would carry GIT_COMMITTER_DATE / GIT_AUTHOR_DATE environment overrides (git has no config key that pins commit dates). None present.
  fix_rationale: |
    Issue 1 — insert `git fetch origin` before push; replace `--all` with per-branch refspec push; reconcile divergent refs via `rebase -X theirs origin/{branch}` (matches pull_branch policy) before pushing. Per-ref outcome observability lets the dispatcher report which refs were rescued vs which still need user attention.
    Issue 2 — lock the bootstrap commit date (option A) so both machines' `tracked/seed/` converge on the same `main` SHA. Alternative (option B) is pull-then-push of `main` during accept_flow.
    Issue 4 — flip `{:?}` to `{}` in psyche_sync_setup.rs:70; Display impl already correct.
    Issue 6 — doctor surface needs evidence beyond `SyncState`; check for origin-remote configured + ls-remote ok when state=Unset.
  blind_spots:
    - "Have not reproduced the reporter's `[new branch]` literal output. Plausible that GitHub's per-repo policies or branch-protection state explain it; have not validated."
    - "Have not measured whether the push --all in step 5 actually produces a partial success on real GitHub for this exact divergence pattern. Hypothesis: per-ref non-FF rejection with an overall non-zero exit; not verified."
    - "Option A (deterministic bootstrap) requires migration of existing installs with wall-clock bootstrap commits — that surface not yet designed."
    - "Did not check whether `gh auth setup-git` (called Step 2) sets any push config that could alter behavior."
next_action: |
  Diagnosis complete. Diagnose-only mode — do not apply fix. Return ROOT CAUSE FOUND report to parent. Fix work should be queued as a Phase plan via `/gsd:plan` covering Issue 1 + Issue 2 as primary, Issues 3-6 as UX-pass bundle.

## Evidence

- timestamp: 2026-05-28 (investigation pass)
  checked: `plugin/spt/skills/psyche-sync-setup/SKILL.md` (declared flow)
  found: Skill body delegates to `$OWL psyche-sync-setup`. Mentions "seeds the remote with `git push --all`". Skill is `argument-hint: "[--disable]"`, no `--force`/`--mirror` knob; exit-code surface documents 0/2/3/4/5 with no 1 entry.
  implication: All real push semantics live in the binary, not the skill. Recovery doc gap (Issue 5) confirmed at skill layer.

- timestamp: 2026-05-28
  checked: `src/owl/psyche_sync_setup.rs::run` (dispatch)
  found: Calls `sync::accept_flow(&user)`. On `Err(_)` other than `ScopeFallbackToBrowser`, runs `eprintln!("sync setup failed: {:?}", e); std::process::exit(1);` (lines 69-72).
  implication: Confirms Issue 4 (Debug repr leak — `{:?}` on `SyncError`, which wraps `GitError::Nonzero { stderr }`, surfacing the literal Rust struct syntax to users). Confirms Issue 5 (exit code 1 is the generic accept_flow failure, not in SKILL.md docs).

- timestamp: 2026-05-28
  checked: `src/common/sync.rs::accept_flow` (steps 1-6, lines 482-600)
  found: Step 5 (line 561-568) issues `git -C {seed} push --all origin` with NO `--force` flag, NO refspec, NO pre-fetch, NO pre-clone. The ONLY push in the setup path. Step 1 (line 489-515) handles `gh repo create` and explicitly FALLS THROUGH on "already exists" stderr (line 512: idempotency for repo creation). There is NO step that clones, fetches, pulls, or otherwise reconciles the local seed against pre-existing remote refs before push.
  implication: ROOT CAUSE for Issue 1 + Issue 2. The pre-existing remote state is completely ignored. Setup assumes local seed is the authoritative source and seeds the remote from scratch — which is true for first-machine attach but catastrophic for second-machine attach.

- timestamp: 2026-05-28
  checked: `src/common/tracked.rs::ensure_seed` (cold-path bootstrap, lines 198-260)
  found: Bare seed is initialized via `git init --bare`, then `main` is created by composing a fresh empty tree + `commit-tree` with bootstrap identity `spt-bootstrap <spt@local>`, message `"init: tracked seed"`, then `update-ref refs/heads/main {commit}` + `symbolic-ref HEAD refs/heads/main`. The commit-tree call (line 238) takes no parent (`-p`), so it is a root commit. Author and committer dates default to wall-clock: no `GIT_AUTHOR_DATE` / `GIT_COMMITTER_DATE` override is set (`commit-tree` has no `--date` flag).
  implication: ROOT CAUSE for Issue 2. Each machine's bootstrap commit on `main` has a different committer timestamp → different SHA. Two machines' `main` branches are unrelated histories (no common ancestor). `git push origin main` from machine B is non-fast-forward by construction.

- timestamp: 2026-05-28
  checked: `src/common/tracked.rs::ensure_worktree` first-pass (lines 410-431)
  found: Agent/project worktree creation runs `git worktree add ../{scope}/{name} -b {branch} main`. The new agent branch `a-{id}` is created from local seed's `main` HEAD. On machine B, that HEAD is machine-B's empty bootstrap commit. Subsequent writes (commune/signoff) commit on top of THAT root.
  implication: ROOT CAUSE for Issue 1 (mechanism). Machine B's `a-adder` branch is rooted in a different empty commit from machine A's `a-adder`. They share NO history. Any `push origin a-adder` from machine B is non-FF vs the remote machine-A history.

- timestamp: 2026-05-28
  checked: `src/common/sync.rs::push_branch` (post-commit path, lines 311-319)
  found: `git -C {wt} push origin {branch}` — also NO `--force`. Used by `sync_after_commit`, which is post-Enabled. Both push call sites (setup `push --all` and runtime `push_branch`) are non-force.
  implication: NO code path applies `--force` or `+refspec`. Reporter's claim of "force-push for agent branches" is technically inaccurate at the flag level, but the OBSERVED data-loss risk is real via a different mechanism. Whatever `[new branch]` the reporter saw must come from one of: (a) git's local view (no `refs/remotes/origin/{br}` cache) labelling the push as "new" even though the server has a ref (unlikely, since `git push` derives its per-ref status from the refs the server advertises during the push, not from local remote-tracking refs); (b) the remote `a-*` refs were transiently absent (e.g., manual recovery between attempts); (c) misread of partial output. Substantively the bug stands: setup unconditionally pushes local heads at a divergent remote, and the only thing keeping data alive is git's default refusal of non-fast-forward updates. That protection is fragile, not designed.

- timestamp: 2026-05-28
  checked: `src/common/sync.rs::SyncError` Display impl (lines 400-413) vs dispatch error print (psyche_sync_setup.rs:70)
  found: SyncError HAS a thoughtful Display impl ("git push failed during seeding: {}", etc.) but the dispatch uses `{:?}` (Debug), bypassing it entirely.
  implication: The Issue 4 fix is a two-character edit: `{:?}` → `{}`. The Display impl is already correct and user-facing.

- timestamp: 2026-05-28
  checked: `src/owl/doctor.rs::check_sync_status` (line 1130-1145)
  found: Doctor sync surface keys ONLY off `SyncSettings.state`. If `accept_flow` succeeds at push but fails at `write_sync_settings` (Step 6 line 597), state stays `Unset` → doctor reports "not configured" despite a fully-seeded remote.
  implication: Issue 6 root cause — doctor cannot detect partial-success states because the "configured" signal and the "remote populated" reality have no shared evidence path. The settings.json write is the sole ground truth.
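A minimal, self-contained illustration of the Issue 4 evidence: the same error value rendered through `{:?}` and through `{}`. The `GitError` and `SyncError` types below are hypothetical stand-ins that model only the `Nonzero { stderr }` shape quoted in the report. The Display wording reuses the `"git push failed during seeding: {}"` format noted above. `render` is an illustrative helper, not the dispatcher's real code.

```rust
use std::fmt;

/// Hypothetical stand-in for the real git error type: only the
/// `Nonzero { stderr }` shape quoted in the report is modelled.
#[derive(Debug)]
pub enum GitError {
    Nonzero { stderr: String },
}

/// Hypothetical stand-in for `SyncError`, reduced to one variant.
#[derive(Debug)]
pub enum SyncError {
    GitFailed(GitError),
}

impl fmt::Display for GitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Show git's own stderr text, minus trailing whitespace.
            GitError::Nonzero { stderr } => write!(f, "{}", stderr.trim_end()),
        }
    }
}

impl fmt::Display for SyncError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SyncError::GitFailed(e) => write!(f, "git push failed during seeding: {}", e),
        }
    }
}

/// Formats the dispatcher's error line. `use_debug` selects `{:?}` (the
/// current behaviour) instead of `{}` (the proposed fix).
pub fn render(e: &SyncError, use_debug: bool) -> String {
    if use_debug {
        format!("sync setup failed: {:?}", e)
    } else {
        format!("sync setup failed: {}", e)
    }
}
```

With `use_debug = true`, the output has the `GitFailed(Nonzero { stderr: "..." })` shape from the report. With `false`, only the human-readable message remains.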

## Eliminated

- hypothesis: "Setup uses explicit `--force` flag for agent branches and plain push for main."
  evidence: Grepped all of `src/` for `force` / `+refs` / `--force` / `push.default`. Zero force semantics anywhere in either `accept_flow` setup push or `push_branch` runtime push. Only matches are `force_kill_process` (unrelated, subprocess management).
  timestamp: 2026-05-28
  note: Reporter's "force-push" framing is incorrect at the technical level. The actual mechanism still produces a real, serious data-loss surface: divergent histories, no pre-fetch, and a single `push --all` whose per-ref outcomes collapse into one exit code. The fix shape is therefore different from what "remove --force" would suggest.

## Resolution

### root_cause (Issue 1 — Data Loss Risk on second-machine attach)

`src/common/sync.rs::accept_flow` performs zero remote-state reconciliation before its single seeding push. Concretely:

1. Step 1 (line 489-515): `gh repo create` succeeds OR is treated as idempotent when stderr contains "already exists". In the "already exists" branch (second-machine attach), code falls through to Step 4-5 with NO action taken against the existing remote content.
2. Step 4 (line 534-559): Wires `origin` URL on every local worktree.
3. Step 5 (line 561-568): `git -C {seed} push --all origin` — pushes every local ref. NO prior `git fetch origin`, NO `git clone`, NO ref comparison.

For a second-machine attach where the remote already holds first-machine writes on `a-{agent_id}` branches and on `main`, the local seed on machine B has independently bootstrapped histories (see Issue 2 mechanism). The push attempt is non-FF for every ref present on both sides. The outcome depends on whether anything forces the update:

- Without `--force` (the current code), git refuses any remote ref update that is not a fast-forward. The pushing client enforces this by default, and server-side `receive.denyNonFastForwards` or GitHub branch protection additionally reject forced updates. `git push --all` is per-ref, so refs present only on the client side succeed while divergent refs are rejected. The overall exit code is non-zero → `accept_flow` returns `Err(SyncError::GitFailed)` → the dispatcher prints the Debug-repr error and exits 1. The remote is left partially updated: new client-only branches are added, and divergent ones are untouched.
- If a future code path (for example, a refactor that adds `--force` or a `+`-prefixed refspec) lets the push through against a remote without those server-side protections, the divergent remote refs are silently overwritten. That means full data loss on first-machine writes.

The defensive posture relies entirely on git's default non-fast-forward refusal; it is NOT designed in. The skill's contract ("idempotent — running again when already enabled reports current status") is violated for the cross-machine case.
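Because `git push` reports outcomes per ref, the partial-success case above becomes observable if the push runs with `--porcelain`. In that mode git prints one tab-separated `<flag>\t<from>:<to>\t<summary>` line per ref: `*` for a new ref, `!` for a rejected or failed ref, `=` for up to date, and so on. The sketch below assumes `accept_flow` (or its per-branch replacement) captured that stdout. `RefOutcome`, `RefStatus`, `parse_porcelain`, and `rejected_refs` are hypothetical names. The same parse would also help settle the `[new branch]` blind spot, since a `*` flag marks each ref the push treated as new.

```rust
/// Per-ref result, keyed off the flag column of `git push --porcelain`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RefOutcome {
    FastForward,      // ' '
    Forced,           // '+'
    Deleted,          // '-'
    NewRef,           // '*' (shown as [new branch] in human-readable output)
    UpToDate,         // '='
    Rejected(String), // '!' with the summary, e.g. "[rejected] (non-fast-forward)"
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RefStatus {
    pub remote_ref: String,
    pub outcome: RefOutcome,
}

/// Parses `<flag>\t<from>:<to>\t<summary>` lines. The "To <url>" header,
/// the trailing "Done" line, and anything else without two tabs are skipped.
pub fn parse_porcelain(stdout: &str) -> Vec<RefStatus> {
    let mut statuses = Vec::new();
    for line in stdout.lines() {
        let mut fields = line.splitn(3, '\t');
        let (Some(flag), Some(refs), Some(summary)) = (fields.next(), fields.next(), fields.next()) else {
            continue;
        };
        let outcome = match flag {
            " " => RefOutcome::FastForward,
            "+" => RefOutcome::Forced,
            "-" => RefOutcome::Deleted,
            "*" => RefOutcome::NewRef,
            "=" => RefOutcome::UpToDate,
            "!" => RefOutcome::Rejected(summary.to_string()),
            _ => continue,
        };
        // `<from>:<to>`; the destination is the ref name on the remote.
        let remote_ref = refs.split_once(':').map_or(refs, |(_, to)| to);
        statuses.push(RefStatus { remote_ref: remote_ref.to_string(), outcome });
    }
    statuses
}

/// Names of refs that were not updated (`!` flag), for a per-branch recovery message.
pub fn rejected_refs(statuses: &[RefStatus]) -> Vec<&str> {
    statuses
        .iter()
        .filter(|s| matches!(s.outcome, RefOutcome::Rejected(_)))
        .map(|s| s.remote_ref.as_str())
        .collect()
}
```

Reporting `rejected_refs` by name would let the dispatcher say which branches still need attention, instead of exiting 1 with a single opaque message.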

### root_cause (Issue 2 — Independent empty-seed commits cause non-FF on main)

`src/common/tracked.rs::ensure_seed` cold path bootstraps `main` by:
- Hashing an empty tree (line 229-235)
- `commit-tree {tree} -m "init: tracked seed"` with synthetic identity (line 238-250): NO `-p` parent, NO pinned `GIT_AUTHOR_DATE` / `GIT_COMMITTER_DATE`, hence NO deterministic content

Each machine performs this bootstrap independently against its own `$SPT_HOME`. The committer date defaults to wall-clock-at-bootstrap-time, so the resulting commit SHA differs per machine. Two machines' `main` branches are unrelated root commits with no common ancestor. The setup-time `push --all origin` therefore cannot fast-forward `main` from either side.

Setup CAUSES the divergence (independent bootstrap) and then REFUSES to resolve it (no fetch + rebase + retry).
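Git names an object by the SHA-1 of `<type> <size>\0<body>`. A commit body embeds the author and committer lines, each with a Unix timestamp and timezone offset, so two wall-clock bootstraps of an otherwise identical empty commit cannot share a SHA. The sketch below reproduces that hashing with a small hand-rolled SHA-1. It is for illustration only; real code would use a vetted crate or ask git directly. `bootstrap_commit_body` approximates the bytes `commit-tree` writes for the bootstrap commit described above. Its exact message bytes (such as the trailing newline) are an assumption, so its output is not claimed to match any SHA in a real seed.

```rust
/// Minimal SHA-1 (FIPS 180-4), for illustration only.
pub fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x6745_2301, 0xEFCD_AB89, 0x98BA_DCFE, 0x1032_5476, 0xC3D2_E1F0];
    // Pad: 0x80, zeros up to 56 mod 64, then the bit length as big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    for block in msg.chunks(64) {
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let [mut a, mut b, mut c, mut d, mut e] = h;
        for (i, wi) in w.iter().enumerate() {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A82_7999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9_EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1B_BCDC),
                _ => (b ^ c ^ d, 0xCA62_C1D6),
            };
            let t = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(*wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        for (hv, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *hv = hv.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (chunk, word) in out.chunks_mut(4).zip(h) {
        chunk.copy_from_slice(&word.to_be_bytes());
    }
    out
}

pub fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Git object id: SHA-1 over "<kind> <len>\0" followed by the body.
pub fn git_object_id(kind: &str, body: &[u8]) -> String {
    let mut buf = format!("{} {}\0", kind, body.len()).into_bytes();
    buf.extend_from_slice(body);
    hex(&sha1(&buf))
}

/// Well-known id of git's empty tree.
pub const EMPTY_TREE: &str = "4b825dc642cb6eb9a060e54bf8d69288fbee4904";

/// Approximate body of the parentless bootstrap commit. Identity and message
/// follow the description above; the exact message bytes are assumed.
pub fn bootstrap_commit_body(unix_ts: i64, tz: &str) -> String {
    let ident = format!("spt-bootstrap <spt@local> {} {}", unix_ts, tz);
    format!("tree {}\nauthor {}\ncommitter {}\n\ninit: tracked seed\n", EMPTY_TREE, ident, ident)
}
```

Only the timestamp differs between two machines' bootstrap bodies here, and that alone is enough to change the commit id. Pinning the timestamp makes the ids equal.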

### proposed_fix_shape (do NOT implement — gate through GSD)

**Issue 1 fix shape (high-level — split into a phase plan):**

1. Insert a remote-state probe BEFORE Step 4-5 in `accept_flow`. After `gh repo create` resolves (either success or "already exists" fall-through), run `git -C {seed} fetch origin` to populate `refs/remotes/origin/*`.
2. For each local ref in `seed/refs/heads/`:
2. For each local ref in `seed/refs/heads/`, compare it against `refs/remotes/origin/{branch}`:
   - If `refs/remotes/origin/{branch}` does not exist → safe to push as a new branch.
   - If it exists and points at a different commit, classify the pair:
     - (a) Local has commits not on remote AND remote has commits not on local → diverged. THIS is the data-loss surface. Reconcile via `git rebase -X theirs origin/{branch}` (matches the post-commit pull policy) OR surface it to the user via AskUserQuestion with an explicit "your local writes will be replayed on top of the remote; proceed?" prompt.
     - (b) Remote head is a descendant of local head (local is behind) → fast-forward local first (no push needed yet).
     - (c) Local head is a descendant of remote head (local is ahead) → safe push, no action.
3. After reconciliation, push each branch with an explicit per-ref refspec (`git push origin {branch}:{branch}`) in a per-branch loop instead of a bare `push --all`. Each ref's outcome is then observable, and a single divergent ref no longer collapses the whole batch into one opaque non-zero exit.
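The per-branch decision in steps 2 and 3 can be sketched as a pure function plus a command plan. The inputs are assumed to come from `git rev-parse` (remote-ref existence and SHA equality) and from `git merge-base --is-ancestor <a> <b>` (exit 0 when `<a>` is an ancestor of `<b>`, 1 otherwise), run after the `git fetch origin` probe. `Reconcile`, `classify`, and `plan` are hypothetical names, and the argv lists are assumed to run in the branch's worktree. In the machine-B case, two branches rooted in unrelated bootstrap commits are ancestors of neither side, so they classify as `Diverged`.

```rust
/// How a local branch head relates to `refs/remotes/origin/<branch>`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reconcile {
    NewBranch,   // no remote ref: the push creates it
    UpToDate,    // same commit on both sides: nothing to do
    LocalAhead,  // remote is an ancestor of local: the push is a fast-forward
    LocalBehind, // local is an ancestor of remote: fast-forward local, no push yet
    Diverged,    // neither contains the other: the data-loss surface
}

pub fn classify(remote_exists: bool, same_sha: bool, remote_is_ancestor: bool, local_is_ancestor: bool) -> Reconcile {
    if !remote_exists {
        Reconcile::NewBranch
    } else if same_sha {
        Reconcile::UpToDate
    } else if remote_is_ancestor {
        Reconcile::LocalAhead
    } else if local_is_ancestor {
        Reconcile::LocalBehind
    } else {
        Reconcile::Diverged
    }
}

fn argv(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|p| p.to_string()).collect()
}

/// Git argv lists to run, in order, for one branch.
pub fn plan(branch: &str, state: Reconcile) -> Vec<Vec<String>> {
    let refspec = format!("{0}:{0}", branch);
    let upstream = format!("origin/{}", branch);
    // Explicit per-branch refspec instead of `push --all`, so each outcome is separate.
    let push = argv(&["push", "origin", refspec.as_str()]);
    match state {
        Reconcile::UpToDate => Vec::new(),
        Reconcile::NewBranch | Reconcile::LocalAhead => vec![push],
        Reconcile::LocalBehind => vec![argv(&["merge", "--ff-only", upstream.as_str()])],
        // Option (a): replay local commits on top of the remote, then fast-forward push.
        Reconcile::Diverged => vec![argv(&["rebase", "-X", "theirs", upstream.as_str()]), push],
    }
}
```

The `Diverged` arm follows option (a)'s rebase policy. A variant that prefers to prompt the user would return a confirmation step there instead of the rebase.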

**Issue 2 fix shape:**

The root commit must be deterministic across machines so all `tracked/seed/` bootstraps converge on the same `main` SHA. Two options:

- **(A) Deterministic empty-seed bootstrap (preferred; least new surface):** in `ensure_seed`, set `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` to a fixed value, including the timezone offset, in the environment of the `commit-tree` invocation. Any locked instant works, e.g. `946684800 +0000` in git's internal `<unix-timestamp> <offset>` format. `commit-tree` has no `--date` flag and git has no `author.date` / `committer.date` config key, so these environment variables are the lever. Combined with the already-locked author/committer identity, the fixed message, and the locked empty-tree content, this yields a byte-for-byte identical commit object, and therefore the same SHA, on every machine. The `main` branches converge. Backward compat: existing installs already have a wall-clock-dated bootstrap commit. For those, `accept_flow` would need a one-time rewrite path: either rewrite the seed's `main` to point at the deterministic bootstrap SHA, OR detect divergence-at-bootstrap and resolve it via Issue 1's reconciliation logic.

- **(B) Pull-then-push for main during accept_flow:** before any push, if remote has `main`, fetch it and reset local `seed/main` to track it. New worktrees created subsequently inherit the remote's `main`. This sidesteps the SHA divergence by surrendering machine B's empty bootstrap. Simpler to implement, but creates a one-way migration semantic and depends on remote `main` always being present (which it always is once first-machine attach completed).

(A) is the clean fix; (B) is the surgical fix that doesn't touch the bootstrap path. Recommendation: pursue (A); fall back to (B) if (A) is gated on broader refactor cost.
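A sketch of option (A) at the call site builds the `commit-tree` invocation with the author/committer identity and dates pinned through git's standard environment variables. The constant `946684800 +0000` (2000-01-01T00:00:00Z in git's internal date format) is an arbitrary choice. Passing the identity through `GIT_AUTHOR_NAME`, `GIT_COMMITTER_NAME`, and related variables is an assumption about how `ensure_seed` might supply it. `bootstrap_commit_command` is a hypothetical helper that only builds the `Command`, so it can be inspected without a git binary.

```rust
use std::path::Path;
use std::process::Command;

/// Arbitrary pinned instant in git's internal "<unix-seconds> <offset>" format
/// (2000-01-01T00:00:00Z). The specific value is an assumption; any shared constant works.
pub const BOOTSTRAP_DATE: &str = "946684800 +0000";

/// Builds (but does not run) the bootstrap `commit-tree` call with identity
/// and dates pinned, so every machine produces the same commit object.
pub fn bootstrap_commit_command(seed: &Path, tree: &str) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("-C")
        .arg(seed)
        .args(["commit-tree", tree, "-m", "init: tracked seed"])
        .env("GIT_AUTHOR_NAME", "spt-bootstrap")
        .env("GIT_AUTHOR_EMAIL", "spt@local")
        .env("GIT_AUTHOR_DATE", BOOTSTRAP_DATE)
        .env("GIT_COMMITTER_NAME", "spt-bootstrap")
        .env("GIT_COMMITTER_EMAIL", "spt@local")
        .env("GIT_COMMITTER_DATE", BOOTSTRAP_DATE);
    cmd
}
```

The specific instant does not matter. What matters is that every machine uses the same one, including the offset, because the offset is part of the hashed commit bytes.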

### proposed_fix_shape — deferred Issues 3, 4, 5, 6 (UX-pass phase, bundle)

- **Issue 4 (Debug-repr leak):** in `src/owl/psyche_sync_setup.rs:70`, change `eprintln!("sync setup failed: {:?}", e)` to `eprintln!("sync setup failed: {}", e)`. `SyncError` already has the Display impl (`src/common/sync.rs:400-413`). Two-character edit.

- **Issue 5 (Exit code 1 missing from SKILL.md):** add a bullet under "Interpret the exit code" in `plugin/spt/skills/psyche-sync-setup/SKILL.md`: `**1** — generic setup failure. Surface the error line to the user and offer to run \`$OWL doctor\` for diagnostics.`

- **Issue 3 (Recovery for exit 1):** doc-only addition. Once the Issue 1 fix lands, the per-branch loop yields per-ref outcomes, so the dispatcher can print which refs were divergent and how to recover. Manual `git update-ref` instructions become unnecessary if reconciliation is automatic.

- **Issue 6 (Silent partial state in doctor):** in `src/owl/doctor.rs::check_sync_status`, after the `state`-keyed match, add a probe. If `state == Unset` BUT the seed's config (`seed/config`, since the seed is a bare repo) shows an `origin` remote configured AND `git ls-remote origin` succeeds, surface a Warn row: "partial setup detected: origin configured but state=Unset (likely accept_flow failed at the settings write); re-run /spt:psyche-sync-setup to converge." Better mid-term: record an `accept_flow_attempt_ts` in SyncSettings BEFORE the push, so doctor can distinguish "never attempted" from "attempted but write failed".
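The Issue 6 probe can be sketched as a pure decision over the settings state plus two extra pieces of evidence. `origin_configured` would come from something like `git -C {seed} config --get remote.origin.url` exiting 0, and `ls_remote_ok` from `git ls-remote origin` succeeding. The `SyncState` variants other than `Unset`, the `RemoteEvidence` and `Row` types, and the message strings are all hypothetical.

```rust
/// Hypothetical subset of sync states; only `Unset` is named in the report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncState {
    Unset,
    Enabled,
    Disabled,
}

/// Extra evidence gathered only to disambiguate `Unset`.
#[derive(Debug, Clone, Copy)]
pub struct RemoteEvidence {
    pub origin_configured: bool, // e.g. `git -C {seed} config --get remote.origin.url` exited 0
    pub ls_remote_ok: bool,      // e.g. `git ls-remote origin` exited 0
}

/// Hypothetical doctor row.
#[derive(Debug, PartialEq, Eq)]
pub enum Row {
    Pass(String),
    Info(String),
    Warn(String),
}

pub fn sync_row(state: SyncState, ev: RemoteEvidence) -> Row {
    match state {
        SyncState::Enabled => Row::Pass("sync: enabled".to_string()),
        SyncState::Disabled => Row::Info("sync: disabled".to_string()),
        // Settings say "never set up", but the remote side says otherwise.
        SyncState::Unset if ev.origin_configured && ev.ls_remote_ok => Row::Warn(
            "sync: partial setup detected (origin configured but state=Unset); \
             re-run /spt:psyche-sync-setup to converge"
                .to_string(),
        ),
        SyncState::Unset => Row::Info("sync: not configured".to_string()),
    }
}
```

Requiring both signals keeps a stale `origin` entry on an unreachable remote from triggering the warning. The trade-off is that a partial setup seen while offline still reads as "not configured".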

### files_changed

(none — diagnose-only mode; fix application gated through GSD per CLAUDE.md)

### verification

(deferred — no fix applied)
