# Local build + direct Fly deploy runbook

[doc->REQ-DEP-04] [doc->REQ-DEP-08] [doc->REQ-CLI-08]

Operator runbook for building the REBNO server image and Vite client bundle
**on the operator's machine** and deploying both directly to Fly.io, replacing
the prior GitHub Actions full-path workflow.

## When to use this

Use this runbook for **every** rebno-staging and rebno-prod deploy from 06.7-09
onward. The prior `.github/workflows/deploy-staging.yml` full-path build is
**out of service**: the GitHub Actions storage quota was exhausted on 2026-05-17
and the project will not be paying to raise it. The fast-path (client-only)
workflow in the same file is decommissioned alongside it, and the Playwright
post-deploy smoke is dropped indefinitely.

## Pre-flight

Confirmed once per machine, then trusted until something changes:

- **`flyctl` installed and authenticated.** `flyctl auth whoami` returns the
  same account that owns rebno-staging / rebno-prod. On Windows + Git Bash,
  flyctl ships its own SSH transport so the system `ssh` is not used.
- **Docker installed and running OR Fly remote builders available.** Prefer
  Docker + `flyctl deploy --local-only` when Docker is installed. If Docker is
  unavailable on the operator machine, use `flyctl deploy --remote-only`
  instead. That still deploys directly from the local checkout, but the image
  is built on Fly's remote builder.
- **Node 22 + pnpm 10.** The workspace currently uses pnpm 10.x and declares
  its Node engine constraint in `package.json`. `pnpm install --frozen-lockfile`
  must succeed cleanly.
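
The Docker-or-remote-builder choice above can be made mechanically. This is a minimal sketch, assuming that a successful `docker info` is a good enough signal that the local daemon is usable; `pick_build_flag` is a hypothetical helper name and does not exist in the repo.

```bash
# Hypothetical helper: print the `flyctl deploy` build flag for this machine.
# Assumption: `docker info` succeeding means the local Docker daemon is usable.
pick_build_flag() {
  if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    echo "--local-only"     # build the image with local Docker
  else
    echo "--remote-only"    # fall back to Fly's remote builder
  fi
}
```

The result can be passed straight through, for example `flyctl deploy ... "$(pick_build_flag)"`, so the same command line works on machines with and without Docker.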

## Preflight gate (MANDATORY)

[doc->REQ-MAP-15]

`pnpm preflight` MUST exit 0 before any `flyctl deploy` invocation below. The chain enforces the v1.1 convention locks and substitutes for the decommissioned GitHub Actions CI:

1. **LDtk version pin assertion** — `node -e` reads `tools/ldtk-version.txt`, asserts non-empty, echoes the pinned value (`LDtk pin: <version>`) to stdout.
2. `pnpm lint:room-layout` — room-layout schema-union drift guard.
3. `pnpm gate:no-inline-origin` — D-63 inline origin math gate (HYG-01).
4. `pnpm trace:check` — `traceable-reqs.toml` coverage.
5. `pnpm lint:no-req-placeholders` — `REQ-...-XX` placeholder drift-lock (HYG-02).
6. `pnpm lint:atlas-hash` — tileset PNG hash drift (active from Phase 8).
7. `node tools/scripts/check-conversion-regression.mjs` — converter round-trip (active from Phase 9).

The first failure aborts the chain. The operator then iterates one check at a time, per CONTEXT.md Phase 7 D-14. The operator MUST also update `tools/ldtk-version.txt` when bumping LDtk locally, per ADR 0012.
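
To make the gate's behavior concrete, here is a rough shell equivalent of check 1 plus a fail-fast runner. It is an illustration only: the real chain lives behind `pnpm preflight` and uses `node -e` for the pin check. The names `assert_ldtk_pin` and `run_chain` are hypothetical, and treating a whitespace-only file as empty is an assumption.

```bash
# Illustrative stand-in for check 1 (the real check uses `node -e`).
# Assumption: a file holding only whitespace counts as empty.
assert_ldtk_pin() {
  local pin_file="${1:-tools/ldtk-version.txt}"
  local pin
  [ -r "$pin_file" ] || { echo "missing $pin_file" >&2; return 1; }
  pin=$(tr -d '[:space:]' < "$pin_file")
  [ -n "$pin" ] || { echo "$pin_file is empty" >&2; return 1; }
  echo "LDtk pin: $pin"
}

# Fail-fast runner: each argument is one command line, run in order.
# The first non-zero exit stops the chain and later steps never run.
run_chain() {
  local step
  for step in "$@"; do
    echo "==> $step"
    sh -c "$step" || { echo "FAILED: $step" >&2; return 1; }
  done
}
```

In day-to-day use the gate reduces to `pnpm preflight && flyctl deploy ...`, which refuses to deploy unless the whole chain exits 0.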

## Staging deploy

The block below is a sketch. `${SHA}` is the current `HEAD` git SHA (lowercase
40-hex). Every step is idempotent except the symlink swap in step 6.
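
Because the SHA ends up in image labels and release paths, it can be worth rejecting anything that is not a full lowercase SHA before deploying. A minimal sketch, with `is_full_sha` as a hypothetical helper name:

```bash
# Hypothetical helper: succeed only for a lowercase 40-character hex string.
is_full_sha() {
  [ "${#1}" -eq 40 ] || return 1
  case "$1" in
    *[!0123456789abcdef]*) return 1 ;;   # any character outside lowercase hex
  esac
}
```

A typical guard would be `is_full_sha "$SHA" || { echo "bad SHA: $SHA"; exit 1; }`, which catches abbreviated or uppercase values pasted in by hand.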

For client-only changes, skip steps 1-2 when `flyctl status -a rebno-staging`
already shows the machine `started` and `/health` returns 200. Start at step 3,
build the Vite bundle from the current checkout, and still release it under the
current git SHA so `/data/client-assets/releases/${SHA}` maps to the commit UAT
is testing.

```bash
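# 0. Sanity. Working tree clean, HEAD == origin/main, preflight green.
SHA=$(git rev-parse HEAD)
test -z "$(git status --porcelain)" || { echo "dirty tree: refusing"; exit 1; }
git fetch origin main
test "$SHA" = "$(git rev-parse origin/main)" || { echo "HEAD != origin/main: refusing"; exit 1; }
pnpm install --frozen-lockfile
pnpm preflight   # mandatory gate (see above); includes trace:check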

# 1. Build the server image + push to Fly registry + deploy the machine.
#    Image tag mirrors the SHA so we can dual-tag for prod promotion later.
flyctl deploy \
  -a rebno-staging \
  --config apps/server/fly.staging.toml \
  --dockerfile apps/server/Dockerfile \
  --image-label "$SHA" \
  --local-only                              # requires local Docker

# If Docker is unavailable on the operator machine, run this INSTEAD of the
# command above (never both). It is the equivalent direct-from-checkout
# deploy, with the image built on Fly's remote builder:
flyctl deploy \
  -a rebno-staging \
  --config apps/server/fly.staging.toml \
  --dockerfile apps/server/Dockerfile \
  --image-label "$SHA" \
  --remote-only

# 2. WAIT for the staging machine to be `started` + healthy BEFORE the next
#    step. flyctl deploy returns when the deploy record lands; the VM may
#    still be starting. The CI workflow's race-condition on this step is the
#    reason this runbook exists.
until flyctl status -a rebno-staging | grep -q 'started' \
  && curl -fsS https://rebno-staging.fly.dev/health >/dev/null; do
  sleep 2
done

# 3. Build the client bundle in staging mode. Vite writes into
#    apps/server/public/ (cleared first — this is also why public/.gitignore
#    + public/.gitkeep have to be restored after every deploy via
#    `git checkout apps/server/public/.gitignore apps/server/public/.gitkeep`).
pnpm --filter @rebno/client build:staging

# 4. Tar the static output. The path layout inside the tarball MUST match
#    what scripts/client-release.sh expects (`./index.html`,
#    `./.vite/manifest.json`, `./assets/*`).
tar -czf "client-assets-${SHA}.tgz" -C apps/server/public .

# 5. Upload the tarball + the release script. On Windows + Git Bash you MUST
#    prefix flyctl with MSYS_NO_PATHCONV=1 or the leading `/tmp/...` arg gets
#    mangled into a Windows C:\Users\... path. On Linux / macOS this is a
#    no-op.
MSYS_NO_PATHCONV=1 flyctl ssh sftp put \
  "client-assets-${SHA}.tgz" \
  "/tmp/client-assets-${SHA}.tgz" \
  -a rebno-staging
MSYS_NO_PATHCONV=1 flyctl ssh sftp put \
  scripts/client-release.sh \
  "/tmp/client-release-${SHA}.sh" \
  -a rebno-staging

# 6. Run the release script ON the Fly machine. Prefer `machine exec` over
#    `ssh console -C` on Windows: `ssh console` can run the command
#    successfully and then exit non-zero with "The handle is invalid".
#    It extracts the tarball under /data/client-assets/releases/${SHA}/,
#    sanity-checks index.html + .vite/manifest.json, atomically swaps
#    /data/client-assets/current to the new release, GCs old releases beyond
#    KEEP_LAST=5, and removes the uploaded tarball.
MACHINE_ID="48e0dedbde42e8"  # replace with current ID from `flyctl status -a rebno-staging`
MSYS_NO_PATHCONV=1 flyctl machine exec "$MACHINE_ID" -a rebno-staging \
  "sh -c 'bash /tmp/client-release-${SHA}.sh ${SHA}'"

# 7. Verify. The hashed JS in /data/client-assets/current/assets/ MUST match
#    the local apps/server/public/assets/ filename.
LOCAL_JS=$(ls apps/server/public/assets/ | grep -E '^index-.*\.js$' | head -1)
MSYS_NO_PATHCONV=1 flyctl machine exec "$MACHINE_ID" -a rebno-staging \
  "sh -c 'ls /data/client-assets/current/assets/'" | grep -q "$LOCAL_JS"
curl -fsS https://rebno-staging.fly.dev/health

# 8. Clean up local build artifacts. Vite wipes public/ as a side effect, so
#    the two tracked files (.gitignore, .gitkeep) need restoring.
rm -f "client-assets-${SHA}.tgz"
git checkout apps/server/public/.gitignore apps/server/public/.gitkeep
git clean -fdx apps/server/public/
```
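
Step 4's layout requirement can be checked locally before uploading. The sketch below assumes the tarball was built with `tar -C <dir> .`, so entries carry a leading `./`; `check_tarball_layout` is a hypothetical helper, not something `scripts/client-release.sh` provides.

```bash
# Hypothetical helper: confirm the tarball holds the entries the release
# script expects. Assumes entries are stored with a leading "./".
check_tarball_layout() {
  local listing want
  listing=$(tar -tzf "$1") || return 1
  for want in './index.html' './.vite/manifest.json'; do
    printf '%s\n' "$listing" | grep -Fxq "$want" \
      || { echo "missing $want in $1" >&2; return 1; }
  done
  # Require at least one entry below ./assets/ (not just the directory itself).
  printf '%s\n' "$listing" | grep -Eq '^\./assets/.+' \
    || { echo "no files under ./assets/ in $1" >&2; return 1; }
}
```

Running `check_tarball_layout "client-assets-${SHA}.tgz"` between steps 4 and 5 fails fast on a malformed bundle instead of discovering it on the Fly machine.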

## Prod deploy

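Identical to staging with these substitutions:

- `--config apps/server/fly.prod.toml`
- `-a rebno-prod` everywhere
- the `/health` URL (steps 2 and 7) and `MACHINE_ID` (steps 6 and 7) must
  refer to the prod app and its machine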

The image-tag promotion path (re-tagging a staging-tested image instead of
rebuilding) lived inside the CI workflow's dual-tag step; it is **not**
supported from this runbook. Prod always rebuilds from the same SHA the
operator just verified on staging. The trade-off is ~3 min of extra image
build vs. the indirection of re-tagging through a now-defunct CI surface.
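
One way to keep the staging and prod invocations from drifting is to derive the app name and config path from a single environment argument. This is a sketch only; `deploy_target` is a hypothetical helper, and the values it prints are the ones named in this runbook.

```bash
# Hypothetical helper: map an environment name to "<app> <config path>".
deploy_target() {
  case "$1" in
    staging) echo "rebno-staging apps/server/fly.staging.toml" ;;
    prod)    echo "rebno-prod apps/server/fly.prod.toml" ;;
    *)       echo "unknown environment: $1" >&2; return 1 ;;
  esac
}
```

For example, `set -- $(deploy_target prod)` leaves the app in `$1` and the config in `$2`, ready for `flyctl deploy -a "$1" --config "$2" ...`.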

## Why not GitHub Actions?

- **Storage quota exhausted** (2026-05-17). The free-tier 0.5 GB ceiling was
  hit by Docker layer cache + Playwright trace artifacts. The project is not
  upgrading to the paid tier and is not investing in a per-deploy GC of CI
  artifacts.
- **VM-start race in the full-path workflow.** Steps 21+ of
  `deploy-staging.yml` did `flyctl ssh sftp put` immediately after
  `flyctl deploy`, which races with VM start and fails with
  `app rebno-staging has no started VMs` ~30 % of the time. The CI workflow
  did not include a status-poll. This runbook's step 2 checks both Fly machine
  state and `/health`; `/health` is the stronger readiness gate because Fly
  status can temporarily report warnings while the app is already serving.
- **Playwright smoke is dropped.** The post-deploy two-client CLI-08 smoke
  test in the CI workflow needed both a browser bundle (~600 MB Chromium
  download) and trace artifact storage. With CI dead the smoke moves to
  manual HUMAN-UAT (`.planning/phases/06.7-network-model-client-trust-fall-trigger/06.7-HUMAN-UAT.md`).
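
The readiness poll that the CI workflow lacked can also be given an upper bound, so a machine that never becomes healthy fails the deploy instead of hanging it. A minimal sketch: `wait_until` and `staging_ready` are hypothetical names, and the readiness probe reuses the same `flyctl status` and `/health` checks as step 2.

```bash
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds. Only sleep time is counted, so the real
# wall-clock wait is slightly longer than TIMEOUT_SECONDS.
wait_until() {
  local timeout_s="$1" interval_s="$2" waited=0
  shift 2
  until "$@"; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "timed out after ${waited}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
}

# Readiness probe: machine reported as started AND /health answering.
staging_ready() {
  flyctl status -a rebno-staging | grep -q 'started' \
    && curl -fsS https://rebno-staging.fly.dev/health >/dev/null
}
```

Calling `wait_until 300 2 staging_ready || exit 1` polls every 2 seconds and gives up after roughly five minutes; the 300-second budget is an arbitrary illustrative value.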

## Pitfalls

- **MSYS path mangling** — see step 5. Without `MSYS_NO_PATHCONV=1`, Git Bash
  on Windows rewrites every absolute path starting with `/` into a Windows
  drive path before passing the arg to `flyctl.exe`.
- **Vite wipes apps/server/public/** — see step 8. `apps/server/public/`
  contains two tracked files (`.gitignore`, `.gitkeep`) plus build output;
  Vite's `emptyOutDir: true` clears the whole directory each build. Restore
  the tracked files after every deploy or `git status` will show them as
  deletions.
- **Symlink swap MUST run.** The Fly image only bakes in
  `apps/server/public/`; at runtime the machine reads from
  `/data/client-assets/current` via the STATIC_ASSETS_DIR resolver. Skipping
  step 6 leaves staging serving the previous SHA's bundle while the new
  server code runs, which is exactly the "stale Phase 06.5 bundle" failure
  mode captured in
  `.claude/projects/.../memory/fly-fullpath-deploy-data-symlink-gap.md`.
- **rebno-staging machine sleeps.** If `flyctl status` shows `stopped` before
  step 1, run `flyctl machine start <machine-id> -a rebno-staging` first.
  `flyctl deploy` will wake a stopped machine, but the VM-start race that
  step 2 guards against is then guaranteed to occur.
- **Windows `flyctl ssh console -C` false failure.** On Windows, `ssh console`
  can print successful remote command output and then exit non-zero with
  `The handle is invalid.` Use `flyctl machine exec <machine-id> ...` for
  release and verification commands.
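
To avoid forgetting the `MSYS_NO_PATHCONV=1` prefix on individual commands, the prefix can live in one small wrapper. A minimal sketch, with `fly_nopathconv` as a hypothetical name:

```bash
# Hypothetical wrapper: run flyctl with MSYS path conversion disabled.
# The variable has no effect outside Git Bash / MSYS, so the wrapper is safe
# to use unchanged on Linux and macOS.
fly_nopathconv() {
  MSYS_NO_PATHCONV=1 flyctl "$@"
}
```

With the wrapper in place, step 5 becomes `fly_nopathconv ssh sftp put "client-assets-${SHA}.tgz" "/tmp/client-assets-${SHA}.tgz" -a rebno-staging`.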

## Related

- `scripts/client-release.sh` — on-machine release script invoked by step 6.
  Source of truth for the symlink-swap mechanics.
- `docs/deploy/ROLLBACK.md` — emergency rollback runbook (re-points the
  symlink to a prior `releases/${OLD_SHA}/`).
- `.planning/phases/06.7-network-model-client-trust-fall-trigger/06.7-HUMAN-UAT.md`
  — operator post-deploy verification suite (replaces Playwright smoke).
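
For readers unfamiliar with the symlink-swap pattern that both the release script and the rollback runbook rely on, the sketch below shows one common way to re-point a symlink atomically. It is not the contents of `scripts/client-release.sh`; `swap_current` is a hypothetical name, and the use of `mv -T` assumes GNU coreutils.

```bash
# Illustrative only; NOT taken from scripts/client-release.sh.
# Assumes GNU coreutils (`mv -T`).
swap_current() {
  local release_dir="$1"    # e.g. /data/client-assets/releases/<sha>
  local current_link="$2"   # e.g. /data/client-assets/current
  local tmp_link="${current_link}.tmp.$$"
  ln -s "$release_dir" "$tmp_link" || return 1
  # Renaming the temporary link over the old one replaces it in a single
  # step; -T keeps mv from moving the link INTO the directory the old link
  # points at.
  mv -T "$tmp_link" "$current_link"
}
```

The same call serves both directions: pointing `current` at a new release, or back at a prior `releases/${OLD_SHA}/` during a rollback.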
