# RESTORE.md — REBNO Operator Runbook

[doc->REQ-DEP-07] [int->REQ-DEP-07] [int->REQ-DEP-03]

Single source of truth for: per-env provisioning, cold restore from
Litestream/Tigris, point-in-time replay, Phase 4 carry-forward verification
(kill -9 / argon2 bench / multi-client smoke), secret rotation, combined
image+data rollback, legacy localList.txt ingest, and operator access
hardening.

**Linux determinism reference platform.** All `fly ssh console` commands
run on the Fly.io machine itself (Alpine Linux). Local commands assume a
POSIX shell (bash/zsh) on the operator's machine. On Windows, use WSL2 or
Git Bash for the local commands; `fly ssh console` commands run server-side
regardless.

---

## Table of Contents

1. [Prerequisites](#prerequisites)
2. [Per-Env Initial Setup (one-time per Fly app)](#per-env-initial-setup-one-time-per-fly-app)
3. [Cold Restore (<5 min target)](#cold-restore-5-min-target--dep-07-acceptance)
4. [Point-in-Time Replay](#point-in-time-replay)
5. [Phase 4 Carry-Forward Verification](#phase-4-carry-forward-verification)
   - [Test 1: SRV-08 kill -9 mid-tick](#test-1-srv-08-kill--9-mid-tick-recoverability)
   - [Test 2: argon2id prod-hardware bench](#test-2-argon2id-prod-hardware-bench)
   - [Test 3: Multi-client move+chat smoke](#test-3-multi-client-movechat-smoke)
6. [Secret Rotation](#secret-rotation)
7. [Combined Rollback (Bad Migration + Bad Image)](#combined-rollback-bad-migration--bad-image)
8. [Legacy localList.txt Ingest (D-17 ssh-sftp ritual)](#legacy-locallisttxt-ingest-d-17-ssh-sftp-ritual)
9. [Access Hardening (Fly Proxy IP Allowlist)](#access-hardening-fly-proxy-ip-allowlist)
10. [References](#references)

---

## Prerequisites

Before running any procedure in this runbook:

1. **`flyctl` installed** — `brew install flyctl` (macOS) or
   `curl https://fly.io/install.sh | sh` (Linux/WSL2). Verify with
   `flyctl version`.

2. **Authenticated** — `fly auth login` completed once. Token persists in
   `~/.fly/config.yml`. For CI, set `FLY_API_TOKEN` as a GitHub org-level
   secret (generated via `fly tokens create org`).

3. **GitHub secrets set** (CI pipeline prerequisites):
   - `FLY_API_TOKEN` — org-level; consumed by deploy-staging.yml and
     deploy-prod.yml (DEP-04).
   - `STAGING_INVITE_TOKEN` — consumed by deploy-staging.yml post-deploy
     soak step. Generate: `openssl rand -base64 24`.
   - `STAGING_WSS_URL` — GitHub variable (not secret); default value:
     `wss://rebno-staging.fly.dev`.

4. **Per-env secrets provisioned** — `BETTER_AUTH_SECRET` per app (see
   §Per-Env Initial Setup step 4). These MUST differ between staging and
   prod.

5. **`legacy/servers/enlyzeam-current/localList.txt` available locally**
   — this is the D-17 legacy credential source. It is gitignored under
   `legacy/` per CLAUDE.md Hard Rule 8. Verify it's present before
   executing §Legacy localList.txt Ingest.

6. **Operator IP known** — for staging + obs UI access hardening (D-04,
   D-15). Find your current IP: `curl -s https://api.ipify.org`.
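
The checklist above can be partly automated. The following is a minimal sketch, not part of the repository: the function names `missing_tools` and `preflight` are hypothetical, and the tool list (`flyctl`, `curl`, `jq`, `openssl`) is only a reading of which commands this runbook invokes locally.

```sh
# Print the name of every listed tool that is NOT on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Return 0 when every assumed tool is present; otherwise list the gaps
# on stderr and return 1. The tool list is an assumption, adjust as needed.
preflight() {
  missing=$(missing_tools flyctl curl jq openssl)
  if [ -n "$missing" ]; then
    # Intentionally unquoted: one output line per missing tool.
    printf 'missing tool: %s\n' $missing >&2
    return 1
  fi
  if [ ! -f legacy/servers/enlyzeam-current/localList.txt ]; then
    echo "note: localList.txt not found (needed only for the legacy ingest)" >&2
  fi
  return 0
}
```

Run `preflight` from the repository root before starting a procedure. It does not check authentication or secrets, which still need the manual steps above.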

---

## Per-Env Initial Setup (one-time per Fly app)

Run the following for each of `rebno-staging` and `rebno-prod`. Substitute
`APP` and the config path as shown. Run for `rebno-obs` separately —
follow `apps/obs/README.md §"One-time provisioning"` (different image +
env).

> **Pitfall:** The Tigris bucket MUST exist before the first deploy.
> Litestream will fail to open its S3 replica target on boot if the bucket
> does not exist yet, so step 3 below MUST precede any `fly deploy`.

```sh
APP=rebno-staging   # change to rebno-prod for prod
CONFIG=apps/server/fly.staging.toml  # or fly.prod.toml

# 1) Create the Fly app shell (skip if already created).
fly apps create $APP --org personal

# 2) Provision a persistent Fly Volume (10 GB; holds SQLite DB, WAL,
#    Ed25519 keys, and seed landing zone).
fly volumes create rebno_data \
  --app $APP \
  --region lax \
  --size 10

# 3) Provision a per-app Tigris S3 bucket.
#    This auto-injects AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
#    AWS_ENDPOINT_URL_S3, and BUCKET_NAME as Fly secrets in $APP.
fly storage create --app $APP

# 4) Set required per-app secrets.
fly secrets set \
  BETTER_AUTH_SECRET="$(openssl rand -base64 32)" \
  --app $APP

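# Staging only: invite-token middleware (D-04) plus the flag that enables it.
# The token is kept in a variable so it can be copied into the matching
# STAGING_INVITE_TOKEN GitHub secret (see Prerequisites).
if [ "$APP" = "rebno-staging" ]; then
  INVITE_TOKEN="$(openssl rand -base64 24)"
  fly secrets set \
    STAGING_INVITE_TOKEN="$INVITE_TOKEN" \
    STAGING_MODE=1 \
    --app $APP
  echo "Set the GitHub secret STAGING_INVITE_TOKEN to: $INVITE_TOKEN"
fi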

# 5) (Optional) Add HTTPS certificate for custom domain.
#    Wildcard A/AAAA records for *.rebno.decidel.com are already
#    configured in Namecheap DNS; flyctl certs add will verify
#    automatically without further DNS changes.
#    The hostname is chosen per app so the staging name is never
#    attached to rebno-prod by accident.
if [ "$APP" = "rebno-staging" ]; then
  fly certs add rebno-staging.decidel.com --app $APP
else
  fly certs add rebno.decidel.com --app $APP
fi

# 6) First deploy (push-to-main fires deploy-staging.yml automatically
#    once CI is configured). For a manual first deploy:
fly deploy \
  --app $APP \
  --config $CONFIG \
  --image registry.fly.io/$APP:<git-sha>
```

For `rebno-obs`, run `fly apps create rebno-obs`, provision a volume and
Tigris bucket the same way, then follow `apps/obs/README.md §"One-time
provisioning"` for the `ZO_ROOT_USER_PASSWORD` secret and the OTel ingest
endpoint config.
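
The pitfall above (bucket before first deploy) can be guarded with a small check. This is a sketch under assumptions: `has_secret` is a hypothetical helper, and it assumes `fly secrets list` prints a table whose first column is the secret name, which may differ between flyctl versions.

```sh
# Succeed when NAME appears as the first whitespace-separated column of
# the `fly secrets list` output supplied on stdin.
# Assumption: the secret name is the first column of each row.
has_secret() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit found ? 0 : 1 }'
}
```

A possible use before a manual deploy is `fly secrets list --app "$APP" | has_secret BUCKET_NAME || echo "run fly storage create first"`.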

---

## Cold Restore (<5 min target — DEP-07 acceptance)

**Scenario:** The Fly Volume holding `/data/rebno.db` is lost (volume
corruption, accidental deletion, or machine decommission). Tigris Litestream
replica is the recovery source.

**Target:** From `fly volumes create` through `/health` returning 200 with
DB state intact — under 5 minutes wall clock.

```sh
APP=rebno-staging
NEW_VOL=rebno_data_$(date +%s)  # unique name avoids collision

# 1) Create a fresh Fly Volume.
fly volumes create $NEW_VOL \
  --app $APP \
  --region lax \
  --size 10
echo "Volume created: $NEW_VOL"

# 2) Update fly.staging.toml (or fly.prod.toml) [[mounts]] source to
#    the new volume name, then commit + push to main. CI deploy-staging.yml
#    fires automatically. Alternatively deploy manually:
#
#      Edit apps/server/fly.staging.toml:
#        [[mounts]]
#          source = "rebno_data_<timestamp>"   ← new name
#          destination = "/data"
#
#    Then:
fly deploy \
  --app $APP \
  --config apps/server/fly.staging.toml \
  --image registry.fly.io/$APP:<last-good-sha>

# 3) The docker-entrypoint.sh restore step runs automatically on boot.
#    When /data/rebno.db is absent, it calls:
#      litestream restore -o /data/rebno.db s3://$BUCKET_NAME/rebno.db
#    pulling the latest snapshot + WAL frames from Tigris.
#    Watch progress in a second terminal. fly logs streams until Ctrl-C,
#    so running it inline would block step 4:
fly logs --app $APP | grep -i "litestream"

# 4) Wait for /health to return 200 (Fly checks run every 10 s).
#    Poll until green:
until curl -sf "https://${APP}.fly.dev/health" > /dev/null 2>&1; do
  echo "Waiting for /health..."; sleep 5
done
echo "Server is up"

# 5) Verify data integrity inside the machine.
fly ssh console --app $APP --command \
  'sqlite3 /data/rebno.db "PRAGMA integrity_check; SELECT count(*) FROM accounts;"'
# Expected: integrity_check = ok; accounts row count matches pre-restore count.

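# 6) Verify Litestream is replicating the new volume's DB to Tigris.
#    sh -c is required: --command does not run through a shell, so
#    $BUCKET_NAME and the pipe would otherwise not be interpreted.
fly ssh console --app $APP --command \
  "sh -c 'litestream snapshots s3://\$BUCKET_NAME/rebno.db 2>&1 | tail -3'"
# Or check fly logs for "litestream: replicating" lines.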

# 7) Re-upload Ed25519 room signing key to /data/keys/.
#    Litestream replicates ONLY rebno.db; /data/keys/ is volume-local.
#    On a fresh volume the server auto-generates a new keypair, which
#    breaks signature verification on committed room layouts (rooms_loaded=0).
#    Restore the operator-side keypair so committed mvp-lobby/000.sig verifies.
MID=$(fly machines list --app $APP --json | jq -r '.[0].id')
# base64 -w0 is GNU-only; stripping newlines with tr also works on macOS.
PRIV_B64=$(base64 < keys/rebno-room-signing.ed25519 | tr -d '\n')
PUB_B64=$(base64 < keys/rebno-room-signing.ed25519.pub.pem | tr -d '\n')
fly ssh console --app $APP --machine $MID -C "sh -c '
  mkdir -p /data/keys &&
  echo $PRIV_B64 | base64 -d > /data/keys/room_signing.ed25519 &&
  echo $PUB_B64  | base64 -d > /data/keys/room_signing.ed25519.pub.pem &&
  chmod 600 /data/keys/room_signing.ed25519
'"
fly machine restart $MID --app $APP
# Verify rooms_loaded > 0 after restart:
curl -sf https://${APP}.fly.dev/health
# Expected: {"status":"ok","ws_ready":true,"rooms_loaded":1}
```

**Drill expectation:** Record start time (before step 1) and end time
(when step 4 completes). Wall-clock target: ≤ 5 minutes. Record actual
time + integrity_check output in `.planning/phases/05-deploy/05-HUMAN-UAT.md`
Test 1.
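
The wall-clock bookkeeping for the drill can be sketched as follows. `drill_verdict` is a hypothetical helper, not an existing script. It takes start and end times as Unix epoch seconds and compares the difference with a budget that defaults to the 300 s target stated above.

```sh
# Compare elapsed wall-clock time against a budget (default 300 s).
# Arguments: start_epoch end_epoch [budget_seconds]
drill_verdict() {
  start=$1
  end=$2
  budget=${3:-300}
  elapsed=$((end - start))
  if [ "$elapsed" -le "$budget" ]; then
    echo "PASS ${elapsed}s"
  else
    echo "FAIL ${elapsed}s"
  fi
}
```

Capture `START=$(date +%s)` before step 1, then run `drill_verdict "$START" "$(date +%s)"` once the step 4 poll completes.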

**If restore takes >5 min:** Check Tigris network round-trip (`fly ping
rebno-staging` vs `AWS_ENDPOINT_URL_S3` latency). Restore downloads the
most recent snapshot and then replays the WAL segments written after it,
so restore time grows with both the snapshot size and the amount of WAL
accumulated since that snapshot. Ensure `litestream.yml` sets a snapshot
interval (not just continuous WAL streaming) so that a recent snapshot
bounds how much WAL the next restore must replay.

---

## Point-in-Time Replay

Litestream retains WAL frames for the configured retention window
(`retention: 24h` in `apps/server/litestream.yml`). To recover to a
specific point in time (within the retention window):

```sh
# SSH into the machine
fly ssh console --app rebno-staging

# Inside the machine:
# List available snapshots to find the closest prior to target timestamp T.
litestream snapshots s3://$BUCKET_NAME/rebno.db

# Restore to a specific ISO-8601 timestamp (UTC).
# Replace <TIMESTAMP> with e.g. "2026-05-07T18:30:00Z"
TARGET_TS="<TIMESTAMP>"
litestream restore \
  -timestamp "$TARGET_TS" \
  -o /data/rebno.db.replay \
  s3://$BUCKET_NAME/rebno.db

# Inspect the replay DB before promoting it.
sqlite3 /data/rebno.db.replay "PRAGMA integrity_check; SELECT count(*) FROM accounts;"

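# If the replay DB looks correct, quiesce traffic before the swap (for
# example, during a maintenance window with no connected players).
# Do not use `fly scale count 0` for this: it removes all machines, which
# also ends this console session.
mv /data/rebno.db /data/rebno.db.pre-replay-backup
# Move any WAL/SHM sidecar files aside too, so SQLite does not apply the
# old database's WAL to the replayed file.
for ext in -wal -shm; do
  [ -f "/data/rebno.db$ext" ] && mv "/data/rebno.db$ext" "/data/rebno.db.pre-replay-backup$ext"
done
mv /data/rebno.db.replay /data/rebno.db

# Restart the machine to pick up the new DB (from the operator machine):
# fly machine restart <machine-id> --app rebno-staging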
```

> **Note:** Litestream point-in-time replay is only available within the
> retention window. For events older than `retention: 24h`, you must
> restore the latest snapshot and accept data loss from the gap, or
> increase the retention period before the incident.
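
Whether a target timestamp is still replayable can be checked before attempting the restore. The sketch below uses a hypothetical helper, `within_retention`, that works on Unix epoch seconds. The 24-hour default mirrors the `retention: 24h` value quoted above. Converting an ISO-8601 string to epoch seconds is left out because `date` flags differ between GNU, BSD, and BusyBox.

```sh
# Succeed when target_epoch lies inside the retention window that ends
# at now_epoch. All times are Unix epoch seconds (UTC).
# Arguments: now_epoch target_epoch [retention_hours]
within_retention() {
  now_epoch=$1
  target_epoch=$2
  retention_hours=${3:-24}
  oldest=$((now_epoch - retention_hours * 3600))
  [ "$target_epoch" -ge "$oldest" ] && [ "$target_epoch" -le "$now_epoch" ]
}
```

A target that fails this check falls into the data-loss case described in the note above.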

---

## Phase 4 Carry-Forward Verification

These three tests close Phase 4 manual verification debt on the first
`rebno-staging` deploy. Run them in order on a live staging machine with
at least one successful deploy completed. Record all results in
`.planning/phases/05-deploy/05-HUMAN-UAT.md`.

### Test 1: SRV-08 kill -9 mid-tick recoverability

**Purpose:** Verify that a POSIX SIGKILL to the Node process mid-SQLite-WAL-write
does not corrupt the database. Validates that the `journal_mode=WAL` +
`synchronous=NORMAL` configuration survives an unclean shutdown on the
actual Linux/Alpine target (CONTEXT D-15; deferred from Phase 4 due to no
Linux dev host).

**Verbatim 6-step procedure** (from
`.planning/phases/04-server-rebuild-mvp/04-09-SUMMARY.md §"Manual
Verification (Phase 5 Debt)"`):

1. **SSH into rebno-staging:**
   ```sh
   fly ssh console --app rebno-staging
   ```

2. **Connect a test client and join a room.** From a separate terminal
   on your operator machine, authenticate a test account via Better-Auth
   and connect to the Colyseus room:
   ```sh
   # Example using a scripted ws client or wscat:
   wscat -c "wss://rebno-staging.fly.dev/colyseus?invite=$STAGING_INVITE_TOKEN&token=<bearer>"
   ```
   Confirm a player join event appears in `fly logs --app rebno-staging`.
   Observe 20Hz tick messages in the logs.

3. **From the SSH session, find the Node process PID:**
   ```sh
   pgrep -f 'node dist/index.js'
   ```
   Record the PID (e.g., `PID=42`).

4. **SIGKILL mid-tick:**
   ```sh
   kill -9 $PID
   ```
   Fly will auto-restart the machine (configured: `auto_stop_machines = "off"`
   + Fly machine health checks; restart occurs within ~10-30 s).

5. **Wait for `/health` to return 200:**
   ```sh
   # From operator machine
   until curl -sf "https://rebno-staging.fly.dev/health"; do
     echo "Waiting..."; sleep 3
   done
   echo "Server back up"
   ```
   Expected: server restarts cleanly within 30 seconds.

6. **Verify database integrity and state:**
   ```sh
   fly ssh console --app rebno-staging --command \
     'sqlite3 /data/rebno.db "PRAGMA integrity_check; SELECT count(*) FROM characters;"'
   ```
   Expected: `integrity_check` returns `ok`. The `characters` table is
   queryable. Player state may be empty if the character hadn't yet been
   flushed by the heartbeat checkpoint — this is acceptable (CONTEXT D-14:
   characters flush at graceful-disconnect + SIGTERM, not on SIGKILL).
   No `database disk image is malformed` or `SQLITE_CORRUPT` errors.

**Pass criteria:** `integrity_check = ok`, server returns 200 within 30s,
no corruption errors in `fly logs`. Record time-to-200, integrity_check
output, and player-state observations in `05-HUMAN-UAT.md` Test 2.
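
To record time-to-200 against the 30 s criterion, the polling loop can be given a deadline. This is a sketch with a hypothetical helper, `wait_for_ok`, which runs any probe command repeatedly. It is not part of the verbatim procedure above.

```sh
# Run a probe command every INTERVAL seconds until it succeeds or
# TIMEOUT seconds have elapsed. Prints the elapsed seconds on success,
# returns 1 on timeout.
# Arguments: timeout_seconds interval_seconds probe_command [args...]
wait_for_ok() {
  timeout=$1
  interval=$2
  shift 2
  start=$(date +%s)
  while :; do
    if "$@" >/dev/null 2>&1; then
      echo $(( $(date +%s) - start ))
      return 0
    fi
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
```

For this test a possible call is `wait_for_ok 30 3 curl -sf "https://rebno-staging.fly.dev/health"`. A non-zero exit means the server did not return 200 within the window.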

---

### Test 2: argon2id prod-hardware bench

**Purpose:** Verify Phase 4 D-07 argon2id cost parameters
(`memoryCost=65536, timeCost=3, parallelism=4`) yield mean hash time
≥ 200ms on Fly's shared-cpu-2x hardware. Local Windows dev-box results
are meaningless here (mean was ~25ms on Win11 x64 — far too fast to
validate the prod band).

**Command:**
```sh
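# sh -c is required: --command does not run through a shell, so `cd`
# and `&&` would otherwise fail.
fly ssh console --app rebno-staging --command \
  "sh -c 'cd /app && N=10 node scripts/argon2-bench.mjs'"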
```

The script hashes a fixed payload 10 times and reports mean, stdev,
median, min, max, and an IN-BAND / TOO FAST / TOO SLOW verdict.

**Target band:** mean in `[200ms, 500ms]` (OWASP-2026 guidance for
shared-CPU 2-vCPU class machines).

**If mean < 200ms** (too fast — security parameters too weak):
- **Option A (preferred):** Bump `memoryCost: 65536 → 131072` in
  `apps/server/src/argon2-opts.ts`. Memory-cost doubling has larger
  impact than time-cost increment.
- **Option B:** Bump `timeCost: 3 → 4` (less RAM pressure; slower CPU
  per hash).
- After bumping: redeploy via push-to-main, re-run the bench, confirm
  the new mean is in-band.
- Note: existing argon2id password hashes remain valid because each hash
  encodes the parameters it was created with; users' next sign-in will be
  re-hashed with the new params automatically (Better-Auth
  verify-and-rehash pattern).

**If mean > 500ms** (too slow — user-facing login latency unacceptable):
- Reduce `memoryCost: 65536 → 32768` OR `timeCost: 3 → 2`.
- Re-bench and confirm still ≥ 200ms.

Record mean, median, max, final `memoryCost`, final `timeCost` in
`05-HUMAN-UAT.md` Test 3.
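
The band logic above can be written out explicitly. The sketch below is an illustration only and is not the code of `argon2-bench.mjs`. `argon2_verdict` is a hypothetical helper that takes the mean rounded to whole milliseconds and treats both band edges as in-band.

```sh
# Classify a mean hash time (whole milliseconds) against the
# [200ms, 500ms] target band; both edges count as in-band.
argon2_verdict() {
  mean_ms=$1
  if [ "$mean_ms" -lt 200 ]; then
    echo "TOO FAST"
  elif [ "$mean_ms" -gt 500 ]; then
    echo "TOO SLOW"
  else
    echo "IN-BAND"
  fi
}
```

For instance, the ~25ms Windows result mentioned above classifies as `TOO FAST`, which is why only the staging-hardware number is meaningful.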

---

### Test 3: Multi-client move+chat smoke

**Purpose:** First end-to-end real-client traffic test against deployed
staging. Validates multi-client movement state sync and chat round-trips,
and confirms that Fly's 60s idle timeout is a non-issue with the Colyseus
3s pingInterval (DEP-08 sanity). Precursor to the automated 30-min soak
(Plan 11).

**Procedure:**

1. **Authenticate two test accounts.** Use the Better-Auth REST API from
   your operator machine:
   ```sh
   # Account A: capture the session token in TOKEN_A.
   TOKEN_A=$(curl -s -X POST https://rebno-staging.decidel.com/api/auth/sign-in/email \
     -H 'Content-Type: application/json' \
     -d '{"email":"test-a@example.com","password":"<pw>"}' \
     | jq -r '.token')

   # Account B (repeat with different email/password)
   ```

2. **Open two browser tabs** at:
   ```
   https://rebno-staging.decidel.com
   ```
   Or use two scripted Colyseus.js clients (from the Phase 4 integration
   harness `apps/server/test/authority.integ.test.ts` two-client setup):
   ```sh
   # From operator machine, two separate terminals:
   STAGING_WSS="wss://rebno-staging.fly.dev"
   INVITE="$STAGING_INVITE_TOKEN"
   # Connect client A: authenticate + joinOrCreate "rebno"
   # Connect client B: authenticate + joinOrCreate "rebno"
   ```

3. **Both clients join `rebno` room** via `client.joinOrCreate('rebno')`.
   Confirm both join events appear in `fly logs --app rebno-staging`.

4. **Drive movement + chat for 5 minutes:**
   - Send move intent messages from both clients at a realistic cadence
     (~60ms per input).
   - Send at least 5 chat messages from each client.
   - Observe that client A sees client B's position updates in state-diff
     broadcasts, and vice versa.
   - Observe chat messages appear on both sides.

5. **Record observations:**
   - Were there any spurious disconnects? (Check `fly logs` for
     `onLeave` events not triggered by your own disconnect.)
   - Did both clients see each other's player positions?
   - Did chat round-trip in both directions?
   - Any `RATE_LIMITED` events in logs? (Should be zero at normal
     cadence.)

**Pass criteria:** Zero spurious disconnects in 5 minutes; both clients
render the other's position and chat; no `RATE_LIMITED` at baseline
input cadence. Record disconnects, sync-ok, chat-ok in `05-HUMAN-UAT.md`
Test 4.
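
Counting log events for the pass criteria can be scripted. A minimal sketch follows. `count_matches` is a hypothetical helper that reads log text on stdin. It counts every matching line, so `onLeave` lines caused by your own disconnects still have to be subtracted by hand.

```sh
# Print the number of stdin lines containing PATTERN.
# grep -c exits non-zero when the count is 0, so the status is masked.
count_matches() {
  grep -c -- "$1" || true
}
```

A possible use, assuming your flyctl version supports `--no-tail`, is `fly logs --app rebno-staging --no-tail | count_matches RATE_LIMITED`. The expected result at baseline cadence is `0`.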

---

## Secret Rotation

On-demand only. No scheduled rotation in v1 (CONTEXT D-21). The procedures
below are authoritative; execute them when a secret is compromised or a
key rotation is operationally required.

| Secret | Rotation Command | Side Effects | Notes |
|--------|-----------------|-------------|-------|
| `BETTER_AUTH_SECRET` | `fly secrets set BETTER_AUTH_SECRET="$(openssl rand -base64 32)" --app <env>` | **All existing user sessions invalidated (force-logout).** Users must re-authenticate. | If visible to users (e.g., players mid-session), document in changelog. Use different values per env. |
| `STAGING_INVITE_TOKEN` | `fly secrets set STAGING_INVITE_TOKEN="$(openssl rand -base64 24)" --app rebno-staging` | All in-flight staging WS connections lacking the new token will be rejected on next handshake. | Update `STAGING_INVITE_TOKEN` GitHub secret to match. |
| `STAGING_MODE` | `fly secrets unset STAGING_MODE --app rebno-staging` (to disable invite gate) or `fly secrets set STAGING_MODE=1 --app rebno-staging` (to re-enable) | Invite-token middleware becomes a no-op when unset. | Never set on `rebno-prod`. |
| `ZO_ROOT_USER_PASSWORD` | `fly secrets set ZO_ROOT_USER_PASSWORD="$(openssl rand -base64 24)" --app rebno-obs` | OpenObserve UI admin session invalidated; re-login required with new password. | Save the new password in your local password manager immediately. |
| Tigris bucket keys | `fly storage destroy --app <env>` then `fly storage create --app <env>` | **Litestream replication breaks until new `BUCKET_NAME` + `AWS_*` env vars propagate (requires redeploy).** Snapshots held in the old bucket are **LOST** unless manually migrated. | Before destroying: run `fly ssh console --app <env> --command "sh -c 'litestream snapshots s3://\$BUCKET_NAME/rebno.db'"` to list current snapshots (`sh -c` is needed so `$BUCKET_NAME` expands on the machine); copy to a temp location via `litestream restore` if needed. |
| Ed25519 keypair | Delete `/data/keys/room_signing.ed25519` + restart | Server auto-regenerates keypair on boot (Phase 4 D-19). New pubkey must be extracted for Phase 6 client builds (`VITE_ROOM_SIGNING_PUBKEY`). | Phase 6 client builds with the stale pubkey will fail layout-signature verification. Coordinate with a Phase 6 client release. See §Legacy localList.txt Ingest for how to access the machine. |

**Ed25519 keypair rotation procedure:**
```sh
# 1) Delete the keypair (machine will regenerate it on restart).
#    File names follow the ones used in §Cold Restore step 7.
fly ssh console --app rebno-prod --command \
  'rm /data/keys/room_signing.ed25519 /data/keys/room_signing.ed25519.pub.pem'

# 2) Restart machine to trigger keypair regeneration.
fly machine list --app rebno-prod
fly machine restart <machine-id> --app rebno-prod

# 3) Extract new pubkey for Phase 6 client.
fly ssh console --app rebno-prod --command \
  'cat /data/keys/room_signing.ed25519.pub.pem'
# Record the base64-encoded pubkey; set it as VITE_ROOM_SIGNING_PUBKEY
# in the Phase 6 client Vite build (env file or GitHub Actions variable).
```

---

## Combined Rollback (Bad Migration + Bad Image)

When a deploy ships both a bad image AND a schema migration that caused
data corruption:

```sh
APP=rebno-prod
PRIOR_SHA=<the image SHA that was running before the bad deploy>
# Find it: fly releases --app $APP | head -5

# Step 1 — Roll the image back.
fly deploy \
  --app $APP \
  --image registry.fly.io/$APP:$PRIOR_SHA \
  --config apps/server/fly.prod.toml
# The entrypoint runs drizzle-kit migrate on the existing DB.
# If the bad migration already ran and left the schema or data in a state
# the prior image cannot use, step 2 is needed.

# Step 2 — (Only if schema rollback is also required) Restore DB from
# pre-migrate snapshot.
#
# The docker-entrypoint.sh captures a pre-migrate snapshot before running
# drizzle-kit migrate (CONTEXT D-09), written to:
#   /data/snapshots/pre-migrate-<timestamp>.db
#
# List available pre-migrate snapshots (sh -c is needed for the pipe,
# because --command does not run through a shell):
fly ssh console --app $APP --command \
  "sh -c 'ls -lt /data/snapshots/ | head -10'"

# Restore from the most recent pre-migrate snapshot:
fly ssh console --app $APP --command \
  'cp /data/snapshots/pre-migrate-<CLOSEST_TIMESTAMP>.db /data/rebno.db'

# Restart the machine to reload the restored DB.
fly machine list --app $APP
fly machine restart <machine-id> --app $APP

# Step 3 — Verify.
fly ssh console --app $APP --command \
  'sqlite3 /data/rebno.db "PRAGMA integrity_check; SELECT count(*) FROM accounts;"'

# Step 4 — Monitor health.
fly status --app $APP
curl -sf "https://${APP}.fly.dev/health"
```

**Drizzle migrations are forward-only.** Backwards-incompatible schema
changes (breaking column rename, column deletion) require a two-deploy
pattern: (1) add the new column as nullable + backfill → (2) make NOT NULL
in a second deploy once all rows are populated. This avoids needing DB
rollback for most schema changes.

---

## Legacy localList.txt Ingest (D-17 ssh-sftp ritual)

**SECURITY: DO NOT bake localList.txt into the Docker image.** Plaintext
credentials in the image would persist in the registry layer cache
indefinitely and could leak if the registry is ever made public. The
ssh-sftp approach is the only approved ingestion path (CLAUDE.md Hard
Rule 2; CONCERNS.md §"Plaintext player credentials checked into the archive
(CRITICAL)").

**Run once per environment.** Re-running the import is safe (idempotent
upsert), but the seed file must be deleted after first import.

```sh
APP=rebno-staging   # repeat for rebno-prod when Phase 7 (.bnu migration) is ready

# Step 1 — Upload localList.txt to the machine's /data/seed/ landing zone.
#   fly ssh sftp drops you into an interactive SFTP session against the machine.
fly ssh sftp --app $APP shell
```

Inside the SFTP shell:
```
sftp> mkdir /data/seed
sftp> put legacy/servers/enlyzeam-current/localList.txt /data/seed/localList.txt
sftp> bye
```

```sh
# Step 2 — Run the importer.
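#   sh -c is required: --command does not run through a shell, so `cd`
#   and `&&` would otherwise fail.
fly ssh console --app $APP --command \
  "sh -c 'cd /app && pnpm migrate:legacy-accounts /data/seed/localList.txt'"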
# Expected output: "Imported N accounts into legacy_credentials_staging"

# Step 3 — Verify row count matches expected.
#   Expected rows ≈ line count of localList.txt minus header lines.
fly ssh console --app $APP --command \
  'sqlite3 /data/rebno.db "SELECT count(*) FROM legacy_credentials_staging;"'
# Compare against: wc -l legacy/servers/enlyzeam-current/localList.txt

# Step 4 — DELETE the seed file from the machine.
fly ssh console --app $APP --command \
  "sh -c 'rm -v /data/seed/localList.txt && ls /data/seed/'"
# Expected: /data/seed/ is empty.
```

Record expected rows, imported rows, and seed-file-deleted confirmation
in `.planning/phases/05-deploy/05-HUMAN-UAT.md` Test 5.
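
The row-count comparison in step 3 can be sketched as below. Both helpers are hypothetical, and the number of header lines in `localList.txt` is not stated in this runbook, so it is passed in as a parameter (defaulting to 0) and not assumed.

```sh
# Print the expected number of imported rows: total lines in FILE minus
# HEADER_LINES (default 0; the real header count is an operator input).
expected_rows() {
  file=$1
  header_lines=${2:-0}
  total=$(wc -l < "$file")
  echo $(( total - header_lines ))
}

# Succeed when the expected and imported counts are equal.
rows_match() {
  [ "$1" -eq "$2" ]
}
```

Run `expected_rows legacy/servers/enlyzeam-current/localList.txt <header-lines>` locally and compare it with the `count(*)` from step 3 via `rows_match`. Note that `wc -l` counts newline characters, so a file without a trailing newline reports one line fewer.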

---

## Access Hardening (Fly Proxy IP Allowlist)

Staging carries real hashed credentials derived from localList.txt. The
invite-token middleware (D-04) provides one gate; the operator IP
allowlist provides the outer gate. Both layers must be active before the
legacy-credential import (§Legacy localList.txt Ingest).

Implementation: application-level Express middleware
(`apps/server/src/ip-allowlist.ts`) mounted before `makeStagingInvite`.
Fly does not natively provide CIDR allowlisting for shared-IP HTTP apps;
the middleware reads the Fly-Client-IP header (or X-Forwarded-For first
hop) and 403s any IP not in `OPERATOR_IP_ALLOWLIST`. `/health` is
exempted so Fly's health-check probe (Fly-internal IP) passes through.

```sh
# 1) Find your egress IP.
YOUR_IP=$(curl -s https://api.ipify.org)

# 2) Set the allowlist secret on rebno-staging (CSV for multiple IPs).
fly secrets set OPERATOR_IP_ALLOWLIST="$YOUR_IP" --app rebno-staging
# Multi-operator: fly secrets set OPERATOR_IP_ALLOWLIST="ip1,ip2,ip3"

# 3) Trigger redeploy so the secret reaches the running machine.
fly deploy --app rebno-staging \
  --config apps/server/fly.staging.toml \
  --dockerfile apps/server/Dockerfile \
  --remote-only --wait-timeout 90

# 4) Positive test (from your IP): should reach the invite gate (401).
#    -w prints the HTTP status on its own line after the body.
curl -sS -w '\n%{http_code}\n' https://staging.rebno.decidel.com/api/auth/sign-in/email \
  -X POST -H 'Content-Type: application/json' \
  -H 'Origin: https://staging.rebno.decidel.com' -d '{}'
# Expected: HTTP 401 {"error":"invite token required"}

# 5) Negative test (from a different IP, e.g. via a proxy) — should 403:
curl -s "https://api.allorigins.win/get?url=https%3A%2F%2Fstaging.rebno.decidel.com%2F"
# Expected: contents={"error":"forbidden"} http_code=403

# 6) /health stays open for Fly health checks regardless of allowlist:
curl -sS https://staging.rebno.decidel.com/health
# Expected: {"status":"ok",...}
```

For rebno-obs (when provisioned), repeat steps 2–5 with `--app rebno-obs`
and adjust the OpenObserve image config to honor the same env var (or
gate at a Fly proxy / Cloudflare layer).

To disable the allowlist entirely: `fly secrets unset OPERATOR_IP_ALLOWLIST
--app rebno-staging` (empty value → middleware no-ops).
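
The allowlist semantics described in this section (CSV of exact IPs, empty value means no filtering) can be illustrated in shell. This sketch is not the code in `apps/server/src/ip-allowlist.ts`. `ip_allowed` is a hypothetical helper, and it assumes the CSV contains no spaces and only exact addresses, not CIDR ranges.

```sh
# Succeed when IP is permitted by the comma-separated ALLOWLIST.
# An empty allowlist permits everything, mirroring the no-op behaviour
# described above. Matching is exact; CIDR ranges are not handled.
ip_allowed() {
  ip=$1
  allowlist=$2
  if [ -z "$allowlist" ]; then
    return 0
  fi
  case ",$allowlist," in
    *",$ip,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Wrapping both sides in commas keeps `1.2.3.4` from matching `1.2.3.40`. The same reasoning is why a single mistyped entry in `OPERATOR_IP_ALLOWLIST` locks that operator out.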

Record allowlisted networks, negative test result, and date in
`.planning/phases/05-deploy/05-HUMAN-UAT.md` Test 7.

---

## References

- `.planning/phases/05-deploy/05-CONTEXT.md` — D-* implementation decisions
  (D-01 two-app split, D-04 staging access, D-09 entrypoint migration,
  D-10 one-shot legacy import, D-17 ssh-sftp ritual, D-18 secrets,
  D-19 Ed25519, D-20 Tigris, D-21 rotation policy)
- `.planning/phases/05-deploy/05-RESEARCH.md` — full Phase 5 research +
  Pitfall 7 (Tigris must precede first deploy)
- `.planning/phases/04-server-rebuild-mvp/04-09-SUMMARY.md` §"Manual
  Verification (Phase 5 Debt)" — verbatim 6-step kill -9 procedure
  (source for §Phase 4 Carry-Forward Verification Test 1 above)
- `.planning/phases/04-server-rebuild-mvp/04-HUMAN-UAT.md` — Phase 4
  partial results; Tests 1+2+3 close on first Phase 5 staging deploy
- `apps/obs/README.md` — rebno-obs one-time provisioning checklist
- `apps/server/scripts/argon2-bench.mjs` — argon2id bench script
- `apps/server/src/argon2-opts.ts` — production cost parameters
- `docs/adr/0002-persistence-layer.md` — SQLite + Litestream + Tigris ADR
- `docs/adr/0005-deploy-topology.md` — deploy topology ADR (two-app,
  region lax, staging access pattern; created in Plan 14)
- `docs/adr/0006-observability-stack.md` — observability stack ADR
  (OpenObserve self-hosted on rebno-obs; created in Plan 14)
- `litestream.io` docs — `litestream.yml` schema, restore CLI, RPO < 1s
- `fly.io/docs/` — fly.toml reference, fly volumes, fly storage (Tigris),
  fly secrets, fly ssh sftp
