# Ground-Truth Verification — Plan 03-03 Step 1

This file records empirical observations gathered before the scanner emits any
schema. The four assumptions verified here (A1, A2, A3, A7) are documented in
`.planning/phases/03-server-documentation-schemas/03-RESEARCH.md` §Assumptions Log.
The scanner hardcodes findings (e.g., `encoding: 'windows-1252'`) on the basis
of these checks. Re-run the same probes if upstream `legacy/` ever changes.

> **Note on environment.** This plan was executed inside a git worktree where
> the gitignored `legacy/` tree is **absent**. The probes below were therefore
> run against the canonical checkout at
> `C:/Users/decid/Documents/projects/rebno/legacy/servers/enlyzeam-current/` (read-only).
> Test code that depends on `legacy/` soft-skips when the path is absent — see
> `tests/integration/real-data.test.ts`.

---

## A1 — Windows-1252 encoding

**Probe:** first 200 bytes of `legacy/servers/enlyzeam-current/localList.txt` (excerpt: the first 27 bytes, i.e. the first two lines).

```text
56 61 6e 63 65 20 53 65 72 6f 72 69 0a 62 6e 6f 2e 63 72 65 61 74 65 28 29 3b 0a
"V" "a" "n" "c" "e" " " "S" "e" "r" "o" "r" "i" \n "b" "n" "o" "." "c" "r" "e" "a" "t" "e" "(" ")" ";" \n
```

All of the first 200 bytes fall in 0x00..0x7F (pure ASCII), so windows-1252 and
UTF-8 decode this prefix identically. The line terminator is a single LF
(`0a`), not CRLF. Two non-ASCII bytes were observed deeper in the file:

```text
…"FlrBlzr\xD3\xD9"…   (line 40 of the file)
```

`0xD3 0xD9` decoded as windows-1252 → `Ó Ù` (U+00D3, U+00D9). Decoded as UTF-8
the pair is not a valid sequence: 0xD3 is the lead byte of a 2-byte sequence
and requires a continuation byte in 0x80..0xBF, but 0xD9 lies outside that
range (it is itself a 2-byte lead byte). A strict UTF-8 decoder rejects the
pair, and a lenient one typically substitutes U+FFFD replacement characters.

**Verdict — A1 confirmed:** `encoding: 'windows-1252'` is correct.
GameMaker 5.3a's `file_text_*` family writes the host's ANSI codepage,
which on the original 2003 Windows hosts was windows-1252 (CP1252). Phase 4
SRV-10/11 must decode with `iconv-lite` against `windows-1252`, not UTF-8.
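
The byte-level reasoning above can be reproduced with a short script. This is a minimal sketch, assuming a Node.js build with full ICU so that `TextDecoder` accepts the `windows-1252` label; the helper names `isValidUtf8` and `decodeCp1252` are illustrative and not part of the scanner, and Phase 4 itself is specified to use `iconv-lite` rather than `TextDecoder`.

```typescript
// Illustrative helpers (hypothetical names, not taken from the scanner).
// Assumption: the runtime's TextDecoder supports the "windows-1252" label.

/** Returns true when the bytes form well-formed UTF-8. */
function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    // fatal: true makes decode() throw a TypeError on malformed input.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

/** Decodes bytes as windows-1252 (CP1252). */
function decodeCp1252(bytes: Uint8Array): string {
  return new TextDecoder("windows-1252").decode(bytes);
}

// "FlrBlzr" followed by the two non-ASCII bytes quoted above.
const tail = Uint8Array.from([0x46, 0x6c, 0x72, 0x42, 0x6c, 0x7a, 0x72, 0xd3, 0xd9]);

console.log(isValidUtf8(tail)); // false: 0xD9 is not a continuation byte
console.log(decodeCp1252(tail)); // "FlrBlzrÓÙ"
```

For contrast, `0xD3 0x99` would be well-formed UTF-8 (U+04D9), which is why the check has to look at the second byte and not only at the lead byte.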

## A2 — Plaintext password rows in localList.txt

**Probe:** sample lines (split on `\n`).

| Line idx | Sample (truncated) |
|----------|--------------------|
| 0 | `Vance Serori` |
| 1 | `bno.create();` |
| 2 | `Goaspad` |
| 3 | `forte0` |
| 5 | `bahoobutt` |
| 10 | `Zak` |
| 12 | `Saber Mage` |
| 17 | `derteth12` |
| 40 | `FlrBlzr...` (with two windows-1252 bytes at the tail) |

Lines alternate username / password (matching `0392-users_load.gml`'s read
loop): even indices hold usernames, odd indices hold passwords. None of the
sampled "password" rows starts with `$2a$`, `$2b$`, `$2y$`, `$argon2`, or any
other recognizable hash prefix. They are short human-readable strings.

**Verdict — A2 confirmed:** rows are stored as **plaintext**. This corroborates
CLAUDE.md Hard Rule #2 ("the original `localList.txt` plaintext archive is a
security incident, not a feature") and locks Phase 4 SRV-10's migration plan:
**every imported row MUST be force-rotated** through argon2id before being
written to `users_v2`. The plaintext column is captured here only as a
schema fact; **scanner samples MUST redact the password column** to avoid
re-leaking credentials into the committed JSON.
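
The alternating layout and the redaction rule can be sketched as follows. This is a minimal illustration, assuming the file has already been decoded to a string and that rows strictly alternate username then password; `pairRows`, `looksHashed`, and the `[REDACTED]` placeholder are hypothetical names, not taken from the scanner, and the demo input is made up.

```typescript
interface SampleRow {
  username: string;
  password: string; // always the placeholder, never the original value
}

const REDACTED = "[REDACTED]";

// Hash prefixes checked during the probe (bcrypt variants and argon2).
const HASH_PREFIXES = ["$2a$", "$2b$", "$2y$", "$argon2"];

/** True when the value starts with one of the known hash prefixes. */
function looksHashed(value: string): boolean {
  return HASH_PREFIXES.some((prefix) => value.startsWith(prefix));
}

/** Pairs even-indexed usernames with a redacted password column. */
function pairRows(text: string): SampleRow[] {
  const lines = text.split("\n");
  const rows: SampleRow[] = [];
  lines.forEach((line, index) => {
    // Keep a username only when a password line follows it.
    if (index % 2 === 0 && index + 1 < lines.length) {
      rows.push({ username: line, password: REDACTED });
    }
  });
  return rows;
}

// Made-up input, not taken from localList.txt.
const demo = pairRows("alice\nhunter2\nbob\nswordfish\n");
console.log(demo.length); // 2
```

Dropping the password before the row object is built, rather than masking it at serialization time, keeps the plaintext out of every later code path.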

## A3 — Complete inventory of save-touching scripts

**Probe:**

```bash
node -e "const fs=require('fs');const path=require('path');const dir='extracted/server-5-4/scripts';const out=[];for(const f of fs.readdirSync(dir).sort()){const t=fs.readFileSync(path.join(dir,f),'utf8');if(/file_text_open_/.test(t))out.push(f);}console.log(JSON.stringify(out,null,2))"
```

**Output (17 scripts, plan said "16-script" — 1 extra found, see delta):**

```json
[
  "0365-mb_backup.gml",
  "0366-mb_restore.gml",
  "0367-users_restore.gml",
  "0368-uarea_restore.gml",
  "0369-uarea_backup.gml",
  "0371-unews_backup.gml",
  "0372-unews_restore.gml",
  "0376-uinv_backup.gml",
  "0377-uinv_restore.gml",
  "0379-users_restore_old.gml",
  "0383-uhxb_restore.gml",
  "0384-uhxb_backup.gml",
  "0386-load_settings.gml",
  "0387-save_settings.gml",
  "0390-debug_log.gml",
  "0391-users_load_old.gml",
  "0392-users_load.gml"
]
```

**Delta from plan §read_first list (13 scripts) and from RESEARCH §Verified
format inventory (16 scripts):**

| Script | In plan? | In RESEARCH? | Action |
|--------|---------:|-------------:|--------|
| `0390-debug_log.gml` | no | no | New entry. Uses `file_text_open_append(global.debug)` to write a free-form log. Filename is a runtime variable (`global.debug`), no fixed extension. Documented as `runtime-debug-log` with `archived: true` (operator-only diagnostic; not part of any persistence schema, no parser stub needed). |
| `0391-users_load_old.gml` | no | yes | Reads `localList.txt` for `global.u_total` then has the per-user loop **commented out**. The active `localList.txt` consumer is `0392-users_load.gml`. Both share the same filename so we still emit one row for `localList.txt` with `load_script: 0392-users_load.gml` (the live one). |

**Verdict — A3 confirmed with one delta noted above.** Scanner walks all 17
scripts. `0390-debug_log.gml` is treated as an opaque append-log and emitted
with `archived: true` (no grammar derived). `0391-users_load_old.gml` gets no
row of its own: it targets the same `localList.txt` as the live loader
`0392-users_load.gml`, which supplies the single emitted row.
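
The delta is a set difference between the scripts the probe found and the scripts a document expected. A minimal sketch, assuming both inventories are plain filename arrays; `inventoryDelta` is a hypothetical helper, and the short lists in the usage lines are abbreviated for illustration, not the full inventories.

```typescript
interface InventoryDelta {
  unexpected: string[]; // found by the probe, absent from the expected list
  missing: string[]; // expected, but not found by the probe
}

function inventoryDelta(
  found: readonly string[],
  expected: readonly string[],
): InventoryDelta {
  const foundSet = new Set(found);
  const expectedSet = new Set(expected);
  return {
    unexpected: found.filter((name) => !expectedSet.has(name)),
    missing: expected.filter((name) => !foundSet.has(name)),
  };
}

// Abbreviated, illustrative inputs (assumption: not the complete lists).
const found = ["0390-debug_log.gml", "0391-users_load_old.gml", "0392-users_load.gml"];
const expected = ["0391-users_load_old.gml", "0392-users_load.gml"];

console.log(inventoryDelta(found, expected));
// unexpected: ["0390-debug_log.gml"], missing: []
```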

## A7 — User_DB_Superweird presence

**Probe:**

```bash
ls legacy/servers/enlyzeam-current/UserData/
ls legacy/servers/enlyzeam-current/
find legacy/servers/ -iname "*superweird*"
find legacy/servers/ -iname "User_DB*"
```

`enlyzeam-current/` top-level relevant entries:

```text
,2021_Note_About_Sources.txt
DebugLog.txt
MB_Log.bnb
MSettings.bno
Settings.bno
UserData/
localList.txt
remoteList.txt
server_log.txt
…  (no User_DB_Superweird anywhere)
```

`enlyzeam-current/UserData/` subdirs:

```text
Areas (VOID)/
HXB/
HXB Backup 09-09/
Inv/
Inv Backup 09-09/
MB_News/
```

`find … -iname "*superweird*"` returns **zero matches** anywhere under
`legacy/servers/`.

`find … -iname "User_DB*"` finds:

- `legacy/servers/enlyzeam-archive/BNO Master Files (J)/User_DB.bnu`
- `legacy/servers/enlyzeam-archive/User_DBUpdated.bnu`
- `legacy/servers/enlyzeam-current/BNO Master Files (J)/User_DB.bnu`
- `legacy/servers/local-current/BNO_Server/BNO Master Files (J)/User_DB.bnu`

Three of these are `User_DB.bnu` (Japanese-master-files variant, NOT
Superweird) and one is a `User_DBUpdated.bnu` archive in `enlyzeam-archive/`.
**No `User_DB_Superweird*.bnu` file exists in any `legacy/` subtree.**

**Verdict — A7 confirmed:** Superweird is an archive-only schema (only the
`0379-users_restore_old.gml` GML script references it; no on-disk file is
present). The scanner therefore emits the Superweird grammar with
`archived: true` and `sample_records: []` (with the comment field
`archived_no_sample` set to true in the row). Phase 5 PAR-05 parity-checklist
must surface this as `legacy-superweird-import: not-applicable-no-extant-files`.
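
For orientation, the emitted row could look roughly like the sketch below. Only the field names `archived`, `sample_records`, `archived_no_sample`, `encoding`, and `load_script` appear in this file; the interface name `ArchivedFormatRow`, the field types, and the helper `archivedRowWithoutSamples` are assumptions, and the real `FormatRow` may carry more fields (for example the derived grammar).

```typescript
// Hypothetical subset of a FormatRow for an archive-only format.
interface ArchivedFormatRow {
  load_script: string;
  encoding: "windows-1252";
  archived: true;
  sample_records: string[];
  archived_no_sample: boolean;
}

/** Builds a row for a format that has a loader script but no on-disk file. */
function archivedRowWithoutSamples(loadScript: string): ArchivedFormatRow {
  return {
    load_script: loadScript,
    encoding: "windows-1252",
    archived: true,
    sample_records: [], // nothing to sample: no file exists under legacy/
    archived_no_sample: true,
  };
}

const superweird = archivedRowWithoutSamples("0379-users_restore_old.gml");
console.log(superweird.sample_records.length); // 0
```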

---

## Encoding decision matrix used by scanner

| File pattern | Encoding | Source |
|--------------|----------|--------|
| `*.bno`, `*.bnb`, `*.bnu`, `localList.txt`, `DebugLog.txt` | `windows-1252` | A1 above + GameMaker 5.3a `file_text_*` semantics |

Phase 4 SRV-10/11 imports `output/save-formats.ts` and consumes the
`encoding` field per FormatRow.
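
The matrix amounts to a small lookup. A minimal sketch, assuming case-sensitive matching on the file's base name; `encodingFor` and the two constant lists are hypothetical names, and returning `undefined` for unlisted files is an assumption about how unknown inputs should be handled, not documented scanner behavior.

```typescript
type Encoding = "windows-1252";

// Patterns copied from the decision matrix above.
const CP1252_EXTENSIONS = [".bno", ".bnb", ".bnu"];
const CP1252_FILENAMES = ["localList.txt", "DebugLog.txt"];

/** Returns the encoding for a file, or undefined when the matrix has no entry. */
function encodingFor(filePath: string): Encoding | undefined {
  // Match on the base name so that nested paths resolve the same way.
  const base = filePath.split("/").pop() ?? filePath;
  if (CP1252_FILENAMES.some((name) => name === base)) return "windows-1252";
  if (CP1252_EXTENSIONS.some((ext) => base.endsWith(ext))) return "windows-1252";
  return undefined;
}

console.log(encodingFor("Settings.bno")); // "windows-1252"
console.log(encodingFor("server_log.txt")); // undefined (not in the matrix)
```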
