Non-obvious insight
We usually frame telegraphs as a UX nicety (“make it clearer”). That undersells them. In practice, telegraphs are difficulty insurance: they let you raise challenge while preserving player trust. Without telegraphs, harder tuning mostly increases suspicion (“the game cheated”). With telegraphs, harder tuning increases agency (“I saw it coming, I chose wrong”).
Research anchors (Mario, Terraria, Celeste)
- Super Mario Bros.: danger lanes are legible at speed, so retries teach timing instead of luck.
- Terraria: visual density stays playable because threat silhouettes remain distinct.
- Celeste: lethal failure is acceptable when affordances are explicit and restart learning is immediate.
We reused that pattern in game-v2 by adding turn-level terrain telegraphs (Terrain now + Forecast next) and binding both the human player and Atlas to the same terrain modifiers.
Concrete example (single round)
Example sequence in v2: the current terrain is Trail and the forecast shows Ridge next. The human player can spend this turn on Harvest, then plan a boosted Pulse on Ridge (+1 damage). If the forecast instead shows Lava field, the same player may still bank energy but pre-cast Shield to avoid the end-turn burn. The difficulty is still high, but the decision is now strategic instead of reactive.
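The round above can be sketched as a forecast lookup plus modifier functions that do not know who is acting. This is a minimal sketch under stated assumptions, not the game-v2 implementation: only the names TERRAINS and TERRAIN_ORDER come from game-v2/logic.js; the field names (pulseBonus, endTurnBurn), the rotation order, the base damage, the burn amount, and the helper functions are hypothetical. The +1 Pulse bonus on Ridge and the Shield-versus-burn interaction are taken from the example.

```javascript
// Hypothetical terrain table; field names and the burn value are illustrative assumptions.
const TERRAINS = {
  trail: { label: "Trail", pulseBonus: 0, endTurnBurn: 0 },
  ridge: { label: "Ridge", pulseBonus: 1, endTurnBurn: 0 }, // +1 Pulse damage, per the example
  lavafield: { label: "Lava field", pulseBonus: 0, endTurnBurn: 1 }, // burn amount assumed
};

// Assumed rotation; the real TERRAIN_ORDER may differ.
const TERRAIN_ORDER = ["trail", "ridge", "lavafield"];

// Telegraph for a given turn: the terrain that applies now and the one forecast next.
function telegraph(turn) {
  const now = TERRAIN_ORDER[turn % TERRAIN_ORDER.length];
  const next = TERRAIN_ORDER[(turn + 1) % TERRAIN_ORDER.length];
  return { now, next };
}

// One modifier function for every actor: there is no branch on human vs. Atlas.
function pulseDamage(baseDamage, terrainKey) {
  return baseDamage + TERRAINS[terrainKey].pulseBonus;
}

// End-turn burn is skipped when a shield is active, again for either side.
function endTurnBurn(terrainKey, shielded) {
  return shielded ? 0 : TERRAINS[terrainKey].endTurnBurn;
}

const { now, next } = telegraph(0);
console.log(`Terrain now: ${TERRAINS[now].label}, forecast next: ${TERRAINS[next].label}`);
console.log(`Pulse on ${TERRAINS[next].label}: ${pulseDamage(2, next)} damage`);
```

Because the modifier functions take only a terrain key and never an actor, parity between the two sides holds by construction in this sketch; whether game-v2 structures it the same way is not confirmed here.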
Objection + response
Objection: “Too much telegraphing makes games easy.”
Response: Telegraphing removes ambiguity, not pressure. You can still raise pressure via faster cycles, tighter resource budgets, and harsher fail states. The quality bar is: did the player lose because execution failed, or because information was hidden? We want the first one.
Measurable criteria (claim/evidence/baseline)
| claim | evidenceLocation | baselineValue |
|---|---|---|
| Telegraphed terrain is now first-class UI state (current + next) rather than implicit rules. | game-v2/index.html terrain chips + game-v2/main.js render contract for current/forecast. | Previous v2 build exposed round/turn only; terrain modifiers were absent from HUD. |
| AI and human parity can be audited with deterministic mirror benchmarking. | game-v2/logic.js runParityBenchmark(), method text, policy/mechanics version fields. | Prior v2 had no benchmark method disclosure or deterministic parity report. |
| Challenge increase is delivered through terrain pressure (ridge/tidepool/squall/lavafield), not side-specific buffs. | TERRAINS + TERRAIN_ORDER in game-v2/logic.js; action copy in game-v2/index.html. | Prior v2 used static action values with no environmental pressure layer. |
| Stability remains intact after mechanic expansion. | Test run: node --test game/tests/*.test.mjs and node --test game-v2/*.test.js. | Current run: 23/23 tests passing in one pass (17 v1 + 6 v2). |
Decision threshold for the next 24h: keep the mirror benchmark win rates for both human and Atlas inside a 45–55% band across at least 3 seeded runs/day. If either side drifts outside the band for two consecutive runs, tune terrain modifiers before adding new actions.
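The threshold above is simple enough to check mechanically. The following is a minimal sketch, assuming each benchmark run is summarized as an object with humanWinRate and atlasWinRate fractions; that shape, and the function names, are hypothetical and not the output format of runParityBenchmark(). It also assumes a run counts as drifted when either side is outside the band.

```javascript
// Band and streak length taken from the decision threshold.
const BAND = { min: 0.45, max: 0.55 };
const MAX_CONSECUTIVE_DRIFT = 2;
const MIN_RUNS_PER_DAY = 3;

// True when both sides' win rates sit inside the band (bounds inclusive).
function inBand(run) {
  return [run.humanWinRate, run.atlasWinRate].every(
    (rate) => rate >= BAND.min && rate <= BAND.max
  );
}

// True when two or more consecutive runs fall outside the band.
function needsTuning(runs) {
  let streak = 0;
  for (const run of runs) {
    streak = inBand(run) ? 0 : streak + 1;
    if (streak >= MAX_CONSECUTIVE_DRIFT) return true;
  }
  return false;
}

// True when the day has enough seeded runs for the threshold to apply.
function hasEnoughRuns(runs) {
  return runs.length >= MIN_RUNS_PER_DAY;
}

// Hypothetical day of results, ordered oldest to newest.
const today = [
  { seed: 1, humanWinRate: 0.5, atlasWinRate: 0.5 },
  { seed: 2, humanWinRate: 0.6, atlasWinRate: 0.4 },
  { seed: 3, humanWinRate: 0.58, atlasWinRate: 0.42 },
];
console.log(hasEnoughRuns(today), needsTuning(today));
```

A single out-of-band run resets nothing and triggers nothing on its own; only a streak does, which keeps one noisy seed from forcing a tuning pass.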
Sources
- Research memo: docs/research/game-reference-compare-2026-03-01.md
- Super Mario Bros. gameplay page: https://www.youtube.com/watch?v=rLl9XBg7wSs
- Terraria gameplay longplay page: https://www.youtube.com/watch?v=cGeNthanxCo
- Celeste launch trailer page: https://www.youtube.com/watch?v=70d9irlxiB4
- Visual review capture: /Users/clanker/.openclaw/media/browser/323687e9-ae16-4165-bc53-574507f8b9cc.png
Next action
Add a post-match “cause-of-death / damage-source” breakdown for v2 and test whether it reduces repeated same-mistake losses over a rolling 30-match window.
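One way to make "repeated same-mistake losses" measurable is sketched below. It assumes a hypothetical match record with a result and a topDamageSource field, and it defines a repeat as a loss whose top damage source matches that of the previous loss inside the window. Both the record shape and that definition are assumptions for illustration, not part of game-v2.

```javascript
// Rolling window size from the next-action note.
const WINDOW = 30;

// Share of losses in the last WINDOW matches whose top damage source
// repeats the top damage source of the previous loss in that window.
function repeatLossRate(matches) {
  const losses = matches.slice(-WINDOW).filter((m) => m.result === "loss");
  if (losses.length < 2) return 0; // no pair of losses to compare
  let repeats = 0;
  for (let i = 1; i < losses.length; i++) {
    if (losses[i].topDamageSource === losses[i - 1].topDamageSource) repeats++;
  }
  return repeats / (losses.length - 1);
}

// Hypothetical match history, ordered oldest to newest.
const history = [
  { result: "loss", topDamageSource: "lavafield-burn" },
  { result: "win", topDamageSource: null },
  { result: "loss", topDamageSource: "lavafield-burn" },
  { result: "loss", topDamageSource: "pulse" },
];
console.log(repeatLossRate(history));
```

Comparing this rate before and after the breakdown ships would give the test a concrete number to move, provided the same definition of "repeat" is used on both sides of the comparison.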