What changed this pass
We used gameplay references from Terraria, Super Mario Bros., and Celeste to push Pupukea from score-only pressure into true challenge: the run now has real death conditions (3 HP, run ends at 0), harder multi-obstacle phase patterns, and explicit parity guarantees that AI and human runs obey the same mechanics.
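As a rough illustration of the death condition, here is a minimal sketch. The function names match the ones cited for game/game-core.js, but the signatures, the state shape, and the per-hazard damage values are assumptions and may differ from the real module.

```javascript
// Hypothetical sketch: names mirror obstacleDamageForType / applyDamage / isRunDead,
// but signatures and damage values are assumed, not copied from game-core.js.
const MAX_HP = 3; // from the notes: 3 HP, run ends at 0

// Assumed damage table: every hazard costs 1 HP in this sketch.
const OBSTACLE_DAMAGE = {
  kiawe: 1,
  reefSpikes: 1,
  lavaRockfall: 1,
  fallenPalm: 1,
};

function obstacleDamageForType(type) {
  // Unknown types fall back to 1 so a new hazard is never harmless by accident.
  return OBSTACLE_DAMAGE[type] ?? 1;
}

function applyDamage(state, type) {
  // Returns a new state; HP is clamped at 0.
  const hp = Math.max(0, state.hp - obstacleDamageForType(type));
  return { ...state, hp };
}

function isRunDead(state) {
  return state.hp <= 0;
}

// Usage: three hits end the run.
let run = { hp: MAX_HP, score: 0 };
for (const hit of ["kiawe", "reefSpikes", "lavaRockfall"]) {
  run = applyDamage(run, hit);
}
console.log(run.hp, isRunDead(run)); // 0 true
```

Keeping these as pure functions over a state object is one way to let the human input loop and the AI controller call exactly the same code path.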
Design principles pulled from references
- Terraria: readability must survive visual variety, so new hazards are silhouette-distinct (kiawe, reef spikes, lava rockfall, fallen palm).
- Super Mario Bros.: challenge comes from combinational timing, so phase spawns now produce chained patterns instead of mostly single hazards.
- Celeste: failure should be immediate and legible, so hits now communicate score + HP loss and can terminate a run.
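The chained-pattern idea from the Super Mario Bros. principle can be sketched as a seeded generator. This is only an illustration: `obstaclePattern` is named in the notes, but its signature, the chain lengths, the gap window (350 to 599 ms), and the hazard identifiers below are all assumed. The PRNG is mulberry32, a small public-domain generator swapped in here for reproducibility.

```javascript
// mulberry32: small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical hazard identifiers for the four silhouettes named above.
const HAZARD_TYPES = ["kiawe", "reefSpikes", "lavaRockfall", "fallenPalm"];

// Assumed signature: later phases chain more hazards (2 to 4 per pattern).
function obstaclePattern(phase, rng) {
  const length = Math.min(2 + phase, 4);
  const pattern = [];
  for (let i = 0; i < length; i++) {
    pattern.push({
      type: HAZARD_TYPES[Math.floor(rng() * HAZARD_TYPES.length)],
      // Gap before this hazard; the first hazard in a chain has none.
      gapMs: i === 0 ? 0 : 350 + Math.floor(rng() * 250),
    });
  }
  return pattern;
}

// Usage: the same seed always yields the same chain.
const first = obstaclePattern(1, mulberry32(42));
const second = obstaclePattern(1, mulberry32(42));
console.log(JSON.stringify(first) === JSON.stringify(second)); // true
```

Passing the generator in, rather than calling `Math.random()` inside, is what makes a pattern replayable from a seed.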
Benchmark integrity changes (not just score inflation)
AI benchmark reporting moved to a two-lane contract: deterministic dev seeds and a separate set of deterministic holdout seeds, reported with distribution stats and a score hash. We also publish rule-parity metadata so benchmark claims cannot hide AI-only assists.
- Mechanics version: v13-hardcore-parity
- Controller version: tti-v3
- Parity stamp: human-ai-shared-mechanics-v1
- Contract: median + p90 + min/max + death-rate + score hash
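To make the contract concrete, here is a minimal sketch of a per-lane summary. It is not the `benchmarkAutoplay()` implementation: the run record shape (`score`, `died`), the nearest-rank percentile, and the use of FNV-1a (32-bit) as the score hash are all assumptions chosen for illustration, and the sample scores are made up.

```javascript
// Nearest-rank percentile on an ascending-sorted array.
function percentile(sorted, p) {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function median(sorted) {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// FNV-1a 32-bit over the scores in run order, as 8 hex characters.
// Assumed stand-in for whatever hash the real benchmark uses.
function scoreHash(scores) {
  let h = 0x811c9dc5;
  for (const ch of scores.join(",")) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Summarize one lane (dev or holdout) into the contract fields.
function summarizeLane(runs) {
  const scores = runs.map((r) => r.score);
  const sorted = [...scores].sort((a, b) => a - b);
  return {
    runs: runs.length,
    median: median(sorted),
    p90: percentile(sorted, 90),
    min: sorted[0],
    max: sorted[sorted.length - 1],
    deathRate: runs.filter((r) => r.died).length / runs.length,
    hash: scoreHash(scores),
  };
}

// Usage with illustrative (not measured) runs.
const devLane = summarizeLane([
  { score: 5, died: true },
  { score: 20, died: false },
  { score: 8, died: true },
  { score: 12, died: true },
]);
console.log(devLane.median, devLane.p90, devLane.deathRate); // 10 20 0.75
```

Hashing the scores in seed order, rather than sorted, means a reordered or substituted seed list changes the hash even if the distribution stats happen to match.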
Current baseline snapshot
Across 36 dev runs and 14 holdout runs (60s config), results were:
- dev: median 10 (p90 33)
- holdout: median 10 (p90 12)
- score hash: c86bed70

Death rate is currently high in both lanes. That is expected after the difficulty jump and gives a clean baseline for the next balancing passes.
Claim–Evidence–Baseline
| claim | evidenceLocation | baselineValue |
|---|---|---|
| Pupukea now includes true fail/death conditions instead of only soft score penalties. | game/game.js (HP state + run-end on 0), game/game-core.js (obstacleDamageForType, applyDamage, isRunDead). | Previous build only deducted score and could not hard-fail from collisions. |
| Obstacle/terrain challenge variety increased with Hawaiian-themed pattern composition. | game/game-core.js obstaclePattern; game/game.js new obstacle types + rendering. | Previous build mostly spawned single obstacle events with lower combinational pressure. |
| AI benchmark integrity improved with dev/holdout lane separation and reproducibility hash. | benchmarkAutoplay() in game/game-core.js, benchmark panel text in game/game.js and game/index.html. | Previous benchmark reported one lane (median/p90/range) without holdout lane or score hash. |
| Human and AI rules are explicitly parity-locked and exposed in UI/metadata. | RULE_PARITY_VERSION in core, parity chip in game HUD, benchmark metadata and method string. | Parity was implied but not explicitly surfaced as a first-class contract in UI + metadata. |
Sources
- Reference memo: docs/research/game-reference-compare-2026-03-01.md
- Terraria official trailer page: https://www.youtube.com/watch?v=w7uOhFTrrq0
- Terraria gameplay longplay page: https://www.youtube.com/watch?v=cGeNthanxCo
- Super Mario Bros. gameplay page: https://www.youtube.com/watch?v=rLl9XBg7wSs
- Celeste launch trailer page: https://www.youtube.com/watch?v=70d9irlxiB4
- Captured evidence assets: docs/research/game-reference-shots/*.png
Next action
Next is a balance pass: keep hits lethal, but reduce non-informative deaths by tuning chain gap windows and adding a post-run hit taxonomy (hit type + phase), so we can verify whether players learn faster with each retry.
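One possible shape for the hit taxonomy is a tally keyed by hazard type and phase. This is a sketch of a planned feature, so everything here is hypothetical: the `hitTaxonomy` name, the hit record fields, and the key format.

```javascript
// Hypothetical: count hits per (type, phase) pair and list the most frequent first.
function hitTaxonomy(hits) {
  const counts = new Map();
  for (const { type, phase } of hits) {
    const key = `${type}@phase${phase}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Entries are [key, count] pairs, sorted by count descending.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Usage with illustrative hit records from a single run.
const report = hitTaxonomy([
  { type: "reefSpikes", phase: 1 },
  { type: "kiawe", phase: 2 },
  { type: "kiawe", phase: 2 },
]);
console.log(report[0]); // [ 'kiawe@phase2', 2 ]
```

Comparing this tally across consecutive retries would show whether deaths cluster on one pattern (a tuning problem) or spread out as the player adapts.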