What changed this pass

Drawing on gameplay references from Terraria, Super Mario Bros., and Celeste, this pass moves Pupukea from score-only pressure to real challenge. Runs now have a true death condition (3 HP, run ends at 0), harder multi-obstacle phase patterns, and an explicit parity guarantee that AI and human runs obey the same mechanics.
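As a rough illustration of the new death condition, the HP flow could look like the sketch below. The function names match those listed in the evidence table for game/game-core.js, but the signatures, the state shape, and the per-hazard damage values are assumptions made for this example, not the shipped code.

```javascript
// Sketch only: names come from the evidence table, everything else is assumed.
const MAX_HP = 3; // from the text: 3 HP, run ends at 0

// Assumed damage table: each hazard costs 1 HP.
const DAMAGE_BY_TYPE = { kiawe: 1, reefSpikes: 1, lavaRockfall: 1, fallenPalm: 1 };

// Look up HP damage for an obstacle type, defaulting to 1 for unknown types.
function obstacleDamageForType(type) {
  return DAMAGE_BY_TYPE[type] ?? 1;
}

// Return a new state with HP reduced, clamped so it never drops below 0.
function applyDamage(state, type) {
  const hp = Math.max(0, state.hp - obstacleDamageForType(type));
  return { ...state, hp };
}

// A run is over as soon as HP reaches 0.
function isRunDead(state) {
  return state.hp <= 0;
}

let state = { hp: MAX_HP, score: 0 };
for (const hit of ["kiawe", "reefSpikes", "lavaRockfall"]) {
  state = applyDamage(state, hit);
}
console.log(state.hp, isRunDead(state)); // 0 true
```

Keeping these as pure functions over a state object is one way to let the human input loop and the AI controller share the exact same damage path.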

Design principles pulled from references

  • Terraria: readability must survive visual variety, so new hazards are silhouette-distinct (kiawe, reef spikes, lava rockfall, fallen palm).
  • Super Mario Bros.: challenge comes from combinational timing, so phase spawns now produce chained patterns instead of mostly single hazards.
  • Celeste: failure should be immediate and legible, so hits now communicate score + HP loss and can terminate a run.
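The chained-pattern idea can be sketched as follows. obstaclePattern is the name used in game/game-core.js, but the signature, the chain-length rule, the gap timings, and the seeded generator (mulberry32, a small public-domain PRNG) are all assumptions chosen for illustration.

```javascript
// Hazard names follow the list above; the identifiers themselves are assumed.
const HAZARDS = ["kiawe", "reefSpikes", "lavaRockfall", "fallenPalm"];

// mulberry32: small seeded PRNG so the same seed always yields the same pattern.
function mulberry32(a) {
  return function () {
    let t = (a += 0x6d2b79f5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build one chained spawn pattern for a phase (hypothetical rules):
// later phases chain more hazards (capped at 4) with tighter gaps.
function obstaclePattern(phase, rng) {
  const length = Math.min(1 + phase, 4);
  const gapMs = Math.max(350, 900 - phase * 150);
  const pattern = [];
  for (let i = 0; i < length; i++) {
    const type = HAZARDS[Math.floor(rng() * HAZARDS.length)];
    pattern.push({ type, offsetMs: i * gapMs });
  }
  return pattern;
}

const demo = obstaclePattern(2, mulberry32(42));
console.log(demo.map((o) => `${o.type}@${o.offsetMs}ms`).join(" -> "));
```

Because the generator is passed in rather than read from Math.random, a human run and an AI run on the same seed would face an identical hazard sequence.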

Benchmark integrity changes (not just score inflation)

AI benchmark reporting moved to a two-lane contract: deterministic dev seeds and separate deterministic holdout seeds, each reported with distribution stats and a score hash. We also publish rule parity metadata so benchmark claims cannot hide AI-only assists.

  • Mechanics version: v13-hardcore-parity
  • Controller version: tti-v3
  • Parity stamp: human-ai-shared-mechanics-v1
  • Contract: median + p90 + min/max + death-rate + score hash
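To make the contract concrete, here is a minimal sketch of how one lane's summary could be computed. This is not benchmarkAutoplay() itself: the run record shape, the nearest-rank percentile method, and the use of 32-bit FNV-1a for the score hash are assumptions, and the sample runs are made up.

```javascript
// Nearest-rank percentile over an ascending-sorted array (assumed method).
function percentile(sorted, p) {
  const idx = Math.max(0, Math.ceil(p * sorted.length) - 1);
  return sorted[idx];
}

// 32-bit FNV-1a over an ASCII string, rendered as 8 hex chars (assumed hash).
function fnv1a32(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Summarize one lane: median + p90 + min/max + death rate + score hash.
function summarizeLane(runs) {
  const scores = runs.map((r) => r.score);
  const sorted = [...scores].sort((a, b) => a - b);
  const deaths = runs.filter((r) => r.died).length;
  return {
    runs: runs.length,
    median: percentile(sorted, 0.5),
    p90: percentile(sorted, 0.9),
    min: sorted[0],
    max: sorted[sorted.length - 1],
    deathRate: deaths / runs.length,
    scoreHash: fnv1a32(scores.join(",")), // hashed in seed order, not sorted
  };
}

// Made-up example runs, not real benchmark output.
const exampleRuns = [
  { seed: 1, score: 10, died: true },
  { seed: 2, score: 4, died: true },
  { seed: 3, score: 33, died: false },
  { seed: 4, score: 12, died: true },
  { seed: 5, score: 8, died: true },
];
console.log(summarizeLane(exampleRuns));
```

Running the same function once over the dev seeds and once over the holdout seeds would give the two lanes, and any change in mechanics or controller behavior would show up as a changed hash.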

Current baseline snapshot

Using 36 dev runs and 14 holdout runs (60s config), results were: dev median 10 (p90 33), holdout median 10 (p90 12), score hash c86bed70. Death rate is currently high in both lanes, which is expected after the difficulty jump and gives a clean baseline for the next balancing passes.

Claim–Evidence–Baseline

Claim: Pupukea now includes true fail/death conditions instead of only soft score penalties.
Evidence location: game/game.js (HP state + run-end on 0), game/game-core.js (obstacleDamageForType, applyDamage, isRunDead).
Baseline value: Previous build only deducted score and could not hard-fail from collisions.

Claim: Obstacle/terrain challenge variety increased with Hawaiian-themed pattern composition.
Evidence location: game/game-core.js obstaclePattern; game/game.js new obstacle types + rendering.
Baseline value: Previous build mostly spawned single obstacle events with lower combinational pressure.

Claim: AI benchmark integrity improved with dev/holdout lane separation and reproducibility hash.
Evidence location: benchmarkAutoplay() in game/game-core.js, benchmark panel text in game/game.js and game/index.html.
Baseline value: Previous benchmark reported one lane (median/p90/range) without holdout lane or score hash.

Claim: Human and AI rules are explicitly parity-locked and exposed in UI/metadata.
Evidence location: RULE_PARITY_VERSION in core, parity chip in game HUD, benchmark metadata and method string.
Baseline value: Parity was implied but not explicitly surfaced as a first-class contract in UI + metadata.

Next action

Balance pass next: keep lethal integrity, but reduce non-informative deaths by tuning chain gap windows and adding post-run hit taxonomy (hit type + phase) so we can verify whether player learning velocity improves after each retry.
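One possible shape for the post-run hit taxonomy is sketched below. The hit record fields (type, phase) follow the description above, while the function name tallyHits and the output format are hypothetical.

```javascript
// Count hits per (type, phase) pair and return the buckets, most frequent first.
function tallyHits(hits) {
  const counts = new Map();
  for (const { type, phase } of hits) {
    const key = `${type}|phase-${phase}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up hit log from a single run.
const hitLog = [
  { type: "reefSpikes", phase: 2 },
  { type: "kiawe", phase: 1 },
  { type: "reefSpikes", phase: 2 },
];
console.log(tallyHits(hitLog));
// [ [ 'reefSpikes|phase-2', 2 ], [ 'kiawe|phase-1', 1 ] ]
```

Comparing these tallies across retries would show whether deaths concentrate in one chain pattern (a tuning problem) or shift as the player learns.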