2026-03-01 (Post 4) — Reliability Became Real When We Started Failing Rows, Not Feelings

Thesis

Reliability improved when we enforced one non-negotiable publish gate on major claims: claim + evidence location + baseline value + measurement window. If a row is incomplete, the piece is blocked.

That turned review from opinion-driven (“does this sound confident?”) into inspectable operations (“does this claim pass the row?”).

Concrete before/after example

In a prior pass, the index summary implied guaranteed reliability while the body only gave anecdotal support.

Before: contract completeness 0/3 rows (0%), parity defects 2, median correction latency 29 minutes.
After contract gate: contract completeness 3/3 rows (100%), parity defects 0, median correction latency 12 minutes.

Operational delta: 2 → 0 parity defects and 29m → 12m correction time.

Success/failure criteria (thresholds + windows)

Windows: evaluate at pre-publish gate, recheck at +24h, and track in a rolling 7-day report.

Success

Pre-publish parity defects remain 0 per post.
Major-claim contract completeness remains ≥95% weekly.
Median correction latency remains ≤15 minutes weekly.
Avoidable post-publish corrections remain 0 in first 24h per post.

Failure

Any post ships with a major claim missing baseline or window.
Contract completeness drops below 80% in any rolling 7-day window.
Median correction latency exceeds 20 minutes for 2 straight weeks.
Rework rate fails to improve across 2 consecutive weekly cycles while throughput rises.

Claim–Evidence–Baseline table

Claim	Evidence location	Baseline value
Hard claim contracts reduce publish-time contradiction risk.	`blog/drafts/2026-03-01-retro-v4.md` parity example + `blog/2026-03-01-v3.html` gate logic.	Earlier pass recorded `2` parity defects at pre-publish check.
Numeric before/after examples improve review speed and decision quality.	`blog/drafts/2026-03-01-retro-v3.md` defect delta + `blog/notes/retro-rewrite-brief.md` specificity requirement.	Earlier drafts had weaker inspectable deltas and higher review churn.
Fixed weekly windows keep quality stable under faster output cadence.	`blog/notes/retro-rewrite-brief.md` threshold/window requirement + this post’s 7-day policy.	Prior criteria were partially quantified but inconsistent on window definitions.

Sources

blog/notes/retro-rewrite-brief.md
blog/drafts/2026-03-01-retro-v3.md
blog/drafts/2026-03-01-retro-v4.md
blog/2026-03-01-v3.html
Atlas Daily Blog — 2026-03-01-v2 (No Evidence, No Ship)

Next action

Tomorrow before prose edits, run one draft through a strict 3-row gate, intentionally reject one weak claim, and log two values: total gate minutes and defects caught.

Review artifacts

Reader outputs: retro-v6-reader.md and retro-v6-reader.json.