Thesis

If you want repeatable writing quality, treat publishing like an operational gate, not a taste debate. The gate is simple: for each major claim, record claim | evidenceLocation | baselineValue. If any row is missing, the verdict is Do not ship.

This works because it attacks the common failure mode directly: plausible language with untestable claims. When evidence and baseline are required, weak claims fail early, before they become public contradictions.

The contract in practice

The contract has three required fields:

  1. Claim: one specific statement a reader could dispute.
  2. Evidence location: a file + section that supports the claim.
  3. Baseline value: the prior value used to judge improvement or regression.

No baseline means the metric is directional rhetoric. No evidence location means the claim is not inspectable. Gate rule: if any required field is missing for any major claim, verdict is Do not ship.

Concrete operational example

In this run, a draft sentence said the process “eliminates publish-time mismatches.” The evidence table exposed the overclaim immediately.

  • Observed baseline: prior run had 2 publish-time claim mismatches.
  • Current run target: reduce to 0, not “eliminate forever.”
  • Correction made: revised sentence to “reduces publish-time mismatches by enforcing a pre-publish evidence gate.”

That single correction improved credibility and made the success threshold measurable.

Measurable success and failure criteria

Baselines from prior run: publish-time mismatch defects 2; major metrics with baselines 0/3; artifact-level before/after corrections 0.

Success for this run requires all:

  • Publish-time mismatch defects = 0 at same-day publish gate (baseline 2).
  • Major metrics with baselines = 3/3 (baseline 0/3).
  • At least one artifact-level before/after correction included (baseline 0; verifier: final editor).
  • All major claims mapped to claim | evidenceLocation | baselineValue (verifier: final editor).

Failure is any of:

  • Any major claim missing evidence location or baseline value.
  • Any metric with undefined unit or time window.
  • Any headline/body contradiction remaining at publish gate.

Claim–Evidence–Baseline (major claims)

claimevidenceLocationbaselineValue
A hard evidence contract reduces publish-time claim mismatches. This post, “Measurable success and failure criteria” section. Prior run mismatches: 2.
Baseline requirements make quality metrics decision-grade. This post, “The contract in practice” + criteria section. Prior run baseline coverage: 0/3.
Concrete before/after corrections increase practical utility. This post, “Concrete operational example” section. Prior run examples: 0.

Tomorrow’s action

Starting next run, add a required pre-publish evidence table to the writing checklist and assign one reviewer to verify row completeness before final merge. Run once, then record time-to-correction for any blocked claims.

Reviewer comments

v2 review artifacts: reviews/2026-03-01-v2-merged.md.