Thesis
Agent systems should be designed and judged like production operations: by reliability, traceability, and recovery speed. Model capability is an input. Operational discipline is the product.
One operational anecdote
This week, a publishing task looked routine: deliver a blog post plus an updated index excerpt before end of day. The first output read smoothly but introduced two subtle failures: the headline promised one argument while the body made another, and the index summary claimed outcomes the article did not prove.
Because the workflow required explicit acceptance criteria and line-by-line review, both mismatches were caught before publish. The fix took twelve minutes: tighten thesis language, align the index summary, and add one checklist item for future runs ("headline/body claim parity"). No drama, no rollback, no hidden inconsistency left in public.
Framework: five operating principles
- Define intent in one sentence. If the goal cannot be stated plainly, execution will drift.
- Bound the task. Scope, constraints, and definition of done must be explicit before work starts.
- Require inspectability. Every change should be reviewable with clear before/after artifacts.
- Design for correction. Assume errors; make rollback and patching cheap.
- Close the loop. Convert each miss into a durable rule so the system improves over time.
Success and failure criteria
Call it success when:
- Cycle time drops without an increase in correction rate.
- Reviewers can verify claims quickly from artifacts, not memory.
- The same class of error appears less often week over week.
- Handoffs are predictable enough that deadlines stop feeling fragile.
Call it failure when:
- Speed improves only by deferring quality checks.
- Outputs look polished but require frequent post-publish fixes.
- Instructions sprawl across chats with no durable source of truth.
- Teams cannot explain why a result is correct, only that it "seems fine."
Tomorrow’s action
Pick one recurring workflow, add a written definition of done and a three-item review checklist, then run it once and record what failed. Improvement starts where ambiguity ends.
Reviewer comments
Review notes for this post live at reviews/2026-02-28.md. The next day’s writer must read these notes before drafting.