Thesis

Treat writing agents as an idea-operations engine, not a content machine. The value is not “more posts.” The value is a repeatable loop that converts signals into decision-grade artifacts, execution-ready handoffs, and measurable follow-through.

In practice, this means running writing as a system: sense, decide, execute, reflect — with a hard evidence gate on major claims.

The loop: sense → decide → execute → reflect

  1. Sense: ingest notes, incidents, meeting fragments, and strategic questions.
  2. Decide: produce a decision memo with options, tradeoffs, and recommendation.
  3. Execute: generate owner-tagged task packets/runbooks with acceptance criteria.
  4. Reflect: log outcomes, update patterns, and carry forward durable lessons.

This is still writing, but writing used as operational infrastructure.
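
A minimal sketch of one cycle of that loop, assuming nothing beyond the standard library; every name here (Signal, LoopState, run_cycle, and the four stage functions) is invented for illustration, not an existing API:

```python
from dataclasses import dataclass, field


@dataclass
class Signal:
    source: str    # e.g. "meeting-notes", "incident", "strategic-question"
    content: str
    priority: int = 0


@dataclass
class LoopState:
    lessons: list[str] = field(default_factory=list)  # durable memory across cycles


def sense(signals: list[Signal]) -> list[Signal]:
    """Ingest raw signals and rank them into a problem brief."""
    return sorted(signals, key=lambda s: s.priority, reverse=True)


def decide(brief: list[Signal], lessons: list[str]) -> dict:
    """Draft a decision memo: options, tradeoffs, recommendation."""
    problem = brief[0].content if brief else None
    return {"problem": problem, "options": [], "recommendation": None,
            "prior_lessons": lessons}


def execute(memo: dict) -> dict:
    """Turn the memo into an owner-tagged task packet with acceptance criteria."""
    return {"owner": None, "due_window": None, "acceptance_criteria": [], "memo": memo}


def reflect(packet: dict) -> list[str]:
    """Log outcomes and extract durable lessons for the next cycle."""
    return [f"lesson drawn from: {packet['memo']['problem']}"]


def run_cycle(signals: list[Signal], state: LoopState) -> LoopState:
    brief = sense(signals)                # 1. sense
    memo = decide(brief, state.lessons)   # 2. decide
    packet = execute(memo)                # 3. execute
    state.lessons += reflect(packet)      # 4. reflect
    return state
```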

Five capabilities beyond blogging

  • Decision memo synthesis → output artifact: recommendation memo with explicit tradeoffs.
  • Operational handoff authoring → output artifact: runbook/task packet with acceptance criteria.
  • Signal digestion and prioritization → output artifact: ranked problem brief.
  • Quality-control enforcement → output artifact: claim-evidence-baseline verification report.
  • Learning-memory formation → output artifact: weekly principle/anti-pattern update.
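
One way to keep these artifacts decision-grade is to type them. The real schema lives in docs/artifact-schema.md; the field names below are assumptions, sketched to show one plausible shape per capability:

```python
from dataclasses import dataclass


@dataclass
class ProblemBrief:            # signal digestion and prioritization
    problems: list[str]        # ranked, highest priority first


@dataclass
class DecisionMemo:            # decision memo synthesis
    options: list[str]
    tradeoffs: dict[str, str]  # option -> tradeoff summary
    recommendation: str


@dataclass
class TaskPacket:              # operational handoff authoring
    owner: str
    due_window: str            # e.g. "by Wed 17:00"
    acceptance_criteria: list[str]


@dataclass
class EvidenceRow:             # quality-control enforcement
    claim: str
    evidence_location: str     # URL or repo path
    baseline_value: str


@dataclass
class LessonUpdate:            # learning-memory formation
    principles: list[str]
    anti_patterns: list[str]
```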

Concrete operational example

Monday planning previously required ad-hoc synthesis across notes and chats; baseline turnaround from raw notes to a decision-ready plan was ~24h.

With the writing-agent loop:

  • Input: meeting notes + TODO fragments + blockers.
  • Output 1: decision memo (options, recommendation, risks).
  • Output 2: execution brief (owner, due window, acceptance criteria).
  • Output 3: risk register delta and tomorrow action.

The target turnaround becomes <8h for a complete packet, with evidence rows required before anything ships.
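
Scoring that target is mechanical once ingest and ship timestamps are logged. A toy check, with invented timestamps:

```python
from datetime import datetime

raw_notes_ingested = datetime(2026, 3, 2, 9, 0)   # Monday 09:00, invented
packet_shipped = datetime(2026, 3, 2, 15, 30)     # same day 15:30, invented

turnaround_h = (packet_shipped - raw_notes_ingested).total_seconds() / 3600
assert turnaround_h < 8, f"missed the <8h target: {turnaround_h:.1f}h"
print(f"turnaround: {turnaround_h:.1f}h (baseline ~24h)")
```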

5-week experiment + success/failure criteria

Plan: week 1 baseline, week 2 decision artifacts, week 3 operational handoffs, week 4 cadence increase, week 5 consolidate and go/no-go.

Success (weekly rolling window):

  • Evidence completeness for major claims ≥ 95% (baseline: inconsistent pre-v2).
  • Median decision latency reduced by 30%+ vs week-1 baseline.
  • Major correction/rework rate reduced by 40%+ vs baseline.
  • Outputs with explicit owner + due-time next action ≥ 80%.

Failure is any of:

  • Evidence completeness drops below 80% in any week.
  • Throughput increases while rework does not improve.
  • Outputs repeatedly miss owners, thresholds, or timing.
  • Operators bypass the artifacts for two consecutive weeks (pause the rollout).
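
Both lists reduce to a mechanical weekly score once the metrics are logged. A sketch of that scoring, with assumed field names and the thresholds copied from above (the "misses owners, thresholds, or timing" trigger still needs human review):

```python
from dataclasses import dataclass


@dataclass
class WeekMetrics:
    evidence_completeness: float   # fraction of major claims with full rows
    decision_latency_h: float      # median hours from signal to decision
    rework_rate: float             # major corrections per shipped artifact
    owned_next_action_rate: float  # share of outputs with owner + due time
    throughput: int                # packets shipped this week
    artifacts_bypassed: bool       # operators worked around the artifacts


def week_succeeds(week: WeekMetrics, baseline: WeekMetrics) -> bool:
    """All four success thresholds, per the list above."""
    return (
        week.evidence_completeness >= 0.95
        and week.decision_latency_h <= 0.70 * baseline.decision_latency_h
        and week.rework_rate <= 0.60 * baseline.rework_rate
        and week.owned_next_action_rate >= 0.80
    )


def must_pause(week: WeekMetrics, prev_week: WeekMetrics,
               baseline: WeekMetrics) -> bool:
    """Hard failure triggers; any one is enough to pause the rollout."""
    return (
        week.evidence_completeness < 0.80
        or (week.throughput > baseline.throughput
            and week.rework_rate >= baseline.rework_rate)
        or (week.artifacts_bypassed and prev_week.artifacts_bypassed)
    )
```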

Claim–Evidence–Baseline (major claims)

  • Claim: Multi-step writing-agent workflows improve reliability versus one-shot generation.
    Evidence: https://www.anthropic.com/research/building-effective-agents (workflow pattern framing) + local planner/writer/reviewer pipeline.
    Baseline: the prior workflow emphasized post output count over loop-closure quality.
  • Claim: Explicit control mechanisms improve writing-system safety and consistency.
    Evidence: https://martinfowler.com/articles/feature-toggles.html (control/lifecycle framing) + docs/artifact-schema.md gate.
    Baseline: checks were present but less consistently enforced before the v2 hard gate.
  • Claim: Writing cadence produces higher operational value when tied to action closure.
    Evidence: https://www.benkuhn.net/writing/ (writing-as-thinking) + docs/writing-agent-project.md metrics.
    Baseline: cadence primarily optimized for publishing, not decision-packet throughput.

Gate rule: if any major claim is missing its claim/evidence/baseline row, do not ship.
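
The gate itself can be enforced in a few lines: scan the rows for every major claim and refuse to ship on any missing field. A sketch, assuming rows shaped like the table above:

```python
def gate(rows: list[dict]) -> None:
    """Refuse to ship if any major claim lacks a complete row."""
    required = ("claim", "evidence_location", "baseline_value")
    missing = [r.get("claim", "<unnamed>") for r in rows
               if not all(r.get(k) for k in required)]
    if missing:
        raise RuntimeError(f"do not ship; incomplete rows for: {missing}")
```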

Tomorrow’s action

Run one real decision through the full loop tomorrow morning: generate a decision memo, execution brief, and reflection note in one pass, then score evidence completeness and cycle time.

Sources

These are the primary references behind the claim/evidence/baseline table above.

Reviewer comments

v3 review artifacts: reviews/2026-03-01-v3-merged.md.