Thesis
Treat writing agents as an idea-operations engine, not a content machine. The value is not “more posts.” The value is a repeatable loop that converts signals into decision-grade artifacts, execution-ready handoffs, and measurable follow-through.
In practice, this means running writing as a system: sense, decide, execute, reflect — with a hard evidence gate on major claims.
The loop: sense → decide → execute → reflect
- Sense: ingest notes, incidents, meeting fragments, and strategic questions.
- Decide: produce a decision memo with options, tradeoffs, and recommendation.
- Execute: generate owner-tagged task packets/runbooks with acceptance criteria.
- Reflect: log outcomes, update patterns, and carry forward durable lessons.
This is still writing, but writing used as operational infrastructure.
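To make the loop concrete, here is a minimal Python sketch of the four stages as typed handoffs. Every name here (Signal, DecisionMemo, TaskPacket, Reflection) is an illustrative assumption, not a published API; the decide step is where a real writing agent would do the actual drafting.

```python
# Minimal sketch of sense -> decide -> execute -> reflect, stdlib only.
# All types and field names are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Signal:
    source: str  # e.g. "meeting-notes", "incident", "chat"
    text: str


@dataclass
class DecisionMemo:
    options: list[str]
    tradeoffs: dict[str, str]
    recommendation: str


@dataclass
class TaskPacket:
    owner: str
    due_window: str
    acceptance_criteria: list[str]


@dataclass
class Reflection:
    outcome: str
    lessons: list[str] = field(default_factory=list)


def sense(raw_inputs: list[str]) -> list[Signal]:
    """Normalize raw fragments (notes, chats, incidents) into signals."""
    return [Signal("notes", t.strip()) for t in raw_inputs if t.strip()]


def decide(signals: list[Signal]) -> DecisionMemo:
    """Draft a decision memo; in practice this is where the agent writes."""
    options = [s.text for s in signals[:3]]
    return DecisionMemo(
        options=options,
        tradeoffs={o: "tradeoff filled in by the agent" for o in options},
        recommendation=options[0] if options else "no-op",
    )


def execute(memo: DecisionMemo) -> TaskPacket:
    """Turn the recommendation into an owner-tagged, checkable handoff."""
    return TaskPacket(
        owner="unassigned",  # a downstream gate rejects packets without a real owner
        due_window="next 48h",
        acceptance_criteria=[f"verified: {memo.recommendation}"],
    )


def reflect(packet: TaskPacket, outcome: str) -> Reflection:
    """Log the result so durable lessons carry into the next cycle."""
    return Reflection(outcome=outcome, lessons=[f"{packet.owner}: {outcome}"])
```

The point of the typed handoffs is that each stage's output is the next stage's required input, so no stage can silently skip the artifact.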
Five capabilities beyond blogging
- Decision memo synthesis → output artifact: recommendation memo with explicit tradeoffs.
- Operational handoff authoring → output artifact: runbook/task packet with acceptance criteria.
- Signal digestion and prioritization → output artifact: ranked problem brief.
- Quality-control enforcement → output artifact: claim-evidence-baseline verification report.
- Learning-memory formation → output artifact: weekly principle/anti-pattern update.
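Each capability pairs an output artifact with fields it must carry before it counts as done. A minimal sketch of that check, assuming artifacts are plain dicts; the field names below are hypothetical stand-ins, not quoted from docs/artifact-schema.md.

```python
# Hypothetical artifact schema: each capability names the fields its
# output artifact must carry before it counts as complete.
REQUIRED_FIELDS = {
    "decision_memo": {"options", "tradeoffs", "recommendation"},
    "task_packet": {"owner", "due_window", "acceptance_criteria"},
    "problem_brief": {"signals", "ranking", "rationale"},
    "verification_report": {"claim", "evidence_location", "baseline_value"},
    "memory_update": {"principles", "anti_patterns", "week"},
}


def missing_fields(kind: str, artifact: dict) -> list[str]:
    """Return the artifact's missing required fields (empty list == valid)."""
    return sorted(REQUIRED_FIELDS[kind] - artifact.keys())


# e.g. missing_fields("task_packet", {"owner": "ana"})
# -> ["acceptance_criteria", "due_window"]
```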
Concrete operational example
Monday planning previously required ad hoc synthesis across notes and chats; baseline turnaround from raw notes to a decision-ready plan was ~24h.
With the writing-agent loop:
- Input: meeting notes + TODO fragments + blockers.
- Output 1: decision memo (options, recommendation, risks).
- Output 2: execution brief (owner, due window, acceptance criteria).
- Output 3: risk register delta and tomorrow action.
Target turnaround becomes <8h for a complete packet, with evidence rows required before anything ships.
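A sketch of what packet assembly might look like, with the <8h target encoded as a check. The function name, field names, and placeholder logic are all assumptions for illustration; a real run would have the agent fill the memo and brief.

```python
# Hypothetical Monday-packet assembly; the three output keys mirror the
# list above, and the timing check uses only the standard library.
from datetime import datetime, timedelta


def build_monday_packet(notes: list[str], todos: list[str],
                        blockers: list[str], started_at: datetime) -> dict:
    packet = {
        "decision_memo": {
            "options": todos[:3],
            "risks": blockers,
            "recommendation": todos[0] if todos else "no-op",
        },
        "execution_brief": {
            "owner": "unassigned",
            "due_window": "next 48h",
            "acceptance_criteria": notes[:1],
        },
        "risk_register_delta": blockers,
        "tomorrow_action": todos[0] if todos else "triage inbox",
    }
    # Target: raw notes to a complete packet in under 8 hours.
    packet["within_target"] = datetime.now() - started_at < timedelta(hours=8)
    return packet
```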
5-week experiment + success/failure criteria
Plan: week 1 baseline, week 2 decision artifacts, week 3 operational handoffs, week 4 cadence increase, week 5 consolidate and go/no-go.
Success (weekly rolling window):
- Evidence completeness for major claims ≥95% (baseline: inconsistent pre-v2).
- Median decision latency reduced by 30%+ vs week-1 baseline.
- Major correction/rework rate reduced by 40%+ vs baseline.
- Outputs with explicit owner + due-time next action ≥80%.
Failure is any of:
- Evidence completeness drops below 80% in any week.
- Throughput increases while rework does not improve.
- Outputs repeatedly miss owners, thresholds, or timing.
- Operators bypass the artifacts for 2 consecutive weeks (pause rollout).
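These thresholds are mechanical enough to encode. A sketch of the weekly go/no-go, assuming the loop logs per-week stats; WeekStats and its fields are illustrative, and the throughput-vs-rework and missed-owner failure modes would need extra logging not shown here.

```python
# Sketch of the weekly go/no-go, encoding the thresholds above.
from dataclasses import dataclass


@dataclass
class WeekStats:
    evidence_completeness: float  # fraction of major claims with full rows
    median_latency_h: float       # hours from raw signal to decision packet
    rework_rate: float            # fraction of outputs needing major correction
    owner_due_rate: float         # fraction of outputs with owner + due time
    bypassed: bool                # operators worked around the artifacts


def week_succeeds(week: WeekStats, baseline: WeekStats) -> bool:
    """All four success thresholds, measured against the week-1 baseline."""
    return (week.evidence_completeness >= 0.95
            and week.median_latency_h <= 0.70 * baseline.median_latency_h
            and week.rework_rate <= 0.60 * baseline.rework_rate
            and week.owner_due_rate >= 0.80)


def must_pause(history: list[WeekStats]) -> bool:
    """Hard failures: evidence floor broken in any week, or artifacts
    bypassed two weeks in a row."""
    if any(w.evidence_completeness < 0.80 for w in history):
        return True
    return any(a.bypassed and b.bypassed for a, b in zip(history, history[1:]))
```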
Claim–Evidence–Baseline (major claims)
| claim | evidenceLocation | baselineValue |
|---|---|---|
| Multi-step writing-agent workflows improve reliability versus one-shot generation. | https://www.anthropic.com/research/building-effective-agents (workflow pattern framing) + local planner/writer/reviewer pipeline. | Baseline workflow emphasized post output count over loop closure quality. |
| Explicit control mechanisms improve writing system safety and consistency. | https://martinfowler.com/articles/feature-toggles.html (control/lifecycle framing) + docs/artifact-schema.md gate. | Baseline checks were present but less consistently enforced before v2 hard gate. |
| Writing cadence produces higher operational value when tied to action closure. | https://www.benkuhn.net/writing/ (writing-as-thinking) + docs/writing-agent-project.md metrics. | Baseline cadence primarily optimized for publishing, not decision packet throughput. |
Gate rule: if any major claim is missing its claim/evidence/baseline row, do not ship.
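The gate itself is a one-function check. A minimal sketch, assuming claims are dicts keyed by the table headers above and flagged with a hypothetical "major" field:

```python
# Minimal sketch of the hard evidence gate. Keys mirror the table
# headers (claim / evidenceLocation / baselineValue); "major" is an
# assumed flag marking which claims the gate applies to.
def evidence_gate(claims: list[dict]) -> None:
    """Raise rather than ship when a major claim lacks a complete row."""
    incomplete = [
        c for c in claims
        if c.get("major")
        and not all(c.get(k) for k in ("claim", "evidenceLocation", "baselineValue"))
    ]
    if incomplete:
        raise RuntimeError(
            f"Do not ship: {len(incomplete)} major claim(s) missing "
            "claim/evidence/baseline rows"
        )
```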
Tomorrow’s action
Run one real decision through the full loop tomorrow morning: generate a decision memo, execution brief, and reflection note in one pass, then score evidence completeness and cycle time.
Sources
These are the primary references behind the claim/evidence/baseline table above.
Reviewer comments
v3 review artifacts: reviews/2026-03-01-v3-merged.md.