Diffstat (limited to 'docs')
-rw-r--r-- docs/design/2026-06-16-autonomous-batch-execution-spec.org 329
-rw-r--r-- docs/design/2026-06-29-green-baseline-proposal.org 72
-rw-r--r-- docs/design/2026-06-29-lint-org-structural-checkers-proposal.org 55
-rw-r--r-- docs/design/2026-06-29-todo-cleanup-aging-proposal.org 64
-rw-r--r-- docs/design/2026-06-30-daily-drivers-tailscale-correction.org 9
-rw-r--r-- docs/design/2026-07-02-auto-flush-mechanism-note.org 20
-rw-r--r-- docs/specs/2026-06-16-autonomous-batch-execution-spec.org 393
-rw-r--r-- docs/specs/2026-06-16-encourage-kb-contribution-spec.org (renamed from docs/design/2026-06-16-encourage-kb-contribution-spec.org) 11
-rw-r--r-- docs/specs/2026-07-01-docs-lifecycle-spec.org 360
-rw-r--r-- docs/specs/agent-knowledge-base-spec.org (renamed from docs/agent-knowledge-base-spec.org) 14
-rw-r--r-- docs/specs/inbox-workflow-consolidation-spec.org (renamed from docs/inbox-workflow-consolidation-spec.org) 13
-rw-r--r-- docs/specs/wrapup-routing-spec.org (renamed from docs/design/wrapup-routing-spec.org) 13
12 files changed, 1013 insertions, 340 deletions
diff --git a/docs/design/2026-06-16-autonomous-batch-execution-spec.org b/docs/design/2026-06-16-autonomous-batch-execution-spec.org
deleted file mode 100644
index e2e0f90..0000000
--- a/docs/design/2026-06-16-autonomous-batch-execution-spec.org
+++ /dev/null
@@ -1,329 +0,0 @@
-#+TITLE: Autonomous-Batch Task Execution — Spec
-#+AUTHOR: Craig Jennings & Claude
-#+DATE: 2026-06-16
-#+TODO: TODO | DONE SUPERSEDED CANCELLED
-
-* Metadata
-| Status | draft |
-|----------+--------------------------------------------------------------------|
-| Owner | Craig Jennings |
-|----------+--------------------------------------------------------------------|
-| Reviewer | Craig Jennings |
-|----------+--------------------------------------------------------------------|
-| Date | 2026-06-16 |
-|----------+--------------------------------------------------------------------|
-| Related | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:2026-06-15-fix-speedrun-workflow-proposal.org][fix-speedrun proposal]] |
-|----------+--------------------------------------------------------------------|
-
-* Summary
-
-Two proposals arrived within a day of each other describing the same capability: have Claude work a batch of small, well-marked tasks autonomously, with a full quality bar per task and no per-step approval gate. The inbox-zero "Phase E" proposal drives it from a tag/priority query on a recurring loop; the "fix speedrun" proposal drives it from an explicit ordered list a human dictates in-session. This spec reconciles both into one feature: a single dedicated workflow, =work-the-backlog.org=, that holds the task-execution logic, with two thin callers feeding it. It also designs the instrumentation that measures whether the autonomy is actually paying off.
-
-* Problem / Context
-
-Craig has a standing backlog of small, solo-doable fixes across several projects, already marked with a tag convention (=:next:=, =:quick:+:solo:=). Doing them by hand one at a time is the bottleneck — each is 30 minutes or less, but the context-switch and the per-commit approval ceremony dominate the actual work. He wants Claude to burn these down unattended: on a recurring loop for the routed inbox case, and on demand when he batches a named list and says "fix speedrun, no approvals until done."
-
-Two separate proposals tried to answer this:
-
-- *Phase E* (in =inbox-zero.org=, edited in =.emacs.d= as a stopgap) bolted autonomous execution onto the inbox-zero workflow's on-demand and loop callers. The sender flagged the seam as the open question: coupling capture-routing with autonomous-implementation pollutes inbox-zero's three existing callers (startup, wrap-up, on-demand), two of which must never execute anything.
-- *fix speedrun* (a =.emacs.d= theme-studio session that worked well) is the same execution loop driven by an explicit ordered task set, with end-of-set paging and always-push.
-
-They overlap almost entirely. The execution loop — eligibility gate, act-vs-file decision, per-task quality bar, bounded run — is identical. Only the *input* differs (tag query vs explicit list) and the *session mode* differs (loop default vs no-approvals + always-push + page). Building them as two features would duplicate the execution logic and let the two copies drift. The forces: keep inbox-zero's callers clean, share one execution loop, and make the autonomy safe enough to run unattended on a 30-minute timer without Craig watching.
-
-A second, explicit ask from Craig: instrument this so its effectiveness is measurable. "Gather data on this and create some org-roam articles we can look at later." Autonomous execution that silently makes bad commits is worse than no autonomy; the only way to know which it is, is to measure tasks completed vs deferred vs reverted, and human corrections in the following session, over time.
-
-* Goals and Non-Goals
-
-** Goals
-- One workflow, =work-the-backlog.org=, owns the task-execution loop. Both input shapes (tag query, explicit list) and both session modes feed it.
-- inbox-zero's three existing callers stay clean: the loop caller chains into =work-the-backlog= *after* routing; startup and wrap-up never touch it.
-- "fix speedrun" is a thin named preset, not a second implementation: no-approvals session mode + always-push + end-of-set page, feeding an explicit ordered list.
-- Commit autonomy defaults to file-only (surface a diff, no auto-commit). A project opts into autonomous commit+push explicitly via its per-project waiver.
-- Hard guardrails: refuse to speedrun any task that needs a design decision or carries data-loss risk without a checkpoint; file a =VERIFY= and move on rather than guess-implement an underspecified task; enforce a hard per-run cap (the kill switch) that applies on top of the loop's "one task per run" default.
-- A lightweight per-run metrics log plus a periodic synthesis step that writes org-roam KB articles summarizing the trend.
-
-** Non-Goals
-- *Not* a replacement for =/start-work=. Tasks needing deliberation, design, or an hour-plus stay with =/start-work= and its approval gates. This feature only touches the small, marked, solo set.
-- *Not* a new tag convention. It reads the project's own priority/tag scheme header; it never invents or hardcodes tags across projects.
-- *Not* an inbox-routing change. =inbox-zero.org= keeps its A-D phases. The Phase E text added in =.emacs.d= as a stopgap is *removed* and its logic moves here.
-- *Not* a multi-project orchestrator. One run works one project's backlog. Cross-project handoff stays with =inbox-send= and the paging reply.
-- *Not* a credential-handling or external-API feature. Tasks that touch secrets or external mutations are out of the eligible set by the guardrail.
-
-** Scope tiers
-- *v1:* =work-the-backlog.org=; the eligibility gate reading the project's scheme header; the act-vs-file decision with VERIFY-on-ambiguity; file-only commit default with per-project opt-in; the loop caller wiring and inbox-zero Phase E removal; the "fix speedrun" preset with end-of-set =notify --persist= page; the per-run metrics log (structured JSONL).
-- *Out of scope:* a token-budget kill switch (cap is a task count in v1); cross-project batch runs; a dashboard or live UI over the metrics.
-- *vNext (log to todo.org):* the periodic org-roam synthesis step if it doesn't make v1; a token/cost budget alongside the task-count cap; auto-detection of "human corrected my autonomous commit" from the next session's diff.
-
-* Design
-
-** Overview
-
-The architecture is one execution workflow with two callers and one preset, plus an instrumentation sidecar.
-
-#+begin_example
- inbox-zero loop caller ──(after Phase D routing)──┐
-                                                   ├──▶ work-the-backlog.org ──▶ metrics log (JSONL)
- "fix speedrun" preset ──(explicit ordered list)───┘                                  │
-   = no-approvals + always-push + end-page                                            ▼
-                                                                               periodic synthesis ──▶ org-roam KB articles
-#+end_example
-
-=work-the-backlog.org= is the only place the execution loop lives. It takes a *task set* (however assembled) and a *session mode* (which gates commit autonomy and paging), and works the set under a fixed safety contract. The two callers differ only in how they build the task set and which session mode they pass.
-
-This is the seam the Phase E sender asked for: separating capture-routing (inbox-zero) from autonomous-implementation (work-the-backlog) keeps inbox-zero's startup and wrap-up callers — which must never execute anything — untouched. The loop caller is the only one of inbox-zero's callers that chains forward into execution, and it does so as an explicit second step after routing completes, not as a phase buried inside inbox-zero.
-
-** The execution loop (two-altitude: caller's view)
-
-A caller hands =work-the-backlog= three things:
-
-1. *A task set* — either an explicit ordered list of task headings (fix speedrun), or the result of a tag/priority query against =todo.org= (the loop). The workflow does not care which; it receives an ordered list of candidate tasks.
-2. *A session mode* — =file-only= (default) or =autonomous-commit= (requires the project's per-project waiver), and a paging flag.
-3. *A run cap* — the maximum number of tasks to complete this run.
-
-It returns a per-task outcome (=implemented-committed=, =implemented-diff-surfaced=, =deferred-verify=, =deferred-too-large=, or =skipped-ineligible=) and one metrics record per task.
-
-** The execution loop (implementer's view)
-
-For the task set, in order, until the run cap is hit:
-
-1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
-2. *Scope read* of the relevant code. Cheap; just enough to make the act-vs-file call.
-3. *Act-vs-file decision* (below). File → record the deferral reason, next task.
-4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important, then close the task per =todo-format.md=.
-5. *Commit autonomy branch:*
- - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
- - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
-6. *Record metrics* for the task (the JSONL append, below).
-7. Decrement the cap. At zero, stop.
-
-After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary.
-
-** Eligibility gate
-
-A task is autonomous-safe when *all* hold:
-
-1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= is the "awaiting Craig's manual confirmation" marker; auto-implementing one defeats the manual check it represents.
-2. *Tagged per the project's autonomous-safe set* — resolved by reading the project's priority/tag scheme header at the top of its =todo.org=, not by hardcoding. The default reading is =:next:= OR both =:quick:= AND =:solo:=, but a project whose scheme declares a different autonomous-safe tag set overrides that.
-3. *Solo-doable* — no input or undecided judgment call from Craig.
-4. *Roughly 30 minutes or less* of focused work.
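The four gate conditions above can be read as a single predicate. The following is a minimal sketch, not the workflow's actual implementation (the workflow is prose). It assumes a hypothetical =Task= record whose field names (=needs_input=, =estimate_min=) are illustrative, and it assumes the project's scheme header has already been resolved into a tag rule; the default rule is the =:next:= OR =:quick:= + =:solo:= reading.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; field names are illustrative only."""
    heading: str
    status: str                  # org TODO keyword, e.g. "TODO" or "VERIFY"
    tags: FrozenSet[str]         # tag names without the surrounding colons
    needs_input: bool = False    # True if Craig must decide something first
    estimate_min: int = 15       # rough focused-work estimate, in minutes


def default_tag_rule(tags: FrozenSet[str]) -> bool:
    """Default reading: :next: OR (both :quick: AND :solo:)."""
    return "next" in tags or {"quick", "solo"} <= tags


def is_eligible(task: Task,
                tag_rule: Callable[[FrozenSet[str]], bool] = default_tag_rule,
                max_minutes: int = 30) -> bool:
    """A task is autonomous-safe only when all four conditions hold."""
    return (
        task.status == "TODO"                    # 1. never VERIFY/DOING/DONE/CANCELLED
        and tag_rule(task.tags)                  # 2. project's autonomous-safe tag set
        and not task.needs_input                 # 3. solo-doable
        and task.estimate_min <= max_minutes     # 4. roughly 30 minutes or less
    )
```

A project whose scheme header declares a different autonomous-safe set would pass its own =tag_rule= instead of relying on the default.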
-
-** Act-vs-file decision (the guardrail)
-
-After the scope read, for each eligible candidate:
-
-- *Clear, bounded, solo, ≤ ~30 min* → implement.
-- *Needs a design decision, Craig's input, or discussion* → do NOT implement. File a one-line note on the task naming the input it needs; surface it.
-- *Carries data-loss risk without a checkpoint* (deletes data, rewrites persisted state, touches external/shared state irreversibly) → do NOT implement. File a =VERIFY= explaining the risk; surface it.
-- *Underspecified or already-satisfied* → do NOT guess-implement. File a =VERIFY= noting why (the fix-speedrun "raise max spans to 5 — every cap was already 8" case) and move on.
-- *An hour or more* → do NOT implement. File and surface as a =/start-work= candidate.
-
-When unsure which side a task falls on, file rather than implement. A wrong auto-implement costs more than a deferred task — it costs a revert *and* the human correction in the next session that the metrics are designed to catch.
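The decision above can be sketched as a function that returns an action and a reason. This is an illustration under stated assumptions: the flag names are hypothetical, the order in which the refusals are checked is a guess (the spec does not rank them), and the =unsure= reason label is invented here to cover the "when unsure, file" rule and the unspecified gap between 30 and 60 minutes.

```python
from typing import Tuple


def act_or_file(*, needs_input: bool = False, data_loss_risk: bool = False,
                underspecified: bool = False, already_satisfied: bool = False,
                estimate_min: int = 15, unsure: bool = False) -> Tuple[str, str]:
    """Return (action, reason); reason is '' when the action is 'implement'."""
    if needs_input:
        # Needs a design decision or discussion: file a one-line note.
        return ("file-note", "needs-input")
    if data_loss_risk:
        # Irreversible without a checkpoint: file a VERIFY explaining the risk.
        return ("file-verify", "data-loss")
    if underspecified or already_satisfied:
        # Never guess-implement: file a VERIFY noting why.
        return ("file-verify", "underspecified")
    if estimate_min >= 60:
        # An hour or more: surface as a /start-work candidate.
        return ("file-start-work", "too-large")
    if unsure or estimate_min > 30:
        # Assumption: doubt, and the 30-60 minute gap, fall on the "file" side.
        # "unsure" is a hypothetical label, not one of the spec's defer reasons.
        return ("file-note", "unsure")
    return ("implement", "")
```

Every branch except the last one files rather than implements, which matches the stated bias: a deferred task is cheaper than a wrong auto-implement.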
-
-** Session modes and the "fix speedrun" preset
-
-Two orthogonal session-mode dimensions feed the loop:
-
-- *Commit autonomy:* =file-only= (default) or =autonomous-commit=. =autonomous-commit= is honored only when the project carries the per-project waiver (=.emacs.d= and =rulesets= have it; most projects do not). Absent the waiver, a request for =autonomous-commit= degrades to =file-only= and says so.
-- *Paging:* on or off. End-of-set only.
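The waiver-gated degrade described above can be sketched as follows. The function name and the boolean =has_waiver= input are hypothetical; how the waiver is actually detected is left open by this spec and is not modeled here.

```python
from typing import Optional, Tuple

COMMIT_MODES = ("file-only", "autonomous-commit")


def resolve_commit_mode(requested: str, has_waiver: bool) -> Tuple[str, Optional[str]]:
    """Return (effective_mode, notice). The notice is None unless a degrade happened."""
    if requested not in COMMIT_MODES:
        raise ValueError(f"unknown commit mode: {requested!r}")
    if requested == "autonomous-commit" and not has_waiver:
        # Fail safe: absent the waiver, degrade to file-only and say so.
        return ("file-only",
                "autonomous-commit requested but the project has no waiver; "
                "running file-only")
    return (requested, None)
```

Returning the notice alongside the mode keeps the degrade visible to the caller instead of silently changing behavior.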
-
-"fix speedrun" is the named preset for =autonomous-commit= + always-push + paging-on, fed an *explicit ordered list*. It is not a separate code path; it is a label for that combination of mode flags plus the explicit-list input. The loop caller, by contrast, runs =file-only= with paging off, fed the *tag query*; it commits only when the project has the waiver and opts the loop into commits.
-
-** Bounding the run and the kill switch
-
-Default cap: one task per run for the loop caller — implement the highest-priority eligible candidate (=[#A]= before =[#B]= before =[#C]=), record, then stop and let the next tick continue. The fix-speedrun preset works the whole explicit list in order (the human bounded it by naming it), still one commit per task.
-
-The kill switch is a hard per-run task cap passed by the caller, independent of "one per run": even fix-speedrun stops at the cap and pages with the remainder listed. A loop that fires every 30 minutes and commits unattended needs a ceiling that a runaway can't exceed.
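A sketch of the bounded run, under one reading of the loop steps: only an implemented task consumes a cap slot, while a skipped or deferred task moves on without decrementing. The =attempt= callback is hypothetical and stands in for the gate, the decision, and the implementation. Sorting unprioritized headings last is also an assumption.

```python
import re
from typing import Callable, List, Sequence, Tuple

_PRIORITY = re.compile(r"\[#([A-C])\]")


def priority_key(heading: str) -> str:
    """[#A] sorts before [#B] before [#C]; no cookie sorts last (assumption)."""
    match = _PRIORITY.search(heading)
    return match.group(1) if match else "Z"


def run_capped(tasks: Sequence[str], cap: int,
               attempt: Callable[[str], bool],
               by_priority: bool = False) -> Tuple[List[str], List[str], List[str]]:
    """Work tasks until `cap` are implemented.

    Returns (done, not_done, remaining): `remaining` was never reached and is
    what the end-of-set page would list.
    """
    # sorted() is stable, so equal priorities keep their original order.
    queue = sorted(tasks, key=priority_key) if by_priority else list(tasks)
    done: List[str] = []
    not_done: List[str] = []
    while queue and len(done) < cap:
        task = queue.pop(0)
        (done if attempt(task) else not_done).append(task)
    return done, not_done, queue
```

The loop caller would call this with =cap=1= and =by_priority=True=; the fix-speedrun preset would pass the explicit list in order with a cap no larger than the ceiling.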
-
-** End-of-set paging
-
-When the set is done (or the cap is hit), if paging is on, fire one page — end-of-set only, never per-task:
-
-#+begin_src sh
-notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>" --persist
-#+end_src
-
-=--persist= keeps it on screen until dismissed (the page-me convention). The message carries the project name, the completed count, and the remaining count, so Craig can reply confirming ready + naming the next project in one turn. The page-signal wrapper removed 2026-06-12 is reconciled to =notify= here — there is no separate page-signal call.
-
-* Alternatives Considered
-
-** Fold execution into inbox-zero (the Phase E stopgap shape)
-- Good, because it's the smallest diff — the loop caller already runs inbox-zero, so execution is "one more phase."
-- Bad, because it couples capture-routing with implementation. inbox-zero has three callers; startup and wrap-up must never execute. A Phase E inside inbox-zero forces both to carry a "skip Phase E" caveat and risks a future caller running it by accident.
-- Neutral, because the eligibility-gate and act-vs-file text is identical either way — only its *home* differs.
-
-** Two separate features (keep Phase E and fix-speedrun distinct)
-- Good, because each proposal ships as written with no reconciliation work.
-- Bad, because the execution loop is duplicated in two places and will drift; a guardrail tightened in one won't reach the other. Two implementations of autonomous execution mean two things to audit.
-- Neutral, because the input and session-mode differences are real — but they're thin caller-level differences, not a reason to fork the engine.
-
-** Autonomous-commit as the default
-- Good, because it's faster end-to-end with no diff to review.
-- Bad, because most projects lack the per-project waiver, and an unattended loop committing to a project that never opted in is exactly the failure the file-only default prevents. The blast radius of a bad autonomous commit is a revert plus lost trust in the loop.
-- Neutral, because the projects that *do* want it (=.emacs.d=, =rulesets=) opt in explicitly, so the capability is available where it's wanted without being the default everywhere.
-
-* Decisions [/]
-
-** TODO Where the eligibility gate reads its tag set
-- Owner / by-when: Craig / spec-review
-- Context: Phase E hardcoded =:next:= / =:quick:+:solo:=. Projects' priority/tag schemes vary, and the =todo-format.md= scheme header is the declared source of truth per project.
-- Decision: We will read the project's =todo.org= priority/tag scheme header to resolve the autonomous-safe tag set, defaulting to =:next:= OR =:quick:+:solo:= when the header doesn't declare an explicit autonomous-safe set.
-- Consequences: easier — one workflow works correctly across projects with different tag vocabularies; harder — a project with no scheme header (or a malformed one) needs a fallback, and the "default reading" has to be specified precisely enough that two projects agree on it.
-
-** TODO The do-not-auto-implement marker set
-- Owner / by-when: Craig / spec-review
-- Context: =VERIFY= means "awaiting Craig's manual confirmation" in =.emacs.d= and =rulesets=. Other projects may use =VERIFY= differently or not at all. The gate excludes =VERIFY=, =DOING=, =DONE=, =CANCELLED= by status, but the *marker semantics* are what matter.
-- Decision: We will define the do-not-auto-implement set as: any status that is not =TODO=, plus any task carrying a project-declared "hold" marker. The canonical default treats =VERIFY= as do-not-implement; a project overrides only by declaring its marker semantics in its scheme header.
-- Consequences: easier — the gate is portable and a project can't accidentally have its manual-check tasks auto-run; harder — requires the scheme header to carry marker semantics, which most don't yet, so the default has to be safe-by-omission (exclude anything not plainly =TODO=).
-
-** TODO Commit-autonomy opt-in mechanism
-- Owner / by-when: Craig / spec-review
-- Context: =file-only= is the default; =.emacs.d= and =rulesets= have a per-project waiver allowing autonomous commits. Where does the workflow *read* that a project has opted in?
-- Decision: We will read the opt-in from the project's existing per-project waiver location (the same place the commit discipline's "no approval gate" waiver lives — =notes.org= Workflow State or =CLAUDE.md=), not introduce a new config file.
-- Consequences: easier — no new config surface, reuses the existing waiver concept; harder — the waiver's exact location and format must be pinned so the workflow can detect it deterministically, and a project with the commit waiver but *not* wanting the loop to commit needs a way to say "waiver yes, loop-commit no" (two flags, not one).
-
-** TODO Run-cap default and the kill switch shape
-- Owner / by-when: Craig / spec-review
-- Context: The loop default is one task per run; fix-speedrun works an explicit list. Both need a hard ceiling a runaway can't exceed.
-- Decision: We will pass a hard per-run task cap from the caller (loop default 1; fix-speedrun = length of the explicit list, capped at a ceiling), and stop + page with the remainder when the cap is hit. v1 caps by task count, not token budget.
-- Consequences: easier — a simple integer the caller controls; bounded blast radius; harder — a task-count cap doesn't bound *cost* (one 30-min task can burn many tokens), so a token budget is vNext, and until then a pathological task can run long within a single cap slot.
-
-** TODO Metrics log location and format
-- Owner / by-when: Craig / spec-review
-- Context: Per-run metrics must land somewhere structured and queryable, per-project, and survive across sessions for the synthesis step to read.
-- Decision: We will append one JSONL record per task to a per-project log at =.ai/metrics/work-the-backlog.jsonl=, git-tracked, with the synthesis step reading the union across projects.
-- Consequences: easier — append-only JSONL is trivial to write and =jq=-queryable; per-project keeps it local to where the work happened; harder — a git-tracked log adds churn to every autonomous run's commit (or needs its own commit), and "union across projects" needs the synthesis step to know where every project's log lives.
-
-** TODO Synthesis cadence and trigger
-- Owner / by-when: Craig / spec-review
-- Context: Craig wants periodic org-roam articles summarizing the data. What triggers synthesis, and how often?
-- Decision: We will run synthesis on an explicit trigger ("synthesize backlog metrics") and optionally a weekly scheduled run, writing one KB node per synthesis under =~/org/roam/agents/= per the knowledge-base rule.
-- Consequences: easier — explicit trigger means no surprise writes, and the KB rule already governs node shape; harder — a weekly scheduled run needs a scheduler entry and the KB write-classification (personal-only) must gate it so work-project metrics never land in the KB.
-
-* Implementation phases
-
-** Phase 1 — Extract the execution loop into work-the-backlog.org
-Write =work-the-backlog.org= holding the eligibility gate, act-vs-file decision, per-task quality bar, and run-cap logic — taking a task set + session mode + cap as input. Remove the stopgap "Phase E" text from =inbox-zero.org= (restore it to its A-D shape) in the same change so there's one home, not two. Tree stays working: inbox-zero reverts to routing-only, and the new workflow is callable but not yet wired to the loop.
-
-** Phase 2 — Wire the two callers
-Add the loop caller's chain step (after inbox-zero Phase D, invoke work-the-backlog with the tag query + file-only + cap 1) and the "fix speedrun" preset (explicit list + autonomous-commit + always-push + paging-on). Both go through the same workflow. Tree stays working: each caller is independently testable.
-
-** Phase 3 — File-only vs autonomous-commit gate
-Implement the commit-autonomy branch: read the per-project waiver, degrade =autonomous-commit= to =file-only= when absent, surface the degrade. Tree stays working: default file-only behavior is the safe path even before the waiver-read lands.
-
-** Phase 4 — Guardrails and the page
-Implement the data-loss / design-decision refusal, the VERIFY-on-ambiguity filing, and the end-of-set =notify alarm ... --persist= page. Tree stays working: guardrails only ever *reduce* what runs, so adding them can't break a passing run.
-
-** Phase 5 — Metrics log
-Append the per-task JSONL record at each task outcome. Tree stays working: logging is a side effect that doesn't alter execution.
-
-** Phase 6 — Synthesis to org-roam
-Write the synthesis step: read the JSONL union, compute the per-run and trend metrics (below), write a KB node under =~/org/roam/agents/= per the knowledge-base rule, personal-projects-only classification enforced. Tree stays working: synthesis is read-only over the logs plus a KB write.
-
-* Acceptance criteria
-- [ ] =work-the-backlog.org= exists and is the only home for the execution loop; =inbox-zero.org= is back to its A-D routing-only shape with no Phase E.
-- [ ] The loop caller chains into work-the-backlog after routing; startup and wrap-up never invoke it.
-- [ ] "fix speedrun" runs as the preset (autonomous-commit + always-push + end-page) over an explicit ordered list, one commit per task.
-- [ ] A task tagged for autonomous execution but at status =VERIFY= / =DOING= / =DONE= / =CANCELLED= is skipped by the gate.
-- [ ] The eligibility tag set is read from the project's =todo.org= scheme header, not hardcoded.
-- [ ] In a project without the commit waiver, an =autonomous-commit= request degrades to file-only and says so; no commit is made.
-- [ ] A task carrying data-loss risk or needing a design decision is refused, not implemented: data-loss risk files a VERIFY, and a design decision files a one-line note naming the input it needs.
-- [ ] An underspecified / already-satisfied task files a VERIFY noting why and the run continues.
-- [ ] The run stops at the per-run cap and pages with the remaining tasks listed.
-- [ ] Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl=.
-- [ ] The synthesis step reads the logs and writes a KB node under =~/org/roam/agents/=; it refuses to write for work-classified projects.
-
-* Effectiveness measurement
-
-This section answers Craig's explicit ask: measure whether autonomous-batch execution is actually effective, and build the "gather data → org-roam articles" loop.
-
-** What "effective" means here
-
-The autonomy is effective if it completes real work that *stays* completed — i.e. tasks land green and the next session doesn't have to undo or fix them. The two failure modes to catch are (1) the loop defers everything (over-cautious, no value delivered) and (2) the loop implements badly (commits that get reverted or hand-corrected next session). Both are measurable.
-
-** Per-run metrics (the JSONL record)
-
-One record per task, appended to =.ai/metrics/work-the-backlog.jsonl= at each task outcome:
-
-| Field | Meaning |
-|-------------------+--------------------------------------------------------------------|
-| =ts= | ISO timestamp of the task outcome |
-|-------------------+--------------------------------------------------------------------|
-| =run_id= | UUID shared by all tasks in one run |
-|-------------------+--------------------------------------------------------------------|
-| =project= | project basename |
-|-------------------+--------------------------------------------------------------------|
-| =caller= | =loop= or =fix-speedrun= |
-|-------------------+--------------------------------------------------------------------|
-| =task= | task heading (slug) |
-|-------------------+--------------------------------------------------------------------|
-| =outcome= | implemented-committed / implemented-diff-surfaced / deferred-verify / |
-| | deferred-too-large / skipped-ineligible |
-|-------------------+--------------------------------------------------------------------|
-| =defer_reason= | for deferrals: needs-input / data-loss / underspecified / too-large |
-|-------------------+--------------------------------------------------------------------|
-| =wall_clock_s= | seconds from task start to outcome |
-|-------------------+--------------------------------------------------------------------|
-| =commit_sha= | for committed tasks; empty otherwise |
-|-------------------+--------------------------------------------------------------------|
-| =review_findings= | count of /review-code Critical+Important findings on this task |
-|-------------------+--------------------------------------------------------------------|
-
-Per-run rollups computed at synthesis (not stored per record): tasks attempted, completed, VERIFY-deferred, reverted; wall-clock total; commits landed; review findings per commit.
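A minimal sketch of the append and of a per-run rollup over the fields in the table above. The function names are hypothetical. Two choices here are assumptions rather than spec: "attempted" is counted as every record that is not =skipped-ineligible=, and "completed" as every outcome beginning with =implemented=. The reverted count is left out because it depends on the corrections signal, not on the log alone.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def append_record(log_path: Path, *, run_id: str, project: str, caller: str,
                  task: str, outcome: str, defer_reason: str = "",
                  wall_clock_s: float = 0.0, commit_sha: str = "",
                  review_findings: int = 0) -> dict:
    """Append one JSON object per task outcome, one line each (JSONL)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id, "project": project, "caller": caller, "task": task,
        "outcome": outcome, "defer_reason": defer_reason,
        "wall_clock_s": wall_clock_s, "commit_sha": commit_sha,
        "review_findings": review_findings,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def rollup(log_path: Path, run_id: str) -> dict:
    """Compute per-run rollups at synthesis time; nothing here is stored."""
    lines = log_path.read_text(encoding="utf-8").splitlines()
    run = [r for r in map(json.loads, filter(str.strip, lines))
           if r["run_id"] == run_id]
    outcomes = Counter(r["outcome"] for r in run)
    commits = [r for r in run if r["commit_sha"]]
    findings = sum(r["review_findings"] for r in commits)
    return {
        "attempted": len(run) - outcomes["skipped-ineligible"],
        "completed": sum(n for o, n in outcomes.items() if o.startswith("implemented")),
        "verify_deferred": outcomes["deferred-verify"],
        "wall_clock_total_s": sum(r["wall_clock_s"] for r in run),
        "commits_landed": len(commits),
        "findings_per_commit": findings / len(commits) if commits else None,
    }
```

Because each record is a complete JSON object on its own line, the same file stays inspectable by hand with =jq=.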
-
-** The corrections signal (the key metric)
-
-The hardest and most valuable metric is *human corrections in the following session* — did Craig revert or hand-fix an autonomous commit? v1 captures the cheap proxy: at synthesis, for each =commit_sha=, check whether a later commit touching the same files reverted it or carries a "fix"/"revert" of that change within N days. A clean run is one where the autonomous commits survive untouched. (Auto-detecting "this later commit corrected that autonomous one" precisely is a vNext refinement; the proxy — reverted-or-touched-soon-after — is good enough to flag a problem run for human review.)
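The proxy above can be sketched as a pure function over commit metadata. It assumes the caller has already pulled each commit's timestamp, subject, and touched files out of git; the =Commit= record, the word match on "fix"/"revert", and the 7-day default for N are all illustrative choices, not values from this spec.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Iterable, List

_CORRECTION_WORD = re.compile(r"\b(fix|revert)", re.IGNORECASE)


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit summary; populated from git by the caller."""
    sha: str
    when: datetime
    subject: str
    files: FrozenSet[str]


def flag_corrections(auto: Commit, later: Iterable[Commit],
                     window_days: int = 7) -> List[str]:
    """Return SHAs of later commits that look like corrections of `auto`.

    A flag means "worth a human glance", not "this commit was wrong".
    """
    deadline = auto.when + timedelta(days=window_days)
    flagged = []
    for commit in later:
        in_window = auto.when < commit.when <= deadline
        touches_same_files = bool(auto.files & commit.files)
        looks_like_fix = bool(_CORRECTION_WORD.search(commit.subject))
        if in_window and touches_same_files and looks_like_fix:
            flagged.append(commit.sha)
    return flagged
```

The sketch shares the proxy's known blind spots: a correction made in a different file is missed, and ordinary follow-up work whose subject happens to say "fix" is counted.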
-
-** Where the data lands
-
-Per-project git-tracked JSONL at =.ai/metrics/work-the-backlog.jsonl=. Append-only, =jq=-queryable, survives across sessions and machines via the normal project sync. Git-tracked so the history is auditable and the synthesis step can read it from any clone.
-
-** The synthesis loop (gather → article)
-
-On the "synthesize backlog metrics" trigger (and optionally a weekly scheduled run):
-
-1. Read the JSONL union across the personal projects the synthesizer can see.
-2. Compute the rollups and the trend: completion rate over time, defer-reason distribution, review-findings-per-commit trend, and the corrections-signal flag count.
-3. Write one org-roam KB node under =~/org/roam/agents/YYYYMMDDHHMMSS-backlog-metrics-<window>.org= per the knowledge-base rule — filetags =:agent:metrics:=, a concise title, the rollup table, the trend narrative, and =[[id:...]]= links to prior synthesis nodes so the series is traceable.
-4. Enforce the KB write-classification: *personal projects only*. A work-classified project's metrics never write to the KB — they stay in that project's own =.ai/metrics/= log and the synthesizer reports the refusal per the KB refusal contract.
-
-The KB node is the artifact Craig reviews later — "are the autonomous runs completing more and getting corrected less over the last month?" reads off the trend table without re-querying raw logs.
-
-* Readiness dimensions
-
-- *Data model & ownership:* The task set is read from =todo.org= (project-owned, user-authored). The metrics JSONL is generated, append-only, git-tracked, project-owned. KB nodes are agent-generated under =~/org/roam/agents/= (never overwriting Craig's hand-authored nodes — link only). No editable region is co-owned.
-- *Errors, empty states & failure:* Empty task set → report "nothing eligible" and stop. Malformed scheme header → fall back to the default tag reading and surface the fallback. A task that fails mid-implementation → leave the tree working (don't commit a broken state), record the failure outcome, surface it, continue to the next task. No silent data loss: the data-loss guardrail refuses irreversible tasks outright.
-- *Security & privacy:* Tasks touching credentials or external mutations are excluded by the data-loss / external-state guardrail. The KB write is personal-projects-only; work metrics never leave the project. No secrets in the JSONL (task slugs and SHAs only).
-- *Observability:* The end-of-set page surfaces the run outcome. The per-task surface (implemented / deferred + reason / skipped) is the live progress view. The metrics log + KB synthesis is the long-run observability. A bad run is isolable from the JSONL (which task, which outcome, which review findings).
-- *Performance & scale:* Expected counts are small — a handful of tasks per run, one run per 30-min tick. No bottleneck at this scale. The cap bounds the worst case. Synthesis over months of JSONL is still a small file (one record per task).
-- *Reuse & lost opportunities:* Reuses =todo-format.md= for task close, =/review-code= and =/voice personal= for the quality bar, =notify= for paging, the knowledge-base rule for KB writes, the per-project waiver for commit-autonomy. No new config file (the opt-in rides the existing waiver). The execution loop is the one new shared asset.
-- *Architecture fit & weak points:* Integration points — inbox-zero loop caller (chain after Phase D), the per-project waiver location, =todo.org= scheme header, =~/org/roam/agents/=. Weak point: the commit-autonomy gate depends on deterministically reading the waiver; mitigated by defaulting to file-only when the read is ambiguous (fail safe, not open). Second weak point: a 30-min loop committing unattended; mitigated by the hard cap and file-only default.
-- *Config surface:* Per-project — commit-autonomy opt-in (via existing waiver), optional loop-commit flag, optional autonomous-safe tag override in the scheme header. Per-call — task set, session mode, run cap. Defaults: file-only, paging-off (loop) / paging-on (fix-speedrun), cap 1 (loop).
-- *Documentation plan:* The workflow file itself is the user/operator doc (matches inbox-zero.org's self-documenting style). The =.emacs.d= stopgap note and the fix-speedrun proposal are superseded by this spec; no separate migration doc needed beyond removing the Phase E text.
-- *Dev tooling:* N/A for new build targets — the workflows are prose, exercised by invocation. The metrics JSONL is =jq=-inspectable by hand; a tiny rollup helper may be added under =.ai/scripts/= if the synthesis prose proves to need it (decided at Phase 6, not a v1 prerequisite).
-- *Rollout, compatibility & rollback:* Rollout is removing Phase E from inbox-zero and adding work-the-backlog — both prose changes, instantly reversible. Compatibility: inbox-zero's three callers are unchanged except the loop caller gaining a forward chain. Rollback: delete work-the-backlog and the loop chain step; inbox-zero is already back to A-D. The file-only default means the worst pre-rollback state is surfaced diffs, not committed changes.
-- *External APIs & deps:* =notify alarm "Page" "<msg>" --persist= verified against =/home/cjennings/.local/bin/notify= and the page-me workflow. =~/org/roam/= KB write path and node shape verified against the knowledge-base rule. No external API calls.
-
-* Risks, Rabbit Holes, and Drawbacks
-
-- *The corrections signal is a proxy, not ground truth.* "A later commit touched the same files" over-counts (legitimate follow-up work) and under-counts (a correction in a different file). It's a flag for human review, not a verdict. Don't rabbit-hole on making it precise in v1 — the proxy plus a human glance is the design.
-- *Waiver detection drift.* If the per-project waiver location moves or its format changes, the commit-autonomy gate could mis-read. Mitigation: fail safe to file-only. Pin the waiver format in the Phase 3 decision before building.
-- *Unattended-commit blast radius.* The headline risk. Mitigated three ways: file-only default, the hard cap, and the data-loss guardrail. The metrics loop is the fourth layer — it makes a bad run visible after the fact even if the first three let something through.
-- *Scope creep into /start-work territory.* The temptation to let "≤ 30 min" stretch. The act-vs-file gate and the "when unsure, file" rule are the brake; keep them strict.
-
-* Testing / Verification / Rollout
-
-Verification is by invocation against a project's real =todo.org=: run the loop caller in file-only mode and confirm it surfaces diffs without committing; run fix-speedrun against a small explicit list in a waiver-carrying project and confirm one commit per task + the end page; plant a =VERIFY=-status task and a data-loss task and confirm both are skipped/refused; confirm the JSONL grows one record per task; run synthesis and confirm a KB node lands (personal project) or is refused (work project). Rollout is the Phase 1-6 sequence, each leaving the tree working; the file-only default makes early phases safe to ship before the commit and paging phases land.
-
-* References / Appendix
-
-- [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal (inbox-zero stopgap)]] and [[file:../../working/inbox-zero-phase-e/sender-note.org][its sender note with the 5 open questions]].
-- [[file:2026-06-15-fix-speedrun-workflow-proposal.org][fix-speedrun proposal]].
-- [[file:../../.ai/workflows/inbox-zero.org][inbox-zero.org (canonical, A-D)]] — the routing workflow this feature decouples from.
-- =~/code/rulesets/claude-rules/knowledge-base.md= — the org-roam write contract the synthesis step follows.
-
-* Review and iteration history
-** 2026-06-16 Tue — author
-- What: initial draft reconciling the Phase E and fix-speedrun proposals into one work-the-backlog.org feature, plus the effectiveness-measurement instrumentation.
-- Why: two overlapping proposals arrived within a day; building them separately would duplicate the execution loop and let it drift. Craig also asked explicitly for measurement + org-roam synthesis.
-- Artifacts: this spec; the two source proposals under docs/design/ and working/inbox-zero-phase-e/.
diff --git a/docs/design/2026-06-29-green-baseline-proposal.org b/docs/design/2026-06-29-green-baseline-proposal.org
new file mode 100644
index 0000000..47de18d
--- /dev/null
+++ b/docs/design/2026-06-29-green-baseline-proposal.org
@@ -0,0 +1,72 @@
+#+TITLE: Proposal: ensure a green test run before starting work
+#+AUTHOR: Craig Jennings (via .emacs.d session)
+#+DATE: 2026-06-29
+
+* Why
+
+While running a multi-task refactor speedrun in =.emacs.d=, the very first full
+=make test= surfaced a failing test (=test-system-cmd-restart-emacs-no-service-aborts=)
+that turned out to be *pre-existing* -- it failed on clean HEAD, unrelated to
+the work. It just happened to fail on this machine: under native-comp the
+mock's =symbol-function= redefinition is bypassed, so the real check runs and
+passes because the box has =emacs.service=.
+
+Two costs landed because the red was already there when work began:
+
+1. Every later "did I break this?" suite run carried a known failure, so the
+ green bar became "only that one fails" instead of a clean pass I could read
+ at a glance. Easy to let a *new* regression hide behind the familiar red.
+2. The work assumed the tree was in a known-good state. It wasn't, and nothing
+ in the workflow forced that assumption to be checked first.
+
+The fix is cheap and general: run the suite *before* starting work, and clear
+(or explicitly triage) any failure before the work begins. A green start
+confirms we're in the known-good place we think we are, and any issue is fixed
+before it can be confused with our own changes.
+
+* Proposed change 1 -- claude-rules/verification.md
+
+Add a section (suggested placement: right after =## The Rule=, before
+=## What Fresh Means=). Proposed text, ready to paste:
+
+#+begin_example
+## Green Baseline Before Starting Work
+
+Run the test suite before you start work, not only before you finish. A clean run at the start confirms the tree is in the known-good state you assume it is, so the baseline you build on and measure your changes against is actually green.
+
+If the suite is red before you touch anything, fix or explicitly triage the failure first. A pre-existing failure left in place poisons every later "did I break this?" check: you can't separate your own regressions from the noise, and the end-of-work run stops being readable as pass/fail at a glance. Work that assumes a known-good base may also be built on a broken assumption you never saw.
+
+When a pre-existing failure genuinely can't be fixed before the work begins (out of scope, or it needs a decision), record it as a tracked task with the diagnosis and carry its name forward. The green bar for the rest of the work is then explicitly "only this known failure remains," not a silent tolerance for red.
+
+This is the start-of-work counterpart to the Before Committing gate below: one confirms the ground is solid before you build, the other confirms you didn't crack it.
+#+end_example
+
+* Proposed change 2 -- start-work skill, Pre-work phase
+
+The start-work skill already has a Pre-work phase (eligibility, fetch-and-reconcile
+against base, source-code check that the problem still exists). Add a green-baseline
+step to that phase:
+
+- Run the project's test suite before claiming the work.
+- If it's fully green, proceed.
+- If it's red, fix the failure first, or (when out of scope / needs a decision)
+ file a tracked task with the diagnosis and carry its name forward as the only
+ tolerated failure for this work.
+- Surface the baseline result so "we started from green" is on the record.
+
+This makes the verification.md principle operational at the exact moment it
+matters -- the start of a task -- the same way the Verify phase and the
+Review-and-Publish flow operationalize the end-of-work gates.
+
+* Note on why this came as a proposal, not a direct edit
+
+=.emacs.d='s =.claude/rules/*.md= are symlinks into =~/code/rulesets/claude-rules/=,
+so editing =verification.md= from the downstream session would modify the
+rulesets canonical directly. Per the cross-project rule, downstream sessions
+send rulesets the proposed change rather than editing its canonical in place.
+Hence this note instead of a local stopgap edit. The start-work skill isn't
+installed on this machine to edit anyway.
+
+The test that surfaced all this is already fixed in =.emacs.d= (commit on main:
+mock =executable-find= at the boundary instead of the helper). The durable
+process change is the part that belongs here.
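A minimal sketch of the bookkeeping the green-baseline proposal above implies, written in Python purely for illustration. The function name, the argument names, and the idea of passing failing test names as plain strings are assumptions, not part of the proposal or of any existing tool. It shows how a recorded baseline lets a later run separate new regressions from the known red.

```python
def classify_failures(baseline_failures, current_failures, tolerated=()):
    """Split a test run's failures against the start-of-work baseline.

    All three arguments are iterables of failing test names (hypothetical
    inputs; how they are collected from the test runner is out of scope).
    """
    baseline = set(baseline_failures)
    current = set(current_failures)
    tolerated = set(tolerated)

    return {
        # Red at the start that was never filed as a tracked task.
        "untriaged": sorted(baseline - tolerated),
        # Failing now but not at the start: these belong to the work.
        "regressions": sorted(current - baseline),
        # Red at the start but passing now.
        "cleared": sorted(baseline - current),
    }


if __name__ == "__main__":
    known = ["test-system-cmd-restart-emacs-no-service-aborts"]
    print(classify_failures(known, known + ["test-new-thing"], tolerated=known))
```

With the baseline recorded and triaged, the end-of-work question reduces to whether `regressions` is empty, which is the "readable at a glance" property the proposal is after.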
diff --git a/docs/design/2026-06-29-lint-org-structural-checkers-proposal.org b/docs/design/2026-06-29-lint-org-structural-checkers-proposal.org
new file mode 100644
index 0000000..c464aca
--- /dev/null
+++ b/docs/design/2026-06-29-lint-org-structural-checkers-proposal.org
@@ -0,0 +1,55 @@
+#+TITLE: lint-org.el — four structural heading checkers org-lint doesn't cover
+
+* What changed (from .emacs.d, 2026-06-29)
+
+Added four custom judgment checkers to =lint-org.el=, following the existing
+=lo--check-tables= / =lo--check-level2-dated-headers= pattern (custom scans run
+after the org-lint pass, emitting judgment items, never auto-fixed):
+
+- =indented-heading= — a line of whitespace + stars + space OUTSIDE any block.
+ org parses a heading only at column 0, so leading whitespace silently demotes
+ a would-be heading to body text: the task vanishes from the agenda and never
+ archives. The worst defect class (an invisible task) and entirely silent
+ today. Skips indented stars inside =#+begin_/#+end_= blocks (legit content).
+- =empty-heading= — a line of bare stars with no title.
+- =malformed-priority-cookie= — a =[#x]=-shaped token org rejected (lowercase,
+ multi-char, non-letter) left stranded where a cookie would be. Checks only the
+ first cookie token per heading; skips verbatim-wrapped =[#D]= in dated-log
+ titles.
+- =level2-done-without-closed= — a level-2 DONE/CANCELLED with no CLOSED line.
+ Directly supports the todo-cleanup aging step (sent separately today): an
+ undated completed task gets force-archived immediately, so flagging it lets
+ the human add CLOSED first.
+
+Two attached files (edited canonical candidates): =lint-org.el=,
+=tests/test-lint-org.el=.
+
+* Why
+
+org-lint validates links, drawers, blocks, and babel — but NOT heading
+well-formedness. On Craig's .emacs.d todo.org a missing org-bullet in the live
+buffer prompted the question "is the file structurally okay?", and org-lint
+(even unfiltered, all checkers) reported nothing actionable. These four close
+the gap. They are general (any org file), not project-specific.
+
+* Design notes for the canonical
+
+- All four are regex-based, NOT org-element/keyword-based, so they don't depend
+ on which TODO keywords the batch Emacs happens to recognize (lint-org.el does
+ not set =org-todo-keywords=). The =level2-done-without-closed= done set is a
+ defconst =lo-done-keywords= (DONE/CANCELLED) for easy extension.
+- *Gotcha worth carrying in the canonical:* =case-fold-search= defaults to t, so
+ a naive =[A-Z]= cookie check accepts =[#a]= as valid and =\(DONE\|CANCELLED\)=
+ matches the title words "done"/"cancelled". Both letter-sensitive checkers
+ bind =case-fold-search nil=. (Caught by a failing test before it shipped.)
+- Wired into =lo-process-file= after =lo--check-level2-dated-headers=. Judgment
+ output already flows through the existing report + followups-file machinery.
+- 8 new ERT tests (good-input-silent + bad-input-flagged for each, plus
+ block-skip and verbatim-skip boundary cases). 44/44 green. Zero false
+ positives on a real 5600-line todo.org.
+
+* Note
+
+=make task-sorted= in .emacs.d now runs =lint-org.el todo.org= after the
+archive, so these checkers also gate the task-hygiene target. Makefiles aren't
+template-synced; that wiring is project-local (noted for context).
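The =indented-heading= check and the =case-fold-search= gotcha described above can be illustrated with a short Python sketch. This is not the code in =lint-org.el= (which is Emacs Lisp); the regexes and function names here are assumptions chosen to mirror the prose. Python regexes are case-sensitive by default, so `re.IGNORECASE` stands in for Emacs's default `case-fold-search` of t.

```python
import re

# Whitespace, then one or more stars, then a space: a would-be heading
# that org will treat as body text because it is not at column 0.
INDENTED_HEADING = re.compile(r"^[ \t]+\*+ ")
BLOCK_BEGIN = re.compile(r"^[ \t]*#\+begin_", re.IGNORECASE)
BLOCK_END = re.compile(r"^[ \t]*#\+end_", re.IGNORECASE)


def indented_headings(lines):
    """Return 1-based line numbers of indented headings outside any block."""
    hits = []
    in_block = False
    for number, line in enumerate(lines, start=1):
        if BLOCK_BEGIN.match(line):
            in_block = True
        elif BLOCK_END.match(line):
            in_block = False
        elif not in_block and INDENTED_HEADING.match(line):
            hits.append(number)
    return hits


# The case-folding gotcha: with folding on, [A-Z] also accepts lowercase.
COOKIE_FOLDED = re.compile(r"\[#[A-Z]\]", re.IGNORECASE)  # like case-fold-search t
COOKIE_STRICT = re.compile(r"\[#[A-Z]\]")                 # like binding it to nil

if __name__ == "__main__":
    sample = [
        "* Top",
        "  ** TODO lost task",
        "#+begin_example",
        "  * inside block",
        "#+end_example",
    ]
    print(indented_headings(sample))
    print(bool(COOKIE_FOLDED.fullmatch("[#a]")), bool(COOKIE_STRICT.fullmatch("[#a]")))
```

The two cookie patterns differ only in the folding flag, which is the whole point of the gotcha: the pattern text looks correct either way, and only a test with a lowercase cookie exposes the difference.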
diff --git a/docs/design/2026-06-29-todo-cleanup-aging-proposal.org b/docs/design/2026-06-29-todo-cleanup-aging-proposal.org
new file mode 100644
index 0000000..5a18990
--- /dev/null
+++ b/docs/design/2026-06-29-todo-cleanup-aging-proposal.org
@@ -0,0 +1,64 @@
+#+TITLE: todo-cleanup.el — add Resolved-section file-aging to --archive-done
+
+* What changed (from .emacs.d, 2026-06-29)
+
+Extended =todo-cleanup.el='s =--archive-done= mode (the =make task-sorted=
+target) with a SECOND step, run after the existing Open Work -> Resolved move:
+
+- *Age the Resolved section.* Level-2 DONE/CANCELLED subtrees whose CLOSED date
+ is older than =tc-archive-retain-days= (default 7) — AND any with no parseable
+ CLOSED date — move out of the in-file Resolved section to =tc-archive-file=
+ (default =archive/task-archive.org= beside the todo file). Only tasks closed
+ within the last week stay in todo.org itself.
+
+Two files are attached (the edited canonical candidates):
+- =todo-cleanup.el=
+- =tests/test-todo-cleanup.el=
+
+* Why
+
+Craig's .emacs.d todo.org had grown to 768KB / 9616 lines, ~44% of it a
+243-task in-file "Resolved" section. The existing =--archive-done= only moved
+closures Open Work -> Resolved (same file), so the file grew without bound. The
+new step keeps only the last week of closed tasks in the file and sheds the rest
+to a git-tracked archive sibling. After this run: 207 aged out, todo.org
+9616 -> 5625 lines.
+
+* Design notes for the canonical
+
+- New defvars: =tc-archive-retain-days= (7; nil disables the step, preserving
+ legacy in-file-only behavior), =tc-archive-reference-date= ((YEAR MONTH DAY),
+ nil=real today — mockable for deterministic tests), =tc-archive-file= (nil =>
+ =archive/task-archive.org= beside the todo file).
+- Policy: KEEP iff CLOSED date present AND within the window (cutoff inclusive).
+ Older OR undated => archive. The undated->archive call is deliberate ("keep
+ the last week and that's it"); an earlier undated->keep version left 14 legacy
+ undated tasks behind and read as two weeks.
+- The aging step honors =--check= (previews + reports, writes nothing).
+- Report: an additive "N aged subtree(s) moved to task-archive.org" line, only
+ when N>0, so the existing real-mode-no-op silence tests are unaffected.
+- Archive file scaffold: =#+TITLE: Task Archive= / =#+FILETAGS: :archive:= /
+ =* Resolved (archived)=; aged subtrees append as level-2 children; created on
+ first use, appended to thereafter (one scaffold, never duplicated).
+- Tests: =tc-test--reset= now sets the aging knobs OFF (retain nil) so the
+ existing in-file-move + sync tests are untouched by the wall clock; a new
+ =tc-test--age= harness re-enables them with a fixed reference date and a temp
+ archive file. 6 new tests (old+undated move, cutoff-inclusive stay, disabled,
+ idempotent, check-no-write, straggler pipeline, append-preserves). 34/34 green.
+
+* Cross-project consideration for your value gate
+
+Default is ON (retain 7) for ALL consuming projects once this syncs. A project's
+first =task-sorted= after the sync will shed everything in its Resolved section
+older than a week to a new =archive/task-archive.org=. That's the intended
+feature, but flag it — projects with a large historical Resolved section will see
+a big first-run move (git-tracked, recoverable). Adjust the default or gate it if
+you'd rather it be opt-in per project.
+
+* Companion (project-local, NOT synced)
+
+.emacs.d's Makefile =task-sorted= target now also runs =lint-org.el todo.org=
+after the archive, as a structural-safety pass (org-lint catches links/drawers/
+blocks; we separately verified heading-level structure by hand). Makefiles aren't
+template-synced, so this is per-project — noting it in case the pattern is worth
+documenting alongside the tool.
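The keep-or-archive policy from the design notes above, sketched in Python for illustration only. The real implementation is Emacs Lisp in =todo-cleanup.el=; the function name and signature below are hypothetical, and reading "cutoff inclusive" as "a task closed exactly =retain_days= ago stays" is an assumption.

```python
from datetime import date, timedelta


def should_keep(closed, today, retain_days=7):
    """Return True if a resolved task stays in todo.org, False if it ages out.

    closed      -- the task's CLOSED date, or None when it has none
    today       -- the reference date (injectable, so tests are deterministic)
    retain_days -- the retention window; None disables the aging step
    """
    if retain_days is None:
        return True   # step disabled: legacy in-file-only behavior
    if closed is None:
        return False  # undated tasks are archived, by deliberate policy
    cutoff = today - timedelta(days=retain_days)
    return closed >= cutoff  # assumed meaning of "cutoff inclusive"


if __name__ == "__main__":
    today = date(2026, 6, 29)
    for closed in (date(2026, 6, 22), date(2026, 6, 21), None):
        print(closed, should_keep(closed, today))
```

Passing the reference date in, rather than reading the wall clock inside the function, plays the same role as the =tc-archive-reference-date= knob: the cutoff tests do not change with the day they run.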
diff --git a/docs/design/2026-06-30-daily-drivers-tailscale-correction.org b/docs/design/2026-06-30-daily-drivers-tailscale-correction.org
new file mode 100644
index 0000000..9e2bb52
--- /dev/null
+++ b/docs/design/2026-06-30-daily-drivers-tailscale-correction.org
@@ -0,0 +1,9 @@
+#+TITLE: Correction to claude-rules/daily-drivers.md: it states 'the
+#+SOURCE: from .emacs.d
+#+DATE: 2026-06-30 13:20:30 -0400
+
+Correction to claude-rules/daily-drivers.md: it states 'the agent can't reach the other machine; the point is to surface...'. That assumption is now demonstrably false. On 2026-06-30, from velox, I drove ratio directly over tailscale ssh — git fetch + reset --hard to repair ratio's .emacs.d after a history rewrite, plus scp'd a file across. Both daily drivers are on the same tailnet and reachable from each other.
+
+Gotcha worth capturing in the rule: the BARE hostname does not resolve (ssh ratio -> 'Could not resolve hostname'), which makes it look unreachable. The tailscale IP (e.g. 100.71.182.1) and the MagicDNS name (ratio.tailf3bb8c.ts.net) DO resolve and connect. First connection from a given address fails host-key verification under BatchMode; -o StrictHostKeyChecking=accept-new clears it. 'tailscale status' lists every node's IP + online state.
+
+Suggested rule change: reframe daily-drivers.md from 'can't reach, so surface it' to 'CAN reach over tailscale ssh — so the agent can directly sync/verify/repair the other daily driver, not just flag it'. Keep the flag-it guidance as the fallback for when tailscale is actually down. Add the bare-hostname-doesn't-resolve / use-tailscale-IP-or-MagicDNS gotcha. uname -n still tells you which machine you're on.
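A small Python sketch of the connection gotchas in the note above. The helper names are hypothetical and the "bare hostname" test (no dot and not an IP literal) is an assumed heuristic. The ssh options are standard OpenSSH ones; nothing here is taken from an existing script.

```python
import ipaddress


def is_bare_hostname(host):
    """True for names like 'ratio' that the note says will not resolve."""
    try:
        ipaddress.ip_address(host)
        return False  # an IP literal such as a tailscale address
    except ValueError:
        return "." not in host  # MagicDNS names contain dots


def ssh_argv(host, remote_command):
    """Build a non-interactive ssh command line for the other daily driver."""
    if is_bare_hostname(host):
        raise ValueError(
            f"{host!r} is a bare hostname; use the tailscale IP or MagicDNS name"
        )
    return [
        "ssh",
        "-o", "BatchMode=yes",
        # Accept an unseen host key on first contact, but still reject a changed one.
        "-o", "StrictHostKeyChecking=accept-new",
        host,
        remote_command,
    ]


if __name__ == "__main__":
    print(ssh_argv("ratio.tailf3bb8c.ts.net", "uname -n"))
```

The argv list could be handed to `subprocess.run`; building it separately keeps the bare-hostname guard testable without a network.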
diff --git a/docs/design/2026-07-02-auto-flush-mechanism-note.org b/docs/design/2026-07-02-auto-flush-mechanism-note.org
new file mode 100644
index 0000000..fbe06ae
--- /dev/null
+++ b/docs/design/2026-07-02-auto-flush-mechanism-note.org
@@ -0,0 +1,20 @@
+#+TITLE: AUTO-FLUSH capability — proven live in the archsetup session
+#+SOURCE: from archsetup
+#+DATE: 2026-07-02 01:26:20 -0400
+
+The AUTO-FLUSH capability was proven live in the archsetup session on 2026-07-02. Craig asks that it be promoted to all projects and recommended as part of the no-approvals speedrun to keep sessions sharp.
+
+Problem: /clear is a user-only keystroke, so long autonomous sessions either bloat or hit arbitrary auto-compaction. Craig can't always be around to type it.
+
+Mechanism (companion script: self-inject.sh, sent separately to this inbox):
+1. At a clean task boundary, the agent refreshes .ai/session-context.org exactly as the flush skill does (checkpoint with Active Goal / Decisions / Next Steps).
+2. It derives its own tmux pane: match pane_pid from 'tmux list-panes -a' against its process ancestry (the ai launcher runs every agent session inside tmux, so this holds everywhere).
+3. It arms the injection VIA THE TMUX SERVER — tmux run-shell -b "sleep 25; tmux send-keys -t %N -l '/clear'; tmux send-keys -t %N Enter; sleep 15; tmux send-keys -t %N -l 'go — auto-flush resume: read .ai/session-context.org and continue per Next Steps'; tmux send-keys -t %N Enter" — and immediately ends its turn so the prompt is idle when the keys land.
+4. /clear fires the SessionStart hook (which already points a fresh context at notes.org + session-context.org), and the injected resume line starts the next turn. Zero human keystrokes.
+
+Gotchas learned the hard way:
+- A detached child (setsid/nohup/&) of a tool call DIES when the tool call ends; only tmux run-shell -b (server-owned) survives the turn boundary.
+- Under run-shell the process is a child of the tmux server, so ancestry-based pane detection can't run there — derive the pane first from the agent's shell, pass it explicitly.
+- Collision: if the user is typing when the keys fire, the injection merges into their input (a real /clear became '/clearto' mid-word). Fine for unattended sessions; if the user is present, warn them to keep hands off the armed window.
+
+Suggested integration: an 'auto' mode on the flush skill (checkpoint, then self-inject instead of prompting the user), plus a line in the no-approvals speedrun workflow to auto-flush at clean boundaries when context grows heavy. The script could live in claude-templates' .ai/scripts/ so every project gets it on sync.
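Step 3 of the mechanism above, sketched in Python as a function that only assembles the `tmux run-shell -b` command line. This is not the =self-inject.sh= companion script, which is not shown in this note; the function name, parameters, and pane-id validation are assumptions. The delays and key sequence are the ones quoted in the note, and the pane id must already have been derived from the agent's own shell, as the second gotcha requires.

```python
import re
import shlex


def auto_flush_argv(pane, resume_line, clear_delay=25, resume_delay=15):
    """Build the argv that arms a /clear plus resume injection via the tmux server."""
    if not re.fullmatch(r"%\d+", pane):
        raise ValueError(f"expected a tmux pane id like '%3', got {pane!r}")
    script = "; ".join(
        [
            f"sleep {clear_delay}",
            f"tmux send-keys -t {pane} -l '/clear'",
            f"tmux send-keys -t {pane} Enter",
            f"sleep {resume_delay}",
            # -l sends the text literally; shlex.quote protects spaces and colons.
            f"tmux send-keys -t {pane} -l {shlex.quote(resume_line)}",
            f"tmux send-keys -t {pane} Enter",
        ]
    )
    # run-shell -b hands the job to the tmux server, so it outlives the tool call.
    return ["tmux", "run-shell", "-b", script]


if __name__ == "__main__":
    line = "go: read .ai/session-context.org and continue per Next Steps"
    print(auto_flush_argv("%3", line))
```

Returning the argv instead of executing it keeps the sketch inert; the caller would run it and then end its turn immediately so the prompt is idle when the keys land.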
diff --git a/docs/specs/2026-06-16-autonomous-batch-execution-spec.org b/docs/specs/2026-06-16-autonomous-batch-execution-spec.org
new file mode 100644
index 0000000..23fc574
--- /dev/null
+++ b/docs/specs/2026-06-16-autonomous-batch-execution-spec.org
@@ -0,0 +1,393 @@
+#+TITLE: Autonomous-Batch Task Execution — Spec
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-06-16
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* IMPLEMENTED Autonomous-Batch Task Execution — Spec
+:PROPERTIES:
+:ID: 90f623cd-fdbe-4f5c-b63d-b2f84d9151cf
+:END:
+- 2026-07-02 Thu @ 05:26:07 -0400 — DOING → IMPLEMENTED: all six phases built (work-the-backlog.org, both callers, the waiver gate, checklist/Q&A/page mechanics, metrics record, KB synthesis) and the live trial validated — run c726f526, 3/3 tasks as reviewed commits with the pre-flight Q&A, page, and metrics all exercised. Craig confirmed and granted :LOOP_MAY_COMMIT:.
+- 2026-07-02 Thu @ 00:44:59 -0400 — READY → DOING: spec-response decomposition ran — the speedrun build parent in todo.org carries the :SPEC_ID: binding, one task per phase (1-6) plus the live-trial validation and the flip-to-IMPLEMENTED task. Phase 0 had already landed 2026-07-01.
+- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to READY (evidence-based, human-confirmed)
+
+* Metadata
+| Status | implemented |
+|----------+--------------------------------------------------------------------|
+| Owner | Craig Jennings |
+|----------+--------------------------------------------------------------------|
+| Reviewer | Craig Jennings |
+|----------+--------------------------------------------------------------------|
+| Date | 2026-06-16 |
+|----------+--------------------------------------------------------------------|
+| Related | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:../design/2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] |
+|----------+--------------------------------------------------------------------|
+
+* Summary
+
+Two proposals arrived within a day of each other describing the same capability: have Claude work a batch of small, well-marked tasks autonomously, with a full quality bar per task and no per-step approval gate. The inbox-zero "Phase E" proposal drives it from a tag/priority query on a recurring loop; the "speedrun" proposal drives it from an explicit ordered list a human dictates in-session. This spec reconciles both into one feature: a single dedicated workflow, =work-the-backlog.org=, that holds the task-execution logic, with two thin callers feeding it. It also designs the instrumentation that measures whether the autonomy is actually paying off.
+
+* Problem / Context
+
+Craig has a standing backlog of small, solo-doable fixes across several projects, already marked with a tag convention (=:next:=, =:quick:+:solo:=). Doing them by hand one at a time is the bottleneck — the context-switch and the per-commit approval ceremony dominate the actual work. He wants Claude to burn these down unattended: on a recurring loop for the routed inbox case, and on demand when he batches a named list and says "speedrun, no approvals until done." The speedrun is the away-from-desk / working-on-something-else mode, so it must be able to take on larger tasks too — not only sub-30-minute ones — or it forces him to stay at the desk for anything non-trivial.
+
+Two separate proposals tried to answer this:
+
+- *Phase E* (in =inbox-zero.org=, edited in =.emacs.d= as a stopgap) bolted autonomous execution onto the inbox-zero workflow's on-demand and loop callers. The sender flagged the seam as the open question: coupling capture-routing with autonomous-implementation pollutes inbox-zero's three existing callers (startup, wrap-up, on-demand), two of which must never execute anything.
+- *speedrun* (a =.emacs.d= theme-studio session that worked well) is the same execution loop driven by an explicit ordered task set, with end-of-set paging and always-push.
+
+They overlap almost entirely. The execution loop — eligibility gate, act-vs-file decision, per-task quality bar, bounded run — is identical. Only the *input* differs (tag query vs explicit list) and the *session mode* differs (loop default vs no-approvals + always-push + page). Building them as two features would duplicate the execution logic and let the two copies drift. The forces: keep inbox-zero's callers clean, share one execution loop, and make the autonomy safe enough to run unattended on a 30-minute timer without Craig watching.
+
+A second, explicit ask from Craig: instrument this so its effectiveness is measurable. "Gather data on this and create some org-roam articles we can look at later." Autonomous execution that silently makes bad commits is worse than no autonomy, and the only way to know which one we have is to measure, over time, tasks completed vs deferred vs reverted, plus human corrections in the following session.
+
+* Goals and Non-Goals
+
+** Goals
+- One workflow, =work-the-backlog.org=, owns the task-execution loop. Both input shapes (tag query, explicit list) and both session modes feed it.
+- inbox-zero's three existing callers stay clean: the loop caller chains into =work-the-backlog= *after* routing; startup and wrap-up never touch it.
+- The *no-approvals speedrun* is a thin named preset, not a second implementation: autonomous-commit + always-push + end-of-set page, fed an explicit ordered list, with all approvals front-loaded into a single pre-flight step (below) so the run itself is uninterrupted.
+- Eligibility is decided by *crisp, checkable criteria*, not adjectives: a mechanical tag/status gate (=:solo:= + status =TODO=), then a per-task defer checklist whose keystone is "can I write the failing test from the task text without inventing a requirement?" Task *size* is explicitly not a gate — a large task is decomposed into per-logical-commit chunks, not deferred.
+- The autonomy tags (=:solo:=, =:quick:=) carry hard definitions in =todo-format.md= and are applied + enforced as a mandatory step in the task-review and task-audit workflows, so the run-time gate trusts the author's tag instead of re-deriving it.
+- Commit autonomy defaults to file-only (surface a diff, no auto-commit). A project opts into autonomous commit+push explicitly via its per-project waiver.
+- Hard guardrails: refuse any task carrying data-loss / irreversible / external-state risk without a checkpoint; gather any one-or-two quick decisions a task needs *up front* (speedrun) rather than guessing; file a =VERIFY= for anything underspecified or needing design deliberation; a per-run cap / kill switch beyond "one task per run."
+- A lightweight per-run metrics log plus a periodic synthesis step that writes org-roam KB articles summarizing the trend.
+
+** Non-Goals
+- *Not* a replacement for =/start-work=. Tasks needing deliberation or design stay with =/start-work= and its approval gates. This feature only touches the marked, solo set — regardless of size.
+- *Not* a new tag convention. It reads the project's own priority/tag scheme header; it never invents or hardcodes tags across projects.
+- *Not* an inbox-routing change. =inbox-zero.org= keeps its A-D phases. The Phase E text added in =.emacs.d= as a stopgap is *removed* and its logic moves here.
+- *Not* a multi-project orchestrator. One run works one project's backlog. Cross-project handoff stays with =inbox-send= and the paging reply.
+- *Not* a credential-handling or external-API feature. Tasks that touch secrets or external mutations are out of the eligible set by the guardrail.
+
+** Scope tiers
+- *v1:* =work-the-backlog.org=; crisp =:solo:= / =:quick:= definitions in =todo-format.md= plus their mandatory application in task-review and task-audit; the eligibility gate (=:solo:= + status =TODO=, read against the project's scheme header); the act-vs-file *defer checklist* (test-writability keystone, enumerated data-loss list, already-satisfied, design-deliberation); the no-approvals speedrun's pre-flight decision-gathering step; file-only commit default with per-project opt-in; the loop caller wiring and inbox-zero Phase E removal; the speedrun preset with end-of-set =notify --persist= page; the per-run metrics log (structured JSONL).
+- *Out of scope:* a token-budget kill switch (cap is a task count in v1); cross-project batch runs; a dashboard or live UI over the metrics.
+- *vNext (log to todo.org):* the periodic org-roam synthesis step if it doesn't make v1; a token/cost budget alongside the task-count cap (more pressing now that task size is uncapped — a single large task can run long in the unattended loop); auto-detection of "human corrected my autonomous commit" from the next session's diff.
+
+* Design
+
+** Overview
+
+The architecture is one execution workflow with two callers and one preset, plus an instrumentation sidecar.
+
+#+begin_example
+ inbox-zero loop caller ──(after Phase D routing)──┐
+ ├──▶ work-the-backlog.org ──▶ metrics log (JSONL)
+ no-approvals speedrun ──(explicit ordered list)──┘ │
+ = pre-flight Q&A + autonomous-commit + push + page ▼
+ periodic synthesis ──▶ org-roam KB articles
+#+end_example
+
+=work-the-backlog.org= is the only place the execution loop lives. It takes a *task set* (however assembled) and a *session mode* (which gates commit autonomy and paging), and works the set under a fixed safety contract. The two callers differ only in how they build the task set and which session mode they pass.
+
+This is the seam the Phase E sender asked for: separating capture-routing (inbox-zero) from autonomous-implementation (work-the-backlog) keeps inbox-zero's startup and wrap-up callers — which must never execute anything — untouched. The loop caller is the only one of inbox-zero's callers that chains forward into execution, and it does so as an explicit second step after routing completes, not as a phase buried inside inbox-zero.
+
+** The execution loop (two-altitude: caller's view)
+
+A caller hands =work-the-backlog= three things:
+
+1. *A task set* — either an explicit ordered list of task headings (speedrun), or the result of a tag/priority query against =todo.org= (the loop). The workflow does not care which; it receives an ordered list of candidate tasks.
+2. *A session mode* — =file-only= (default) or =autonomous-commit= (requires the project's per-project waiver), and a paging flag.
+3. *A run cap* — the maximum number of tasks to complete this run.
+
+It returns: per-task outcome (implemented+committed / implemented+diff-surfaced / deferred-VERIFY / dropped-by-craig / skipped-ineligible), and a metrics record per task.
+
+** The execution loop (implementer's view)
+
+For the task set, in order, until the run cap is hit:
+
+1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
+2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist.
+3. *Defer checklist* (below). Any hit → record the deferral reason (or, under the speedrun preset, route the quick-question gap to the pre-flight Q&A), next task.
+4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important, then close the task per =todo-format.md=. Decompose into as many logical commits as the change needs — size is not capped.
+5. *Commit autonomy branch:*
+ - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
+ - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
+6. *Record metrics* for the task (the JSONL append, below).
+7. Decrement the cap. At zero, stop.
+
+After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary.
+
+** Eligibility gate (mechanical — no judgment)
+
+A task is autonomous-safe when *both* hold. This layer is a lookup, not a judgment; all the judgment lives in the defer checklist below.
+
+1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= is the "awaiting Craig's manual confirmation" marker; auto-implementing one defeats the manual check it represents. The do-not-implement set is safe-by-omission: anything not plainly =TODO= (plus any project-declared "hold" marker) is out.
+2. *Tagged =:solo:=* — the autonomy tag, resolved against the project's priority/tag scheme header (not hardcoded). =:solo:= carries a hard definition (see Tag definitions, below): the task is completable without Craig's involvement beyond at most one or two quick decisions answerable up front, with no design deliberation. A project whose scheme declares a different autonomous-safe tag set overrides the default. Priority / =:next:= drive *ordering* within the eligible set, not eligibility.
+
+Task *size* is deliberately absent from this gate. The old "≤ ~30 minutes / one logical commit" criterion is removed: a large but well-specified, decision-free task is in scope and is decomposed into per-logical-commit chunks during implementation. Size never sends a task to =/start-work=; only *deliberation* or *risk* does (the checklist below). This is what makes the speedrun usable as an away-from-desk mode rather than a sub-30-minute-only mode.
+
+*** Tag definitions (land in =todo-format.md=, enforced in task-review + task-audit)
+
+- *=:solo:= — autonomy.* The task can be completed without Craig's involvement, except for at most one or two quick decisions that can be stated and answered before the run starts. No open design question, no "weigh these approaches," no waiting on Craig mid-task. This is the eligibility tag.
+- *=:quick:= — effort hint only.* A small, fast task. Informational for batching and estimating a run's duration; *not* an eligibility gate (size no longer gates).
+
+Both tags are applied at task creation and *re-checked as a mandatory step* in the task-review and task-audit workflows, so the run-time gate can trust the author's tag rather than re-derive autonomy and effort from the task body. A task-review or task-audit that skips the =:solo:= / =:quick:= assessment is incomplete.
+
+** Act-vs-file decision (the defer checklist)
+
+After the scope read, run each eligible candidate through the checklist below. Each item is a concrete, answerable question, not an adjective. *Any* hit — or any "unsure" — sends the task to defer (or, for a quick-decision gap under the speedrun preset, to the pre-flight Q&A). Only a task that clears every item is implemented.
+
+1. *Test-writability (the keystone).* Can I write the failing test from the task text — plus any decisions gathered up front — without inventing a requirement? *No / unsure* → underspecified. Under the speedrun preset, if the gap is one or two quick answerable questions, route it to the pre-flight Q&A; otherwise file a =VERIFY= noting what's missing. Under the unattended loop, file the =VERIFY= (no one to ask). This replaces the old "clear / bounded / underspecified" adjectives with an action that fails loudly: if the red test isn't writable, the task isn't ready.
+2. *Data-loss / irreversible / external operation.* Does implementing it require any of: =rm= of non-scratch data, =git reset --hard= / force-push, =DROP= / =DELETE= / =TRUNCATE=, file truncate/overwrite of persisted content, a schema or data migration, any external or shared-state mutation, any credential touch? *Yes* → do NOT implement; file a =VERIFY= naming the risk. This is the hard safety gate; an upfront answer never overrides it without an explicit checkpoint. Replaces the vague "data-loss risk" with an enumerated, greppable set.
+3. *Already-satisfied.* Does the scope read show the desired end-state already holds? *Yes* → file a =VERIFY= noting it (the "raise max spans to 5 — every cap was already 8" case) and move on. Don't make a no-op change.
+4. *Design deliberation.* Does the task carry an unresolved design question, a "weigh these approaches" with real tradeoffs, or a TBD that isn't a quick factual answer? *Yes* → under the speedrun preset, if it collapses to one or two quick questions, route to pre-flight Q&A; otherwise file and surface as a =/start-work= candidate. Under the loop, file. The discriminator is now *quick-answerable question* vs *deliberation* — not task size.
+
+A task that clears 1–4 is implemented under the project's commit discipline, decomposed into as many logical commits as the change needs. When genuinely unsure which side a task falls on, defer — a wrong auto-implement costs a revert *and* the next-session correction the metrics are designed to catch.
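+
+The checklist can be sketched as a small decision function. This is an illustration under assumptions, not the workflow's actual implementation: =Answer=, =run_checklist=, and the parameter names are hypothetical, and the sketch evaluates the data-loss item first so that a quick-question gap can never route a risky task to the pre-flight Q&A.
+
```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNSURE = "unsure"


def run_checklist(test_writable, risky_operation, already_satisfied,
                  needs_deliberation, quick_question=False, speedrun=False):
    """Return (action, defer_reason).

    action is "implement", "preflight", or "defer". Any hit, or any
    UNSURE, keeps the task from being implemented.
    """
    # Item 2: hard safety gate, checked first (ordering is this sketch's choice).
    if risky_operation is not Answer.NO:
        return ("defer", "data-loss")
    # Item 3: the desired end-state already holds; don't make a no-op change.
    if already_satisfied is not Answer.NO:
        return ("defer", "already-satisfied")
    # Item 1: the keystone. The red test must be writable from the task text.
    if test_writable is not Answer.YES:
        if speedrun and quick_question:
            return ("preflight", None)
        return ("defer", "underspecified")
    # Item 4: quick-answerable question vs real deliberation.
    if needs_deliberation is not Answer.NO:
        if speedrun and quick_question:
            return ("preflight", None)
        return ("defer", "needs-deliberation")
    return ("implement", None)
```
+
+The defer reasons returned here match the =defer_reason= values in the metrics record, so a deferred task can be logged without translation.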
+
+** Pre-flight decision gathering (the no-approvals speedrun's only interaction)
+
+The speedrun preset front-loads every approval into one step before the run, so the run itself is uninterrupted — that is what "no approvals" means. It is *not* "no input ever"; it is "all input first, then hands-off."
+
+When Craig kicks off a speedrun over an explicit list:
+
+1. *Gather* the named task set.
+2. *Scope-read and classify* each task against the eligibility gate + defer checklist: ready (clears the checklist), needs-quick-decisions (one or two upfront-answerable questions — checklist item 1 or 4), or drop (data-loss / irreversible, or design deliberation that isn't a quick question).
+3. *Order* the list (priority, then the author's ordering / =:next:=).
+4. *Intro the work* — present the ordered plan: what will run, what was dropped and why, and the batched questions for the needs-quick-decisions tasks.
+5. *Craig answers each question, or says "skip this"* → a skipped task is removed from the run (recorded =dropped-by-craig=); an answered task has the answer recorded so implementation works from the decision, not a guess.
+6. *Run the finalized list autonomously* — no further approvals until done.
+7. *End-of-set page* with completed + remaining + skipped.
+
+The unattended *loop* caller has no human at kickoff, so it cannot gather decisions: there, a needs-quick-decisions task simply defers (files its note) like any other checklist hit. The pre-flight Q&A is a speedrun-preset capability, not a loop one.
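+
+Steps 5 and 6 above can be sketched as one pass over the ordered plan. The names =finalize_run= and =SKIP= and the data shapes are hypothetical. Treating an unanswered question as a skip is an assumption of this sketch (it fails safe), not something the spec states.
+
```python
SKIP = "skip this"


def finalize_run(plan, answers):
    """Apply pre-flight answers to an ordered plan.

    plan    -- ordered list of (heading, question); question is None
               for tasks that already clear the checklist
    answers -- {heading: answer text} collected at kickoff

    Returns (run_list, dropped, decisions), preserving plan order.
    """
    run_list, dropped, decisions = [], [], {}
    for heading, question in plan:
        if question is None:
            run_list.append(heading)          # ready: no input needed
            continue
        # Assumption: a missing answer is treated the same as "skip this".
        answer = answers.get(heading, SKIP)
        if answer.strip().lower() == SKIP:
            dropped.append(heading)           # would be logged as dropped-by-craig
        else:
            decisions[heading] = answer       # implementation works from this
            run_list.append(heading)
    return run_list, dropped, decisions
```
+
+The returned =decisions= mapping is what would have to reach the implementer, so that each answered task works from the recorded decision rather than a guess.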
+
+** Session modes and the no-approvals speedrun preset
+
+Two orthogonal session-mode dimensions feed the loop:
+
+- *Commit autonomy:* =file-only= (default) or =autonomous-commit=. =autonomous-commit= is honored only when the project carries the per-project waiver (=.emacs.d= and =rulesets= have it; most projects do not). Absent the waiver, a request for =autonomous-commit= degrades to =file-only= and says so.
+- *Paging:* on or off. End-of-set only.
+
+The *no-approvals speedrun* is the named preset: =autonomous-commit= + always-push + paging-on, fed an *explicit ordered list*, run after the pre-flight decision-gathering step above. It is not a separate code path; it is a label for that combination of mode flags plus the explicit-list input, with the pre-flight Q&A as its only interactive moment. The loop caller, by contrast, runs =file-only= (unless the project has the waiver and opts the loop into commits) with paging off, fed the *tag query*, with no pre-flight step.
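+
+A minimal sketch of the two presets as plain flag sets, plus the waiver degrade. The dictionary keys and =resolve_commit_mode= are hypothetical names. =always_push: False= for the loop is an assumption of this sketch, on the reasoning that a file-only run makes no commit to push.
+
```python
# Presets are labels for flag combinations, not separate code paths.
SPEEDRUN = {
    "commit": "autonomous-commit",
    "always_push": True,
    "paging": True,
    "input": "explicit-list",
    "preflight": True,
}
LOOP = {
    "commit": "file-only",
    "always_push": False,   # assumption: nothing to push in file-only mode
    "paging": False,
    "input": "tag-query",
    "preflight": False,
}

VALID_MODES = {"file-only", "autonomous-commit"}


def resolve_commit_mode(requested, has_waiver):
    """Return (effective_mode, notice).

    autonomous-commit is honored only with the per-project waiver;
    otherwise it degrades to file-only and the notice says so.
    """
    if requested not in VALID_MODES:
        raise ValueError(f"unknown commit mode: {requested}")
    if requested == "autonomous-commit" and not has_waiver:
        return ("file-only",
                "autonomous-commit requested without a project waiver; "
                "degraded to file-only")
    return (requested, None)
```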
+
+** Bounding the run and the kill switch
+
+Default cap: one task per run for the loop caller — implement the highest-priority eligible candidate (=[#A]= before =[#B]= before =[#C]=), record, then stop and let the next tick continue. The speedrun preset works the whole explicit list in order (the human bounded it by naming it), still one commit per logical change.
+
+The kill switch is a hard per-run task cap passed by the caller, independent of "one per run": even the speedrun stops at the cap and pages with the remainder listed. A loop that fires every 30 minutes and commits unattended needs a ceiling that a runaway can't exceed. With task size now uncapped, the count cap no longer bounds *cost* — a single large task can run long — so a token/cost budget is the most pressing vNext addition.
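+
+The ordering and the cap can be sketched as follows. =order_by_priority= and =run_with_cap= are hypothetical names, tasks are modeled as =(priority, heading)= pairs, and the sketch assumes every processed task counts against the cap whether it ends up implemented or deferred.
+
```python
PRIORITY_ORDER = {"A": 0, "B": 1, "C": 2}


def order_by_priority(tasks):
    """Sort (priority, heading) pairs [#A] before [#B] before [#C].

    sorted() is stable, so tasks of equal priority keep the author's
    order; an unknown priority sorts last.
    """
    return sorted(tasks,
                  key=lambda t: PRIORITY_ORDER.get(t[0], len(PRIORITY_ORDER)))


def run_with_cap(ordered, cap, process):
    """Process tasks in order until the list or the cap is exhausted.

    Returns (done, remaining); the remainder is what the end-of-set
    page would list when the cap is hit.
    """
    done = []
    remaining = list(ordered)
    while remaining and cap > 0:
        task = remaining.pop(0)
        process(task)
        done.append(task)
        cap -= 1              # decrement after each task; at zero, stop
    return done, remaining
```
+
+With =cap=1= this reproduces the loop caller's default of one highest-priority task per tick; the speedrun would pass its explicit list unsorted and a cap no larger than the agreed ceiling.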
+
+** End-of-set paging
+
+When the set is done (or the cap is hit), if paging is on, fire one page — end-of-set only, never per-task:
+
+#+begin_src sh
+notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>" --persist
+#+end_src
+
+=--persist= keeps it on screen until dismissed (the page-me convention). The message carries the project name, the completed count, and the remaining count, so Craig can reply confirming ready + naming the next project in one turn. The page-signal wrapper removed 2026-06-12 is reconciled to =notify= here — there is no separate page-signal call.
+
+* Alternatives Considered
+
+** Fold execution into inbox-zero (the Phase E stopgap shape)
+- Good, because it's the smallest diff — the loop caller already runs inbox-zero, so execution is "one more phase."
+- Bad, because it couples capture-routing with implementation. inbox-zero has three callers; startup and wrap-up must never execute. A Phase E inside inbox-zero forces both to carry a "skip Phase E" caveat and risks a future caller running it by accident.
+- Neutral, because the eligibility-gate and defer-checklist text is identical either way — only its *home* differs.
+
+** Two separate features (keep Phase E and speedrun distinct)
+- Good, because each proposal ships as written with no reconciliation work.
+- Bad, because the execution loop is duplicated in two places and will drift; a guardrail tightened in one won't reach the other. Two ways to do autonomous execution is two things to audit.
+- Neutral, because the input and session-mode differences are real — but they're thin caller-level differences, not a reason to fork the engine.
+
+** Keep the task-size gate (defer anything over ~30 minutes)
+- Good, because it bounds per-task cost and blast radius with a single number.
+- Bad, because it defeats the away-from-desk use case — anything non-trivial bounces back to Craig, so he can't actually leave. Size correlates poorly with risk; a large mechanical refactor is safer than a tiny change to persisted state.
+- Neutral, because the things size was a proxy for (risk, cost) are covered directly — risk by the data-loss checklist, cost by the run cap (and the vNext token budget). The defer checklist's deliberation item, not size, is what routes genuine =/start-work= tasks out.
+
+** Autonomous-commit as the default
+- Good, because it's faster end-to-end with no diff to review.
+- Bad, because most projects lack the per-project waiver, and an unattended loop committing to a project that never opted in is exactly the failure the file-only default prevents. The blast radius of a bad autonomous commit is a revert plus lost trust in the loop.
+- Neutral, because the projects that *do* want it (=.emacs.d=, =rulesets=) opt in explicitly, so the capability is available where it's wanted without being the default everywhere.
+
+* Decisions [8/8]
+
+** DONE Eligibility tag set and where it's read
+- Owner / by-when: Craig / spec-review
+- Context: Projects' priority/tag schemes vary, and the =todo-format.md= scheme header is the declared per-project source of truth. Task size is no longer a gate, so eligibility rests on the autonomy tag, not an effort cap.
+- Decision: Eligibility = status =TODO= AND the =:solo:= autonomy tag, resolved against the project's scheme header (a project may declare a different autonomous-safe set). Priority / =:next:= drive ordering, not eligibility. =:quick:= is an effort hint, never a gate.
+- Consequences: easier — one workflow works across projects with different vocab, and the gate is a pure lookup; harder — a project with no/malformed scheme header needs a fallback, and the default (=:solo:=) must be defined precisely enough that two projects agree.
+
+** DONE Crisp =:solo:= / =:quick:= definitions, enforced in task-review + task-audit
+- Owner / by-when: Craig / spec-review
+- Context: The run-time gate is only as crisp as the tags. Today =:quick:= / =:solo:= are listed in the scheme header with no hard definition, and nothing enforces that tasks get assessed for them.
+- Decision: Define =:solo:= (completable without Craig beyond at most one-or-two upfront-answerable quick decisions; no design deliberation) and =:quick:= (small/fast effort hint only) in =todo-format.md=, and make assessing both a *mandatory step* in the task-review and task-audit workflows. A review/audit that skips the assessment is incomplete.
+- Consequences: easier — authoring-time judgment by the human who knows the answer, and the run-time gate trusts the tag; harder — task-review and task-audit grow a required step, and existing untagged tasks need a back-fill pass.
+
+** DONE The do-not-auto-implement marker set
+- Owner / by-when: Craig / spec-review
+- Context: =VERIFY= means "awaiting Craig's manual confirmation"; other projects may use markers differently.
+- Decision: Do-not-implement = any status that is not =TODO=, plus any project-declared "hold" marker. Safe-by-omission: exclude anything not plainly =TODO=.
+- Consequences: easier — portable, and manual-check tasks can't auto-run; harder — richer per-project overrides need marker semantics in the scheme header, which most lack, so the default must stay conservative.
+
+** DONE Pre-flight decision gathering for the speedrun preset
+- Owner / by-when: Craig / spec-review
+- Context: Forcing every decision-needing task to defer wastes the away-from-desk use case — many tasks need only one or two quick answers Craig could give at kickoff. The speedrun is interactive at its start but must be hands-off after.
+- Decision: The speedrun preset gathers + orders the set, intros the work, and batches all needed quick decisions into one pre-flight Q&A; Craig answers or says "skip this" (drops the task); the run then proceeds with zero further approvals. The unattended loop has no kickoff human, so it defers decision-needing tasks instead.
+- Consequences: easier — "no approvals" becomes "all approvals first," which fits working-while-away, and larger / lightly-underspecified tasks become runnable; harder — the classifier must reliably split quick-question vs real-deliberation, and the recorded answers must reach the implementer so it works from the decision, not a guess.
+
+** DONE Commit-autonomy opt-in mechanism
+- Owner / by-when: Craig / spec-review
+- Context: =file-only= is the default; =.emacs.d= and =rulesets= have a per-project waiver allowing autonomous commits. Where does the workflow *read* that a project has opted in?
+- Decision: Read the opt-in from the project's existing per-project waiver location (=notes.org= Workflow State or =CLAUDE.md=), not a new config file. Keep two separate flags, "has commit waiver" and "loop may commit", because they can differ.
+- Consequences: easier — no new config surface, reuses the existing waiver concept; harder — the waiver location/format must be pinned for deterministic detection, and "waiver yes, loop-commit no" needs the two-flag split.
+
+** DONE Run-cap default and the kill switch shape
+- Owner / by-when: Craig / spec-review
+- Context: The loop default is one task per run; the speedrun works an explicit list. Both need a hard ceiling. Task size is now uncapped, so a single task can be large.
+- Decision: The caller passes a hard per-run task cap (loop default 1; speedrun = length of the explicit list, capped at a ceiling); stop + page with the remainder when the cap is hit. v1 caps by task count, not token budget.
+- Consequences: easier — a simple caller-controlled integer with a bounded task count; harder — a count cap doesn't bound *cost*, and with size uncapped a single large task can run long, so a token budget is vNext and more pressing than before.
+
+** DONE Metrics log location and format
+- Owner / by-when: Craig / spec-review
+- Context: Per-run metrics must land somewhere structured and queryable, per-project, and survive across sessions for the synthesis step to read.
+- Decision: Append one JSONL record per task to a per-project log at =.ai/metrics/work-the-backlog.jsonl=, git-tracked, with the synthesis step reading the union across projects.
+- Consequences: easier — append-only JSONL is trivial to write and =jq=-queryable, and per-project keeps it local to the work; harder — a git-tracked log adds commit churn, and "union across projects" needs the synthesis step to know where every log lives.
+
+** DONE Synthesis cadence and trigger
+- Owner / by-when: Craig / spec-review
+- Context: Craig wants periodic org-roam articles summarizing the data. What triggers synthesis, and how often?
+- Decision: Run synthesis on an explicit trigger ("synthesize backlog metrics") and optionally a weekly scheduled run, writing one KB node per synthesis under =~/org/roam/agents/= per the knowledge-base rule.
+- Consequences: easier — an explicit trigger means no surprise writes, and the KB rule already governs node shape; harder — a weekly run needs a scheduler entry, and the personal-only write-classification must gate it so work-project metrics never land in the KB.
+
+* Implementation phases
+
+** Phase 0 — Tag definitions + task-review/audit enforcement
+Add the hard =:solo:= / =:quick:= definitions to =todo-format.md=, and add the mandatory tag-assessment step to the task-review and task-audit workflows. Independent of the workflow build; lands first so the eligibility gate has crisp tags to read and existing tasks start getting assessed. Tree stays working: these are rule + workflow prose additions.
+
+** Phase 1 — Extract the execution loop into work-the-backlog.org
+Write =work-the-backlog.org= holding the eligibility gate, defer checklist, per-task quality bar, and run-cap logic — taking a task set + session mode + cap as input. Remove the stopgap "Phase E" text from =inbox-zero.org= (restore it to its A-D shape) in the same change so there's one home, not two. Tree stays working: inbox-zero reverts to routing-only, and the new workflow is callable but not yet wired to the loop.
+
+** Phase 2 — Wire the two callers
+Add the loop caller's chain step (after inbox-zero Phase D, invoke work-the-backlog with the tag query + file-only + cap 1) and the no-approvals speedrun preset (pre-flight decision-gathering → explicit list + autonomous-commit + always-push + paging-on). Both go through the same workflow; only the speedrun runs the pre-flight Q&A. Tree stays working: each caller is independently testable.
+
+** Phase 3 — File-only vs autonomous-commit gate
+Implement the commit-autonomy branch: read the per-project waiver, degrade =autonomous-commit= to =file-only= when absent, surface the degrade. Tree stays working: default file-only behavior is the safe path even before the waiver-read lands.
+
+** Phase 4 — The defer checklist, pre-flight Q&A, and the page
+Implement the act-vs-file defer checklist (test-writability keystone, enumerated data-loss list, already-satisfied, design-deliberation), the speedrun pre-flight decision-gathering (gather → classify → order → intro → batch-ask → skip/answer), the =VERIFY=-on-ambiguity filing, and the end-of-set =notify alarm ... --persist= page. Tree stays working: the checklist only ever *reduces* what runs, and the pre-flight step only runs under the speedrun preset.
+
+** Phase 5 — Metrics log
+Append the per-task JSONL record at each task outcome. Tree stays working: logging is a side effect that doesn't alter execution.
+
+** Phase 6 — Synthesis to org-roam
+Write the synthesis step: read the JSONL union, compute the per-run and trend metrics (below), write a KB node under =~/org/roam/agents/= per the knowledge-base rule, personal-projects-only classification enforced. Tree stays working: synthesis is read-only over the logs plus a KB write.
+
+* Acceptance criteria
+- [ ] =work-the-backlog.org= exists and is the only home for the execution loop; =inbox-zero.org= is back to its A-D routing-only shape with no Phase E.
+- [ ] The loop caller chains into work-the-backlog after routing; startup and wrap-up never invoke it.
+- [ ] The no-approvals speedrun runs as the preset (pre-flight Q&A → autonomous-commit + always-push + end-page) over an explicit ordered list, one commit per logical change.
+- [ ] =:solo:= and =:quick:= carry hard definitions in =todo-format.md=, and task-review + task-audit both refuse to complete without assessing them.
+- [ ] Eligibility = status =TODO= AND =:solo:=, read from the project's scheme header, not hardcoded; a =VERIFY= / =DOING= / =DONE= / =CANCELLED= task is skipped by the gate.
+- [ ] Task size never sends a task to =/start-work=; a large but =:solo:=, well-specified task runs and is decomposed into per-logical-commit chunks.
+- [ ] The defer checklist fires correctly: a task whose red test isn't writable (and isn't a quick-question gap), one carrying an enumerated data-loss operation, an already-satisfied one, and one needing design deliberation are each deferred (or routed to pre-flight Q&A under the speedrun), not implemented.
+- [ ] Under the speedrun preset, a task needing one or two quick decisions is surfaced in the pre-flight Q&A; "skip this" drops it, an answer is recorded and used; the run then proceeds with no further approvals.
+- [ ] Under the unattended loop, a decision-needing task defers (no pre-flight Q&A).
+- [ ] In a project without the commit waiver, an =autonomous-commit= request degrades to file-only and says so; no commit is made.
+- [ ] The run stops at the per-run cap and pages with the remaining tasks listed.
+- [ ] Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl=.
+- [ ] The synthesis step reads the logs and writes a KB node under =~/org/roam/agents/=; it refuses to write for work-classified projects.
+
+* Effectiveness measurement
+
+This section answers Craig's explicit ask: measure whether autonomous-batch execution is actually effective, and build the "gather data → org-roam articles" loop.
+
+** What "effective" means here
+
+The autonomy is effective if it completes real work that *stays* completed — i.e. tasks land green and the next session doesn't have to undo or fix them. The two failure modes to catch are (1) the loop defers everything (over-cautious, no value delivered) and (2) the loop implements badly (commits that get reverted or hand-corrected next session). Both are measurable.
+
+** Per-run metrics (the JSONL record)
+
+One record per task, appended to =.ai/metrics/work-the-backlog.jsonl= at each task outcome:
+
+| Field              | Meaning                                                             |
+|--------------------+---------------------------------------------------------------------|
+| =ts=               | ISO timestamp of the task outcome                                   |
+|--------------------+---------------------------------------------------------------------|
+| =run_id=           | UUID shared by all tasks in one run                                 |
+|--------------------+---------------------------------------------------------------------|
+| =project=          | project basename                                                    |
+|--------------------+---------------------------------------------------------------------|
+| =caller=           | =loop= or =speedrun=                                                |
+|--------------------+---------------------------------------------------------------------|
+| =task=             | task heading (slug)                                                 |
+|--------------------+---------------------------------------------------------------------|
+| =outcome=          | implemented-committed / implemented-diff / deferred-verify /        |
+|                    | skipped-ineligible / dropped-by-craig (skipped at pre-flight)       |
+|--------------------+---------------------------------------------------------------------|
+| =defer_reason=     | underspecified / data-loss / already-satisfied / needs-deliberation |
+|--------------------+---------------------------------------------------------------------|
+| =upfront_decision= | true if a pre-flight answer was recorded and used for this task     |
+|--------------------+---------------------------------------------------------------------|
+| =wall_clock_s=     | seconds from task start to outcome                                  |
+|--------------------+---------------------------------------------------------------------|
+| =commit_sha=       | for committed tasks; empty otherwise                                |
+|--------------------+---------------------------------------------------------------------|
+| =review_findings=  | count of /review-code Critical+Important findings on this task      |
+|--------------------+---------------------------------------------------------------------|
+
+Per-run rollups computed at synthesis (not stored per record): tasks attempted, completed, VERIFY-deferred, dropped-by-craig, reverted; wall-clock total; commits landed; review findings per commit.
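+
+A minimal sketch of the per-task append, assuming the field names from the table above. The function name =append_record= and its defaults are hypothetical, and the outcome values are the ones the table lists.
+
```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Outcome values as listed in the record table.
OUTCOMES = {
    "implemented-committed",
    "implemented-diff",
    "deferred-verify",
    "skipped-ineligible",
    "dropped-by-craig",
}


def append_record(log_path, *, run_id, project, caller, task, outcome,
                  defer_reason="", upfront_decision=False,
                  wall_clock_s=0, commit_sha="", review_findings=0):
    """Append one task-outcome record as a single JSON line; return it."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "project": project,
        "caller": caller,
        "task": task,
        "outcome": outcome,
        "defer_reason": defer_reason,
        "upfront_decision": upfront_decision,
        "wall_clock_s": wall_clock_s,
        "commit_sha": commit_sha,
        "review_findings": review_findings,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)   # create .ai/metrics/ if absent
    with path.open("a", encoding="utf-8") as fh:     # append-only
        fh.write(json.dumps(record) + "\n")
    return record
```
+
+One JSON object per line keeps the log append-only and lets each line be parsed independently of the rest.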
+
+** The corrections signal (the key metric)
+
+The hardest and most valuable metric is *human corrections in the following session*: did Craig revert or hand-fix an autonomous commit? v1 captures the cheap proxy: at synthesis, for each =commit_sha=, check whether, within N days, a later commit touching the same files reverts it or carries a "fix"/"revert" message for that change. A clean run is one where the autonomous commits survive untouched. (Auto-detecting "this later commit corrected that autonomous one" precisely is a vNext refinement; the proxy, reverted-or-touched-soon-after, is good enough to flag a problem run for human review.)
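+
+The proxy can be sketched over plain commit records, leaving aside how they are read from git. Everything here is an assumption for illustration: the =flag_corrections= name, the dictionary shape, and the message heuristic (a word starting with "fix" or "revert"). The window N is a parameter because the spec does not fix its value.
+
```python
import re
from datetime import timedelta

# Matches words such as "fix", "fixes", "revert", "reverted"; not "prefix".
CORRECTION_RE = re.compile(r"\b(fix|revert)", re.IGNORECASE)


def flag_corrections(auto_commit, later_commits, window_days):
    """Return SHAs of later commits that look like corrections.

    Each commit is a dict with 'sha', 'ts' (datetime), 'files' (set of
    paths), and 'message'. A later commit is flagged when it lands within
    the window, touches at least one of the same files, and its message
    mentions a fix or revert. This is a flag for human review, not a verdict.
    """
    deadline = auto_commit["ts"] + timedelta(days=window_days)
    flagged = []
    for commit in later_commits:
        if not (auto_commit["ts"] < commit["ts"] <= deadline):
            continue
        if not (commit["files"] & auto_commit["files"]):
            continue
        if CORRECTION_RE.search(commit["message"]):
            flagged.append(commit["sha"])
    return flagged
```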
+
+** Where the data lands
+
+Per-project git-tracked JSONL at =.ai/metrics/work-the-backlog.jsonl=. Append-only, =jq=-queryable, survives across sessions and machines via the normal project sync. Git-tracked so the history is auditable and the synthesis step can read it from any clone.
+
+** The synthesis loop (gather → article)
+
+On the "synthesize backlog metrics" trigger (and optionally a weekly scheduled run):
+
+1. Read the JSONL union across the personal projects the synthesizer can see.
+2. Compute the rollups and the trend: completion rate over time, defer-reason distribution, review-findings-per-commit trend, and the corrections-signal flag count.
+3. Write one org-roam KB node under =~/org/roam/agents/YYYYMMDDHHMMSS-backlog-metrics-<window>.org= per the knowledge-base rule — filetags =:agent:metrics:=, a concise title, the rollup table, the trend narrative, and =[[id:...]]= links to prior synthesis nodes so the series is traceable.
+4. Enforce the KB write-classification: *personal projects only*. A work-classified project's metrics never write to the KB — they stay in that project's own =.ai/metrics/= log and the synthesizer reports the refusal per the KB refusal contract.
+
+The KB node is the artifact Craig reviews later — "are the autonomous runs completing more and getting corrected less over the last month?" reads off the trend table without re-querying raw logs.
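+
+Step 2's rollups can be sketched over already-parsed records. The =rollup= name and the output keys are hypothetical. Counting both =implemented-committed= and =implemented-diff= as completed, and dividing by all records, are assumptions of this sketch, since the spec does not define the completion-rate denominator.
+
```python
from collections import Counter


def rollup(records):
    """Compute per-window rollups from a list of parsed JSONL records."""
    outcomes = Counter(r["outcome"] for r in records)
    completed = outcomes["implemented-committed"] + outcomes["implemented-diff"]
    committed = [r for r in records if r["outcome"] == "implemented-committed"]
    findings = sum(r.get("review_findings", 0) for r in committed)
    return {
        "tasks": len(records),
        "completed": completed,
        # Assumption: rate is completed tasks over all recorded tasks.
        "completion_rate": completed / len(records) if records else 0.0,
        "defer_reasons": dict(Counter(
            r["defer_reason"] for r in records if r.get("defer_reason"))),
        "wall_clock_total_s": sum(r.get("wall_clock_s", 0) for r in records),
        "commits_landed": len(committed),
        "findings_per_commit": findings / len(committed) if committed else 0.0,
    }
```
+
+Running this per time window and comparing successive results would give the completion-rate and findings-per-commit trends the KB node reports.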
+
+* Readiness dimensions
+
+- *Data model & ownership:* The task set is read from =todo.org= (project-owned, user-authored). The metrics JSONL is generated, append-only, git-tracked, project-owned. KB nodes are agent-generated under =~/org/roam/agents/= (never overwriting Craig's hand-authored nodes — link only). No editable region is co-owned.
+- *Errors, empty states & failure:* Empty task set → report "nothing eligible" and stop. Malformed scheme header → fall back to the default tag reading and surface the fallback. A task that fails mid-implementation → leave the tree working (don't commit a broken state), record the failure outcome, surface it, continue to the next task. No silent data loss: the data-loss guardrail refuses irreversible tasks outright.
+- *Security & privacy:* Tasks touching credentials or external mutations are excluded by the data-loss / external-state checklist item. The KB write is personal-projects-only; work metrics never leave the project. No secrets in the JSONL (task slugs and SHAs only).
+- *Observability:* The end-of-set page surfaces the run outcome. The per-task surface (implemented / deferred + reason / dropped / skipped) is the live progress view. The metrics log + KB synthesis is the long-run observability. A bad run is isolable from the JSONL (which task, which outcome, which review findings).
+- *Performance & scale:* Expected counts are small — a handful of tasks per run, one run per 30-min tick. No bottleneck at this scale. The cap bounds the worst case on task count; with size uncapped, a single large task is the cost outlier the vNext token budget addresses. Synthesis over months of JSONL is still a small file (one record per task).
+- *Reuse & lost opportunities:* Reuses =todo-format.md= for task close + the tag definitions, =/review-code= and =/voice personal= for the quality bar, =notify= for paging, the knowledge-base rule for KB writes, the per-project waiver for commit-autonomy, and task-review / task-audit for tag enforcement. No new config file (the opt-in rides the existing waiver). The execution loop is the one new shared asset.
+- *Architecture fit & weak points:* Integration points — inbox-zero loop caller (chain after Phase D), the per-project waiver location, =todo.org= scheme header, task-review / task-audit, =~/org/roam/agents/=. Weak point: the commit-autonomy gate depends on deterministically reading the waiver; mitigated by defaulting to file-only when the read is ambiguous (fail safe, not open). Second weak point: a 30-min loop committing unattended with uncapped task size; mitigated by the hard count cap and file-only default, with the token budget as the vNext backstop.
+- *Config surface:* Per-project — commit-autonomy opt-in (via existing waiver), optional loop-commit flag, optional autonomous-safe tag override in the scheme header. Per-call — task set, session mode, run cap. Defaults: file-only, paging-off (loop) / paging-on (speedrun), cap 1 (loop).
+- *Documentation plan:* The workflow file itself is the user/operator doc (matches inbox-zero.org's self-documenting style). The =.emacs.d= stopgap note and the speedrun proposal are superseded by this spec; no separate migration doc needed beyond removing the Phase E text.
+- *Dev tooling:* N/A for new build targets — the workflows are prose, exercised by invocation. The metrics JSONL is =jq=-inspectable by hand; a tiny rollup helper may be added under =.ai/scripts/= if the synthesis prose proves to need it (decided at Phase 6, not a v1 prerequisite).
+- *Rollout, compatibility & rollback:* Rollout is removing Phase E from inbox-zero and adding work-the-backlog — both prose changes, instantly reversible. Compatibility: inbox-zero's three callers are unchanged except the loop caller gaining a forward chain. Rollback: delete work-the-backlog and the loop chain step; inbox-zero is already back to A-D. The file-only default means the worst pre-rollback state is surfaced diffs, not committed changes.
+- *External APIs & deps:* =notify alarm "Page" "<msg>" --persist= verified against =/home/cjennings/.local/bin/notify= and the page-me workflow. =~/org/roam/= KB write path and node shape verified against the knowledge-base rule. No external API calls.
+
+* Risks, Rabbit Holes, and Drawbacks
+
+- *The corrections signal is a proxy, not ground truth.* "A later commit touched the same files" over-counts (legitimate follow-up work) and under-counts (a correction in a different file). It's a flag for human review, not a verdict. Don't rabbit-hole on making it precise in v1 — the proxy plus a human glance is the design.
+- *Waiver detection drift.* If the per-project waiver location moves or its format changes, the commit-autonomy gate could mis-read. Mitigation: fail safe to file-only. Pin the waiver format in the Phase 3 decision before building.
+- *Unattended-commit blast radius.* The headline risk. Mitigated four ways: file-only default, the hard cap, the data-loss checklist item, and the metrics loop (which makes a bad run visible after the fact even if the first three let something through). With task size uncapped, the cost dimension of this risk grows — the vNext token budget is the planned fifth layer.
+- *Scope creep into /start-work territory.* Size is intentionally no longer the brake. The brake is the defer checklist's design-deliberation item plus the "when unsure, defer" rule — keep item 4 strict so genuine deliberation-class tasks still route out even when they're tagged =:solo:= by mistake.
+- *Pre-flight classifier error.* The speedrun's gather step has to split quick-answerable-question from real-deliberation. Misclassifying a deliberation task as a quick question puts a half-baked decision into an autonomous run. Mitigation: when the question isn't answerable in one or two lines, treat it as deliberation and drop it from the run, not as a pre-flight question.
+
+* Testing / Verification / Rollout
+
+Verification is by invocation against a project's real =todo.org=:
+
+- Run the loop caller in file-only mode and confirm it surfaces diffs without committing.
+- Run the speedrun against a small explicit list in a waiver-carrying project and confirm the pre-flight Q&A fires, "skip this" drops a task, and an answer is recorded and used; then confirm one commit per logical change plus the end page.
+- Plant a =VERIFY=-status task, a data-loss task, an already-satisfied task, and a large-but-=:solo:= task, and confirm the first three are skipped/refused while the large one runs and decomposes.
+- Confirm the JSONL grows one record per task.
+- Run synthesis and confirm a KB node lands (personal project) or is refused (work project).
+
+Rollout is the Phase 0-6 sequence, each phase leaving the tree working; the file-only default makes early phases safe to ship before the commit and paging phases land.
+
+* References / Appendix
+
+- [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal (inbox-zero stopgap)]] and [[file:../../working/inbox-zero-phase-e/sender-note.org][its sender note with the 5 open questions]].
+- [[file:../design/2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] (file retains its original on-disk name pending a rename pass).
+- [[file:../../.ai/workflows/inbox-zero.org][inbox-zero.org (canonical, A-D)]] — the routing workflow this feature decouples from.
+- =~/code/rulesets/claude-rules/knowledge-base.md= — the org-roam write contract the synthesis step follows.
+
+* Review and iteration history
+** 2026-06-16 Tue — author
+- What: initial draft reconciling the Phase E and fix-speedrun proposals into one work-the-backlog.org feature, plus the effectiveness-measurement instrumentation.
+- Why: two overlapping proposals arrived within a day; building them separately would duplicate the execution loop and let it drift. Craig also asked explicitly for measurement + org-roam synthesis.
+- Artifacts: this spec; the two source proposals under docs/design/ and working/inbox-zero-phase-e/.
+** 2026-06-28 Sun — revision (Craig)
+- What: removed the task-size gate (size no longer defers; large tasks decompose into per-commit chunks); recast the act-vs-file rule as a crisp four-item defer checklist keyed on test-writability; added crisp =:solo:= / =:quick:= definitions destined for =todo-format.md= and made their assessment mandatory in task-review + task-audit; added the speedrun's pre-flight decision-gathering step (batch the quick questions up front, "skip this" drops a task, then run hands-off); renamed "fix speedrun" → "no-approvals speedrun" in prose. Status stays draft pending ratification of the revised decisions.
+- Why: the original criteria were adjectives, not checkable; the size gate forced Craig to stay at his desk for anything non-trivial, defeating the away-from-desk use case; and decision-needing tasks were over-deferred when many need only a quick upfront answer.
+** 2026-06-29 Mon — ratified
+- What: Craig ratified all eight revised decisions; Status → ready. Implementation-ready across Phase 0 (tag definitions + task-review/audit enforcement) through Phase 6 (synthesis).
+- Why: the crisp defer checklist and the pre-flight-Q&A design resolved the "criteria too soft" and "size shouldn't gate" concerns that held the spec in draft.
diff --git a/docs/design/2026-06-16-encourage-kb-contribution-spec.org b/docs/specs/2026-06-16-encourage-kb-contribution-spec.org
index cf8111b..cfbfe79 100644
--- a/docs/design/2026-06-16-encourage-kb-contribution-spec.org
+++ b/docs/specs/2026-06-16-encourage-kb-contribution-spec.org
@@ -1,10 +1,17 @@
#+TITLE: Encourage Org-Roam KB Contribution Across Workflows — Spec
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2026-06-16
-#+TODO: TODO | DONE SUPERSEDED CANCELLED
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* READY Encourage Org-Roam KB Contribution Across Workflows — Spec
+:PROPERTIES:
+:ID: f67f5f45-5aa1-4a5a-8704-d636e4e16f75
+:END:
+- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to READY (evidence-based, human-confirmed)
* Metadata
-| Status | approved (decisions ratified 2026-06-20) |
+| Status | ready |
|----------+------------------------------------------------|
| Owner | Craig Jennings |
|----------+------------------------------------------------|
diff --git a/docs/specs/2026-07-01-docs-lifecycle-spec.org b/docs/specs/2026-07-01-docs-lifecycle-spec.org
new file mode 100644
index 0000000..dcf88c8
--- /dev/null
+++ b/docs/specs/2026-07-01-docs-lifecycle-spec.org
@@ -0,0 +1,360 @@
+#+TITLE: Docs Lifecycle — Spec
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-07-01
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* DOING Docs lifecycle
+:PROPERTIES:
+:ID: 80b0787b-4a60-4c82-8a16-b383d3e3c8f2
+:END:
+- 2026-07-01 Wed @ 23:34:15 -0400 — READY → DOING: spec-response decomposition ran — build parent in todo.org carries the :SPEC_ID: binding, one task per phase plus the flip-to-IMPLEMENTED task and the manual-testing child. First live exercise of the transition-ownership table.
+- 2026-07-01 Wed @ 23:22:50 -0400 — DRAFT → READY: Codex re-review found all fourteen review findings closed and no remaining blocking implementation-readiness gaps.
+- 2026-07-01 Wed @ 22:54:41 -0400 — verify pass on the second responder round: all five fixes held, findings 1-9 unregressed, verdict ready; three minor nits folded in (scoped id-link criterion, untracked-copy cleanup in the recovery recipe, two stale prose spots). Stays DRAFT pending the reviewers' flip.
+- 2026-07-01 Wed @ 22:46:52 -0400 — second responder pass: all five re-review findings fixed (fourteen of fourteen closed); stays DRAFT — the READY flip belongs to the reviewers this round.
+- 2026-07-01 Wed @ 22:41:33 -0400 — READY → DRAFT: Codex re-review found five new blocking implementation-readiness gaps after the response pass.
+- 2026-07-01 Wed @ 22:41:21 -0400 — DRAFT → READY: dual independent review (Codex + fresh-context Claude agent, both initially Not ready), all nine findings fixed, verify pass by the original reviewer returned ready; flip authorized by Craig.
+- 2026-07-01 Wed @ 22:13:00 -0400 — drafted from the five decisions settled 2026-06-28 (todo.org "Spec storage location + lifecycle-status convention").
+
+* Metadata
+| Status | doing |
+|----------+------------------------------------------------------------------|
+| Owner | Craig Jennings |
+|----------+------------------------------------------------------------------|
+| Reviewer | Craig Jennings |
+|----------+------------------------------------------------------------------|
+| Date | 2026-07-01 |
+|----------+------------------------------------------------------------------|
+| Related | [[file:../design/2026-06-15-spec-storage-lifecycle-proposal.org][source proposal]]; todo.org "Spec storage location + |
+| | lifecycle-status convention" |
+|----------+------------------------------------------------------------------|
+
+* Summary
+
+Formal specs and working notes currently share one directory per project, and a spec's lifecycle state (drafted, in progress, shipped, dead) is invisible without opening the file. This spec adopts two coupled conventions — a location split (=docs/specs/= for formal specs, =docs/design/= for notes) and an authoritative in-file status carried by an org TODO keyword on a top-level status heading — plus =org-id= links for rename-safety, a general =docs-lifecycle= rule capturing the shape, and a one-time confirmed retrofit that sorts every project's existing pile.
+
+* Problem / Context
+
+.emacs.d triaged ~28 design docs and had to run a four-agent sweep reading every spec against the code to reconstruct which had shipped (6 implemented, 8 in progress, 12 not started, 1 superseded). Nothing in the filename, location, or file records the state, so the answer to "what's open?" degrades into "open every file and infer." rulesets has the same shape: 41 files in =docs/design/= of which only 3 carry a formal spec spine, plus two =-spec.org= files misfiled at the =docs/= root. The cost compounds with every doc added, and every project inherits the problem through the shared spec-create workflow.
+
+Two forces beyond triage cost:
+
+- *Links are load-bearing.* =todo.org= tasks, session archives, and sibling docs link specs by =file:= path. Any convention that renames or moves files on every status change (the filename-suffix approach) breaks those links repeatedly across a cross-linked, template-synced doc set.
+- *The convention is worthless if legacy docs stay misfiled* (Craig, 2026-06-28). Template sync distributes rules and workflows but cannot perform a one-time per-project migration, so the design must include a reach mechanism that gets each project's existing pile sorted once.
+
+* Goals and Non-Goals
+
+** Goals
+- A directory listing answers "which docs are specs, and what state is each in" without opening files.
+- Status transitions cost one small in-file edit (keyword + history line + Metadata mirror) — no rename, no link surgery.
+- Cross-doc spec links survive moves and renames.
+- The shape is captured once as a general rule (=docs-lifecycle=) so future artifact collections (brainstorm piles, recording queues) can reuse it.
+- Every existing project's =docs/design/= pile gets sorted exactly once, with human confirmation on each classification.
+
+** Non-Goals
+- No automation of status flips — the keyword is edited by whoever changes the state (spec-create, spec-review, spec-response, or a human), not by a watcher.
+- No retroactive rewriting of session archives or git history that reference old paths; only live inbound links (=todo.org=, =notes.org=, docs) are updated by the retrofit.
+- No new tracking database or index file — the files are the index.
+
+** Scope tiers
+- v1: the location split, the status-heading convention, the org-id link standard, the =docs-lifecycle= rule, spec-create/spec-review/spec-response updates, the retrofit helper + startup nudge, and the rulesets pilot.
+- Out of scope: applying the lifecycle shape to non-doc collections (the rule documents the pattern; adopting it elsewhere is per-collection work).
+- vNext: an org-agenda custom view over =docs/specs/*.org= keyed on the status keywords (nice-to-have once the keywords exist; log to todo.org).
+
+* Design
+
+** The location split
+
+- =docs/specs/= — formal specs only. A *spec* is a doc proposing a buildable change that carries a =Decisions= section and =Implementation phases= (the spec-create spine). Filenames keep the existing =YYYY-MM-DD-<topic>-spec.org= shape — the =-spec.org= suffix stays because spec-review's Phase 0 precondition keys on it; only the *status* suffixes from the original proposal are dropped.
+- =docs/design/= — everything else: brainstorms, inventories, proposals, research notes, frozen source material. Review findings live inside the spec they review (current spec-review behavior), so standalone review files are legacy notes and stay in =docs/design/=.
+
+** The status heading (the authoritative record)
+
+Each spec's first element after the file header is a single top-level *status heading* carrying the org TODO keyword:
+
+#+begin_example
+,#+TODO: TODO | DONE
+,#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+,* DOING <spec short name>
+:PROPERTIES:
+:ID: <uuid>
+:END:
+- <dated one-line history entries, newest first>
+#+end_example
+
+- *The keyword is authoritative.* The Metadata table's =Status= field mirrors it in lowercase for readers already in the table, and a status transition updates keyword + history line + mirror in the same edit; on disagreement the heading wins.
+- *Two keyword sequences, no collisions.* The lifecycle sequence *joins* — never replaces — the =TODO | DONE= sequence that the =* Decisions= and =* Review findings= task machinery depends on. The two sequences share no keyword (the old header's =SUPERSEDED CANCELLED= done-states migrate to the lifecycle sequence; a legacy =CANCELLED= decision heading still parses as a done-state there, so =[/]= cookies stay mechanically correct). The retrofit rewrites each legacy header to carry both lines.
+- *Vocabulary:* =DRAFT= (being written) → =READY= (review passed, buildable) → =DOING= (implementation in progress) → =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= (terminal).
+- *Transition ownership — every flip has a named owner:*
+ - =DRAFT= — spec-create stamps it at authoring time.
+ - =DRAFT= → =READY= — spec-review, on a passing gate (keyword + history line + mirror in the review pass).
+ - =READY= → =DOING= — spec-response, when it decomposes the phases into build tasks. *The decomposition writes the spec-to-task binding:* the =todo.org= parent task it creates (or updates) carries a =:SPEC_ID:= property holding the spec's status-heading UUID. That property is the durable join between the spec and its build work.
+ - =DOING= → =IMPLEMENTED= — the session that completes the final implementation phase. To make that a tracked obligation rather than a memory, spec-response's phase-to-task breakdown *always emits a final task*: "flip the spec to IMPLEMENTED + history line," as a child of the bound parent. Safety net: task-audit's reconcile pass runs one query — for each =docs/specs/*.org= whose keyword is =DOING=, find the =todo.org= task with the matching =:SPEC_ID:=; flag the spec when that parent is =DONE=/=CANCELLED=, archived, or missing. Checking the *parent's* keyword (not "are all child tasks closed") sidesteps both the flip-task chicken-and-egg (the parent only closes after the flip task ran) and =--convert-subtasks= rewriting completed children into dated entries (dated children never affect the parent's keyword). This is the mechanism whose absence produced the .emacs.d six-shipped-specs-with-no-record failure; "a human remembers" is explicitly not the design.
+ - =SUPERSEDED= / =CANCELLED= — whoever makes the call, with the reason in the history line.
+- *Glanceability without opening files:* one grep gives the full board —
+
+ #+begin_src sh
+ rg -H '^\* (DRAFT|READY|DOING|IMPLEMENTED|SUPERSEDED|CANCELLED) ' docs/specs/
+ #+end_src
+
+ and because the keyword sits on a real org heading, an org-agenda view over =docs/specs/= works for free (the vNext item).
+- *The heading body is the dated status history* — one line per transition (=YYYY-MM-DD Day @ HH:MM:SS -ZZZZ — <what changed, by whom>=), the record a filename could never carry.
+- Why a dedicated status heading rather than restructuring each spec under one top-level heading: demoting every section in every existing spec is a large, link-hostile rewrite; a prepended heading is additive, retrofittable by script, and leaves the familiar flat section layout untouched.
+
+** Rename-safe links
+
+The status heading carries an =:ID:= UUID, assigned at authoring time (and by the retrofit for legacy specs). The target state is that cross-doc references to a spec use =[[id:<uuid>]]= rather than =file:= paths, so any future move can't orphan them. =file:= links remain fine for intra-doc anchors and for notes that never move. The KB's existing id-resolution recipe applies: =rg ':ID:[[:space:]]+<uuid>' docs/=.
+
+*Staged conversion — ids assigned now, links converted only when clickable.* =org-id-locations= only indexes agenda files and files org has visited, so a fresh =:ID:= in =docs/specs/= won't resolve on click in a live Emacs until the id index learns about project docs. =org-id-extra-files= is not a glob mechanism — it's a literal file list, only consulted under =org-id-track-globally= — so "point it at the globs" is not executable as written. The sequencing is therefore:
+
+1. *Pilot and retrofit rewrite =file:= links only* (path recomputation per the relink contract). Every link stays clickable throughout; no conversion window exists.
+2. *:ID: properties are still assigned* during the sort — harmless, and they make the later conversion mechanical.
+3. *Link conversion to =id:= is a separate follow-up pass*, gated on .emacs.d landing an executable id-index mechanism: enumerate each project's =docs/specs/*.org= into =org-id-extra-files= as real file names (a small function globbing at startup, with =org-id-track-globally= t), or a periodic =org-id-update-id-locations= over that enumeration — verified by clicking a known id link. The Phase 4 note to .emacs.d carries this ask; the =rg= recipe is the fallback for non-Emacs consumers either way.
+
+** The =docs-lifecycle= rule (the generalization)
+
+A new =claude-rules/docs-lifecycle.md= captures the reusable shape, with spec-create as the first instance:
+
+1. Separate formal artifacts from working notes by location.
+2. Lifecycle state lives *in* the artifact, on a scannable, greppable carrier (an org keyword heading), with a dated history.
+3. Links use rename-safe identifiers.
+4. A growing collection earns this treatment when "which of these are live?" starts requiring a file-by-file read.
+
+** The retrofit (reach mechanism for existing piles)
+
+A synced helper, =spec-sort=, run once per project. *Canonical placement:* like every synced asset, the helper and all workflow edits land in rulesets' canonical tree first — =claude-templates/.ai/scripts/spec-sort= with its bats tests in =claude-templates/.ai/scripts/tests/= (the glob-discovered suite), workflow changes in =claude-templates/.ai/workflows/= — then =scripts/sync-check.sh --fix= propagates the committed =.ai/= mirror and both sides commit together. A mirror-only edit is reverted by the next sync; nothing in this feature is exempt from that contract. Downstream projects receive everything through the normal startup rsync. The run itself, per project:
+
+1. *Classify* each =docs/**/*.org= outside =docs/specs/= by one predicate: a doc carrying *both* a =Decisions= heading *and* an =Implementation phases= heading is a spec candidate; everything else is a note. (A =Metadata= table alone does not qualify — real counter-case: =docs/design/task-review.org= has a Metadata table and no spine, and is a note.) The heuristic *proposes*; a human confirms every move (classification is a judgment call — Craig, 2026-06-28).
+2. *Move* confirmed specs to =docs/specs/=, *renaming to carry the =-spec.org= suffix* when the file lacks it (spec-review's Phase 0 precondition requires it — a retrofitted spec must be reviewable in its new home). Prepend the status heading, assign an =:ID:=, and rewrite the keyword header to the two-sequence form above. *The proposed keyword is evidence-based, not laundered:* the doc's own Status field is one signal among several, because stale Status fields are exactly what caused the original .emacs.d sweep. For each candidate the helper shows an evidence panel — the current Status value, the decision/finding cookie states, the state and heading of any =todo.org= task that links or binds to the doc, the most recent history/review entries, and (where cheap) whether artifacts the phases name actually exist — and proposes the keyword the evidence supports. When the evidence is inconclusive, the default is the most conservative *non-terminal* state it supports (never a terminal one). =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= are never applied without an explicit human-stated reason, recorded in the status-history line.
+3. *Relink* under an explicit contract:
+ - *Rewritten roots (project-owned):* =todo.org=, =.ai/notes.org=, =docs/**=, =.ai/project-workflows/=, =.ai/project-scripts/=. The rewrite recomputes each link's relative path from the linking file's directory to the new location. *All rewrites stay =file:= links* — conversion to =[[id:...]]= is the separate follow-up pass gated on the Emacs id-index mechanism (see Rename-safe links), never part of a sort run.
+ - *Reported, never rewritten:* =.ai/sessions/= archives (frozen history), git history, and synced template paths (=.ai/workflows/=, =.ai/scripts/=, =.ai/protocols.org=) — a downstream edit there is reverted by the next template sync, so the report names the canonical rulesets file that needs the edit instead.
+ - *Supported link shapes:* org =[[file:...]]= links, relative or project-root-anchored, with or without a description. Bare-path mentions in prose or scripts are *reported for manual handling*, never rewritten.
+ - *Safety:* dry-run report is the default; =--apply= writes, under a fail-safe contract sized to the fact that one run mutates filenames, links, headers, and =.ai/notes.org= together:
+ - *Clean-worktree preflight.* =--apply= refuses on a dirty git tree (=git status --porcelain= non-empty) unless =--allow-dirty= is passed, which prints exactly what recovery loses. A clean tree is what makes recovery trivially safe.
+ - *Validate, then write.* The full move + relink plan — every source, destination, and link edit — is computed and validated first (every link parses, every target is unambiguous, every destination path is free), written to a plan file for inspection, and only then executed from that recorded plan. Ambiguous cases (two candidates sharing a basename, an unparseable link) block validation: listed, untouched, non-zero exit until each is resolved or explicitly waived.
+ - *Failure mid-apply is not a shrug.* Any write failure or a failed post-apply residue grep stops the run, names what was and wasn't applied (from the plan), and prints the recovery recipe — =git restore= over the plan's touched paths *plus* deletion of the plan's newly-created destination paths (=git restore= reverts tracked edits but doesn't remove untracked copies the move created). Safe by construction because preflight required a clean tree; the project is never silently left half-migrated.
+ - After a successful apply, the residue grep for each old path across the rewritten roots must return zero or =spec-sort= exits non-zero naming the residue.
+4. *Stamp* =:LAST_SPEC_SORT: YYYY-MM-DD= in =.ai/notes.org='s =* Workflow State= section — the same surface as =:LAST_AUDIT:= and =:LAST_INBOX_PROCESS:=, created idempotently (append the section if the file lacks it) exactly as task-audit already does.
+
+*The startup nudge — concrete contract.* Phase A's parallel batch gains one read-only probe:
+
+#+begin_src bash
+{ [ -d docs/design ] || [ -n "$(find docs -maxdepth 1 -name '*-spec.org' -print -quit 2>/dev/null)" ]; } \
+ && ! grep -qs ':LAST_SPEC_SORT:' .ai/notes.org \
+ && echo "spec-sort: unsorted docs present" || true
+#+end_src
+
+(Phase 4 refined the stray-root check from =compgen= to =find=: =compgen= is bash-only and zsh aborts on an unmatched glob, so under zsh the original snippet produced false negatives for stray root specs.)
+
+(The probe also fires on stray =docs/*-spec.org= root files, so a project whose only misfiled specs sit at the =docs/= root still gets nudged.)
+
+Phase C surfaces one line when the probe printed ("this project's docs pile has never been spec-sorted — say 'run spec-sort' to sort it") and stays silent otherwise. Projects with nothing to sort — no =docs/design/= and no stray root specs — never see it; a stamped marker permanently clears it.
+
+* Alternatives Considered
+
+** Filename status suffix (=-spec-doing.org=, =-spec-implemented.org=)
+- Good, because the state is visible in a bare =ls= with no tooling.
+- Bad, because every transition renames a file in a cross-linked, template-synced doc set — each rename is link surgery or a broken link, and the churn lands in git history and inbound =todo.org= links.
+- Neutral, because the ls-visibility it buys is matched by the one-line =rg= over status headings.
+- Rejected 2026-06-28 (Craig chose org-keyword over his earlier filename-suffix lean).
+
+** Status field in the Metadata table only (no keyword)
+- Good, because the field already exists and needs no new structure.
+- Bad, because a table cell is neither org-agenda-scannable nor reliably greppable across format drift, and it carries no dated history.
+- Neutral, because the field stays anyway — as the in-table mirror.
+
+** Relink-helper instead of org-id (keep =file:= links, fix them on every move)
+- Good, because readers see plain paths.
+- Bad, because it makes every future move a tooling event, and one missed run silently breaks links — the failure mode is invisible until someone clicks.
+- Neutral, because the retrofit needs relink logic once regardless; org-id just makes it a one-time need.
+
+* Decisions [5/5]
+
+All five were settled with Craig on 2026-06-28 (recorded in todo.org; migrated here per that note).
+
+** DONE Location split — adopt
+- Context: specs and notes share one directory; telling them apart requires opening files.
+- Decision: =docs/specs/= for formal specs (Decisions + phases spine); =docs/design/= for notes. Documented in spec-create and the docs-lifecycle rule.
+- Consequences: easier — a listing answers "what's formal"; harder — one-time migration and link updates (the retrofit).
+
+** DONE Status mechanism — org keyword authoritative, no filename suffix
+- Context: filename suffix vs org keyword; suffix wins =ls= visibility, keyword wins link stability and zero-rename transitions.
+- Decision: the org TODO keyword on the spec's top status heading is authoritative, mirrored by the Metadata =Status= field. No status suffixes in filenames.
+- Consequences: easier — a transition is one keyword edit and links never break; harder — glanceability needs the one-line =rg= (or the vNext agenda view) instead of bare =ls=.
+- (Refined in review, 2026-07-01: "one keyword edit" became "three lines in one file" — keyword + history line + Metadata mirror. The ratified decision stands; see Review findings.)
+
+** DONE Link safety — org-id for cross-doc spec links
+- Context: both the migration move and any future rename break =file:= links.
+- Decision: specs carry =:ID:= UUIDs on the status heading; cross-doc references use =[[id:...]]=.
+- Consequences: easier — moves are free; harder — following a link outside org needs the =rg ':ID:'= lookup.
+- (Refined in review, 2026-07-01: the decision stands; the *sequencing* is staged — IDs are assigned at sort time, but link conversion to =id:= waits for the executable Emacs id-index mechanism, so no window exists where converted links don't click. See Review findings.)
+
+** DONE Generalize as a =docs-lifecycle= rule
+- Context: the shape (in-artifact lifecycle state, formal-vs-notes split, rename-safe links) recurs for any processed-document collection.
+- Decision: capture it in =claude-rules/docs-lifecycle.md= with spec-create as the first instance.
+- Consequences: easier — the next collection reuses a decided pattern; harder — the rule must stay honest as the spec instance evolves.
+
+** DONE Retrofit existing files across ALL projects
+- Context: template sync distributes conventions but cannot perform a per-project one-time migration; legacy piles would stay misfiled forever.
+- Decision: ship a confirmed classify-move-relink helper (=spec-sort=) plus a startup nudge gated on =:LAST_SPEC_SORT:=; the helper proposes, a human confirms. Pilot on rulesets first.
+- Consequences: easier — every project converges without manual archaeology; harder — the helper needs real relink logic and tests, and classification stays a judgment call.
+
+* Review findings [14/14]
+:PROPERTIES:
+:ID: cc77a7f6-e4c3-488a-ac3b-e739420a5c2b
+:END:
+
+Two independent reviews (Codex, 2026-07-01 22:22; a fresh-context Claude agent, 2026-07-01 22:25) converged on =Not ready= with the same worst finding. All nine first-round findings were accepted and fixed in the responder pass below; each carries its response.
+
+** DONE Org TODO vocabulary drops decision and finding task states :blocking:
+(Codex; the Claude reviewer found the same, adding that keywords must be unique across sequences so a naive two-line fix collides on =SUPERSEDED=/=CANCELLED=.) The spec's example header replaced the file-level keyword vocabulary, so =TODO=/=DONE= stopped being task states and the =[/]= cookies that gate readiness went vacuous — this file itself was the first casualty.
+Response: the scheme is now two collision-free sequences — =TODO | DONE= for decisions/findings, =DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED= for lifecycle (the old header's =SUPERSEDED CANCELLED= done-states migrate to the lifecycle sequence, and a legacy =CANCELLED= decision still parses as a done-state, so cookies stay correct). This file's own header now carries both lines; the Design section documents the two-sequence rule and the retrofit rewrites legacy headers to it. New acceptance criterion: cookies must compute by org, not hand counting.
+
+** DONE Relink behavior is too vague for a safe migration :blocking:
+(Codex; the Claude reviewer independently flagged the synced-.ai/ slice — a downstream rewrite there is reverted by the next template sync, e.g. =startup.org:154='s reference to a spec candidate.) The retrofit named no scan scope, link-shape list, rewrite rule, residue policy, or dry-run format — the implementer would have had to invent the migration's data-safety contract.
+Response: the retrofit section now carries the explicit contract: rewritten roots (=todo.org=, =.ai/notes.org=, =docs/**=, project-owned =.ai/= dirs), reported-never-rewritten surfaces (=.ai/sessions/=, git history, synced template paths — with the canonical rulesets file named in the report), supported link shapes (org =file:= links; bare paths report-only), relative-path recomputation, dry-run default with =--apply=, post-apply residue grep gating exit status, and refuse-loudly on ambiguity.
+
+** DONE Sort marker and startup nudge do not name the actual state surface :blocking:
+(Codex; the Claude reviewer rated the same gap minor — Codex's version was sharper: startup reads =.ai/notes.org=, not a root =notes.org=, and Workflow State may not exist.)
+Response: the marker is pinned to =.ai/notes.org='s =* Workflow State= (the =:LAST_AUDIT:= / =:LAST_INBOX_PROCESS:= surface), created idempotently as task-audit already does; the Design section now spells the Phase A probe command, its exact fire condition, and the Phase C one-liner.
+
+** DONE Phase order can strand legacy specs behind the new review precondition :blocking:
+(Codex; the Claude reviewer found the same at medium severity.) Hardening spec-review's path precondition in Phase 1 while piles stay unsorted until Phases 3-4 would make every legacy spec unreviewable in the gap.
+Response: Phase 1 now carries the compatibility rule — legacy =-spec.org= locations stay reviewable (with a "run spec-sort" nudge) until the project stamps =:LAST_SPEC_SORT:=; the precondition hardens only after. Acceptance criterion 5 updated to match.
+
+** DONE No owner for the DOING → IMPLEMENTED flip :blocking:
+(Claude reviewer.) spec-create owns =DRAFT= and spec-review owns =DRAFT= → =READY=, but implementation finishes outside the spec trio, and "a human edits it" is the exact mechanism whose failure produced this spec (.emacs.d's six shipped-but-unmarked specs).
+Response: the Design section now has a transition-ownership table naming an owner for every flip. =READY= → =DOING= belongs to spec-response; =DOING= → =IMPLEMENTED= is a tracked obligation — spec-response's phase-to-task breakdown always emits a final "flip the spec" task — with task-audit's reconcile pass as the safety net (flag any =DOING= spec whose implementation tasks are all closed). Phase 1 includes both workflow edits.
+
+** DONE Classification heuristic is precedence-ambiguous
+(Claude reviewer.) "Decisions plus phases or Metadata table" reads two ways, and =docs/design/task-review.org= (Metadata table, no spine) classifies differently under each.
+Response: one predicate now — spec candidate iff the doc carries *both* a =Decisions= heading *and* an =Implementation phases= heading; a Metadata table alone does not qualify. The task-review.org counter-case is cited in the retrofit step.
+
+** DONE spec-sort never renames moved files to the -spec.org suffix
+(Claude reviewer.) spec-review's Phase 0 hard-requires the suffix, so a retrofitted legacy spec without it would be unreviewable in its new home.
+Response: retrofit step 2 now renames moved files to carry =-spec.org= when they lack it; the relink pass covers the rename like any move. Acceptance criterion 3 checks the suffix on the re-homed root specs.
+
+** DONE Clicked id: links won't resolve in Craig's Emacs
+(Claude reviewer.) =org-id-locations= indexes only agenda and visited files, so fresh =:ID:=s in =docs/specs/= would not resolve, and the breakage stays invisible until someone clicks a link; the convention would trade visible link breakage for invisible breakage.
+Response: named as an explicit .emacs.d-side prerequisite in the Rename-safe-links section (=org-id-extra-files= over =docs/specs/= globs, or periodic =org-id-update-id-locations=), carried in the Phase 4 note to .emacs.d, with the =rg= recipe as the interim fallback.
+
+** DONE Acceptance criterion 2 contradicts the Metadata Status mirror
+(Claude reviewer.) "Exactly one keyword edit" was irreconcilable with the mandated mirror update.
+Response: a transition is now defined everywhere as three lines in one file — keyword, history line, mirror — still no rename and no link edits. Goals, Design, and criterion 2 all say the same thing.
+
+** DONE Synced helper placement ignores the canonical/mirror split :blocking:
+The spec says to build =.ai/scripts/spec-sort= and update =.ai/workflows/= behavior, but rulesets' current contract is that =claude-templates/.ai/= is canonical and the repo-root =.ai/= tree is only the committed mirror kept honest by =scripts/sync-check.sh=. =CLAUDE.md= explicitly warns that mirror-only edits get silently reverted by the next sync, and =make test= runs the mirror-side tests only after the canonical copy has been synced. V1 should say every shared workflow/script edit lands in =claude-templates/.ai/{workflows,scripts}/= first, then =scripts/sync-check.sh --fix= updates the mirror; =spec-sort= tests should be placed in the synced script-test tree and the acceptance criteria should include =sync-check= / workflow-integrity where relevant. (blocking)
+Response: the retrofit section now opens with the canonical-placement contract (helper + tests in =claude-templates/.ai/scripts{,/tests}/=, workflow edits canonical-side, =sync-check --fix= propagates, both sides commit together); Phases 1 and 2 name it per artifact; new acceptance criterion requires =sync-check= to exit clean after the build commits.
+
+** DONE Task-audit safety net has no spec-to-task binding :blocking:
+The spec says task-audit flags a =DOING= spec whose implementation tasks are all closed, but current =task-audit.org= audits open =todo.org= tasks and has no model for scanning =docs/specs/=, finding a spec's implementation tasks, or deciding "all closed" after =todo-cleanup.el --convert-subtasks= rewrites completed child tasks into dated entries. The added final "flip to IMPLEMENTED" task also means there may always be one open task, so a naive "all tasks closed" check never fires. V1 should define the binding spec-response writes into =todo.org= (for example a parent task property or stable link to the spec ID), the exact audit query, how converted dated entries count, and whether the final flip task is excluded from or satisfies the reconciliation rule. (blocking)
+Response: spec-response's decomposition now stamps a =:SPEC_ID:= property (the spec's status-heading UUID) on the build parent task — the durable binding. The audit query is defined: for each =DOING= spec, find the task with matching =:SPEC_ID:=; flag when that parent is closed, archived, or missing. Checking the parent's keyword (not "all children closed") dissolves both the flip-task chicken-and-egg and the dated-entry conversion concern. New acceptance criterion exercises the flag.
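The audit query in the response can be sketched as follows. This is an illustration under stated assumptions, not task-audit's actual implementation: the =Task= record and =flag_doing_specs= are hypothetical names, and parsing =todo.org= into such records is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Task:
    """Hypothetical view of a build parent task carrying a :SPEC_ID: property."""
    spec_id: str
    closed: bool = False
    archived: bool = False


def flag_doing_specs(doing_spec_ids: Iterable[str], tasks: Iterable[Task]) -> List[str]:
    """Return the IDs of DOING specs whose bound parent is closed, archived, or missing."""
    by_id = {task.spec_id: task for task in tasks}
    flagged = []
    for spec_id in doing_spec_ids:
        parent = by_id.get(spec_id)
        # Only the parent's own state is checked; child tasks (including the
        # final flip task) do not take part in the rule.
        if parent is None or parent.closed or parent.archived:
            flagged.append(spec_id)
    return flagged
```

Because only the parent is inspected, an open final "flip the spec" child does not prevent the flag from firing.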
+
+** DONE spec-sort apply path can leave a half-migrated tree :blocking:
+The retrofit contract has dry-run by default and a post-apply residue grep, but it does not say what happens when =--apply= has moved files and then a relink, parse, or residue check fails. Because the operation mutates filenames, links, headers, IDs, and =.ai/notes.org= together, a partial failure can strand the project in the exact mixed state the tool is meant to prevent. V1 should require a clean-worktree preflight (or an explicit dirty-tree refusal/override), validate the full move/relink plan before the first write, write from a single recorded plan, and define recovery behavior for every failed apply: no files moved, automatic rollback, or a printed =git restore= / =git revert= recovery recipe that is safe for uncommitted local edits. (blocking)
+Response: the relink contract's safety block now specifies the fail-safe apply: clean-worktree preflight (refuse on dirty, explicit =--allow-dirty= override that prints what recovery loses), full plan computed + validated + written to a plan file before the first write, execution from the recorded plan, and mid-apply failure stopping with a named applied/not-applied breakdown plus the =git restore= recovery recipe — safe by construction because preflight required a clean tree. Bats covers the preflight and the forced-failure recovery output (Phase 2, plus a new acceptance criterion).
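One way to picture "validate the full plan before the first write" is a pure check over the recorded moves. The sketch below is an assumption about what such validation might cover (missing sources, occupied destinations, colliding destinations); =validate_plan= is a hypothetical name and the real helper may check more, such as relink targets.

```python
from typing import Iterable, List, Set, Tuple


def validate_plan(moves: Iterable[Tuple[str, str]], existing: Set[str]) -> List[str]:
    """Check a list of (source, destination) moves against the current tree.

    `existing` is the set of paths present before any write. An empty
    result means the plan may be executed; any entry means refuse to start.
    """
    errors = []
    seen_destinations = set()
    for source, destination in moves:
        if source not in existing:
            errors.append(f"missing source: {source}")
        if destination in existing:
            errors.append(f"destination exists: {destination}")
        if destination in seen_destinations:
            errors.append(f"duplicate destination: {destination}")
        seen_destinations.add(destination)
    return errors
```

Since nothing is written until the error list is empty, a failed validation leaves the tree untouched by construction.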
+
+** DONE org-id Emacs prerequisite is not executable as written :blocking:
+The spec says the .emacs.d-side fix can be =org-id-extra-files= over =docs/specs/= globs, but Emacs' own docstring says =org-id-extra-files= is a list of additional files and is only relevant when =org-id-track-globally= is set; it does not establish that project glob strings will be expanded or that every project root will be discovered. The rollout also converts links during the rulesets pilot before the Phase 4 note asks .emacs.d to make clicked =id:= links resolvable. V1 should either keep =file:= links until the Emacs support has landed, or specify the executable Emacs-side implementation precisely: how project =docs/specs/*.org= files are enumerated into =org-id-extra-files= or fed to =org-id-update-id-locations=, when it runs, how it is tested, and how rollout avoids a window where converted links do not click through. (blocking)
+Response: took the fork that removes the window entirely — the pilot and every sort run rewrite =file:= links only; =:ID:= properties are still assigned (harmless, enables later mechanics); conversion to =id:= is a separate follow-up pass gated on .emacs.d landing an executable id-index mechanism, now specified concretely (enumerate =docs/specs/*.org= into =org-id-extra-files= as real file names under =org-id-track-globally=, or feed the enumeration to =org-id-update-id-locations=; verified by clicking a known link). Decision 3 carries a sequencing-refinement note; a new acceptance criterion asserts zero =id:= links exist after the pilot.
+
+** DONE Status confirmation can still encode stale reality :blocking:
+The retrofit proposes lifecycle status from a doc's current =Status= field or review history, then asks a human to confirm. Those are the same stale/incomplete signals that caused the original .emacs.d sweep: shipped specs and dead specs were only knowable by reading code/tasks against the spec. If =spec-sort= only confirms a guessed keyword, the pilot can produce a clean-looking board whose state is still wrong. V1 should define status-confirmation evidence: for each spec candidate, what sources the helper shows (current Status, decision/finding cookies, linked =todo.org= parent state, recent history, matching implementation files/tests), what default is allowed when evidence is inconclusive, and that =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= require an explicit reason in the status history line. (blocking)
+Response: retrofit step 2 now defines the evidence panel the helper shows per candidate (Status value, cookie states, bound/linking =todo.org= task state, recent history entries, cheap existence checks on phase-named artifacts) with the keyword proposed from the evidence, not the Status field alone. Inconclusive evidence defaults to the most conservative non-terminal state; =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= always require an explicit human-stated reason recorded in the history line.
+
+* Implementation phases
+
+** Phase 1 — Rule + template updates
+Write =claude-rules/docs-lifecycle.md=. Update spec-create (emit into =docs/specs/=, the two-sequence keyword header, status heading with =:ID:= in the template, transition mechanics), spec-review (path expectation with the compatibility rule below; flipping =DRAFT= → =READY= on a passing review updates keyword + history + mirror), spec-response (owns =READY= → =DOING=; its decomposition stamps the =:SPEC_ID:= binding on the build parent and always emits the final "flip to IMPLEMENTED" task), and task-audit (one reconcile bullet running the =:SPEC_ID:= query: a =DOING= spec whose bound parent is closed, archived, or missing gets flagged). All four are synced assets: edits land in =claude-templates/.ai/= (and =claude-rules/=), the mirror follows via =sync-check --fix=, both commit together. *Compatibility rule:* spec-review keeps accepting legacy =-spec.org= locations (=docs/= root, =docs/design/=) until the project's =:LAST_SPEC_SORT:= is stamped, nudging "run spec-sort" when it meets one; only after the stamp does the =docs/specs/= precondition harden. No legacy spec is ever unreviewable during the transition. Tree stays working: new specs land in the new shape; old specs remain reviewable until their project sorts.
+
+** Phase 2 — The =spec-sort= helper
+Build =claude-templates/.ai/scripts/spec-sort= (classify → evidence-based confirm → plan + validate → move + rename + prepend status heading + assign =:ID:= → relink =file:= references → stamp =:LAST_SPEC_SORT:=), with bats coverage in =claude-templates/.ai/scripts/tests/= (glob-discovered by =make test=) for classification, the evidence/confirm gate, plan validation, moving + renaming, relinking, the clean-worktree preflight, mid-apply failure recovery output, idempotence, and the marker stamp. Mirror synced via =sync-check --fix= in the same commit. Tree stays working: the script is callable but nothing invokes it yet.
+
+** Phase 3 — Pilot on rulesets
+Run =spec-sort= against rulesets' own =docs/= (41 design files, 3 spec-spine candidates, 2 stray root specs). Fix what the pilot surfaces before any other project runs it. Tree stays working: moves are confirmed one by one, links updated in the same pass.
+
+** Phase 4 — Startup nudge + broadcast
+Add the Phase A probe + Phase C nudge line (the concrete contract in the Design retrofit section). Send .emacs.d a note that the convention is live, its ~28-doc pile is ready to sort, and the id-index mechanism is its side of the staged link conversion: enumerate each project's =docs/specs/*.org= into =org-id-extra-files= as real file names (with =org-id-track-globally= t) or feed that enumeration to a periodic =org-id-update-id-locations=, verified by clicking a known id link. The =id:= link-conversion pass across projects runs only after that lands — it is follow-up work, not part of v1's sort runs. Tree stays working: the nudge is one read-only line per session until acted on; every link is a working =file:= link until conversion day.
+
+* Acceptance criteria
+- [ ] =rg '^\* (DRAFT|READY|DOING|IMPLEMENTED|SUPERSEDED|CANCELLED) ' docs/specs/= lists every rulesets spec with its state, and the answer matches reality.
+- [ ] A status transition on a spec changes exactly three lines in one file — the keyword, a history line, and the Metadata mirror — with no rename and no link edits.
+- [ ] Every doc remaining in rulesets =docs/design/= is a note (lacks the Decisions + Implementation-phases spine); both stray =docs/= root specs are re-homed and carry the =-spec.org= suffix.
+- [ ] All inbound links in the rewritten roots resolve after the pilot, and the post-apply residue grep returns zero.
+- [ ] The spec's own decision/finding =[/]= cookies compute correctly under the two-sequence keyword header (org, not hand counting).
+- [ ] spec-create emits new specs into =docs/specs/= in the new shape; spec-review accepts legacy locations until =:LAST_SPEC_SORT:= is stamped and refuses them after.
+- [ ] Every helper/workflow artifact of this feature lives canonical-side (=claude-templates/.ai/=, =claude-rules/=) with the mirror in sync — =scripts/sync-check.sh= exits clean after the build commits.
+- [ ] A =DOING= spec whose =:SPEC_ID:=-bound parent task is closed or missing is flagged by task-audit's reconcile pass (exercised in the pilot or a fixture).
+- [ ] =spec-sort --apply= on a dirty worktree refuses (absent the override); a forced mid-apply failure in the bats suite yields the named-recovery output, not a half-migrated tree.
+- [ ] After the pilot, no link the sort *rewrote* uses =[[id:...]]= form and no rewritten root gained a new =id:= link targeting a spec (conversion is the gated follow-up); every rewritten link is a resolving =file:= link. The check scopes to actual rewritten and spec-target links — literal prose mentions of the id syntax (which already exist in =todo.org= and older specs) don't count, so a naive whole-file grep is the wrong implementation.
+- [ ] A project with an unsorted =docs/design/= gets the startup nudge; one confirmed =spec-sort= run clears it via =:LAST_SPEC_SORT:=.
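The "exactly three lines" criterion above could be checked mechanically by diffing a spec before and after a transition. This is a sketch only: =changed_line_count= is a hypothetical helper built on the standard library's =difflib=, and it counts replaced or inserted lines in the new version.

```python
import difflib


def changed_line_count(before: str, after: str) -> int:
    """Count lines in `after` that were replaced or inserted relative to `before`."""
    old_lines, new_lines = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return sum(
        j2 - j1
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("replace", "insert")
    )
```

A transition that flips the keyword, appends one history line, and updates the Metadata mirror should yield 3 under this count; any rename or link edit would show up elsewhere (a changed path or extra changed lines).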
+
+* Readiness dimensions
+- Data model & ownership: the spec file owns its state; the Metadata mirror is display-only. No external index to drift.
+- Errors, empty states & failure: =spec-sort= on a project with no =docs/= is a silent no-op; an ambiguous classification is surfaced, never auto-moved; a relink pass that finds zero inbound links is normal.
+- Security & privacy: N/A because the docs are already in-repo; no new exposure surface.
+- Observability: the status grep is the dashboard; =spec-sort= prints every proposed move and every rewritten link.
+- Performance & scale: N/A because collections are tens of files; everything is one-shot or grep-speed.
+- Reuse & lost opportunities: reuses org TODO keywords, org-id, the existing scheme-header pattern of declared vocabularies, and spec-review's in-file findings convention.
+- Architecture fit & weak points: weak point is the classification heuristic — mitigated by the confirm gate. The status heading is additive, so old readers of spec files see one extra heading and nothing breaks.
+- Config surface: none new — one marker line (=:LAST_SPEC_SORT:=) in the existing Workflow State section.
+- Documentation plan: the docs-lifecycle rule is the documentation; spec-create's template is the worked example.
+- Dev tooling: =spec-sort= ships with bats tests under the existing glob-discovered suite.
+- Rollout, compatibility & rollback: additive per project, one project at a time, rulesets first. Rollback of a sort is =git revert= of the pilot commit (moves + relinks are one commit).
+- External APIs & deps: N/A — plain files, =rg=, =uuidgen=.
+
+* Risks, Rabbit Holes, and Drawbacks
+- *Relink misses an inbound link shape* (org radio links, bare paths in scripts). Dodge: the pilot greps for the old path after moving and fails loudly on any residue.
+- *Heuristic over-classifies notes as specs.* Dodge: the confirm gate is mandatory; the helper never moves unconfirmed.
+- *Keyword vocabulary drift* between this spec, the rule, and spec-create's template. Dodge: the rule names the vocabulary once and the others link it.
+
+* Testing / Verification / Rollout
+bats for =spec-sort= (classification, the evidence/confirm gate, plan validation, move + rename, relink, the clean-worktree preflight, forced mid-apply failure recovery output, idempotence, marker stamp). The pilot run on rulesets is the live verification; the post-move residue grep is the acceptance check. Rollout is per-project via the startup nudge, each run human-confirmed.
+
+* References / Appendix
+- Source proposal: [[file:../design/2026-06-15-spec-storage-lifecycle-proposal.org]] (.emacs.d handoff, 2026-06-15).
+- Decisions record: todo.org "Spec storage location + lifecycle-status convention" (settled 2026-06-28).
+- This file is the convention's first resident: it lives in =docs/specs/=, carries the status heading + =:ID:=, and drops the status filename suffix.
+
+* Review and iteration history
+** 2026-07-01 Wed @ 22:13:00 -0400 — Claude — author
+- What: initial draft, written from the five pre-ratified decisions.
+- Why: the queued-specs half of the 2026-06-30 session goal; decisions were settled 2026-06-28 and needed migration into a buildable spec.
+- Artifacts: todo.org task "Spec storage location + lifecycle-status convention"; source proposal above.
+
+** 2026-07-01 Wed @ 22:22:34 -0400 — Codex — reviewer
+- What changed or was recommended: rubric =Not ready=. Four blocking findings were added: preserve Org task keywords while adding lifecycle status, make =spec-sort= relinking executable and failure-safe, define the actual =.ai/notes.org= marker/startup-nudge contract, and avoid stranding legacy specs behind a stricter path precondition before retrofit.
+- Why: current rulesets workflows still depend on =TODO= / =DONE= decision and finding tasks, startup state lives in =.ai/notes.org=, and the repo still contains formal specs outside =docs/specs/= until the migration runs.
+- Artifacts: Review findings section; current-state checks against =.ai/workflows/spec-create.org=, =.ai/workflows/spec-review.org=, =.ai/workflows/startup.org=, =scripts/sync-check.sh=, and =todo.org=.
+
+** 2026-07-01 Wed @ 22:25:00 -0400 — Claude (fresh-context agent) — reviewer
+- What: rubric =Not ready=. Independently found Codex's keyword-vocabulary blocker (adding the cross-sequence uniqueness wrinkle) and the stranded-legacy-specs and marker-surface gaps, plus five findings of its own: no owner for the =DOING= → =IMPLEMENTED= flip (blocking), the precedence-ambiguous classification heuristic, the missing =-spec.org= rename in spec-sort, org-id click-resolution in a live Emacs, and the criterion-2/mirror contradiction.
+- Why: fresh-eyes adversarial pass requested by Craig after his own read found nothing; the two reviews converging on the same worst bug from independent context is the confidence signal.
+- Artifacts: Review findings section (findings 5-9); spot-checks against real repo files (=docs/design/task-review.org=, the two stray root specs, =startup.org:154=).
+
+** 2026-07-01 Wed @ 22:46:52 -0400 — Claude — second responder pass
+- What: fixed all five of Codex's re-review findings in place (fourteen of fourteen closed): canonical-placement contract for every synced artifact (+ sync-check acceptance criterion), the =:SPEC_ID:= spec-to-task binding with the parent-keyword audit query (dissolving the flip-task chicken-and-egg), the fail-safe =--apply= contract (clean-tree preflight, validate-then-write from a recorded plan, named recovery), staged id-link conversion (pilot rewrites =file:= links only; =id:= conversion gated on the concrete .emacs.d id-index mechanism — the fork Craig approved), and evidence-based status confirmation (evidence panel, conservative non-terminal default, reasons required for terminal states). Status stays DRAFT; the READY flip belongs to the reviewers this round.
+- Why: Craig approved fixing all five ("1", 2026-07-01), including the keep-file:-links-through-pilot fork.
+- Artifacts: per-finding responses inline; the fixed Design/phase/criteria sections.
+
+** 2026-07-01 Wed @ 23:22:50 -0400 — Codex — reviewer
+- What changed or was recommended: rubric =Ready=. No new blocking findings. The second responder pass closed all five Codex re-review blockers without regressing the first nine findings, and the spec now gives implementers concrete contracts for canonical synced assets, =:SPEC_ID:= task binding, fail-safe =spec-sort --apply= behavior, staged id-link conversion, evidence-based status confirmation, phase sequencing, and test coverage.
+- Why: the current spec can be implemented and tested without hidden product decisions; remaining vNext work is separately tracked.
+- Artifacts: status heading flipped to =READY=; =* Decisions= [5/5]; =* Review findings= [14/14]; Emacs batch cookie check.
+
+** 2026-07-01 Wed @ 22:41:21 -0400 — Claude (fresh-context agent) — verify pass; Claude — READY flip
+- What: the original reviewer re-read the fixed spec against its own nine findings: all held, none regressed, verdict ready. It re-ran the classification predicate live (exactly 5 candidates; task-review.org excluded) and confirmed org computes the cookies. Two non-blocking minors folded in before the flip: a refinement note under Decision 2 (whose frozen body still said "one keyword edit") and a wider nudge probe that also fires on stray =docs/*-spec.org= root files. Status flipped DRAFT → READY.
+- Why: Craig authorized the flip contingent on the verify pass clearing; it did.
+- Artifacts: the status heading's history line; verify-pass report in the session record.
+
+** 2026-07-01 Wed @ 22:41:33 -0400 — Codex — reviewer
+- What changed or was recommended: rubric =Not ready=. Five new blocking findings were added after the response pass: make shared workflow/script edits obey the =claude-templates/.ai/= canonical plus =.ai/= mirror contract; define how task-audit binds a =DOING= spec to its implementation tasks; make =spec-sort --apply= failure-safe; turn the org-id Emacs prerequisite into an executable rollout step; and require status confirmation to be evidence-based rather than a rubber-stamp of stale fields.
+- Why: the response fixed the original keyword/relink/precondition issues but introduced new integration points in synced template assets, task-audit, Emacs id resolution, and migration safety that are not yet buildable from the spec.
+- Artifacts: Review findings section; checks against =CLAUDE.md=, =scripts/sync-check.sh=, =.ai/workflows/task-audit.org=, =.ai/workflows/startup.org=, =.ai/notes.org=, current =docs/= inventory, and Emacs batch/docstring checks for Org TODO cookies and =org-id-extra-files=.
+
+** 2026-07-01 Wed @ 22:30:06 -0400 — Claude — responder
+- What: merged both reviews into one findings ledger (nine findings, all dispositioned accept) and fixed all nine in place: two-sequence keyword header (applied to this file itself), transition-ownership table with the tracked flip-to-IMPLEMENTED task, single classification predicate, the -spec.org rename step, the full relink data-safety contract, the =.ai/notes.org= marker + Phase A/C startup contract, the legacy-location compatibility rule, the org-id Emacs prerequisite, and the three-line transition definition. Acceptance criteria updated to match.
+- Why: Craig approved fixing all nine ("1", 2026-07-01); none touched the five ratified decisions.
+- Artifacts: Review findings section (responses inline per finding); the fixed sections themselves.
diff --git a/docs/agent-knowledge-base-spec.org b/docs/specs/agent-knowledge-base-spec.org
index 78ff9bd..e36d897 100644
--- a/docs/agent-knowledge-base-spec.org
+++ b/docs/specs/agent-knowledge-base-spec.org
@@ -1,12 +1,20 @@
#+TITLE: Agent Knowledge Base on Org-roam — Spec
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2026-06-10
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* IMPLEMENTED Agent Knowledge Base on Org-roam — Spec
+:PROPERTIES:
+:ID: 08a5ec99-9e1e-40e4-8241-e8a41e9de49f
+:END:
+- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to IMPLEMENTED (reason: v1 (Phases 0-4) shipped 2026-06-10 on Craig's go; KB live at ~/org/roam with the knowledge-base rule installed machine-wide)
* Metadata
-| Status | implemented — v1 (Phases 0-4) shipped 2026-06-10 on Craig's go; manual validation + other-machine clones outstanding (todo.org) |
+| Status | implemented |
| Owner | Craig Jennings |
| Reviewer | Craig Jennings; Codex (2026-06-10) |
-| Related | [[file:../todo.org][todo.org — "Check that memories are sync'd across machines via git"]] |
+| Related | [[file:../../todo.org][todo.org — "Check that memories are sync'd across machines via git"]] |
This spec supersedes the 2026-06-05 draft (formerly docs/design/2026-06-05-org-roam-knowledge-base-spec.org, removed; content in git history), folding in Craig's 2026-06-10 ratification answers and restructuring to the spec-create format.
@@ -308,4 +316,4 @@ Modified recommendations from the 2026-06-10 Codex review, with reasons. Everyth
** 2026-06-10 Wed @ 17:31:10 -0500 — Codex — reviewer
- What changed or was recommended: re-ran the spec-review workflow after the caveat resolution. Rubric: ready. No new blocking or medium-priority findings; no review file written. Confirmed the implementation phases and test-surface tasks are already represented under the existing parent task in todo.org.
- Why: the prior blockers are dispositioned, the work-root denylist is confirmed, the pointer-rule install path matches the current Makefile RULES glob, and v1's manual/agent-runnable verification surface is explicit.
-- Artifacts: this file; [[file:../todo.org][todo.org]] parent task "Check that memories are sync'd across machines via git".
+- Artifacts: this file; [[file:../../todo.org][todo.org]] parent task "Check that memories are sync'd across machines via git".
diff --git a/docs/inbox-workflow-consolidation-spec.org b/docs/specs/inbox-workflow-consolidation-spec.org
index 2e158b6..4543e77 100644
--- a/docs/inbox-workflow-consolidation-spec.org
+++ b/docs/specs/inbox-workflow-consolidation-spec.org
@@ -1,16 +1,23 @@
#+TITLE: Inbox Workflow Consolidation — Spec
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2026-06-23
-#+TODO: TODO | DONE SUPERSEDED CANCELLED
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* READY Inbox Workflow Consolidation — Spec
+:PROPERTIES:
+:ID: a7fe2a10-dfa8-4ba3-a11a-e7b1288b7573
+:END:
+- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to READY (evidence-based, human-confirmed)
* Metadata
-| Status | Ready — review incorporated (Codex, 2026-06-23) |
+| Status | ready |
|----------+-------------------------------------------------------------|
| Owner | Craig |
|----------+-------------------------------------------------------------|
| Reviewer | Craig |
|----------+-------------------------------------------------------------|
-| Related | [[file:../todo.org][Consolidate inbox/triage workflows + scheduled inbox check]] |
+| Related | [[file:../../todo.org][Consolidate inbox/triage workflows + scheduled inbox check]] |
|----------+-------------------------------------------------------------|
* Summary
diff --git a/docs/design/wrapup-routing-spec.org b/docs/specs/wrapup-routing-spec.org
index 434f8d9..1a150fc 100644
--- a/docs/design/wrapup-routing-spec.org
+++ b/docs/specs/wrapup-routing-spec.org
@@ -1,16 +1,23 @@
#+TITLE: Wrap-Up Inbox/Transcript Routing — Spec
#+AUTHOR: Craig Jennings
#+DATE: 2026-06-13
-#+TODO: TODO | DONE SUPERSEDED CANCELLED
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* DOING Wrap-Up Inbox/Transcript Routing — Spec
+:PROPERTIES:
+:ID: 00b47414-2213-4a99-be35-48ceb266fc08
+:END:
+- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to DOING (evidence-based, human-confirmed)
* Metadata
-| Status | Ready — review incorporated (spec-review, 2026-06-21) |
+| Status | doing |
|----------+-----------------------------------------------------|
| Owner | Craig Jennings |
|----------+-----------------------------------------------------|
| Reviewer | Codex (spec-review) |
|----------+-----------------------------------------------------|
-| Related | [[file:../../todo.org][todo.org: wrap-up routing task]] · [[file:2026-06-13-wrapup-inbox-transcript-routing-proposal.org][archsetup proposal]] |
+| Related | [[file:../../todo.org][todo.org: wrap-up routing task]] · [[file:../design/2026-06-13-wrapup-inbox-transcript-routing-proposal.org][archsetup proposal]] |
|----------+-----------------------------------------------------|
* Summary