aboutsummaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
Diffstat (limited to 'docs')
-rw-r--r--docs/design/2026-06-16-autonomous-batch-execution-spec.org329
-rw-r--r--docs/design/2026-06-21-anki-titlefix-proposal.org57
-rw-r--r--docs/design/2026-06-21-apkg-to-orgdrill-buildreq.org68
-rw-r--r--docs/design/2026-06-21-flashcard-stats-refutation-proposal.org57
-rw-r--r--docs/design/2026-06-21-host-identity-guard-proposal.org54
-rw-r--r--docs/design/2026-06-22-inbox-zero-capture-hardening.org39
-rw-r--r--docs/design/2026-06-23-install-lang-claude-md-gap.org31
-rw-r--r--docs/design/2026-06-23-wrap-teardown-shutdown-proposal.org124
-rw-r--r--docs/design/2026-06-27-bug-priority-matrix-cover-note.org5
-rw-r--r--docs/design/2026-06-27-bug-priority-matrix-proposal.md70
-rw-r--r--docs/design/2026-06-29-green-baseline-proposal.org72
-rw-r--r--docs/design/2026-06-29-lint-org-structural-checkers-proposal.org55
-rw-r--r--docs/design/2026-06-29-todo-cleanup-aging-proposal.org64
-rw-r--r--docs/design/2026-06-30-daily-drivers-tailscale-correction.org9
-rw-r--r--docs/design/2026-07-02-auto-flush-mechanism-note.org20
-rw-r--r--docs/specs/2026-06-16-autonomous-batch-execution-spec.org393
-rw-r--r--docs/specs/2026-06-16-encourage-kb-contribution-spec.org (renamed from docs/design/2026-06-16-encourage-kb-contribution-spec.org)11
-rw-r--r--docs/specs/2026-07-01-docs-lifecycle-spec.org360
-rw-r--r--docs/specs/agent-knowledge-base-spec.org (renamed from docs/agent-knowledge-base-spec.org)14
-rw-r--r--docs/specs/inbox-workflow-consolidation-spec.org199
-rw-r--r--docs/specs/wrapup-routing-spec.org (renamed from docs/design/wrapup-routing-spec.org)13
21 files changed, 1707 insertions, 337 deletions
diff --git a/docs/design/2026-06-16-autonomous-batch-execution-spec.org b/docs/design/2026-06-16-autonomous-batch-execution-spec.org
deleted file mode 100644
index e2e0f90..0000000
--- a/docs/design/2026-06-16-autonomous-batch-execution-spec.org
+++ /dev/null
@@ -1,329 +0,0 @@
-#+TITLE: Autonomous-Batch Task Execution — Spec
-#+AUTHOR: Craig Jennings & Claude
-#+DATE: 2026-06-16
-#+TODO: TODO | DONE SUPERSEDED CANCELLED
-
-* Metadata
-| Status | draft |
-|----------+--------------------------------------------------------------------|
-| Owner | Craig Jennings |
-|----------+--------------------------------------------------------------------|
-| Reviewer | Craig Jennings |
-|----------+--------------------------------------------------------------------|
-| Date | 2026-06-16 |
-|----------+--------------------------------------------------------------------|
-| Related | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:2026-06-15-fix-speedrun-workflow-proposal.org][fix-speedrun proposal]] |
-|----------+--------------------------------------------------------------------|
-
-* Summary
-
-Two proposals arrived within a day of each other describing the same capability: have Claude work a batch of small, well-marked tasks autonomously, with a full quality bar per task and no per-step approval gate. The inbox-zero "Phase E" proposal drives it from a tag/priority query on a recurring loop; the "fix speedrun" proposal drives it from an explicit ordered list a human dictates in-session. This spec reconciles both into one feature: a single dedicated workflow, =work-the-backlog.org=, that holds the task-execution logic, with two thin callers feeding it. It also designs the instrumentation that measures whether the autonomy is actually paying off.
-
-* Problem / Context
-
-Craig has a standing backlog of small, solo-doable fixes across several projects, already marked with a tag convention (=:next:=, =:quick:+:solo:=). Doing them by hand one at a time is the bottleneck — each is 30 minutes or less, but the context-switch and the per-commit approval ceremony dominate the actual work. He wants Claude to burn these down unattended: on a recurring loop for the routed inbox case, and on demand when he batches a named list and says "fix speedrun, no approvals until done."
-
-Two separate proposals tried to answer this:
-
-- *Phase E* (in =inbox-zero.org=, edited in =.emacs.d= as a stopgap) bolted autonomous execution onto the inbox-zero workflow's on-demand and loop callers. The sender flagged the seam as the open question: coupling capture-routing with autonomous-implementation pollutes inbox-zero's three existing callers (startup, wrap-up, on-demand), two of which must never execute anything.
-- *fix speedrun* (a =.emacs.d= theme-studio session that worked well) is the same execution loop driven by an explicit ordered task set, with end-of-set paging and always-push.
-
-They overlap almost entirely. The execution loop — eligibility gate, act-vs-file decision, per-task quality bar, bounded run — is identical. Only the *input* differs (tag query vs explicit list) and the *session mode* differs (loop default vs no-approvals + always-push + page). Building them as two features would duplicate the execution logic and let the two copies drift. The forces: keep inbox-zero's callers clean, share one execution loop, and make the autonomy safe enough to run unattended on a 30-minute timer without Craig watching.
-
-A second, explicit ask from Craig: instrument this so its effectiveness is measurable. "Gather data on this and create some org-roam articles we can look at later." Autonomous execution that silently makes bad commits is worse than no autonomy; the only way to know which it is, is to measure tasks completed vs deferred vs reverted, and human corrections in the following session, over time.
-
-* Goals and Non-Goals
-
-** Goals
-- One workflow, =work-the-backlog.org=, owns the task-execution loop. Both input shapes (tag query, explicit list) and both session modes feed it.
-- inbox-zero's three existing callers stay clean: the loop caller chains into =work-the-backlog= *after* routing; startup and wrap-up never touch it.
-- "fix speedrun" is a thin named preset, not a second implementation: no-approvals session mode + always-push + end-of-set page, feeding an explicit ordered list.
-- Commit autonomy defaults to file-only (surface a diff, no auto-commit). A project opts into autonomous commit+push explicitly via its per-project waiver.
-- Hard guardrails: refuse to speedrun any task needing a design decision or carrying data-loss risk without a checkpoint; file a =VERIFY= and move on rather than guess-implement an underspecified task; a per-run cap / kill switch beyond "one task per run."
-- A lightweight per-run metrics log plus a periodic synthesis step that writes org-roam KB articles summarizing the trend.
-
-** Non-Goals
-- *Not* a replacement for =/start-work=. Tasks needing deliberation, design, or an hour-plus stay with =/start-work= and its approval gates. This feature only touches the small, marked, solo set.
-- *Not* a new tag convention. It reads the project's own priority/tag scheme header; it never invents or hardcodes tags across projects.
-- *Not* an inbox-routing change. =inbox-zero.org= keeps its A-D phases. The Phase E text added in =.emacs.d= as a stopgap is *removed* and its logic moves here.
-- *Not* a multi-project orchestrator. One run works one project's backlog. Cross-project handoff stays with =inbox-send= and the paging reply.
-- *Not* a credential-handling or external-API feature. Tasks that touch secrets or external mutations are out of the eligible set by the guardrail.
-
-** Scope tiers
-- *v1:* =work-the-backlog.org=; the eligibility gate reading the project's scheme header; the act-vs-file decision with VERIFY-on-ambiguity; file-only commit default with per-project opt-in; the loop caller wiring and inbox-zero Phase E removal; the "fix speedrun" preset with end-of-set =notify --persist= page; the per-run metrics log (structured JSONL).
-- *Out of scope:* a token-budget kill switch (cap is a task count in v1); cross-project batch runs; a dashboard or live UI over the metrics.
-- *vNext (log to todo.org):* the periodic org-roam synthesis step if it doesn't make v1; a token/cost budget alongside the task-count cap; auto-detection of "human corrected my autonomous commit" from the next session's diff.
-
-* Design
-
-** Overview
-
-The architecture is one execution workflow with two callers and one preset, plus an instrumentation sidecar.
-
-#+begin_example
- inbox-zero loop caller ──(after Phase D routing)──┐
- ├──▶ work-the-backlog.org ──▶ metrics log (JSONL)
- "fix speedrun" preset ──(explicit ordered list)───┘ │
- = no-approvals + always-push + end-page ▼
- periodic synthesis ──▶ org-roam KB articles
-#+end_example
-
-=work-the-backlog.org= is the only place the execution loop lives. It takes a *task set* (however assembled) and a *session mode* (which gates commit autonomy and paging), and works the set under a fixed safety contract. The two callers differ only in how they build the task set and which session mode they pass.
-
-This is the seam the Phase E sender asked for: separating capture-routing (inbox-zero) from autonomous-implementation (work-the-backlog) keeps inbox-zero's startup and wrap-up callers — which must never execute anything — untouched. The loop caller is the only one of inbox-zero's callers that chains forward into execution, and it does so as an explicit second step after routing completes, not as a phase buried inside inbox-zero.
-
-** The execution loop (two-altitude: caller's view)
-
-A caller hands =work-the-backlog= three things:
-
-1. *A task set* — either an explicit ordered list of task headings (fix speedrun), or the result of a tag/priority query against =todo.org= (the loop). The workflow does not care which; it receives an ordered list of candidate tasks.
-2. *A session mode* — =file-only= (default) or =autonomous-commit= (requires the project's per-project waiver), and a paging flag.
-3. *A run cap* — the maximum number of tasks to complete this run.
-
-It returns: per-task outcome (implemented+committed / implemented+diff-surfaced / deferred-VERIFY / deferred-too-large / skipped-ineligible), and a metrics record per task.
-
-** The execution loop (implementer's view)
-
-For the task set, in order, until the run cap is hit:
-
-1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
-2. *Scope read* of the relevant code. Cheap; just enough to make the act-vs-file call.
-3. *Act-vs-file decision* (below). File → record the deferral reason, next task.
-4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important, then close the task per =todo-format.md=.
-5. *Commit autonomy branch:*
- - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
- - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
-6. *Record metrics* for the task (the JSONL append, below).
-7. Decrement the cap. At zero, stop.
-
-After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary.
-
-** Eligibility gate
-
-A task is autonomous-safe when *all* hold:
-
-1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= is the "awaiting Craig's manual confirmation" marker; auto-implementing one defeats the manual check it represents.
-2. *Tagged per the project's autonomous-safe set* — resolved by reading the project's priority/tag scheme header at the top of its =todo.org=, not by hardcoding. The default reading is =:next:= OR both =:quick:= AND =:solo:=, but a project whose scheme declares a different autonomous-safe tag set overrides that.
-3. *Solo-doable* — no input or undecided judgment call from Craig.
-4. *Roughly 30 minutes or less* of focused work.
-
-** Act-vs-file decision (the guardrail)
-
-After the scope read, for each eligible candidate:
-
-- *Clear, bounded, solo, ≤ ~30 min* → implement.
-- *Needs a design decision, Craig's input, or discussion* → do NOT implement. File a one-line note on the task naming the input it needs; surface it.
-- *Carries data-loss risk without a checkpoint* (deletes data, rewrites persisted state, touches external/shared state irreversibly) → do NOT implement. File a =VERIFY= explaining the risk; surface it.
-- *Underspecified or already-satisfied* → do NOT guess-implement. File a =VERIFY= noting why (the fix-speedrun "raise max spans to 5 — every cap was already 8" case) and move on.
-- *An hour or more* → do NOT implement. File and surface as a =/start-work= candidate.
-
-When unsure which side a task falls on, file rather than implement. A wrong auto-implement costs more than a deferred task — it costs a revert *and* the human correction in the next session that the metrics are designed to catch.
-
-** Session modes and the "fix speedrun" preset
-
-Two orthogonal session-mode dimensions feed the loop:
-
-- *Commit autonomy:* =file-only= (default) or =autonomous-commit=. =autonomous-commit= is honored only when the project carries the per-project waiver (=.emacs.d= and =rulesets= have it; most projects do not). Absent the waiver, a request for =autonomous-commit= degrades to =file-only= and says so.
-- *Paging:* on or off. End-of-set only.
-
-"fix speedrun" is the named preset = =autonomous-commit= + always-push + paging-on, fed an *explicit ordered list*. It is not a separate code path; it is a label for that combination of mode flags plus the explicit-list input. The loop caller, by contrast, runs =file-only= (unless the project has the waiver and opts the loop into commits) with paging off, fed the *tag query*.
-
-** Bounding the run and the kill switch
-
-Default cap: one task per run for the loop caller — implement the highest-priority eligible candidate (=[#A]= before =[#B]= before =[#C]=), record, then stop and let the next tick continue. The fix-speedrun preset works the whole explicit list in order (the human bounded it by naming it), still one commit per task.
-
-The kill switch is a hard per-run task cap passed by the caller, independent of "one per run": even fix-speedrun stops at the cap and pages with the remainder listed. A loop that fires every 30 minutes and commits unattended needs a ceiling that a runaway can't exceed.
-
-** End-of-set paging
-
-When the set is done (or the cap is hit), if paging is on, fire one page — end-of-set only, never per-task:
-
-#+begin_src sh
-notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>" --persist
-#+end_src
-
-=--persist= keeps it on screen until dismissed (the page-me convention). The message carries the project name, the completed count, and the remaining count, so Craig can reply confirming ready + naming the next project in one turn. The page-signal wrapper removed 2026-06-12 is reconciled to =notify= here — there is no separate page-signal call.
-
-* Alternatives Considered
-
-** Fold execution into inbox-zero (the Phase E stopgap shape)
-- Good, because it's the smallest diff — the loop caller already runs inbox-zero, so execution is "one more phase."
-- Bad, because it couples capture-routing with implementation. inbox-zero has three callers; startup and wrap-up must never execute. A Phase E inside inbox-zero forces both to carry a "skip Phase E" caveat and risks a future caller running it by accident.
-- Neutral, because the eligibility-gate and act-vs-file text is identical either way — only its *home* differs.
-
-** Two separate features (keep Phase E and fix-speedrun distinct)
-- Good, because each proposal ships as written with no reconciliation work.
-- Bad, because the execution loop is duplicated in two places and will drift; a guardrail tightened in one won't reach the other. Two ways to do autonomous execution is two things to audit.
-- Neutral, because the input and session-mode differences are real — but they're thin caller-level differences, not a reason to fork the engine.
-
-** Autonomous-commit as the default
-- Good, because it's faster end-to-end with no diff to review.
-- Bad, because most projects lack the per-project waiver, and an unattended loop committing to a project that never opted in is exactly the failure the file-only default prevents. The blast radius of a bad autonomous commit is a revert plus lost trust in the loop.
-- Neutral, because the projects that *do* want it (=.emacs.d=, =rulesets=) opt in explicitly, so the capability is available where it's wanted without being the default everywhere.
-
-* Decisions [/]
-
-** TODO Where the eligibility gate reads its tag set
-- Owner / by-when: Craig / spec-review
-- Context: Phase E hardcoded =:next:= / =:quick:+:solo:=. Projects' priority/tag schemes vary, and the =todo-format.md= scheme header is the declared source of truth per project.
-- Decision: We will read the project's =todo.org= priority/tag scheme header to resolve the autonomous-safe tag set, defaulting to =:next:= OR =:quick:+:solo:= when the header doesn't declare an explicit autonomous-safe set.
-- Consequences: easier — one workflow works correctly across projects with different tag vocabularies; harder — a project with no scheme header (or a malformed one) needs a fallback, and the "default reading" has to be specified precisely enough that two projects agree on it.
-
-** TODO The do-not-auto-implement marker set
-- Owner / by-when: Craig / spec-review
-- Context: =VERIFY= means "awaiting Craig's manual confirmation" in =.emacs.d= and =rulesets=. Other projects may use =VERIFY= differently or not at all. The gate excludes =VERIFY=, =DOING=, =DONE=, =CANCELLED= by status, but the *marker semantics* are what matter.
-- Decision: We will define the do-not-auto-implement set as: any status that is not =TODO=, plus any task carrying a project-declared "hold" marker. The canonical default treats =VERIFY= as do-not-implement; a project overrides only by declaring its marker semantics in its scheme header.
-- Consequences: easier — the gate is portable and a project can't accidentally have its manual-check tasks auto-run; harder — requires the scheme header to carry marker semantics, which most don't yet, so the default has to be safe-by-omission (exclude anything not plainly =TODO=).
-
-** TODO Commit-autonomy opt-in mechanism
-- Owner / by-when: Craig / spec-review
-- Context: =file-only= is the default; =.emacs.d= and =rulesets= have a per-project waiver allowing autonomous commits. Where does the workflow *read* that a project has opted in?
-- Decision: We will read the opt-in from the project's existing per-project waiver location (the same place the commit discipline's "no approval gate" waiver lives — =notes.org= Workflow State or =CLAUDE.md=), not introduce a new config file.
-- Consequences: easier — no new config surface, reuses the existing waiver concept; harder — the waiver's exact location and format must be pinned so the workflow can detect it deterministically, and a project with the commit waiver but *not* wanting the loop to commit needs a way to say "waiver yes, loop-commit no" (two flags, not one).
-
-** TODO Run-cap default and the kill switch shape
-- Owner / by-when: Craig / spec-review
-- Context: The loop default is one task per run; fix-speedrun works an explicit list. Both need a hard ceiling a runaway can't exceed.
-- Decision: We will pass a hard per-run task cap from the caller (loop default 1; fix-speedrun = length of the explicit list, capped at a ceiling), and stop + page with the remainder when the cap is hit. v1 caps by task count, not token budget.
-- Consequences: easier — a simple integer the caller controls; bounded blast radius; harder — a task-count cap doesn't bound *cost* (one 30-min task can burn many tokens), so a token budget is vNext, and until then a pathological task can run long within a single cap slot.
-
-** TODO Metrics log location and format
-- Owner / by-when: Craig / spec-review
-- Context: Per-run metrics must land somewhere structured and queryable, per-project, and survive across sessions for the synthesis step to read.
-- Decision: We will append one JSONL record per task to a per-project log at =.ai/metrics/work-the-backlog.jsonl=, git-tracked, with the synthesis step reading the union across projects.
-- Consequences: easier — append-only JSONL is trivial to write and =jq=-queryable; per-project keeps it local to where the work happened; harder — a git-tracked log adds churn to every autonomous run's commit (or needs its own commit), and "union across projects" needs the synthesis step to know where every project's log lives.
-
-** TODO Synthesis cadence and trigger
-- Owner / by-when: Craig / spec-review
-- Context: Craig wants periodic org-roam articles summarizing the data. What triggers synthesis, and how often?
-- Decision: We will run synthesis on an explicit trigger ("synthesize backlog metrics") and optionally a weekly scheduled run, writing one KB node per synthesis under =~/org/roam/agents/= per the knowledge-base rule.
-- Consequences: easier — explicit trigger means no surprise writes, and the KB rule already governs node shape; harder — a weekly scheduled run needs a scheduler entry and the KB write-classification (personal-only) must gate it so work-project metrics never land in the KB.
-
-* Implementation phases
-
-** Phase 1 — Extract the execution loop into work-the-backlog.org
-Write =work-the-backlog.org= holding the eligibility gate, act-vs-file decision, per-task quality bar, and run-cap logic — taking a task set + session mode + cap as input. Remove the stopgap "Phase E" text from =inbox-zero.org= (restore it to its A-D shape) in the same change so there's one home, not two. Tree stays working: inbox-zero reverts to routing-only, and the new workflow is callable but not yet wired to the loop.
-
-** Phase 2 — Wire the two callers
-Add the loop caller's chain step (after inbox-zero Phase D, invoke work-the-backlog with the tag query + file-only + cap 1) and the "fix speedrun" preset (explicit list + autonomous-commit + always-push + paging-on). Both go through the same workflow. Tree stays working: each caller is independently testable.
-
-** Phase 3 — File-only vs autonomous-commit gate
-Implement the commit-autonomy branch: read the per-project waiver, degrade =autonomous-commit= to =file-only= when absent, surface the degrade. Tree stays working: default file-only behavior is the safe path even before the waiver-read lands.
-
-** Phase 4 — Guardrails and the page
-Implement the data-loss / design-decision refusal, the VERIFY-on-ambiguity filing, and the end-of-set =notify alarm ... --persist= page. Tree stays working: guardrails only ever *reduce* what runs, so adding them can't break a passing run.
-
-** Phase 5 — Metrics log
-Append the per-task JSONL record at each task outcome. Tree stays working: logging is a side effect that doesn't alter execution.
-
-** Phase 6 — Synthesis to org-roam
-Write the synthesis step: read the JSONL union, compute the per-run and trend metrics (below), write a KB node under =~/org/roam/agents/= per the knowledge-base rule, personal-projects-only classification enforced. Tree stays working: synthesis is read-only over the logs plus a KB write.
-
-* Acceptance criteria
-- [ ] =work-the-backlog.org= exists and is the only home for the execution loop; =inbox-zero.org= is back to its A-D routing-only shape with no Phase E.
-- [ ] The loop caller chains into work-the-backlog after routing; startup and wrap-up never invoke it.
-- [ ] "fix speedrun" runs as the preset (autonomous-commit + always-push + end-page) over an explicit ordered list, one commit per task.
-- [ ] A task tagged for autonomous execution but at status =VERIFY= / =DOING= / =DONE= / =CANCELLED= is skipped by the gate.
-- [ ] The eligibility tag set is read from the project's =todo.org= scheme header, not hardcoded.
-- [ ] In a project without the commit waiver, an =autonomous-commit= request degrades to file-only and says so; no commit is made.
-- [ ] A task carrying data-loss risk or needing a design decision is refused with a filed VERIFY, not implemented.
-- [ ] An underspecified / already-satisfied task files a VERIFY noting why and the run continues.
-- [ ] The run stops at the per-run cap and pages with the remaining tasks listed.
-- [ ] Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl=.
-- [ ] The synthesis step reads the logs and writes a KB node under =~/org/roam/agents/=; it refuses to write for work-classified projects.
-
-* Effectiveness measurement
-
-This section answers Craig's explicit ask: measure whether autonomous-batch execution is actually effective, and build the "gather data → org-roam articles" loop.
-
-** What "effective" means here
-
-The autonomy is effective if it completes real work that *stays* completed — i.e. tasks land green and the next session doesn't have to undo or fix them. The two failure modes to catch are (1) the loop defers everything (over-cautious, no value delivered) and (2) the loop implements badly (commits that get reverted or hand-corrected next session). Both are measurable.
-
-** Per-run metrics (the JSONL record)
-
-One record per task, appended to =.ai/metrics/work-the-backlog.jsonl= at each task outcome:
-
-| Field | Meaning |
-|-------------------+--------------------------------------------------------------------|
-| =ts= | ISO timestamp of the task outcome |
-|-------------------+--------------------------------------------------------------------|
-| =run_id= | UUID shared by all tasks in one run |
-|-------------------+--------------------------------------------------------------------|
-| =project= | project basename |
-|-------------------+--------------------------------------------------------------------|
-| =caller= | =loop= or =fix-speedrun= |
-|-------------------+--------------------------------------------------------------------|
-| =task= | task heading (slug) |
-|-------------------+--------------------------------------------------------------------|
-| =outcome= | implemented-committed / implemented-diff / deferred-verify / |
-| | deferred-too-large / skipped-ineligible |
-|-------------------+--------------------------------------------------------------------|
-| =defer_reason= | for deferrals: needs-input / data-loss / underspecified / too-large |
-|-------------------+--------------------------------------------------------------------|
-| =wall_clock_s= | seconds from task start to outcome |
-|-------------------+--------------------------------------------------------------------|
-| =commit_sha= | for committed tasks; empty otherwise |
-|-------------------+--------------------------------------------------------------------|
-| =review_findings= | count of /review-code Critical+Important findings on this task |
-|-------------------+--------------------------------------------------------------------|
-
-Per-run rollups computed at synthesis (not stored per record): tasks attempted, completed, VERIFY-deferred, reverted; wall-clock total; commits landed; review findings per commit.
-
-** The corrections signal (the key metric)
-
-The hardest and most valuable metric is *human corrections in the following session* — did Craig revert or hand-fix an autonomous commit? v1 captures the cheap proxy: at synthesis, for each =commit_sha=, check whether a later commit touching the same files reverted it or carries a "fix"/"revert" of that change within N days. A clean run is one where the autonomous commits survive untouched. (Auto-detecting "this later commit corrected that autonomous one" precisely is a vNext refinement; the proxy — reverted-or-touched-soon-after — is good enough to flag a problem run for human review.)
-
-** Where the data lands
-
-Per-project git-tracked JSONL at =.ai/metrics/work-the-backlog.jsonl=. Append-only, =jq=-queryable, survives across sessions and machines via the normal project sync. Git-tracked so the history is auditable and the synthesis step can read it from any clone.
-
-** The synthesis loop (gather → article)
-
-On the "synthesize backlog metrics" trigger (and optionally a weekly scheduled run):
-
-1. Read the JSONL union across the personal projects the synthesizer can see.
-2. Compute the rollups and the trend: completion rate over time, defer-reason distribution, review-findings-per-commit trend, and the corrections-signal flag count.
-3. Write one org-roam KB node under =~/org/roam/agents/YYYYMMDDHHMMSS-backlog-metrics-<window>.org= per the knowledge-base rule — filetags =:agent:metrics:=, a concise title, the rollup table, the trend narrative, and =[[id:...]]= links to prior synthesis nodes so the series is traceable.
-4. Enforce the KB write-classification: *personal projects only*. A work-classified project's metrics never write to the KB — they stay in that project's own =.ai/metrics/= log and the synthesizer reports the refusal per the KB refusal contract.
-
-The KB node is the artifact Craig reviews later — "are the autonomous runs completing more and getting corrected less over the last month?" reads off the trend table without re-querying raw logs.
-
-* Readiness dimensions
-
-- *Data model & ownership:* The task set is read from =todo.org= (project-owned, user-authored). The metrics JSONL is generated, append-only, git-tracked, project-owned. KB nodes are agent-generated under =~/org/roam/agents/= (never overwriting Craig's hand-authored nodes — link only). No editable region is co-owned.
-- *Errors, empty states & failure:* Empty task set → report "nothing eligible" and stop. Malformed scheme header → fall back to the default tag reading and surface the fallback. A task that fails mid-implementation → leave the tree working (don't commit a broken state), record the failure outcome, surface it, continue to the next task. No silent data loss: the data-loss guardrail refuses irreversible tasks outright.
-- *Security & privacy:* Tasks touching credentials or external mutations are excluded by the data-loss / external-state guardrail. The KB write is personal-projects-only; work metrics never leave the project. No secrets in the JSONL (task slugs and SHAs only).
-- *Observability:* The end-of-set page surfaces the run outcome. The per-task surface (implemented / deferred + reason / skipped) is the live progress view. The metrics log + KB synthesis is the long-run observability. A bad run is isolable from the JSONL (which task, which outcome, which review findings).
-- *Performance & scale:* Expected counts are small — a handful of tasks per run, one run per 30-min tick. No bottleneck at this scale. The cap bounds the worst case. Synthesis over months of JSONL is still a small file (one record per task).
-- *Reuse & lost opportunities:* Reuses =todo-format.md= for task close, =/review-code= and =/voice personal= for the quality bar, =notify= for paging, the knowledge-base rule for KB writes, the per-project waiver for commit-autonomy. No new config file (the opt-in rides the existing waiver). The execution loop is the one new shared asset.
-- *Architecture fit & weak points:* Integration points — inbox-zero loop caller (chain after Phase D), the per-project waiver location, =todo.org= scheme header, =~/org/roam/agents/=. Weak point: the commit-autonomy gate depends on deterministically reading the waiver; mitigated by defaulting to file-only when the read is ambiguous (fail safe, not open). Second weak point: a 30-min loop committing unattended; mitigated by the hard cap and file-only default.
-- *Config surface:* Per-project — commit-autonomy opt-in (via existing waiver), optional loop-commit flag, optional autonomous-safe tag override in the scheme header. Per-call — task set, session mode, run cap. Defaults: file-only, paging-off (loop) / paging-on (fix-speedrun), cap 1 (loop).
-- *Documentation plan:* The workflow file itself is the user/operator doc (matches inbox-zero.org's self-documenting style). The =.emacs.d= stopgap note and the fix-speedrun proposal are superseded by this spec; no separate migration doc needed beyond removing the Phase E text.
-- *Dev tooling:* N/A for new build targets — the workflows are prose, exercised by invocation. The metrics JSONL is =jq=-inspectable by hand; a tiny rollup helper may be added under =.ai/scripts/= if the synthesis prose proves to need it (decided at Phase 6, not a v1 prerequisite).
-- *Rollout, compatibility & rollback:* Rollout is removing Phase E from inbox-zero and adding work-the-backlog — both prose changes, instantly reversible. Compatibility: inbox-zero's three callers are unchanged except the loop caller gaining a forward chain. Rollback: delete work-the-backlog and the loop chain step; inbox-zero is already back to A-D. The file-only default means the worst pre-rollback state is surfaced diffs, not committed changes.
-- *External APIs & deps:* =notify alarm "Page" "<msg>" --persist= verified against =/home/cjennings/.local/bin/notify= and the page-me workflow. =~/org/roam/= KB write path and node shape verified against the knowledge-base rule. No external API calls.
-
-* Risks, Rabbit Holes, and Drawbacks
-
-- *The corrections signal is a proxy, not ground truth.* "A later commit touched the same files" over-counts (legitimate follow-up work) and under-counts (a correction in a different file). It's a flag for human review, not a verdict. Don't rabbit-hole on making it precise in v1 — the proxy plus a human glance is the design.
-- *Waiver detection drift.* If the per-project waiver location moves or its format changes, the commit-autonomy gate could mis-read. Mitigation: fail safe to file-only. Pin the waiver format in the Phase 3 decision before building.
-- *Unattended-commit blast radius.* The headline risk. Mitigated three ways: file-only default, the hard cap, and the data-loss guardrail. The metrics loop is the fourth layer — it makes a bad run visible after the fact even if the first three let something through.
-- *Scope creep into /start-work territory.* The temptation to let "≤ 30 min" stretch. The act-vs-file gate and the "when unsure, file" rule are the brake; keep them strict.
-
-* Testing / Verification / Rollout
-
-Verification is by invocation against a project's real =todo.org=: run the loop caller in file-only mode and confirm it surfaces diffs without committing; run fix-speedrun against a small explicit list in a waiver-carrying project and confirm one commit per task + the end page; plant a =VERIFY=-status task and a data-loss task and confirm both are skipped/refused; confirm the JSONL grows one record per task; run synthesis and confirm a KB node lands (personal project) or is refused (work project). Rollout is the Phase 1-6 sequence, each leaving the tree working; the file-only default makes early phases safe to ship before the commit and paging phases land.
-
-* References / Appendix
-
-- [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal (inbox-zero stopgap)]] and [[file:../../working/inbox-zero-phase-e/sender-note.org][its sender note with the 5 open questions]].
-- [[file:2026-06-15-fix-speedrun-workflow-proposal.org][fix-speedrun proposal]].
-- [[file:../../.ai/workflows/inbox-zero.org][inbox-zero.org (canonical, A-D)]] — the routing workflow this feature decouples from.
-- =~/code/rulesets/claude-rules/knowledge-base.md= — the org-roam write contract the synthesis step follows.
-
-* Review and iteration history
-** 2026-06-16 Tue — author
-- What: initial draft reconciling the Phase E and fix-speedrun proposals into one work-the-backlog.org feature, plus the effectiveness-measurement instrumentation.
-- Why: two overlapping proposals arrived within a day; building them separately would duplicate the execution loop and let it drift. Craig also asked explicitly for measurement + org-roam synthesis.
-- Artifacts: this spec; the two source proposals under docs/design/ and working/inbox-zero-phase-e/.
diff --git a/docs/design/2026-06-21-anki-titlefix-proposal.org b/docs/design/2026-06-21-anki-titlefix-proposal.org
new file mode 100644
index 0000000..08b8c13
--- /dev/null
+++ b/docs/design/2026-06-21-anki-titlefix-proposal.org
@@ -0,0 +1,57 @@
+#+TITLE: Proposal — flashcard-to-anki.py deck name should come from #+TITLE
+
+From: home session, 2026-06-21. Two attached files are the edited
+canonical scripts (flashcard-to-anki.py + its test). Applied locally in
+home as a stopgap; this is the durable proposal for the rulesets
+canonical. Please reconcile and re-sync.
+
+* The bug (longstanding)
+
+flashcard-to-anki.py's default_deck_name returned input_path.stem (the
+filename), so every deck generated through flashcard-sync (which passes no
+--deck) was named after the file, e.g. "personal-drill" / "health-drill"
+/ "kit", not the curated #+TITLE.
+
+flashcard-review.org already documents the intended behavior: "The
+#+TITLE line drives ... the Anki deck name on the phone" and "derives the
+Anki deck ID from the deck name." The script never matched the doc.
+deepsat only looked correct because its first run used an explicit
+--deck "DeepSat Flashcards".
+
+* The fix
+
+default_deck_name(input_path, org_text) now scans for a #+TITLE: line
+(case-insensitive, surrounding whitespace trimmed) and returns it; falls
+back to input_path.stem when there's no non-empty #+TITLE. main() passes
+the already-read org_text. Help text + module docstring updated.
+
+TDD: the two old deck-name tests asserted the buggy basename behavior —
+rewrote them. New tests cover title-driven naming, trimming,
+case-insensitive #+title, basename fallback (no title), and basename
+fallback (blank title). Full file: 29 pass.
+
+No companion script changes needed: flashcard-sync passes no --deck so it
+picks up the new default automatically, and flashcard-stats.py already
+reads #+TITLE. flashcard-review.org needs no change (the script now
+matches what it already says).
+
+* Migration caveat (worth a line in the doc if you want)
+
+Deck ID derives from the deck name, so this fix changes the ID for any
+deck previously generated without --deck. On next import those land as
+new decks; the old basename-named decks keep their review history and
+must be deleted by hand. The workflow's existing "Stable-ID caveat"
+already covers the mechanics. In home this affected personal-drill,
+health-drill, kit (regenerated this session as Personal / Health / KIT,
+with titles also stripped of "Flashcards"/"Drill" per Craig). deepsat is
+unaffected (already title-named).
+
+* Related idea (separate, not in these files) — apkg → org-drill converter
+
+deepsat-fundamentals.apkg (100-card DeepSat subset, made once with
+--deck "DeepSat Fundamentals") has no saved .org source anywhere. Craig
+wants an apkg → org-drill converter — the inverse of flashcard-to-anki.py
+— to recover orphaned decks and pull phone-authored cards back into the
+org source-of-truth. Flagging as a candidate rulesets tool alongside the
+flashcard-* family; deepsat-fundamentals is the concrete first use case.
+Not built yet; raising for the backlog.
diff --git a/docs/design/2026-06-21-apkg-to-orgdrill-buildreq.org b/docs/design/2026-06-21-apkg-to-orgdrill-buildreq.org
new file mode 100644
index 0000000..37a866f
--- /dev/null
+++ b/docs/design/2026-06-21-apkg-to-orgdrill-buildreq.org
@@ -0,0 +1,68 @@
+#+TITLE: Build request — apkg → org-drill converter (inverse of flashcard-to-anki.py)
+
+From: home session, 2026-06-21. Craig wants this built (backlogged, not
+urgent). Standalone build request — the earlier anki-title-fix-proposal
+only mentioned it in passing; this is the real ask.
+
+* Why
+
+The flashcard pipeline is one-directional (org-drill → apkg). Decks
+authored or curated on the phone, and orphaned apkgs whose .org source
+was never saved, can't get back into the org source-of-truth. Concrete
+case: deepsat-fundamentals.apkg — a 100-card DeepSat subset generated
+once with --deck "DeepSat Fundamentals" — has no .org source anywhere on
+ratio, velox, or in work git history. The converter recovers it and makes
+phone → org round-tripping possible.
+
+* What — contract (inverse of flashcard-to-anki.py)
+
+Input: an Anki =.apkg= (a zip containing collection.anki2 / .anki21
+sqlite, plus a media blob).
+Output: an org-drill =.org= file in the house canonical shape that
+flashcard-stats.py / flashcard-to-anki.py already agree on.
+
+Mapping (mirror flashcard-to-anki.py's parse/build):
+- Deck name (from the apkg) → =#+TITLE:=.
+- Each note → =** <Front> :drill:= with the Back as the body.
+- Card tag → top-level =* Section= grouping (inverse of section_to_tag;
+ cards sharing a tag collect under one section; the slug won't round-trip
+ to the exact original section title, so this is best-effort — emit the
+ tag as the section heading and let a human retitle).
+- Back HTML → org: convert =<br>= back to newlines; unescape
+ =&amp;/&lt;/&gt;=; strip the =<hr id="answer">= the card template adds
+ (the Back field itself shouldn't contain it, but guard anyway).
+- Generate a fresh =:ID:= UUID per card in a =:PROPERTIES:= drawer so the
+ output is immediately org-drill-valid and round-trips back through
+ flashcard-to-anki.py. (Note: GUIDs in flashcard-to-anki.py are derived
+ from the front text, not the :ID:, so a regenerated apkg still matches
+ existing phone cards by front — call that out in the docstring.)
+
+Edge cases to cover in tests (Normal/Boundary/Error):
+- Multiple decks in one apkg (emit one file per deck, or error asking for
+ a deck filter — pick one and document it).
+- Notes with multiple fields / non-basic note types (the pipeline only
+ models Front/Back — skip or warn on others, don't silently drop).
+- HTML entities, embedded =<br>=, and any =Source:= footer surviving
+ round-trip.
+- Empty back; media references (flag, since org side has no media path).
+- collection.anki2 vs .anki21 schema differences.
+
+* Where it lives
+
+Rulesets-owned, beside the flashcard-* family
+(=claude-templates/.ai/scripts/=): suggest =anki-to-flashcard.py= (or
+=apkg-to-orgdrill.py= — your naming call). Add tests under
+=scripts/tests/=. A new file can't be built downstream — home/.ai/scripts/
+is wiped to match the template by the startup =--delete= rsync — so this
+has to be built in the rulesets canonical. PEP 723 uv-run script like its
+sibling; genanki isn't needed for reading (stdlib =zipfile= + =sqlite3=
+suffice), so it has no runtime deps.
+
+* Acceptance
+
+Round-trip test: take a known org-drill source, run it through
+flashcard-to-anki.py, run the result back through this converter, and
+assert the cards (front/back/section) match the original (modulo
+regenerated :ID:s and best-effort section titles). Plus: run it on the
+real deepsat-fundamentals.apkg and hand the recovered .org back so its
+source can be filed (work project).
diff --git a/docs/design/2026-06-21-flashcard-stats-refutation-proposal.org b/docs/design/2026-06-21-flashcard-stats-refutation-proposal.org
new file mode 100644
index 0000000..bbbe175
--- /dev/null
+++ b/docs/design/2026-06-21-flashcard-stats-refutation-proposal.org
@@ -0,0 +1,57 @@
+#+TITLE: Proposal — flashcard-stats.py refutation / claim-prompt mode
+
+From: home session, 2026-06-21. Backlog, not urgent. Relates to the
+refutation-drill deck being built in the home project.
+
+* Problem
+
+A new card family doesn't fit the linter: the *refutation / claim-prompt*
+card. Its heading is a bare false claim ("The earth is flat.") and its
+body is the rebuttal. This is a legit org-drill simple card (org-drill is
+happy), but flashcard-stats.py — built for Q&A decks — trips two BLOCKING
+checks on every such card, both false positives:
+
+- *non-prompt heading*: a declarative claim has no '?' and no
+ imperative verb, so it reads as "topic-as-heading not yet rewritten".
+ But for this family the declarative claim IS the intended prompt.
+- *answer leakage*: the claim's words necessarily reappear in the
+ refutation, so front/back overlap is high. But the answer (the rebuttal)
+ is not given away by the claim — there's no actual leakage.
+
+Concrete: the home refutation-drill.org (6 cards) reports 6 non-prompt
+headings + 1 leakage WARN, so flashcard-sync's gate blocks it entirely.
+The deck currently has to be generated with the flashcard-to-anki.py
+override, losing the safety net.
+
+* Proposed fix
+
+A per-deck opt-in marker that switches the two checks off for that file
+only. Two options (your call):
+
+1. A file-level keyword: =#+DECK_KIND: refutation= near the top. When
+ present, flashcard-stats skips the non-prompt-heading check and the
+ answer-leakage check for the whole file (keeps the others:
+ missing-:ID:, *** Answer sub-headers, duplicate fronts, the
+ non-blocking NOTEs).
+2. A per-card tag: cards tagged =:claim:= (alongside =:drill:=) are
+ exempted from those two checks individually.
+
+Option 1 is simpler and matches how this deck works (the whole file is
+one family). Option 2 is finer-grained if a deck ever mixes families.
+
+Either way: document the new card family in flashcard-review.org (a
+"Refutation / claim-prompt cards" subsection under Canonical Card Shape —
+heading is the bare claim, body is snap-response + backups + named-fallacy
++ restate, Source footer), and note that flashcard-sync then works
+normally on these decks.
+
+* Affected files
+- =flashcard-stats.py= — the check skip + (option 1) keyword parse / (option 2) tag check.
+- =flashcard-review.org= — document the family + the marker.
+- =flashcard-to-anki.py= / =flashcard-sync= — no change needed (they don't gate on heading form).
+- Tests: add cases for a refutation-marked file passing despite declarative headings + claim/answer overlap.
+
+* Companion context
+The home deck's card format and the org-drill-fine / Anki-linter-fights
+finding are written up in home:refutation-drill-sources.org (Tooling
+note). The override command is documented there too.
diff --git a/docs/design/2026-06-21-host-identity-guard-proposal.org b/docs/design/2026-06-21-host-identity-guard-proposal.org
new file mode 100644
index 0000000..f389825
--- /dev/null
+++ b/docs/design/2026-06-21-host-identity-guard-proposal.org
@@ -0,0 +1,54 @@
+#+TITLE: From archsetup — hardcoded machine identity in CLAUDE.md (consider fleet-wide)
+#+DATE: 2026-06-21
+
+* What we did
+
+Built a Super+F Dirvish popup in the archsetup/dotfiles + .emacs.d projects,
+modeled on the existing Super+Shift+N org-capture popup (launcher script names an
+emacsclient frame, Hyprland window rules float it, an Emacs command runs in the
+frame and q closes it). Cross-project: dotfiles half committed from archsetup,
+Emacs half handed off to .emacs.d's inbox.
+
+* The bug it surfaced
+
+While stowing on this machine, =make stow hyprland= pulled the *velox* host tier,
+and =uname -n= returned =velox=. But archsetup's CLAUDE.md asserted, as a fixed
+fact, "This machine is **ratio**." It was simply wrong on velox — a stale
+identity baked into a per-project doc that travels to every machine via git.
+
+I'd been reasoning from that line all session (e.g. "the touchpad-auto reminder
+is velox-only, and we're on ratio, so skip it") — exactly backwards. A hardcoded
+"this machine is X" in a synced/tracked project file is a latent trap on any
+multi-machine setup: the file is identical on every host, so the claim is false
+on every host but one.
+
+* The fix (this project)
+
+Replaced the fixed identity with a runtime instruction. The attached CLAUDE.md
+now reads, in the Notes section:
+
+ Never assume which machine this is — always run =uname -n= to find the hostname
+ (the =hostname= binary is absent, so =uname -n= is the source of truth;
+ =uname -r= is the kernel release, not the host). The fleet is ratio
+ (workstation) and velox (laptop), both Hyprland (Wayland)...
+
+(Craig initially said =uname -r=; that's the kernel release. =uname -n= is the
+nodename/hostname, which is what the stow host-tier logic already keys on.)
+
+* Why this is a rulesets concern
+
+This isn't an archsetup-only quirk. Any project whose CLAUDE.md / notes get
+synced or cloned across machines can hardcode environment identity — current
+host, current OS, "the laptop", an IP, a display name — and be wrong everywhere
+the doc lands but the origin. rulesets governs how every project's CLAUDE.md and
+rules are shaped, so it's the right layer to consider a general guard:
+
+- A rule (claude-rules) along the lines of: don't assert mutable
+ environment/host identity as a fixed fact in a tracked/synced project file;
+ derive it at runtime (=uname -n= for host, etc.) and name the command.
+- Possibly a startup or codify-time lint that flags "this machine is <name>" /
+ "the current host is" style claims in CLAUDE.md.
+
+Sending the edited CLAUDE.md (attached separately) plus this note so the rulesets
+session can decide whether to codify the broader pattern. Proposal, not a
+directive — your value gate applies.
diff --git a/docs/design/2026-06-22-inbox-zero-capture-hardening.org b/docs/design/2026-06-22-inbox-zero-capture-hardening.org
new file mode 100644
index 0000000..69acf94
--- /dev/null
+++ b/docs/design/2026-06-22-inbox-zero-capture-hardening.org
@@ -0,0 +1,39 @@
+#+TITLE: inbox-zero Phase D wedges live org-capture sessions on the roam inbox
+
+* The bug
+
+Phase D of =inbox-zero.org= removes claimed items from =~/org/roam/inbox.org= by editing the file on disk (Edit / sed / Write). That collides with any live org-capture session Craig has open against the same file.
+
+org-capture works through an *indirect buffer* cloned from the target file. When the inbox-zero disk write lands and Emacs reverts the main =inbox.org= buffer underneath, the indirect capture buffers are left pointing at stale state and wedge — they can no longer finalize cleanly with =C-c C-c=. The visible symptom is org-capture failing, and one or more orphaned =CAPTURE-*inbox.org= / =CAPTURE-N-inbox.org= buffers piling up as Craig retries.
+
+Hit live on 2026-06-22 during a home-session inbox-zero: I filed three home items, wrote =inbox.org= on disk, and Craig's open capture wedged, leaving two orphaned =CAPTURE-inbox.org= buffers. No data was lost that time (the orphaned buffers held only existing file content, not a freshly-typed item), but that was luck — had he typed an item into the capture before it wedged, finalizing the stale buffer afterward would have written it back against the post-edit file and could have clobbered the routing or a foreign item.
+
+* Why it's worth fixing in the canonical
+
+=inbox-zero.org= is a rulesets-owned synced workflow that runs in every project (startup nudge, wrap-up sub-step, on demand), and the roam inbox is the single shared file all of them edit. Craig edits that same file live in Emacs and captures into it constantly. So the collision window recurs in every project, every session — not a home-only quirk. A local fix in home gets reverted by the next template sync, so the durable fix has to land in rulesets.
+
+* Proposed fix (recommended: guard before the disk write)
+
+Add a pre-edit guard to Phase D, before removing any claimed items:
+
+1. If Emacs is reachable (=emacsclient -e t= succeeds), check for live capture buffers targeting the roam inbox:
+
+ #+begin_src bash
+ emacsclient -e '(mapcar #(quote buffer-name)
+ (seq-filter (lambda (b) (string-match-p "CAPTURE.*inbox" (buffer-name b)))
+ (buffer-list)))' 2>/dev/null
+ #+end_src
+
+2. If any =CAPTURE-*inbox*= buffer exists, *stop before editing* and surface it: "You have a live org-capture session open against the roam inbox — finalize (=C-c C-c=) or abort (=C-c C-k=) it before I route items, otherwise the edit will wedge the capture." Resume Phase D only once it's clear. This mirrors the existing pull-before-edit / surface-and-stop discipline already in Phase D.
+
+3. Independently, when Emacs has =inbox.org= open and *unmodified* (the common case, no live capture), the disk edit is benign — Emacs reverts a clean buffer without complaint. Optionally trigger an explicit =revert-buffer= via emacsclient afterward so the buffer is immediately consistent rather than lazily on next focus.
+
+* Alternative (heavier): do the removal through Emacs when it's running
+
+Instead of editing on disk, when Emacs is reachable, perform the claimed-item removal inside the running daemon (find the buffer, delete the items, save), and fall back to the disk edit only when Emacs isn't running. This keeps Emacs's buffer authoritative and sidesteps the disk/buffer divergence entirely. It's more code and more failure surface for arbitrary item removal, so I'd lean on the guard above unless you want the stronger guarantee.
+
+* Note for whoever builds it
+
+The =emacs.md= rule already covers "don't make Craig restart Emacs; push changes into the running daemon." This is the same principle one layer out: don't edit a file *on disk* that the running daemon is actively editing/capturing into. Worth a line in =emacs.md= too, or at least a cross-reference from inbox-zero Phase D.
+
+Origin: home, 2026-06-22.
diff --git a/docs/design/2026-06-23-install-lang-claude-md-gap.org b/docs/design/2026-06-23-install-lang-claude-md-gap.org
new file mode 100644
index 0000000..cf16256
--- /dev/null
+++ b/docs/design/2026-06-23-install-lang-claude-md-gap.org
@@ -0,0 +1,31 @@
+#+TITLE: install-lang CLAUDE.md gap — non-elisp projects get a wrong or missing CLAUDE.md
+#+DATE: 2026-06-23
+
+Surfaced while running the archangel .ai/ conversion you sent (the 2026-06-20 handoff). archangel is a bash project — 437 =.sh= files; the only =.el=/=.py= in the tree are under =work/x86_64/airootfs/= (archiso staging, not source). Per your handoff I installed both elisp and python bundles. The result exposed two coupled issues that block CLAUDE.md consistency across projects.
+
+* The two findings
+
+1. *Only the elisp bundle ships a CLAUDE.md template.* =languages/elisp/CLAUDE.md= exists; =languages/python/=, =go/=, =typescript/= ship none. =install-lang.sh= guards on =[ -f "$SRC/CLAUDE.md" ]=, so a bundle without a template silently contributes nothing — no line printed, no file seeded.
+
+2. *No shell/bash bundle exists* (only elisp, go, python, typescript). archangel and archsetup are bash projects with no bundle that fits.
+
+* The consequence
+
+- Install python (or go/ts) alone → project gets *no* CLAUDE.md.
+- Install elisp + anything → project gets the elisp stub, whose first line is "Elisp project." Because install-lang seeds CLAUDE.md only on first install and never overwrites without FORCE=1, install order doesn't matter — the elisp template is the only one available, so it always wins.
+- Net: archangel, a bash project, ended up with a CLAUDE.md headed "Elisp project." An inaccurate CLAUDE.md is worse than none — it mislabels the project for every future session.
+
+* Proposals (rulesets' call)
+
+1. *Add a shell/bash language bundle.* This is the real gap for archangel/archsetup and any other shell-heavy project.
+2. *Give every bundle its own CLAUDE.md template*, or ship a language-neutral default so install-lang always seeds an accurate (or at least non-misleading) header. A stub that says "<LANG> project — customize this" is only safe when the bundle actually matches the language.
+3. *Consider the multi-bundle case* — when a project installs more than one bundle, the CLAUDE.md "Project" line shouldn't hardcode a single language picked by which template happened to exist.
+
+* Companion files to reconcile
+
+- =scripts/install-lang.sh= — the seed-on-first-install / no-overwrite logic (sections 3 and 3b) is correct; the gap is the missing templates and missing bash bundle, not this logic.
+- =languages/elisp/CLAUDE.md= — the only template today; pattern to replicate per language.
+
+* What archangel did locally (stopgap)
+
+Installed both bundles as you asked; the generic =.claude/rules/= and gitignore hygiene are the real gain there. I flagged the elisp-stub mismatch to Craig and offered to hand-write archangel's CLAUDE.md as a bash ISO-build project. That local fix doesn't address the cross-project pattern — hence this note.
diff --git a/docs/design/2026-06-23-wrap-teardown-shutdown-proposal.org b/docs/design/2026-06-23-wrap-teardown-shutdown-proposal.org
new file mode 100644
index 0000000..a47aa2d
--- /dev/null
+++ b/docs/design/2026-06-23-wrap-teardown-shutdown-proposal.org
@@ -0,0 +1,124 @@
+#+TITLE: Proposal — wrap-it-up teardown + "wrap it up and shutdown" variant
+
+* Source
+
+Raised by Craig in a home-project session, 2026-06-23, after talking the
+design through. Two related additions to =wrap-it-up.org=. Both touch the
+Claude-session lifecycle (workflow + hook + the =ai-term= buffer/tmux pair),
+so they're rulesets — with one companion piece that has to live in
+=.emacs.d/modules/ai-term.el= (flagged below). Originally floated as an
+archsetup task; archsetup owns the Hyprland/waybar layer, not the
+Claude-session lifecycle, so it was re-routed here.
+
+* Architecture this depends on (so the design is grounded)
+
+- =ai-term.el= (=.emacs.d=) is the in-Emacs launcher: a vertical-split vterm
+ buffer running a tmux session named =aiv-<project-basename>= (prefix
+ =aiv-=). Layering: =claude= process → tmux session =aiv-<proj>= → Emacs
+ vterm buffer.
+- Killing the tmux session takes the =claude= process with it, so "quit
+ Claude Code" is a *consequence* of killing =aiv-<proj>=, not a separate
+ step.
+- Hooks already exist under =~/.claude/hooks/= (e.g. =session-clear-resume.sh=,
+ =precompact-priorities.sh=) — the teardown trigger fits that pattern.
+- =sudo= is =NOPASSWD: ALL= on Craig's machines, so =sudo shutdown now= runs
+ unattended.
+
+* Item 1 — wrap-up also removes the buffer, quits Claude, removes the tmux session
+
+Recommend: yes, with one structural rule — the wrap-up runs *inside* the
+things it tears down, so teardown is self-terminating and must be the last,
+decoupled action, or the valediction may not flush before the session dies.
+
+Design:
+1. *Teardown lives in =ai-term.el=* (companion, see below): one function
+ =cj/ai-term-quit= that kills the =aiv-<proj>= tmux session (takes =claude=
+ with it), kills the vterm buffer, and restores the saved window geometry —
+ =ai-term.el= already owns the buffer↔session pair and the geometry logic.
+2. *Trigger from a Stop / SessionEnd hook, not inline.* Wrap-up does all its
+ git/archive work, delivers the valediction, then drops a sentinel (flag
+ file, e.g. =/tmp/ai-wrap-teardown-<session>=). The hook fires when Claude
+ finishes, sees the sentinel, and runs =cj/ai-term-quit= via =emacsclient=.
+ Decoupling guarantees the valediction lands before the session dies.
+3. *Gate on commit+push verified* — never tear down before the session record
+ is pushed (wrap-up's existing Step 4 / validation checklist already
+ enforces push; teardown is strictly after it).
+4. *Phrase split — teardown IS the default* (Craig's decision 2026-06-23).
+ Bare "wrap it up" does the full wrap AND removes the buffer/session/quits —
+ that's his typical case. The non-destructive variant gets the explicit
+ qualifier: "wrap it up with summary" summarizes + commits + pushes +
+ archives but keeps the buffer (no teardown), so the summary stays readable.
+ So: "wrap it up" → teardown; "wrap it up with summary" → no teardown;
+ "wrap it up and shutdown" → wrap + poweroff (supersedes teardown, Item 2).
+
+* Item 2 — "wrap it up and shutdown": 10-count then =sudo shutdown now=
+
+Recommend: yes, but the safety gate is load-bearing and the countdown has a
+rendering gotcha.
+
+Design:
+1. *"Only ai-term left" = hard blocking precondition*, evaluated BEFORE the
+ countdown. Count live sessions (=tmux ls | grep '^aiv-'= or
+ =pgrep -fc claude=). If more than this one is alive, ABORT the shutdown,
+ list what's running, and fall back to a normal wrap. Never power the box
+ off out from under another active Claude session. This is the most
+ important part of the item.
+2. *The live countdown can't run through Claude's tool output.* The Bash tool
+ buffers stdout until the command returns, so a =for i in $(seq 10 -1 1);
+ sleep 1= prints all ten at once at the end, not one per second. It has to
+ run detached or in Emacs:
+ - tty writer: =for i in $(seq 10 -1 1); do printf '\rShutting down in %2d…'
+ "$i" > /dev/tty; sleep 1; done; sudo shutdown now= (backgrounded), or
+ - an Emacs =run-at-time= timer printing 10→1 in the echo area, then
+ =(shell-command "sudo shutdown now")=.
+3. *Make it abort-able* (Ctrl-C / keypress cancels). A 10-second countdown's
+ whole purpose is a last-chance window; a non-cancellable one is just a
+ delay.
+4. *Sequencing.* "...and shutdown" supersedes Item 1's teardown — if the box
+ is powering off, killing the buffer/session first is moot. Wrap (commit +
+ push + archive) → session-count gate → countdown → =shutdown=.
+
+Packaging: a small rulesets bin script (e.g. =ai-wrap-shutdown=) doing the
+gate → abort-able countdown → shutdown, invoked by the workflow after the wrap
+commit/push. Countdown either in that script (tty) or handed to Emacs.
+
+* Companion — required change in =.emacs.d/modules/ai-term.el=
+
+Item 1's teardown function =cj/ai-term-quit= must live in =ai-term.el= (it
+owns =aiv-<proj>= session naming, the vterm buffer, and geometry restore).
+rulesets owns the workflow + hook + bin script that *call* it; =.emacs.d= owns
+the function itself. Spec for the =.emacs.d= side:
+
+- =cj/ai-term-quit (&optional project)= — resolve the =aiv-<basename>= session
+ for the current/!named project, =tmux kill-session= it, =kill-buffer= the
+ associated vterm buffer, restore saved geometry. Idempotent / no-op if the
+ session or buffer is already gone. Callable from =emacsclient -e= so the
+ Stop hook can invoke it headlessly.
+- (Optional) a count helper =cj/ai-term-live-count= so the Item-2 gate can ask
+ Emacs how many ai-term sessions are live, as an alternative to =tmux ls= /
+ =pgrep=.
+
+When rulesets builds the workflow/hook side, route this companion to
+=.emacs.d= (inbox-send) so the two land together.
+
+* Open decisions for Craig
+
+- Phrase set: DECIDED (2026-06-23) — "wrap it up" tears down (default);
+ "wrap it up with summary" wraps without teardown; "wrap it up and shutdown"
+ is the poweroff variant. Remaining nuance: confirm the exact non-destructive
+ qualifier wording is "with summary" (vs e.g. "and summarize").
+- Countdown home: tty-writer bin script vs Emacs timer. (Emacs timer reads
+ cleaner inside the vterm and is trivially abort-able.)
+- Session-count mechanism for the gate: =tmux ls=, =pgrep claude=, or
+ =cj/ai-term-live-count=.
+
+* Verify
+
+- Item 1: bare "wrap it up" → valediction renders fully, THEN buffer +
+ =aiv-<proj>= session + claude all gone, geometry restored; "wrap it up with
+ summary" → wrap completes but the buffer stays intact (no teardown).
+- Item 2 gate: with a second =aiv-*= session alive, "wrap it up and shutdown"
+ refuses, lists the other session, and does a normal wrap (no poweroff).
+- Item 2 happy path: sole session → 10→1 renders one-per-second, is
+ cancellable, then =shutdown= fires.
+- Teardown never runs before commit+push is verified.
diff --git a/docs/design/2026-06-27-bug-priority-matrix-cover-note.org b/docs/design/2026-06-27-bug-priority-matrix-cover-note.org
new file mode 100644
index 0000000..d6398e0
--- /dev/null
+++ b/docs/design/2026-06-27-bug-priority-matrix-cover-note.org
@@ -0,0 +1,5 @@
+#+TITLE: Cover note for the bug-priority-matrix-proposal.md just sent
+#+SOURCE: from emacs-wttrin
+#+DATE: 2026-06-27 23:57:30 -0400
+
+Cover note for the bug-priority-matrix-proposal.md just sent. WHAT: a Severity × Frequency bug-priority matrix (P1-P4) distilled from wttrin's priority scheme today. It derives a bug's priority from two facts (severity, frequency) instead of opinion, maps each cell to a release vehicle, and — for projects running the todo-format [#A]-[#D] scheme — has bugs derive their letter from the matrix while features keep their roadmap judgment. Includes a special-category rule (privacy/security/safety graded on severity alone) and a tie-in that makes a 'no open [#A]' release gate fact-based. WHY SEND: this is a generic defect-management pattern, not wttrin-specific, so it belongs upstream to distribute with any code-centric project (a claude-rule and/or a todo.org priority-scheme template snippet — your call on placement and whether it folds into todo-format.md or stands alone). TWO ASKS: (1) consider a companion NON-CODING matrix for work-product/ops/decision defects — the same frequency×severity shape likely generalizes beyond bugs; (2) treat this as a LIVING DOCUMENT — adjust bands and wording as projects use it until it's as mature as the coding matrix. SOURCING: the matrix is a standard industry pattern; I adapted a generic version and the wttrin worked-example. Nothing work-confidential is copied — please keep the upstream artifact generic and uncited to any private reference. wttrin keeps its local copy in todo.org as the stopgap until you distribute the canonical.
diff --git a/docs/design/2026-06-27-bug-priority-matrix-proposal.md b/docs/design/2026-06-27-bug-priority-matrix-proposal.md
new file mode 100644
index 0000000..4b49769
--- /dev/null
+++ b/docs/design/2026-06-27-bug-priority-matrix-proposal.md
@@ -0,0 +1,70 @@
+# Bug Priority — Severity × Frequency Matrix (proposed rule for code projects)
+
+Applies to: code-centric projects (a `:bug:`-tracking `todo.org`, an issue tracker, or any defect backlog).
+
+Status: proposed. Distilled from wttrin's priority scheme on 2026-06-27. Generic defect-management pattern — adapt the bands per project. Treat as a living document; refine the wording and bands as projects use it, the way wttrin's matrix will mature in use.
+
+## The problem
+
+Without a systematic scheme, bugs get prioritized by who's loudest, who has the most authority, or who's most frustrated. Most bugs drift to "medium," and no one can say what medium means. Whether a bug is "high priority" stays a matter of opinion, so the release decision becomes a judgment call every time.
+
+## The approach — derive priority from two facts
+
+Priority should fall out of two relatively objective factors, not an argument:
+
+1. Severity — how bad is it when the bug occurs? (Service down / data loss / security or privacy leak at one end; cosmetic nit at the other.)
+2. Frequency — how often will a user hit it? (Every user every time at one end; rare edge case at the other.)
+
+Put severity on one axis and frequency on the other. Each cell maps to a priority level; each level maps to a release vehicle.
+
+```
+| Frequency / Severity | Critical | Major | Minor | Cosmetic |
+|------------------------+----------+-------+-------+----------|
+| Every user, every time | P1 | P1 | P2 | P3 |
+| Most users, frequently | P1 | P2 | P3 | P4 |
+| Some users, sometimes | P2 | P3 | P3 | P4 |
+| Rare edge case | P2 | P3 | P4 | P4 |
+```
+
+Response per level:
+
+- P1 / Critical — fix in the current release or patch. Blocks a release.
+- P2 / High — next patch release.
+- P3 / Medium — next major release.
+- P4 / Low — backlog, fix when convenient.
+
+(A P0 / showstopper tier — all hands, emergency release — sits above P1 for projects that need it.)
+
+## Adapt the bands per project
+
+The axis labels are generic; each project defines what Critical/Major/Minor/Cosmetic and the frequency rows mean in its own terms. wttrin's worked example:
+
+- Critical: no weather at all on the common path (crash, hard error), wrong weather shown as if correct (silent data error), or a privacy/security leak.
+- Major: a feature broken but with a workaround.
+- Minor: degraded but usable.
+- Cosmetic: a pure visual nit.
+- Frequency rows map to default-config-every-time → common-action → feature/config-specific → unusual-input/environment.
+
+## Special category — severity alone
+
+Some defects don't correlate with frequency: privacy/security leaks, compliance violations, safety issues. Grade these on severity alone — one occurrence with the right consequences is a showstopper no matter how rarely it'd be noticed. (wttrin's worked case: a home address leaked into public git history — Critical on severity alone, scrubbed 2026-06-26.)
+
+## Integrating with a project's `[#A]`–`[#D]` priority scheme
+
+Projects that already run a letter-priority scheme (the rulesets `todo-format.md` convention) keep it for features — deciding feature X is more urgent than feature Y is a legitimate roadmap judgment. Bugs are different: their letter should be derived, not argued.
+
+Rule: a `:bug:`'s letter comes from the matrix, not a choice.
+
+- P1, P2 → `[#A]` (release-gating)
+- P3 → `[#C]` (scheduled fix, has a workaround)
+- P4 → `[#D]` (backlog)
+
+(The exact letter mapping is a per-project knob; this is wttrin's.) Features keep their `[#A]`–`[#D]` judgment as a roadmap call; only bugs derive their letter from the matrix.
+
+## Tie-in to release criteria
+
+When a project gates a release on a soak or a "no open [#A]" rule, the matrix makes that gate fact-based: "no open [#A] bugs" becomes "no open P1/P2," which the matrix produces. A severity-graded soak follows naturally — a P1/P2/P3 found in the window resets the clock and must be fixed; a P4 logs to backlog without resetting.
+
+## The payoff
+
+Once each priority maps to a release vehicle, you've pre-decided most issues and only discuss the genuine edge cases. The release decision stops being a soul-search and becomes arithmetic. When there's disagreement on a bug's priority, the matrix wins.
diff --git a/docs/design/2026-06-29-green-baseline-proposal.org b/docs/design/2026-06-29-green-baseline-proposal.org
new file mode 100644
index 0000000..47de18d
--- /dev/null
+++ b/docs/design/2026-06-29-green-baseline-proposal.org
@@ -0,0 +1,72 @@
+#+TITLE: Proposal: ensure a green test run before starting work
+#+AUTHOR: Craig Jennings (via .emacs.d session)
+#+DATE: 2026-06-29
+
+* Why
+
+While running a multi-task refactor speedrun in =.emacs.d=, the very first full
+=make test= surfaced a failing test (=test-system-cmd-restart-emacs-no-service-aborts=)
+that turned out to be *pre-existing* -- it failed on clean HEAD, unrelated to
+the work. It had nothing to do with the task; it just happened to fail on this
+machine (a native-comp mock that bypasses =symbol-function= redefinition, real
+check passing because the box has =emacs.service=).
+
+Two costs landed because the red was already there when work began:
+
+1. Every later "did I break this?" suite run carried a known failure, so the
+ green bar became "only that one fails" instead of a clean pass I could read
+ at a glance. Easy to let a *new* regression hide behind the familiar red.
+2. The work assumed the tree was in a known-good state. It wasn't, and nothing
+ in the workflow forced that assumption to be checked first.
+
+The fix is cheap and general: run the suite *before* starting work, and clear
+(or explicitly triage) any failure before the work begins. A green start
+confirms we're in the known-good place we think we are, and any issue is fixed
+before it can be confused with our own changes.
+
+* Proposed change 1 -- claude-rules/verification.md
+
+Add a section (suggested placement: right after =## The Rule=, before
+=## What Fresh Means=). Proposed text, ready to paste:
+
+#+begin_example
+## Green Baseline Before Starting Work
+
+Run the test suite before you start work, not only before you finish. A clean run at the start confirms the tree is in the known-good state you assume it is, so the baseline you build on and measure your changes against is actually green.
+
+If the suite is red before you touch anything, fix or explicitly triage the failure first. A pre-existing failure left in place poisons every later "did I break this?" check: you can't separate your own regressions from the noise, and the end-of-work run stops being readable as pass/fail at a glance. Work that assumes a known-good base may also be built on a broken assumption you never saw.
+
+When a pre-existing failure genuinely can't be fixed before the work begins (out of scope, or it needs a decision), record it as a tracked task with the diagnosis and carry its name forward. The green bar for the rest of the work is then explicitly "only this known failure remains," not a silent tolerance for red.
+
+This is the start-of-work counterpart to the Before Committing gate below: one confirms the ground is solid before you build, the other confirms you didn't crack it.
+#+end_example
+
+* Proposed change 2 -- start-work skill, Pre-work phase
+
+The start-work skill already has a Pre-work phase (eligibility, fetch-and-reconcile
+against base, source-code check that the problem still exists). Add a green-baseline
+step to that phase:
+
+- Run the project's test suite before claiming the work.
+- If it's fully green, proceed.
+- If it's red, fix the failure first, or (when out of scope / needs a decision)
+ file a tracked task with the diagnosis and carry its name forward as the only
+ tolerated failure for this work.
+- Surface the baseline result so "we started from green" is on the record.
+
+This makes the verification.md principle operational at the exact moment it
+matters -- the start of a task -- the same way the Verify phase and the
+Review-and-Publish flow operationalize the end-of-work gates.
+
+* Note on why this came as a proposal, not a direct edit
+
+=.emacs.d='s =.claude/rules/*.md= are symlinks into =~/code/rulesets/claude-rules/=,
+so editing =verification.md= from the downstream session would modify the
+rulesets canonical directly. Per the cross-project rule, downstream sessions
+send rulesets the proposed change rather than editing its canonical in place.
+Hence this note instead of a local stopgap edit. The start-work skill isn't
+installed on this machine to edit anyway.
+
+The test that surfaced all this is already fixed in =.emacs.d= (commit on main:
+mock =executable-find= at the boundary instead of the helper). The durable
+process change is the part that belongs here.
diff --git a/docs/design/2026-06-29-lint-org-structural-checkers-proposal.org b/docs/design/2026-06-29-lint-org-structural-checkers-proposal.org
new file mode 100644
index 0000000..c464aca
--- /dev/null
+++ b/docs/design/2026-06-29-lint-org-structural-checkers-proposal.org
@@ -0,0 +1,55 @@
+#+TITLE: lint-org.el — four structural heading checkers org-lint doesn't cover
+
+* What changed (from .emacs.d, 2026-06-29)
+
+Added four custom judgment checkers to =lint-org.el=, following the existing
+=lo--check-tables= / =lo--check-level2-dated-headers= pattern (custom scans run
+after the org-lint pass, emitting judgment items, never auto-fixed):
+
+- =indented-heading= — a line of whitespace + stars + space OUTSIDE any block.
+ org parses a heading only at column 0, so leading whitespace silently demotes
+ a would-be heading to body text: the task vanishes from the agenda and never
+ archives. The worst defect class (an invisible task) and entirely silent
+ today. Skips indented stars inside =#+begin_/#+end_= blocks (legit content).
+- =empty-heading= — a line of bare stars with no title.
+- =malformed-priority-cookie= — a =[#x]=-shaped token org rejected (lowercase,
+ multi-char, non-letter) left stranded where a cookie would be. Checks only the
+ first cookie token per heading; skips verbatim-wrapped =[#D]= in dated-log
+ titles.
+- =level2-done-without-closed= — a level-2 DONE/CANCELLED with no CLOSED line.
+ Directly supports the todo-cleanup aging step (sent separately today): an
+ undated completed task gets force-archived immediately, so flagging it lets
+ the human add CLOSED first.
+
+Two attached files (edited canonical candidates): =lint-org.el=,
+=tests/test-lint-org.el=.
+
+* Why
+
+org-lint validates links, drawers, blocks, and babel — but NOT heading
+well-formedness. On Craig's .emacs.d todo.org a missing org-bullet in the live
+buffer prompted the question "is the file structurally okay?", and org-lint
+(even unfiltered, all checkers) reported nothing actionable. These four close
+the gap. They are general (any org file), not project-specific.
+
+* Design notes for the canonical
+
+- All four are regex-based, NOT org-element/keyword-based, so they don't depend
+ on which TODO keywords the batch Emacs happens to recognize (lint-org.el does
+ not set =org-todo-keywords=). The =level2-done-without-closed= done set is a
+ defconst =lo-done-keywords= (DONE/CANCELLED) for easy extension.
+- *Gotcha worth carrying in the canonical:* =case-fold-search= defaults to t, so
+ a naive =[A-Z]= cookie check accepts =[#a]= as valid and =\(DONE\|CANCELLED\)=
+ matches the title words "done"/"cancelled". Both letter-sensitive checkers
+ bind =case-fold-search nil=. (Caught by a failing test before it shipped.)
+- Wired into =lo-process-file= after =lo--check-level2-dated-headers=. Judgment
+ output already flows through the existing report + followups-file machinery.
+- 8 new ERT tests (good-input-silent + bad-input-flagged for each, plus
+ block-skip and verbatim-skip boundary cases). 44/44 green. Zero false
+ positives on a real 5600-line todo.org.
+
+* Note
+
+=make task-sorted= in .emacs.d now runs =lint-org.el todo.org= after the
+archive, so these checkers also gate the task-hygiene target. Makefiles aren't
+template-synced; that wiring is project-local (noted for context).
diff --git a/docs/design/2026-06-29-todo-cleanup-aging-proposal.org b/docs/design/2026-06-29-todo-cleanup-aging-proposal.org
new file mode 100644
index 0000000..5a18990
--- /dev/null
+++ b/docs/design/2026-06-29-todo-cleanup-aging-proposal.org
@@ -0,0 +1,64 @@
+#+TITLE: todo-cleanup.el — add Resolved-section file-aging to --archive-done
+
+* What changed (from .emacs.d, 2026-06-29)
+
+Extended =todo-cleanup.el='s =--archive-done= mode (the =make task-sorted=
+target) with a SECOND step, run after the existing Open Work -> Resolved move:
+
+- *Age the Resolved section.* Level-2 DONE/CANCELLED subtrees whose CLOSED date
+ is older than =tc-archive-retain-days= (default 7) — AND any with no parseable
+ CLOSED date — move out of the in-file Resolved section to =tc-archive-file=
+ (default =archive/task-archive.org= beside the todo file). Only tasks closed
+ within the last week stay in todo.org itself.
+
+Two files are attached (the edited canonical candidates):
+- =todo-cleanup.el=
+- =tests/test-todo-cleanup.el=
+
+* Why
+
+Craig's .emacs.d todo.org had grown to 768KB / 9616 lines, ~44% of it a
+243-task in-file "Resolved" section. The existing =--archive-done= only moved
+closures Open Work -> Resolved (same file), so the file grew without bound. The
+new step keeps only the last week of closed tasks in the file and sheds the rest
+to a git-tracked archive sibling. After this run: 207 aged out, todo.org
+9616 -> 5625 lines.
+
+* Design notes for the canonical
+
+- New defvars: =tc-archive-retain-days= (7; nil disables the step, preserving
+ legacy in-file-only behavior), =tc-archive-reference-date= ((YEAR MONTH DAY),
+ nil=real today — mockable for deterministic tests), =tc-archive-file= (nil =>
+ =archive/task-archive.org= beside the todo file).
+- Policy: KEEP iff CLOSED date present AND within the window (cutoff inclusive).
+ Older OR undated => archive. The undated->archive call is deliberate ("keep
+ the last week and that's it"); an earlier undated->keep version left 14 legacy
+ undated tasks behind and read as two weeks.
+- The aging step honors =--check= (previews + reports, writes nothing).
+- Report: an additive "N aged subtree(s) moved to task-archive.org" line, only
+ when N>0, so the existing real-mode-no-op silence tests are unaffected.
+- Archive file scaffold: =#+TITLE: Task Archive= / =#+FILETAGS: :archive:= /
+ =* Resolved (archived)=; aged subtrees append as level-2 children; created on
+ first use, appended to thereafter (one scaffold, never duplicated).
+- Tests: =tc-test--reset= now sets the aging knobs OFF (retain nil) so the
+ existing in-file-move + sync tests are untouched by the wall clock; a new
+ =tc-test--age= harness re-enables them with a fixed reference date and a temp
+ archive file. 6 new tests (old+undated move, cutoff-inclusive stay, disabled,
+ idempotent, check-no-write, straggler pipeline, append-preserves). 34/34 green.
+
+* Cross-project consideration for your value gate
+
+Default is ON (retain 7) for ALL consuming projects once this syncs. A project's
+first =task-sorted= after the sync will shed everything in its Resolved section
+older than a week to a new =archive/task-archive.org=. That's the intended
+feature, but flag it — projects with a large historical Resolved section will see
+a big first-run move (git-tracked, recoverable). Adjust the default or gate it if
+you'd rather it be opt-in per project.
+
+* Companion (project-local, NOT synced)
+
+.emacs.d's Makefile =task-sorted= target now also runs =lint-org.el todo.org=
+after the archive, as a structural-safety pass (org-lint catches links/drawers/
+blocks; we separately verified heading-level structure by hand). Makefiles aren't
+template-synced, so this is per-project — noting it in case the pattern is worth
+documenting alongside the tool.
diff --git a/docs/design/2026-06-30-daily-drivers-tailscale-correction.org b/docs/design/2026-06-30-daily-drivers-tailscale-correction.org
new file mode 100644
index 0000000..9e2bb52
--- /dev/null
+++ b/docs/design/2026-06-30-daily-drivers-tailscale-correction.org
@@ -0,0 +1,9 @@
+#+TITLE: Correction to claude-rules/daily-drivers.md: it states 'the
+#+SOURCE: from .emacs.d
+#+DATE: 2026-06-30 13:20:30 -0400
+
+Correction to claude-rules/daily-drivers.md: it states 'the agent can't reach the other machine; the point is to surface...'. That assumption is now demonstrably false. On 2026-06-30, from velox, I drove ratio directly over tailscale ssh — git fetch + reset --hard to repair ratio's .emacs.d after a history rewrite, plus scp'd a file across. Both daily drivers are on the same tailnet and reachable from each other.
+
+Gotcha worth capturing in the rule: the BARE hostname does not resolve (ssh ratio -> 'Could not resolve hostname'), which makes it look unreachable. The tailscale IP (e.g. 100.71.182.1) and the MagicDNS name (ratio.tailf3bb8c.ts.net) DO resolve and connect. First connection from a given address fails host-key verification under BatchMode; -o StrictHostKeyChecking=accept-new clears it. 'tailscale status' lists every node's IP + online state.
+
+Suggested rule change: reframe daily-drivers.md from 'can't reach, so surface it' to 'CAN reach over tailscale ssh — so the agent can directly sync/verify/repair the other daily driver, not just flag it'. Keep the flag-it guidance as the fallback for when tailscale is actually down. Add the bare-hostname-doesn't-resolve / use-tailscale-IP-or-MagicDNS gotcha. uname -n still tells you which machine you're on.
diff --git a/docs/design/2026-07-02-auto-flush-mechanism-note.org b/docs/design/2026-07-02-auto-flush-mechanism-note.org
new file mode 100644
index 0000000..fbe06ae
--- /dev/null
+++ b/docs/design/2026-07-02-auto-flush-mechanism-note.org
@@ -0,0 +1,20 @@
+#+TITLE: AUTO-FLUSH capability — proven live in the archsetup session
+#+SOURCE: from archsetup
+#+DATE: 2026-07-02 01:26:20 -0400
+
+AUTO-FLUSH capability — proven live in the archsetup session 2026-07-02, Craig asks that it be promoted to all projects and recommended as part of the no-approvals speedrun to keep sessions sharp.
+
+Problem: /clear is a user-only keystroke, so long autonomous sessions either bloat or hit arbitrary auto-compaction. Craig can't always be around to type it.
+
+Mechanism (companion script: self-inject.sh, sent separately to this inbox):
+1. At a clean task boundary, the agent refreshes .ai/session-context.org exactly as the flush skill does (checkpoint with Active Goal / Decisions / Next Steps).
+2. It derives its own tmux pane: match pane_pid from 'tmux list-panes -a' against its process ancestry (the ai launcher runs every agent session inside tmux, so this holds everywhere).
+3. It arms the injection VIA THE TMUX SERVER — tmux run-shell -b "sleep 25; tmux send-keys -t %N -l '/clear'; tmux send-keys -t %N Enter; sleep 15; tmux send-keys -t %N -l 'go — auto-flush resume: read .ai/session-context.org and continue per Next Steps'; tmux send-keys -t %N Enter" — and immediately ends its turn so the prompt is idle when the keys land.
+4. /clear fires the SessionStart hook (which already points a fresh context at notes.org + session-context.org), and the injected resume line starts the next turn. Zero human keystrokes.
+
+Gotchas learned the hard way:
+- A detached child (setsid/nohup/&) of a tool call DIES when the tool call ends; only tmux run-shell -b (server-owned) survives the turn boundary.
+- Under run-shell the process is a child of the tmux server, so ancestry-based pane detection can't run there — derive the pane first from the agent's shell, pass it explicitly.
+- Collision: if the user is typing when the keys fire, the injection merges into their input (a real /clear became '/clearto' mid-word). Fine for unattended sessions; warn the user to keep hands off the armed window if present.
+
+Suggested integration: an 'auto' mode on the flush skill (checkpoint, then self-inject instead of prompting the user), plus a line in the no-approvals speedrun workflow to auto-flush at clean boundaries when context grows heavy. The script could live in claude-templates' .ai/scripts/ so every project gets it on sync.
diff --git a/docs/specs/2026-06-16-autonomous-batch-execution-spec.org b/docs/specs/2026-06-16-autonomous-batch-execution-spec.org
new file mode 100644
index 0000000..23fc574
--- /dev/null
+++ b/docs/specs/2026-06-16-autonomous-batch-execution-spec.org
@@ -0,0 +1,393 @@
+#+TITLE: Autonomous-Batch Task Execution — Spec
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-06-16
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* IMPLEMENTED Autonomous-Batch Task Execution — Spec
+:PROPERTIES:
+:ID: 90f623cd-fdbe-4f5c-b63d-b2f84d9151cf
+:END:
+- 2026-07-02 Thu @ 05:26:07 -0400 — DOING → IMPLEMENTED: all six phases built (work-the-backlog.org, both callers, the waiver gate, checklist/Q&A/page mechanics, metrics record, KB synthesis) and the live trial validated — run c726f526, 3/3 tasks as reviewed commits with the pre-flight Q&A, page, and metrics all exercised. Craig confirmed and granted :LOOP_MAY_COMMIT:.
+- 2026-07-02 Thu @ 00:44:59 -0400 — READY → DOING: spec-response decomposition ran — the speedrun build parent in todo.org carries the :SPEC_ID: binding, one task per phase (1-6) plus the live-trial validation and the flip-to-IMPLEMENTED task. Phase 0 had already landed 2026-07-01.
+- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to READY (evidence-based, human-confirmed)
+
+* Metadata
+| Status | implemented |
+|----------+--------------------------------------------------------------------|
+| Owner | Craig Jennings |
+|----------+--------------------------------------------------------------------|
+| Reviewer | Craig Jennings |
+|----------+--------------------------------------------------------------------|
+| Date | 2026-06-16 |
+|----------+--------------------------------------------------------------------|
+| Related | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:../design/2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] |
+|----------+--------------------------------------------------------------------|
+
+* Summary
+
+Two proposals arrived within a day of each other describing the same capability: have Claude work a batch of small, well-marked tasks autonomously, with a full quality bar per task and no per-step approval gate. The inbox-zero "Phase E" proposal drives it from a tag/priority query on a recurring loop; the "speedrun" proposal drives it from an explicit ordered list a human dictates in-session. This spec reconciles both into one feature: a single dedicated workflow, =work-the-backlog.org=, that holds the task-execution logic, with two thin callers feeding it. It also designs the instrumentation that measures whether the autonomy is actually paying off.
+
+* Problem / Context
+
+Craig has a standing backlog of small, solo-doable fixes across several projects, already marked with a tag convention (=:next:=, =:quick:+:solo:=). Doing them by hand one at a time is the bottleneck — the context-switch and the per-commit approval ceremony dominate the actual work. He wants Claude to burn these down unattended: on a recurring loop for the routed inbox case, and on demand when he batches a named list and says "speedrun, no approvals until done." The speedrun is the away-from-desk / working-on-something-else mode, so it must be able to take on larger tasks too — not only sub-30-minute ones — or it forces him to stay at the desk for anything non-trivial.
+
+Two separate proposals tried to answer this:
+
+- *Phase E* (in =inbox-zero.org=, edited in =.emacs.d= as a stopgap) bolted autonomous execution onto the inbox-zero workflow's on-demand and loop callers. The sender flagged the seam as the open question: coupling capture-routing with autonomous-implementation pollutes inbox-zero's three existing callers (startup, wrap-up, on-demand), two of which must never execute anything.
+- *speedrun* (a =.emacs.d= theme-studio session that worked well) is the same execution loop driven by an explicit ordered task set, with end-of-set paging and always-push.
+
+They overlap almost entirely. The execution loop — eligibility gate, act-vs-file decision, per-task quality bar, bounded run — is identical. Only the *input* differs (tag query vs explicit list) and the *session mode* differs (loop default vs no-approvals + always-push + page). Building them as two features would duplicate the execution logic and let the two copies drift. The forces: keep inbox-zero's callers clean, share one execution loop, and make the autonomy safe enough to run unattended on a 30-minute timer without Craig watching.
+
+A second, explicit ask from Craig: instrument this so its effectiveness is measurable. "Gather data on this and create some org-roam articles we can look at later." Autonomous execution that silently makes bad commits is worse than no autonomy; the only way to know which it is, is to measure tasks completed vs deferred vs reverted, and human corrections in the following session, over time.
+
+* Goals and Non-Goals
+
+** Goals
+- One workflow, =work-the-backlog.org=, owns the task-execution loop. Both input shapes (tag query, explicit list) and both session modes feed it.
+- inbox-zero's three existing callers stay clean: the loop caller chains into =work-the-backlog= *after* routing; startup and wrap-up never touch it.
+- The *no-approvals speedrun* is a thin named preset, not a second implementation: autonomous-commit + always-push + end-of-set page, fed an explicit ordered list, with all approvals front-loaded into a single pre-flight step (below) so the run itself is uninterrupted.
+- Eligibility is decided by *crisp, checkable criteria*, not adjectives: a mechanical tag/status gate (=:solo:= + status =TODO=), then a per-task defer checklist whose keystone is "can I write the failing test from the task text without inventing a requirement?" Task *size* is explicitly not a gate — a large task is decomposed into per-logical-commit chunks, not deferred.
+- The autonomy tags (=:solo:=, =:quick:=) carry hard definitions in =todo-format.md= and are applied + enforced as a mandatory step in the task-review and task-audit workflows, so the run-time gate trusts the author's tag instead of re-deriving it.
+- Commit autonomy defaults to file-only (surface a diff, no auto-commit). A project opts into autonomous commit+push explicitly via its per-project waiver.
+- Hard guardrails: refuse any task carrying data-loss / irreversible / external-state risk without a checkpoint; gather any one-or-two quick decisions a task needs *up front* (speedrun) rather than guessing; file a =VERIFY= for anything underspecified or needing design deliberation; a per-run cap / kill switch beyond "one task per run."
+- A lightweight per-run metrics log plus a periodic synthesis step that writes org-roam KB articles summarizing the trend.
+
+** Non-Goals
+- *Not* a replacement for =/start-work=. Tasks needing deliberation or design stay with =/start-work= and its approval gates. This feature only touches the marked, solo set — regardless of size.
+- *Not* a new tag convention. It reads the project's own priority/tag scheme header; it never invents or hardcodes tags across projects.
+- *Not* an inbox-routing change. =inbox-zero.org= keeps its A-D phases. The Phase E text added in =.emacs.d= as a stopgap is *removed* and its logic moves here.
+- *Not* a multi-project orchestrator. One run works one project's backlog. Cross-project handoff stays with =inbox-send= and the paging reply.
+- *Not* a credential-handling or external-API feature. Tasks that touch secrets or external mutations are out of the eligible set by the guardrail.
+
+** Scope tiers
+- *v1:* =work-the-backlog.org=; crisp =:solo:= / =:quick:= definitions in =todo-format.md= plus their mandatory application in task-review and task-audit; the eligibility gate (=:solo:= + status =TODO=, read against the project's scheme header); the act-vs-file *defer checklist* (test-writability keystone, enumerated data-loss list, already-satisfied, design-deliberation); the no-approvals speedrun's pre-flight decision-gathering step; file-only commit default with per-project opt-in; the loop caller wiring and inbox-zero Phase E removal; the speedrun preset with end-of-set =notify --persist= page; the per-run metrics log (structured JSONL).
+- *Out of scope:* a token-budget kill switch (cap is a task count in v1); cross-project batch runs; a dashboard or live UI over the metrics.
+- *vNext (log to todo.org):* the periodic org-roam synthesis step if it doesn't make v1; a token/cost budget alongside the task-count cap (more pressing now that task size is uncapped — a single large task can run long in the unattended loop); auto-detection of "human corrected my autonomous commit" from the next session's diff.
+
+* Design
+
+** Overview
+
+The architecture is one execution workflow with two callers and one preset, plus an instrumentation sidecar.
+
+#+begin_example
+ inbox-zero loop caller ──(after Phase D routing)──┐
+ ├──▶ work-the-backlog.org ──▶ metrics log (JSONL)
+ no-approvals speedrun ──(explicit ordered list)──┘ │
+ = pre-flight Q&A + autonomous-commit + push + page ▼
+ periodic synthesis ──▶ org-roam KB articles
+#+end_example
+
+=work-the-backlog.org= is the only place the execution loop lives. It takes a *task set* (however assembled) and a *session mode* (which gates commit autonomy and paging), and works the set under a fixed safety contract. The two callers differ only in how they build the task set and which session mode they pass.
+
+This is the seam the Phase E sender asked for: separating capture-routing (inbox-zero) from autonomous-implementation (work-the-backlog) keeps inbox-zero's startup and wrap-up callers — which must never execute anything — untouched. The loop caller is the only one of inbox-zero's callers that chains forward into execution, and it does so as an explicit second step after routing completes, not as a phase buried inside inbox-zero.
+
+** The execution loop (two-altitude: caller's view)
+
+A caller hands =work-the-backlog= three things:
+
+1. *A task set* — either an explicit ordered list of task headings (speedrun), or the result of a tag/priority query against =todo.org= (the loop). The workflow does not care which; it receives an ordered list of candidate tasks.
+2. *A session mode* — =file-only= (default) or =autonomous-commit= (requires the project's per-project waiver), and a paging flag.
+3. *A run cap* — the maximum number of tasks to complete this run.
+
+It returns: per-task outcome (implemented+committed / implemented+diff-surfaced / deferred-VERIFY / dropped-by-craig / skipped-ineligible), and a metrics record per task.
+
+** The execution loop (implementer's view)
+
+For the task set, in order, until the run cap is hit:
+
+1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
+2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist.
+3. *Defer checklist* (below). Any hit → record the deferral reason (or, under the speedrun preset, route the quick-question gap to the pre-flight Q&A), next task.
+4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important, then close the task per =todo-format.md=. Decompose into as many logical commits as the change needs — size is not capped.
+5. *Commit autonomy branch:*
+ - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
+ - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
+6. *Record metrics* for the task (the JSONL append, below).
+7. Decrement the cap. At zero, stop.
+
+After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary.
+
+** Eligibility gate (mechanical — no judgment)
+
+A task is autonomous-safe when *both* hold. This layer is a lookup, not a judgment; all the judgment lives in the defer checklist below.
+
+1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= is the "awaiting Craig's manual confirmation" marker; auto-implementing one defeats the manual check it represents. The do-not-implement set is safe-by-omission: anything not plainly =TODO= (plus any project-declared "hold" marker) is out.
+2. *Tagged =:solo:=* — the autonomy tag, resolved against the project's priority/tag scheme header (not hardcoded). =:solo:= carries a hard definition (see Tag definitions, below): the task is completable without Craig's involvement beyond at most one or two quick decisions answerable up front, with no design deliberation. A project whose scheme declares a different autonomous-safe tag set overrides the default. Priority / =:next:= drive *ordering* within the eligible set, not eligibility.
+
+Task *size* is deliberately absent from this gate. The old "≤ ~30 minutes / one logical commit" criterion is removed: a large but well-specified, decision-free task is in scope and is decomposed into per-logical-commit chunks during implementation. Size never sends a task to =/start-work=; only *deliberation* or *risk* does (the checklist below). This is what makes the speedrun usable as an away-from-desk mode rather than a sub-30-minute-only mode.
+
+*** Tag definitions (land in =todo-format.md=, enforced in task-review + task-audit)
+
+- *=:solo:= — autonomy.* The task can be completed without Craig's involvement, except for at most one or two quick decisions that can be stated and answered before the run starts. No open design question, no "weigh these approaches," no waiting on Craig mid-task. This is the eligibility tag.
+- *=:quick:= — effort hint only.* A small, fast task. Informational for batching and estimating a run's duration; *not* an eligibility gate (size no longer gates).
+
+Both tags are applied at task creation and *re-checked as a mandatory step* in the task-review and task-audit workflows, so the run-time gate can trust the author's tag rather than re-derive autonomy and effort from the task body. A task-review or task-audit that skips the =:solo:= / =:quick:= assessment is incomplete.
+
+** Act-vs-file decision (the defer checklist)
+
+After the scope read, run each eligible candidate through the checklist below. Each item is a concrete, answerable question, not an adjective. *Any* hit — or any "unsure" — sends the task to defer (or, for a quick-decision gap under the speedrun preset, to the pre-flight Q&A). Only a task that clears every item is implemented.
+
+1. *Test-writability (the keystone).* Can I write the failing test from the task text — plus any decisions gathered up front — without inventing a requirement? *No / unsure* → underspecified. Under the speedrun preset, if the gap is one or two quick answerable questions, route it to the pre-flight Q&A; otherwise file a =VERIFY= noting what's missing. Under the unattended loop, file the =VERIFY= (no one to ask). This replaces the old "clear / bounded / underspecified" adjectives with an action that fails loudly: if the red test isn't writable, the task isn't ready.
+2. *Data-loss / irreversible / external operation.* Does implementing it require any of: =rm= of non-scratch data, =git reset --hard= / force-push, =DROP= / =DELETE= / =TRUNCATE=, file truncate/overwrite of persisted content, a schema or data migration, any external or shared-state mutation, any credential touch? *Yes* → do NOT implement; file a =VERIFY= naming the risk. This is the hard safety gate; an upfront answer never overrides it without an explicit checkpoint. Replaces the vague "data-loss risk" with an enumerated, greppable set.
+3. *Already-satisfied.* Does the scope read show the desired end-state already holds? *Yes* → file a =VERIFY= noting it (the "raise max spans to 5 — every cap was already 8" case) and move on. Don't make a no-op change.
+4. *Design deliberation.* Does the task carry an unresolved design question, a "weigh these approaches" with real tradeoffs, or a TBD that isn't a quick factual answer? *Yes* → under the speedrun preset, if it collapses to one or two quick questions, route to pre-flight Q&A; otherwise file and surface as a =/start-work= candidate. Under the loop, file. The discriminator is now *quick-answerable question* vs *deliberation* — not task size.
+
+A task that clears 1–4 is implemented under the project's commit discipline, decomposed into as many logical commits as the change needs. When genuinely unsure which side a task falls on, defer — a wrong auto-implement costs a revert *and* the next-session correction the metrics are designed to catch.
+
+** Pre-flight decision gathering (the no-approvals speedrun's only interaction)
+
+The speedrun preset front-loads every approval into one step before the run, so the run itself is uninterrupted — that is what "no approvals" means. It is *not* "no input ever"; it is "all input first, then hands-off."
+
+When Craig kicks off a speedrun over an explicit list:
+
+1. *Gather* the named task set.
+2. *Scope-read and classify* each task against the eligibility gate + defer checklist: ready (clears the checklist), needs-quick-decisions (one or two upfront-answerable questions — checklist item 1 or 4), or drop (data-loss / irreversible, or design deliberation that isn't a quick question).
+3. *Order* the list (priority, then the author's ordering / =:next:=).
+4. *Intro the work* — present the ordered plan: what will run, what was dropped and why, and the batched questions for the needs-quick-decisions tasks.
+5. *Craig answers each question, or says "skip this"* → a skipped task is removed from the run (recorded =dropped-by-craig=); an answered task has the answer recorded so implementation works from the decision, not a guess.
+6. *Run the finalized list autonomously* — no further approvals until done.
+7. *End-of-set page* with completed + remaining + skipped.
+
+The unattended *loop* caller has no human at kickoff, so it cannot gather decisions: there, a needs-quick-decisions task simply defers (files its note) like any other checklist hit. The pre-flight Q&A is a speedrun-preset capability, not a loop one.
+
+** Session modes and the no-approvals speedrun preset
+
+Two orthogonal session-mode dimensions feed the loop:
+
+- *Commit autonomy:* =file-only= (default) or =autonomous-commit=. =autonomous-commit= is honored only when the project carries the per-project waiver (=.emacs.d= and =rulesets= have it; most projects do not). Absent the waiver, a request for =autonomous-commit= degrades to =file-only= and says so.
+- *Paging:* on or off. End-of-set only.
+
+The *no-approvals speedrun* is the named preset = =autonomous-commit= + always-push + paging-on, fed an *explicit ordered list*, run after the pre-flight decision-gathering step above. It is not a separate code path; it is a label for that combination of mode flags plus the explicit-list input, with the pre-flight Q&A as its only interactive moment. The loop caller, by contrast, runs =file-only= (unless the project has the waiver and opts the loop into commits) with paging off, fed the *tag query*, with no pre-flight step.
+
+** Bounding the run and the kill switch
+
+Default cap: one task per run for the loop caller — implement the highest-priority eligible candidate (=[#A]= before =[#B]= before =[#C]=), record, then stop and let the next tick continue. The speedrun preset works the whole explicit list in order (the human bounded it by naming it), still one commit per logical change.
+
+The kill switch is a hard per-run task cap passed by the caller, independent of "one per run": even the speedrun stops at the cap and pages with the remainder listed. A loop that fires every 30 minutes and commits unattended needs a ceiling that a runaway can't exceed. With task size now uncapped, the count cap no longer bounds *cost* — a single large task can run long — so a token/cost budget is the most pressing vNext addition.
+
+** End-of-set paging
+
+When the set is done (or the cap is hit), if paging is on, fire one page — end-of-set only, never per-task:
+
+#+begin_src sh
+notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>" --persist
+#+end_src
+
+=--persist= keeps it on screen until dismissed (the page-me convention). The message carries the project name, the completed count, and the remaining count, so Craig can reply confirming ready + naming the next project in one turn. The page-signal wrapper removed 2026-06-12 is reconciled to =notify= here — there is no separate page-signal call.
+
+* Alternatives Considered
+
+** Fold execution into inbox-zero (the Phase E stopgap shape)
+- Good, because it's the smallest diff — the loop caller already runs inbox-zero, so execution is "one more phase."
+- Bad, because it couples capture-routing with implementation. inbox-zero has three callers; startup and wrap-up must never execute. A Phase E inside inbox-zero forces both to carry a "skip Phase E" caveat and risks a future caller running it by accident.
+- Neutral, because the eligibility-gate and defer-checklist text is identical either way — only its *home* differs.
+
+** Two separate features (keep Phase E and speedrun distinct)
+- Good, because each proposal ships as written with no reconciliation work.
+- Bad, because the execution loop is duplicated in two places and will drift; a guardrail tightened in one won't reach the other. Two ways to do autonomous execution is two things to audit.
+- Neutral, because the input and session-mode differences are real — but they're thin caller-level differences, not a reason to fork the engine.
+
+** Keep the task-size gate (defer anything over ~30 minutes)
+- Good, because it bounds per-task cost and blast radius with a single number.
+- Bad, because it defeats the away-from-desk use case — anything non-trivial bounces back to Craig, so he can't actually leave. Size correlates poorly with risk; a large mechanical refactor is safer than a tiny change to persisted state.
+- Neutral, because the things size was a proxy for (risk, cost) are covered directly — risk by the data-loss checklist, cost by the run cap (and the vNext token budget). The defer checklist's deliberation item, not size, is what routes genuine =/start-work= tasks out.
+
+** Autonomous-commit as the default
+- Good, because it's faster end-to-end with no diff to review.
+- Bad, because most projects lack the per-project waiver, and an unattended loop committing to a project that never opted in is exactly the failure the file-only default prevents. The blast radius of a bad autonomous commit is a revert plus lost trust in the loop.
+- Neutral, because the projects that *do* want it (=.emacs.d=, =rulesets=) opt in explicitly, so the capability is available where it's wanted without being the default everywhere.
+
+* Decisions [8/8]
+
+** DONE Eligibility tag set and where it's read
+- Owner / by-when: Craig / spec-review
+- Context: Projects' priority/tag schemes vary, and the =todo-format.md= scheme header is the declared per-project source of truth. Task size is no longer a gate, so eligibility rests on the autonomy tag, not an effort cap.
+- Decision: Eligibility = status =TODO= AND the =:solo:= autonomy tag, resolved against the project's scheme header (a project may declare a different autonomous-safe set). Priority / =:next:= drive ordering, not eligibility. =:quick:= is an effort hint, never a gate.
+- Consequences: easier — one workflow works across projects with different vocab, and the gate is a pure lookup; harder — a project with no/malformed scheme header needs a fallback, and the default (=:solo:=) must be defined precisely enough that two projects agree.
+
+** DONE Crisp =:solo:= / =:quick:= definitions, enforced in task-review + task-audit
+- Owner / by-when: Craig / spec-review
+- Context: The run-time gate is only as crisp as the tags. Today =:quick:= / =:solo:= are listed in the scheme header with no hard definition, and nothing enforces that tasks get assessed for them.
+- Decision: Define =:solo:= (completable without Craig beyond at most one-or-two upfront-answerable quick decisions; no design deliberation) and =:quick:= (small/fast effort hint only) in =todo-format.md=, and make assessing both a *mandatory step* in the task-review and task-audit workflows. A review/audit that skips the assessment is incomplete.
+- Consequences: easier — authoring-time judgment by the human who knows the answer, and the run-time gate trusts the tag; harder — task-review and task-audit grow a required step, and existing untagged tasks need a back-fill pass.
+
+** DONE The do-not-auto-implement marker set
+- Owner / by-when: Craig / spec-review
+- Context: =VERIFY= means "awaiting Craig's manual confirmation"; other projects may use markers differently.
+- Decision: Do-not-implement = any status that is not =TODO=, plus any project-declared "hold" marker. Safe-by-omission: exclude anything not plainly =TODO=.
+- Consequences: easier — portable, and manual-check tasks can't auto-run; harder — richer per-project overrides need marker semantics in the scheme header, which most lack, so the default must stay conservative.
+
+** DONE Pre-flight decision gathering for the speedrun preset
+- Owner / by-when: Craig / spec-review
+- Context: Forcing every decision-needing task to defer wastes the away-from-desk use case — many tasks need only one or two quick answers Craig could give at kickoff. The speedrun is interactive at its start but must be hands-off after.
+- Decision: The speedrun preset gathers + orders the set, intros the work, and batches all needed quick decisions into one pre-flight Q&A; Craig answers or says "skip this" (drops the task); the run then proceeds with zero further approvals. The unattended loop has no kickoff human, so it defers decision-needing tasks instead.
+- Consequences: easier — "no approvals" becomes "all approvals first," which fits working-while-away, and larger / lightly-underspecified tasks become runnable; harder — the classifier must reliably split quick-question vs real-deliberation, and the recorded answers must reach the implementer so it works from the decision, not a guess.
+
+** DONE Commit-autonomy opt-in mechanism
+- Owner / by-when: Craig / spec-review
+- Context: =file-only= is the default; =.emacs.d= and =rulesets= have a per-project waiver allowing autonomous commits. Where does the workflow *read* that a project has opted in?
+- Decision: Read the opt-in from the project's existing per-project waiver location (=notes.org= Workflow State or =CLAUDE.md=), not a new config file. Two flags: "has commit waiver" and "loop may commit" can differ.
+- Consequences: easier — no new config surface, reuses the existing waiver concept; harder — the waiver location/format must be pinned for deterministic detection, and "waiver yes, loop-commit no" needs the two-flag split.
+
+** DONE Run-cap default and the kill switch shape
+- Owner / by-when: Craig / spec-review
+- Context: The loop default is one task per run; the speedrun works an explicit list. Both need a hard ceiling. Task size is now uncapped, so a single task can be large.
+- Decision: The caller passes a hard per-run task cap (loop default 1; speedrun = length of the explicit list, capped at a ceiling); stop + page with the remainder when the cap is hit. v1 caps by task count, not token budget.
+- Consequences: easier — a simple caller-controlled integer with a bounded task count; harder — a count cap doesn't bound *cost*, and with size uncapped a single large task can run long, so a token budget is vNext and more pressing than before.
+
+** DONE Metrics log location and format
+- Owner / by-when: Craig / spec-review
+- Context: Per-run metrics must land somewhere structured and queryable, per-project, and survive across sessions for the synthesis step to read.
+- Decision: Append one JSONL record per task to a per-project log at =.ai/metrics/work-the-backlog.jsonl=, git-tracked, with the synthesis step reading the union across projects.
+- Consequences: easier — append-only JSONL is trivial to write and =jq=-queryable, and per-project keeps it local to the work; harder — a git-tracked log adds commit churn, and "union across projects" needs the synthesis step to know where every log lives.
+
+** DONE Synthesis cadence and trigger
+- Owner / by-when: Craig / spec-review
+- Context: Craig wants periodic org-roam articles summarizing the data. What triggers synthesis, and how often?
+- Decision: Run synthesis on an explicit trigger ("synthesize backlog metrics") and optionally a weekly scheduled run, writing one KB node per synthesis under =~/org/roam/agents/= per the knowledge-base rule.
+- Consequences: easier — an explicit trigger means no surprise writes, and the KB rule already governs node shape; harder — a weekly run needs a scheduler entry, and the personal-only write-classification must gate it so work-project metrics never land in the KB.
+
+* Implementation phases
+
+** Phase 0 — Tag definitions + task-review/audit enforcement
+Add the hard =:solo:= / =:quick:= definitions to =todo-format.md=, and add the mandatory tag-assessment step to the task-review and task-audit workflows. Independent of the workflow build; lands first so the eligibility gate has crisp tags to read and existing tasks start getting assessed. Tree stays working: these are rule + workflow prose additions.
+
+** Phase 1 — Extract the execution loop into work-the-backlog.org
+Write =work-the-backlog.org= holding the eligibility gate, defer checklist, per-task quality bar, and run-cap logic — taking a task set + session mode + cap as input. Remove the stopgap "Phase E" text from =inbox-zero.org= (restore it to its A-D shape) in the same change so there's one home, not two. Tree stays working: inbox-zero reverts to routing-only, and the new workflow is callable but not yet wired to the loop.
+
+** Phase 2 — Wire the two callers
+Add the loop caller's chain step (after inbox-zero Phase D, invoke work-the-backlog with the tag query + file-only + cap 1) and the no-approvals speedrun preset (pre-flight decision-gathering → explicit list + autonomous-commit + always-push + paging-on). Both go through the same workflow; only the speedrun runs the pre-flight Q&A. Tree stays working: each caller is independently testable.
+
+** Phase 3 — File-only vs autonomous-commit gate
+Implement the commit-autonomy branch: read the per-project waiver, degrade =autonomous-commit= to =file-only= when absent, surface the degrade. Tree stays working: default file-only behavior is the safe path even before the waiver-read lands.
+
+** Phase 4 — The defer checklist, pre-flight Q&A, and the page
+Implement the act-vs-file defer checklist (test-writability keystone, enumerated data-loss list, already-satisfied, design-deliberation), the speedrun pre-flight decision-gathering (gather → classify → order → intro → batch-ask → skip/answer), the =VERIFY=-on-ambiguity filing, and the end-of-set =notify alarm ... --persist= page. Tree stays working: the checklist only ever *reduces* what runs, and the pre-flight step only runs under the speedrun preset.
+
+** Phase 5 — Metrics log
+Append the per-task JSONL record at each task outcome. Tree stays working: logging is a side effect that doesn't alter execution.
+
+** Phase 6 — Synthesis to org-roam
+Write the synthesis step: read the JSONL union, compute the per-run and trend metrics (below), write a KB node under =~/org/roam/agents/= per the knowledge-base rule, personal-projects-only classification enforced. Tree stays working: synthesis is read-only over the logs plus a KB write.
+
+* Acceptance criteria
+- [ ] =work-the-backlog.org= exists and is the only home for the execution loop; =inbox-zero.org= is back to its A-D routing-only shape with no Phase E.
+- [ ] The loop caller chains into work-the-backlog after routing; startup and wrap-up never invoke it.
+- [ ] The no-approvals speedrun runs as the preset (pre-flight Q&A → autonomous-commit + always-push + end-page) over an explicit ordered list, one commit per logical change.
+- [ ] =:solo:= and =:quick:= carry hard definitions in =todo-format.md=, and task-review + task-audit both refuse to complete without assessing them.
+- [ ] Eligibility = status =TODO= AND =:solo:=, read from the project's scheme header, not hardcoded; a =VERIFY= / =DOING= / =DONE= / =CANCELLED= task is skipped by the gate.
+- [ ] Task size never sends a task to =/start-work=; a large but =:solo:=, well-specified task runs and is decomposed into per-logical-commit chunks.
+- [ ] The defer checklist fires correctly: a task whose red test isn't writable (and isn't a quick-question gap), one carrying an enumerated data-loss operation, an already-satisfied one, and one needing design deliberation are each deferred (or routed to pre-flight Q&A under the speedrun), not implemented.
+- [ ] Under the speedrun preset, a task needing one or two quick decisions is surfaced in the pre-flight Q&A; "skip this" drops it, an answer is recorded and used; the run then proceeds with no further approvals.
+- [ ] Under the unattended loop, a decision-needing task defers (no pre-flight Q&A).
+- [ ] In a project without the commit waiver, an =autonomous-commit= request degrades to file-only and says so; no commit is made.
+- [ ] The run stops at the per-run cap and pages with the remaining tasks listed.
+- [ ] Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl=.
+- [ ] The synthesis step reads the logs and writes a KB node under =~/org/roam/agents/=; it refuses to write for work-classified projects.
+
+* Effectiveness measurement
+
+This section answers Craig's explicit ask: measure whether autonomous-batch execution is actually effective, and build the "gather data → org-roam articles" loop.
+
+** What "effective" means here
+
+The autonomy is effective if it completes real work that *stays* completed — i.e. tasks land green and the next session doesn't have to undo or fix them. The two failure modes to catch are (1) the loop defers everything (over-cautious, no value delivered) and (2) the loop implements badly (commits that get reverted or hand-corrected next session). Both are measurable.
+
+** Per-run metrics (the JSONL record)
+
+One record per task, appended to =.ai/metrics/work-the-backlog.jsonl= at each task outcome:
+
+| Field | Meaning |
+|-------------------+--------------------------------------------------------------------|
+| =ts= | ISO timestamp of the task outcome |
+|-------------------+--------------------------------------------------------------------|
+| =run_id= | UUID shared by all tasks in one run |
+|-------------------+--------------------------------------------------------------------|
+| =project= | project basename |
+|-------------------+--------------------------------------------------------------------|
+| =caller= | =loop= or =speedrun= |
+|-------------------+--------------------------------------------------------------------|
+| =task= | task heading (slug) |
+|-------------------+--------------------------------------------------------------------|
+| =outcome= | implemented-committed / implemented-diff / deferred-verify / |
+| | skipped-ineligible / dropped-by-craig (skipped at pre-flight) |
+|-------------------+--------------------------------------------------------------------|
+| =defer_reason= | underspecified / data-loss / already-satisfied / needs-deliberation |
+|-------------------+--------------------------------------------------------------------|
+| =upfront_decision=| true if a pre-flight answer was recorded and used for this task |
+|-------------------+--------------------------------------------------------------------|
+| =wall_clock_s= | seconds from task start to outcome |
+|-------------------+--------------------------------------------------------------------|
+| =commit_sha= | for committed tasks; empty otherwise |
+|-------------------+--------------------------------------------------------------------|
+| =review_findings= | count of /review-code Critical+Important findings on this task |
+|-------------------+--------------------------------------------------------------------|
+
+Per-run rollups computed at synthesis (not stored per record): tasks attempted, completed, VERIFY-deferred, dropped-by-craig, reverted; wall-clock total; commits landed; review findings per commit.
+
+** The corrections signal (the key metric)
+
+The hardest and most valuable metric is *human corrections in the following session* — did Craig revert or hand-fix an autonomous commit? v1 captures the cheap proxy: at synthesis, for each =commit_sha=, check whether a later commit touching the same files reverted it or carries a "fix"/"revert" of that change within N days. A clean run is one where the autonomous commits survive untouched. (Auto-detecting "this later commit corrected that autonomous one" precisely is a vNext refinement; the proxy — reverted-or-touched-soon-after — is good enough to flag a problem run for human review.)
+
+** Where the data lands
+
+Per-project git-tracked JSONL at =.ai/metrics/work-the-backlog.jsonl=. Append-only, =jq=-queryable, survives across sessions and machines via the normal project sync. Git-tracked so the history is auditable and the synthesis step can read it from any clone.
+
+** The synthesis loop (gather → article)
+
+On the "synthesize backlog metrics" trigger (and optionally a weekly scheduled run):
+
+1. Read the JSONL union across the personal projects the synthesizer can see.
+2. Compute the rollups and the trend: completion rate over time, defer-reason distribution, review-findings-per-commit trend, and the corrections-signal flag count.
+3. Write one org-roam KB node under =~/org/roam/agents/YYYYMMDDHHMMSS-backlog-metrics-<window>.org= per the knowledge-base rule — filetags =:agent:metrics:=, a concise title, the rollup table, the trend narrative, and =[[id:...]]= links to prior synthesis nodes so the series is traceable.
+4. Enforce the KB write-classification: *personal projects only*. A work-classified project's metrics never write to the KB — they stay in that project's own =.ai/metrics/= log and the synthesizer reports the refusal per the KB refusal contract.
+
+The KB node is the artifact Craig reviews later — "are the autonomous runs completing more and getting corrected less over the last month?" reads off the trend table without re-querying raw logs.
+
+* Readiness dimensions
+
+- *Data model & ownership:* The task set is read from =todo.org= (project-owned, user-authored). The metrics JSONL is generated, append-only, git-tracked, project-owned. KB nodes are agent-generated under =~/org/roam/agents/= (never overwriting Craig's hand-authored nodes — link only). No editable region is co-owned.
+- *Errors, empty states & failure:* Empty task set → report "nothing eligible" and stop. Malformed scheme header → fall back to the default tag reading and surface the fallback. A task that fails mid-implementation → leave the tree working (don't commit a broken state), record the failure outcome, surface it, continue to the next task. No silent data loss: the data-loss guardrail refuses irreversible tasks outright.
+- *Security & privacy:* Tasks touching credentials or external mutations are excluded by the data-loss / external-state checklist item. The KB write is personal-projects-only; work metrics never leave the project. No secrets in the JSONL (task slugs and SHAs only).
+- *Observability:* The end-of-set page surfaces the run outcome. The per-task surface (implemented / deferred + reason / dropped / skipped) is the live progress view. The metrics log + KB synthesis is the long-run observability. A bad run is isolable from the JSONL (which task, which outcome, which review findings).
+- *Performance & scale:* Expected counts are small — a handful of tasks per run, one run per 30-min tick. No bottleneck at this scale. The cap bounds the worst case on task count; with size uncapped, a single large task is the cost outlier the vNext token budget addresses. Synthesis over months of JSONL is still a small file (one record per task).
+- *Reuse & lost opportunities:* Reuses =todo-format.md= for task close + the tag definitions, =/review-code= and =/voice personal= for the quality bar, =notify= for paging, the knowledge-base rule for KB writes, the per-project waiver for commit-autonomy, and task-review / task-audit for tag enforcement. No new config file (the opt-in rides the existing waiver). The execution loop is the one new shared asset.
+- *Architecture fit & weak points:* Integration points — inbox-zero loop caller (chain after Phase D), the per-project waiver location, =todo.org= scheme header, task-review / task-audit, =~/org/roam/agents/=. Weak point: the commit-autonomy gate depends on deterministically reading the waiver; mitigated by defaulting to file-only when the read is ambiguous (fail safe, not open). Second weak point: a 30-min loop committing unattended with uncapped task size; mitigated by the hard count cap and file-only default, with the token budget as the vNext backstop.
+- *Config surface:* Per-project — commit-autonomy opt-in (via existing waiver), optional loop-commit flag, optional autonomous-safe tag override in the scheme header. Per-call — task set, session mode, run cap. Defaults: file-only, paging-off (loop) / paging-on (speedrun), cap 1 (loop).
+- *Documentation plan:* The workflow file itself is the user/operator doc (matches inbox-zero.org's self-documenting style). The =.emacs.d= stopgap note and the speedrun proposal are superseded by this spec; no separate migration doc needed beyond removing the Phase E text.
+- *Dev tooling:* N/A for new build targets — the workflows are prose, exercised by invocation. The metrics JSONL is =jq=-inspectable by hand; a tiny rollup helper may be added under =.ai/scripts/= if the synthesis prose proves to need it (decided at Phase 6, not a v1 prerequisite).
+- *Rollout, compatibility & rollback:* Rollout is removing Phase E from inbox-zero and adding work-the-backlog — both prose changes, instantly reversible. Compatibility: inbox-zero's three callers are unchanged except the loop caller gaining a forward chain. Rollback: delete work-the-backlog and the loop chain step; inbox-zero is already back to A-D. The file-only default means the worst pre-rollback state is surfaced diffs, not committed changes.
+- *External APIs & deps:* =notify alarm "Page" "<msg>" --persist= verified against =/home/cjennings/.local/bin/notify= and the page-me workflow. =~/org/roam/= KB write path and node shape verified against the knowledge-base rule. No external API calls.
+
+* Risks, Rabbit Holes, and Drawbacks
+
+- *The corrections signal is a proxy, not ground truth.* "A later commit touched the same files" over-counts (legitimate follow-up work) and under-counts (a correction in a different file). It's a flag for human review, not a verdict. Don't rabbit-hole on making it precise in v1 — the proxy plus a human glance is the design.
+- *Waiver detection drift.* If the per-project waiver location moves or its format changes, the commit-autonomy gate could mis-read. Mitigation: fail safe to file-only. Pin the waiver format in the Phase 3 decision before building.
+- *Unattended-commit blast radius.* The headline risk. Mitigated four ways: file-only default, the hard cap, the data-loss checklist item, and the metrics loop (which makes a bad run visible after the fact even if the first three let something through). With task size uncapped, the cost dimension of this risk grows — the vNext token budget is the planned fifth layer.
+- *Scope creep into /start-work territory.* Size is intentionally no longer the brake. The brake is the defer checklist's design-deliberation item plus the "when unsure, defer" rule — keep item 4 strict so genuine deliberation-class tasks still route out even when they're tagged =:solo:= by mistake.
+- *Pre-flight classifier error.* The speedrun's gather step has to split quick-answerable-question from real-deliberation. Misclassifying a deliberation task as a quick question puts a half-baked decision into an autonomous run. Mitigation: when the question isn't answerable in one or two lines, treat it as deliberation and drop it from the run, not as a pre-flight question.
+
+* Testing / Verification / Rollout
+
+Verification is by invocation against a project's real =todo.org=: run the loop caller in file-only mode and confirm it surfaces diffs without committing; run the speedrun against a small explicit list in a waiver-carrying project and confirm the pre-flight Q&A fires, "skip this" drops a task, an answer is recorded and used, then one commit per logical change + the end page; plant a =VERIFY=-status task, a data-loss task, an already-satisfied task, and a large-but-=:solo:= task and confirm the first three are skipped/refused while the large one runs and decomposes; confirm the JSONL grows one record per task; run synthesis and confirm a KB node lands (personal project) or is refused (work project). Rollout is the Phase 0-6 sequence, each leaving the tree working; the file-only default makes early phases safe to ship before the commit and paging phases land.
+
+* References / Appendix
+
+- [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal (inbox-zero stopgap)]] and [[file:../../working/inbox-zero-phase-e/sender-note.org][its sender note with the 5 open questions]].
+- [[file:../design/2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] (file retains its original on-disk name pending a rename pass).
+- [[file:../../.ai/workflows/inbox-zero.org][inbox-zero.org (canonical, A-D)]] — the routing workflow this feature decouples from.
+- =~/code/rulesets/claude-rules/knowledge-base.md= — the org-roam write contract the synthesis step follows.
+
+* Review and iteration history
+** 2026-06-16 Tue — author
+- What: initial draft reconciling the Phase E and fix-speedrun proposals into one work-the-backlog.org feature, plus the effectiveness-measurement instrumentation.
+- Why: two overlapping proposals arrived within a day; building them separately would duplicate the execution loop and let it drift. Craig also asked explicitly for measurement + org-roam synthesis.
+- Artifacts: this spec; the two source proposals under docs/design/ and working/inbox-zero-phase-e/.
+** 2026-06-28 Sun — revision (Craig)
+- What: removed the task-size gate (size no longer defers; large tasks decompose into per-commit chunks); recast the act-vs-file rule as a crisp four-item defer checklist keyed on test-writability; added crisp =:solo:= / =:quick:= definitions destined for =todo-format.md= and made their assessment mandatory in task-review + task-audit; added the speedrun's pre-flight decision-gathering step (batch the quick questions up front, "skip this" drops a task, then run hands-off); renamed "fix speedrun" → "no-approvals speedrun" in prose. Status stays draft pending ratification of the revised decisions.
+- Why: the original criteria were adjectives, not checkable; the size gate forced Craig to stay at his desk for anything non-trivial, defeating the away-from-desk use case; and decision-needing tasks were over-deferred when many need only a quick upfront answer.
+** 2026-06-29 Mon — ratified
+- What: Craig ratified all eight revised decisions; Status → ready. Implementation-ready across Phase 0 (tag definitions + task-review/audit enforcement) through Phase 6 (synthesis).
+- Why: the crisp defer checklist and the pre-flight-Q&A design resolved the "criteria too soft" and "size shouldn't gate" concerns that held the spec in draft.
diff --git a/docs/design/2026-06-16-encourage-kb-contribution-spec.org b/docs/specs/2026-06-16-encourage-kb-contribution-spec.org
index cf8111b..cfbfe79 100644
--- a/docs/design/2026-06-16-encourage-kb-contribution-spec.org
+++ b/docs/specs/2026-06-16-encourage-kb-contribution-spec.org
@@ -1,10 +1,17 @@
#+TITLE: Encourage Org-Roam KB Contribution Across Workflows — Spec
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2026-06-16
-#+TODO: TODO | DONE SUPERSEDED CANCELLED
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* READY Encourage Org-Roam KB Contribution Across Workflows — Spec
+:PROPERTIES:
+:ID: f67f5f45-5aa1-4a5a-8704-d636e4e16f75
+:END:
+- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to READY (evidence-based, human-confirmed)
* Metadata
-| Status | approved (decisions ratified 2026-06-20) |
+| Status | ready |
|----------+------------------------------------------------|
| Owner | Craig Jennings |
|----------+------------------------------------------------|
diff --git a/docs/specs/2026-07-01-docs-lifecycle-spec.org b/docs/specs/2026-07-01-docs-lifecycle-spec.org
new file mode 100644
index 0000000..dcf88c8
--- /dev/null
+++ b/docs/specs/2026-07-01-docs-lifecycle-spec.org
@@ -0,0 +1,360 @@
+#+TITLE: Docs Lifecycle — Spec
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-07-01
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* DOING Docs lifecycle
+:PROPERTIES:
+:ID: 80b0787b-4a60-4c82-8a16-b383d3e3c8f2
+:END:
+- 2026-07-01 Wed @ 23:34:15 -0400 — READY → DOING: spec-response decomposition ran — build parent in todo.org carries the :SPEC_ID: binding, one task per phase plus the flip-to-IMPLEMENTED task and the manual-testing child. First live exercise of the transition-ownership table.
+- 2026-07-01 Wed @ 23:22:50 -0400 — DRAFT → READY: Codex re-review found all fourteen review findings closed and no remaining blocking implementation-readiness gaps.
+- 2026-07-01 Wed @ 22:54:41 -0400 — verify pass on the second responder round: all five fixes held, findings 1-9 unregressed, verdict ready; three minor nits folded in (scoped id-link criterion, untracked-copy cleanup in the recovery recipe, two stale prose spots). Stays DRAFT pending the reviewers' flip.
+- 2026-07-01 Wed @ 22:46:52 -0400 — second responder pass: all five re-review findings fixed (fourteen of fourteen closed); stays DRAFT — the READY flip belongs to the reviewers this round.
+- 2026-07-01 Wed @ 22:41:33 -0400 — READY → DRAFT: Codex re-review found five new blocking implementation-readiness gaps after the response pass.
+- 2026-07-01 Wed @ 22:41:21 -0400 — DRAFT → READY: dual independent review (Codex + fresh-context Claude agent, both initially Not ready), all nine findings fixed, verify pass by the original reviewer returned ready; flip authorized by Craig.
+- 2026-07-01 Wed @ 22:13:00 -0400 — drafted from the five decisions settled 2026-06-28 (todo.org "Spec storage location + lifecycle-status convention").
+
+* Metadata
+| Status | doing |
+|----------+------------------------------------------------------------------|
+| Owner | Craig Jennings |
+|----------+------------------------------------------------------------------|
+| Reviewer | Craig Jennings |
+|----------+------------------------------------------------------------------|
+| Date | 2026-07-01 |
+|----------+------------------------------------------------------------------|
+| Related | [[file:../design/2026-06-15-spec-storage-lifecycle-proposal.org][source proposal]]; todo.org "Spec storage location + |
+| | lifecycle-status convention" |
+|----------+------------------------------------------------------------------|
+
+* Summary
+
+Formal specs and working notes currently share one directory per project, and a spec's lifecycle state (drafted, in progress, shipped, dead) is invisible without opening the file. This spec adopts two coupled conventions — a location split (=docs/specs/= for formal specs, =docs/design/= for notes) and an authoritative in-file status carried by an org TODO keyword on a top-level status heading — plus =org-id= links for rename-safety, a general =docs-lifecycle= rule capturing the shape, and a one-time confirmed retrofit that sorts every project's existing pile.
+
+* Problem / Context
+
+.emacs.d triaged ~28 design docs and had to run a four-agent sweep reading every spec against the code to reconstruct which had shipped (6 implemented, 8 in progress, 12 not started, 1 superseded). Nothing in the filename, location, or file records the state, so the answer to "what's open?" degrades into "open every file and infer." rulesets has the same shape: 41 files in =docs/design/= of which only 3 carry a formal spec spine, plus two =-spec.org= files misfiled at the =docs/= root. The cost compounds with every doc added, and every project inherits the problem through the shared spec-create workflow.
+
+Two forces beyond triage cost:
+
+- *Links are load-bearing.* =todo.org= tasks, session archives, and sibling docs link specs by =file:= path. Any convention that renames or moves files on every status change (the filename-suffix approach) breaks those links repeatedly across a cross-linked, template-synced doc set.
+- *The convention is worthless if legacy docs stay misfiled* (Craig, 2026-06-28). Template sync distributes rules and workflows but cannot perform a one-time per-project migration, so the design must include a reach mechanism that gets each project's existing pile sorted once.
+
+* Goals and Non-Goals
+
+** Goals
+- A directory listing answers "which docs are specs, and what state is each in" without opening files.
+- Status transitions cost one small in-file edit (keyword + history line + Metadata mirror) — no rename, no link surgery.
+- Cross-doc spec links survive moves and renames.
+- The shape is captured once as a general rule (=docs-lifecycle=) so future artifact collections (brainstorm piles, recording queues) can reuse it.
+- Every existing project's =docs/design/= pile gets sorted exactly once, with human confirmation on each classification.
+
+** Non-Goals
+- No automation of status flips — the keyword is edited by whoever changes the state (spec-create, spec-review, spec-response, or a human), not by a watcher.
+- No retroactive rewriting of session archives or git history that reference old paths; only live inbound links (=todo.org=, =notes.org=, docs) are updated by the retrofit.
+- No new tracking database or index file — the files are the index.
+
+** Scope tiers
+- v1: the location split, the status-heading convention, the org-id link standard, the =docs-lifecycle= rule, spec-create/spec-review/spec-response updates, the retrofit helper + startup nudge, and the rulesets pilot.
+- Out of scope: applying the lifecycle shape to non-doc collections (the rule documents the pattern; adopting it elsewhere is per-collection work).
+- vNext: an org-agenda custom view over =docs/specs/*.org= keyed on the status keywords (nice-to-have once the keywords exist; log to todo.org).
+
+* Design
+
+** The location split
+
+- =docs/specs/= — formal specs only. A *spec* is a doc proposing a buildable change that carries a =Decisions= section and =Implementation phases= (the spec-create spine). Filenames keep the existing =YYYY-MM-DD-<topic>-spec.org= shape — the =-spec.org= suffix stays because spec-review's Phase 0 precondition keys on it; only the *status* suffixes from the original proposal are dropped.
+- =docs/design/= — everything else: brainstorms, inventories, proposals, research notes, frozen source material. Review findings live inside the spec they review (current spec-review behavior), so standalone review files are legacy notes and stay in =docs/design/=.
+
+** The status heading (the authoritative record)
+
+Each spec's first element after the file header is a single top-level *status heading* carrying the org TODO keyword:
+
+#+begin_example
+,#+TODO: TODO | DONE
+,#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+,* DOING <spec short name>
+:PROPERTIES:
+:ID: <uuid>
+:END:
+- <dated one-line history entries, newest first>
+#+end_example
+
+- *The keyword is authoritative.* The Metadata table's =Status= field mirrors it in lowercase for readers already in the table, and a status transition updates keyword + history line + mirror in the same edit; on disagreement the heading wins.
+- *Two keyword sequences, no collisions.* The lifecycle sequence *joins* — never replaces — the =TODO | DONE= sequence that the =* Decisions= and =* Review findings= task machinery depends on. The two sequences share no keyword (the old header's =SUPERSEDED CANCELLED= done-states migrate to the lifecycle sequence; a legacy =CANCELLED= decision heading still parses as a done-state there, so =[/]= cookies stay mechanically correct). The retrofit rewrites each legacy header to carry both lines.
+- *Vocabulary:* =DRAFT= (being written) → =READY= (review passed, buildable) → =DOING= (implementation in progress) → =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= (terminal).
+- *Transition ownership — every flip has a named owner:*
+ - =DRAFT= — spec-create stamps it at authoring time.
+ - =DRAFT= → =READY= — spec-review, on a passing gate (keyword + history line + mirror in the review pass).
+ - =READY= → =DOING= — spec-response, when it decomposes the phases into build tasks. *The decomposition writes the spec-to-task binding:* the =todo.org= parent task it creates (or updates) carries a =:SPEC_ID:= property holding the spec's status-heading UUID. That property is the durable join between the spec and its build work.
+ - =DOING= → =IMPLEMENTED= — the session that completes the final implementation phase. To make that a tracked obligation rather than a memory, spec-response's phase-to-task breakdown *always emits a final task*: "flip the spec to IMPLEMENTED + history line," as a child of the bound parent. Safety net: task-audit's reconcile pass runs one query — for each =docs/specs/*.org= whose keyword is =DOING=, find the =todo.org= task with the matching =:SPEC_ID:=; flag the spec when that parent is =DONE=/=CANCELLED=, archived, or missing. Checking the *parent's* keyword (not "are all child tasks closed") sidesteps both the flip-task chicken-and-egg (the parent only closes after the flip task ran) and =--convert-subtasks= rewriting completed children into dated entries (dated children never affect the parent's keyword). This is the mechanism whose absence produced the .emacs.d six-shipped-specs-with-no-record failure; "a human remembers" is explicitly not the design.
+ - =SUPERSEDED= / =CANCELLED= — whoever makes the call, with the reason in the history line.
+- *Glanceability without opening files:* one grep gives the full board —
+
+ #+begin_src sh
+ rg -H '^\* (DRAFT|READY|DOING|IMPLEMENTED|SUPERSEDED|CANCELLED) ' docs/specs/
+ #+end_src
+
+ and because the keyword sits on a real org heading, an org-agenda view over =docs/specs/= works for free (the vNext item).
+- *The heading body is the dated status history* — one line per transition (=YYYY-MM-DD Day @ HH:MM:SS -ZZZZ — <what changed, by whom>=), the record a filename could never carry.
+- Why a dedicated status heading rather than restructuring each spec under one top-level heading: demoting every section in every existing spec is a large, link-hostile rewrite; a prepended heading is additive, retrofittable by script, and leaves the familiar flat section layout untouched.
+
+** Rename-safe links
+
+The status heading carries an =:ID:= UUID, assigned at authoring time (and by the retrofit for legacy specs). The target state is that cross-doc references to a spec use =[[id:<uuid>]]= rather than =file:= paths, so any future move can't orphan them. =file:= links remain fine for intra-doc anchors and for notes that never move. The KB's existing id-resolution recipe applies: =rg ':ID:[[:space:]]+<uuid>' docs/=.
+
+*Staged conversion — ids assigned now, links converted only when clickable.* =org-id-locations= only indexes agenda files and files org has visited, so a fresh =:ID:= in =docs/specs/= won't resolve on click in a live Emacs until the id index learns about project docs. =org-id-extra-files= is not a glob mechanism — it's a literal file list, only consulted under =org-id-track-globally= — so "point it at the globs" is not executable as written. The sequencing is therefore:
+
+1. *Pilot and retrofit rewrite =file:= links only* (path recomputation per the relink contract). Every link stays clickable throughout; no conversion window exists.
+2. *:ID: properties are still assigned* during the sort — harmless, and they make the later conversion mechanical.
+3. *Link conversion to =id:= is a separate follow-up pass*, gated on .emacs.d landing an executable id-index mechanism: enumerate each project's =docs/specs/*.org= into =org-id-extra-files= as real file names (a small function globbing at startup, with =org-id-track-globally= t), or a periodic =org-id-update-id-locations= over that enumeration — verified by clicking a known id link. The Phase 4 note to .emacs.d carries this ask; the =rg= recipe is the fallback for non-Emacs consumers either way.
+
+** The =docs-lifecycle= rule (the generalization)
+
+A new =claude-rules/docs-lifecycle.md= captures the reusable shape, with spec-create as the first instance:
+
+1. Separate formal artifacts from working notes by location.
+2. Lifecycle state lives *in* the artifact, on a scannable, greppable carrier (an org keyword heading), with a dated history.
+3. Links use rename-safe identifiers.
+4. A growing collection earns this treatment when "which of these are live?" starts requiring a file-by-file read.
+
+** The retrofit (reach mechanism for existing piles)
+
+A synced helper, =spec-sort=, run once per project. *Canonical placement:* like every synced asset, the helper and all workflow edits land in rulesets' canonical tree first — =claude-templates/.ai/scripts/spec-sort= with its bats tests in =claude-templates/.ai/scripts/tests/= (the glob-discovered suite), workflow changes in =claude-templates/.ai/workflows/= — then =scripts/sync-check.sh --fix= propagates the committed =.ai/= mirror and both sides commit together. A mirror-only edit is reverted by the next sync; nothing in this feature is exempt from that contract. Downstream projects receive everything through the normal startup rsync. The run itself, per project:
+
+1. *Classify* each =docs/**/*.org= outside =docs/specs/= by one predicate: a doc carrying *both* a =Decisions= heading *and* an =Implementation phases= heading is a spec candidate; everything else is a note. (A =Metadata= table alone does not qualify — real counter-case: =docs/design/task-review.org= has a Metadata table and no spine, and is a note.) The heuristic *proposes*; a human confirms every move (classification is a judgment call — Craig, 2026-06-28).
+2. *Move* confirmed specs to =docs/specs/=, *renaming to carry the =-spec.org= suffix* when the file lacks it (spec-review's Phase 0 precondition requires it — a retrofitted spec must be reviewable in its new home). Prepend the status heading, assign an =:ID:=, and rewrite the keyword header to the two-sequence form above. *The proposed keyword is evidence-based, not laundered:* the doc's own Status field is one signal among several, because stale Status fields are exactly what caused the original .emacs.d sweep. For each candidate the helper shows an evidence panel — the current Status value, the decision/finding cookie states, the state and heading of any =todo.org= task that links or binds to the doc, the most recent history/review entries, and (where cheap) whether artifacts the phases name actually exist — and proposes the keyword the evidence supports. When the evidence is inconclusive, the default is the most conservative *non-terminal* state it supports (never a terminal one). =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= are never applied without an explicit human-stated reason, recorded in the status-history line.
+3. *Relink* under an explicit contract:
+ - *Rewritten roots (project-owned):* =todo.org=, =.ai/notes.org=, =docs/**=, =.ai/project-workflows/=, =.ai/project-scripts/=. The rewrite recomputes each link's relative path from the linking file's directory to the new location. *All rewrites stay =file:= links* — conversion to =[[id:...]]= is the separate follow-up pass gated on the Emacs id-index mechanism (see Rename-safe links), never part of a sort run.
+ - *Reported, never rewritten:* =.ai/sessions/= archives (frozen history), git history, and synced template paths (=.ai/workflows/=, =.ai/scripts/=, =.ai/protocols.org=) — a downstream edit there is reverted by the next template sync, so the report names the canonical rulesets file that needs the edit instead.
+ - *Supported link shapes:* org =[[file:...]]= links, relative or project-root-anchored, with or without a description. Bare-path mentions in prose or scripts are *reported for manual handling*, never rewritten.
+ - *Safety:* dry-run report is the default; =--apply= writes, under a fail-safe contract sized to the fact that one run mutates filenames, links, headers, and =.ai/notes.org= together:
+ - *Clean-worktree preflight.* =--apply= refuses on a dirty git tree (=git status --porcelain= non-empty) unless =--allow-dirty= is passed, which prints exactly what recovery loses. A clean tree is what makes recovery trivially safe.
+ - *Validate, then write.* The full move + relink plan — every source, destination, and link edit — is computed and validated first (every link parses, every target is unambiguous, every destination path is free), written to a plan file for inspection, and only then executed from that recorded plan. Ambiguous cases (two candidates sharing a basename, an unparseable link) block validation: listed, untouched, non-zero exit until each is resolved or explicitly waived.
+ - *Failure mid-apply is not a shrug.* Any write failure or a failed post-apply residue grep stops the run, names what was and wasn't applied (from the plan), and prints the recovery recipe — =git restore= over the plan's touched paths *plus* deletion of the plan's newly-created destination paths (=git restore= reverts tracked edits but doesn't remove untracked copies the move created). Safe by construction because preflight required a clean tree; the project is never silently left half-migrated.
+ - After a successful apply, the residue grep for each old path across the rewritten roots must return zero or =spec-sort= exits non-zero naming the residue.
+4. *Stamp* =:LAST_SPEC_SORT: YYYY-MM-DD= in =.ai/notes.org='s =* Workflow State= section — the same surface as =:LAST_AUDIT:= and =:LAST_INBOX_PROCESS:=, created idempotently (append the section if the file lacks it) exactly as task-audit already does.
+
+*The startup nudge — concrete contract.* Phase A's parallel batch gains one read-only probe:
+
+#+begin_src bash
+{ [ -d docs/design ] || [ -n "$(find docs -maxdepth 1 -name '*-spec.org' -print -quit 2>/dev/null)" ]; } \
+ && ! grep -qs ':LAST_SPEC_SORT:' .ai/notes.org \
+ && echo "spec-sort: unsorted docs present" || true
+#+end_src
+
+(Phase 4 refined the stray-root check from =compgen= to =find=: =compgen= is bash-only and zsh aborts on an unmatched glob, so the original snippet false-negatived on stray root specs under zsh.)
+
+(The probe also fires on stray =docs/*-spec.org= root files, so a project whose only misfiled specs sit at the =docs/= root still gets nudged.)
+
+Phase C surfaces one line when the probe printed ("this project's docs pile has never been spec-sorted — say 'run spec-sort' to sort it") and stays silent otherwise. Projects with nothing to sort — no =docs/design/= and no stray root specs — never see it; a stamped marker permanently clears it.
+
+* Alternatives Considered
+
+** Filename status suffix (=-spec-doing.org=, =-spec-implemented.org=)
+- Good, because the state is visible in a bare =ls= with no tooling.
+- Bad, because every transition renames a file in a cross-linked, template-synced doc set — each rename is link surgery or a broken link, and the churn lands in git history and inbound =todo.org= links.
+- Neutral, because the ls-visibility it buys is matched by the one-line =rg= over status headings.
+- Rejected 2026-06-28 (Craig chose org-keyword over his earlier filename-suffix lean).
+
+** Status field in the Metadata table only (no keyword)
+- Good, because the field already exists and needs no new structure.
+- Bad, because a table cell is neither org-agenda-scannable nor reliably greppable across format drift, and it carries no dated history.
+- Neutral, because the field stays anyway — as the in-table mirror.
+
+** Relink-helper instead of org-id (keep =file:= links, fix them on every move)
+- Good, because readers see plain paths.
+- Bad, because it makes every future move a tooling event, and one missed run silently breaks links — the failure mode is invisible until someone clicks.
+- Neutral, because the retrofit needs relink logic once regardless; org-id just makes it a one-time need.
+
+* Decisions [5/5]
+
+All five were settled with Craig on 2026-06-28 (recorded in todo.org; migrated here per that note).
+
+** DONE Location split — adopt
+- Context: specs and notes share one directory; telling them apart requires opening files.
+- Decision: =docs/specs/= for formal specs (Decisions + phases spine); =docs/design/= for notes. Documented in spec-create and the docs-lifecycle rule.
+- Consequences: easier — a listing answers "what's formal"; harder — one-time migration and link updates (the retrofit).
+
+** DONE Status mechanism — org keyword authoritative, no filename suffix
+- Context: filename suffix vs org keyword; suffix wins =ls= visibility, keyword wins link stability and zero-rename transitions.
+- Decision: the org TODO keyword on the spec's top status heading is authoritative, mirrored by the Metadata =Status= field. No status suffixes in filenames.
+- Consequences: easier — a transition is one keyword edit and links never break; harder — glanceability needs the one-line =rg= (or the vNext agenda view) instead of bare =ls=.
+- (Refined in review, 2026-07-01: "one keyword edit" became "three lines in one file" — keyword + history line + Metadata mirror. The ratified decision stands; see Review findings.)
+
+** DONE Link safety — org-id for cross-doc spec links
+- Context: both the migration move and any future rename break =file:= links.
+- Decision: specs carry =:ID:= UUIDs on the status heading; cross-doc references use =[[id:...]]=.
+- Consequences: easier — moves are free; harder — following a link outside org needs the =rg ':ID:'= lookup.
+- (Refined in review, 2026-07-01: the decision stands; the *sequencing* is staged — IDs are assigned at sort time, but link conversion to =id:= waits for the executable Emacs id-index mechanism, so no window exists where converted links don't click. See Review findings.)
+
+** DONE Generalize as a =docs-lifecycle= rule
+- Context: the shape (in-artifact lifecycle state, formal-vs-notes split, rename-safe links) recurs for any processed-document collection.
+- Decision: capture it in =claude-rules/docs-lifecycle.md= with spec-create as the first instance.
+- Consequences: easier — the next collection reuses a decided pattern; harder — the rule must stay honest as the spec instance evolves.
+
+** DONE Retrofit existing files across ALL projects
+- Context: template sync distributes conventions but cannot perform a per-project one-time migration; legacy piles would stay misfiled forever.
+- Decision: ship a confirmed classify-move-relink helper (=spec-sort=) plus a startup nudge gated on =:LAST_SPEC_SORT:=; the helper proposes, a human confirms. Pilot on rulesets first.
+- Consequences: easier — every project converges without manual archaeology; harder — the helper needs real relink logic and tests, and classification stays a judgment call.
+
+* Review findings [14/14]
+:PROPERTIES:
+:ID: cc77a7f6-e4c3-488a-ac3b-e739420a5c2b
+:END:
+
+Two independent reviews (Codex, 2026-07-01 22:22; a fresh-context Claude agent, 2026-07-01 22:25) converged on =Not ready= with the same worst finding. All nine findings were dispositioned accept and fixed in the responder pass below; each carries its response.
+
+** DONE Org TODO vocabulary drops decision and finding task states :blocking:
+(Codex; the Claude reviewer found the same, adding that keywords must be unique across sequences so a naive two-line fix collides on =SUPERSEDED=/=CANCELLED=.) The spec's example header replaced the file-level keyword vocabulary, so =TODO=/=DONE= stopped being task states and the =[/]= cookies that gate readiness went vacuous — this file itself was the first casualty.
+Response: the scheme is now two collision-free sequences — =TODO | DONE= for decisions/findings, =DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED= for lifecycle (the old header's =SUPERSEDED CANCELLED= done-states migrate to the lifecycle sequence, and a legacy =CANCELLED= decision still parses as a done-state, so cookies stay correct). This file's own header now carries both lines; the Design section documents the two-sequence rule and the retrofit rewrites legacy headers to it. New acceptance criterion: cookies must compute by org, not hand counting.
+
+** DONE Relink behavior is too vague for a safe migration :blocking:
+(Codex; the Claude reviewer independently flagged the synced-.ai/ slice — a downstream rewrite there is reverted by the next template sync, e.g. =startup.org:154='s reference to a spec candidate.) The retrofit named no scan scope, link-shape list, rewrite rule, residue policy, or dry-run format — the implementer would have had to invent the migration's data-safety contract.
+Response: the retrofit section now carries the explicit contract: rewritten roots (=todo.org=, =.ai/notes.org=, =docs/**=, project-owned =.ai/= dirs), reported-never-rewritten surfaces (=.ai/sessions/=, git history, synced template paths — with the canonical rulesets file named in the report), supported link shapes (org =file:= links; bare paths report-only), relative-path recomputation, dry-run default with =--apply=, post-apply residue grep gating exit status, and refuse-loudly on ambiguity.
+
+** DONE Sort marker and startup nudge do not name the actual state surface :blocking:
+(Codex; the Claude reviewer rated the same gap minor — Codex's version was sharper: startup reads =.ai/notes.org=, not a root =notes.org=, and Workflow State may not exist.)
+Response: the marker is pinned to =.ai/notes.org='s =* Workflow State= (the =:LAST_AUDIT:= / =:LAST_INBOX_PROCESS:= surface), created idempotently as task-audit already does; the Design section now spells the Phase A probe command, its exact fire condition, and the Phase C one-liner.
+
+** DONE Phase order can strand legacy specs behind the new review precondition :blocking:
+(Codex; the Claude reviewer found the same at medium severity.) Hardening spec-review's path precondition in Phase 1 while piles stay unsorted until Phases 3-4 would make every legacy spec unreviewable in the gap.
+Response: Phase 1 now carries the compatibility rule — legacy =-spec.org= locations stay reviewable (with a "run spec-sort" nudge) until the project stamps =:LAST_SPEC_SORT:=; the precondition hardens only after. Acceptance criterion 5 updated to match.
+
+** DONE No owner for the DOING → IMPLEMENTED flip :blocking:
+(Claude reviewer.) spec-create owns =DRAFT= and spec-review owns =DRAFT= → =READY=, but implementation finishes outside the spec trio, and "a human edits it" is the exact mechanism whose failure produced this spec (.emacs.d's six shipped-but-unmarked specs).
+Response: the Design section now has a transition-ownership table naming an owner for every flip. =READY= → =DOING= belongs to spec-response; =DOING= → =IMPLEMENTED= is a tracked obligation — spec-response's phase-to-task breakdown always emits a final "flip the spec" task — with task-audit's reconcile pass as the safety net (flag any =DOING= spec whose implementation tasks are all closed). Phase 1 includes both workflow edits.
+
+** DONE Classification heuristic is precedence-ambiguous
+(Claude reviewer.) "Decisions plus phases or Metadata table" reads two ways, and =docs/design/task-review.org= (Metadata table, no spine) classifies differently under each.
+Response: one predicate now — spec candidate iff the doc carries *both* a =Decisions= heading *and* an =Implementation phases= heading; a Metadata table alone does not qualify. The task-review.org counter-case is cited in the retrofit step.
+
+** DONE spec-sort never renames moved files to the -spec.org suffix
+(Claude reviewer.) spec-review's Phase 0 hard-requires the suffix, so a retrofitted legacy spec without it would be unreviewable in its new home.
+Response: retrofit step 2 now renames moved files to carry =-spec.org= when they lack it; the relink pass covers the rename like any move. Acceptance criterion 3 checks the suffix on the re-homed root specs.
+
+** DONE Clicked id: links won't resolve in Craig's Emacs
+(Claude reviewer.) =org-id-locations= indexes only agenda and visited files, so fresh =:ID:=s in =docs/specs/= are invisible-until-clicked broken — the convention would trade visible link breakage for invisible breakage.
+Response: named as an explicit .emacs.d-side prerequisite in the Rename-safe-links section (=org-id-extra-files= over =docs/specs/= globs, or periodic =org-id-update-id-locations=), carried in the Phase 4 note to .emacs.d, with the =rg= recipe as the interim fallback.
+
+** DONE Acceptance criterion 2 contradicts the Metadata Status mirror
+(Claude reviewer.) "Exactly one keyword edit" was irreconcilable with the mandated mirror update.
+Response: a transition is now defined everywhere as three lines in one file — keyword, history line, mirror — still no rename and no link edits. Goals, Design, and criterion 2 all say the same thing.
+
+** DONE Synced helper placement ignores the canonical/mirror split :blocking:
+The spec says to build =.ai/scripts/spec-sort= and update =.ai/workflows/= behavior, but rulesets' current contract is that =claude-templates/.ai/= is canonical and the repo-root =.ai/= tree is only the committed mirror kept honest by =scripts/sync-check.sh=. =CLAUDE.md= explicitly warns that mirror-only edits get silently reverted by the next sync, and =make test= runs the mirror-side tests only after the canonical copy has been synced. V1 should say every shared workflow/script edit lands in =claude-templates/.ai/{workflows,scripts}/= first, then =scripts/sync-check.sh --fix= updates the mirror; =spec-sort= tests should be placed in the synced script-test tree and the acceptance criteria should include =sync-check= / workflow-integrity where relevant. (blocking)
+Response: the retrofit section now opens with the canonical-placement contract (helper + tests in =claude-templates/.ai/scripts{,/tests}/=, workflow edits canonical-side, =sync-check --fix= propagates, both sides commit together); Phases 1 and 2 name it per artifact; new acceptance criterion requires =sync-check= to exit clean after the build commits.
+
+** DONE Task-audit safety net has no spec-to-task binding :blocking:
+The spec says task-audit flags a =DOING= spec whose implementation tasks are all closed, but current =task-audit.org= audits open =todo.org= tasks and has no model for scanning =docs/specs/=, finding a spec's implementation tasks, or deciding "all closed" after =todo-cleanup.el --convert-subtasks= rewrites completed child tasks into dated entries. The added final "flip to IMPLEMENTED" task also means there may always be one open task, so a naive "all tasks closed" check never fires. V1 should define the binding spec-response writes into =todo.org= (for example a parent task property or stable link to the spec ID), the exact audit query, how converted dated entries count, and whether the final flip task is excluded from or satisfies the reconciliation rule. (blocking)
+Response: spec-response's decomposition now stamps a =:SPEC_ID:= property (the spec's status-heading UUID) on the build parent task — the durable binding. The audit query is defined: for each =DOING= spec, find the task with matching =:SPEC_ID:=; flag when that parent is closed, archived, or missing. Checking the parent's keyword (not "all children closed") dissolves both the flip-task chicken-and-egg and the dated-entry conversion concern. New acceptance criterion exercises the flag.
+
+** DONE spec-sort apply path can leave a half-migrated tree :blocking:
+The retrofit contract has dry-run by default and a post-apply residue grep, but it does not say what happens when =--apply= has moved files and then a relink, parse, or residue check fails. Because the operation mutates filenames, links, headers, IDs, and =.ai/notes.org= together, a partial failure can strand the project in the exact mixed state the tool is meant to prevent. V1 should require a clean-worktree preflight (or an explicit dirty-tree refusal/override), validate the full move/relink plan before the first write, write from a single recorded plan, and define recovery behavior for every failed apply: no files moved, automatic rollback, or a printed =git restore= / =git revert= recovery recipe that is safe for uncommitted local edits. (blocking)
+Response: the relink contract's safety block now specifies the fail-safe apply: clean-worktree preflight (refuse on dirty, explicit =--allow-dirty= override that prints what recovery loses), full plan computed + validated + written to a plan file before the first write, execution from the recorded plan, and mid-apply failure stopping with a named applied/not-applied breakdown plus the =git restore= recovery recipe — safe by construction because preflight required a clean tree. Bats covers the preflight and the forced-failure recovery output (Phase 2, plus a new acceptance criterion).
+
+** DONE org-id Emacs prerequisite is not executable as written :blocking:
+The spec says the .emacs.d-side fix can be =org-id-extra-files= over =docs/specs/= globs, but Emacs' own docstring says =org-id-extra-files= is a list of additional files and is only relevant when =org-id-track-globally= is set; it does not establish that project glob strings will be expanded or that every project root will be discovered. The rollout also converts links during the rulesets pilot before the Phase 4 note asks .emacs.d to make clicked =id:= links resolvable. V1 should either keep =file:= links until the Emacs support has landed, or specify the executable Emacs-side implementation precisely: how project =docs/specs/*.org= files are enumerated into =org-id-extra-files= or fed to =org-id-update-id-locations=, when it runs, how it is tested, and how rollout avoids a window where converted links do not click through. (blocking)
+Response: took the fork that removes the window entirely — the pilot and every sort run rewrite =file:= links only; =:ID:= properties are still assigned (harmless, enables later mechanics); conversion to =id:= is a separate follow-up pass gated on .emacs.d landing an executable id-index mechanism, now specified concretely (enumerate =docs/specs/*.org= into =org-id-extra-files= as real file names under =org-id-track-globally=, or feed the enumeration to =org-id-update-id-locations=; verified by clicking a known link). Decision 3 carries a sequencing-refinement note; a new acceptance criterion asserts zero =id:= links exist after the pilot.
+
+** DONE Status confirmation can still encode stale reality :blocking:
+The retrofit proposes lifecycle status from a doc's current =Status= field or review history, then asks a human to confirm. Those are the same stale/incomplete signals that caused the original .emacs.d sweep: shipped specs and dead specs were only knowable by reading code/tasks against the spec. If =spec-sort= only confirms a guessed keyword, the pilot can produce a clean-looking board whose state is still wrong. V1 should define status-confirmation evidence: for each spec candidate, what sources the helper shows (current Status, decision/finding cookies, linked =todo.org= parent state, recent history, matching implementation files/tests), what default is allowed when evidence is inconclusive, and that =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= require an explicit reason in the status history line. (blocking)
+Response: retrofit step 2 now defines the evidence panel the helper shows per candidate (Status value, cookie states, bound/linking =todo.org= task state, recent history entries, cheap existence checks on phase-named artifacts) with the keyword proposed from the evidence, not the Status field alone. Inconclusive evidence defaults to the most conservative non-terminal state; =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= always require an explicit human-stated reason recorded in the history line.
+
+* Implementation phases
+
+** Phase 1 — Rule + template updates
+Write =claude-rules/docs-lifecycle.md=. Update spec-create (emit into =docs/specs/=, the two-sequence keyword header, status heading with =:ID:= in the template, transition mechanics), spec-review (path expectation with the compatibility rule below; flipping =DRAFT= → =READY= on a passing review updates keyword + history + mirror), spec-response (owns =READY= → =DOING=; its decomposition stamps the =:SPEC_ID:= binding on the build parent and always emits the final "flip to IMPLEMENTED" task), and task-audit (one reconcile bullet running the =:SPEC_ID:= query: a =DOING= spec whose bound parent is closed, archived, or missing gets flagged). All four are synced assets: edits land in =claude-templates/.ai/= (and =claude-rules/=), the mirror follows via =sync-check --fix=, both commit together. *Compatibility rule:* spec-review keeps accepting legacy =-spec.org= locations (=docs/= root, =docs/design/=) until the project's =:LAST_SPEC_SORT:= is stamped, nudging "run spec-sort" when it meets one; only after the stamp does the =docs/specs/= precondition harden. No legacy spec is ever unreviewable during the transition. Tree stays working: new specs land in the new shape; old specs remain reviewable until their project sorts.
+
+** Phase 2 — The =spec-sort= helper
+Build =claude-templates/.ai/scripts/spec-sort= (classify → evidence-based confirm → plan + validate → move + rename + prepend status heading + assign =:ID:= → relink =file:= references → stamp =:LAST_SPEC_SORT:=), with bats coverage in =claude-templates/.ai/scripts/tests/= (glob-discovered by =make test=) for classification, the evidence/confirm gate, plan validation, moving + renaming, relinking, the clean-worktree preflight, mid-apply failure recovery output, idempotence, and the marker stamp. Mirror synced via =sync-check --fix= in the same commit. Tree stays working: the script is callable but nothing invokes it yet.
+
+** Phase 3 — Pilot on rulesets
+Run =spec-sort= against rulesets' own =docs/= (41 design files, 3 spec-spine candidates, 2 stray root specs). Fix what the pilot surfaces before any other project runs it. Tree stays working: moves are confirmed one by one, links updated in the same pass.
+
+** Phase 4 — Startup nudge + broadcast
+Add the Phase A probe + Phase C nudge line (the concrete contract in the Design retrofit section). Send .emacs.d a note that the convention is live, its ~28-doc pile is ready to sort, and the id-index mechanism is its side of the staged link conversion: enumerate each project's =docs/specs/*.org= into =org-id-extra-files= as real file names (with =org-id-track-globally= t) or feed that enumeration to a periodic =org-id-update-id-locations=, verified by clicking a known id link. The =id:= link-conversion pass across projects runs only after that lands — it is follow-up work, not part of v1's sort runs. Tree stays working: the nudge is one read-only line per session until acted on; every link is a working =file:= link until conversion day.
+
+* Acceptance criteria
+- [ ] =rg '^\* (DRAFT|READY|DOING|IMPLEMENTED|SUPERSEDED|CANCELLED) ' docs/specs/= lists every rulesets spec with its state, and the answer matches reality.
+- [ ] A status transition on a spec changes exactly three lines in one file — the keyword, a history line, and the Metadata mirror — with no rename and no link edits.
+- [ ] Every doc remaining in rulesets =docs/design/= is a note (lacks the Decisions + Implementation-phases spine); both stray =docs/= root specs are re-homed and carry the =-spec.org= suffix.
+- [ ] All inbound links in the rewritten roots resolve after the pilot, and the post-apply residue grep returns zero.
+- [ ] The spec's own decision/finding =[/]= cookies compute correctly under the two-sequence keyword header (org, not hand counting).
+- [ ] spec-create emits new specs into =docs/specs/= in the new shape; spec-review accepts legacy locations until =:LAST_SPEC_SORT:= is stamped and refuses them after.
+- [ ] Every helper/workflow artifact of this feature lives canonical-side (=claude-templates/.ai/=, =claude-rules/=) with the mirror in sync — =scripts/sync-check.sh= exits clean after the build commits.
+- [ ] A =DOING= spec whose =:SPEC_ID:=-bound parent task is closed or missing is flagged by task-audit's reconcile pass (exercised in the pilot or a fixture).
+- [ ] =spec-sort --apply= on a dirty worktree refuses (absent the override); a forced mid-apply failure in the bats suite yields the named-recovery output, not a half-migrated tree.
+- [ ] After the pilot, no link the sort *rewrote* uses =[[id:...]]= form and no rewritten root gained a new =id:= link targeting a spec (conversion is the gated follow-up); every rewritten link is a resolving =file:= link. The check scopes to actual rewritten and spec-target links — literal prose mentions of the id syntax (which already exist in =todo.org= and older specs) don't count, so a naive whole-file grep is the wrong implementation.
+- [ ] A project with an unsorted =docs/design/= gets the startup nudge; one confirmed =spec-sort= run clears it via =:LAST_SPEC_SORT:=.
+
+* Readiness dimensions
+- Data model & ownership: the spec file owns its state; the Metadata mirror is display-only. No external index to drift.
+- Errors, empty states & failure: =spec-sort= on a project with no =docs/= is a silent no-op; an ambiguous classification is surfaced, never auto-moved; a relink pass that finds zero inbound links is normal.
+- Security & privacy: N/A because the docs are already in-repo; no new exposure surface.
+- Observability: the status grep is the dashboard; =spec-sort= prints every proposed move and every rewritten link.
+- Performance & scale: N/A because collections are tens of files; everything is one-shot or grep-speed.
+- Reuse & lost opportunities: reuses org TODO keywords, org-id, the existing scheme-header pattern of declared vocabularies, and spec-review's in-file findings convention.
+- Architecture fit & weak points: weak point is the classification heuristic — mitigated by the confirm gate. The status heading is additive, so old readers of spec files see one extra heading and nothing breaks.
+- Config surface: none new — one marker line (=:LAST_SPEC_SORT:=) in the existing Workflow State section.
+- Documentation plan: the docs-lifecycle rule is the documentation; spec-create's template is the worked example.
+- Dev tooling: =spec-sort= ships with bats tests under the existing glob-discovered suite.
+- Rollout, compatibility & rollback: additive per project, one project at a time, rulesets first. Rollback of a sort is =git revert= of the pilot commit (moves + relinks are one commit).
+- External APIs & deps: N/A — plain files, =rg=, =uuidgen=.
+
+* Risks, Rabbit Holes, and Drawbacks
+- *Relink misses an inbound link shape* (org radio links, bare paths in scripts). Dodge: the pilot greps for the old path after moving and fails loudly on any residue.
+- *Heuristic over-classifies notes as specs.* Dodge: the confirm gate is mandatory; the helper never moves unconfirmed.
+- *Keyword vocabulary drift* between this spec, the rule, and spec-create's template. Dodge: the rule names the vocabulary once and the others link it.
+
+* Testing / Verification / Rollout
+bats for =spec-sort= (classification, the evidence/confirm gate, plan validation, move + rename, relink, the clean-worktree preflight, forced mid-apply failure recovery output, idempotence, marker stamp). The pilot run on rulesets is the live verification; the post-move residue grep is the acceptance check. Rollout is per-project via the startup nudge, each run human-confirmed.
+
+* References / Appendix
+- Source proposal: [[file:../design/2026-06-15-spec-storage-lifecycle-proposal.org]] (.emacs.d handoff, 2026-06-15).
+- Decisions record: todo.org "Spec storage location + lifecycle-status convention" (settled 2026-06-28).
+- This file is the convention's first resident: it lives in =docs/specs/=, carries the status heading + =:ID:=, and drops the status filename suffix.
+
+* Review and iteration history
+** 2026-07-01 Wed @ 22:13:00 -0400 — Claude — author
+- What: initial draft, written from the five pre-ratified decisions.
+- Why: the queued-specs half of the 2026-06-30 session goal; decisions were settled 2026-06-28 and needed migration into a buildable spec.
+- Artifacts: todo.org task "Spec storage location + lifecycle-status convention"; source proposal above.
+
+** 2026-07-01 Wed @ 22:22:34 -0400 — Codex — reviewer
+- What changed or was recommended: rubric =Not ready=. Four blocking findings were added: preserve Org task keywords while adding lifecycle status, make =spec-sort= relinking executable and failure-safe, define the actual =.ai/notes.org= marker/startup-nudge contract, and avoid stranding legacy specs behind a stricter path precondition before retrofit.
+- Why: current rulesets workflows still depend on =TODO= / =DONE= decision and finding tasks, startup state lives in =.ai/notes.org=, and the repo still contains formal specs outside =docs/specs/= until the migration runs.
+- Artifacts: Review findings section; current-state checks against =.ai/workflows/spec-create.org=, =.ai/workflows/spec-review.org=, =.ai/workflows/startup.org=, =scripts/sync-check.sh=, and =todo.org=.
+
+** 2026-07-01 Wed @ 22:25:00 -0400 — Claude (fresh-context agent) — reviewer
+- What: rubric =Not ready=. Independently found Codex's keyword-vocabulary blocker (adding the cross-sequence uniqueness wrinkle) and the stranded-legacy-specs and marker-surface gaps, plus five findings of its own: no owner for the =DOING= → =IMPLEMENTED= flip (blocking), the precedence-ambiguous classification heuristic, the missing =-spec.org= rename in spec-sort, org-id click-resolution in a live Emacs, and the criterion-2/mirror contradiction.
+- Why: fresh-eyes adversarial pass requested by Craig after his own read found nothing; the two reviews converging on the same worst bug from independent context is the confidence signal.
+- Artifacts: Review findings section (findings 5-9); spot-checks against real repo files (=docs/design/task-review.org=, the two stray root specs, =startup.org:154=).
+
+** 2026-07-01 Wed @ 22:46:52 -0400 — Claude — second responder pass
+- What: fixed all five of Codex's re-review findings in place (fourteen of fourteen closed): canonical-placement contract for every synced artifact (+ sync-check acceptance criterion), the =:SPEC_ID:= spec-to-task binding with the parent-keyword audit query (dissolving the flip-task chicken-and-egg), the fail-safe =--apply= contract (clean-tree preflight, validate-then-write from a recorded plan, named recovery), staged id-link conversion (pilot rewrites =file:= links only; =id:= conversion gated on the concrete .emacs.d id-index mechanism — the fork Craig approved), and evidence-based status confirmation (evidence panel, conservative non-terminal default, reasons required for terminal states). Status stays DRAFT; the READY flip belongs to the reviewers this round.
+- Why: Craig approved fixing all five ("1", 2026-07-01), including the keep-file:-links-through-pilot fork.
+- Artifacts: per-finding responses inline; the fixed Design/phase/criteria sections.
+
+** 2026-07-01 Wed @ 23:22:50 -0400 — Codex — reviewer
+- What changed or was recommended: rubric =Ready=. No new blocking findings. The second responder pass closed all five Codex re-review blockers without regressing the first nine findings, and the spec now gives implementers concrete contracts for canonical synced assets, =:SPEC_ID:= task binding, fail-safe =spec-sort --apply= behavior, staged id-link conversion, evidence-based status confirmation, phase sequencing, and test coverage.
+- Why: the current spec can be implemented and tested without hidden product decisions; remaining vNext work is separately tracked.
+- Artifacts: status heading flipped to =READY=; =* Decisions= [5/5]; =* Review findings= [14/14]; Emacs batch cookie check.
+
+** 2026-07-01 Wed @ 22:41:21 -0400 — Claude (fresh-context agent) — verify pass; Claude — READY flip
+- What: the original reviewer re-read the fixed spec against its own nine findings: all held, none regressed, verdict ready. It re-ran the classification predicate live (exactly 5 candidates; task-review.org excluded) and confirmed org computes the cookies. Two non-blocking minors folded in before the flip: a refinement note under Decision 2 (whose frozen body still said "one keyword edit") and a wider nudge probe that also fires on stray =docs/*-spec.org= root files. Status flipped DRAFT → READY.
+- Why: Craig authorized the flip contingent on the verify pass clearing; it did.
+- Artifacts: the status heading's history line; verify-pass report in the session record.
+
+** 2026-07-01 Wed @ 22:41:33 -0400 — Codex — reviewer
+- What changed or was recommended: rubric =Not ready=. Five new blocking findings were added after the response pass: make shared workflow/script edits obey the =claude-templates/.ai/= canonical plus =.ai/= mirror contract; define how task-audit binds a =DOING= spec to its implementation tasks; make =spec-sort --apply= failure-safe; turn the org-id Emacs prerequisite into an executable rollout step; and require status confirmation to be evidence-based rather than a rubber-stamp of stale fields.
+- Why: the response fixed the original keyword/relink/precondition issues but introduced new integration points in synced template assets, task-audit, Emacs id resolution, and migration safety that are not yet buildable from the spec.
+- Artifacts: Review findings section; checks against =CLAUDE.md=, =scripts/sync-check.sh=, =.ai/workflows/task-audit.org=, =.ai/workflows/startup.org=, =.ai/notes.org=, current =docs/= inventory, and Emacs batch/docstring checks for Org TODO cookies and =org-id-extra-files=.
+
+** 2026-07-01 Wed @ 22:30:06 -0400 — Claude — responder
+- What: merged both reviews into one findings ledger (nine findings, all dispositioned accept) and fixed all nine in place: two-sequence keyword header (applied to this file itself), transition-ownership table with the tracked flip-to-IMPLEMENTED task, single classification predicate, the -spec.org rename step, the full relink data-safety contract, the =.ai/notes.org= marker + Phase A/C startup contract, the legacy-location compatibility rule, the org-id Emacs prerequisite, and the three-line transition definition. Acceptance criteria updated to match.
+- Why: Craig approved fixing all nine ("1", 2026-07-01); none touched the five ratified decisions.
+- Artifacts: Review findings section (responses inline per finding); the fixed sections themselves.
diff --git a/docs/agent-knowledge-base-spec.org b/docs/specs/agent-knowledge-base-spec.org
index 78ff9bd..e36d897 100644
--- a/docs/agent-knowledge-base-spec.org
+++ b/docs/specs/agent-knowledge-base-spec.org
@@ -1,12 +1,20 @@
#+TITLE: Agent Knowledge Base on Org-roam — Spec
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2026-06-10
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* IMPLEMENTED Agent Knowledge Base on Org-roam — Spec
+:PROPERTIES:
+:ID: 08a5ec99-9e1e-40e4-8241-e8a41e9de49f
+:END:
+- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to IMPLEMENTED (reason: v1 (Phases 0-4) shipped 2026-06-10 on Craig's go; KB live at ~/org/roam with the knowledge-base rule installed machine-wide)
* Metadata
-| Status | implemented — v1 (Phases 0-4) shipped 2026-06-10 on Craig's go; manual validation + other-machine clones outstanding (todo.org) |
+| Status | implemented |
| Owner | Craig Jennings |
| Reviewer | Craig Jennings; Codex (2026-06-10) |
-| Related | [[file:../todo.org][todo.org — "Check that memories are sync'd across machines via git"]] |
+| Related | [[file:../../todo.org][todo.org — "Check that memories are sync'd across machines via git"]] |
This spec supersedes the 2026-06-05 draft (formerly docs/design/2026-06-05-org-roam-knowledge-base-spec.org, removed; content in git history), folding in Craig's 2026-06-10 ratification answers and restructuring to the spec-create format.
@@ -308,4 +316,4 @@ Modified recommendations from the 2026-06-10 Codex review, with reasons. Everyth
** 2026-06-10 Wed @ 17:31:10 -0500 — Codex — reviewer
- What changed or was recommended: re-ran the spec-review workflow after the caveat resolution. Rubric: ready. No new blocking or medium-priority findings; no review file written. Confirmed the implementation phases and test-surface tasks are already represented under the existing parent task in todo.org.
- Why: the prior blockers are dispositioned, the work-root denylist is confirmed, the pointer-rule install path matches the current Makefile RULES glob, and v1's manual/agent-runnable verification surface is explicit.
-- Artifacts: this file; [[file:../todo.org][todo.org]] parent task "Check that memories are sync'd across machines via git".
+- Artifacts: this file; [[file:../../todo.org][todo.org]] parent task "Check that memories are sync'd across machines via git".
diff --git a/docs/specs/inbox-workflow-consolidation-spec.org b/docs/specs/inbox-workflow-consolidation-spec.org
new file mode 100644
index 0000000..4543e77
--- /dev/null
+++ b/docs/specs/inbox-workflow-consolidation-spec.org
@@ -0,0 +1,199 @@
+#+TITLE: Inbox Workflow Consolidation — Spec
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-06-23
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* READY Inbox Workflow Consolidation — Spec
+:PROPERTIES:
+:ID: a7fe2a10-dfa8-4ba3-a11a-e7b1288b7573
+:END:
+- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to READY (evidence-based, human-confirmed)
+
+* Metadata
+| Status | ready |
+|----------+-------------------------------------------------------------|
+| Owner | Craig |
+|----------+-------------------------------------------------------------|
+| Reviewer | Craig |
+|----------+-------------------------------------------------------------|
+| Related | [[file:../../todo.org][Consolidate inbox/triage workflows + scheduled inbox check]] |
+|----------+-------------------------------------------------------------|
+
+* Summary
+
+Four inbox-named workflows (=inbox-zero=, =process-inbox=, =monitor-inbox=, plus the startup/wrap-up nudges) circle the same disposition logic across three different surfaces. This spec consolidates them into one =inbox= engine with explicit modes, keeps =triage-intake= (external accounts) and =no-approvals= (session mode) separate, and adds an interactive recurring roam check (=auto inbox zero=). The fully-unattended cron pass is named but deferred to vNext.
+
+* Problem / Context
+
+"Too many inbox related workflows" (Craig, roam capture 2026-06-23). The word "inbox" is overloaded onto three genuinely different surfaces, and the workflows that serve them have grown to circle the same logic:
+
+- *Project-local =inbox/= dir* (handoffs from other projects/scripts/Craig) → =process-inbox.org= owns the value gate and disposition; =monitor-inbox.org= is a thin cadence layer on top of it ("loop process-inbox every 15 min + act-vs-file + reply discipline").
+- *Global roam inbox* (=~/org/roam/inbox.org=, GTD capture) → =inbox-zero.org=, whose Phase A already *calls* =process-inbox= for the local dir before doing the roam-routing part.
+- *External accounts* (email / calendar / PRs) → =triage-intake.org= + six source plugins.
+
+So a reader (or a non-Claude agent) facing "deal with my inbox" has to know which of four files to invoke, and the shared concepts — the three-question value gate, the skeptical review, the implement/fold/file/defer/reject disposition, the reply-to-sender discipline, the capture-guard before a roam write, the priority-scheme check before filing — are spread across and cross-referenced between them. The duplication is real (=monitor-inbox= and =inbox-zero= both lean on =process-inbox='s machinery) and the count is the symptom Craig named.
+
+A second gap surfaced in the same capture: there's no documented way to run a *recurring* inbox check, and Craig wants a keyword trigger for it. v1 answers this with an interactive in-session loop (=auto inbox zero=); a fully-unattended cron pass that fires while Craig is away is a larger contract (mutation safety, surfacing-when-away, cross-run state) and is deferred.
+
+* Goals and Non-Goals
+
+** Goals
+- One engine is the single entry point for the inbox surfaces, with the shared value-gate / disposition / reply / capture-guard / priority-scheme logic living in exactly one place.
+- Mode selection is unambiguous from the trigger phrase and the caller (startup, wrap-up, on-demand).
+- Every existing trigger phrase still works, routing to the right mode — no relearning.
+- A documented interactive recurring check (=auto inbox zero=, =/loop=-based). The fully-unattended cron pass (=/schedule=) is vNext, not v1.
+- INDEX.org, protocols.org, and the startup/wrap-up callers reconciled to the new shape with no dangling references.
+- No behavior regression: the value gate, disposition rules, capture-guard, and reply discipline behave exactly as today.
+
+** Non-Goals
+- *Not* merging =triage-intake.org=. External-account triage ("what's new across my email/cal/PRs") is a different domain from "my inbox dirs"; keeping it distinct is correct, not redundancy.
+- *Not* merging =no-approvals.org=. It's a session mode, not an inbox workflow (it's referenced by the monitor cadence, not part of it).
+- *Not* changing value-gate semantics or disposition rules. This is a structural merge, behavior-preserving.
+- *Not* the domain-aware whole-roam-inbox routing (still deferred, unchanged).
+- *Not* the agent-neutral language sweep over these files — that is the parked half of the agent-source task and runs *after* this merge, over fewer files.
+- *Not* renaming =CLAUDE.md=, =.claude/=, or other structural paths.
+
+** Scope tiers
+- v1: merge =process-inbox= + =monitor-inbox= + =inbox-zero= into one =inbox.org= engine with =process= / =monitor= / =roam= modes; preserve all trigger phrases; reconcile INDEX + protocols + startup + wrap-up; add the interactive =auto inbox zero= recurring check (=/loop=).
+- Out of scope: =triage-intake= merge, =no-approvals= merge, domain-aware roam routing, the agent-neutrality sweep.
+- vNext (log to todo.org): the fully-unattended =/schedule= cron pass — needs its own contract (read-only vs may-mutate =todo.org= / =~/org/roam/inbox.org=, how a find surfaces when Craig is away, how dedup state survives across runs, auth/session constraints); a later umbrella unifying =triage-intake='s "what's new" with the inbox engine; the agent-neutrality pass over the consolidated =inbox.org=.
+
+* Design
+
+The consolidation produces one engine file, =inbox.org=, structured as a shared core plus three thin modes. The core holds every concept that today is duplicated or cross-referenced: the three-question value gate, the skeptical review (with the cross-project battery for shared-asset proposals), the disposition ladder (implement-now / fold / file / defer / reject-by-source / park), the reply-to-sender discipline, the capture-guard before any roam-inbox disk write, and the priority-scheme check before filing. A mode is a short front section that says which surface it reads, how it enters and exits, and which core steps it runs.
+
+*Two altitudes.*
+
+For the *user*: the trigger phrase picks the mode, and the phrases are unchanged. "process inbox" / "handle the inbox" → process mode (the local =inbox/= dir). "monitor the inbox" / "watch the inbox" → monitor mode (process mode on a loop, with the act-vs-file and reply discipline and the clean-tree/green-suite gates). "inbox zero" / "process the roam inbox" → roam mode (route the global roam inbox by =<project>:= prefix, sweep empties, capture-guard the write). Startup calls process mode for the local dir and the read-only roam nudge; wrap-up calls process mode then the roam sweep.
+
+For the *implementer*: =inbox.org= is one file. The core sections are written once. Each mode is a section that references core steps by name rather than restating them ("run the value gate (core §X) on each item", "guard and reconcile the roam write (core §Y)"). The old three files are deleted; their content is absorbed, not copied. The =triage-intake= engine and its plugins are untouched and keep their own namespace.
+
+*Routing and callers.* protocols.org's terminology section and the startup workflow's INDEX-driven routing both key off trigger phrases, so the phrase→mode map is the contract. Each caller that today names =process-inbox.org= / =monitor-inbox.org= / =inbox-zero.org= (startup Phase C, wrap-up Step 3, protocols, INDEX) is repointed at =inbox.org= and the relevant mode. INDEX gets one entry for =inbox.org= listing every trigger phrase, grouped by mode.
+
+*Auto inbox zero (the scheduled mode).* The trigger phrase =auto inbox zero= starts a recurring roam-mode pass. On invocation the engine *asks Craig for the interval* (e.g. 30 min, 2 hours), then drives the loop with =/loop <interval>= running roam mode. It's in-session and interactive by design — each cycle reports, and a find waits for Craig's go before any work happens. Per cycle:
+
+- *Nothing found* → no inbox summary. A single acknowledgement line: ran at =HH:MM=, nothing found. Nothing else.
+- *Items found* → summarize the found items, file them as tasks, and *append them to a displayed queue* (the harness task list, =TaskCreate=) so the queue accumulates across cycles. Then ask: "run this batch next?" If Craig says yes, the engine launches into implementing the found items (each through the normal disposition + verify flow); if no, they stay queued for a later go. Subsequent cycles add only newly-found items to the same displayed queue, never re-surfacing what's already there.
+
+The acknowledge-only-on-empty rule keeps a quiet inbox quiet — no noise when there's nothing to do — while a find is always surfaced and gated on Craig's yes. =auto inbox zero= is the interactive =/loop= shape because its execute step waits for a yes, so it is inherently in-session.
+
+A fully-unattended =/schedule= cron pass (firing while Craig is away) is a different contract and is *vNext, not v1*: it can't wait for a yes, so it has to decide up front whether it may mutate =todo.org= and the roam inbox or stays read-only, how a find reaches Craig asynchronously, how dedup state persists between runs that don't share a session, and what session/auth context a cron run carries. v1 ships only the interactive loop; the unattended contract is logged to =todo.org= for its own design pass.
+
+* Alternatives Considered
+
+** Option A — One engine with modes (chosen)
+- Good, because it cuts four inbox-named files to one and puts the shared logic in a single authoritative place, which is exactly the "too many" complaint.
+- Good, because every trigger phrase can re-home to a mode with no user relearning.
+- Bad, because =inbox.org= becomes a larger file with internal mode branching.
+- Neutral, because =triage-intake= and =no-approvals= stay separate either way.
+
+** Option B — Keep three files, extract a shared include
+- Good, because the diffs are smaller and the per-surface entry points stay familiar.
+- Bad, because it does not reduce the file count — Craig's actual complaint is the number of files, and this keeps three plus adds an include.
+- Neutral, because the dedup of logic happens, just without the count reduction.
+
+** Option C — Merge only process-inbox + monitor-inbox, leave inbox-zero
+- Good, because it fixes the tightest, least-ambiguous redundancy (monitor is literally a loop over process) at the lowest risk.
+- Bad, because roam vs local stays two files; the consolidation is partial (4→3, not 4→2).
+- Neutral, because it could be a first phase of Option A rather than a competing end state.
+
+** Option D — Do nothing, just document which file is which
+- Good, because zero risk to load-bearing synced workflows.
+- Bad, because it doesn't reduce the count at all; the complaint stands.
+
+* Decisions [4/4]
+
+** DONE Engine shape — one file with modes vs partial merge
+- Context: Option A (one =inbox.org=, 4→1) maximally addresses "too many" but is the biggest single change to load-bearing synced files. Option C (merge the process/monitor pair only, 4→3) is lower-risk and could be A's first phase.
+- Decision: We will build Option A — one =inbox.org= engine with =process= / =monitor= / =roam= modes. (Craig, 2026-06-23.)
+- Consequences: easier discovery and one home for the logic; harder single-file size and a bigger, higher-blast-radius diff, mitigated by the shared-core + thin-mode structure and the plugin-namespace escape hatch for a mode that wants depth.
+
+** DONE Trigger-phrase routing — preserve all existing phrases
+- Context: protocols + startup route by phrase; users have these in muscle memory.
+- Decision: We will keep every existing trigger phrase, re-homing each to its mode on the one engine, adding only the new =auto inbox zero= phrase. (Craig, 2026-06-23 — accepted as recommended.)
+- Consequences: easier — no relearning, no broken muscle memory; harder — the engine must document a longer phrase→mode table and guard against collisions.
+
+** DONE triage-intake stays separate
+- Context: external-account triage is a different surface; folding it in would re-bloat the engine.
+- Decision: We will leave =triage-intake.org= and its plugins untouched, out of this consolidation. (Craig, 2026-06-23 — accepted as recommended.)
+- Consequences: easier — smaller, coherent inbox engine; harder — two "what's arriving" entry points remain (inbox engine vs triage-intake), documented so the boundary is clear.
+
+** DONE Scheduled-check mechanism + behavior + keyword
+- Context: Craig wants a recurring inbox check with a keyword, an interactive find-then-execute flow, and a running queue.
+- Decision: The trigger phrase is =auto inbox zero=. On invocation it asks Craig for the interval, then runs roam-mode on =/loop <interval>=. Empty cycle → one acknowledgement line (ran at HH:MM, nothing found), no inbox summary. Find → summarize, file as tasks, append to the displayed task queue (=TaskCreate=), and ask "run this batch next?"; on yes, implement the found items; subsequent cycles append only new finds to the same queue. =/schedule= stays available for a fully-unattended pass. (Craig, 2026-06-23.)
+- Consequences: easier — a quiet inbox stays quiet, a find is always gated on a yes, and the queue is one accumulating view; harder — the loop must dedup against already-queued items so it doesn't re-surface them, and the in-session =/loop= shape means the unattended case still needs =/schedule=.
+
+* Review findings [2/2]
+
+** DONE Fully unattended scheduled behavior is not specified :blocking:
+Disposition: accepted via the narrow option. v1 ships only the interactive =auto inbox zero= (=/loop=); the fully-unattended =/schedule= pass is deferred to vNext with its open contract questions named (read-only vs may-mutate, surface-when-away, cross-run dedup state, auth/session). Folded into Summary, Goals, Problem/Context, Scope tiers (vNext), and the Design "Auto inbox zero" subsection; the vNext contract is logged to todo.org. This sequences the unattended pass rather than dropping it, preserving Decision 4's intent.
+The Summary and Goals promise a scheduled unattended inbox check with trigger keywords, but the concrete =auto inbox zero= design is intentionally interactive: it asks for an interval, runs =/loop <interval>= in the live session, and waits for Craig before executing found work. The only unattended behavior is the sentence that =/schedule= remains available for a fully-unattended cloud-cron pass. That leaves an implementer to invent the actual scheduled contract: trigger phrase(s), whether the pass is read-only or may mutate =todo.org= / =~/org/roam/inbox.org=, how findings are surfaced when Craig is away, how dedup state survives across runs, and what auth/session constraints apply. Add a distinct =/schedule= subsection and acceptance criteria for the fully unattended mode, or narrow the Summary/Goals to say v1 ships only the interactive =/loop= mode and log the unattended cron shape as vNext. (blocking)
+
+** DONE Stale-reference verification relies on a checker that does not check workflow links
+Phase 2 and the Risks section rely on the workflow-integrity / INDEX-drift check as the backstop for missed references to =process-inbox.org=, =monitor-inbox.org=, and =inbox-zero.org=. Current =scripts/workflow-integrity.py= checks INDEX coverage, script references, plugin parentage, orientation sections, and duplicate trigger phrases; it does not validate arbitrary =[[file:...org]]= workflow links or prose references in workflows/protocols/rules. That means a deleted-workflow link in =startup.org=, =wrap-it-up.org=, or =protocols.org= can survive the named checker. Keep the grep requirement, but make it an explicit acceptance item with the exact scope: at minimum =rg 'process-inbox|monitor-inbox|inbox-zero' claude-templates/.ai .ai claude-rules= after caller rewrites, allowing only intentional historical/spec/todo mentions. Optionally extend =workflow-integrity.py= to validate local workflow links, but do not imply it already catches this class. (non-blocking)
+
+Disposition: accepted. Added the exact grep as an acceptance item and a Phase 2 step, reworded the Risks "missed caller reference" dodge and the Dev-tooling readiness line so the integrity checker is no longer implied to validate workflow links, and noted the optional =workflow-integrity.py= extension as not-required-for-v1.
+
+* Implementation phases
+
+** Phase 1 — Author the inbox engine
+Write =inbox.org= (canonical =claude-templates/.ai/workflows/=): the shared core (value gate, skeptical review, disposition ladder, reply discipline, capture-guard, priority-scheme check) plus the three mode sections, absorbing the content of the three source files. No caller changes and no deletions yet — the tree still works with the old files in place, the new engine sits alongside for review.
+
+** Phase 2 — Reconcile callers and retire the old files
+Repoint INDEX.org (one =inbox.org= entry, phrases grouped by mode), protocols.org terminology, startup.org Phase C, and wrap-it-up.org Step 3 at =inbox.org= + mode. Delete =process-inbox.org=, =monitor-inbox.org=, =inbox-zero.org=. Grep for stale references — =rg 'process-inbox|monitor-inbox|inbox-zero' claude-templates/.ai .ai claude-rules= — and clear every live caller (the integrity checker covers INDEX coverage and trigger duplication, not workflow links). Run the workflow-integrity / INDEX-drift check. Sync the mirror.
+
+** Phase 3 — Auto inbox zero + scheduled check
+Add the =auto inbox zero= mode to =inbox.org=: ask-for-interval, =/loop <interval>= over roam mode, the empty-cycle acknowledgement, and the find → summarize → file → queue → ask-to-execute flow with cross-cycle dedup against the displayed queue. Document the =/schedule= recipe for the fully-unattended pass alongside it.
+
+** Phase 4 — Verify
+Trigger-phrase coverage (every old phrase resolves to a mode), startup + wrap-up dry-run against the new engine, capture-guard still gates the roam write, INDEX drift clean, mirror in sync.
+
+* Acceptance criteria
+- [ ] Every trigger phrase that today routes to =process-inbox= / =monitor-inbox= / =inbox-zero= resolves to a mode of =inbox.org=.
+- [ ] The three old workflow files are deleted and absent from INDEX; the integrity check reports no orphan or stale entry.
+- [ ] After caller rewrites, =rg 'process-inbox|monitor-inbox|inbox-zero' claude-templates/.ai .ai claude-rules= returns only intentional historical / spec / todo mentions — no live caller reference to a deleted file. (The integrity checker validates INDEX coverage, not arbitrary workflow links, so this grep is the real backstop.)
+- [ ] Startup still processes the local inbox and produces the read-only roam nudge; wrap-up still sweeps the project's roam items.
+- [ ] The capture-guard runs before any roam-inbox disk write in the consolidated engine.
+- [ ] The value gate, disposition ladder, and reply-to-sender discipline are present once and unchanged in behavior.
+- [ ] =auto inbox zero= asks for an interval, then runs roam mode on =/loop <interval>=.
+- [ ] An empty auto cycle emits only a timestamped acknowledgement (ran at HH:MM, nothing found) — no inbox summary.
+- [ ] A find summarizes the items, files them as tasks, appends them to the displayed queue, and asks before executing; "yes" runs the batch; later cycles append only newly-found items, never re-surfacing queued ones.
+- [ ] Canonical and mirror copies are in sync (=sync-check.sh=).
+
+* Readiness dimensions
+- Data model & ownership: the engine reads two files it doesn't own (project =inbox/= dir contents, =~/org/roam/inbox.org=) and writes =todo.org= + the roam file. Ownership unchanged from today; the merge moves no data.
+- Errors, empty states & failure: empty inbox → report and stop (preserved per surface); roam pull blocked or dirty → surface and stop, never auto-stash (preserved); live org-capture on the roam file → capture-guard blocks the write (preserved).
+- Security & privacy: N/A because no credentials or sensitive data; the engine moves task text between local files.
+- Observability: the user sees which mode ran and its disposition summary; INDEX drift check surfaces a mis-wired routing.
+- Performance & scale: N/A because the inputs are small text files triaged by hand-scale counts.
+- Reuse & lost opportunities: the whole point — the shared core is written once instead of three times; =triage-intake='s plugin pattern is intentionally not reused here (different surface).
+- Architecture fit & weak points: integration points are INDEX.org, protocols.org terminology, startup Phase C, wrap-up Step 3. Weak point: a missed caller reference to an old filename breaks routing — mitigated by the explicit stale-reference grep (acceptance item), since the integrity check covers INDEX coverage, not workflow links.
+- Config surface: trigger phrases (the phrase→mode table) and the scheduled-check keyword set + cron expression.
+- Documentation plan: the engine file is the doc; INDEX entry updated; protocols terminology updated. No separate user doc needed.
+- Dev tooling: the workflow-integrity check + startup INDEX-drift check cover INDEX coverage and trigger-phrase duplication; they do *not* validate workflow file links, so the stale-reference grep (acceptance item) is a manual step. Optionally extend =workflow-integrity.py= to validate local =[[file:...org]]= workflow links — not required for v1.
+- Rollout, compatibility & rollback: the merge lands via the template sync; =rsync --delete= removes the three retired files from every consuming project on its next startup, and the new =inbox.org= arrives the same pass. Rollback = git revert of the rulesets commit, then the next sync restores the old files. Trigger-phrase preservation is the compatibility guarantee.
+- External APIs & deps: N/A because no external API; =/schedule= and =/loop= are harness features, not deps.
+
+* Risks, Rabbit Holes, and Drawbacks
+- *Missed caller reference.* A lingering mention of =process-inbox.org= / =monitor-inbox.org= / =inbox-zero.org= in a workflow, protocol, skill, or the INDEX would break routing after the files are deleted. Dodge: =rg 'process-inbox|monitor-inbox|inbox-zero' claude-templates/.ai .ai claude-rules= and clear every live caller before deleting. The workflow-integrity checker validates INDEX coverage and trigger-phrase duplication, *not* arbitrary =[[file:...]]= links, so the grep — not the checker — is the real backstop here.
+- *Single-file sprawl.* One engine with three modes risks becoming the wall-of-text the workflows were split to avoid. Dodge: the shared-core + thin-mode structure and the terseness pass; if a mode wants real depth, it can become a =inbox.<mode>.org= plugin under the engine namespace (the same pattern =triage-intake= uses) rather than bloating the core.
+- *Sequencing with the agent-neutrality sweep.* If the neutrality sweep runs first, it edits three files about to be deleted. Dodge: this consolidation lands first by construction (it's why the sweep was parked).
+
+* Review and iteration history
+** 2026-06-23 Tue @ 21:51:51 -0400 — Claude — author
+- What: initial draft.
+- Why: Craig chose to spec the inbox-workflow consolidation before building a load-bearing 3-to-1 merge of synced workflows.
+- Artifacts: docs/inbox-workflow-consolidation-spec.org; todo.org "Consolidate inbox/triage workflows + scheduled inbox check".
+** 2026-06-23 Tue @ 22:05:00 -0400 — Craig — decision-maker
+- What: resolved all four decisions; added the =auto inbox zero= scheduled mode (ask-for-interval, empty-cycle acknowledgement only, find → summarize → file → displayed queue → ask-to-execute, cross-cycle dedup). Status → ready for review.
+- Why: chose Option A (4→1 engine) and specified the recurring-check behavior in full.
+- Artifacts: Decisions [4/4]; Design "Auto inbox zero" subsection.
+** 2026-06-23 Tue @ 22:15:58 -0400 — Codex — reviewer
+- What: spec-review pass rated the spec =Not ready= and added two findings: the fully unattended =/schedule= behavior is not specified, and stale-reference verification leans on a checker that does not validate workflow links.
+- Why: the current design is strong enough for the consolidation and interactive =/loop= mode, but the stated scheduled/unattended goal would force implementers to invent behavior before shipping.
+- Artifacts: Review findings [0/2].
+** 2026-06-23 Tue @ 22:28:00 -0400 — Claude — responder
+- What: both findings accepted and folded. Finding 1 (blocking) resolved by narrowing v1 to the interactive =auto inbox zero= (=/loop=) and deferring the fully-unattended =/schedule= contract to vNext with its open questions named — Summary, Goals, Problem/Context, Scope tiers, and Design updated. Finding 2 resolved by adding the exact stale-reference grep as an acceptance item + Phase 2 step and dropping the over-claim that the integrity checker validates workflow links. Findings [2/2], Decisions [4/4]; scope narrowed (not expanded), so no readiness-rubric rerun needed. Status → Ready.
+- Why: the scheduled/unattended promise outran what Craig actually specced (he detailed the interactive loop); sequencing the cron pass to vNext keeps v1 honest. The checker genuinely doesn't catch stale workflow links.
+- Artifacts: Decisions [4/4]; Review findings [2/2]; vNext task to be logged for the unattended cron contract.
diff --git a/docs/design/wrapup-routing-spec.org b/docs/specs/wrapup-routing-spec.org
index 434f8d9..1a150fc 100644
--- a/docs/design/wrapup-routing-spec.org
+++ b/docs/specs/wrapup-routing-spec.org
@@ -1,16 +1,23 @@
#+TITLE: Wrap-Up Inbox/Transcript Routing — Spec
#+AUTHOR: Craig Jennings
#+DATE: 2026-06-13
-#+TODO: TODO | DONE SUPERSEDED CANCELLED
+#+TODO: TODO | DONE
+#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED
+
+* DOING Wrap-Up Inbox/Transcript Routing — Spec
+:PROPERTIES:
+:ID: 00b47414-2213-4a99-be35-48ceb266fc08
+:END:
+- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to DOING (evidence-based, human-confirmed)
* Metadata
-| Status | Ready — review incorporated (spec-review, 2026-06-21) |
+| Status | doing |
|----------+-----------------------------------------------------|
| Owner | Craig Jennings |
|----------+-----------------------------------------------------|
| Reviewer | Codex (spec-review) |
|----------+-----------------------------------------------------|
-| Related | [[file:../../todo.org][todo.org: wrap-up routing task]] · [[file:2026-06-13-wrapup-inbox-transcript-routing-proposal.org][archsetup proposal]] |
+| Related | [[file:../../todo.org][todo.org: wrap-up routing task]] · [[file:../design/2026-06-13-wrapup-inbox-transcript-routing-proposal.org][archsetup proposal]] |
|----------+-----------------------------------------------------|
* Summary