feat(backlog): extract the execution loop into work-the-backlog.org

work-the-backlog.org now owns the autonomous execution loop: the mechanical eligibility gate, the four-item defer checklist, the per-task quality bar, and the run-cap kill switch, fed a task set, session mode, and cap by its callers. I stubbed the pre-flight Q&A, waiver read, end-of-set page, and metrics record with pointers to their phases. inbox.org's auto mode drops its execute step. Per-cycle item 3 routes and queues only, so the loop has one home. This is Phase 1 of the autonomous-batch execution spec.
author: Craig Jennings <c@cjennings.net> 2026-07-02 01:11:06 -0400
committer: Craig Jennings <c@cjennings.net> 2026-07-02 01:11:06 -0400
commit: d379a23953166a2c95d20da5098426cbf49a510f
tree: fe10f952e9d329b4cc8a82ba2b289db1398df9d4
parent: 9ad415d654a0b2751c4472aeeb10571239f795fc
7 files changed, 304 insertions, 12 deletions
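
The control flow the commit message describes (eligibility gate, defer checklist, run cap) can be sketched in a few lines. This is a minimal illustration, not the workflow itself: work-the-backlog.org is a prose workflow, and the names Task, is_eligible, work_the_backlog, and defer_hit are hypothetical. The sketch assumes the caller has already resolved the autonomy tag from the project's scheme header, collapses the four-item defer checklist into a single callback, and reads "decrement the cap" as counting implemented tasks only.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Task:
    heading: str
    status: str      # org keyword such as "TODO", "VERIFY", "DOING"
    tags: Set[str]   # tag names without the surrounding colons


def is_eligible(task: Task, autonomy_tag: str) -> bool:
    """Mechanical gate: status is exactly TODO and the autonomy tag is set."""
    return task.status == "TODO" and autonomy_tag in task.tags


def work_the_backlog(
    tasks: List[Task],
    cap: int,
    autonomy_tag: str,
    defer_hit: Callable[[Task], bool],
    autonomous_commit: bool = False,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (per-task outcomes, headings left over when the cap stopped the run)."""
    outcomes: List[Tuple[str, str]] = []
    remaining_cap = cap
    for index, task in enumerate(tasks):
        if remaining_cap <= 0:
            # Cap reached: stop and surface the untouched remainder.
            return outcomes, [t.heading for t in tasks[index:]]
        if not is_eligible(task, autonomy_tag):
            outcomes.append((task.heading, "skipped-ineligible"))
            continue
        if defer_hit(task):
            # Stand-in for the four-item defer checklist; any hit defers.
            outcomes.append((task.heading, "deferred-VERIFY"))
            continue
        # The real implementation step (TDD, review, closure) would run here.
        outcome = (
            "implemented-committed"
            if autonomous_commit
            else "implemented-diff-surfaced"
        )
        outcomes.append((task.heading, outcome))
        remaining_cap -= 1  # assumption: only implemented tasks count against the cap
    return outcomes, []
```

With file-only mode (the default here) and a cap of 2, a set containing one VERIFY task and one task that trips the defer callback yields two implemented-diff-surfaced outcomes, one skipped-ineligible, one deferred-VERIFY, and whatever follows the second implemented task as the remainder.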
diff --git a/.ai/workflows/INDEX.org b/.ai/workflows/INDEX.org
index a474b29..8d50577 100644
--- a/.ai/workflows/INDEX.org
+++ b/.ai/workflows/INDEX.org
@@ -54,6 +54,9 @@ This index must list every =.org= file in =.ai/workflows/= except this one and e
   - Roam-mode triggers: "inbox zero", "empty the inbox", "process the roam inbox", "triage my roam inbox"
   - Auto-mode trigger: "auto inbox zero" (match before "inbox zero")
 
+- =work-the-backlog.org= — the autonomous task-execution loop, the single home for working a batch of marked tasks unattended: takes an ordered task set (explicit list or tag query) + session mode (=file-only= default / =autonomous-commit= + paging) + a hard run cap; each candidate passes the mechanical eligibility gate (status =TODO= + =:solo:= per the project's scheme header) and the four-item defer checklist, then is implemented to the full quality bar (TDD, =/review-code=, =/voice=) as its own logical commits. Fed by the inbox auto-loop and the no-approvals speedrun preset (caller wiring lands with that feature's Phase 2).
+  - Triggers: "work the backlog", "work the backlog with <task set>" (manual fallback — the callers invoke it directly)
+
 ** Calendar
 
 - =add-calendar-event.org= — create a calendar event.
diff --git a/.ai/workflows/inbox.org b/.ai/workflows/inbox.org
index ea45ae3..60ae8ff 100644
--- a/.ai/workflows/inbox.org
+++ b/.ai/workflows/inbox.org
@@ -461,18 +461,16 @@ Take these up when the single-destination version is in use and the multi-projec
 
 * Mode: auto inbox zero
 
-A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop <interval>= running roam mode. It is in-session and interactive by design — each cycle reports, and a find waits for Craig's go before any work happens.
+A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop <interval>= running roam mode. It is in-session and interactive by design — each cycle reports what it found and filed.
 
 ** Per cycle
 
 1. Run roam mode's scan (Phase A local check + Phase B roam scan), read-only — no =git pull=. The capture-guard still gates any write: use =capture-guard --wait= (core §5) so a transient capture clears itself; if it's still open after the wait, *defer this cycle's roam reconcile to the next cycle* rather than surfacing — the loop cadence is the retry, and the filed items get swept next time. The rare write hands its git to =roam-sync= (roam Phase D).
 2. *Nothing found* → no inbox summary. One acknowledgement line: =ran at HH:MM, nothing found=. Nothing else. The acknowledge-only-on-empty rule keeps a quiet inbox quiet.
-3. *Items found* → summarize the found items, file them as tasks (roam Phase C), and *append them to a displayed queue* — the harness task list, via =TaskCreate= — so the queue accumulates across cycles. Then ask: "run this batch next?"
-   - *Yes* → launch into implementing the found items, each through the normal disposition ladder (core §3) + verify flow.
-   - *No* → they stay queued for a later go.
+3. *Items found* → summarize the found items, file them as tasks (roam Phase C), and *append them to a displayed queue* — the harness task list, via =TaskCreate= — so the queue accumulates across cycles. Routing is where this mode stops: the execution loop lives in =work-the-backlog.org=, its one home, and this mode never implements anything itself.
 4. *Cross-cycle dedup.* Subsequent cycles add only *newly-found* items to the same displayed queue, never re-surfacing what's already there. Dedup against the queue (the =TaskCreate= list), not against what's already been implemented — a find that was queued-but-not-yet-run must not reappear, and one already filed into =todo.org= is dropped by roam Phase C's status check.
 
-A find is always surfaced and gated on Craig's yes; a quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because its execute step waits for a yes.
+A find is always surfaced and filed; a quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because each cycle reports to Craig interactively.
 
 ** Fully-unattended pass (=/schedule=) — vNext, not v1
 
diff --git a/.ai/workflows/work-the-backlog.org b/.ai/workflows/work-the-backlog.org
new file mode 100644
index 0000000..8b69d8e
--- /dev/null
+++ b/.ai/workflows/work-the-backlog.org
@@ -0,0 +1,145 @@
+#+TITLE: Work the Backlog
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-07-02
+
+* Overview
+
+The single home for the autonomous task-execution loop: take a set of marked, solo-doable tasks from the project's =todo.org= and work them unattended, each held to the full quality bar, under a fixed safety contract. Spec: =rulesets/docs/specs/2026-06-16-autonomous-batch-execution-spec.org=.
+
+Two callers feed it, differing only in how they build the task set and which session mode they pass:
+
+- The *inbox auto-loop* (=inbox.org= auto mode) chains here after its routing completes, with a tag/priority query, file-only mode, cap 1.
+- The *no-approvals speedrun* preset feeds an explicit ordered list with autonomous-commit + always-push + paging-on, after a pre-flight Q&A that front-loads every decision.
+
+This workflow owns the execution logic — eligibility gate, defer checklist, quality bar, run cap. Callers own input assembly and mode selection. Capture-routing (inbox surfaces) stays entirely in =inbox.org=; this file never reads an inbox.
+
+* When to Use This Workflow
+
+Invoked by its callers, not usually by phrase. The caller wiring (the auto-loop chain step and the speedrun preset's trigger phrases) lands with the feature's Phase 2.
+
+Manual fallback: "work the backlog" / "work the backlog with <task set>" — gather the three inputs below (ask for whichever are missing, defaulting to file-only mode; default cap is the list length for an explicit set, 1 for a query) and run the loop.
+
+* Inputs — the caller contract
+
+A caller hands this workflow three things:
+
+1. *A task set* — an ordered list of candidate task headings from the project's =todo.org=. Either an explicit ordered list (speedrun) or the result of a tag/priority query (the loop). The loop does not care how the set was assembled; it receives an ordered list of candidates.
+2. *A session mode* — two orthogonal flags:
+   - *Commit autonomy:* =file-only= (default) or =autonomous-commit=. See "Commit autonomy" below.
+   - *Paging:* on or off. End-of-set only.
+3. *A run cap* — the hard maximum number of tasks to complete this run.
+
+It returns a per-task outcome and a run summary.
+
+* Outcomes — the per-task vocabulary
+
+Every task in the set ends in exactly one of:
+
+- =implemented-committed= — implemented, committed (and pushed per the project's flow) under =autonomous-commit=.
+- =implemented-diff-surfaced= — implemented, diff surfaced, *not* committed (=file-only=).
+- =deferred-VERIFY= — a defer-checklist hit; a =VERIFY= filed naming what's missing or risky.
+- =dropped-by-craig= — removed from the run at the speedrun pre-flight Q&A ("skip this").
+- =skipped-ineligible= — failed the mechanical eligibility gate.
+
+The run summary lists each task with its outcome, plus the remaining set when the cap stopped the run.
+
+* The loop
+
+For the task set, in order, until the run cap is hit:
+
+1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
+2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist.
+3. *Defer checklist* (below). Any hit → defer: file the =VERIFY= naming the gap and record =deferred-VERIFY= (or, under the speedrun preset, route a quick-question gap to the pre-flight Q&A), next task.
+4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped.
+5. *Commit autonomy branch:*
+   - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
+   - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
+6. *Record metrics* for the task (the JSONL append — lands with the feature's Phase 5).
+7. Decrement the cap. At zero, stop.
+
+After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary either way.
+
+* Eligibility gate — mechanical, no judgment
+
+A task is autonomous-safe when *both* hold. This layer is a lookup, not a judgment; all the judgment lives in the defer checklist.
+
+1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= marks "awaiting Craig's input"; auto-implementing one defeats the check it represents. The do-not-implement set is safe-by-omission: anything not plainly =TODO= (plus any project-declared "hold" marker) is out.
+2. *Tagged =:solo:=* — the autonomy tag, resolved against the project's priority/tag scheme header in =todo.org= (never hardcoded). =:solo:= carries the hard definition in =todo-format.md=: completable and verifiable without Craig beyond at most one or two quick decisions answerable up front, no design deliberation. A project whose scheme declares a different autonomous-safe tag set overrides the default.
+
+Priority and =:next:= drive *ordering* within the eligible set, not eligibility ([#A] before [#B] before [#C], then the author's ordering). =:quick:= is an effort hint for batching and duration estimates — never a gate.
+
+Task *size* is deliberately absent from this gate. A large but well-specified, decision-free task is in scope and gets decomposed into per-logical-commit chunks during implementation. Size never sends a task away; only *deliberation* or *risk* does (the checklist below).
+
+*No scheme header → don't run.* The gate reads =:solo:= semantics from the project's scheme header; a =todo.org= without one leaves the tag undefined (=todo-format.md= makes the header mandatory). Surface that the header is missing and stop rather than guessing eligibility.
+
+* The defer checklist — act vs file
+
+After the scope read, run each eligible candidate through the checklist. Each item is a concrete, answerable question, not an adjective. *Any* hit — or any "unsure" — defers the task. Only a task that clears every item is implemented.
+
+1. *Test-writability (the keystone).* Can I write the failing test from the task text — plus any decisions gathered up front — without inventing a requirement? *No / unsure* → underspecified. Under the speedrun preset, if the gap is one or two quick answerable questions, route it to the pre-flight Q&A; otherwise file a =VERIFY= noting what's missing. Under the unattended loop, file the =VERIFY= (no one to ask).
+2. *Data-loss / irreversible / external operation.* Does implementing it require any of: =rm= of non-scratch data, =git reset --hard= / force-push, =DROP= / =DELETE= / =TRUNCATE=, file truncate/overwrite of persisted content, a schema or data migration, any external or shared-state mutation, any credential touch? *Yes* → do NOT implement; file a =VERIFY= naming the risk. This is the hard safety gate; an upfront answer never overrides it without an explicit checkpoint.
+3. *Already-satisfied.* Does the scope read show the desired end-state already holds? *Yes* → file a =VERIFY= noting it and move on. Don't make a no-op change.
+4. *Design deliberation.* Does the task carry an unresolved design question, a "weigh these approaches" with real tradeoffs, or a TBD that isn't a quick factual answer? *Yes* → under the speedrun preset, if it collapses to one or two quick questions, route to the pre-flight Q&A; otherwise file and surface as a =/start-work= candidate. Under the loop, file. The discriminator is *quick-answerable question* vs *deliberation* — never task size.
+
+When genuinely unsure which side a task falls on, defer — a wrong auto-implement costs a revert *and* the next-session correction.
+
+The speedrun pre-flight Q&A this checklist routes to (gather → classify → order → intro → batch-ask → skip/answer) lands with the feature's Phase 4; until then a quick-question gap defers like any other hit.
+
+* Per-task quality bar
+
+Autonomy changes who approves, not what quality means. Per task, non-negotiable:
+
+- *TDD* per =testing.md=: red first, green, refactor. The keystone checklist item already proved the failing test is writable.
+- *Verification* per =verification.md=: fresh evidence, full suite green before any commit.
+- *=/review-code --staged=* before every commit; Critical and Important findings block until fixed.
+- *=/voice personal=* on every commit message on the =autonomous-commit= path (or the patterns walked inline if the skill is unavailable), message printed inline so the log shows what landed.
+- *Task closure* per =todo-format.md=: depth-based completion (keyword + =CLOSED:= at level 2, dated rewrite at level 3+).
+- *One logical change per commit.* A large task becomes several commits, not one omnibus.
+
+* Commit autonomy
+
+=file-only= is the default: surface the diff, never commit. =autonomous-commit= is honored only when the project carries the per-project commit-autonomy waiver; absent the waiver, the request degrades to =file-only= and says so. The waiver read (location, the "has waiver" vs "loop may commit" split) and the degrade contract land with the feature's Phase 3 — until then, treat every run as =file-only= unless Craig has put the session in no-approvals mode himself.
+
+* Bounding the run
+
+The cap is a hard per-run task ceiling passed by the caller — the kill switch a runaway can't exceed:
+
+- *Loop caller default: 1.* Implement the highest-priority eligible candidate, record, stop; the next tick continues.
+- *Speedrun: the length of the explicit list*, capped at a ceiling — the human bounded the set by naming it.
+
+Even the speedrun stops at the cap and surfaces (and, with paging on, pages) the remainder. The cap bounds task *count*, not cost; a token budget is logged as vNext.
+
+* End-of-set page
+
+With paging on, fire one page when the set is done or the cap is hit — end-of-set only, never per-task:
+
+#+begin_src sh
+notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>" --persist
+#+end_src
+
+=--persist= keeps it on screen until dismissed. The message carries the project, completed count, and remaining count so Craig can confirm ready + name the next project in one reply. Full paging detail lands with the feature's Phase 4.
+
+* Metrics
+
+Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl= (git-tracked). Logging is a side effect that never alters execution. The record schema lands with the feature's Phase 5.
+
+* Common Mistakes
+
+1. *Implementing a =VERIFY= or =DOING= task.* The gate is status =TODO= only — a =VERIFY= exists precisely because Craig's input is pending.
+2. *Treating =:quick:= as eligibility.* It's an effort hint. =:solo:= is the gate.
+3. *Deferring on size.* A large, well-specified, decision-free task runs — decomposed into logical commits. Size is not a checklist item.
+4. *Guessing past the keystone.* If the failing test isn't writable from the task text, the task isn't ready. Inventing the requirement is the failure the checklist exists to stop.
+5. *Rationalizing through the data-loss list.* "The migration is small" doesn't clear checklist item 2. Enumerated operations defer, full stop.
+6. *Committing in =file-only= mode.* The diff is the deliverable; the commit is Craig's.
+7. *One omnibus commit for the whole run.* Every logical change is its own reviewed commit.
+8. *Skipping =/review-code= or =/voice= because nobody's watching.* Autonomy removes interaction gates, never engineering-discipline gates (same contract as =no-approvals.org=).
+9. *Running past the cap.* The cap is the kill switch; hitting it means stop and surface, even mid-set.
+10. *Paging per-task.* One page, end of set.
+
+* Living Document
+
+Refine as the dogfooding signal arrives — the metrics log and the corrections-in-next-session signal are the feedback loop. Fold recurring adjustments in rather than accumulating caller-side workarounds.
+
+* History
+
+Created 2026-07-02 as Phase 1 of the autonomous-batch execution spec, reconciling the inbox-zero "Phase E" proposal and the =.emacs.d= speedrun proposal into one execution loop. The auto-inbox-zero execute step in =inbox.org= reverted to routing-only in the same change so this file is the loop's only home.
diff --git a/claude-templates/.ai/workflows/INDEX.org b/claude-templates/.ai/workflows/INDEX.org
index a474b29..8d50577 100644
--- a/claude-templates/.ai/workflows/INDEX.org
+++ b/claude-templates/.ai/workflows/INDEX.org
@@ -54,6 +54,9 @@ This index must list every =.org= file in =.ai/workflows/= except this one and e
   - Roam-mode triggers: "inbox zero", "empty the inbox", "process the roam inbox", "triage my roam inbox"
   - Auto-mode trigger: "auto inbox zero" (match before "inbox zero")
 
+- =work-the-backlog.org= — the autonomous task-execution loop, the single home for working a batch of marked tasks unattended: takes an ordered task set (explicit list or tag query) + session mode (=file-only= default / =autonomous-commit= + paging) + a hard run cap; each candidate passes the mechanical eligibility gate (status =TODO= + =:solo:= per the project's scheme header) and the four-item defer checklist, then is implemented to the full quality bar (TDD, =/review-code=, =/voice=) as its own logical commits. Fed by the inbox auto-loop and the no-approvals speedrun preset (caller wiring lands with that feature's Phase 2).
+  - Triggers: "work the backlog", "work the backlog with <task set>" (manual fallback — the callers invoke it directly)
+
 ** Calendar
 
 - =add-calendar-event.org= — create a calendar event.
diff --git a/claude-templates/.ai/workflows/inbox.org b/claude-templates/.ai/workflows/inbox.org
index ea45ae3..60ae8ff 100644
--- a/claude-templates/.ai/workflows/inbox.org
+++ b/claude-templates/.ai/workflows/inbox.org
@@ -461,18 +461,16 @@ Take these up when the single-destination version is in use and the multi-projec
 
 * Mode: auto inbox zero
 
-A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop <interval>= running roam mode. It is in-session and interactive by design — each cycle reports, and a find waits for Craig's go before any work happens.
+A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop <interval>= running roam mode. It is in-session and interactive by design — each cycle reports what it found and filed.
 
 ** Per cycle
 
 1. Run roam mode's scan (Phase A local check + Phase B roam scan), read-only — no =git pull=. The capture-guard still gates any write: use =capture-guard --wait= (core §5) so a transient capture clears itself; if it's still open after the wait, *defer this cycle's roam reconcile to the next cycle* rather than surfacing — the loop cadence is the retry, and the filed items get swept next time. The rare write hands its git to =roam-sync= (roam Phase D).
 2. *Nothing found* → no inbox summary. One acknowledgement line: =ran at HH:MM, nothing found=. Nothing else. The acknowledge-only-on-empty rule keeps a quiet inbox quiet.
-3. *Items found* → summarize the found items, file them as tasks (roam Phase C), and *append them to a displayed queue* — the harness task list, via =TaskCreate= — so the queue accumulates across cycles. Then ask: "run this batch next?"
-   - *Yes* → launch into implementing the found items, each through the normal disposition ladder (core §3) + verify flow.
-   - *No* → they stay queued for a later go.
+3. *Items found* → summarize the found items, file them as tasks (roam Phase C), and *append them to a displayed queue* — the harness task list, via =TaskCreate= — so the queue accumulates across cycles. Routing is where this mode stops: the execution loop lives in =work-the-backlog.org=, its one home, and this mode never implements anything itself.
 4. *Cross-cycle dedup.* Subsequent cycles add only *newly-found* items to the same displayed queue, never re-surfacing what's already there. Dedup against the queue (the =TaskCreate= list), not against what's already been implemented — a find that was queued-but-not-yet-run must not reappear, and one already filed into =todo.org= is dropped by roam Phase C's status check.
 
-A find is always surfaced and gated on Craig's yes; a quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because its execute step waits for a yes.
+A find is always surfaced and filed; a quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because each cycle reports to Craig interactively.
 
 ** Fully-unattended pass (=/schedule=) — vNext, not v1
 
diff --git a/claude-templates/.ai/workflows/work-the-backlog.org b/claude-templates/.ai/workflows/work-the-backlog.org
new file mode 100644
index 0000000..8b69d8e
--- /dev/null
+++ b/claude-templates/.ai/workflows/work-the-backlog.org
@@ -0,0 +1,145 @@
+#+TITLE: Work the Backlog
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-07-02
+
+* Overview
+
+The single home for the autonomous task-execution loop: take a set of marked, solo-doable tasks from the project's =todo.org= and work them unattended, each held to the full quality bar, under a fixed safety contract. Spec: =rulesets/docs/specs/2026-06-16-autonomous-batch-execution-spec.org=.
+
+Two callers feed it, differing only in how they build the task set and which session mode they pass:
+
+- The *inbox auto-loop* (=inbox.org= auto mode) chains here after its routing completes, with a tag/priority query, file-only mode, cap 1.
+- The *no-approvals speedrun* preset feeds an explicit ordered list with autonomous-commit + always-push + paging-on, after a pre-flight Q&A that front-loads every decision.
+
+This workflow owns the execution logic — eligibility gate, defer checklist, quality bar, run cap. Callers own input assembly and mode selection. Capture-routing (inbox surfaces) stays entirely in =inbox.org=; this file never reads an inbox.
+
+* When to Use This Workflow
+
+Invoked by its callers, not usually by phrase. The caller wiring (the auto-loop chain step and the speedrun preset's trigger phrases) lands with the feature's Phase 2.
+
+Manual fallback: "work the backlog" / "work the backlog with <task set>" — gather the three inputs below (ask for whichever are missing, defaulting to file-only mode; default cap is the list length for an explicit set, 1 for a query) and run the loop.
+
+* Inputs — the caller contract
+
+A caller hands this workflow three things:
+
+1. *A task set* — an ordered list of candidate task headings from the project's =todo.org=. Either an explicit ordered list (speedrun) or the result of a tag/priority query (the loop). The loop does not care how the set was assembled; it receives an ordered list of candidates.
+2. *A session mode* — two orthogonal flags:
+   - *Commit autonomy:* =file-only= (default) or =autonomous-commit=. See "Commit autonomy" below.
+   - *Paging:* on or off. End-of-set only.
+3. *A run cap* — the hard maximum number of tasks to complete this run.
+
+It returns a per-task outcome and a run summary.
+
+* Outcomes — the per-task vocabulary
+
+Every task in the set ends in exactly one of:
+
+- =implemented-committed= — implemented, committed (and pushed per the project's flow) under =autonomous-commit=.
+- =implemented-diff-surfaced= — implemented, diff surfaced, *not* committed (=file-only=).
+- =deferred-VERIFY= — a defer-checklist hit; a =VERIFY= filed naming what's missing or risky.
+- =dropped-by-craig= — removed from the run at the speedrun pre-flight Q&A ("skip this").
+- =skipped-ineligible= — failed the mechanical eligibility gate.
+
+The run summary lists each task with its outcome, plus the remaining set when the cap stopped the run.
+
+* The loop
+
+For the task set, in order, until the run cap is hit:
+
+1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
+2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist.
+3. *Defer checklist* (below). Any hit → defer: file the =VERIFY= naming the gap and record =deferred-VERIFY= (or, under the speedrun preset, route a quick-question gap to the pre-flight Q&A), next task.
+4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped.
+5. *Commit autonomy branch:*
+   - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
+   - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
+6. *Record metrics* for the task (the JSONL append — lands with the feature's Phase 5).
+7. Decrement the cap. At zero, stop.
+
+After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary either way.
+
+* Eligibility gate — mechanical, no judgment
+
+A task is autonomous-safe when *both* hold. This layer is a lookup, not a judgment; all the judgment lives in the defer checklist.
+
+1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= marks "awaiting Craig's input"; auto-implementing one defeats the check it represents. The do-not-implement set is safe-by-omission: anything not plainly =TODO= (plus any project-declared "hold" marker) is out.
+2. *Tagged =:solo:=* — the autonomy tag, resolved against the project's priority/tag scheme header in =todo.org= (never hardcoded). =:solo:= carries the hard definition in =todo-format.md=: completable and verifiable without Craig beyond at most one or two quick decisions answerable up front, no design deliberation. A project whose scheme declares a different autonomous-safe tag set overrides the default.
+
+Priority and =:next:= drive *ordering* within the eligible set, not eligibility ([#A] before [#B] before [#C], then the author's ordering). =:quick:= is an effort hint for batching and duration estimates — never a gate.
+
+Task *size* is deliberately absent from this gate. A large but well-specified, decision-free task is in scope and gets decomposed into per-logical-commit chunks during implementation. Size never sends a task away; only *deliberation* or *risk* does (the checklist below).
+
+*No scheme header → don't run.* The gate reads =:solo:= semantics from the project's scheme header; a =todo.org= without one leaves the tag undefined (=todo-format.md= makes the header mandatory). Surface that the header is missing and stop rather than guessing eligibility.
+
+* The defer checklist — act vs file
+
+After the scope read, run each eligible candidate through the checklist. Each item is a concrete, answerable question, not an adjective. *Any* hit — or any "unsure" — defers the task. Only a task that clears every item is implemented.
+
+1. *Test-writability (the keystone).* Can I write the failing test from the task text — plus any decisions gathered up front — without inventing a requirement? *No / unsure* → underspecified. Under the speedrun preset, if the gap is one or two quick answerable questions, route it to the pre-flight Q&A; otherwise file a =VERIFY= noting what's missing. Under the unattended loop, file the =VERIFY= (no one to ask).
+2. *Data-loss / irreversible / external operation.* Does implementing it require any of: =rm= of non-scratch data, =git reset --hard= / force-push, =DROP= / =DELETE= / =TRUNCATE=, file truncate/overwrite of persisted content, a schema or data migration, any external or shared-state mutation, any credential touch? *Yes* → do NOT implement; file a =VERIFY= naming the risk. This is the hard safety gate; an upfront answer never overrides it without an explicit checkpoint.
+3. *Already-satisfied.* Does the scope read show the desired end-state already holds? *Yes* → file a =VERIFY= noting it and move on. Don't make a no-op change.
+4. *Design deliberation.* Does the task carry an unresolved design question, a "weigh these approaches" with real tradeoffs, or a TBD that isn't a quick factual answer? *Yes* → under the speedrun preset, if it collapses to one or two quick questions, route to the pre-flight Q&A; otherwise file and surface as a =/start-work= candidate. Under the loop, file. The discriminator is *quick-answerable question* vs *deliberation* — never task size.
+
+When genuinely unsure which side a task falls on, defer — a wrong auto-implement costs a revert *and* the next-session correction.
+
+The speedrun pre-flight Q&A this checklist routes to (gather → classify → order → intro → batch-ask → skip/answer) lands with the feature's Phase 4; until then a quick-question gap defers like any other hit.
+
+* Per-task quality bar
+
+Autonomy changes who approves, not what quality means. Per task, non-negotiable:
+
+- *TDD* per =testing.md=: red first, green, refactor. The keystone checklist item already proved the failing test is writable.
+- *Verification* per =verification.md=: fresh evidence, full suite green before any commit.
+- *=/review-code --staged=* before every commit; Critical and Important findings block until fixed.
+- *=/voice personal=* on every commit message on the =autonomous-commit= path (or the patterns walked inline if the skill is unavailable), message printed inline so the log shows what landed.
+- *Task closure* per =todo-format.md=: depth-based completion (keyword + =CLOSED:= at level 2, dated rewrite at level 3+).
+- *One logical change per commit.* A large task becomes several commits, not one omnibus.
+
+* Commit autonomy
+
+=file-only= is the default: surface the diff, never commit. =autonomous-commit= is honored only when the project carries the per-project commit-autonomy waiver; absent the waiver, the request degrades to =file-only= and says so. The waiver read (location, the "has waiver" vs "loop may commit" split) and the degrade contract land with the feature's Phase 3 — until then, treat every run as =file-only= unless Craig has put the session in no-approvals mode himself.
+
+* Bounding the run
+
+The cap is a hard per-run task ceiling passed by the caller — the kill switch a runaway can't exceed:
+
+- *Loop caller default: 1.* Implement the highest-priority eligible candidate, record, stop; the next tick continues.
+- *Speedrun: the length of the explicit list*, capped at a ceiling — the human bounded the set by naming it.
+
+Even the speedrun stops at the cap and surfaces (and, with paging on, pages) the remainder. The cap bounds task *count*, not cost; a token budget is logged as vNext.
+
+* End-of-set page
+
+With paging on, fire one page when the set is done or the cap is hit — end-of-set only, never per-task:
+
+#+begin_src sh
+notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>" --persist
+#+end_src
+
+=--persist= keeps the page on screen until dismissed. The message carries the project, completed count, and remaining count so Craig can confirm readiness and name the next project in one reply. Full paging detail lands with the feature's Phase 4.
+
+* Metrics
+
+Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl= (git-tracked). Logging is a side effect that never alters execution. The record schema lands with the feature's Phase 5.
+
+* Common Mistakes
+
+1. *Implementing a =VERIFY= or =DOING= task.* The gate is status =TODO= only — a =VERIFY= exists precisely because Craig's input is pending.
+2. *Treating =:quick:= as eligibility.* It's an effort hint. =:solo:= is the gate.
+3. *Deferring on size.* A large, well-specified, decision-free task runs — decomposed into logical commits. Size is not a checklist item.
+4. *Guessing past the keystone.* If the failing test isn't writable from the task text, the task isn't ready. Inventing the requirement is the failure the checklist exists to stop.
+5. *Rationalizing through the data-loss list.* "The migration is small" doesn't clear checklist item 2. Enumerated operations defer, full stop.
+6. *Committing in =file-only= mode.* The diff is the deliverable; the commit is Craig's.
+7. *One omnibus commit for the whole run.* Every logical change is its own reviewed commit.
+8. *Skipping =/review-code= or =/voice= because nobody's watching.* Autonomy removes interaction gates, never engineering-discipline gates (same contract as =no-approvals.org=).
+9. *Running past the cap.* The cap is the kill switch; hitting it means stop and surface, even mid-set.
+10. *Paging per-task.* One page, end of set.
+
+* Living Document
+
+Refine as the dogfooding signal arrives — the metrics log and the corrections-in-next-session signal are the feedback loop. Fold recurring adjustments in rather than accumulating caller-side workarounds.
+
+* History
+
+Created 2026-07-02 as Phase 1 of the autonomous-batch execution spec, reconciling the inbox-zero "Phase E" proposal and the =.emacs.d= speedrun proposal into one execution loop. The auto-inbox-zero execute step in =inbox.org= reverted to routing-only in the same change so this file is the loop's only home.
diff --git a/todo.org b/todo.org
index 546f8ad..1f2207d 100644
--- a/todo.org
+++ b/todo.org
@@ -459,8 +459,8 @@ Craig ratified all eight decisions in [[file:docs/specs/2026-06-16-autonomous-ba
 *** 2026-07-02 Thu @ 00:44:59 -0400 spec-response decomposition — :SPEC_ID: bound, spec DOING
 Stamped the spec's UUID on this parent, broke Phases 1-6 into the build tasks below (plus the flip task and a live-trial validation child), and flipped the spec's status heading READY → DOING per the transition-ownership table.
 
-*** TODO [#C] Phase 1 — extract the loop into work-the-backlog.org :feature:solo:
-Write work-the-backlog.org holding the eligibility gate (mechanical tag read: :solo: present, no do-not-auto-implement marker), the defer checklist, the per-task quality bar (TDD red→green, /review-code, /voice on the commit), and run-cap logic — inputs: task set + session mode + cap. In the same change, revert inbox.org's "auto inbox zero" execute step (per-cycle item 3's "Yes → launch into implementing...") to routing-only so the execution loop has one home. Spec Phase 1. Verify: make test green, sync clean; the new workflow is callable but nothing invokes it yet.
+*** 2026-07-02 Thu @ 01:07:29 -0400 Phase 1 landed — execution loop extracted into work-the-backlog.org
+work-the-backlog.org written (canonical + mirror): caller contract (task set + session mode + cap), five-outcome vocabulary, the loop, mechanical eligibility gate (TODO + :solo: per scheme header, safe-by-omission, no-scheme-header → don't run), four-item defer checklist, per-task quality bar, cap/kill-switch semantics, page + metrics stubs pointing at Phases 4-5. inbox.org's auto-mode per-cycle item 3 reverted to routing-only (yes-path execution removed; mode intro + closing line updated to match). INDEX.org entry added. make test green, sync clean; nothing invokes the new workflow yet.
 
 *** TODO [#C] Phase 2 — wire the two callers :feature:solo:
 Auto-inbox-zero's "run this batch next?" yes-path invokes work-the-backlog (tag query + file-only + cap 1); the no-approvals speedrun preset (trigger phrases "speedrun" / "no approvals speedrun") feeds it an explicit ordered list + autonomous-commit + always-push + paging-on, running the pre-flight Q&A first. Spec Phase 2 / D4 / D7. Verify: each caller independently exercisable.
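
A minimal sketch of two rules from work-the-backlog.org in the diff above: the degrade contract under "Commit autonomy" and the kill switch under "Bounding the run". It is written in POSIX sh under stated assumptions. The function names (`resolve_mode`, `run_capped`, `implement_task`) and the yes/no waiver flag are hypothetical; the workflow defines no such interface, and the real waiver read is deferred to Phase 3.

```shell
#!/bin/sh
# Hypothetical sketch; none of these names come from the workflow itself.

# resolve_mode REQUESTED HAS_WAIVER
# Prints the effective session mode. autonomous-commit is honored only
# when HAS_WAIVER is "yes" (assumed flag); otherwise it degrades to
# file-only and says so on stderr. An empty request defaults to file-only.
resolve_mode() {
    requested=$1
    has_waiver=$2
    if [ "$requested" = "autonomous-commit" ] && [ "$has_waiver" != "yes" ]; then
        echo "no commit-autonomy waiver; degrading to file-only" >&2
        echo "file-only"
        return 0
    fi
    echo "${requested:-file-only}"
}

# implement_task stands in for the real per-task work (TDD, review, commit).
implement_task() {
    printf 'implemented: %s\n' "$1"
}

# run_capped CAP TASK...
# Works tasks in order and never implements more than CAP of them,
# then reports how many were done and how many remain.
run_capped() {
    cap=$1
    shift
    completed=0
    for task in "$@"; do
        # Kill switch: stop at the cap, even mid-set.
        if [ "$completed" -ge "$cap" ]; then
            break
        fi
        implement_task "$task"
        completed=$((completed + 1))
    done
    remaining=$(( $# - completed ))
    printf '%s done, %s remaining\n' "$completed" "$remaining"
}
```

Calling `run_capped 1 task-a task-b task-c` prints `implemented: task-a` followed by `1 done, 2 remaining`, which mirrors the loop caller's default cap of 1. The summary line carries the same two counts the end-of-set page reports.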