From d379a23953166a2c95d20da5098426cbf49a510f Mon Sep 17 00:00:00 2001
From: Craig Jennings
Date: Thu, 2 Jul 2026 01:11:06 -0400
Subject: feat(backlog): extract the execution loop into work-the-backlog.org

work-the-backlog.org now owns the autonomous execution loop: the mechanical eligibility gate, the four-item defer checklist, the per-task quality bar, and the run-cap kill switch. Its callers feed it a task set, a session mode, and a run cap. I stubbed the pre-flight Q&A, waiver read, end-of-set page, and metrics record with pointers to their phases.

inbox.org's auto mode drops its execute step. Per-cycle item 3 now routes and queues only, so the loop has one home.

This is Phase 1 of the autonomous-batch execution spec.
---
 .ai/workflows/INDEX.org                  |   3 +
 .ai/workflows/inbox.org                  |   8 +-
 .ai/workflows/work-the-backlog.org       | 145 +++++++++++++++++++++
 claude-templates/.ai/workflows/INDEX.org |   3 +
 claude-templates/.ai/workflows/inbox.org |   8 +-
 .../.ai/workflows/work-the-backlog.org   | 145 +++++++++++++++++++++
 todo.org                                 |   4 +-
 7 files changed, 304 insertions(+), 12 deletions(-)
 create mode 100644 .ai/workflows/work-the-backlog.org
 create mode 100644 claude-templates/.ai/workflows/work-the-backlog.org

diff --git a/.ai/workflows/INDEX.org b/.ai/workflows/INDEX.org
index a474b29..8d50577 100644
--- a/.ai/workflows/INDEX.org
+++ b/.ai/workflows/INDEX.org
@@ -54,6 +54,9 @@ This index must list every =.org= file in =.ai/workflows/= except this one and e
   - Roam-mode triggers: "inbox zero", "empty the inbox", "process the roam inbox", "triage my roam inbox"
   - Auto-mode trigger: "auto inbox zero" (match before "inbox zero")
 
+- =work-the-backlog.org= — the autonomous task-execution loop, the single home for working a batch of marked tasks unattended: takes an ordered task set (explicit list or tag query) + session mode (=file-only= default / =autonomous-commit= + paging) + a hard run cap; each candidate passes the mechanical eligibility gate (status =TODO= + =:solo:= per the project's scheme header) and the four-item defer checklist, then is implemented to the full quality bar (TDD, =/review-code=, =/voice=) as its own logical commits. Fed by the inbox auto-loop and the no-approvals speedrun preset (caller wiring lands with that feature's Phase 2).
+  - Triggers: "work the backlog", "work the backlog with " (manual fallback — the callers invoke it directly)
+
 ** Calendar
 
 - =add-calendar-event.org= — create a calendar event.
diff --git a/.ai/workflows/inbox.org b/.ai/workflows/inbox.org
index ea45ae3..60ae8ff 100644
--- a/.ai/workflows/inbox.org
+++ b/.ai/workflows/inbox.org
@@ -461,18 +461,16 @@ Take these up when the single-destination version is in use and the multi-projec
 
 * Mode: auto inbox zero
 
-A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop = running roam mode. It is in-session and interactive by design — each cycle reports, and a find waits for Craig's go before any work happens.
+A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop = running roam mode. It is in-session and interactive by design — each cycle reports what it found and filed.
 
 ** Per cycle
 
 1. Run roam mode's scan (Phase A local check + Phase B roam scan), read-only — no =git pull=. The capture-guard still gates any write: use =capture-guard --wait= (core §5) so a transient capture clears itself; if it's still open after the wait, *defer this cycle's roam reconcile to the next cycle* rather than surfacing — the loop cadence is the retry, and the filed items get swept next time. The rare write hands its git to =roam-sync= (roam Phase D).
 2. *Nothing found* → no inbox summary. One acknowledgement line: =ran at HH:MM, nothing found=. Nothing else. The acknowledge-only-on-empty rule keeps a quiet inbox quiet.
-3. *Items found* → summarize the found items, file them as tasks (roam Phase C), and *append them to a displayed queue* — the harness task list, via =TaskCreate= — so the queue accumulates across cycles. Then ask: "run this batch next?"
-   - *Yes* → launch into implementing the found items, each through the normal disposition ladder (core §3) + verify flow.
-   - *No* → they stay queued for a later go.
+3. *Items found* → summarize the found items, file them as tasks (roam Phase C), and *append them to a displayed queue* — the harness task list, via =TaskCreate= — so the queue accumulates across cycles. Routing is where this mode stops: the execution loop lives in =work-the-backlog.org=, its one home, and this mode never implements anything itself.
 4. *Cross-cycle dedup.* Subsequent cycles add only *newly-found* items to the same displayed queue, never re-surfacing what's already there. Dedup against the queue (the =TaskCreate= list), not against what's already been implemented — a find that was queued-but-not-yet-run must not reappear, and one already filed into =todo.org= is dropped by roam Phase C's status check.
 
-A find is always surfaced and gated on Craig's yes; a quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because its execute step waits for a yes.
+A find is always surfaced and filed; a quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because each cycle reports to Craig interactively.
 
 ** Fully-unattended pass (=/schedule=) — vNext, not v1
 
diff --git a/.ai/workflows/work-the-backlog.org b/.ai/workflows/work-the-backlog.org
new file mode 100644
index 0000000..8b69d8e
--- /dev/null
+++ b/.ai/workflows/work-the-backlog.org
@@ -0,0 +1,145 @@
+#+TITLE: Work the Backlog
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-07-02
+
+* Overview
+
+The single home for the autonomous task-execution loop: take a set of marked, solo-doable tasks from the project's =todo.org= and work them unattended, each held to the full quality bar, under a fixed safety contract. Spec: =rulesets/docs/specs/2026-06-16-autonomous-batch-execution-spec.org=.
+
+Two callers feed it, differing only in how they build the task set and which session mode they pass:
+
+- The *inbox auto-loop* (=inbox.org= auto mode) chains here after its routing completes, with a tag/priority query, file-only mode, cap 1.
+- The *no-approvals speedrun* preset feeds an explicit ordered list with autonomous-commit + always-push + paging-on, after a pre-flight Q&A that front-loads every decision.
+
+This workflow owns the execution logic — eligibility gate, defer checklist, quality bar, run cap. Callers own input assembly and mode selection. Capture-routing (inbox surfaces) stays entirely in =inbox.org=; this file never reads an inbox.
+
+* When to Use This Workflow
+
+Invoked by its callers, not usually by phrase. The caller wiring (the auto-loop chain step and the speedrun preset's trigger phrases) lands with the feature's Phase 2.
+
+Manual fallback: "work the backlog" / "work the backlog with " — gather the three inputs below (ask for whichever are missing, defaulting to file-only mode; default cap is the list length for an explicit set, 1 for a query) and run the loop.
+
+* Inputs — the caller contract
+
+A caller hands this workflow three things:
+
+1. *A task set* — an ordered list of candidate task headings from the project's =todo.org=. Either an explicit ordered list (speedrun) or the result of a tag/priority query (the loop). The loop does not care how the set was assembled; it receives an ordered list of candidates.
+2. *A session mode* — two orthogonal flags:
+   - *Commit autonomy:* =file-only= (default) or =autonomous-commit=. See "Commit autonomy" below.
+   - *Paging:* on or off. End-of-set only.
+3. *A run cap* — the hard maximum number of tasks to complete this run.
+
+It returns a per-task outcome and a run summary.
+
+* Outcomes — the per-task vocabulary
+
+Every task in the set ends in exactly one of:
+
+- =implemented-committed= — implemented, committed (and pushed per the project's flow) under =autonomous-commit=.
+- =implemented-diff-surfaced= — implemented, diff surfaced, *not* committed (=file-only=).
+- =deferred-VERIFY= — a defer-checklist hit; a =VERIFY= filed naming what's missing or risky.
+- =dropped-by-craig= — removed from the run at the speedrun pre-flight Q&A ("skip this").
+- =skipped-ineligible= — failed the mechanical eligibility gate.
+
+The run summary lists each task with its outcome, plus the remaining set when the cap stopped the run.
+
+* The loop
+
+For the task set, in order, until the run cap is hit:
+
+1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
+2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist.
+3. *Defer checklist* (below). Any hit → defer: file the =VERIFY= naming the gap and record =deferred-VERIFY= (or, under the speedrun preset, route a quick-question gap to the pre-flight Q&A), next task.
+4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped.
+5. *Commit autonomy branch:*
+   - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
+   - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
+6. *Record metrics* for the task (the JSONL append — lands with the feature's Phase 5).
+7. Decrement the cap. At zero, stop.
+
+After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary either way.
+
+* Eligibility gate — mechanical, no judgment
+
+A task is autonomous-safe when *both* hold. This layer is a lookup, not a judgment; all the judgment lives in the defer checklist.
+
+1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= marks "awaiting Craig's input"; auto-implementing one defeats the check it represents. The do-not-implement set is safe-by-omission: anything not plainly =TODO= (plus any project-declared "hold" marker) is out.
+2. *Tagged =:solo:=* — the autonomy tag, resolved against the project's priority/tag scheme header in =todo.org= (never hardcoded). =:solo:= carries the hard definition in =todo-format.md=: completable and verifiable without Craig beyond at most one or two quick decisions answerable up front, no design deliberation. A project whose scheme declares a different autonomous-safe tag set overrides the default.
+
+Priority and =:next:= drive *ordering* within the eligible set, not eligibility ([#A] before [#B] before [#C], then the author's ordering). =:quick:= is an effort hint for batching and duration estimates — never a gate.
+
+Task *size* is deliberately absent from this gate. A large but well-specified, decision-free task is in scope and gets decomposed into per-logical-commit chunks during implementation. Size never sends a task away; only *deliberation* or *risk* does (the checklist below).
+
+*No scheme header → don't run.* The gate reads =:solo:= semantics from the project's scheme header; a =todo.org= without one leaves the tag undefined (=todo-format.md= makes the header mandatory). Surface that the header is missing and stop rather than guessing eligibility.
+
+* The defer checklist — act vs file
+
+After the scope read, run each eligible candidate through the checklist. Each item is a concrete, answerable question, not an adjective. *Any* hit — or any "unsure" — defers the task. Only a task that clears every item is implemented.
+
+1. *Test-writability (the keystone).* Can I write the failing test from the task text — plus any decisions gathered up front — without inventing a requirement? *No / unsure* → underspecified. Under the speedrun preset, if the gap is one or two quick answerable questions, route it to the pre-flight Q&A; otherwise file a =VERIFY= noting what's missing. Under the unattended loop, file the =VERIFY= (no one to ask).
+2. *Data-loss / irreversible / external operation.* Does implementing it require any of: =rm= of non-scratch data, =git reset --hard= / force-push, =DROP= / =DELETE= / =TRUNCATE=, file truncate/overwrite of persisted content, a schema or data migration, any external or shared-state mutation, any credential touch? *Yes* → do NOT implement; file a =VERIFY= naming the risk. This is the hard safety gate; an upfront answer never overrides it without an explicit checkpoint.
+3. *Already-satisfied.* Does the scope read show the desired end-state already holds? *Yes* → file a =VERIFY= noting it and move on. Don't make a no-op change.
+4. *Design deliberation.* Does the task carry an unresolved design question, a "weigh these approaches" with real tradeoffs, or a TBD that isn't a quick factual answer? *Yes* → under the speedrun preset, if it collapses to one or two quick questions, route to the pre-flight Q&A; otherwise file and surface as a =/start-work= candidate. Under the loop, file. The discriminator is *quick-answerable question* vs *deliberation* — never task size.
+
+When genuinely unsure which side a task falls on, defer — a wrong auto-implement costs a revert *and* the next-session correction.
+
+The speedrun pre-flight Q&A this checklist routes to (gather → classify → order → intro → batch-ask → skip/answer) lands with the feature's Phase 4; until then a quick-question gap defers like any other hit.
+
+* Per-task quality bar
+
+Autonomy changes who approves, not what quality means. Per task, non-negotiable:
+
+- *TDD* per =testing.md=: red first, green, refactor. The keystone checklist item already proved the failing test is writable.
+- *Verification* per =verification.md=: fresh evidence, full suite green before any commit.
+- *=/review-code --staged=* before every commit; Critical and Important findings block until fixed.
+- *=/voice personal=* on every commit message on the =autonomous-commit= path (or the patterns walked inline if the skill is unavailable), message printed inline so the log shows what landed.
+- *Task closure* per =todo-format.md=: depth-based completion (keyword + =CLOSED:= at level 2, dated rewrite at level 3+).
+- *One logical change per commit.* A large task becomes several commits, not one omnibus.
+
+* Commit autonomy
+
+=file-only= is the default: surface the diff, never commit. =autonomous-commit= is honored only when the project carries the per-project commit-autonomy waiver; absent the waiver, the request degrades to =file-only= and says so. The waiver read (location, the "has waiver" vs "loop may commit" split) and the degrade contract land with the feature's Phase 3 — until then, treat every run as =file-only= unless Craig has put the session in no-approvals mode himself.
+
+* Bounding the run
+
+The cap is a hard per-run task ceiling passed by the caller — the kill switch a runaway can't exceed:
+
+- *Loop caller default: 1.* Implement the highest-priority eligible candidate, record, stop; the next tick continues.
+- *Speedrun: the length of the explicit list*, capped at a ceiling — the human bounded the set by naming it.
+
+Even the speedrun stops at the cap and surfaces (and, with paging on, pages) the remainder. The cap bounds task *count*, not cost; a token budget is logged as vNext.
+
+* End-of-set page
+
+With paging on, fire one page when the set is done or the cap is hit — end-of-set only, never per-task:
+
+#+begin_src sh
+notify alarm "Page" ": done, remaining — " --persist
+#+end_src
+
+=--persist= keeps it on screen until dismissed. The message carries the project, completed count, and remaining count so Craig can confirm ready + name the next project in one reply. Full paging detail lands with the feature's Phase 4.
+
+* Metrics
+
+Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl= (git-tracked). Logging is a side effect that never alters execution. The record schema lands with the feature's Phase 5.
+
+* Common Mistakes
+
+1. *Implementing a =VERIFY= or =DOING= task.* The gate is status =TODO= only — a =VERIFY= exists precisely because Craig's input is pending.
+2. *Treating =:quick:= as eligibility.* It's an effort hint. =:solo:= is the gate.
+3. *Deferring on size.* A large, well-specified, decision-free task runs — decomposed into logical commits. Size is not a checklist item.
+4. *Guessing past the keystone.* If the failing test isn't writable from the task text, the task isn't ready. Inventing the requirement is the failure the checklist exists to stop.
+5. *Rationalizing through the data-loss list.* "The migration is small" doesn't clear checklist item 2. Enumerated operations defer, full stop.
+6. *Committing in =file-only= mode.* The diff is the deliverable; the commit is Craig's.
+7. *One omnibus commit for the whole run.* Every logical change is its own reviewed commit.
+8. *Skipping =/review-code= or =/voice= because nobody's watching.* Autonomy removes interaction gates, never engineering-discipline gates (same contract as =no-approvals.org=).
+9. *Running past the cap.* The cap is the kill switch; hitting it means stop and surface, even mid-set.
+10. *Paging per-task.* One page, end of set.
+
+* Living Document
+
+Refine as the dogfooding signal arrives — the metrics log and the corrections-in-next-session signal are the feedback loop. Fold recurring adjustments in rather than accumulating caller-side workarounds.
+
+* History
+
+Created 2026-07-02 as Phase 1 of the autonomous-batch execution spec, reconciling the inbox-zero "Phase E" proposal and the =.emacs.d= speedrun proposal into one execution loop. The auto-inbox-zero execute step in =inbox.org= reverted to routing-only in the same change so this file is the loop's only home.
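The eligibility gate in work-the-backlog.org is described as a lookup, not a judgment, so it can be sketched mechanically. The Python below is a minimal sketch under stated assumptions, not the workflow's implementation:

- It parses org heading lines with a simplified regex.
- It takes the scheme header's autonomous-safe tags as `safe_tags`. `None` stands in for a missing header, which makes the sketch refuse to run.
- It models a project's "hold" marker as a tag in `hold_tags`.
- It sorts unprioritized tasks after [#C].

`Task`, `parse_heading`, `is_eligible`, and `eligible_in_order` are hypothetical names. `:next:` ordering is not modeled.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional

# Status keywords named in work-the-backlog.org; only TODO is eligible.
KEYWORDS = ("TODO", "DOING", "VERIFY", "DONE", "CANCELLED")

# Simplified org heading: stars, optional keyword, optional [#X] cookie, title, optional :tags:.
HEADING_RE = re.compile(
    r"^(?P<stars>\*+)\s+"
    r"(?:(?P<keyword>" + "|".join(KEYWORDS) + r")\s+)?"
    r"(?:\[#(?P<priority>[A-Z])\]\s+)?"
    r"(?P<title>.*?)"
    r"(?:\s+:(?P<tags>[\w@]+(?::[\w@]+)*):)?\s*$"
)

# [#A] before [#B] before [#C]; an unprioritized task sorts last (an assumption).
PRIORITY_RANK = {"A": 0, "B": 1, "C": 2}


class MissingSchemeHeader(Exception):
    """todo.org has no priority/tag scheme header, so :solo: is undefined."""


@dataclass(frozen=True)
class Task:
    keyword: Optional[str]
    priority: Optional[str]
    title: str
    tags: FrozenSet[str]


def parse_heading(line: str) -> Optional[Task]:
    """Parse one org heading line; return None for anything that is not a heading."""
    m = HEADING_RE.match(line)
    if m is None:
        return None
    tags = frozenset(m.group("tags").split(":")) if m.group("tags") else frozenset()
    return Task(m.group("keyword"), m.group("priority"), m.group("title"), tags)


def is_eligible(task: Task, safe_tags: FrozenSet[str], hold_tags: FrozenSet[str]) -> bool:
    """Mechanical gate: status TODO, an autonomous-safe tag, and no hold marker."""
    if task.keyword != "TODO":  # VERIFY, DOING, DONE, CANCELLED, or no keyword: out
        return False
    if task.tags & hold_tags:  # project-declared hold marker, modeled here as a tag
        return False
    return bool(task.tags & safe_tags)


def eligible_in_order(
    tasks: Iterable[Task],
    safe_tags: Optional[FrozenSet[str]],
    hold_tags: FrozenSet[str] = frozenset(),
) -> List[Task]:
    """Filter to eligible tasks, then order by priority, keeping file order on ties."""
    if safe_tags is None:  # no scheme header: stop rather than guess eligibility
        raise MissingSchemeHeader("todo.org has no scheme header; not running")
    eligible = [t for t in tasks if is_eligible(t, safe_tags, hold_tags)]
    # sorted() is stable, so tasks with equal priority keep the author's ordering.
    return sorted(eligible, key=lambda t: PRIORITY_RANK.get(t.priority, len(PRIORITY_RANK)))
```

Using a stable sort is the design point here. Two [#A] tasks stay in file order, which matches "then the author's ordering" without a second sort key.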
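The defer checklist in work-the-backlog.org reduces to four tri-state answers plus the caller's preset. This hypothetical `defer_decision` function sketches one way those inputs could map to an action. Two choices in it are assumptions:

- Item 2 is checked first, so a risky operation is never routed to the Q&A even when another item would be.
- `preflight_available` defaults to `False` until the Phase 4 pre-flight Q&A exists, so a quick-question gap files a `VERIFY` like any other hit.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNSURE = "unsure"


class Decision(Enum):
    IMPLEMENT = "implement"
    FILE_VERIFY = "file a VERIFY"
    ASK_PREFLIGHT = "route to the pre-flight Q&A"
    FILE_START_WORK = "file a VERIFY and surface as a /start-work candidate"


def defer_decision(
    test_writable: Answer,
    risky_operation: Answer,
    already_satisfied: Answer,
    design_deliberation: Answer,
    *,
    speedrun: bool = False,
    gap_is_quick: bool = False,
    preflight_available: bool = False,
) -> Decision:
    """Apply the four-item checklist; any hit or any UNSURE defers the task."""
    # Item 2, the hard safety gate: checked first so no upfront answer can clear it.
    if risky_operation is not Answer.NO:
        return Decision.FILE_VERIFY
    # Item 3: the end-state already holds, so file a VERIFY instead of a no-op change.
    if already_satisfied is not Answer.NO:
        return Decision.FILE_VERIFY
    underspecified = test_writable is not Answer.YES  # item 1, the keystone
    deliberating = design_deliberation is not Answer.NO  # item 4
    if not (underspecified or deliberating):
        return Decision.IMPLEMENT
    # Only the speedrun preset has someone to ask, and only once the Q&A exists.
    if speedrun and gap_is_quick and preflight_available:
        return Decision.ASK_PREFLIGHT
    if speedrun and deliberating and not gap_is_quick:
        return Decision.FILE_START_WORK
    # The unattended loop (or any gap nobody can answer up front) files a VERIFY.
    return Decision.FILE_VERIFY
```

Treating `UNSURE` as a hit everywhere encodes "when genuinely unsure which side a task falls on, defer".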
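The loop, the run cap, and the `file-only` degrade in work-the-backlog.org fit in one illustrative sketch. It assumes the following:

- The caller supplies the gate, the checklist verdict, and the implementation step as callbacks. `eligible`, `should_defer`, and `implement` are hypothetical names.
- `has_waiver` stands in for the Phase 3 waiver read.
- The speedrun `ceiling` is a parameter, because the workflow leaves its value open.
- The Phase 5 metrics append is omitted.

Only completed tasks decrement the cap, because skipped and deferred tasks move to the next task before step 7.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, FrozenSet, List, Sequence, Tuple


class Outcome(Enum):
    IMPLEMENTED_COMMITTED = "implemented-committed"
    IMPLEMENTED_DIFF_SURFACED = "implemented-diff-surfaced"
    DEFERRED_VERIFY = "deferred-VERIFY"
    DROPPED_BY_CRAIG = "dropped-by-craig"
    SKIPPED_INELIGIBLE = "skipped-ineligible"


@dataclass
class RunSummary:
    outcomes: List[Tuple[str, Outcome]] = field(default_factory=list)
    remaining: List[str] = field(default_factory=list)  # filled when the cap stops the run
    notes: List[str] = field(default_factory=list)


def default_cap(task_set: Sequence[str], explicit: bool, ceiling: int) -> int:
    """Explicit list: its length, bounded by a ceiling. Tag/priority query: 1."""
    return min(len(task_set), ceiling) if explicit else 1


def effective_commit_mode(requested: str, has_waiver: bool) -> Tuple[str, str]:
    """Degrade autonomous-commit to file-only without a waiver, and say so."""
    if requested == "autonomous-commit" and not has_waiver:
        return "file-only", "autonomous-commit requested without a waiver; ran file-only"
    return requested, ""


def work_the_backlog(
    tasks: Sequence[str],
    commit_mode: str,
    cap: int,
    *,
    has_waiver: bool,
    eligible: Callable[[str], bool],
    should_defer: Callable[[str], bool],
    implement: Callable[[str, bool], None],
    dropped: FrozenSet[str] = frozenset(),
    paging: bool = False,
    page: Callable[[RunSummary], None] = lambda summary: None,
) -> RunSummary:
    mode, note = effective_commit_mode(commit_mode, has_waiver)
    summary = RunSummary(notes=[note] if note else [])
    tasks = list(tasks)
    for i, task in enumerate(tasks):
        if cap <= 0:  # kill switch: stop and surface the rest, even mid-set
            summary.remaining = tasks[i:]
            break
        if task in dropped:  # removed at the speedrun pre-flight Q&A
            summary.outcomes.append((task, Outcome.DROPPED_BY_CRAIG))
        elif not eligible(task):
            summary.outcomes.append((task, Outcome.SKIPPED_INELIGIBLE))
        elif should_defer(task):  # the caller files the VERIFY
            summary.outcomes.append((task, Outcome.DEFERRED_VERIFY))
        else:
            commit = mode == "autonomous-commit"
            implement(task, commit)  # TDD, /review-code, closure; /voice only if committing
            summary.outcomes.append(
                (task, Outcome.IMPLEMENTED_COMMITTED if commit else Outcome.IMPLEMENTED_DIFF_SURFACED)
            )
            cap -= 1  # only completed tasks count against the cap
    if paging:
        page(summary)  # one page at the end of the set, never per task
    return summary
```

With the loop caller's cap of 1, the first eligible, non-deferred task is the only one implemented, and the rest of the set comes back in `remaining` for the next tick.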
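For the end-of-set page, building the command as an argument list keeps the message a single argument and avoids shell quoting. This sketch assumes only the invocation shape shown in the workflow: `notify alarm "Page" MESSAGE --persist`. The message wording and the `page_argv` and `send_page` helpers are hypothetical. `dry_run` defaults to `True`, so nothing is sent unless a caller asks.

```python
import shlex
import subprocess
from typing import List


def page_argv(project: str, done: int, remaining: int) -> List[str]:
    """Build the end-of-set page command as an argv list."""
    # Hypothetical wording; the workflow only says the message carries these three facts.
    message = f"{project}: {done} done, {remaining} remaining"
    return ["notify", "alarm", "Page", message, "--persist"]


def send_page(project: str, done: int, remaining: int, dry_run: bool = True) -> str:
    """Run the page unless dry_run is set; return the shell-quoted command for the log."""
    argv = page_argv(project, done, remaining)
    if not dry_run:
        subprocess.run(argv, check=True)  # raises CalledProcessError if notify fails
    return shlex.join(argv)
```

`shlex.join` (Python 3.8+) is used only to print a copy-pasteable form of the command. The command itself runs without a shell.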
diff --git a/claude-templates/.ai/workflows/INDEX.org b/claude-templates/.ai/workflows/INDEX.org
index a474b29..8d50577 100644
--- a/claude-templates/.ai/workflows/INDEX.org
+++ b/claude-templates/.ai/workflows/INDEX.org
@@ -54,6 +54,9 @@ This index must list every =.org= file in =.ai/workflows/= except this one and e
   - Roam-mode triggers: "inbox zero", "empty the inbox", "process the roam inbox", "triage my roam inbox"
   - Auto-mode trigger: "auto inbox zero" (match before "inbox zero")
 
+- =work-the-backlog.org= — the autonomous task-execution loop, the single home for working a batch of marked tasks unattended: takes an ordered task set (explicit list or tag query) + session mode (=file-only= default / =autonomous-commit= + paging) + a hard run cap; each candidate passes the mechanical eligibility gate (status =TODO= + =:solo:= per the project's scheme header) and the four-item defer checklist, then is implemented to the full quality bar (TDD, =/review-code=, =/voice=) as its own logical commits. Fed by the inbox auto-loop and the no-approvals speedrun preset (caller wiring lands with that feature's Phase 2).
+  - Triggers: "work the backlog", "work the backlog with " (manual fallback — the callers invoke it directly)
+
 ** Calendar
 
 - =add-calendar-event.org= — create a calendar event.
diff --git a/claude-templates/.ai/workflows/inbox.org b/claude-templates/.ai/workflows/inbox.org
index ea45ae3..60ae8ff 100644
--- a/claude-templates/.ai/workflows/inbox.org
+++ b/claude-templates/.ai/workflows/inbox.org
@@ -461,18 +461,16 @@ Take these up when the single-destination version is in use and the multi-projec
 
 * Mode: auto inbox zero
 
-A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop = running roam mode. It is in-session and interactive by design — each cycle reports, and a find waits for Craig's go before any work happens.
+A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop = running roam mode. It is in-session and interactive by design — each cycle reports what it found and filed.
 
 ** Per cycle
 
 1. Run roam mode's scan (Phase A local check + Phase B roam scan), read-only — no =git pull=. The capture-guard still gates any write: use =capture-guard --wait= (core §5) so a transient capture clears itself; if it's still open after the wait, *defer this cycle's roam reconcile to the next cycle* rather than surfacing — the loop cadence is the retry, and the filed items get swept next time. The rare write hands its git to =roam-sync= (roam Phase D).
 2. *Nothing found* → no inbox summary. One acknowledgement line: =ran at HH:MM, nothing found=. Nothing else. The acknowledge-only-on-empty rule keeps a quiet inbox quiet.
-3. *Items found* → summarize the found items, file them as tasks (roam Phase C), and *append them to a displayed queue* — the harness task list, via =TaskCreate= — so the queue accumulates across cycles. Then ask: "run this batch next?"
-   - *Yes* → launch into implementing the found items, each through the normal disposition ladder (core §3) + verify flow.
-   - *No* → they stay queued for a later go.
+3. *Items found* → summarize the found items, file them as tasks (roam Phase C), and *append them to a displayed queue* — the harness task list, via =TaskCreate= — so the queue accumulates across cycles. Routing is where this mode stops: the execution loop lives in =work-the-backlog.org=, its one home, and this mode never implements anything itself.
 4. *Cross-cycle dedup.* Subsequent cycles add only *newly-found* items to the same displayed queue, never re-surfacing what's already there. Dedup against the queue (the =TaskCreate= list), not against what's already been implemented — a find that was queued-but-not-yet-run must not reappear, and one already filed into =todo.org= is dropped by roam Phase C's status check.
 
-A find is always surfaced and gated on Craig's yes; a quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because its execute step waits for a yes.
+A find is always surfaced and filed; a quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because each cycle reports to Craig interactively.
 
 ** Fully-unattended pass (=/schedule=) — vNext, not v1
 
diff --git a/claude-templates/.ai/workflows/work-the-backlog.org b/claude-templates/.ai/workflows/work-the-backlog.org
new file mode 100644
index 0000000..8b69d8e
--- /dev/null
+++ b/claude-templates/.ai/workflows/work-the-backlog.org
@@ -0,0 +1,145 @@
+#+TITLE: Work the Backlog
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-07-02
+
+* Overview
+
+The single home for the autonomous task-execution loop: take a set of marked, solo-doable tasks from the project's =todo.org= and work them unattended, each held to the full quality bar, under a fixed safety contract. Spec: =rulesets/docs/specs/2026-06-16-autonomous-batch-execution-spec.org=.
+
+Two callers feed it, differing only in how they build the task set and which session mode they pass:
+
+- The *inbox auto-loop* (=inbox.org= auto mode) chains here after its routing completes, with a tag/priority query, file-only mode, cap 1.
+- The *no-approvals speedrun* preset feeds an explicit ordered list with autonomous-commit + always-push + paging-on, after a pre-flight Q&A that front-loads every decision.
+
+This workflow owns the execution logic — eligibility gate, defer checklist, quality bar, run cap. Callers own input assembly and mode selection. Capture-routing (inbox surfaces) stays entirely in =inbox.org=; this file never reads an inbox.
+
+* When to Use This Workflow
+
+Invoked by its callers, not usually by phrase. The caller wiring (the auto-loop chain step and the speedrun preset's trigger phrases) lands with the feature's Phase 2.
+
+Manual fallback: "work the backlog" / "work the backlog with " — gather the three inputs below (ask for whichever are missing, defaulting to file-only mode; default cap is the list length for an explicit set, 1 for a query) and run the loop.
+
+* Inputs — the caller contract
+
+A caller hands this workflow three things:
+
+1. *A task set* — an ordered list of candidate task headings from the project's =todo.org=. Either an explicit ordered list (speedrun) or the result of a tag/priority query (the loop). The loop does not care how the set was assembled; it receives an ordered list of candidates.
+2. *A session mode* — two orthogonal flags:
+   - *Commit autonomy:* =file-only= (default) or =autonomous-commit=. See "Commit autonomy" below.
+   - *Paging:* on or off. End-of-set only.
+3. *A run cap* — the hard maximum number of tasks to complete this run.
+
+It returns a per-task outcome and a run summary.
+
+* Outcomes — the per-task vocabulary
+
+Every task in the set ends in exactly one of:
+
+- =implemented-committed= — implemented, committed (and pushed per the project's flow) under =autonomous-commit=.
+- =implemented-diff-surfaced= — implemented, diff surfaced, *not* committed (=file-only=).
+- =deferred-VERIFY= — a defer-checklist hit; a =VERIFY= filed naming what's missing or risky.
+- =dropped-by-craig= — removed from the run at the speedrun pre-flight Q&A ("skip this").
+- =skipped-ineligible= — failed the mechanical eligibility gate.
+
+The run summary lists each task with its outcome, plus the remaining set when the cap stopped the run.
+
+* The loop
+
+For the task set, in order, until the run cap is hit:
+
+1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
+2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist.
+3. *Defer checklist* (below). Any hit → defer: file the =VERIFY= naming the gap and record =deferred-VERIFY= (or, under the speedrun preset, route a quick-question gap to the pre-flight Q&A), next task.
+4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped.
+5. *Commit autonomy branch:*
+   - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
+   - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
+6. *Record metrics* for the task (the JSONL append — lands with the feature's Phase 5).
+7. Decrement the cap. At zero, stop.
+
+After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary either way.
+
+* Eligibility gate — mechanical, no judgment
+
+A task is autonomous-safe when *both* hold. This layer is a lookup, not a judgment; all the judgment lives in the defer checklist.
+
+1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= marks "awaiting Craig's input"; auto-implementing one defeats the check it represents. The do-not-implement set is safe-by-omission: anything not plainly =TODO= (plus any project-declared "hold" marker) is out.
+2. *Tagged =:solo:=* — the autonomy tag, resolved against the project's priority/tag scheme header in =todo.org= (never hardcoded). =:solo:= carries the hard definition in =todo-format.md=: completable and verifiable without Craig beyond at most one or two quick decisions answerable up front, no design deliberation. A project whose scheme declares a different autonomous-safe tag set overrides the default.
+
+Priority and =:next:= drive *ordering* within the eligible set, not eligibility ([#A] before [#B] before [#C], then the author's ordering). =:quick:= is an effort hint for batching and duration estimates — never a gate.
+
+Task *size* is deliberately absent from this gate. A large but well-specified, decision-free task is in scope and gets decomposed into per-logical-commit chunks during implementation. Size never sends a task away; only *deliberation* or *risk* does (the checklist below).
+
+*No scheme header → don't run.* The gate reads =:solo:= semantics from the project's scheme header; a =todo.org= without one leaves the tag undefined (=todo-format.md= makes the header mandatory). Surface that the header is missing and stop rather than guessing eligibility.
+
+* The defer checklist — act vs file
+
+After the scope read, run each eligible candidate through the checklist. Each item is a concrete, answerable question, not an adjective. *Any* hit — or any "unsure" — defers the task. Only a task that clears every item is implemented.
+
+1. *Test-writability (the keystone).* Can I write the failing test from the task text — plus any decisions gathered up front — without inventing a requirement? *No / unsure* → underspecified. Under the speedrun preset, if the gap is one or two quick answerable questions, route it to the pre-flight Q&A; otherwise file a =VERIFY= noting what's missing. Under the unattended loop, file the =VERIFY= (no one to ask).
+2. *Data-loss / irreversible / external operation.* Does implementing it require any of: =rm= of non-scratch data, =git reset --hard= / force-push, =DROP= / =DELETE= / =TRUNCATE=, file truncate/overwrite of persisted content, a schema or data migration, any external or shared-state mutation, any credential touch? *Yes* → do NOT implement; file a =VERIFY= naming the risk. This is the hard safety gate; an upfront answer never overrides it without an explicit checkpoint.
+3. *Already-satisfied.* Does the scope read show the desired end-state already holds? *Yes* → file a =VERIFY= noting it and move on. Don't make a no-op change.
+4. *Design deliberation.* Does the task carry an unresolved design question, a "weigh these approaches" with real tradeoffs, or a TBD that isn't a quick factual answer? *Yes* → under the speedrun preset, if it collapses to one or two quick questions, route to the pre-flight Q&A; otherwise file and surface as a =/start-work= candidate. Under the loop, file. The discriminator is *quick-answerable question* vs *deliberation* — never task size.
+
+When genuinely unsure which side a task falls on, defer — a wrong auto-implement costs a revert *and* the next-session correction.
+
+The speedrun pre-flight Q&A this checklist routes to (gather → classify → order → intro → batch-ask → skip/answer) lands with the feature's Phase 4; until then a quick-question gap defers like any other hit.
+
+* Per-task quality bar
+
+Autonomy changes who approves, not what quality means. Per task, non-negotiable:
+
+- *TDD* per =testing.md=: red first, green, refactor. The keystone checklist item already proved the failing test is writable.
+- *Verification* per =verification.md=: fresh evidence, full suite green before any commit.
+- *=/review-code --staged=* before every commit; Critical and Important findings block until fixed.
+- *=/voice personal=* on every commit message on the =autonomous-commit= path (or the patterns walked inline if the skill is unavailable), message printed inline so the log shows what landed.
+- *Task closure* per =todo-format.md=: depth-based completion (keyword + =CLOSED:= at level 2, dated rewrite at level 3+).
+- *One logical change per commit.* A large task becomes several commits, not one omnibus.
+
+* Commit autonomy
+
+=file-only= is the default: surface the diff, never commit. =autonomous-commit= is honored only when the project carries the per-project commit-autonomy waiver; absent the waiver, the request degrades to =file-only= and says so. The waiver read (location, the "has waiver" vs "loop may commit" split) and the degrade contract land with the feature's Phase 3 — until then, treat every run as =file-only= unless Craig has put the session in no-approvals mode himself.
+
+* Bounding the run
+
+The cap is a hard per-run task ceiling passed by the caller — the kill switch a runaway can't exceed:
+
+- *Loop caller default: 1.* Implement the highest-priority eligible candidate, record, stop; the next tick continues.
+- *Speedrun: the length of the explicit list*, capped at a ceiling — the human bounded the set by naming it.
+
+Even the speedrun stops at the cap and surfaces (and, with paging on, pages) the remainder. The cap bounds task *count*, not cost; a token budget is logged as vNext.
+
+* End-of-set page
+
+With paging on, fire one page when the set is done or the cap is hit — end-of-set only, never per-task:
+
+#+begin_src sh
+notify alarm "Page" ": done, remaining — " --persist
+#+end_src
+
+=--persist= keeps it on screen until dismissed. The message carries the project, completed count, and remaining count so Craig can confirm ready + name the next project in one reply. Full paging detail lands with the feature's Phase 4.
+
+* Metrics
+
+Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl= (git-tracked). Logging is a side effect that never alters execution. The record schema lands with the feature's Phase 5.
+
+* Common Mistakes
+
+1. *Implementing a =VERIFY= or =DOING= task.* The gate is status =TODO= only — a =VERIFY= exists precisely because Craig's input is pending.
+2. *Treating =:quick:= as eligibility.* It's an effort hint. =:solo:= is the gate.
+3. *Deferring on size.* A large, well-specified, decision-free task runs — decomposed into logical commits. Size is not a checklist item.
+4. *Guessing past the keystone.* If the failing test isn't writable from the task text, the task isn't ready. Inventing the requirement is the failure the checklist exists to stop.
+5. *Rationalizing through the data-loss list.* "The migration is small" doesn't clear checklist item 2. Enumerated operations defer, full stop.
+6. *Committing in =file-only= mode.* The diff is the deliverable; the commit is Craig's.
+7. *One omnibus commit for the whole run.* Every logical change is its own reviewed commit.
+8. *Skipping =/review-code= or =/voice= because nobody's watching.* Autonomy removes interaction gates, never engineering-discipline gates (same contract as =no-approvals.org=).
+9. *Running past the cap.* The cap is the kill switch; hitting it means stop and surface, even mid-set.
+10. *Paging per-task.* One page, end of set.
+
+* Living Document
+
+Refine as the dogfooding signal arrives — the metrics log and the corrections-in-next-session signal are the feedback loop. Fold recurring adjustments in rather than accumulating caller-side workarounds.
+
+* History
+
+Created 2026-07-02 as Phase 1 of the autonomous-batch execution spec, reconciling the inbox-zero "Phase E" proposal and the =.emacs.d= speedrun proposal into one execution loop. The auto-inbox-zero execute step in =inbox.org= reverted to routing-only in the same change so this file is the loop's only home.
diff --git a/todo.org b/todo.org
index 546f8ad..1f2207d 100644
--- a/todo.org
+++ b/todo.org
@@ -459,8 +459,8 @@ Craig ratified all eight decisions in [[file:docs/specs/2026-06-16-autonomous-ba
 
 *** 2026-07-02 Thu @ 00:44:59 -0400 spec-response decomposition — :SPEC_ID: bound, spec DOING
 Stamped the spec's UUID on this parent, broke Phases 1-6 into the build tasks below (plus the flip task and a live-trial validation child), and flipped the spec's status heading READY → DOING per the transition-ownership table.
-*** TODO [#C] Phase 1 — extract the loop into work-the-backlog.org :feature:solo:
-Write work-the-backlog.org holding the eligibility gate (mechanical tag read: :solo: present, no do-not-auto-implement marker), the defer checklist, the per-task quality bar (TDD red→green, /review-code, /voice on the commit), and run-cap logic — inputs: task set + session mode + cap. In the same change, revert inbox.org's "auto inbox zero" execute step (per-cycle item 3's "Yes → launch into implementing...") to routing-only so the execution loop has one home. Spec Phase 1. Verify: make test green, sync clean; the new workflow is callable but nothing invokes it yet.
+*** 2026-07-02 Thu @ 01:07:29 -0400 Phase 1 landed — execution loop extracted into work-the-backlog.org
+work-the-backlog.org written (canonical + mirror): caller contract (task set + session mode + cap), five-outcome vocabulary, the loop, mechanical eligibility gate (TODO + :solo: per scheme header, safe-by-omission, no-scheme-header → don't run), four-item defer checklist, per-task quality bar, cap/kill-switch semantics, page + metrics stubs pointing at Phases 4-5. inbox.org's auto-mode per-cycle item 3 reverted to routing-only (yes-path execution removed; mode intro + closing line updated to match). INDEX.org entry added. make test green, sync clean; nothing invokes the new workflow yet.
 *** TODO [#C] Phase 2 — wire the two callers :feature:solo:
 Auto-inbox-zero's "run this batch next?" yes-path invokes work-the-backlog (tag query + file-only + cap 1); the no-approvals speedrun preset (trigger phrases "speedrun" / "no approvals speedrun") feeds it an explicit ordered list + autonomous-commit + always-push + paging-on, running the pre-flight Q&A first. Spec Phase 2 / D4 / D7. Verify: each caller independently exercisable.
-- 
cgit v1.2.3
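The Phase 1 entry above describes a mechanical eligibility gate (status =TODO= plus the =:solo:= tag) and a hard per-run cap. The following is a minimal Bash sketch of just those two rules applied to Org headlines. It is an illustration under stated assumptions and is not the workflow's implementation. The function name =select_eligible=, the idea that headlines arrive on stdin already in priority order, and the tag-matching regex are all hypothetical. The scheme-header lookup, the do-not-auto-implement marker, and the defer checklist are left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: eligibility gate + run cap over Org headlines.
# Assumptions (not from the workflow file): headlines arrive on stdin in
# priority order; the project's scheme header is NOT consulted; only the
# TODO keyword and a trailing :solo: tag are checked.

# select_eligible CAP
#   Prints at most CAP headlines that pass both gates, in input order.
select_eligible() {
  local cap="$1" taken=0 line tags
  local todo_re='^\*+ TODO( |$)'                            # keyword must be exactly TODO
  local tag_re='[[:space:]](:[^[:space:]]+:)[[:space:]]*$'  # trailing :tag1:tag2: group

  while IFS= read -r line; do
    # The cap is a hard ceiling: stop as soon as it has been reached.
    if (( taken >= cap )); then
      break
    fi
    # Status gate: VERIFY, DOING, DONE and dated log headings are skipped.
    [[ $line =~ $todo_re ]] || continue
    # Tag gate: :solo: must be one of the trailing tags (:quick: alone is not enough).
    [[ $line =~ $tag_re ]] || continue
    tags=${BASH_REMATCH[1]}
    [[ $tags == *:solo:* ]] || continue

    printf '%s\n' "$line"
    taken=$((taken + 1))   # only selected headlines count against the cap
  done
}
```

With the loop caller's default cap of 1, piping a task list into =select_eligible 1= yields only the first headline that passes both gates, and the next tick would call it again. In this sketch a headline counts against the cap only when it is selected, so skipped =VERIFY= or =:quick:=-only entries do not use up the budget. Whether the real workflow counts deferred tasks the same way is not stated in the patch.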