aboutsummaryrefslogtreecommitdiff
path: root/docs/design
diff options
context:
space:
mode:
Diffstat (limited to 'docs/design')
-rw-r--r--docs/design/2026-06-16-autonomous-batch-execution-spec.org221
1 files changed, 138 insertions, 83 deletions
diff --git a/docs/design/2026-06-16-autonomous-batch-execution-spec.org b/docs/design/2026-06-16-autonomous-batch-execution-spec.org
index e2e0f90..84cefe3 100644
--- a/docs/design/2026-06-16-autonomous-batch-execution-spec.org
+++ b/docs/design/2026-06-16-autonomous-batch-execution-spec.org
@@ -4,7 +4,7 @@
#+TODO: TODO | DONE SUPERSEDED CANCELLED
* Metadata
-| Status | draft |
+| Status | ready |
|----------+--------------------------------------------------------------------|
| Owner | Craig Jennings |
|----------+--------------------------------------------------------------------|
@@ -12,21 +12,21 @@
|----------+--------------------------------------------------------------------|
| Date | 2026-06-16 |
|----------+--------------------------------------------------------------------|
-| Related | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:2026-06-15-fix-speedrun-workflow-proposal.org][fix-speedrun proposal]] |
+| Related | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] |
|----------+--------------------------------------------------------------------|
* Summary
-Two proposals arrived within a day of each other describing the same capability: have Claude work a batch of small, well-marked tasks autonomously, with a full quality bar per task and no per-step approval gate. The inbox-zero "Phase E" proposal drives it from a tag/priority query on a recurring loop; the "fix speedrun" proposal drives it from an explicit ordered list a human dictates in-session. This spec reconciles both into one feature: a single dedicated workflow, =work-the-backlog.org=, that holds the task-execution logic, with two thin callers feeding it. It also designs the instrumentation that measures whether the autonomy is actually paying off.
+Two proposals arrived within a day of each other describing the same capability: have Claude work a batch of small, well-marked tasks autonomously, with a full quality bar per task and no per-step approval gate. The inbox-zero "Phase E" proposal drives it from a tag/priority query on a recurring loop; the "speedrun" proposal drives it from an explicit ordered list a human dictates in-session. This spec reconciles both into one feature: a single dedicated workflow, =work-the-backlog.org=, that holds the task-execution logic, with two thin callers feeding it. It also designs the instrumentation that measures whether the autonomy is actually paying off.
* Problem / Context
-Craig has a standing backlog of small, solo-doable fixes across several projects, already marked with a tag convention (=:next:=, =:quick:+:solo:=). Doing them by hand one at a time is the bottleneck — each is 30 minutes or less, but the context-switch and the per-commit approval ceremony dominate the actual work. He wants Claude to burn these down unattended: on a recurring loop for the routed inbox case, and on demand when he batches a named list and says "fix speedrun, no approvals until done."
+Craig has a standing backlog of small, solo-doable fixes across several projects, already marked with a tag convention (=:next:=, =:quick:+:solo:=). Doing them by hand one at a time is the bottleneck — the context-switch and the per-commit approval ceremony dominate the actual work. He wants Claude to burn these down unattended: on a recurring loop for the routed inbox case, and on demand when he batches a named list and says "speedrun, no approvals until done." The speedrun is the away-from-desk / working-on-something-else mode, so it must be able to take on larger tasks too — not only sub-30-minute ones — or it forces him to stay at the desk for anything non-trivial.
Two separate proposals tried to answer this:
- *Phase E* (in =inbox-zero.org=, edited in =.emacs.d= as a stopgap) bolted autonomous execution onto the inbox-zero workflow's on-demand and loop callers. The sender flagged the seam as the open question: coupling capture-routing with autonomous-implementation pollutes inbox-zero's three existing callers (startup, wrap-up, on-demand), two of which must never execute anything.
-- *fix speedrun* (a =.emacs.d= theme-studio session that worked well) is the same execution loop driven by an explicit ordered task set, with end-of-set paging and always-push.
+- *speedrun* (a =.emacs.d= theme-studio session that worked well) is the same execution loop driven by an explicit ordered task set, with end-of-set paging and always-push.
They overlap almost entirely. The execution loop — eligibility gate, act-vs-file decision, per-task quality bar, bounded run — is identical. Only the *input* differs (tag query vs explicit list) and the *session mode* differs (loop default vs no-approvals + always-push + page). Building them as two features would duplicate the execution logic and let the two copies drift. The forces: keep inbox-zero's callers clean, share one execution loop, and make the autonomy safe enough to run unattended on a 30-minute timer without Craig watching.
@@ -37,22 +37,24 @@ A second, explicit ask from Craig: instrument this so its effectiveness is measu
** Goals
- One workflow, =work-the-backlog.org=, owns the task-execution loop. Both input shapes (tag query, explicit list) and both session modes feed it.
- inbox-zero's three existing callers stay clean: the loop caller chains into =work-the-backlog= *after* routing; startup and wrap-up never touch it.
-- "fix speedrun" is a thin named preset, not a second implementation: no-approvals session mode + always-push + end-of-set page, feeding an explicit ordered list.
+- The *no-approvals speedrun* is a thin named preset, not a second implementation: autonomous-commit + always-push + end-of-set page, fed an explicit ordered list, with all approvals front-loaded into a single pre-flight step (below) so the run itself is uninterrupted.
+- Eligibility is decided by *crisp, checkable criteria*, not adjectives: a mechanical tag/status gate (=:solo:= + status =TODO=), then a per-task defer checklist whose keystone is "can I write the failing test from the task text without inventing a requirement?" Task *size* is explicitly not a gate — a large task is decomposed into per-logical-commit chunks, not deferred.
+- The autonomy tags (=:solo:=, =:quick:=) carry hard definitions in =todo-format.md= and are applied + enforced as a mandatory step in the task-review and task-audit workflows, so the run-time gate trusts the author's tag instead of re-deriving it.
- Commit autonomy defaults to file-only (surface a diff, no auto-commit). A project opts into autonomous commit+push explicitly via its per-project waiver.
-- Hard guardrails: refuse to speedrun any task needing a design decision or carrying data-loss risk without a checkpoint; file a =VERIFY= and move on rather than guess-implement an underspecified task; a per-run cap / kill switch beyond "one task per run."
+- Hard guardrails: refuse any task carrying data-loss / irreversible / external-state risk without a checkpoint; gather any one-or-two quick decisions a task needs *up front* (speedrun) rather than guessing; file a =VERIFY= for anything underspecified or needing design deliberation; a per-run cap / kill switch beyond "one task per run."
- A lightweight per-run metrics log plus a periodic synthesis step that writes org-roam KB articles summarizing the trend.
** Non-Goals
-- *Not* a replacement for =/start-work=. Tasks needing deliberation, design, or an hour-plus stay with =/start-work= and its approval gates. This feature only touches the small, marked, solo set.
+- *Not* a replacement for =/start-work=. Tasks needing deliberation or design stay with =/start-work= and its approval gates. This feature only touches the marked, solo set — regardless of size.
- *Not* a new tag convention. It reads the project's own priority/tag scheme header; it never invents or hardcodes tags across projects.
- *Not* an inbox-routing change. =inbox-zero.org= keeps its A-D phases. The Phase E text added in =.emacs.d= as a stopgap is *removed* and its logic moves here.
- *Not* a multi-project orchestrator. One run works one project's backlog. Cross-project handoff stays with =inbox-send= and the paging reply.
- *Not* a credential-handling or external-API feature. Tasks that touch secrets or external mutations are out of the eligible set by the guardrail.
** Scope tiers
-- *v1:* =work-the-backlog.org=; the eligibility gate reading the project's scheme header; the act-vs-file decision with VERIFY-on-ambiguity; file-only commit default with per-project opt-in; the loop caller wiring and inbox-zero Phase E removal; the "fix speedrun" preset with end-of-set =notify --persist= page; the per-run metrics log (structured JSONL).
+- *v1:* =work-the-backlog.org=; crisp =:solo:= / =:quick:= definitions in =todo-format.md= plus their mandatory application in task-review and task-audit; the eligibility gate (=:solo:= + status =TODO=, read against the project's scheme header); the act-vs-file *defer checklist* (test-writability keystone, enumerated data-loss list, already-satisfied, design-deliberation); the no-approvals speedrun's pre-flight decision-gathering step; file-only commit default with per-project opt-in; the loop caller wiring and inbox-zero Phase E removal; the speedrun preset with end-of-set =notify --persist= page; the per-run metrics log (structured JSONL).
- *Out of scope:* a token-budget kill switch (cap is a task count in v1); cross-project batch runs; a dashboard or live UI over the metrics.
-- *vNext (log to todo.org):* the periodic org-roam synthesis step if it doesn't make v1; a token/cost budget alongside the task-count cap; auto-detection of "human corrected my autonomous commit" from the next session's diff.
+- *vNext (log to todo.org):* the periodic org-roam synthesis step if it doesn't make v1; a token/cost budget alongside the task-count cap (more pressing now that task size is uncapped — a single large task can run long in the unattended loop); auto-detection of "human corrected my autonomous commit" from the next session's diff.
* Design
@@ -63,8 +65,8 @@ The architecture is one execution workflow with two callers and one preset, plus
#+begin_example
inbox-zero loop caller ──(after Phase D routing)──┐
├──▶ work-the-backlog.org ──▶ metrics log (JSONL)
- "fix speedrun" preset ──(explicit ordered list)───┘ │
- = no-approvals + always-push + end-page ▼
+ no-approvals speedrun ──(explicit ordered list)──┘ │
+ = pre-flight Q&A + autonomous-commit + push + page ▼
periodic synthesis ──▶ org-roam KB articles
#+end_example
@@ -76,20 +78,20 @@ This is the seam the Phase E sender asked for: separating capture-routing (inbox
A caller hands =work-the-backlog= three things:
-1. *A task set* — either an explicit ordered list of task headings (fix speedrun), or the result of a tag/priority query against =todo.org= (the loop). The workflow does not care which; it receives an ordered list of candidate tasks.
+1. *A task set* — either an explicit ordered list of task headings (speedrun), or the result of a tag/priority query against =todo.org= (the loop). The workflow does not care which; it receives an ordered list of candidate tasks.
2. *A session mode* — =file-only= (default) or =autonomous-commit= (requires the project's per-project waiver), and a paging flag.
3. *A run cap* — the maximum number of tasks to complete this run.
-It returns: per-task outcome (implemented+committed / implemented+diff-surfaced / deferred-VERIFY / deferred-too-large / skipped-ineligible), and a metrics record per task.
+It returns: per-task outcome (implemented+committed / implemented+diff-surfaced / deferred-VERIFY / dropped-by-craig / skipped-ineligible), and a metrics record per task.
** The execution loop (implementer's view)
For the task set, in order, until the run cap is hit:
1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
-2. *Scope read* of the relevant code. Cheap; just enough to make the act-vs-file call.
-3. *Act-vs-file decision* (below). File → record the deferral reason, next task.
-4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important, then close the task per =todo-format.md=.
+2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist.
+3. *Defer checklist* (below). Any hit → record the deferral reason (or, under the speedrun preset, route the quick-question gap to the pre-flight Q&A), next task.
+4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important, then close the task per =todo-format.md=. Decompose into as many logical commits as the change needs — size is not capped.
5. *Commit autonomy branch:*
- =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
- =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
@@ -98,41 +100,63 @@ For the task set, in order, until the run cap is hit:
After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary.
-** Eligibility gate
+** Eligibility gate (mechanical — no judgment)
-A task is autonomous-safe when *all* hold:
+A task is autonomous-safe when *both* hold. This layer is a lookup, not a judgment; all the judgment lives in the defer checklist below.
-1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= is the "awaiting Craig's manual confirmation" marker; auto-implementing one defeats the manual check it represents.
-2. *Tagged per the project's autonomous-safe set* — resolved by reading the project's priority/tag scheme header at the top of its =todo.org=, not by hardcoding. The default reading is =:next:= OR both =:quick:= AND =:solo:=, but a project whose scheme declares a different autonomous-safe tag set overrides that.
-3. *Solo-doable* — no input or undecided judgment call from Craig.
-4. *Roughly 30 minutes or less* of focused work.
+1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= is the "awaiting Craig's manual confirmation" marker; auto-implementing one defeats the manual check it represents. The do-not-implement set is safe-by-omission: anything not plainly =TODO= (plus any project-declared "hold" marker) is out.
+2. *Tagged =:solo:=* — the autonomy tag, resolved against the project's priority/tag scheme header (not hardcoded). =:solo:= carries a hard definition (see Tag definitions, below): the task is completable without Craig's involvement beyond at most one or two quick decisions answerable up front, with no design deliberation. A project whose scheme declares a different autonomous-safe tag set overrides the default. Priority / =:next:= drive *ordering* within the eligible set, not eligibility.
-** Act-vs-file decision (the guardrail)
+Task *size* is deliberately absent from this gate. The old "≤ ~30 minutes / one logical commit" criterion is removed: a large but well-specified, decision-free task is in scope and is decomposed into per-logical-commit chunks during implementation. Size never sends a task to =/start-work=; only *deliberation* or *risk* does (the checklist below). This is what makes the speedrun usable as an away-from-desk mode rather than a sub-30-minute-only mode.
-After the scope read, for each eligible candidate:
+*** Tag definitions (land in =todo-format.md=, enforced in task-review + task-audit)
-- *Clear, bounded, solo, ≤ ~30 min* → implement.
-- *Needs a design decision, Craig's input, or discussion* → do NOT implement. File a one-line note on the task naming the input it needs; surface it.
-- *Carries data-loss risk without a checkpoint* (deletes data, rewrites persisted state, touches external/shared state irreversibly) → do NOT implement. File a =VERIFY= explaining the risk; surface it.
-- *Underspecified or already-satisfied* → do NOT guess-implement. File a =VERIFY= noting why (the fix-speedrun "raise max spans to 5 — every cap was already 8" case) and move on.
-- *An hour or more* → do NOT implement. File and surface as a =/start-work= candidate.
+- *=:solo:= — autonomy.* The task can be completed without Craig's involvement, except for at most one or two quick decisions that can be stated and answered before the run starts. No open design question, no "weigh these approaches," no waiting on Craig mid-task. This is the eligibility tag.
+- *=:quick:= — effort hint only.* A small, fast task. Informational for batching and estimating a run's duration; *not* an eligibility gate (size no longer gates).
-When unsure which side a task falls on, file rather than implement. A wrong auto-implement costs more than a deferred task — it costs a revert *and* the human correction in the next session that the metrics are designed to catch.
+Both tags are applied at task creation and *re-checked as a mandatory step* in the task-review and task-audit workflows, so the run-time gate can trust the author's tag rather than re-derive autonomy and effort from the task body. A task-review or task-audit that skips the =:solo:= / =:quick:= assessment is incomplete.
-** Session modes and the "fix speedrun" preset
+** Act-vs-file decision (the defer checklist)
+
+After the scope read, run each eligible candidate through the checklist below. Each item is a concrete, answerable question, not an adjective. *Any* hit — or any "unsure" — sends the task to defer (or, for a quick-decision gap under the speedrun preset, to the pre-flight Q&A). Only a task that clears every item is implemented.
+
+1. *Test-writability (the keystone).* Can I write the failing test from the task text — plus any decisions gathered up front — without inventing a requirement? *No / unsure* → underspecified. Under the speedrun preset, if the gap is one or two quick answerable questions, route it to the pre-flight Q&A; otherwise file a =VERIFY= noting what's missing. Under the unattended loop, file the =VERIFY= (no one to ask). This replaces the old "clear / bounded / underspecified" adjectives with an action that fails loudly: if the red test isn't writable, the task isn't ready.
+2. *Data-loss / irreversible / external operation.* Does implementing it require any of: =rm= of non-scratch data, =git reset --hard= / force-push, =DROP= / =DELETE= / =TRUNCATE=, file truncate/overwrite of persisted content, a schema or data migration, any external or shared-state mutation, any credential touch? *Yes* → do NOT implement; file a =VERIFY= naming the risk. This is the hard safety gate; an upfront answer never overrides it without an explicit checkpoint. Replaces the vague "data-loss risk" with an enumerated, greppable set.
+3. *Already-satisfied.* Does the scope read show the desired end-state already holds? *Yes* → file a =VERIFY= noting it (the "raise max spans to 5 — every cap was already 8" case) and move on. Don't make a no-op change.
+4. *Design deliberation.* Does the task carry an unresolved design question, a "weigh these approaches" with real tradeoffs, or a TBD that isn't a quick factual answer? *Yes* → under the speedrun preset, if it collapses to one or two quick questions, route to pre-flight Q&A; otherwise file and surface as a =/start-work= candidate. Under the loop, file. The discriminator is now *quick-answerable question* vs *deliberation* — not task size.
+
+A task that clears 1–4 is implemented under the project's commit discipline, decomposed into as many logical commits as the change needs. When genuinely unsure which side a task falls on, defer — a wrong auto-implement costs a revert *and* the next-session correction the metrics are designed to catch.
+
+** Pre-flight decision gathering (the no-approvals speedrun's only interaction)
+
+The speedrun preset front-loads every approval into one step before the run, so the run itself is uninterrupted — that is what "no approvals" means. It is *not* "no input ever"; it is "all input first, then hands-off."
+
+When Craig kicks off a speedrun over an explicit list:
+
+1. *Gather* the named task set.
+2. *Scope-read and classify* each task against the eligibility gate + defer checklist: ready (clears the checklist), needs-quick-decisions (one or two upfront-answerable questions — checklist item 1 or 4), or drop (data-loss / irreversible, or design deliberation that isn't a quick question).
+3. *Order* the list (priority, then the author's ordering / =:next:=).
+4. *Intro the work* — present the ordered plan: what will run, what was dropped and why, and the batched questions for the needs-quick-decisions tasks.
+5. *Craig answers each question, or says "skip this"* → a skipped task is removed from the run (recorded =dropped-by-craig=); an answered task has the answer recorded so implementation works from the decision, not a guess.
+6. *Run the finalized list autonomously* — no further approvals until done.
+7. *End-of-set page* with completed + remaining + skipped.
+
+The unattended *loop* caller has no human at kickoff, so it cannot gather decisions: there, a needs-quick-decisions task simply defers (files its note) like any other checklist hit. The pre-flight Q&A is a speedrun-preset capability, not a loop one.
+
+** Session modes and the no-approvals speedrun preset
Two orthogonal session-mode dimensions feed the loop:
- *Commit autonomy:* =file-only= (default) or =autonomous-commit=. =autonomous-commit= is honored only when the project carries the per-project waiver (=.emacs.d= and =rulesets= have it; most projects do not). Absent the waiver, a request for =autonomous-commit= degrades to =file-only= and says so.
- *Paging:* on or off. End-of-set only.
-"fix speedrun" is the named preset = =autonomous-commit= + always-push + paging-on, fed an *explicit ordered list*. It is not a separate code path; it is a label for that combination of mode flags plus the explicit-list input. The loop caller, by contrast, runs =file-only= (unless the project has the waiver and opts the loop into commits) with paging off, fed the *tag query*.
+The *no-approvals speedrun* is the named preset = =autonomous-commit= + always-push + paging-on, fed an *explicit ordered list*, run after the pre-flight decision-gathering step above. It is not a separate code path; it is a label for that combination of mode flags plus the explicit-list input, with the pre-flight Q&A as its only interactive moment. The loop caller, by contrast, runs =file-only= (unless the project has the waiver and opts the loop into commits) with paging off, fed the *tag query*, with no pre-flight step.
** Bounding the run and the kill switch
-Default cap: one task per run for the loop caller — implement the highest-priority eligible candidate (=[#A]= before =[#B]= before =[#C]=), record, then stop and let the next tick continue. The fix-speedrun preset works the whole explicit list in order (the human bounded it by naming it), still one commit per task.
+Default cap: one task per run for the loop caller — implement the highest-priority eligible candidate (=[#A]= before =[#B]= before =[#C]=), record, then stop and let the next tick continue. The speedrun preset works the whole explicit list in order (the human bounded it by naming it), still one commit per logical change.
-The kill switch is a hard per-run task cap passed by the caller, independent of "one per run": even fix-speedrun stops at the cap and pages with the remainder listed. A loop that fires every 30 minutes and commits unattended needs a ceiling that a runaway can't exceed.
+The kill switch is a hard per-run task cap passed by the caller, independent of "one per run": even the speedrun stops at the cap and pages with the remainder listed. A loop that fires every 30 minutes and commits unattended needs a ceiling that a runaway can't exceed. With task size now uncapped, the count cap no longer bounds *cost* — a single large task can run long — so a token/cost budget is the most pressing vNext addition.
** End-of-set paging
@@ -149,69 +173,89 @@ notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>"
** Fold execution into inbox-zero (the Phase E stopgap shape)
- Good, because it's the smallest diff — the loop caller already runs inbox-zero, so execution is "one more phase."
- Bad, because it couples capture-routing with implementation. inbox-zero has three callers; startup and wrap-up must never execute. A Phase E inside inbox-zero forces both to carry a "skip Phase E" caveat and risks a future caller running it by accident.
-- Neutral, because the eligibility-gate and act-vs-file text is identical either way — only its *home* differs.
+- Neutral, because the eligibility-gate and defer-checklist text is identical either way — only its *home* differs.
-** Two separate features (keep Phase E and fix-speedrun distinct)
+** Two separate features (keep Phase E and speedrun distinct)
- Good, because each proposal ships as written with no reconciliation work.
- Bad, because the execution loop is duplicated in two places and will drift; a guardrail tightened in one won't reach the other. Two ways to do autonomous execution is two things to audit.
- Neutral, because the input and session-mode differences are real — but they're thin caller-level differences, not a reason to fork the engine.
+** Keep the task-size gate (defer anything over ~30 minutes)
+- Good, because it bounds per-task cost and blast radius with a single number.
+- Bad, because it defeats the away-from-desk use case — anything non-trivial bounces back to Craig, so he can't actually leave. Size correlates poorly with risk; a large mechanical refactor is safer than a tiny change to persisted state.
+- Neutral, because the things size was a proxy for (risk, cost) are covered directly — risk by the data-loss checklist, cost by the run cap (and the vNext token budget). The defer checklist's deliberation item, not size, is what routes genuine =/start-work= tasks out.
+
** Autonomous-commit as the default
- Good, because it's faster end-to-end with no diff to review.
- Bad, because most projects lack the per-project waiver, and an unattended loop committing to a project that never opted in is exactly the failure the file-only default prevents. The blast radius of a bad autonomous commit is a revert plus lost trust in the loop.
- Neutral, because the projects that *do* want it (=.emacs.d=, =rulesets=) opt in explicitly, so the capability is available where it's wanted without being the default everywhere.
-* Decisions [/]
+* Decisions [8/8]
+
+** DONE Eligibility tag set and where it's read
+- Owner / by-when: Craig / spec-review
+- Context: Projects' priority/tag schemes vary, and the =todo-format.md= scheme header is the declared per-project source of truth. Task size is no longer a gate, so eligibility rests on the autonomy tag, not an effort cap.
+- Decision: Eligibility = status =TODO= AND the =:solo:= autonomy tag, resolved against the project's scheme header (a project may declare a different autonomous-safe set). Priority / =:next:= drive ordering, not eligibility. =:quick:= is an effort hint, never a gate.
+- Consequences: easier — one workflow works across projects with different vocab, and the gate is a pure lookup; harder — a project with no/malformed scheme header needs a fallback, and the default (=:solo:=) must be defined precisely enough that two projects agree.
+
+** DONE Crisp =:solo:= / =:quick:= definitions, enforced in task-review + task-audit
+- Owner / by-when: Craig / spec-review
+- Context: The run-time gate is only as crisp as the tags. Today =:quick:= / =:solo:= are listed in the scheme header with no hard definition, and nothing enforces that tasks get assessed for them.
+- Decision: Define =:solo:= (completable without Craig beyond at most one-or-two upfront-answerable quick decisions; no design deliberation) and =:quick:= (small/fast effort hint only) in =todo-format.md=, and make assessing both a *mandatory step* in the task-review and task-audit workflows. A review/audit that skips the assessment is incomplete.
+- Consequences: easier — authoring-time judgment by the human who knows the answer, and the run-time gate trusts the tag; harder — task-review and task-audit grow a required step, and existing untagged tasks need a back-fill pass.
-** TODO Where the eligibility gate reads its tag set
+** DONE The do-not-auto-implement marker set
- Owner / by-when: Craig / spec-review
-- Context: Phase E hardcoded =:next:= / =:quick:+:solo:=. Projects' priority/tag schemes vary, and the =todo-format.md= scheme header is the declared source of truth per project.
-- Decision: We will read the project's =todo.org= priority/tag scheme header to resolve the autonomous-safe tag set, defaulting to =:next:= OR =:quick:+:solo:= when the header doesn't declare an explicit autonomous-safe set.
-- Consequences: easier — one workflow works correctly across projects with different tag vocabularies; harder — a project with no scheme header (or a malformed one) needs a fallback, and the "default reading" has to be specified precisely enough that two projects agree on it.
+- Context: =VERIFY= means "awaiting Craig's manual confirmation"; other projects may use markers differently.
+- Decision: Do-not-implement = any status that is not =TODO=, plus any project-declared "hold" marker. Safe-by-omission: exclude anything not plainly =TODO=.
+- Consequences: easier — portable, and manual-check tasks can't auto-run; harder — richer per-project overrides need marker semantics in the scheme header, which most lack, so the default must stay conservative.
-** TODO The do-not-auto-implement marker set
+** DONE Pre-flight decision gathering for the speedrun preset
- Owner / by-when: Craig / spec-review
-- Context: =VERIFY= means "awaiting Craig's manual confirmation" in =.emacs.d= and =rulesets=. Other projects may use =VERIFY= differently or not at all. The gate excludes =VERIFY=, =DOING=, =DONE=, =CANCELLED= by status, but the *marker semantics* are what matter.
-- Decision: We will define the do-not-auto-implement set as: any status that is not =TODO=, plus any task carrying a project-declared "hold" marker. The canonical default treats =VERIFY= as do-not-implement; a project overrides only by declaring its marker semantics in its scheme header.
-- Consequences: easier — the gate is portable and a project can't accidentally have its manual-check tasks auto-run; harder — requires the scheme header to carry marker semantics, which most don't yet, so the default has to be safe-by-omission (exclude anything not plainly =TODO=).
+- Context: Forcing every decision-needing task to defer wastes the away-from-desk use case — many tasks need only one or two quick answers Craig could give at kickoff. The speedrun is interactive at its start but must be hands-off after.
+- Decision: The speedrun preset gathers + orders the set, intros the work, and batches all needed quick decisions into one pre-flight Q&A; Craig answers or says "skip this" (drops the task); the run then proceeds with zero further approvals. The unattended loop has no kickoff human, so it defers decision-needing tasks instead.
+- Consequences: easier — "no approvals" becomes "all approvals first," which fits working-while-away, and larger / lightly-underspecified tasks become runnable; harder — the classifier must reliably split quick-question vs real-deliberation, and the recorded answers must reach the implementer so it works from the decision, not a guess.
-** TODO Commit-autonomy opt-in mechanism
+** DONE Commit-autonomy opt-in mechanism
- Owner / by-when: Craig / spec-review
- Context: =file-only= is the default; =.emacs.d= and =rulesets= have a per-project waiver allowing autonomous commits. Where does the workflow *read* that a project has opted in?
-- Decision: We will read the opt-in from the project's existing per-project waiver location (the same place the commit discipline's "no approval gate" waiver lives — =notes.org= Workflow State or =CLAUDE.md=), not introduce a new config file.
-- Consequences: easier — no new config surface, reuses the existing waiver concept; harder — the waiver's exact location and format must be pinned so the workflow can detect it deterministically, and a project with the commit waiver but *not* wanting the loop to commit needs a way to say "waiver yes, loop-commit no" (two flags, not one).
+- Decision: Read the opt-in from the project's existing per-project waiver location (=notes.org= Workflow State or =CLAUDE.md=), not a new config file. Two flags: "has commit waiver" and "loop may commit" can differ.
+- Consequences: easier — no new config surface, reuses the existing waiver concept; harder — the waiver location/format must be pinned for deterministic detection, and "waiver yes, loop-commit no" needs the two-flag split.
-** TODO Run-cap default and the kill switch shape
+** DONE Run-cap default and the kill switch shape
- Owner / by-when: Craig / spec-review
-- Context: The loop default is one task per run; fix-speedrun works an explicit list. Both need a hard ceiling a runaway can't exceed.
-- Decision: We will pass a hard per-run task cap from the caller (loop default 1; fix-speedrun = length of the explicit list, capped at a ceiling), and stop + page with the remainder when the cap is hit. v1 caps by task count, not token budget.
-- Consequences: easier — a simple integer the caller controls; bounded blast radius; harder — a task-count cap doesn't bound *cost* (one 30-min task can burn many tokens), so a token budget is vNext, and until then a pathological task can run long within a single cap slot.
+- Context: The loop default is one task per run; the speedrun works an explicit list. Both need a hard ceiling. Task size is now uncapped, so a single task can be large.
+- Decision: The caller passes a hard per-run task cap (loop default 1; speedrun = length of the explicit list, capped at a ceiling); stop + page with the remainder when the cap is hit. v1 caps by task count, not token budget.
+- Consequences: easier — a simple caller-controlled integer with a bounded task count; harder — a count cap doesn't bound *cost*, and with size uncapped a single large task can run long, so a token budget is vNext and more pressing than before.
-** TODO Metrics log location and format
+** DONE Metrics log location and format
- Owner / by-when: Craig / spec-review
- Context: Per-run metrics must land somewhere structured and queryable, per-project, and survive across sessions for the synthesis step to read.
-- Decision: We will append one JSONL record per task to a per-project log at =.ai/metrics/work-the-backlog.jsonl=, git-tracked, with the synthesis step reading the union across projects.
-- Consequences: easier — append-only JSONL is trivial to write and =jq=-queryable; per-project keeps it local to where the work happened; harder — a git-tracked log adds churn to every autonomous run's commit (or needs its own commit), and "union across projects" needs the synthesis step to know where every project's log lives.
+- Decision: Append one JSONL record per task to a per-project log at =.ai/metrics/work-the-backlog.jsonl=, git-tracked, with the synthesis step reading the union across projects.
+- Consequences: easier — append-only JSONL is trivial to write and =jq=-queryable, and per-project keeps it local to the work; harder — a git-tracked log adds commit churn, and "union across projects" needs the synthesis step to know where every log lives.
-** TODO Synthesis cadence and trigger
+** DONE Synthesis cadence and trigger
- Owner / by-when: Craig / spec-review
- Context: Craig wants periodic org-roam articles summarizing the data. What triggers synthesis, and how often?
-- Decision: We will run synthesis on an explicit trigger ("synthesize backlog metrics") and optionally a weekly scheduled run, writing one KB node per synthesis under =~/org/roam/agents/= per the knowledge-base rule.
-- Consequences: easier — explicit trigger means no surprise writes, and the KB rule already governs node shape; harder — a weekly scheduled run needs a scheduler entry and the KB write-classification (personal-only) must gate it so work-project metrics never land in the KB.
+- Decision: Run synthesis on an explicit trigger ("synthesize backlog metrics") and optionally a weekly scheduled run, writing one KB node per synthesis under =~/org/roam/agents/= per the knowledge-base rule.
+- Consequences: easier — an explicit trigger means no surprise writes, and the KB rule already governs node shape; harder — a weekly run needs a scheduler entry, and the personal-only write-classification must gate it so work-project metrics never land in the KB.
* Implementation phases
+** Phase 0 — Tag definitions + task-review/audit enforcement
+Add the hard =:solo:= / =:quick:= definitions to =todo-format.md=, and add the mandatory tag-assessment step to the task-review and task-audit workflows. Independent of the workflow build; lands first so the eligibility gate has crisp tags to read and existing tasks start getting assessed. Tree stays working: these are rule + workflow prose additions.
+
** Phase 1 — Extract the execution loop into work-the-backlog.org
-Write =work-the-backlog.org= holding the eligibility gate, act-vs-file decision, per-task quality bar, and run-cap logic — taking a task set + session mode + cap as input. Remove the stopgap "Phase E" text from =inbox-zero.org= (restore it to its A-D shape) in the same change so there's one home, not two. Tree stays working: inbox-zero reverts to routing-only, and the new workflow is callable but not yet wired to the loop.
+Write =work-the-backlog.org= holding the eligibility gate, defer checklist, per-task quality bar, and run-cap logic — taking a task set + session mode + cap as input. Remove the stopgap "Phase E" text from =inbox-zero.org= (restore it to its A-D shape) in the same change so there's one home, not two. Tree stays working: inbox-zero reverts to routing-only, and the new workflow is callable but not yet wired to the loop.
** Phase 2 — Wire the two callers
-Add the loop caller's chain step (after inbox-zero Phase D, invoke work-the-backlog with the tag query + file-only + cap 1) and the "fix speedrun" preset (explicit list + autonomous-commit + always-push + paging-on). Both go through the same workflow. Tree stays working: each caller is independently testable.
+Add the loop caller's chain step (after inbox-zero Phase D, invoke work-the-backlog with the tag query + file-only + cap 1) and the no-approvals speedrun preset (pre-flight decision-gathering → explicit list + autonomous-commit + always-push + paging-on). Both go through the same workflow; only the speedrun runs the pre-flight Q&A. Tree stays working: each caller is independently testable.
** Phase 3 — File-only vs autonomous-commit gate
Implement the commit-autonomy branch: read the per-project waiver, degrade =autonomous-commit= to =file-only= when absent, surface the degrade. Tree stays working: default file-only behavior is the safe path even before the waiver-read lands.
-** Phase 4 — Guardrails and the page
-Implement the data-loss / design-decision refusal, the VERIFY-on-ambiguity filing, and the end-of-set =notify alarm ... --persist= page. Tree stays working: guardrails only ever *reduce* what runs, so adding them can't break a passing run.
+** Phase 4 — The defer checklist, pre-flight Q&A, and the page
+Implement the act-vs-file defer checklist (test-writability keystone, enumerated data-loss list, already-satisfied, design-deliberation), the speedrun pre-flight decision-gathering (gather → classify → order → intro → batch-ask → skip/answer), the =VERIFY=-on-ambiguity filing, and the end-of-set =notify alarm ... --persist= page. Tree stays working: the checklist only ever *reduces* what runs, and the pre-flight step only runs under the speedrun preset.
** Phase 5 — Metrics log
Append the per-task JSONL record at each task outcome. Tree stays working: logging is a side effect that doesn't alter execution.
@@ -222,12 +266,14 @@ Write the synthesis step: read the JSONL union, compute the per-run and trend me
* Acceptance criteria
- [ ] =work-the-backlog.org= exists and is the only home for the execution loop; =inbox-zero.org= is back to its A-D routing-only shape with no Phase E.
- [ ] The loop caller chains into work-the-backlog after routing; startup and wrap-up never invoke it.
-- [ ] "fix speedrun" runs as the preset (autonomous-commit + always-push + end-page) over an explicit ordered list, one commit per task.
-- [ ] A task tagged for autonomous execution but at status =VERIFY= / =DOING= / =DONE= / =CANCELLED= is skipped by the gate.
-- [ ] The eligibility tag set is read from the project's =todo.org= scheme header, not hardcoded.
+- [ ] The no-approvals speedrun runs as the preset (pre-flight Q&A → autonomous-commit + always-push + end-page) over an explicit ordered list, one commit per logical change.
+- [ ] =:solo:= and =:quick:= carry hard definitions in =todo-format.md=, and task-review + task-audit both refuse to complete without assessing them.
+- [ ] Eligibility = status =TODO= AND =:solo:=, read from the project's scheme header, not hardcoded; a =VERIFY= / =DOING= / =DONE= / =CANCELLED= task is skipped by the gate.
+- [ ] Task size never sends a task to =/start-work=; a large but =:solo:=, well-specified task runs and is decomposed into per-logical-commit chunks.
+- [ ] The defer checklist fires correctly: a task whose red test isn't writable (and isn't a quick-question gap), one carrying an enumerated data-loss operation, an already-satisfied one, and one needing design deliberation are each deferred (or routed to pre-flight Q&A under the speedrun), not implemented.
+- [ ] Under the speedrun preset, a task needing one or two quick decisions is surfaced in the pre-flight Q&A; "skip this" drops it, an answer is recorded and used; the run then proceeds with no further approvals.
+- [ ] Under the unattended loop, a decision-needing task defers (no pre-flight Q&A).
- [ ] In a project without the commit waiver, an =autonomous-commit= request degrades to file-only and says so; no commit is made.
-- [ ] A task carrying data-loss risk or needing a design decision is refused with a filed VERIFY, not implemented.
-- [ ] An underspecified / already-satisfied task files a VERIFY noting why and the run continues.
- [ ] The run stops at the per-run cap and pages with the remaining tasks listed.
- [ ] Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl=.
- [ ] The synthesis step reads the logs and writes a KB node under =~/org/roam/agents/=; it refuses to write for work-classified projects.
@@ -252,14 +298,16 @@ One record per task, appended to =.ai/metrics/work-the-backlog.jsonl= at each ta
|-------------------+--------------------------------------------------------------------|
| =project= | project basename |
|-------------------+--------------------------------------------------------------------|
-| =caller= | =loop= or =fix-speedrun= |
+| =caller= | =loop= or =speedrun= |
|-------------------+--------------------------------------------------------------------|
| =task= | task heading (slug) |
|-------------------+--------------------------------------------------------------------|
| =outcome= | implemented-committed / implemented-diff / deferred-verify / |
-| | deferred-too-large / skipped-ineligible |
+| | skipped-ineligible / dropped-by-craig (skipped at pre-flight) |
+|-------------------+--------------------------------------------------------------------|
+| =defer_reason= | underspecified / data-loss / already-satisfied / needs-deliberation |
|-------------------+--------------------------------------------------------------------|
-| =defer_reason= | for deferrals: needs-input / data-loss / underspecified / too-large |
+| =upfront_decision=| true if a pre-flight answer was recorded and used for this task |
|-------------------+--------------------------------------------------------------------|
| =wall_clock_s= | seconds from task start to outcome |
|-------------------+--------------------------------------------------------------------|
@@ -268,7 +316,7 @@ One record per task, appended to =.ai/metrics/work-the-backlog.jsonl= at each ta
| =review_findings= | count of /review-code Critical+Important findings on this task |
|-------------------+--------------------------------------------------------------------|
-Per-run rollups computed at synthesis (not stored per record): tasks attempted, completed, VERIFY-deferred, reverted; wall-clock total; commits landed; review findings per commit.
+Per-run rollups computed at synthesis (not stored per record): tasks attempted, completed, VERIFY-deferred, dropped-by-craig, reverted; wall-clock total; commits landed; review findings per commit.
** The corrections signal (the key metric)
@@ -293,13 +341,13 @@ The KB node is the artifact Craig reviews later — "are the autonomous runs com
- *Data model & ownership:* The task set is read from =todo.org= (project-owned, user-authored). The metrics JSONL is generated, append-only, git-tracked, project-owned. KB nodes are agent-generated under =~/org/roam/agents/= (never overwriting Craig's hand-authored nodes — link only). No editable region is co-owned.
- *Errors, empty states & failure:* Empty task set → report "nothing eligible" and stop. Malformed scheme header → fall back to the default tag reading and surface the fallback. A task that fails mid-implementation → leave the tree working (don't commit a broken state), record the failure outcome, surface it, continue to the next task. No silent data loss: the data-loss guardrail refuses irreversible tasks outright.
-- *Security & privacy:* Tasks touching credentials or external mutations are excluded by the data-loss / external-state guardrail. The KB write is personal-projects-only; work metrics never leave the project. No secrets in the JSONL (task slugs and SHAs only).
-- *Observability:* The end-of-set page surfaces the run outcome. The per-task surface (implemented / deferred + reason / skipped) is the live progress view. The metrics log + KB synthesis is the long-run observability. A bad run is isolable from the JSONL (which task, which outcome, which review findings).
-- *Performance & scale:* Expected counts are small — a handful of tasks per run, one run per 30-min tick. No bottleneck at this scale. The cap bounds the worst case. Synthesis over months of JSONL is still a small file (one record per task).
-- *Reuse & lost opportunities:* Reuses =todo-format.md= for task close, =/review-code= and =/voice personal= for the quality bar, =notify= for paging, the knowledge-base rule for KB writes, the per-project waiver for commit-autonomy. No new config file (the opt-in rides the existing waiver). The execution loop is the one new shared asset.
-- *Architecture fit & weak points:* Integration points — inbox-zero loop caller (chain after Phase D), the per-project waiver location, =todo.org= scheme header, =~/org/roam/agents/=. Weak point: the commit-autonomy gate depends on deterministically reading the waiver; mitigated by defaulting to file-only when the read is ambiguous (fail safe, not open). Second weak point: a 30-min loop committing unattended; mitigated by the hard cap and file-only default.
-- *Config surface:* Per-project — commit-autonomy opt-in (via existing waiver), optional loop-commit flag, optional autonomous-safe tag override in the scheme header. Per-call — task set, session mode, run cap. Defaults: file-only, paging-off (loop) / paging-on (fix-speedrun), cap 1 (loop).
-- *Documentation plan:* The workflow file itself is the user/operator doc (matches inbox-zero.org's self-documenting style). The =.emacs.d= stopgap note and the fix-speedrun proposal are superseded by this spec; no separate migration doc needed beyond removing the Phase E text.
+- *Security & privacy:* Tasks touching credentials or external mutations are excluded by the data-loss / external-state checklist item. The KB write is personal-projects-only; work metrics never leave the project. No secrets in the JSONL (task slugs and SHAs only).
+- *Observability:* The end-of-set page surfaces the run outcome. The per-task surface (implemented / deferred + reason / dropped / skipped) is the live progress view. The metrics log + KB synthesis is the long-run observability. A bad run is isolable from the JSONL (which task, which outcome, which review findings).
+- *Performance & scale:* Expected counts are small — a handful of tasks per run, one run per 30-min tick. No bottleneck at this scale. The cap bounds the worst case on task count; with size uncapped, a single large task is the cost outlier the vNext token budget addresses. Synthesis over months of JSONL is still a small file (one record per task).
+- *Reuse & lost opportunities:* Reuses =todo-format.md= for task close + the tag definitions, =/review-code= and =/voice personal= for the quality bar, =notify= for paging, the knowledge-base rule for KB writes, the per-project waiver for commit-autonomy, and task-review / task-audit for tag enforcement. No new config file (the opt-in rides the existing waiver). The execution loop is the one new shared asset.
+- *Architecture fit & weak points:* Integration points — inbox-zero loop caller (chain after Phase D), the per-project waiver location, =todo.org= scheme header, task-review / task-audit, =~/org/roam/agents/=. Weak point: the commit-autonomy gate depends on deterministically reading the waiver; mitigated by defaulting to file-only when the read is ambiguous (fail safe, not open). Second weak point: a 30-min loop committing unattended with uncapped task size; mitigated by the hard count cap and file-only default, with the token budget as the vNext backstop.
+- *Config surface:* Per-project — commit-autonomy opt-in (via existing waiver), optional loop-commit flag, optional autonomous-safe tag override in the scheme header. Per-call — task set, session mode, run cap. Defaults: file-only, paging-off (loop) / paging-on (speedrun), cap 1 (loop).
+- *Documentation plan:* The workflow file itself is the user/operator doc (matches inbox-zero.org's self-documenting style). The =.emacs.d= stopgap note and the speedrun proposal are superseded by this spec; no separate migration doc needed beyond removing the Phase E text.
- *Dev tooling:* N/A for new build targets — the workflows are prose, exercised by invocation. The metrics JSONL is =jq=-inspectable by hand; a tiny rollup helper may be added under =.ai/scripts/= if the synthesis prose proves to need it (decided at Phase 6, not a v1 prerequisite).
- *Rollout, compatibility & rollback:* Rollout is removing Phase E from inbox-zero and adding work-the-backlog — both prose changes, instantly reversible. Compatibility: inbox-zero's three callers are unchanged except the loop caller gaining a forward chain. Rollback: delete work-the-backlog and the loop chain step; inbox-zero is already back to A-D. The file-only default means the worst pre-rollback state is surfaced diffs, not committed changes.
- *External APIs & deps:* =notify alarm "Page" "<msg>" --persist= verified against =/home/cjennings/.local/bin/notify= and the page-me workflow. =~/org/roam/= KB write path and node shape verified against the knowledge-base rule. No external API calls.
@@ -308,17 +356,18 @@ The KB node is the artifact Craig reviews later — "are the autonomous runs com
- *The corrections signal is a proxy, not ground truth.* "A later commit touched the same files" over-counts (legitimate follow-up work) and under-counts (a correction in a different file). It's a flag for human review, not a verdict. Don't rabbit-hole on making it precise in v1 — the proxy plus a human glance is the design.
- *Waiver detection drift.* If the per-project waiver location moves or its format changes, the commit-autonomy gate could mis-read. Mitigation: fail safe to file-only. Pin the waiver format in the Phase 3 decision before building.
-- *Unattended-commit blast radius.* The headline risk. Mitigated three ways: file-only default, the hard cap, and the data-loss guardrail. The metrics loop is the fourth layer — it makes a bad run visible after the fact even if the first three let something through.
-- *Scope creep into /start-work territory.* The temptation to let "≤ 30 min" stretch. The act-vs-file gate and the "when unsure, file" rule are the brake; keep them strict.
+- *Unattended-commit blast radius.* The headline risk. Mitigated four ways: file-only default, the hard cap, the data-loss checklist item, and the metrics loop (which makes a bad run visible after the fact even if the first three let something through). With task size uncapped, the cost dimension of this risk grows — the vNext token budget is the planned fifth layer.
+- *Scope creep into /start-work territory.* Size is intentionally no longer the brake. The brake is the defer checklist's design-deliberation item plus the "when unsure, defer" rule — keep item 4 strict so genuine deliberation-class tasks still route out even when they're tagged =:solo:= by mistake.
+- *Pre-flight classifier error.* The speedrun's gather step has to split quick-answerable-question from real-deliberation. Misclassifying a deliberation task as a quick question puts a half-baked decision into an autonomous run. Mitigation: when the question isn't answerable in one or two lines, treat it as deliberation and drop it from the run, not as a pre-flight question.
* Testing / Verification / Rollout
-Verification is by invocation against a project's real =todo.org=: run the loop caller in file-only mode and confirm it surfaces diffs without committing; run fix-speedrun against a small explicit list in a waiver-carrying project and confirm one commit per task + the end page; plant a =VERIFY=-status task and a data-loss task and confirm both are skipped/refused; confirm the JSONL grows one record per task; run synthesis and confirm a KB node lands (personal project) or is refused (work project). Rollout is the Phase 1-6 sequence, each leaving the tree working; the file-only default makes early phases safe to ship before the commit and paging phases land.
+Verification is by invocation against a project's real =todo.org=: run the loop caller in file-only mode and confirm it surfaces diffs without committing; run the speedrun against a small explicit list in a waiver-carrying project and confirm the pre-flight Q&A fires, "skip this" drops a task, an answer is recorded and used, then one commit per logical change + the end page; plant a =VERIFY=-status task, a data-loss task, an already-satisfied task, and a large-but-=:solo:= task and confirm the first three are skipped/refused while the large one runs and decomposes; confirm the JSONL grows one record per task; run synthesis and confirm a KB node lands (personal project) or is refused (work project). Rollout is the Phase 0-6 sequence, each leaving the tree working; the file-only default makes early phases safe to ship before the commit and paging phases land.
* References / Appendix
- [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal (inbox-zero stopgap)]] and [[file:../../working/inbox-zero-phase-e/sender-note.org][its sender note with the 5 open questions]].
-- [[file:2026-06-15-fix-speedrun-workflow-proposal.org][fix-speedrun proposal]].
+- [[file:2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] (file retains its original on-disk name pending a rename pass).
- [[file:../../.ai/workflows/inbox-zero.org][inbox-zero.org (canonical, A-D)]] — the routing workflow this feature decouples from.
- =~/code/rulesets/claude-rules/knowledge-base.md= — the org-roam write contract the synthesis step follows.
@@ -327,3 +376,9 @@ Verification is by invocation against a project's real =todo.org=: run the loop
- What: initial draft reconciling the Phase E and fix-speedrun proposals into one work-the-backlog.org feature, plus the effectiveness-measurement instrumentation.
- Why: two overlapping proposals arrived within a day; building them separately would duplicate the execution loop and let it drift. Craig also asked explicitly for measurement + org-roam synthesis.
- Artifacts: this spec; the two source proposals under docs/design/ and working/inbox-zero-phase-e/.
+** 2026-06-28 Sun — revision (Craig)
+- What: removed the task-size gate (size no longer defers; large tasks decompose into per-commit chunks); recast the act-vs-file rule as a crisp four-item defer checklist keyed on test-writability; added crisp =:solo:= / =:quick:= definitions destined for =todo-format.md= and made their assessment mandatory in task-review + task-audit; added the speedrun's pre-flight decision-gathering step (batch the quick questions up front, "skip this" drops a task, then run hands-off); renamed "fix speedrun" → "no-approvals speedrun" in prose. Status stays draft pending ratification of the revised decisions.
+- Why: the original criteria were adjectives, not checkable; the size gate forced Craig to stay at his desk for anything non-trivial, defeating the away-from-desk use case; and decision-needing tasks were over-deferred when many need only a quick upfront answer.
+** 2026-06-29 Mon — ratified
+- What: Craig ratified all eight revised decisions; Status → ready. Implementation-ready across Phase 0 (tag definitions + task-review/audit enforcement) through Phase 6 (synthesis).
+- Why: the crisp defer checklist and the pre-flight-Q&A design resolved the "criteria too soft" and "size shouldn't gate" concerns that held the spec in draft.