6 files changed, 542 insertions, 96 deletions
diff --git a/.ai/notes.org b/.ai/notes.org index 03347d2..0d1e20e 100644 --- a/.ai/notes.org +++ b/.ai/notes.org @@ -79,6 +79,6 @@ Format: Markers maintained by workflows to record when they last ran. Read by other workflows that gate their behavior on freshness. :LAST_AUDIT: 2026-06-28 -:LAST_INBOX_PROCESS: 2026-06-28 (11 handoffs → simplification mode in /refactor, locating-craig.md rule, suspend.org workflow, commit-gate hardening + bundled-test deny hook, readability-audit template workflow, dot-stripped project names; bug-priority matrix made binding in todo-format.md + home/work handoffs) +:LAST_INBOX_PROCESS: 2026-06-29 (1 FYI handoff from home — confirmed it adopted the bug-priority matrix into its scheme header; no action needed, matrix already canonical here; acked) Format: one =:MARKER: YYYY-MM-DD= line per workflow. Workflows overwrite their own marker on completion. diff --git a/.ai/sessions/2026-06-28-15-57-inbox-proposals-shipped-and-task-audit.org b/.ai/sessions/2026-06-28-15-57-inbox-proposals-shipped-and-task-audit.org new file mode 100644 index 0000000..bf67b8e --- /dev/null +++ b/.ai/sessions/2026-06-28-15-57-inbox-proposals-shipped-and-task-audit.org @@ -0,0 +1,278 @@ +#+TITLE: Session Context +#+DATE: 2026-06-28 + +* Summary + +** Active Goal + +Startup → processed all 11 inbox handoffs (7 shared-asset proposals, skeptical- +reviewed via parallel subagents, walked A–G with Craig), then two follow-ups +(code-quality umbrella workflow + a task review), then a full task audit, then +wrap. Everything shipped is committed + pushed to origin/main. + +** Decisions + +- A simplification mode added to /refactor, and Craig chose to include it in the + default full scan (not own-mode-only). +- locating-craig.md is a standalone rule (not folded into daily-drivers.md). +- suspend.org built standalone-but-lean (not folded into flush), always-commit + step dropped. 
+- Commit gate: prose tightening AND a hard PreToolUse deny on bundled + test+commit (Craig picked the hook backstop over prose-only). +- Bug-priority matrix is BINDING for any project with a codebase (not opt-in); + mapping P1→[#A], P2→[#B], P3→[#C], P4→[#D]; home + work notified and work has + adopted it. +- Dot-stripped project names: alias approach (exact match still wins). +- readability-audit kept separate from /refactor (it feeds /refactor by filing + :refactor: tasks). +- Wrap done as NORMAL wrap, not teardown — the teardown feature is unvalidated + (its manual test was deferred this session), so no teardown sentinel dropped. + +** Data Collected / Findings + +- Task audit verdict: contrary to "many shipped," NONE of the 21 open tasks are + fully-done-but-open. The two closest (wrap-teardown, memories-sync) are + code-complete, gated only on Craig's manual validation. +- The git-commit-confirm hook's deny path: `** TODO` is a substring of + `*** TODO`, so Edit old_strings on demoted headings need a leading-newline + boundary to disambiguate. +- route_recommend matching: word-boundary literal match avoids home/homeowner + false positives; weak matching on common-word names (home, work) can + over-route — accepted v1 risk (labeled weak, reject-flow recovers). 
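The two matching findings above can be illustrated with a small sketch. This is not the code from route_recommend.py or the git-commit-confirm hook; `mentions_project` and `count_heading` are hypothetical helpers, written only to show why a word-boundary match rejects `homeowner` for `home`, and why a leading newline separates `** TODO` from `*** TODO`.

```python
import re


def mentions_project(text: str, name: str) -> bool:
    """Word-boundary literal match (hypothetical helper, not the real engine)."""
    # re.escape keeps names such as ".emacs.d" literal; the lookarounds stop
    # "home" from matching inside "homeowner".
    pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def count_heading(doc: str, heading: str) -> int:
    """Count occurrences of an org heading anchored at the start of a line."""
    # The leading newline keeps "** TODO" from matching inside "*** TODO".
    return ("\n" + doc).count("\n" + heading)
```

A plain substring count would see the second-level heading twice in a document that also holds a third-level heading with the same text; the newline-anchored count sees it once.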
+ +** Files Modified (all committed + pushed) + +- b621914 .claude/commands/refactor.md — simplification mode (later folded into full scan, 96dfa63 era edits) +- d4e9d7d claude-rules/locating-craig.md +- 797c426 suspend.org + readability-audit.org (+ INDEX, protocols) +- 92dfc35 verification.md + commits.md + hooks/{git-commit-confirm,_common}.py + tests (bundled-test deny) +- 9753d03 triggers.md + inbox-send.py (+ mirror, tests) — dot-stripped names +- 798ef02 claude-rules/todo-format.md + docs/design bug-priority bundle — binding matrix +- 6fb6797 notes.org inbox marker +- 96dfa63 code-quality.org umbrella workflow (+ INDEX) +- 5263cd6 todo.org — task review (3 restamped, generic-runtime → [#D]) +- 6be62ae .ai/scripts/route_recommend.py (+ mirror, tests) — wrap-up routing recommendation engine (spec Phases 1+3) +- 749566c todo.org — task audit reconciliation + flashcard tooling cluster + +** Next Steps — PICK UP HERE + +Phase D of the task audit was started but only 1 of 5 items handled. Remaining, +in suggested order: + +1. *wrap-teardown (task 42) manual validation — DEFERRED today.* Feature shipped + + pushed both sides; only the 5-test checklist remains (in the task body). + HAZARDS when running: test 1 tears down whatever session runs it (use a + SCRATCH ai-term session, never the live one); test 4 powers off (stub + `sudo shutdown now` → echo first); each "wrap" test ends its session so they + don't chain. On all-pass close task 42 DONE + CLOSED:; a failure becomes a bug. +2. *memories-sync VERIFY (task 214).* Implementation fully shipped; can't close + until (a) its manual-validation child runs and (b) ratio gets the one-time + roam.git clone + roam-sync timer (velox confirmed; ratio still outstanding, + can't verify from velox). See daily-drivers.md. +3. *Spec storage location + lifecycle convention (task 362).* Stalled on one + decision: filename-suffix vs org-keyword for lifecycle status. Needs Craig's call. +4. 
*"fix speedrun" autonomous-batch (task 383, DOING).* Stalled at spec-review; + needs Craig to ratify (or re-park) the 6 open spec decisions before building + work-the-backlog.org. +5. *Tooling-path warn-hook (task 434, [#D]).* Craig chose docs-only before; + greenlight building the warn-only hook, or leave [#D]. + +Also queued (not Phase D): +- Wrap-up routing feature (task 133, DOING): engine landed (6be62ae). Next + sub-tasks under it: :ROUTE_CANDIDATE: marker in inbox process mode, the + wrap-it-up router sub-step, the test surface, then manual e2e validation. All + call route_recommend.py. Spec: docs/design/wrapup-routing-spec.org. +- Flashcard tooling cluster (task parent created this session): apkg converter, + refutation (generic header-exemption per cj), multi-tag reconcile — build + together, same scripts; re-derive against the post-#+TITLE-fix canonical. + +KB: promoted 0 / consulted no + +* Session Log + +** 2026-06-28 Sun — Startup + inbox triage + +Ran startup (Phase A.0/A/B). Clean prior wrap (no session-context.org). .ai/ +synced from templates fine. Findings: 11 pending inbox handoffs, 3 top-level +tasks unreviewed >7 days, roam inbox 2 items, KB 51 nodes (none relevant). + +The 11 inbox items resolve to ~7 substantive proposals (some carry a cover +note): +- A: Simplification lens for the refactor skill (.emacs.d) +- B: new rule locating-craig.md (home) + cover note +- C: new workflow suspend.org (.emacs.d) + cover note +- D: bug-priority severity×frequency matrix (wttrin) + cover note +- E: harden commit gate to require green full suite (wttrin) +- F: generalize readability-audit.org into a template workflow (.emacs.d) + cover +- G: strip dots from project names .emacs.d→emacsd (.emacs.d roam item) + +All are shared-asset/convention changes → skeptical review + Craig approval, none +self-applies. Dispatching parallel read-only skeptical-review subagents for A-F; +G is a lightweight file-as-task. Surfacing dispositions to Craig next. 
+ +Six skeptical reviews returned (all "do it with named changes"): +- A: keep only new lenses 1/2/3/4/7; cross-ref 5(dead-code)/6(duplication)/8(dead-code) + instead of re-specifying; own mode, NOT in default full scan; state /refactor↔/simplify + boundary. Lands in .claude/commands/refactor.md (it's a slash command, no refactor/ dir). +- B: standalone claude-rules/locating-craig.md. Changes: add whereami-fails fallback + (never fabricate), positive gate (velox→run else ask), drop -lts secondary tell, + add public-artifact-scope line (geolocated address is personal). whereami confirmed + real at ~/.local/bin/whereami; host is velox. +- C: REAL DESIGN FORK — substantial overlap with the flush skill (suspend ≈ flush's + pre-clear checkpoint minus /clear, plus a commit, reframed for departure). Reviewer + prefers folding into flush as a "checkpoint-and-stop" variant over a standalone + workflow. Also: drop the always-commit-working-files step (.emacs.d-ism; blind commits + fight wrap-it-up's leftover discipline). Startup's resume path already exists. + Companion: INDEX + protocols triggers if kept standalone. +- D: fold into todo-format.md as an OPTIONAL opt-in scheme; drop the global + letter-mapping mandate (present as template); keep security/privacy carve-out; mark + release-vehicle column illustrative. Defer the non-coding companion matrix as a + separate task. +- E: only the new mechanic is load-bearing (run suite as own command, gate on + failures==0, never bundle `make test; git commit`). Two-line edits to verification.md + #1 and commits.md #5; exact wording provided. Asks #2/#4 already covered. +- F: generalize Elisp specifics; drop concrete ;;; markers (keep principle); add + graceful degradation; add cross-links to /refactor + /simplify + A. Coherent division + of labor confirmed (readability = prose/naming clarity; /refactor = structural metrics; + /simplify + A = behavior-preserving logic reduction). Lands as a template workflow. 
+- G: file-as-task (strip dots from project names; touches triggers.md + inbox-send). + +*** Item A — APPLIED (not yet committed) +Craig approved option 1, then changed his mind: simplification IS part of the +default full scan. Added Mode: +Simplification to .claude/commands/refactor.md (frontmatter Six→Seven modes, +argument-table row, new section after Mode: Dead Code) with the 5 new lenses, +cross-refs to Dead Code (twin branches, plain deletion) + Duplication (repeated +literals), verify-all-call-sites rule, and a /simplify boundary note. Harness +picked up the skill change live mid-session. Confirmed for Craig: Rename mode +executes renames but doesn't flag bad names; no organization scan exists — both +gaps are proposal F's territory. + +*** Item B — APPLIED (not yet committed) +Craig approved. Wrote claude-rules/locating-craig.md (standalone) with the four +review changes: whereami-fails fallback (never fabricate), positive host gate +(velox→run, any other host→ask), dropped the -lts secondary tell, added a +keep-out-of-shared-artifacts section. make install linked it into +~/.claude/rules/locating-craig.md. + +*** Item C — APPLIED (not yet committed) +Craig picked option 1 (standalone, lean). Wrote +claude-templates/.ai/workflows/suspend.org with the review changes: drop the +always-commit step (note uncommitted work, leave tree as-is; project-opt-in +always-commit set only), cross-refs to flush + wrap-it-up, states it's the +capture half (startup is the resume half), flags "I need to go" breadth. +Registered in INDEX.org (Session lifecycle, after wrap-it-up) + protocols.org +trigger section. sync-check --fix synced canonical→mirror; re-verified exit 0, +suspend.org mirror matches. + +*** Item D — REVERTED then RE-APPLIED (binding), not yet committed +First applied an opt-in version; Craig reverted it ("do it differently"). His +intent: the matrix is BINDING, not opt-in — any project with a codebase (incl. 
+home + work, which have one despite being non-code) must prioritize its codebase +bugs by the matrix. Re-applied to claude-rules/todo-format.md as a mandatory +subsection. Mapping per Craig (2a): P1→[#A], P2→[#B], P3→[#C], P4→[#D] (fixed, +not a per-project knob). Bands defined per codebase; matrix structure + mapping +fixed. Severity-alone carve-out kept. Sent adoption handoffs to home + work +(inbox-send, 2026-06-28-1212). Non-coding companion matrix dropped — scope is +codebase bugs (home/work codebases covered). + +*** Item E — APPLIED (not yet committed) +Craig picked option 1 (prose tightening + bundling-detection hard gate in the +PreToolUse hook); asked first whether a hard gate existed — it didn't (githooks +pre-commit only runs sync-check; git-commit-confirm.py only scanned attribution). +Applied: verification.md "Before Committing" #1 and commits.md #5 rewritten to +"run the full suite as its own command, gate on zero failures, never bundle the +run with the commit." Added detect_bundled_test_run() + respond_deny() to the +hook (hooks/git-commit-confirm.py + hooks/_common.py): denies a test runner +chained into git commit via any ungated connector (;, &, |, ||, newline, or a +pipe that masks exit), allows the gated && form, matches the runner only in the +prefix before git commit so a runner name in the message doesn't trip it. TDD: +13 new tests red→green; full make test exit 0; end-to-end smoke test confirms +deny on bundled / pass on gated+plain. + +*** Item F — APPLIED (not yet committed) +Craig asked "should this be part of refactoring?" — concluded separate-but-linked +(it's a multi-phase workflow that FILES structural work as :refactor: tasks, i.e. +feeds /refactor rather than being a mode of it; /refactor is structure-only +scan-and-apply). Craig approved option 1. 
Wrote +claude-templates/.ai/workflows/readability-audit.org (generalized from .emacs.d's +Elisp draft: header convention / public-private naming / doc-linters all +"the project's X if it has one"; dropped concrete ;;; markers, kept the +mechanical-applier principle; added graceful degradation for no-suite/no-header/ +no-linter; added the pipeline cross-links to /refactor + /simplify). INDEX entry +under new "Code quality" section. sync-check exit 0, mirror matches. +Told Craig the run sequence: /refactor (incl. simplification) + readability-audit += existing-code sweep; /simplify = in-flight-diff cleanup. Offered (not built) a +code-quality umbrella workflow to chain them. + +*** Item G — APPLIED (not yet committed) +Craig picked option 2 (do it now) + alias approach. Implemented dot-stripped +project-name resolution: inbox-send.py gained display_name() (basename with dots +stripped), find_target() falls back to a dot-stripped alias after exact match +(exact wins), print_project_list shows the stripped name. triggers.md launch +resolution gained the dot-stripped match rule. TDD: 3 new alias tests red→green +(incl. exact-wins-over-alias), 26 inbox-send tests pass; sync-check exit 0; full +make test exit 0. .emacs.d→emacsd, .dotfiles→dotfiles now resolve in both ai +launch and inbox-send. + +** Walk complete; inbox close-out done — commits pending +A,B,C,D,E,F,G all applied + verified, uncommitted. Inbox cleared (0 pending): +bug-priority proposal + cover preserved to docs/design/2026-06-27-*; 9 other +handoffs deleted (content in canonical files). :LAST_INBOX_PROCESS: stamped +2026-06-28. Replies sent: emacsd (4 items, via the new alias), home (locating-craig), +work (bug-matrix FYI); plus binding-adoption handoffs to home + work. 
+Commits DONE — 6 landed on main (publish flow: /review-code over staged diff = +Approve; /voice personal over all 6 messages; Craig approved all): + b621914 feat(refactor): add simplification scan mode (A) + d4e9d7d feat(rules): add locating-craig rule (B) + 797c426 feat(workflows): add suspend and readability-audit workflows (C+F) + 92dfc35 feat(hooks): block bundled test+commit, require full suite before commit (E) + 9753d03 feat(inbox-send): resolve dot-stripped project names (G) + 798ef02 feat(todo-format): make the bug-priority matrix binding for codebases (D) +PUSHED to origin/main (ecd33e0..798ef02); in sync (0/0). Pre-push reconcile +confirmed ahead-only. Working tree: only .ai/notes.org marker + +.ai/session-context.org (both for wrap-up). + +** Post-push follow-ups (Craig: "do both") +- Task review: 3 stale tasks (reviewed 2026-06-15) re-stamped 2026-06-28; generic + agent-runtime spec re-graded [#C]→[#D] (speculative large arc, not committed); + memories-sync VERIFY + token-rotation helper kept. Staleness now 0. todo.org + uncommitted. +- Umbrella workflow: created claude-templates/.ai/workflows/code-quality.org — one + trigger sequencing /refactor → readability-audit over a scope, surfaces the + filed :refactor: backlog, documents the /simplify boundary. INDEX entry under + Code quality; sync-check exit 0. Uncommitted. +Both committed + pushed (publish flow: /review-code Approve, /voice personal on +the workflow body, Craig approved): + 96dfa63 feat(workflows): add code-quality sweep workflow + 5263cd6 chore(todo): task review — restamp stale tasks, downgrade generic-runtime to [#D] +origin/main in sync (0/0). Staleness nudge cleared (0). +Then committed the notes.org inbox marker (6fb6797 chore) to clean the tree; +working tree now only .ai/session-context.org (live anchor). 1 ahead of origin +(the marker commit, unpushed). [Pushed 6fb6797.] 
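The bundled-test deny described under Item E (commit 92dfc35) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the hook's actual `detect_bundled_test_run()`: the function name `is_bundled_test_commit` and the `TEST_RUNNERS` list are hypothetical, and real shell parsing (quoting, subshells) is deliberately ignored.

```python
import re

# Hypothetical runner list; the real hook's list is not shown in this log.
TEST_RUNNERS = ("make test", "pytest")


def is_bundled_test_commit(command: str) -> bool:
    """Return True when a test runner is chained into git commit ungated."""
    commit = re.search(r"\bgit\s+commit\b", command)
    if commit is None:
        return False
    # Only look before `git commit`, so a runner name in the message is ignored.
    prefix = command[: commit.start()]
    start = max(prefix.rfind(runner) for runner in TEST_RUNNERS)
    if start < 0:
        return False
    # Drop the gated `&&` form; any connector left over is ungated.
    between = prefix[start:].replace("&&", "")
    return re.search(r"[;&|\n]", between) is not None
```

Under these assumptions, `make test; git commit -m x` and `make test | tee log && git commit -m x` are flagged, while `make test && git commit -m x` and a commit whose message merely mentions `make test` pass.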
+ +** Next work: wrap-up routing feature (Craig: "1 then 2") +*** 1 — Recommendation engine (spec Phase 1+3) — BUILT, tested +Added .ai/scripts/route_recommend.py (canonical+mirror): pure recommend(item, +projects)→(destination, confidence) — strong/weak/none, word-boundary literal +match, dot-stripped alias aware, top-tier tie→weak deterministic, empty→none. +CLI (--item/--exclude) reuses inbox-send discover_projects via importlib. 13 +tests green, full make test exit 0, mirror synced. Sub-task in todo.org rewritten +to dated entry. UNCOMMITTED — committing via publish flow next. +*** 2 — wrap-teardown manual validation — DEFERRED (Craig redirected) +Engine committed + pushed (6be62ae). Then Craig redirected: "many tasks in +todo.org were shipped — let's do a full task audit." Pivoted to task-audit. + +** TASK AUDIT (Craig: many tasks shipped) +Phase A: 21 open tasks (lines 42-1134 in Open Work). Phase B: dispatched 4 +parallel read-only reconciliation subagents over batches, each checking tasks +vs git log + repo tree + sessions, returning CURRENT/DONE/STALE/NEEDS-USER. +Verdict: contrary to "many shipped," NONE are fully-done-but-open. Most CURRENT +(backlog). The 2 closest (wrap-teardown 42, memories-sync VERIFY 214) are +code-complete, gated only on Craig's manual validation. +Phase C autonomous updates applied: task 186 (folded cj generic-header redirect, +superseded 2-option fix), task 203 (folded cj "document as local-only"; :bug:→ +:chore:, reframed as docs task), task 428 (precondition-landed note + LAST_REVIEWED). +Phase E: :LAST_AUDIT: stamped 2026-06-28. Phase F: skip task-review chain (ran +today). NEEDS-USER + clusters surfaced to Craig next. todo.org + notes.org +uncommitted (audit edits). 
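The dot-stripped alias resolution from Item G, which route_recommend.py is described as being aware of, can be sketched as below. The names `display_name` and `find_target` come from the log, but their signatures and bodies here are assumptions for illustration only; the real inbox-send.py may differ.

```python
import os
from typing import Optional, Sequence


def display_name(path: str) -> str:
    """Basename with dots stripped, e.g. /home/u/.emacs.d -> emacsd."""
    return os.path.basename(path.rstrip("/")).replace(".", "")


def find_target(query: str, project_paths: Sequence[str]) -> Optional[str]:
    """Resolve a project by exact basename first, then by dot-stripped alias."""
    for path in project_paths:
        if os.path.basename(path.rstrip("/")) == query:
            return path  # an exact match always wins over an alias
    for path in project_paths:
        if display_name(path) == query:
            return path
    return None
```

Checking exact names in a first pass, before any alias is considered, is what keeps a real project named `emacsd` from being shadowed by the alias of `.emacs.d`.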
diff --git a/.ai/sessions/2026-06-29-03-56-spec-lifecycle-decision-and-speedrun-ratified.org b/.ai/sessions/2026-06-29-03-56-spec-lifecycle-decision-and-speedrun-ratified.org new file mode 100644 index 0000000..2a61f75 --- /dev/null +++ b/.ai/sessions/2026-06-29-03-56-spec-lifecycle-decision-and-speedrun-ratified.org @@ -0,0 +1,107 @@ +#+TITLE: Session Context +#+DATE: 2026-06-28 + +* Summary + +** Active Goal +Handle todo.org items 4 (spec storage location + lifecycle convention) and 5 +(speedrun / autonomous-batch) — both decision-gated — then wrap. + +** Decisions +- Item 4 status mechanism: org-keyword authoritative + Status field in Metadata, + drop the filename suffix (Craig chose option 1 over his earlier filename-suffix + lean, 2026-06-28). +- Item 4 scope addition: retrofit existing docs across ALL projects, not just + document the convention going forward (Craig, 2026-06-28). +- Speedrun naming: the workflow is "speedrun" / "no approvals speedrun" (not + "fix speedrun"); threaded through task heading, body, and the spec prose. +- Item 5 criteria recast (Craig found them too soft): removed the task-size gate + entirely (large tasks decompose into per-commit chunks; size gating defeated the + away-from-desk use case); replaced act-vs-file adjectives with a crisp 4-item + defer checklist keyed on test-writability; eligibility simplified to status TODO + AND :solo:. +- :solo: / :quick: get hard definitions in todo-format.md, applied at creation and + enforced as a mandatory step in task-review + task-audit. +- Added the speedrun pre-flight decision-gathering step: batch all quick decisions + up front, "skip this" drops a task, then run hands-off. Unattended loop has no + kickoff human, so it still defers decision-needing tasks. +- Craig ratified all 8 revised decisions; spec Status → ready. + +** Data Collected / Findings +- No abandoned work from any shutdown: clean wrap last session (no crash anchor, + clean tree, last commit was the wrap archive at 15:59). 
Craig's "machine shut + down" recollection didn't match the record; deferred work (wrap-teardown + validation) was the closest match. +- The autonomous-batch spec already existed and reconciled the old fix-speedrun + + inbox-zero Phase E proposals; it had 6 drafted decisions awaiting ratification. + The revision grew it to 8 (added tag-definitions/enforcement + pre-flight Q&A). + +** Files Modified +- docs/design/2026-06-16-autonomous-batch-execution-spec.org — major revision + (size gate removed, defer checklist, tag definitions, pre-flight Q&A, naming), + then ratified: Status ready, cookie [8/8], all 8 decisions DONE, history entries. +- todo.org — item 4 (:373) decision + retrofit requirement recorded; item 5 (:394) + heading/body renamed to "No-approvals speedrun"; the spec-review VERIFY rewritten + to a dated event-log entry. + +** Next Steps +- Item 5 build (when prioritized): Phase 0 (todo-format.md :solo:/:quick: definitions + + task-review/task-audit enforcement) through Phase 6 (synthesis). Parent task + stays DOING. +- Item 4 build (when prioritized): spec-create via the recorded decisions; ship the + retrofit helper + startup nudge; pilot on rulesets' own docs/design first. +- Naming cleanup: the proposal-doc filenames still carry "fix-speedrun"; a rename + pass with link updates is deferred. +- Other open carryover from startup: wrap-teardown manual validation (task 42), + memories-sync VERIFY (needs ratio), plus the remaining what's-next candidates. + +KB: promoted 0 / consulted no + +* Session Log + +** 2026-06-28 — Startup + what's-next triage +Ran full startup: clean wrap last session (no crash anchor), repos current, +inbox empty, no reminders/pending decisions. Roam inbox had 4 items, all for +other projects (.emacs.d, emacs-wttrin) — none for rulesets. Surfaced 5 +what's-next candidates; Craig picked items 4 and 5 to handle, then wrap. 
+ +** 2026-06-28 — Item 4 decision recorded +Craig chose option 1 for the spec lifecycle status mechanism (org-keyword +authoritative + Status field, drop filename suffix; adopt location split + +org-id links). He added a requirement: existing spec/design files in ALL +projects must be sorted into docs/specs/ vs docs/design/ — a one-time per-project +migration template sync can't do, so the spec must design the reach mechanism +(proposed: synced classify-and-move helper under .ai/scripts/ + startup nudge +gated on a :LAST_SPEC_SORT: marker). Recorded both into todo.org:373. + +** 2026-06-28 — Item 5 (speedrun) spec revised per Craig's direction +Craig found the eligibility criteria too soft. Revised the autonomous-batch spec +(docs/design/2026-06-16-autonomous-batch-execution-spec.org) substantially: +- Removed the task-size gate entirely (Craig: size shouldn't matter; large tasks + decompose into per-commit chunks; speedrun is the away-from-desk mode and size + gating forced him to stay at the desk). I agreed; only caveat is the unattended + loop's cost ceiling, handled by the vNext token budget. +- Recast act-vs-file as a crisp 4-item defer checklist keyed on test-writability + ("can I write the failing test from the task text without inventing a + requirement"), an enumerated data-loss operation list, already-satisfied, and + design-deliberation. Replaces the old adjectives. +- Eligibility simplified to status TODO AND :solo: (size gone, so :quick: drops to + an effort hint, not a gate). :solo:/:quick: get hard definitions in + todo-format.md, applied at creation + enforced as a mandatory step in + task-review and task-audit (Craig's ask). +- Added the speedrun pre-flight decision-gathering step: gather → classify → order + → intro → batch-ask the quick decisions → "skip this" drops a task → run + hands-off. Makes "no approvals" = all approvals front-loaded. The unattended + loop has no kickoff human, so it still defers decision-needing tasks. 
+- Naming: "fix speedrun" → "no-approvals speedrun" in spec prose + todo.org:394 + heading/body. Proposal-doc filenames keep their on-disk names (rename pass is + separate). Spec Status stays draft pending ratification of the revised decisions. +Spec opened in emacs for Craig's review. Companion build edits still pending: +todo-format.md definitions + task-review/task-audit enforcement (Phase 0). + +** 2026-06-29 — Item 5 ratified +Craig ratified all 8 decisions. Spec Status → ready, cookie → [8/8], all 8 +decision headings DONE, ratification entry added to iteration history. The +*** VERIFY "Review the autonomous-batch execution spec" (todo.org) rewritten to a +dated event-log entry. Parent task stays DOING (build pending: Phase 0–6). +Items 4 and 5 both handled. Ready to wrap. diff --git a/docs/design/2026-06-16-autonomous-batch-execution-spec.org b/docs/design/2026-06-16-autonomous-batch-execution-spec.org index e2e0f90..84cefe3 100644 --- a/docs/design/2026-06-16-autonomous-batch-execution-spec.org +++ b/docs/design/2026-06-16-autonomous-batch-execution-spec.org @@ -4,7 +4,7 @@ #+TODO: TODO | DONE SUPERSEDED CANCELLED * Metadata -| Status | draft | +| Status | ready | |----------+--------------------------------------------------------------------| | Owner | Craig Jennings | |----------+--------------------------------------------------------------------| @@ -12,21 +12,21 @@ |----------+--------------------------------------------------------------------| | Date | 2026-06-16 | |----------+--------------------------------------------------------------------| -| Related | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:2026-06-15-fix-speedrun-workflow-proposal.org][fix-speedrun proposal]] | +| Related | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] | 
|----------+--------------------------------------------------------------------| * Summary -Two proposals arrived within a day of each other describing the same capability: have Claude work a batch of small, well-marked tasks autonomously, with a full quality bar per task and no per-step approval gate. The inbox-zero "Phase E" proposal drives it from a tag/priority query on a recurring loop; the "fix speedrun" proposal drives it from an explicit ordered list a human dictates in-session. This spec reconciles both into one feature: a single dedicated workflow, =work-the-backlog.org=, that holds the task-execution logic, with two thin callers feeding it. It also designs the instrumentation that measures whether the autonomy is actually paying off. +Two proposals arrived within a day of each other describing the same capability: have Claude work a batch of small, well-marked tasks autonomously, with a full quality bar per task and no per-step approval gate. The inbox-zero "Phase E" proposal drives it from a tag/priority query on a recurring loop; the "speedrun" proposal drives it from an explicit ordered list a human dictates in-session. This spec reconciles both into one feature: a single dedicated workflow, =work-the-backlog.org=, that holds the task-execution logic, with two thin callers feeding it. It also designs the instrumentation that measures whether the autonomy is actually paying off. * Problem / Context -Craig has a standing backlog of small, solo-doable fixes across several projects, already marked with a tag convention (=:next:=, =:quick:+:solo:=). Doing them by hand one at a time is the bottleneck — each is 30 minutes or less, but the context-switch and the per-commit approval ceremony dominate the actual work. He wants Claude to burn these down unattended: on a recurring loop for the routed inbox case, and on demand when he batches a named list and says "fix speedrun, no approvals until done." 
+Craig has a standing backlog of small, solo-doable fixes across several projects, already marked with a tag convention (=:next:=, =:quick:+:solo:=). Doing them by hand one at a time is the bottleneck — the context-switch and the per-commit approval ceremony dominate the actual work. He wants Claude to burn these down unattended: on a recurring loop for the routed inbox case, and on demand when he batches a named list and says "speedrun, no approvals until done." The speedrun is the away-from-desk / working-on-something-else mode, so it must be able to take on larger tasks too — not only sub-30-minute ones — or it forces him to stay at the desk for anything non-trivial. Two separate proposals tried to answer this: - *Phase E* (in =inbox-zero.org=, edited in =.emacs.d= as a stopgap) bolted autonomous execution onto the inbox-zero workflow's on-demand and loop callers. The sender flagged the seam as the open question: coupling capture-routing with autonomous-implementation pollutes inbox-zero's three existing callers (startup, wrap-up, on-demand), two of which must never execute anything. -- *fix speedrun* (a =.emacs.d= theme-studio session that worked well) is the same execution loop driven by an explicit ordered task set, with end-of-set paging and always-push. +- *speedrun* (a =.emacs.d= theme-studio session that worked well) is the same execution loop driven by an explicit ordered task set, with end-of-set paging and always-push. They overlap almost entirely. The execution loop — eligibility gate, act-vs-file decision, per-task quality bar, bounded run — is identical. Only the *input* differs (tag query vs explicit list) and the *session mode* differs (loop default vs no-approvals + always-push + page). Building them as two features would duplicate the execution logic and let the two copies drift. The forces: keep inbox-zero's callers clean, share one execution loop, and make the autonomy safe enough to run unattended on a 30-minute timer without Craig watching. 
@@ -37,22 +37,24 @@ A second, explicit ask from Craig: instrument this so its effectiveness is measu ** Goals - One workflow, =work-the-backlog.org=, owns the task-execution loop. Both input shapes (tag query, explicit list) and both session modes feed it. - inbox-zero's three existing callers stay clean: the loop caller chains into =work-the-backlog= *after* routing; startup and wrap-up never touch it. -- "fix speedrun" is a thin named preset, not a second implementation: no-approvals session mode + always-push + end-of-set page, feeding an explicit ordered list. +- The *no-approvals speedrun* is a thin named preset, not a second implementation: autonomous-commit + always-push + end-of-set page, fed an explicit ordered list, with all approvals front-loaded into a single pre-flight step (below) so the run itself is uninterrupted. +- Eligibility is decided by *crisp, checkable criteria*, not adjectives: a mechanical tag/status gate (=:solo:= + status =TODO=), then a per-task defer checklist whose keystone is "can I write the failing test from the task text without inventing a requirement?" Task *size* is explicitly not a gate — a large task is decomposed into per-logical-commit chunks, not deferred. +- The autonomy tags (=:solo:=, =:quick:=) carry hard definitions in =todo-format.md= and are applied + enforced as a mandatory step in the task-review and task-audit workflows, so the run-time gate trusts the author's tag instead of re-deriving it. - Commit autonomy defaults to file-only (surface a diff, no auto-commit). A project opts into autonomous commit+push explicitly via its per-project waiver. -- Hard guardrails: refuse to speedrun any task needing a design decision or carrying data-loss risk without a checkpoint; file a =VERIFY= and move on rather than guess-implement an underspecified task; a per-run cap / kill switch beyond "one task per run." 
+- Hard guardrails: refuse any task carrying data-loss / irreversible / external-state risk without a checkpoint; gather any one-or-two quick decisions a task needs *up front* (speedrun) rather than guessing; file a =VERIFY= for anything underspecified or needing design deliberation; a per-run cap / kill switch beyond "one task per run." - A lightweight per-run metrics log plus a periodic synthesis step that writes org-roam KB articles summarizing the trend. ** Non-Goals -- *Not* a replacement for =/start-work=. Tasks needing deliberation, design, or an hour-plus stay with =/start-work= and its approval gates. This feature only touches the small, marked, solo set. +- *Not* a replacement for =/start-work=. Tasks needing deliberation or design stay with =/start-work= and its approval gates. This feature only touches the marked, solo set — regardless of size. - *Not* a new tag convention. It reads the project's own priority/tag scheme header; it never invents or hardcodes tags across projects. - *Not* an inbox-routing change. =inbox-zero.org= keeps its A-D phases. The Phase E text added in =.emacs.d= as a stopgap is *removed* and its logic moves here. - *Not* a multi-project orchestrator. One run works one project's backlog. Cross-project handoff stays with =inbox-send= and the paging reply. - *Not* a credential-handling or external-API feature. Tasks that touch secrets or external mutations are out of the eligible set by the guardrail. ** Scope tiers -- *v1:* =work-the-backlog.org=; the eligibility gate reading the project's scheme header; the act-vs-file decision with VERIFY-on-ambiguity; file-only commit default with per-project opt-in; the loop caller wiring and inbox-zero Phase E removal; the "fix speedrun" preset with end-of-set =notify --persist= page; the per-run metrics log (structured JSONL). 
+- *v1:* =work-the-backlog.org=; crisp =:solo:= / =:quick:= definitions in =todo-format.md= plus their mandatory application in task-review and task-audit; the eligibility gate (=:solo:= + status =TODO=, read against the project's scheme header); the act-vs-file *defer checklist* (test-writability keystone, enumerated data-loss list, already-satisfied, design-deliberation); the no-approvals speedrun's pre-flight decision-gathering step; file-only commit default with per-project opt-in; the loop caller wiring and inbox-zero Phase E removal; the speedrun preset with end-of-set =notify --persist= page; the per-run metrics log (structured JSONL). - *Out of scope:* a token-budget kill switch (cap is a task count in v1); cross-project batch runs; a dashboard or live UI over the metrics. -- *vNext (log to todo.org):* the periodic org-roam synthesis step if it doesn't make v1; a token/cost budget alongside the task-count cap; auto-detection of "human corrected my autonomous commit" from the next session's diff. +- *vNext (log to todo.org):* the periodic org-roam synthesis step if it doesn't make v1; a token/cost budget alongside the task-count cap (more pressing now that task size is uncapped — a single large task can run long in the unattended loop); auto-detection of "human corrected my autonomous commit" from the next session's diff. * Design @@ -63,8 +65,8 @@ The architecture is one execution workflow with two callers and one preset, plus #+begin_example inbox-zero loop caller ──(after Phase D routing)──┐ ├──▶ work-the-backlog.org ──▶ metrics log (JSONL) - "fix speedrun" preset ──(explicit ordered list)───┘ │ - = no-approvals + always-push + end-page ▼ + no-approvals speedrun ──(explicit ordered list)──┘ │ + = pre-flight Q&A + autonomous-commit + push + page ▼ periodic synthesis ──▶ org-roam KB articles #+end_example @@ -76,20 +78,20 @@ This is the seam the Phase E sender asked for: separating capture-routing (inbox A caller hands =work-the-backlog= three things: -1. 
*A task set* — either an explicit ordered list of task headings (fix speedrun), or the result of a tag/priority query against =todo.org= (the loop). The workflow does not care which; it receives an ordered list of candidate tasks. +1. *A task set* — either an explicit ordered list of task headings (speedrun), or the result of a tag/priority query against =todo.org= (the loop). The workflow does not care which; it receives an ordered list of candidate tasks. 2. *A session mode* — =file-only= (default) or =autonomous-commit= (requires the project's per-project waiver), and a paging flag. 3. *A run cap* — the maximum number of tasks to complete this run. -It returns: per-task outcome (implemented+committed / implemented+diff-surfaced / deferred-VERIFY / deferred-too-large / skipped-ineligible), and a metrics record per task. +It returns: per-task outcome (implemented+committed / implemented+diff-surfaced / deferred-VERIFY / dropped-by-craig / skipped-ineligible), and a metrics record per task. ** The execution loop (implementer's view) For the task set, in order, until the run cap is hit: 1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task. -2. *Scope read* of the relevant code. Cheap; just enough to make the act-vs-file call. -3. *Act-vs-file decision* (below). File → record the deferral reason, next task. -4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important, then close the task per =todo-format.md=. +2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist. +3. *Defer checklist* (below). Any hit → record the deferral reason (or, under the speedrun preset, route the quick-question gap to the pre-flight Q&A), next task. +4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important, then close the task per =todo-format.md=. 
Decompose into as many logical commits as the change needs — size is not capped. 5. *Commit autonomy branch:* - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=. - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=. @@ -98,41 +100,63 @@ For the task set, in order, until the run cap is hit: After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary. -** Eligibility gate +** Eligibility gate (mechanical — no judgment) -A task is autonomous-safe when *all* hold: +A task is autonomous-safe when *both* hold. This layer is a lookup, not a judgment; all the judgment lives in the defer checklist below. -1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= is the "awaiting Craig's manual confirmation" marker; auto-implementing one defeats the manual check it represents. -2. *Tagged per the project's autonomous-safe set* — resolved by reading the project's priority/tag scheme header at the top of its =todo.org=, not by hardcoding. The default reading is =:next:= OR both =:quick:= AND =:solo:=, but a project whose scheme declares a different autonomous-safe tag set overrides that. -3. *Solo-doable* — no input or undecided judgment call from Craig. -4. *Roughly 30 minutes or less* of focused work. +1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= is the "awaiting Craig's manual confirmation" marker; auto-implementing one defeats the manual check it represents. The do-not-implement set is safe-by-omission: anything not plainly =TODO= (plus any project-declared "hold" marker) is out. +2. *Tagged =:solo:=* — the autonomy tag, resolved against the project's priority/tag scheme header (not hardcoded). 
=:solo:= carries a hard definition (see Tag definitions, below): the task is completable without Craig's involvement beyond at most one or two quick decisions answerable up front, with no design deliberation. A project whose scheme declares a different autonomous-safe tag set overrides the default. Priority / =:next:= drive *ordering* within the eligible set, not eligibility. -** Act-vs-file decision (the guardrail) +Task *size* is deliberately absent from this gate. The old "≤ ~30 minutes / one logical commit" criterion is removed: a large but well-specified, decision-free task is in scope and is decomposed into per-logical-commit chunks during implementation. Size never sends a task to =/start-work=; only *deliberation* or *risk* does (the checklist below). This is what makes the speedrun usable as an away-from-desk mode rather than a sub-30-minute-only mode. -After the scope read, for each eligible candidate: +*** Tag definitions (land in =todo-format.md=, enforced in task-review + task-audit) -- *Clear, bounded, solo, ≤ ~30 min* → implement. -- *Needs a design decision, Craig's input, or discussion* → do NOT implement. File a one-line note on the task naming the input it needs; surface it. -- *Carries data-loss risk without a checkpoint* (deletes data, rewrites persisted state, touches external/shared state irreversibly) → do NOT implement. File a =VERIFY= explaining the risk; surface it. -- *Underspecified or already-satisfied* → do NOT guess-implement. File a =VERIFY= noting why (the fix-speedrun "raise max spans to 5 — every cap was already 8" case) and move on. -- *An hour or more* → do NOT implement. File and surface as a =/start-work= candidate. +- *=:solo:= — autonomy.* The task can be completed without Craig's involvement, except for at most one or two quick decisions that can be stated and answered before the run starts. No open design question, no "weigh these approaches," no waiting on Craig mid-task. This is the eligibility tag. 
+- *=:quick:= — effort hint only.* A small, fast task. Informational for batching and estimating a run's duration; *not* an eligibility gate (size no longer gates). -When unsure which side a task falls on, file rather than implement. A wrong auto-implement costs more than a deferred task — it costs a revert *and* the human correction in the next session that the metrics are designed to catch. +Both tags are applied at task creation and *re-checked as a mandatory step* in the task-review and task-audit workflows, so the run-time gate can trust the author's tag rather than re-derive autonomy and effort from the task body. A task-review or task-audit that skips the =:solo:= / =:quick:= assessment is incomplete. -** Session modes and the "fix speedrun" preset +** Act-vs-file decision (the defer checklist) + +After the scope read, run each eligible candidate through the checklist below. Each item is a concrete, answerable question, not an adjective. *Any* hit — or any "unsure" — sends the task to defer (or, for a quick-decision gap under the speedrun preset, to the pre-flight Q&A). Only a task that clears every item is implemented. + +1. *Test-writability (the keystone).* Can I write the failing test from the task text — plus any decisions gathered up front — without inventing a requirement? *No / unsure* → underspecified. Under the speedrun preset, if the gap is one or two quick answerable questions, route it to the pre-flight Q&A; otherwise file a =VERIFY= noting what's missing. Under the unattended loop, file the =VERIFY= (no one to ask). This replaces the old "clear / bounded / underspecified" adjectives with an action that fails loudly: if the red test isn't writable, the task isn't ready. +2. 
*Data-loss / irreversible / external operation.* Does implementing it require any of: =rm= of non-scratch data, =git reset --hard= / force-push, =DROP= / =DELETE= / =TRUNCATE=, file truncate/overwrite of persisted content, a schema or data migration, any external or shared-state mutation, any credential touch? *Yes* → do NOT implement; file a =VERIFY= naming the risk. This is the hard safety gate; an upfront answer never overrides it without an explicit checkpoint. Replaces the vague "data-loss risk" with an enumerated, greppable set. +3. *Already-satisfied.* Does the scope read show the desired end-state already holds? *Yes* → file a =VERIFY= noting it (the "raise max spans to 5 — every cap was already 8" case) and move on. Don't make a no-op change. +4. *Design deliberation.* Does the task carry an unresolved design question, a "weigh these approaches" with real tradeoffs, or a TBD that isn't a quick factual answer? *Yes* → under the speedrun preset, if it collapses to one or two quick questions, route to pre-flight Q&A; otherwise file and surface as a =/start-work= candidate. Under the loop, file. The discriminator is now *quick-answerable question* vs *deliberation* — not task size. + +A task that clears 1–4 is implemented under the project's commit discipline, decomposed into as many logical commits as the change needs. When genuinely unsure which side a task falls on, defer — a wrong auto-implement costs a revert *and* the next-session correction the metrics are designed to catch. + +** Pre-flight decision gathering (the no-approvals speedrun's only interaction) + +The speedrun preset front-loads every approval into one step before the run, so the run itself is uninterrupted — that is what "no approvals" means. It is *not* "no input ever"; it is "all input first, then hands-off." + +When Craig kicks off a speedrun over an explicit list: + +1. *Gather* the named task set. +2. 
*Scope-read and classify* each task against the eligibility gate + defer checklist: ready (clears the checklist), needs-quick-decisions (one or two upfront-answerable questions — checklist item 1 or 4), or drop (data-loss / irreversible, or design deliberation that isn't a quick question). +3. *Order* the list (priority, then the author's ordering / =:next:=). +4. *Intro the work* — present the ordered plan: what will run, what was dropped and why, and the batched questions for the needs-quick-decisions tasks. +5. *Craig answers each question, or says "skip this"* → a skipped task is removed from the run (recorded =dropped-by-craig=); an answered task has the answer recorded so implementation works from the decision, not a guess. +6. *Run the finalized list autonomously* — no further approvals until done. +7. *End-of-set page* with completed + remaining + skipped. + +The unattended *loop* caller has no human at kickoff, so it cannot gather decisions: there, a needs-quick-decisions task simply defers (files its note) like any other checklist hit. The pre-flight Q&A is a speedrun-preset capability, not a loop one. + +** Session modes and the no-approvals speedrun preset Two orthogonal session-mode dimensions feed the loop: - *Commit autonomy:* =file-only= (default) or =autonomous-commit=. =autonomous-commit= is honored only when the project carries the per-project waiver (=.emacs.d= and =rulesets= have it; most projects do not). Absent the waiver, a request for =autonomous-commit= degrades to =file-only= and says so. - *Paging:* on or off. End-of-set only. -"fix speedrun" is the named preset = =autonomous-commit= + always-push + paging-on, fed an *explicit ordered list*. It is not a separate code path; it is a label for that combination of mode flags plus the explicit-list input. The loop caller, by contrast, runs =file-only= (unless the project has the waiver and opts the loop into commits) with paging off, fed the *tag query*. 
+The *no-approvals speedrun* is the named preset = =autonomous-commit= + always-push + paging-on, fed an *explicit ordered list*, run after the pre-flight decision-gathering step above. It is not a separate code path; it is a label for that combination of mode flags plus the explicit-list input, with the pre-flight Q&A as its only interactive moment. The loop caller, by contrast, runs =file-only= (unless the project has the waiver and opts the loop into commits) with paging off, fed the *tag query*, with no pre-flight step. ** Bounding the run and the kill switch -Default cap: one task per run for the loop caller — implement the highest-priority eligible candidate (=[#A]= before =[#B]= before =[#C]=), record, then stop and let the next tick continue. The fix-speedrun preset works the whole explicit list in order (the human bounded it by naming it), still one commit per task. +Default cap: one task per run for the loop caller — implement the highest-priority eligible candidate (=[#A]= before =[#B]= before =[#C]=), record, then stop and let the next tick continue. The speedrun preset works the whole explicit list in order (the human bounded it by naming it), still one commit per logical change. -The kill switch is a hard per-run task cap passed by the caller, independent of "one per run": even fix-speedrun stops at the cap and pages with the remainder listed. A loop that fires every 30 minutes and commits unattended needs a ceiling that a runaway can't exceed. +The kill switch is a hard per-run task cap passed by the caller, independent of "one per run": even the speedrun stops at the cap and pages with the remainder listed. A loop that fires every 30 minutes and commits unattended needs a ceiling that a runaway can't exceed. With task size now uncapped, the count cap no longer bounds *cost* — a single large task can run long — so a token/cost budget is the most pressing vNext addition. 
** End-of-set paging @@ -149,69 +173,89 @@ notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>" ** Fold execution into inbox-zero (the Phase E stopgap shape) - Good, because it's the smallest diff — the loop caller already runs inbox-zero, so execution is "one more phase." - Bad, because it couples capture-routing with implementation. inbox-zero has three callers; startup and wrap-up must never execute. A Phase E inside inbox-zero forces both to carry a "skip Phase E" caveat and risks a future caller running it by accident. -- Neutral, because the eligibility-gate and act-vs-file text is identical either way — only its *home* differs. +- Neutral, because the eligibility-gate and defer-checklist text is identical either way — only its *home* differs. -** Two separate features (keep Phase E and fix-speedrun distinct) +** Two separate features (keep Phase E and speedrun distinct) - Good, because each proposal ships as written with no reconciliation work. - Bad, because the execution loop is duplicated in two places and will drift; a guardrail tightened in one won't reach the other. Two ways to do autonomous execution is two things to audit. - Neutral, because the input and session-mode differences are real — but they're thin caller-level differences, not a reason to fork the engine. +** Keep the task-size gate (defer anything over ~30 minutes) +- Good, because it bounds per-task cost and blast radius with a single number. +- Bad, because it defeats the away-from-desk use case — anything non-trivial bounces back to Craig, so he can't actually leave. Size correlates poorly with risk; a large mechanical refactor is safer than a tiny change to persisted state. +- Neutral, because the things size was a proxy for (risk, cost) are covered directly — risk by the data-loss checklist, cost by the run cap (and the vNext token budget). The defer checklist's deliberation item, not size, is what routes genuine =/start-work= tasks out. 
+ ** Autonomous-commit as the default - Good, because it's faster end-to-end with no diff to review. - Bad, because most projects lack the per-project waiver, and an unattended loop committing to a project that never opted in is exactly the failure the file-only default prevents. The blast radius of a bad autonomous commit is a revert plus lost trust in the loop. - Neutral, because the projects that *do* want it (=.emacs.d=, =rulesets=) opt in explicitly, so the capability is available where it's wanted without being the default everywhere. -* Decisions [/] +* Decisions [8/8] + +** DONE Eligibility tag set and where it's read +- Owner / by-when: Craig / spec-review +- Context: Projects' priority/tag schemes vary, and the =todo-format.md= scheme header is the declared per-project source of truth. Task size is no longer a gate, so eligibility rests on the autonomy tag, not an effort cap. +- Decision: Eligibility = status =TODO= AND the =:solo:= autonomy tag, resolved against the project's scheme header (a project may declare a different autonomous-safe set). Priority / =:next:= drive ordering, not eligibility. =:quick:= is an effort hint, never a gate. +- Consequences: easier — one workflow works across projects with different vocab, and the gate is a pure lookup; harder — a project with no/malformed scheme header needs a fallback, and the default (=:solo:=) must be defined precisely enough that two projects agree. + +** DONE Crisp =:solo:= / =:quick:= definitions, enforced in task-review + task-audit +- Owner / by-when: Craig / spec-review +- Context: The run-time gate is only as crisp as the tags. Today =:quick:= / =:solo:= are listed in the scheme header with no hard definition, and nothing enforces that tasks get assessed for them. 
+- Decision: Define =:solo:= (completable without Craig beyond at most one-or-two upfront-answerable quick decisions; no design deliberation) and =:quick:= (small/fast effort hint only) in =todo-format.md=, and make assessing both a *mandatory step* in the task-review and task-audit workflows. A review/audit that skips the assessment is incomplete. +- Consequences: easier — authoring-time judgment by the human who knows the answer, and the run-time gate trusts the tag; harder — task-review and task-audit grow a required step, and existing untagged tasks need a back-fill pass. -** TODO Where the eligibility gate reads its tag set +** DONE The do-not-auto-implement marker set - Owner / by-when: Craig / spec-review -- Context: Phase E hardcoded =:next:= / =:quick:+:solo:=. Projects' priority/tag schemes vary, and the =todo-format.md= scheme header is the declared source of truth per project. -- Decision: We will read the project's =todo.org= priority/tag scheme header to resolve the autonomous-safe tag set, defaulting to =:next:= OR =:quick:+:solo:= when the header doesn't declare an explicit autonomous-safe set. -- Consequences: easier — one workflow works correctly across projects with different tag vocabularies; harder — a project with no scheme header (or a malformed one) needs a fallback, and the "default reading" has to be specified precisely enough that two projects agree on it. +- Context: =VERIFY= means "awaiting Craig's manual confirmation"; other projects may use markers differently. +- Decision: Do-not-implement = any status that is not =TODO=, plus any project-declared "hold" marker. Safe-by-omission: exclude anything not plainly =TODO=. +- Consequences: easier — portable, and manual-check tasks can't auto-run; harder — richer per-project overrides need marker semantics in the scheme header, which most lack, so the default must stay conservative. 
-** TODO The do-not-auto-implement marker set +** DONE Pre-flight decision gathering for the speedrun preset - Owner / by-when: Craig / spec-review -- Context: =VERIFY= means "awaiting Craig's manual confirmation" in =.emacs.d= and =rulesets=. Other projects may use =VERIFY= differently or not at all. The gate excludes =VERIFY=, =DOING=, =DONE=, =CANCELLED= by status, but the *marker semantics* are what matter. -- Decision: We will define the do-not-auto-implement set as: any status that is not =TODO=, plus any task carrying a project-declared "hold" marker. The canonical default treats =VERIFY= as do-not-implement; a project overrides only by declaring its marker semantics in its scheme header. -- Consequences: easier — the gate is portable and a project can't accidentally have its manual-check tasks auto-run; harder — requires the scheme header to carry marker semantics, which most don't yet, so the default has to be safe-by-omission (exclude anything not plainly =TODO=). +- Context: Forcing every decision-needing task to defer wastes the away-from-desk use case — many tasks need only one or two quick answers Craig could give at kickoff. The speedrun is interactive at its start but must be hands-off after. +- Decision: The speedrun preset gathers + orders the set, intros the work, and batches all needed quick decisions into one pre-flight Q&A; Craig answers or says "skip this" (drops the task); the run then proceeds with zero further approvals. The unattended loop has no kickoff human, so it defers decision-needing tasks instead. +- Consequences: easier — "no approvals" becomes "all approvals first," which fits working-while-away, and larger / lightly-underspecified tasks become runnable; harder — the classifier must reliably split quick-question vs real-deliberation, and the recorded answers must reach the implementer so it works from the decision, not a guess. 
-** TODO Commit-autonomy opt-in mechanism +** DONE Commit-autonomy opt-in mechanism - Owner / by-when: Craig / spec-review - Context: =file-only= is the default; =.emacs.d= and =rulesets= have a per-project waiver allowing autonomous commits. Where does the workflow *read* that a project has opted in? -- Decision: We will read the opt-in from the project's existing per-project waiver location (the same place the commit discipline's "no approval gate" waiver lives — =notes.org= Workflow State or =CLAUDE.md=), not introduce a new config file. -- Consequences: easier — no new config surface, reuses the existing waiver concept; harder — the waiver's exact location and format must be pinned so the workflow can detect it deterministically, and a project with the commit waiver but *not* wanting the loop to commit needs a way to say "waiver yes, loop-commit no" (two flags, not one). +- Decision: Read the opt-in from the project's existing per-project waiver location (=notes.org= Workflow State or =CLAUDE.md=), not a new config file. Two flags: "has commit waiver" and "loop may commit" can differ. +- Consequences: easier — no new config surface, reuses the existing waiver concept; harder — the waiver location/format must be pinned for deterministic detection, and "waiver yes, loop-commit no" needs the two-flag split. -** TODO Run-cap default and the kill switch shape +** DONE Run-cap default and the kill switch shape - Owner / by-when: Craig / spec-review -- Context: The loop default is one task per run; fix-speedrun works an explicit list. Both need a hard ceiling a runaway can't exceed. -- Decision: We will pass a hard per-run task cap from the caller (loop default 1; fix-speedrun = length of the explicit list, capped at a ceiling), and stop + page with the remainder when the cap is hit. v1 caps by task count, not token budget. 
-- Consequences: easier — a simple integer the caller controls; bounded blast radius; harder — a task-count cap doesn't bound *cost* (one 30-min task can burn many tokens), so a token budget is vNext, and until then a pathological task can run long within a single cap slot. +- Context: The loop default is one task per run; the speedrun works an explicit list. Both need a hard ceiling. Task size is now uncapped, so a single task can be large. +- Decision: The caller passes a hard per-run task cap (loop default 1; speedrun = length of the explicit list, capped at a ceiling); stop + page with the remainder when the cap is hit. v1 caps by task count, not token budget. +- Consequences: easier — a simple caller-controlled integer with a bounded task count; harder — a count cap doesn't bound *cost*, and with size uncapped a single large task can run long, so a token budget is vNext and more pressing than before. -** TODO Metrics log location and format +** DONE Metrics log location and format - Owner / by-when: Craig / spec-review - Context: Per-run metrics must land somewhere structured and queryable, per-project, and survive across sessions for the synthesis step to read. -- Decision: We will append one JSONL record per task to a per-project log at =.ai/metrics/work-the-backlog.jsonl=, git-tracked, with the synthesis step reading the union across projects. -- Consequences: easier — append-only JSONL is trivial to write and =jq=-queryable; per-project keeps it local to where the work happened; harder — a git-tracked log adds churn to every autonomous run's commit (or needs its own commit), and "union across projects" needs the synthesis step to know where every project's log lives. +- Decision: Append one JSONL record per task to a per-project log at =.ai/metrics/work-the-backlog.jsonl=, git-tracked, with the synthesis step reading the union across projects. 
+- Consequences: easier — append-only JSONL is trivial to write and =jq=-queryable, and per-project keeps it local to the work; harder — a git-tracked log adds commit churn, and "union across projects" needs the synthesis step to know where every log lives. -** TODO Synthesis cadence and trigger +** DONE Synthesis cadence and trigger - Owner / by-when: Craig / spec-review - Context: Craig wants periodic org-roam articles summarizing the data. What triggers synthesis, and how often? -- Decision: We will run synthesis on an explicit trigger ("synthesize backlog metrics") and optionally a weekly scheduled run, writing one KB node per synthesis under =~/org/roam/agents/= per the knowledge-base rule. -- Consequences: easier — explicit trigger means no surprise writes, and the KB rule already governs node shape; harder — a weekly scheduled run needs a scheduler entry and the KB write-classification (personal-only) must gate it so work-project metrics never land in the KB. +- Decision: Run synthesis on an explicit trigger ("synthesize backlog metrics") and optionally a weekly scheduled run, writing one KB node per synthesis under =~/org/roam/agents/= per the knowledge-base rule. +- Consequences: easier — an explicit trigger means no surprise writes, and the KB rule already governs node shape; harder — a weekly run needs a scheduler entry, and the personal-only write-classification must gate it so work-project metrics never land in the KB. * Implementation phases +** Phase 0 — Tag definitions + task-review/audit enforcement +Add the hard =:solo:= / =:quick:= definitions to =todo-format.md=, and add the mandatory tag-assessment step to the task-review and task-audit workflows. Independent of the workflow build; lands first so the eligibility gate has crisp tags to read and existing tasks start getting assessed. Tree stays working: these are rule + workflow prose additions. 
+ ** Phase 1 — Extract the execution loop into work-the-backlog.org -Write =work-the-backlog.org= holding the eligibility gate, act-vs-file decision, per-task quality bar, and run-cap logic — taking a task set + session mode + cap as input. Remove the stopgap "Phase E" text from =inbox-zero.org= (restore it to its A-D shape) in the same change so there's one home, not two. Tree stays working: inbox-zero reverts to routing-only, and the new workflow is callable but not yet wired to the loop. +Write =work-the-backlog.org= holding the eligibility gate, defer checklist, per-task quality bar, and run-cap logic — taking a task set + session mode + cap as input. Remove the stopgap "Phase E" text from =inbox-zero.org= (restore it to its A-D shape) in the same change so there's one home, not two. Tree stays working: inbox-zero reverts to routing-only, and the new workflow is callable but not yet wired to the loop. ** Phase 2 — Wire the two callers -Add the loop caller's chain step (after inbox-zero Phase D, invoke work-the-backlog with the tag query + file-only + cap 1) and the "fix speedrun" preset (explicit list + autonomous-commit + always-push + paging-on). Both go through the same workflow. Tree stays working: each caller is independently testable. +Add the loop caller's chain step (after inbox-zero Phase D, invoke work-the-backlog with the tag query + file-only + cap 1) and the no-approvals speedrun preset (pre-flight decision-gathering → explicit list + autonomous-commit + always-push + paging-on). Both go through the same workflow; only the speedrun runs the pre-flight Q&A. Tree stays working: each caller is independently testable. ** Phase 3 — File-only vs autonomous-commit gate Implement the commit-autonomy branch: read the per-project waiver, degrade =autonomous-commit= to =file-only= when absent, surface the degrade. Tree stays working: default file-only behavior is the safe path even before the waiver-read lands. 
-** Phase 4 — Guardrails and the page -Implement the data-loss / design-decision refusal, the VERIFY-on-ambiguity filing, and the end-of-set =notify alarm ... --persist= page. Tree stays working: guardrails only ever *reduce* what runs, so adding them can't break a passing run. +** Phase 4 — The defer checklist, pre-flight Q&A, and the page +Implement the act-vs-file defer checklist (test-writability keystone, enumerated data-loss list, already-satisfied, design-deliberation), the speedrun pre-flight decision-gathering (gather → classify → order → intro → batch-ask → skip/answer), the =VERIFY=-on-ambiguity filing, and the end-of-set =notify alarm ... --persist= page. Tree stays working: the checklist only ever *reduces* what runs, and the pre-flight step only runs under the speedrun preset. ** Phase 5 — Metrics log Append the per-task JSONL record at each task outcome. Tree stays working: logging is a side effect that doesn't alter execution. @@ -222,12 +266,14 @@ Write the synthesis step: read the JSONL union, compute the per-run and trend me * Acceptance criteria - [ ] =work-the-backlog.org= exists and is the only home for the execution loop; =inbox-zero.org= is back to its A-D routing-only shape with no Phase E. - [ ] The loop caller chains into work-the-backlog after routing; startup and wrap-up never invoke it. -- [ ] "fix speedrun" runs as the preset (autonomous-commit + always-push + end-page) over an explicit ordered list, one commit per task. -- [ ] A task tagged for autonomous execution but at status =VERIFY= / =DOING= / =DONE= / =CANCELLED= is skipped by the gate. -- [ ] The eligibility tag set is read from the project's =todo.org= scheme header, not hardcoded. +- [ ] The no-approvals speedrun runs as the preset (pre-flight Q&A → autonomous-commit + always-push + end-page) over an explicit ordered list, one commit per logical change. 
+- [ ] =:solo:= and =:quick:= carry hard definitions in =todo-format.md=, and task-review + task-audit both refuse to complete without assessing them. +- [ ] Eligibility = status =TODO= AND =:solo:=, read from the project's scheme header, not hardcoded; a =VERIFY= / =DOING= / =DONE= / =CANCELLED= task is skipped by the gate. +- [ ] Task size never sends a task to =/start-work=; a large but =:solo:=, well-specified task runs and is decomposed into per-logical-commit chunks. +- [ ] The defer checklist fires correctly: a task whose red test isn't writable (and isn't a quick-question gap), one carrying an enumerated data-loss operation, an already-satisfied one, and one needing design deliberation are each deferred (or routed to pre-flight Q&A under the speedrun), not implemented. +- [ ] Under the speedrun preset, a task needing one or two quick decisions is surfaced in the pre-flight Q&A; "skip this" drops it, an answer is recorded and used; the run then proceeds with no further approvals. +- [ ] Under the unattended loop, a decision-needing task defers (no pre-flight Q&A). - [ ] In a project without the commit waiver, an =autonomous-commit= request degrades to file-only and says so; no commit is made. -- [ ] A task carrying data-loss risk or needing a design decision is refused with a filed VERIFY, not implemented. -- [ ] An underspecified / already-satisfied task files a VERIFY noting why and the run continues. - [ ] The run stops at the per-run cap and pages with the remaining tasks listed. - [ ] Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl=. - [ ] The synthesis step reads the logs and writes a KB node under =~/org/roam/agents/=; it refuses to write for work-classified projects. 
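A minimal sketch of the eligibility gate from the acceptance criteria above (status =TODO= AND =:solo:=), assuming org headings shaped like the ones in this repo's =todo.org=. The names =parse_heading=, =is_eligible=, and =KNOWN_STATUSES= are hypothetical, and the status and tag values are hardcoded only for illustration; the spec calls for reading them from the project's scheme header.

```python
import re
from dataclasses import dataclass

# Statuses named in the acceptance criteria. Hardcoded here for illustration
# only; the spec reads the scheme from the project's todo.org header.
KNOWN_STATUSES = {"TODO", "DOING", "VERIFY", "DONE", "CANCELLED"}

# Hypothetical heading shape: stars, status keyword, optional [#A]-[#D]
# priority cookie, title, optional trailing :tag:tag: block.
HEADING_RE = re.compile(
    r"^\*+\s+(?P<status>[A-Z]+)\s+(?:\[#[A-D]\]\s+)?"
    r"(?P<title>.*?)(?:\s+(?P<tags>:[\w@:]+:))?\s*$"
)


@dataclass(frozen=True)
class Task:
    status: str
    title: str
    tags: frozenset


def parse_heading(line):
    """Return a Task for a status-bearing org heading, else None."""
    match = HEADING_RE.match(line)
    if match is None or match.group("status") not in KNOWN_STATUSES:
        return None
    raw_tags = match.group("tags") or ""
    tags = frozenset(tag for tag in raw_tags.split(":") if tag)
    return Task(match.group("status"), match.group("title").strip(), tags)


def is_eligible(task, eligible_status="TODO", required_tag="solo"):
    """Gate: only a TODO task tagged :solo: may be worked autonomously."""
    return task.status == eligible_status and required_tag in task.tags


if __name__ == "__main__":
    for heading in (
        "** TODO [#C] Fix deck naming :bug:solo:",
        "** VERIFY [#B] Review the spec :solo:",
        "** TODO [#C] Redesign routing :feature:",
    ):
        task = parse_heading(heading)
        print(task.title, "->", "eligible" if is_eligible(task) else "skipped")
```

The gate is deliberately a pure function of status and tags, so the defer checklist (test-writability, data-loss, already-satisfied, design-deliberation) can run as a separate step on whatever the gate lets through.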
@@ -252,14 +298,16 @@ One record per task, appended to =.ai/metrics/work-the-backlog.jsonl= at each ta |-------------------+--------------------------------------------------------------------| | =project= | project basename | |-------------------+--------------------------------------------------------------------| -| =caller= | =loop= or =fix-speedrun= | +| =caller= | =loop= or =speedrun= | |-------------------+--------------------------------------------------------------------| | =task= | task heading (slug) | |-------------------+--------------------------------------------------------------------| | =outcome= | implemented-committed / implemented-diff / deferred-verify / | -| | deferred-too-large / skipped-ineligible | +| | skipped-ineligible / dropped-by-craig (skipped at pre-flight) | +|-------------------+--------------------------------------------------------------------| +| =defer_reason= | underspecified / data-loss / already-satisfied / needs-deliberation | |-------------------+--------------------------------------------------------------------| -| =defer_reason= | for deferrals: needs-input / data-loss / underspecified / too-large | +| =upfront_decision=| true if a pre-flight answer was recorded and used for this task | |-------------------+--------------------------------------------------------------------| | =wall_clock_s= | seconds from task start to outcome | |-------------------+--------------------------------------------------------------------| @@ -268,7 +316,7 @@ One record per task, appended to =.ai/metrics/work-the-backlog.jsonl= at each ta | =review_findings= | count of /review-code Critical+Important findings on this task | |-------------------+--------------------------------------------------------------------| -Per-run rollups computed at synthesis (not stored per record): tasks attempted, completed, VERIFY-deferred, reverted; wall-clock total; commits landed; review findings per commit. 
+Per-run rollups computed at synthesis (not stored per record): tasks attempted, completed, VERIFY-deferred, dropped-by-craig, reverted; wall-clock total; commits landed; review findings per commit. ** The corrections signal (the key metric) @@ -293,13 +341,13 @@ The KB node is the artifact Craig reviews later — "are the autonomous runs com - *Data model & ownership:* The task set is read from =todo.org= (project-owned, user-authored). The metrics JSONL is generated, append-only, git-tracked, project-owned. KB nodes are agent-generated under =~/org/roam/agents/= (never overwriting Craig's hand-authored nodes — link only). No editable region is co-owned. - *Errors, empty states & failure:* Empty task set → report "nothing eligible" and stop. Malformed scheme header → fall back to the default tag reading and surface the fallback. A task that fails mid-implementation → leave the tree working (don't commit a broken state), record the failure outcome, surface it, continue to the next task. No silent data loss: the data-loss guardrail refuses irreversible tasks outright. -- *Security & privacy:* Tasks touching credentials or external mutations are excluded by the data-loss / external-state guardrail. The KB write is personal-projects-only; work metrics never leave the project. No secrets in the JSONL (task slugs and SHAs only). -- *Observability:* The end-of-set page surfaces the run outcome. The per-task surface (implemented / deferred + reason / skipped) is the live progress view. The metrics log + KB synthesis is the long-run observability. A bad run is isolable from the JSONL (which task, which outcome, which review findings). -- *Performance & scale:* Expected counts are small — a handful of tasks per run, one run per 30-min tick. No bottleneck at this scale. The cap bounds the worst case. Synthesis over months of JSONL is still a small file (one record per task). 
-- *Reuse & lost opportunities:* Reuses =todo-format.md= for task close, =/review-code= and =/voice personal= for the quality bar, =notify= for paging, the knowledge-base rule for KB writes, the per-project waiver for commit-autonomy. No new config file (the opt-in rides the existing waiver). The execution loop is the one new shared asset. -- *Architecture fit & weak points:* Integration points — inbox-zero loop caller (chain after Phase D), the per-project waiver location, =todo.org= scheme header, =~/org/roam/agents/=. Weak point: the commit-autonomy gate depends on deterministically reading the waiver; mitigated by defaulting to file-only when the read is ambiguous (fail safe, not open). Second weak point: a 30-min loop committing unattended; mitigated by the hard cap and file-only default. -- *Config surface:* Per-project — commit-autonomy opt-in (via existing waiver), optional loop-commit flag, optional autonomous-safe tag override in the scheme header. Per-call — task set, session mode, run cap. Defaults: file-only, paging-off (loop) / paging-on (fix-speedrun), cap 1 (loop). -- *Documentation plan:* The workflow file itself is the user/operator doc (matches inbox-zero.org's self-documenting style). The =.emacs.d= stopgap note and the fix-speedrun proposal are superseded by this spec; no separate migration doc needed beyond removing the Phase E text. +- *Security & privacy:* Tasks touching credentials or external mutations are excluded by the data-loss / external-state checklist item. The KB write is personal-projects-only; work metrics never leave the project. No secrets in the JSONL (task slugs and SHAs only). +- *Observability:* The end-of-set page surfaces the run outcome. The per-task surface (implemented / deferred + reason / dropped / skipped) is the live progress view. The metrics log + KB synthesis is the long-run observability. A bad run is isolable from the JSONL (which task, which outcome, which review findings). 
+- *Performance & scale:* Expected counts are small — a handful of tasks per run, one run per 30-min tick. No bottleneck at this scale. The cap bounds the worst case on task count; with size uncapped, a single large task is the cost outlier the vNext token budget addresses. Synthesis over months of JSONL is still a small file (one record per task). +- *Reuse & lost opportunities:* Reuses =todo-format.md= for task close + the tag definitions, =/review-code= and =/voice personal= for the quality bar, =notify= for paging, the knowledge-base rule for KB writes, the per-project waiver for commit-autonomy, and task-review / task-audit for tag enforcement. No new config file (the opt-in rides the existing waiver). The execution loop is the one new shared asset. +- *Architecture fit & weak points:* Integration points — inbox-zero loop caller (chain after Phase D), the per-project waiver location, =todo.org= scheme header, task-review / task-audit, =~/org/roam/agents/=. Weak point: the commit-autonomy gate depends on deterministically reading the waiver; mitigated by defaulting to file-only when the read is ambiguous (fail safe, not open). Second weak point: a 30-min loop committing unattended with uncapped task size; mitigated by the hard count cap and file-only default, with the token budget as the vNext backstop. +- *Config surface:* Per-project — commit-autonomy opt-in (via existing waiver), optional loop-commit flag, optional autonomous-safe tag override in the scheme header. Per-call — task set, session mode, run cap. Defaults: file-only, paging-off (loop) / paging-on (speedrun), cap 1 (loop). +- *Documentation plan:* The workflow file itself is the user/operator doc (matches inbox-zero.org's self-documenting style). The =.emacs.d= stopgap note and the speedrun proposal are superseded by this spec; no separate migration doc needed beyond removing the Phase E text. - *Dev tooling:* N/A for new build targets — the workflows are prose, exercised by invocation. 
The metrics JSONL is =jq=-inspectable by hand; a tiny rollup helper may be added under =.ai/scripts/= if the synthesis prose proves to need it (decided at Phase 6, not a v1 prerequisite). - *Rollout, compatibility & rollback:* Rollout is removing Phase E from inbox-zero and adding work-the-backlog — both prose changes, instantly reversible. Compatibility: inbox-zero's three callers are unchanged except the loop caller gaining a forward chain. Rollback: delete work-the-backlog and the loop chain step; inbox-zero is already back to A-D. The file-only default means the worst pre-rollback state is surfaced diffs, not committed changes. - *External APIs & deps:* =notify alarm "Page" "<msg>" --persist= verified against =/home/cjennings/.local/bin/notify= and the page-me workflow. =~/org/roam/= KB write path and node shape verified against the knowledge-base rule. No external API calls. @@ -308,17 +356,18 @@ The KB node is the artifact Craig reviews later — "are the autonomous runs com - *The corrections signal is a proxy, not ground truth.* "A later commit touched the same files" over-counts (legitimate follow-up work) and under-counts (a correction in a different file). It's a flag for human review, not a verdict. Don't rabbit-hole on making it precise in v1 — the proxy plus a human glance is the design. - *Waiver detection drift.* If the per-project waiver location moves or its format changes, the commit-autonomy gate could mis-read. Mitigation: fail safe to file-only. Pin the waiver format in the Phase 3 decision before building. -- *Unattended-commit blast radius.* The headline risk. Mitigated three ways: file-only default, the hard cap, and the data-loss guardrail. The metrics loop is the fourth layer — it makes a bad run visible after the fact even if the first three let something through. -- *Scope creep into /start-work territory.* The temptation to let "≤ 30 min" stretch. The act-vs-file gate and the "when unsure, file" rule are the brake; keep them strict. 
+- *Unattended-commit blast radius.* The headline risk. Mitigated four ways: file-only default, the hard cap, the data-loss checklist item, and the metrics loop (which makes a bad run visible after the fact even if the first three let something through). With task size uncapped, the cost dimension of this risk grows — the vNext token budget is the planned fifth layer. +- *Scope creep into /start-work territory.* Size is intentionally no longer the brake. The brake is the defer checklist's design-deliberation item plus the "when unsure, defer" rule — keep item 4 strict so genuine deliberation-class tasks still route out even when they're tagged =:solo:= by mistake. +- *Pre-flight classifier error.* The speedrun's gather step has to split quick-answerable-question from real-deliberation. Misclassifying a deliberation task as a quick question puts a half-baked decision into an autonomous run. Mitigation: when the question isn't answerable in one or two lines, treat it as deliberation and drop it from the run, not as a pre-flight question. * Testing / Verification / Rollout -Verification is by invocation against a project's real =todo.org=: run the loop caller in file-only mode and confirm it surfaces diffs without committing; run fix-speedrun against a small explicit list in a waiver-carrying project and confirm one commit per task + the end page; plant a =VERIFY=-status task and a data-loss task and confirm both are skipped/refused; confirm the JSONL grows one record per task; run synthesis and confirm a KB node lands (personal project) or is refused (work project). Rollout is the Phase 1-6 sequence, each leaving the tree working; the file-only default makes early phases safe to ship before the commit and paging phases land. 
+Verification is by invocation against a project's real =todo.org=: run the loop caller in file-only mode and confirm it surfaces diffs without committing; run the speedrun against a small explicit list in a waiver-carrying project and confirm the pre-flight Q&A fires, "skip this" drops a task, an answer is recorded and used, then one commit per logical change + the end page; plant a =VERIFY=-status task, a data-loss task, an already-satisfied task, and a large-but-=:solo:= task and confirm the first three are skipped/refused while the large one runs and decomposes; confirm the JSONL grows one record per task; run synthesis and confirm a KB node lands (personal project) or is refused (work project). Rollout is the Phase 0-6 sequence, each leaving the tree working; the file-only default makes early phases safe to ship before the commit and paging phases land. * References / Appendix - [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal (inbox-zero stopgap)]] and [[file:../../working/inbox-zero-phase-e/sender-note.org][its sender note with the 5 open questions]]. -- [[file:2026-06-15-fix-speedrun-workflow-proposal.org][fix-speedrun proposal]]. +- [[file:2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] (file retains its original on-disk name pending a rename pass). - [[file:../../.ai/workflows/inbox-zero.org][inbox-zero.org (canonical, A-D)]] — the routing workflow this feature decouples from. - =~/code/rulesets/claude-rules/knowledge-base.md= — the org-roam write contract the synthesis step follows. @@ -327,3 +376,9 @@ Verification is by invocation against a project's real =todo.org=: run the loop - What: initial draft reconciling the Phase E and fix-speedrun proposals into one work-the-backlog.org feature, plus the effectiveness-measurement instrumentation. - Why: two overlapping proposals arrived within a day; building them separately would duplicate the execution loop and let it drift. 
Craig also asked explicitly for measurement + org-roam synthesis. - Artifacts: this spec; the two source proposals under docs/design/ and working/inbox-zero-phase-e/. +** 2026-06-28 Sun — revision (Craig) +- What: removed the task-size gate (size no longer defers; large tasks decompose into per-commit chunks); recast the act-vs-file rule as a crisp four-item defer checklist keyed on test-writability; added crisp =:solo:= / =:quick:= definitions destined for =todo-format.md= and made their assessment mandatory in task-review + task-audit; added the speedrun's pre-flight decision-gathering step (batch the quick questions up front, "skip this" drops a task, then run hands-off); renamed "fix speedrun" → "no-approvals speedrun" in prose. Status stays draft pending ratification of the revised decisions. +- Why: the original criteria were adjectives, not checkable; the size gate forced Craig to stay at his desk for anything non-trivial, defeating the away-from-desk use case; and decision-needing tasks were over-deferred when many need only a quick upfront answer. +** 2026-06-29 Mon — ratified +- What: Craig ratified all eight revised decisions; Status → ready. Implementation-ready across Phase 0 (tag definitions + task-review/audit enforcement) through Phase 6 (synthesis). +- Why: the crisp defer checklist and the pre-flight-Q&A design resolved the "criteria too soft" and "size shouldn't gate" concerns that held the spec in draft. 
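A hedged sketch of the per-task JSONL append and the outcome rollup described in this spec's metrics table. It uses only the fields visible in this diff (=project=, =caller=, =task=, =outcome=, =defer_reason=, =upfront_decision=, =wall_clock_s=, =review_findings=); the real record may carry more. The function names =append_record= and =rollup= are hypothetical, and the rule that =defer_reason= is required only for =deferred-verify= outcomes is an assumption, not something the spec states.

```python
import json
from collections import Counter
from pathlib import Path

# Values taken from the revised metrics table in this diff.
CALLERS = {"loop", "speedrun"}
OUTCOMES = {
    "implemented-committed", "implemented-diff", "deferred-verify",
    "skipped-ineligible", "dropped-by-craig",
}
DEFER_REASONS = {
    "underspecified", "data-loss", "already-satisfied", "needs-deliberation",
}


def append_record(log_path, *, project, caller, task, outcome, wall_clock_s,
                  defer_reason=None, upfront_decision=False, review_findings=0):
    """Append one JSON object as one line to the metrics log."""
    if caller not in CALLERS:
        raise ValueError(f"unknown caller: {caller!r}")
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    # Assumption: a reason is mandatory for deferrals and absent otherwise.
    if (outcome == "deferred-verify") != (defer_reason is not None):
        raise ValueError("defer_reason must accompany deferred-verify only")
    if defer_reason is not None and defer_reason not in DEFER_REASONS:
        raise ValueError(f"unknown defer_reason: {defer_reason!r}")
    record = {
        "project": project,
        "caller": caller,
        "task": task,
        "outcome": outcome,
        "defer_reason": defer_reason,
        "upfront_decision": upfront_decision,
        "wall_clock_s": wall_clock_s,
        "review_findings": review_findings,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def rollup(log_path):
    """Count records per outcome; a stand-in for the synthesis rollups."""
    counts = Counter()
    with Path(log_path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                counts[json.loads(line)["outcome"]] += 1
    return counts
```

Opening the file in append mode and writing one line per record keeps the log append-only, matching the spec's description of the JSONL as generated and =jq=-inspectable by hand.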
diff --git a/inbox/PROCESSED-2026-06-28-2301-from-home-adopted-home-s-todo-org-priority-scheme.org b/inbox/PROCESSED-2026-06-28-2301-from-home-adopted-home-s-todo-org-priority-scheme.org new file mode 100644 index 0000000..3bbd7dc --- /dev/null +++ b/inbox/PROCESSED-2026-06-28-2301-from-home-adopted-home-s-todo-org-priority-scheme.org @@ -0,0 +1,5 @@ +#+TITLE: Adopted: home's todo.org Priority Scheme now carries the sev +#+SOURCE: from home +#+DATE: 2026-06-28 23:01:35 -0400 + +Adopted: home's todo.org Priority Scheme now carries the severity × frequency bug-priority matrix as a 'Codebase bug priority' subsection, with Critical/Major/Minor/Cosmetic and the frequency rows defined for home's codebase (the finances/ plain-text-accounting pipeline). Matrix structure and the fixed P1->[#A]...P4->[#D] mapping kept verbatim; severity-alone carve-out included for financial-data leaks. Re-grade of existing :bug: tasks was a no-op — the only :bug: heading is CANCELLED. No further action needed. @@ -381,22 +381,23 @@ The two conventions: - *Location split* — formal specs live in =docs/specs/=; =docs/design/= keeps working notes, brainstorms, inventories, reviews. A spec is a doc proposing a buildable change with a Decisions section and phases; everything else is a note. - *Glanceable lifecycle status* — a spec's state (draft / doing / implemented / superseded / cancelled) is visible without opening the file, plus an authoritative in-file record. -Recommendation captured now so the thinking isn't lost; it migrates into the spec when this is worked. We handle the task in priority order. +We handle the task in priority order. Mechanism decided 2026-06-28; migrates into the spec when built. -*** Recommendation (draft — decide when worked, migrate into the spec) -1. *Location split — adopt.* Low controversy, clear payoff. =docs/specs/= for formal specs, =docs/design/= for notes. Document in spec-create and the docs-lifecycle rule. -2. 
*Status mechanism — the real fork.* Two options: filename suffix (=-spec-doing.org=, Craig's idea, ls-visible but every transition is a rename that breaks =[[file:...]]= links) vs the org-TODO keyword on the spec's top heading (specs already carry =#+TODO: TODO | DONE SUPERSEDED CANCELLED=; link-stable, zero-rename, org-agenda-scannable, but not visible in =ls=). My lean is the org-keyword as authoritative + a Status field in the Metadata table, dropping the filename suffix — the suffix is redundant with the Status field and adds rename churn across a heavily cross-linked, template-synced doc set. This diverges from Craig's stated filename-suffix preference, so it's teed up as a decision, not settled. Decide deliberately before building. -3. *Link safety — adopt =org-id= ([[id:...]]) for cross-doc spec links* regardless of which status mechanism wins. It decouples link stability from the status decision. Mandatory if the filename suffix wins; good hygiene either way. The alternative — a move/rename/relink/stamp helper run on each transition — is only needed if the suffix wins and org-id is rejected. -4. *Generalize after the mechanism settles.* The shape (lifecycle state in name-or-location, authoritative in-artifact status, rename-safe links, formal-vs-notes split) is reusable beyond specs. Capture it as a general =docs-lifecycle= convention in =claude-rules/= with spec-create as the first instance — but don't generalize an unsettled convention. +*** Decisions (settled 2026-06-28 — migrate into the spec when built) +1. *Location split — adopt.* =docs/specs/= for formal specs, =docs/design/= for notes (brainstorms, inventories, reviews). A spec is a doc proposing a buildable change with a Decisions section and phases; everything else is a note. Document in spec-create and the docs-lifecycle rule. +2. 
*Status mechanism — org-keyword authoritative.* The spec's =#+TODO:= state on its top heading is authoritative (specs already carry =#+TODO: TODO | DONE SUPERSEDED CANCELLED=), mirrored in a =Status= field in the Metadata table. Drop the filename suffix entirely — it's redundant with the Status field and adds rename churn across a cross-linked, template-synced doc set. (Craig 2026-06-28, choosing org-keyword over his earlier filename-suffix lean.) +3. *Link safety — adopt =org-id= ([[id:...]]) for cross-doc spec links.* Decouples link stability from the status mechanism; good hygiene regardless. +4. *Generalize.* Capture the shape (lifecycle state authoritative-in-artifact, formal-vs-notes split, rename-safe links) as a general =docs-lifecycle= convention in =claude-rules/=, with spec-create as the first instance. +5. *Retrofit existing files across ALL projects* (Craig 2026-06-28). The convention is worthless if legacy docs stay misfiled — every project's existing =docs/design/= pile (the ~28 in .emacs.d that surfaced this) must be sorted: formal specs move to =docs/specs/=, notes stay in =docs/design/=, inbound =file:= links updated. This is a one-time per-project migration that template sync can't perform, so the spec must design the reach mechanism. Proposed shape: a synced classify-and-move helper under =.ai/scripts/= (heuristic: a doc with a Decisions section + phases/Metadata is a spec) that proposes moves for confirmation and relinks, plus a startup nudge gated on a per-project =:LAST_SPEC_SORT:= marker so each project runs it once. Classification is a judgment call — the helper proposes, a human confirms. -Follow-up once decided: update spec-create to emit into =docs/specs/= with the chosen status mechanism; retrofit existing specs; optionally add the relink helper as a =.ai/scripts/= addition (downstream projects get it via template sync); send a note back if .emacs.d should pilot before generalizing. 
+Follow-up once built: update spec-create to emit into =docs/specs/= with the org-keyword status; write the =docs-lifecycle= rule; ship the retrofit helper + startup nudge; retrofit rulesets' own =docs/design/= first as the pilot; send a note if .emacs.d should pilot before generalizing. -** DOING [#C] "fix speedrun" cross-project autonomous-batch mode :feature:spec: +** DOING [#C] No-approvals speedrun — cross-project autonomous-batch mode :feature:spec: :PROPERTIES: :CREATED: [2026-06-15 Mon] :LAST_REVIEWED: 2026-06-24 :END: -A named mode for coding projects: Craig names an ordered task set and says "fix speedrun"; the set is worked autonomously, each task held to the full quality bar (TDD red→green, =/review-code=, =/voice= on the commit) and committed + pushed as its own logical commit, with a VERIFY filed instead of guessing on anything underspecified, and an end-of-set page listing completed + remaining tasks. Surfaced by .emacs.d from a 2026-06-15 theme-studio session where the shape worked. Source proposal: [[file:docs/design/2026-06-15-fix-speedrun-workflow-proposal.org]] (.emacs.d handoff 2026-06-15). Build via =spec-create= when worked; we handle the task in priority order. +A named mode for coding projects: Craig names an ordered task set and says "speedrun" / "no approvals speedrun"; the set is worked autonomously, each task held to the full quality bar (TDD red→green, =/review-code=, =/voice= on the commit) and committed + pushed as its own logical commit, with all needed quick decisions gathered in one pre-flight Q&A (answer or "skip this") and a VERIFY filed for anything underspecified or needing deliberation, plus an end-of-set page listing completed + remaining + skipped tasks. Task size is not a gate — large tasks decompose into per-commit chunks. Surfaced by .emacs.d from a 2026-06-15 theme-studio session where the shape worked. Source proposal: [[file:docs/design/2026-06-15-fix-speedrun-workflow-proposal.org]] (.emacs.d handoff 2026-06-15). 
Build via =spec-create= when worked; we handle the task in priority order. Skeptical-review read (open design questions to resolve in the spec, not settled here): - *Is it a new workflow or a documented preset?* The proposal frames it as no-approvals + always-push session modes plus an end page. Decide whether it needs its own workflow file or is mostly documentation of a preset over the two existing modes. @@ -410,8 +411,8 @@ Craig's "your call" (2026-06-16) answered in [[file:docs/design/2026-06-16-auton - *Paging:* end-of-set only, via =notify ... --persist= (reconciled past the removed page-signal wrapper). - *Auto-pull vs explicit list:* both — explicit list for the preset, tag/priority query for the loop. - *Effectiveness measurement (the trial Craig asked for):* the spec designs a per-task JSONL metrics log (=.ai/metrics/work-the-backlog.jsonl=), a corrections-in-next-session signal, and a periodic synthesis step that writes =:agent:metrics:= org-roam articles for later review — the "gather data + create org-roam articles" loop. -*** VERIFY Review the autonomous-batch execution spec -Review [[file:docs/design/2026-06-16-autonomous-batch-execution-spec.org]] (covers both this and Phase E) and ratify (or adjust) its six open decisions. Implementation-ready once no decision is still TODO. +*** 2026-06-29 Mon @ 03:48:09 -0400 Ratified the autonomous-batch execution spec +Craig ratified all eight decisions in [[file:docs/design/2026-06-16-autonomous-batch-execution-spec.org]] (revised this session — size gate removed, crisp four-item defer checklist, =:solo:= / =:quick:= definitions + task-review/audit enforcement, speedrun pre-flight Q&A). Spec Status → ready; implementation-ready across Phase 0–6. Decisions grew from six to eight during the revision. 
** TODO [#C] ntfy phone channel as general two-way agent-comms :feature:spec: :PROPERTIES: @@ -2994,7 +2995,7 @@ CLOSED: [2026-06-24 Wed] :CREATED: [2026-06-22 Mon] :LAST_REVIEWED: 2026-06-24 :END: -flashcard-to-anki.py's =default_deck_name= returns =input_path.stem= (the filename), so every deck built through =flashcard-sync= (which passes no =--deck=) is named after the file, not the curated =#+TITLE=. =flashcard-review.org= already documents the intended behavior ("the #+TITLE line drives the Anki deck name"); the script never matched it. Fix: =default_deck_name(input_path, org_text)= scans for a =#+TITLE:= line (case-insensitive, trimmed) and returns it, basename fallback when absent; =main()= passes the already-read =org_text=. Edited script + test ready (validated, 29 pass): [[file:docs/design/2026-06-21-anki-titlefix-flashcard-to-anki.py][script]], [[file:docs/design/2026-06-21-anki-titlefix-test.py][test]], rationale [[file:docs/design/2026-06-21-anki-titlefix-proposal.org][proposal]]. Apply to both =.ai/scripts/= and =claude-templates/.ai/scripts/=, sync-check + make test. Migration caveat: deck ID derives from the name, so decks previously built without =--deck= land as new decks on next import (old basename-named decks keep history, delete by hand). Coordinate with "Reconcile flashcard multi-tag tooling into canonical" below — both edit =flashcard-to-anki.py=, build together to avoid conflicting edits. Shared-asset, review-gated. From home 2026-06-21. +flashcard-to-anki.py's =default_deck_name= returns =input_path.stem= (the filename), so every deck built through =flashcard-sync= (which passes no =--deck=) is named after the file, not the curated =#+TITLE=. =flashcard-review.org= already documents the intended behavior ("the #+TITLE line drives the Anki deck name"); the script never matched it. 
Fix: =default_deck_name(input_path, org_text)= scans for a =#+TITLE:= line (case-insensitive, trimmed) and returns it, basename fallback when absent; =main()= passes the already-read =org_text=. Edited script + test ready (validated, 29 pass); the staging =.py= files were removed after the fix landed (see below), rationale kept: [[file:docs/design/2026-06-21-anki-titlefix-proposal.org][proposal]]. Apply to both =.ai/scripts/= and =claude-templates/.ai/scripts/=, sync-check + make test. Migration caveat: deck ID derives from the name, so decks previously built without =--deck= land as new decks on next import (old basename-named decks keep history, delete by hand). Coordinate with "Reconcile flashcard multi-tag tooling into canonical" below — both edit =flashcard-to-anki.py=, build together to avoid conflicting edits. Shared-asset, review-gated. From home 2026-06-21. Done 2026-06-24 (commit 060a938): applied the pre-staged script + test red-to-green (5 new =#+TITLE= tests, 29 pass total), synced both script dirs, full suite green. The two redundant staging =.py= files removed, the rationale proposal kept. ** CANCELLED [#C] Morning ops orchestrator pilot — read-only :feature: |
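The =default_deck_name= fix described in the flashcard task above (scan for a =#+TITLE:= line, case-insensitive and trimmed, with a basename fallback) can be sketched roughly as follows. This is an illustration of the described behavior, not the committed script from 060a938; treating an empty =#+TITLE:= as absent is an assumption.

```python
import re
from pathlib import Path

# Matches an org "#+TITLE:" keyword line in any letter case.
TITLE_RE = re.compile(r"^[ \t]*#\+TITLE:(.*)$", re.IGNORECASE | re.MULTILINE)


def default_deck_name(input_path, org_text):
    """Prefer the file's #+TITLE as the deck name; fall back to the basename."""
    match = TITLE_RE.search(org_text)
    if match:
        title = match.group(1).strip()
        # Assumption: an empty title counts as absent.
        if title:
            return title
    return Path(input_path).stem


if __name__ == "__main__":
    text = "#+title:  Spanish Verbs  \n* ser :: to be\n"
    print(default_deck_name("decks/spanish-verbs.org", text))
    print(default_deck_name("decks/spanish-verbs.org", "* ser :: to be\n"))
```

Because the deck ID derives from the name, as the migration caveat above notes, any deck that gains a =#+TITLE=-based name will import as a new deck rather than updating the old basename-named one.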
