-rw-r--r-- docs/design/2026-06-16-autonomous-batch-execution-spec.org 329
-rw-r--r-- docs/design/2026-06-16-encourage-kb-contribution-spec.org 188
-rw-r--r-- todo.org 250
3 files changed, 650 insertions, 117 deletions
diff --git a/docs/design/2026-06-16-autonomous-batch-execution-spec.org b/docs/design/2026-06-16-autonomous-batch-execution-spec.org
new file mode 100644
index 0000000..e2e0f90
--- /dev/null
+++ b/docs/design/2026-06-16-autonomous-batch-execution-spec.org
@@ -0,0 +1,329 @@
+#+TITLE: Autonomous-Batch Task Execution — Spec
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-06-16
+#+TODO: TODO | DONE SUPERSEDED CANCELLED
+
+* Metadata
+| Status | draft |
+|----------+--------------------------------------------------------------------|
+| Owner | Craig Jennings |
+|----------+--------------------------------------------------------------------|
+| Reviewer | Craig Jennings |
+|----------+--------------------------------------------------------------------|
+| Date | 2026-06-16 |
+|----------+--------------------------------------------------------------------|
+| Related | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:2026-06-15-fix-speedrun-workflow-proposal.org][fix-speedrun proposal]] |
+|----------+--------------------------------------------------------------------|
+
+* Summary
+
+Two proposals arrived within a day of each other describing the same capability: have Claude work a batch of small, well-marked tasks autonomously, with a full quality bar per task and no per-step approval gate. The inbox-zero "Phase E" proposal drives it from a tag/priority query on a recurring loop; the "fix speedrun" proposal drives it from an explicit ordered list a human dictates in-session. This spec reconciles both into one feature: a single dedicated workflow, =work-the-backlog.org=, that holds the task-execution logic, with two thin callers feeding it. It also designs the instrumentation that measures whether the autonomy is actually paying off.
+
+* Problem / Context
+
+Craig has a standing backlog of small, solo-doable fixes across several projects, already marked with a tag convention (=:next:=, =:quick:+:solo:=). Doing them by hand one at a time is the bottleneck — each is 30 minutes or less, but the context-switch and the per-commit approval ceremony dominate the actual work. He wants Claude to burn these down unattended: on a recurring loop for the routed inbox case, and on demand when he batches a named list and says "fix speedrun, no approvals until done."
+
+Two separate proposals tried to answer this:
+
+- *Phase E* (in =inbox-zero.org=, edited in =.emacs.d= as a stopgap) bolted autonomous execution onto the inbox-zero workflow's on-demand and loop callers. The sender flagged the seam as the open question: coupling capture-routing with autonomous-implementation pollutes inbox-zero's three existing callers (startup, wrap-up, on-demand), two of which must never execute anything.
+- *fix speedrun* (a =.emacs.d= theme-studio session that worked well) is the same execution loop driven by an explicit ordered task set, with end-of-set paging and always-push.
+
+They overlap almost entirely. The execution loop (eligibility gate, act-vs-file decision, per-task quality bar, bounded run) is identical. Only two things differ: the *input* (tag query vs explicit list) and the *session mode* (loop default vs no-approvals + always-push + page). Building them as two features would duplicate the execution logic and let the two copies drift. The forces at play: keep inbox-zero's callers clean, share one execution loop, and make the autonomy safe enough to run unattended on a 30-minute timer without Craig watching.
+
+A second, explicit ask from Craig: instrument this so its effectiveness is measurable. "Gather data on this and create some org-roam articles we can look at later." Autonomous execution that silently makes bad commits is worse than no autonomy. The only way to tell which one we have is to measure, over time, tasks completed vs deferred vs reverted, plus human corrections in the following session.
+
+* Goals and Non-Goals
+
+** Goals
+- One workflow, =work-the-backlog.org=, owns the task-execution loop. Both input shapes (tag query, explicit list) and both session modes feed it.
+- inbox-zero's three existing callers stay clean: the loop caller chains into =work-the-backlog= *after* routing; startup and wrap-up never touch it.
+- "fix speedrun" is a thin named preset, not a second implementation: no-approvals session mode + always-push + end-of-set page, feeding an explicit ordered list.
+- Commit autonomy defaults to file-only (surface a diff, no auto-commit). A project opts into autonomous commit+push explicitly via its per-project waiver.
+- Hard guardrails: refuse to speedrun any task needing a design decision or carrying data-loss risk without a checkpoint; file a =VERIFY= and move on rather than guess-implement an underspecified task; a per-run cap / kill switch beyond "one task per run."
+- A lightweight per-run metrics log plus a periodic synthesis step that writes org-roam KB articles summarizing the trend.
+
+** Non-Goals
+- *Not* a replacement for =/start-work=. Tasks needing deliberation, design, or an hour-plus stay with =/start-work= and its approval gates. This feature only touches the small, marked, solo set.
+- *Not* a new tag convention. It reads the project's own priority/tag scheme header; it never invents or hardcodes tags across projects.
+- *Not* an inbox-routing change. =inbox-zero.org= keeps its A-D phases. The Phase E text added in =.emacs.d= as a stopgap is *removed* and its logic moves here.
+- *Not* a multi-project orchestrator. One run works one project's backlog. Cross-project handoff stays with =inbox-send= and the paging reply.
+- *Not* a credential-handling or external-API feature. Tasks that touch secrets or external mutations are out of the eligible set by the guardrail.
+
+** Scope tiers
+- *v1:* =work-the-backlog.org=; the eligibility gate reading the project's scheme header; the act-vs-file decision with VERIFY-on-ambiguity; file-only commit default with per-project opt-in; the loop caller wiring and inbox-zero Phase E removal; the "fix speedrun" preset with end-of-set =notify --persist= page; the per-run metrics log (structured JSONL).
+- *Out of scope:* a token-budget kill switch (cap is a task count in v1); cross-project batch runs; a dashboard or live UI over the metrics.
+- *vNext (log to todo.org):* the periodic org-roam synthesis step if it doesn't make v1; a token/cost budget alongside the task-count cap; auto-detection of "human corrected my autonomous commit" from the next session's diff.
+
+* Design
+
+** Overview
+
+The architecture is one execution workflow with two callers and one preset, plus an instrumentation sidecar.
+
+#+begin_example
+ inbox-zero loop caller ──(after Phase D routing)──┐
+ ├──▶ work-the-backlog.org ──▶ metrics log (JSONL)
+ "fix speedrun" preset ──(explicit ordered list)───┘ │
+ = no-approvals + always-push + end-page ▼
+ periodic synthesis ──▶ org-roam KB articles
+#+end_example
+
+=work-the-backlog.org= is the only place the execution loop lives. It takes a *task set* (however assembled) and a *session mode* (which gates commit autonomy and paging), and works the set under a fixed safety contract. The two callers differ only in how they build the task set and which session mode they pass.
+
+This is the seam the Phase E sender asked for: separating capture-routing (inbox-zero) from autonomous-implementation (work-the-backlog) keeps inbox-zero's startup and wrap-up callers — which must never execute anything — untouched. The loop caller is the only one of inbox-zero's callers that chains forward into execution, and it does so as an explicit second step after routing completes, not as a phase buried inside inbox-zero.
+
+** The execution loop (two-altitude: caller's view)
+
+A caller hands =work-the-backlog= three things:
+
+1. *A task set* — either an explicit ordered list of task headings (fix speedrun), or the result of a tag/priority query against =todo.org= (the loop). The workflow does not care which; it receives an ordered list of candidate tasks.
+2. *A session mode* — =file-only= (default) or =autonomous-commit= (requires the project's per-project waiver), and a paging flag.
+3. *A run cap* — the maximum number of tasks to complete this run.
+
+It returns a per-task outcome (=implemented-committed=, =implemented-diff=, =deferred-verify=, =deferred-too-large=, or =skipped-ineligible=, matching the =outcome= field of the metrics record) and a metrics record per task.
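+
+The caller's side of that contract can be pictured as a few plain types. This is a minimal Python sketch only: the workflow itself is prose, and every name here (=BacklogCall=, =SessionMode=, =CommitAutonomy=, =Outcome=, =TaskResult=) is hypothetical. The outcome values mirror the =outcome= column of the metrics record.
+
```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class CommitAutonomy(Enum):
    FILE_ONLY = "file-only"                  # default: surface a diff, never commit
    AUTONOMOUS_COMMIT = "autonomous-commit"  # honored only with the project's waiver


class Outcome(Enum):
    # Hypothetical enum; values mirror the JSONL `outcome` field.
    IMPLEMENTED_COMMITTED = "implemented-committed"
    IMPLEMENTED_DIFF = "implemented-diff"
    DEFERRED_VERIFY = "deferred-verify"
    DEFERRED_TOO_LARGE = "deferred-too-large"
    SKIPPED_INELIGIBLE = "skipped-ineligible"


@dataclass(frozen=True)
class SessionMode:
    commit_autonomy: CommitAutonomy = CommitAutonomy.FILE_ONLY
    paging: bool = False


@dataclass(frozen=True)
class BacklogCall:
    task_set: Tuple[str, ...]  # ordered task headings, however they were assembled
    mode: SessionMode
    run_cap: int               # maximum number of tasks to complete this run

    def __post_init__(self) -> None:
        if self.run_cap < 1:
            raise ValueError("run_cap must be a positive task count")


@dataclass(frozen=True)
class TaskResult:
    task: str
    outcome: Outcome
```
+
+A loop tick would then be =BacklogCall(task_set=candidates, mode=SessionMode(), run_cap=1)=: file-only, no paging, one task.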
+
+** The execution loop (implementer's view)
+
+For the task set, in order, until the run cap is hit:
+
+1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
+2. *Scope read* of the relevant code. Cheap; just enough to make the act-vs-file call.
+3. *Act-vs-file decision* (below). File → record the deferral reason, next task.
+4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important, then close the task per =todo-format.md=.
+5. *Commit autonomy branch:*
+ - =file-only= → surface the diff, do *not* commit. Record =implemented-diff=.
+ - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
+6. *Record metrics* for the task (the JSONL append, below).
+7. Decrement the cap. At zero, stop.
+
+After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary.
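+
+The seven steps reduce to a short loop. This is a minimal Python sketch that assumes the judgment-heavy parts (the gate, the scope read plus the act-vs-file call, and the TDD/review/close work) are passed in as callbacks. =work_the_backlog=, =RunReport=, and the callback names are hypothetical. Only implemented tasks spend the cap, and any tasks still unworked when the cap reaches zero are returned for the page.
+
```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RunReport:
    outcomes: List[Tuple[str, str]] = field(default_factory=list)  # (task, outcome)
    remaining: List[str] = field(default_factory=list)             # unworked when the cap hit


def work_the_backlog(
    task_set: List[str],
    run_cap: int,
    is_eligible: Callable[[str], bool],
    defer_outcome: Callable[[str], Optional[str]],
    implement: Callable[[str], None],
    autonomous_commit: bool,
) -> RunReport:
    report = RunReport()
    cap = run_cap
    for index, task in enumerate(task_set):
        if cap == 0:
            # Kill switch: stop and hand the rest back for the end-of-set page.
            report.remaining = list(task_set[index:])
            break
        if not is_eligible(task):                        # step 1: eligibility gate
            report.outcomes.append((task, "skipped-ineligible"))
            continue
        deferral = defer_outcome(task)                   # steps 2-3: scope read, act-vs-file
        if deferral is not None:
            report.outcomes.append((task, deferral))
            continue
        implement(task)                                  # step 4: TDD, review, close
        outcome = "implemented-committed" if autonomous_commit else "implemented-diff"
        report.outcomes.append((task, outcome))          # steps 5-6: branch, record
        cap -= 1                                         # step 7: spend the cap
    return report
```
+
+The sketch does not model a task failing mid-implementation. Per the failure notes under readiness, that case should leave the tree working, record the failure, and continue to the next task.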
+
+** Eligibility gate
+
+A task is autonomous-safe when *all* hold:
+
+1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= is the "awaiting Craig's manual confirmation" marker; auto-implementing one defeats the manual check it represents.
+2. *Tagged per the project's autonomous-safe set* — resolved by reading the project's priority/tag scheme header at the top of its =todo.org=, not by hardcoding. The default reading is =:next:= OR both =:quick:= AND =:solo:=, but a project whose scheme declares a different autonomous-safe tag set overrides that.
+3. *Solo-doable* — no input or undecided judgment call from Craig.
+4. *Roughly 30 minutes or less* of focused work.
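+
+The two mechanical checks, status and tags, can be sketched against a raw org heading line. This is a minimal Python sketch. It assumes the project's scheme header has already been resolved into a list of required tag sets; the spec does not fix the header's syntax, so that parsing is left out. Checks 3 and 4 are judgment calls and stay with the agent. =parse_heading=, =is_autonomous_safe=, and =DEFAULT_SAFE_TAG_SETS= are hypothetical names.
+
```python
import re
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical encoding of the default reading: :next: alone, or :quick: and :solo: together.
DEFAULT_SAFE_TAG_SETS: Tuple[FrozenSet[str], ...] = (
    frozenset({"next"}),
    frozenset({"quick", "solo"}),
)


def parse_heading(line: str) -> Tuple[str, Set[str]]:
    """Return (first word after the stars, tag set) for an org heading line."""
    match = re.match(r"^\*+\s+(.*?)\s*$", line)
    if match is None:
        raise ValueError(f"not an org heading: {line!r}")
    body = match.group(1)
    tags: Set[str] = set()
    tag_match = re.search(r"\s(:[\w@:]+:)$", body)  # trailing :tag1:tag2: block
    if tag_match:
        tags = {tag for tag in tag_match.group(1).split(":") if tag}
        body = body[: tag_match.start()]
    words = body.split()
    return (words[0] if words else ""), tags


def is_autonomous_safe(
    line: str, safe_tag_sets: Iterable[FrozenSet[str]] = DEFAULT_SAFE_TAG_SETS
) -> bool:
    status, tags = parse_heading(line)
    if status != "TODO":
        # VERIFY, DOING, DONE, CANCELLED, or no keyword at all: never auto-implement.
        return False
    return any(required <= tags for required in safe_tag_sets)
```
+
+Anything that is not plainly =TODO= fails closed. That matches the safe-by-omission default proposed in the marker-set decision.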
+
+** Act-vs-file decision (the guardrail)
+
+After the scope read, for each eligible candidate:
+
+- *Clear, bounded, solo, ≤ ~30 min* → implement.
+- *Needs a design decision, Craig's input, or discussion* → do NOT implement. File a one-line note on the task naming the input it needs; surface it.
+- *Carries data-loss risk without a checkpoint* (deletes data, rewrites persisted state, touches external/shared state irreversibly) → do NOT implement. File a =VERIFY= explaining the risk; surface it.
+- *Underspecified or already-satisfied* → do NOT guess-implement. File a =VERIFY= noting why (the fix-speedrun "raise max spans to 5 — every cap was already 8" case) and move on.
+- *An hour or more* → do NOT implement. File and surface as a =/start-work= candidate.
+
+When unsure which side a task falls on, file rather than implement. A wrong auto-implement costs more than a deferred task — it costs a revert *and* the human correction in the next session that the metrics are designed to catch.
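+
+The decision is an ordered list of refusals, with "implement" as the last resort. This is a minimal Python sketch under three assumptions: the scope read produces an =Assessment= with the (hypothetical) fields below, "≤ ~30 min" is treated as a hard 30-minute threshold, and an unsure call is logged with the =underspecified= defer reason. The other reasons come from the metrics record's =defer_reason= column.
+
```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_MINUTES = 30  # assumed hard threshold for "roughly 30 minutes or less"


@dataclass(frozen=True)
class Assessment:
    """What the scope read concluded about one eligible task (hypothetical fields)."""
    needs_input: bool       # design decision, Craig's input, or discussion
    data_loss_risk: bool    # irreversible change without a checkpoint
    underspecified: bool    # unclear, or already satisfied
    estimate_minutes: int
    confident: bool = True  # False when it is unclear which side the task falls on


def act_or_file(a: Assessment) -> Tuple[str, Optional[str]]:
    """Return ("implement", None) or ("file", defer_reason)."""
    if a.needs_input:
        return "file", "needs-input"
    if a.data_loss_risk:
        return "file", "data-loss"
    if a.underspecified:
        return "file", "underspecified"
    if a.estimate_minutes > MAX_MINUTES:
        return "file", "too-large"
    if not a.confident:
        # "When unsure, file": a deferral is cheaper than a revert plus a correction.
        return "file", "underspecified"
    return "implement", None
```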
+
+** Session modes and the "fix speedrun" preset
+
+Two orthogonal session-mode dimensions feed the loop:
+
+- *Commit autonomy:* =file-only= (default) or =autonomous-commit=. =autonomous-commit= is honored only when the project carries the per-project waiver (=.emacs.d= and =rulesets= have it; most projects do not). Absent the waiver, a request for =autonomous-commit= degrades to =file-only= and says so.
+- *Paging:* on or off. End-of-set only.
+
+"fix speedrun" is the named preset = =autonomous-commit= + always-push + paging-on, fed an *explicit ordered list*. It is not a separate code path; it is a label for that combination of mode flags plus the explicit-list input. The loop caller, by contrast, runs =file-only= (unless the project has the waiver and opts the loop into commits) with paging off, fed the *tag query*.
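+
+Because the preset is only a flag combination, it can be written as data. This is a minimal Python sketch. =Mode=, =FIX_SPEEDRUN=, =LOOP_DEFAULT=, and =resolve_mode= are hypothetical names. Dropping always-push on a degrade is also an assumption: with no commit, there is nothing to push.
+
```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Mode:
    commit: str        # "file-only" or "autonomous-commit"
    always_push: bool
    paging: bool       # end-of-set page only


# "fix speedrun": a label for this flag combination, fed an explicit ordered list.
FIX_SPEEDRUN = Mode(commit="autonomous-commit", always_push=True, paging=True)
# Loop caller default, fed the tag query.
LOOP_DEFAULT = Mode(commit="file-only", always_push=False, paging=False)


def resolve_mode(requested: Mode, project_has_waiver: bool) -> Tuple[Mode, str]:
    """Honor autonomous-commit only with the waiver; otherwise degrade and say so."""
    if requested.commit == "autonomous-commit" and not project_has_waiver:
        # Assumption: with no commit there is nothing to push, so always_push drops too.
        degraded = replace(requested, commit="file-only", always_push=False)
        return degraded, "no commit waiver: autonomous-commit degraded to file-only"
    return requested, ""
```
+
+The returned note is the "says so" part: the caller surfaces it instead of silently running in a weaker mode. =project_has_waiver= should default to false whenever the waiver read is ambiguous, so the gate fails safe.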
+
+** Bounding the run and the kill switch
+
+Default cap: one task per run for the loop caller — implement the highest-priority eligible candidate (=[#A]= before =[#B]= before =[#C]=), record, then stop and let the next tick continue. The fix-speedrun preset works the whole explicit list in order (the human bounded it by naming it), still one commit per task.
+
+The kill switch is a hard per-run task cap passed by the caller, independent of "one per run": even fix-speedrun stops at the cap and pages with the remainder listed. A loop that fires every 30 minutes and commits unattended needs a ceiling that a runaway can't exceed.
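+
+Ordering the loop's candidates and sizing the cap are both small pure functions. This is a minimal Python sketch. =HARD_CEILING= stands in for the ceiling the spec leaves unspecified, and sorting headings without a priority cookie last is an assumption. All function names are hypothetical.
+
```python
import re
from typing import List

HARD_CEILING = 10  # hypothetical value; the spec does not fix the ceiling
_PRIORITY = re.compile(r"\[#([ABC])\]")


def priority_rank(heading: str) -> int:
    """[#A] -> 0, [#B] -> 1, [#C] -> 2; no cookie -> 3 (sorts last)."""
    match = _PRIORITY.search(heading)
    return "ABC".index(match.group(1)) if match else 3


def loop_order(candidates: List[str]) -> List[str]:
    # sorted() is stable, so equal priorities keep their todo.org order.
    return sorted(candidates, key=priority_rank)


def run_cap_for(caller: str, explicit_list_len: int = 0) -> int:
    if caller == "loop":
        return 1  # one task per tick; the next tick continues
    if caller == "fix-speedrun":
        # The human bounded the list by naming it, but a runaway still hits the ceiling.
        return min(explicit_list_len, HARD_CEILING)
    raise ValueError(f"unknown caller: {caller!r}")
```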
+
+** End-of-set paging
+
+When the set is done (or the cap is hit), if paging is on, fire one page — end-of-set only, never per-task:
+
+#+begin_src sh
+notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>" --persist
+#+end_src
+
+=--persist= keeps it on screen until dismissed (the page-me convention). The message carries the project name, the completed count, and the remaining count, so Craig can reply in one turn, confirming he's ready and naming the next project. The page-signal wrapper was removed on 2026-06-12, so the page goes straight through =notify=; there is no separate page-signal call.
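+
+Building the page is string formatting plus one =notify= call after the set. This is a minimal Python sketch that only returns the argv; running it would be =subprocess.run(argv, check=True)=. The function names are hypothetical. The command shape and message format follow the =notify= invocation in the shell block above.
+
```python
from typing import List, Optional


def page_argv(project: str, done: int, remaining: List[str], summary: str) -> List[str]:
    """argv for the single end-of-set page (\u2014 is the dash in the documented format)."""
    message = f"{project}: {done} done, {len(remaining)} remaining \u2014 {summary}"
    return ["notify", "alarm", "Page", message, "--persist"]


def end_of_set_page(
    paging: bool, project: str, done: int, remaining: List[str], summary: str
) -> Optional[List[str]]:
    # Called once after the loop finishes or the cap hits, never per task.
    if not paging:
        return None
    return page_argv(project, done, remaining, summary)
```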
+
+* Alternatives Considered
+
+** Fold execution into inbox-zero (the Phase E stopgap shape)
+- Good, because it's the smallest diff — the loop caller already runs inbox-zero, so execution is "one more phase."
+- Bad, because it couples capture-routing with implementation. inbox-zero has three callers; startup and wrap-up must never execute. A Phase E inside inbox-zero forces both to carry a "skip Phase E" caveat and risks a future caller running it by accident.
+- Neutral, because the eligibility-gate and act-vs-file text is identical either way — only its *home* differs.
+
+** Two separate features (keep Phase E and fix-speedrun distinct)
+- Good, because each proposal ships as written with no reconciliation work.
+- Bad, because the execution loop is duplicated in two places and will drift; a guardrail tightened in one won't reach the other. Two ways to do autonomous execution is two things to audit.
+- Neutral, because the input and session-mode differences are real — but they're thin caller-level differences, not a reason to fork the engine.
+
+** Autonomous-commit as the default
+- Good, because it's faster end-to-end with no diff to review.
+- Bad, because most projects lack the per-project waiver, and an unattended loop committing to a project that never opted in is exactly the failure the file-only default prevents. The blast radius of a bad autonomous commit is a revert plus lost trust in the loop.
+- Neutral, because the projects that *do* want it (=.emacs.d=, =rulesets=) opt in explicitly, so the capability is available where it's wanted without being the default everywhere.
+
+* Decisions [/]
+
+** TODO Where the eligibility gate reads its tag set
+- Owner / by-when: Craig / spec-review
+- Context: Phase E hardcoded =:next:= / =:quick:+:solo:=. Projects' priority/tag schemes vary, and the =todo-format.md= scheme header is the declared source of truth per project.
+- Decision: We will read the project's =todo.org= priority/tag scheme header to resolve the autonomous-safe tag set, defaulting to =:next:= OR =:quick:+:solo:= when the header doesn't declare an explicit autonomous-safe set.
+- Consequences: easier — one workflow works correctly across projects with different tag vocabularies; harder — a project with no scheme header (or a malformed one) needs a fallback, and the "default reading" has to be specified precisely enough that two projects agree on it.
+
+** TODO The do-not-auto-implement marker set
+- Owner / by-when: Craig / spec-review
+- Context: =VERIFY= means "awaiting Craig's manual confirmation" in =.emacs.d= and =rulesets=. Other projects may use =VERIFY= differently or not at all. The gate excludes =VERIFY=, =DOING=, =DONE=, =CANCELLED= by status, but the *marker semantics* are what matter.
+- Decision: We will define the do-not-auto-implement set as: any status that is not =TODO=, plus any task carrying a project-declared "hold" marker. The canonical default treats =VERIFY= as do-not-implement; a project overrides only by declaring its marker semantics in its scheme header.
+- Consequences: easier — the gate is portable and a project can't accidentally have its manual-check tasks auto-run; harder — requires the scheme header to carry marker semantics, which most don't yet, so the default has to be safe-by-omission (exclude anything not plainly =TODO=).
+
+** TODO Commit-autonomy opt-in mechanism
+- Owner / by-when: Craig / spec-review
+- Context: =file-only= is the default; =.emacs.d= and =rulesets= have a per-project waiver allowing autonomous commits. Where does the workflow *read* that a project has opted in?
+- Decision: We will read the opt-in from the project's existing per-project waiver location (the same place the commit discipline's "no approval gate" waiver lives — =notes.org= Workflow State or =CLAUDE.md=), not introduce a new config file.
+- Consequences: easier — no new config surface, reuses the existing waiver concept; harder — the waiver's exact location and format must be pinned so the workflow can detect it deterministically, and a project with the commit waiver but *not* wanting the loop to commit needs a way to say "waiver yes, loop-commit no" (two flags, not one).
+
+** TODO Run-cap default and the kill switch shape
+- Owner / by-when: Craig / spec-review
+- Context: The loop default is one task per run; fix-speedrun works an explicit list. Both need a hard ceiling a runaway can't exceed.
+- Decision: We will pass a hard per-run task cap from the caller (loop default 1; fix-speedrun = length of the explicit list, capped at a ceiling), and stop + page with the remainder when the cap is hit. v1 caps by task count, not token budget.
+- Consequences: easier — a simple integer the caller controls; bounded blast radius; harder — a task-count cap doesn't bound *cost* (one 30-min task can burn many tokens), so a token budget is vNext, and until then a pathological task can run long within a single cap slot.
+
+** TODO Metrics log location and format
+- Owner / by-when: Craig / spec-review
+- Context: Per-run metrics must land somewhere structured and queryable, per-project, and survive across sessions for the synthesis step to read.
+- Decision: We will append one JSONL record per task to a per-project log at =.ai/metrics/work-the-backlog.jsonl=, git-tracked, with the synthesis step reading the union across projects.
+- Consequences: easier — append-only JSONL is trivial to write and =jq=-queryable; per-project keeps it local to where the work happened; harder — a git-tracked log adds churn to every autonomous run's commit (or needs its own commit), and "union across projects" needs the synthesis step to know where every project's log lives.
+
+** TODO Synthesis cadence and trigger
+- Owner / by-when: Craig / spec-review
+- Context: Craig wants periodic org-roam articles summarizing the data. What triggers synthesis, and how often?
+- Decision: We will run synthesis on an explicit trigger ("synthesize backlog metrics") and optionally a weekly scheduled run, writing one KB node per synthesis under =~/org/roam/agents/= per the knowledge-base rule.
+- Consequences: easier — explicit trigger means no surprise writes, and the KB rule already governs node shape; harder — a weekly scheduled run needs a scheduler entry and the KB write-classification (personal-only) must gate it so work-project metrics never land in the KB.
+
+* Implementation phases
+
+** Phase 1 — Extract the execution loop into work-the-backlog.org
+Write =work-the-backlog.org= holding the eligibility gate, act-vs-file decision, per-task quality bar, and run-cap logic — taking a task set + session mode + cap as input. Remove the stopgap "Phase E" text from =inbox-zero.org= (restore it to its A-D shape) in the same change so there's one home, not two. Tree stays working: inbox-zero reverts to routing-only, and the new workflow is callable but not yet wired to the loop.
+
+** Phase 2 — Wire the two callers
+Add the loop caller's chain step (after inbox-zero Phase D, invoke work-the-backlog with the tag query + file-only + cap 1) and the "fix speedrun" preset (explicit list + autonomous-commit + always-push + paging-on). Both go through the same workflow. Tree stays working: each caller is independently testable.
+
+** Phase 3 — File-only vs autonomous-commit gate
+Implement the commit-autonomy branch: read the per-project waiver, degrade =autonomous-commit= to =file-only= when absent, surface the degrade. Tree stays working: default file-only behavior is the safe path even before the waiver-read lands.
+
+** Phase 4 — Guardrails and the page
+Implement the data-loss / design-decision refusal, the VERIFY-on-ambiguity filing, and the end-of-set =notify alarm ... --persist= page. Tree stays working: guardrails only ever *reduce* what runs, so adding them can't break a passing run.
+
+** Phase 5 — Metrics log
+Append the per-task JSONL record at each task outcome. Tree stays working: logging is a side effect that doesn't alter execution.
+
+** Phase 6 — Synthesis to org-roam
+Write the synthesis step: read the JSONL union, compute the per-run and trend metrics (below), write a KB node under =~/org/roam/agents/= per the knowledge-base rule, personal-projects-only classification enforced. Tree stays working: synthesis is read-only over the logs plus a KB write.
+
+* Acceptance criteria
+- [ ] =work-the-backlog.org= exists and is the only home for the execution loop; =inbox-zero.org= is back to its A-D routing-only shape with no Phase E.
+- [ ] The loop caller chains into work-the-backlog after routing; startup and wrap-up never invoke it.
+- [ ] "fix speedrun" runs as the preset (autonomous-commit + always-push + end-page) over an explicit ordered list, one commit per task.
+- [ ] A task tagged for autonomous execution but at status =VERIFY= / =DOING= / =DONE= / =CANCELLED= is skipped by the gate.
+- [ ] The eligibility tag set is read from the project's =todo.org= scheme header, not hardcoded.
+- [ ] In a project without the commit waiver, an =autonomous-commit= request degrades to file-only and says so; no commit is made.
+- [ ] A task carrying data-loss risk or needing a design decision is refused with a filed VERIFY, not implemented.
+- [ ] An underspecified / already-satisfied task files a VERIFY noting why and the run continues.
+- [ ] The run stops at the per-run cap and pages with the remaining tasks listed.
+- [ ] Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl=.
+- [ ] The synthesis step reads the logs and writes a KB node under =~/org/roam/agents/=; it refuses to write for work-classified projects.
+
+* Effectiveness measurement
+
+This section answers Craig's explicit ask: measure whether autonomous-batch execution is actually effective, and build the "gather data → org-roam articles" loop.
+
+** What "effective" means here
+
+The autonomy is effective if it completes real work that *stays* completed — i.e. tasks land green and the next session doesn't have to undo or fix them. The two failure modes to catch are (1) the loop defers everything (over-cautious, no value delivered) and (2) the loop implements badly (commits that get reverted or hand-corrected next session). Both are measurable.
+
+** Per-run metrics (the JSONL record)
+
+One record per task, appended to =.ai/metrics/work-the-backlog.jsonl= at each task outcome:
+
+| Field | Meaning |
+|-------------------+--------------------------------------------------------------------|
+| =ts= | ISO timestamp of the task outcome |
+|-------------------+--------------------------------------------------------------------|
+| =run_id= | UUID shared by all tasks in one run |
+|-------------------+--------------------------------------------------------------------|
+| =project= | project basename |
+|-------------------+--------------------------------------------------------------------|
+| =caller= | =loop= or =fix-speedrun= |
+|-------------------+--------------------------------------------------------------------|
+| =task= | task heading (slug) |
+|-------------------+--------------------------------------------------------------------|
+| =outcome= | implemented-committed / implemented-diff / deferred-verify / |
+| | deferred-too-large / skipped-ineligible |
+|-------------------+--------------------------------------------------------------------|
+| =defer_reason= | for deferrals: needs-input / data-loss / underspecified / too-large |
+|-------------------+--------------------------------------------------------------------|
+| =wall_clock_s= | seconds from task start to outcome |
+|-------------------+--------------------------------------------------------------------|
+| =commit_sha= | for committed tasks; empty otherwise |
+|-------------------+--------------------------------------------------------------------|
+| =review_findings= | count of /review-code Critical+Important findings on this task |
+|-------------------+--------------------------------------------------------------------|
+
+Per-run rollups computed at synthesis (not stored per record): tasks attempted, completed, VERIFY-deferred, reverted; wall-clock total; commits landed; review findings per commit.
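+
+Writing a record and computing the rollups both stay small. This is a minimal Python sketch that uses only the standard library. =append_record=, =rollup=, and =new_run_id= are hypothetical helpers; the spec leaves any =.ai/scripts/= helper undecided until Phase 6. Counting every record, skips included, as "attempted" is an assumption. Reverts are left out because they come from the corrections check, not from the log.
+
```python
import json
import uuid
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict

LOG_PATH = Path(".ai/metrics/work-the-backlog.jsonl")  # per-project, git-tracked


def new_run_id() -> str:
    return str(uuid.uuid4())  # shared by every record in one run


def append_record(log: Path, **fields: Any) -> Dict[str, Any]:
    """Append one task outcome as a single JSON line; ts is stamped here."""
    record = {"ts": datetime.now(timezone.utc).isoformat(timespec="seconds"), **fields}
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def rollup(log: Path) -> Dict[str, Any]:
    lines = log.read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    outcomes = Counter(r["outcome"] for r in records)
    commits = [r for r in records if r.get("commit_sha")]
    findings = sum(r.get("review_findings", 0) for r in commits)
    return {
        "attempted": len(records),  # assumption: every record, skips included
        "completed": outcomes["implemented-committed"] + outcomes["implemented-diff"],
        "verify_deferred": outcomes["deferred-verify"],
        "commits": len(commits),
        "findings_per_commit": findings / len(commits) if commits else 0.0,
        "wall_clock_s": sum(r.get("wall_clock_s", 0) for r in records),
    }
```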
+
+** The corrections signal (the key metric)
+
+The hardest and most valuable metric is *human corrections in the following session*: did Craig revert or hand-fix an autonomous commit? v1 captures a cheap proxy. At synthesis, for each =commit_sha=, check whether a later commit within N days reverts it, is a "fix" of that change, or touches the same files. A clean run is one where the autonomous commits survive untouched. (Precisely auto-detecting that "this later commit corrected that autonomous one" is a vNext refinement; the reverted-or-touched-soon-after proxy is good enough to flag a problem run for human review.)
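+
+The proxy needs only each commit's time, touched files, and message, which =git log --name-only= can supply. This is a minimal Python sketch over already-parsed history. =Commit= and =correction_flags= are hypothetical, and =window_days= stands in for the unspecified N. A hit is a flag for human review, not a verdict.
+
```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Commit:
    sha: str
    when: datetime
    files: FrozenSet[str]
    message: str  # full message, so a "This reverts commit <sha>." body is visible


def correction_flags(
    autonomous_shas: Iterable[str], history: List[Commit], window_days: int
) -> Dict[str, List[str]]:
    """Map each autonomous sha to later commits within the window that may correct it."""
    by_sha = {c.sha: c for c in history}
    window = timedelta(days=window_days)
    flags: Dict[str, List[str]] = {}
    for sha in autonomous_shas:
        original = by_sha[sha]
        suspects = [
            c.sha
            for c in history
            if original.when < c.when <= original.when + window
            and (bool(c.files & original.files) or sha in c.message)
        ]
        if suspects:
            flags[sha] = suspects
    return flags
```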
+
+** Where the data lands
+
+Per-project git-tracked JSONL at =.ai/metrics/work-the-backlog.jsonl=. Append-only, =jq=-queryable, survives across sessions and machines via the normal project sync. Git-tracked so the history is auditable and the synthesis step can read it from any clone.
+
+** The synthesis loop (gather → article)
+
+On the "synthesize backlog metrics" trigger (and optionally a weekly scheduled run):
+
+1. Read the JSONL union across the personal projects the synthesizer can see.
+2. Compute the rollups and the trend: completion rate over time, defer-reason distribution, review-findings-per-commit trend, and the corrections-signal flag count.
+3. Write one org-roam KB node under =~/org/roam/agents/YYYYMMDDHHMMSS-backlog-metrics-<window>.org= per the knowledge-base rule — filetags =:agent:metrics:=, a concise title, the rollup table, the trend narrative, and =[[id:...]]= links to prior synthesis nodes so the series is traceable.
+4. Enforce the KB write-classification: *personal projects only*. A work-classified project's metrics never write to the KB — they stay in that project's own =.ai/metrics/= log and the synthesizer reports the refusal per the KB refusal contract.
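+
+Steps 3 and 4 amount to naming the node, rendering it, and refusing anything not classified as personal. This is a minimal Python sketch. The =:PROPERTIES:= drawer with an =:ID:= follows the usual org-roam file-node convention, but the knowledge-base rule's exact node shape is not reproduced here, so treat the layout as an assumption. =node_path=, =render_node=, and =partition_projects= are hypothetical. Unknown classifications are refused, so the gate fails safe.
+
```python
import uuid
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Tuple

KB_DIR = Path("~/org/roam/agents").expanduser()


def node_path(now: datetime, window: str, kb_dir: Path = KB_DIR) -> Path:
    return kb_dir / f"{now:%Y%m%d%H%M%S}-backlog-metrics-{window}.org"


def render_node(
    title: str,
    rollups: Dict[str, object],
    narrative: str,
    prior_ids: List[str],
    node_id: Optional[str] = None,
) -> str:
    lines = [
        ":PROPERTIES:",
        f":ID:       {node_id or uuid.uuid4()}",
        ":END:",
        f"#+title: {title}",
        "#+filetags: :agent:metrics:",
        "",
        "| Metric | Value |",
        "|--------+-------|",
        *(f"| {name} | {value} |" for name, value in rollups.items()),
        "",
        narrative,
        "",
        # Link back to earlier synthesis nodes so the series stays traceable.
        *(f"- Previous: [[id:{pid}]]" for pid in prior_ids),
    ]
    return "\n".join(lines) + "\n"


def partition_projects(classification: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split projects into (may write to the KB, refused). Only "personal" may write."""
    allowed = sorted(p for p, c in classification.items() if c == "personal")
    refused = sorted(p for p, c in classification.items() if c != "personal")
    return allowed, refused
```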
+
+The KB node is the artifact Craig reviews later — "are the autonomous runs completing more and getting corrected less over the last month?" reads off the trend table without re-querying raw logs.
+
+* Readiness dimensions
+
+- *Data model & ownership:* The task set is read from =todo.org= (project-owned, user-authored). The metrics JSONL is generated, append-only, git-tracked, project-owned. KB nodes are agent-generated under =~/org/roam/agents/= (never overwriting Craig's hand-authored nodes — link only). No editable region is co-owned.
+- *Errors, empty states & failure:* Empty task set → report "nothing eligible" and stop. Malformed scheme header → fall back to the default tag reading and surface the fallback. A task that fails mid-implementation → leave the tree working (don't commit a broken state), record the failure outcome, surface it, continue to the next task. No silent data loss: the data-loss guardrail refuses irreversible tasks outright.
+- *Security & privacy:* Tasks touching credentials or external mutations are excluded by the data-loss / external-state guardrail. The KB write is personal-projects-only; work metrics never leave the project. No secrets in the JSONL (task slugs and SHAs only).
+- *Observability:* The end-of-set page surfaces the run outcome. The per-task surface (implemented / deferred + reason / skipped) is the live progress view. The metrics log + KB synthesis is the long-run observability. A bad run is isolable from the JSONL (which task, which outcome, which review findings).
+- *Performance & scale:* Expected counts are small — a handful of tasks per run, one run per 30-min tick. No bottleneck at this scale. The cap bounds the worst case. Synthesis over months of JSONL is still a small file (one record per task).
+- *Reuse & lost opportunities:* Reuses =todo-format.md= for task close, =/review-code= and =/voice personal= for the quality bar, =notify= for paging, the knowledge-base rule for KB writes, the per-project waiver for commit-autonomy. No new config file (the opt-in rides the existing waiver). The execution loop is the one new shared asset.
+- *Architecture fit & weak points:* Integration points — inbox-zero loop caller (chain after Phase D), the per-project waiver location, =todo.org= scheme header, =~/org/roam/agents/=. Weak point: the commit-autonomy gate depends on deterministically reading the waiver; mitigated by defaulting to file-only when the read is ambiguous (fail safe, not open). Second weak point: a 30-min loop committing unattended; mitigated by the hard cap and file-only default.
+- *Config surface:* Per-project — commit-autonomy opt-in (via existing waiver), optional loop-commit flag, optional autonomous-safe tag override in the scheme header. Per-call — task set, session mode, run cap. Defaults: file-only, paging-off (loop) / paging-on (fix-speedrun), cap 1 (loop).
+- *Documentation plan:* The workflow file itself is the user/operator doc (matches inbox-zero.org's self-documenting style). The =.emacs.d= stopgap note and the fix-speedrun proposal are superseded by this spec; no separate migration doc needed beyond removing the Phase E text.
+- *Dev tooling:* N/A for new build targets — the workflows are prose, exercised by invocation. The metrics JSONL is =jq=-inspectable by hand; a tiny rollup helper may be added under =.ai/scripts/= if the synthesis prose proves to need it (decided at Phase 6, not a v1 prerequisite).
+- *Rollout, compatibility & rollback:* Rollout is removing Phase E from inbox-zero and adding work-the-backlog — both prose changes, instantly reversible. Compatibility: inbox-zero's three callers are unchanged except the loop caller gaining a forward chain. Rollback: delete work-the-backlog and the loop chain step; inbox-zero is already back to A-D. The file-only default means the worst pre-rollback state is surfaced diffs, not committed changes.
+- *External APIs & deps:* =notify alarm "Page" "<msg>" --persist= verified against =/home/cjennings/.local/bin/notify= and the page-me workflow. =~/org/roam/= KB write path and node shape verified against the knowledge-base rule. No external API calls.
+
+* Risks, Rabbit Holes, and Drawbacks
+
+- *The corrections signal is a proxy, not ground truth.* "A later commit touched the same files" over-counts (legitimate follow-up work) and under-counts (a correction in a different file). It's a flag for human review, not a verdict. Don't rabbit-hole on making it precise in v1 — the proxy plus a human glance is the design.
+- *Waiver detection drift.* If the per-project waiver location moves or its format changes, the commit-autonomy gate could mis-read. Mitigation: fail safe to file-only. Pin the waiver format in the Phase 3 decision before building.
+- *Unattended-commit blast radius.* The headline risk. Mitigated three ways: file-only default, the hard cap, and the data-loss guardrail. The metrics loop is the fourth layer — it makes a bad run visible after the fact even if the first three let something through.
+- *Scope creep into /start-work territory.* The temptation is to let the "≤ 30 min" bound stretch. The act-vs-file gate and the "when unsure, file" rule are the brake; keep them strict.
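+
+The corrections proxy can be sketched with plain =git=, assuming the run records each autonomous commit's SHA. The =later_touches= helper name is hypothetical. It lists later commits on the current branch that touched any file the autonomous commit changed. As the first risk above notes, a hit prompts a human glance rather than delivering a verdict, and a correction made in a different file will not show up.
+
```shell
# Hypothetical helper: later commits touching any file changed by commit $1.
later_touches() {
  sha=$1
  # Files the autonomous commit changed (--root also handles a first commit).
  files=$(git diff-tree --root --no-commit-id --name-only -r "$sha")
  [ -n "$files" ] || return 0
  # Commits after $sha on this branch that touched any of those files.
  # -z / xargs -0 keeps paths with spaces intact.
  git diff-tree --root --no-commit-id --name-only -r -z "$sha" |
    xargs -0 git log --format='%h %s' "$sha..HEAD" --
}
```
+
+Run from the project root some time after a batch, =later_touches <sha>= prints one =<short-sha> <subject>= line per follow-up commit. An empty result means no later commit touched those files.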
+
+* Testing / Verification / Rollout
+
+Verification is by invocation against a project's real =todo.org=:
+- Run the loop caller in file-only mode and confirm it surfaces diffs without committing.
+- Run fix-speedrun against a small explicit list in a waiver-carrying project and confirm one commit per task plus the end page.
+- Plant a =VERIFY=-status task and a data-loss task and confirm both are skipped/refused.
+- Confirm the JSONL grows one record per task.
+- Run synthesis and confirm a KB node lands (personal project) or is refused (work project).
+
+Rollout is the Phase 1-6 sequence, each phase leaving the tree working; the file-only default makes early phases safe to ship before the commit and paging phases land.
+
+* References / Appendix
+
+- [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal (inbox-zero stopgap)]] and [[file:../../working/inbox-zero-phase-e/sender-note.org][its sender note with the 5 open questions]].
+- [[file:2026-06-15-fix-speedrun-workflow-proposal.org][fix-speedrun proposal]].
+- [[file:../../.ai/workflows/inbox-zero.org][inbox-zero.org (canonical, A-D)]] — the routing workflow this feature decouples from.
+- =~/code/rulesets/claude-rules/knowledge-base.md= — the org-roam write contract the synthesis step follows.
+
+* Review and iteration history
+** 2026-06-16 Tue — author
+- What: initial draft reconciling the Phase E and fix-speedrun proposals into one work-the-backlog.org feature, plus the effectiveness-measurement instrumentation.
+- Why: two overlapping proposals arrived within a day; building them separately would duplicate the execution loop and let it drift. Craig also asked explicitly for measurement + org-roam synthesis.
+- Artifacts: this spec; the two source proposals under docs/design/ and working/inbox-zero-phase-e/.
diff --git a/docs/design/2026-06-16-encourage-kb-contribution-spec.org b/docs/design/2026-06-16-encourage-kb-contribution-spec.org
new file mode 100644
index 0000000..059326c
--- /dev/null
+++ b/docs/design/2026-06-16-encourage-kb-contribution-spec.org
@@ -0,0 +1,188 @@
+#+TITLE: Encourage Org-Roam KB Contribution Across Workflows — Spec
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-06-16
+#+TODO: TODO | DONE SUPERSEDED CANCELLED
+
+* Metadata
+| Status | draft |
+|----------+------------------------------------------------|
+| Owner | Craig Jennings |
+|----------+------------------------------------------------|
+| Reviewer | Craig Jennings |
+|----------+------------------------------------------------|
+| Date | 2026-06-16 |
+|----------+------------------------------------------------|
+| Related | [[file:../../todo.org][rulesets todo.org]] |
+|----------+------------------------------------------------|
+
+* Summary
+
+The org-roam KB already exists (=knowledge-base.md=: =~/org/roam/agents/=, =:agent:= filetag, capture-then-promote, personal-vs-work write boundary), but nothing in the daily workflow loop encourages agents to use it. The wrap-up's =KB: promoted N / consulted yes-no= receipt is the only touchpoint, and it fires at the very end when the session's learnings have already faded. This feature wires four light prompts into the synced template workflows — startup, triage-intake, inbox-zero, wrap-it-up — plus one curated best-practices node in the KB, so contributing durable knowledge becomes a habit the workflows nudge rather than a rule agents forget.
+
+* Problem / Context
+
+The KB rule is sound but passive. An agent reads =knowledge-base.md= once at rule-load and then never gets reminded to consult or contribute, so the KB stays nearly empty and never reaches the critical mass where consulting it pays off. The compounding asset Craig wants — a cross-project store that gets more valuable as it grows — needs a contribution habit, and habits in this system come from workflow prompts, not from a rule sitting in the background.
+
+Three gaps:
+
+1. *No quality guidance.* =knowledge-base.md= says what goes in (durable facts) and where (=agents/= nodes), but not /how/ to write a good node — atomic, descriptively titled, linked. An agent following the rule literally can still produce a junk drawer of vague, unlinked notes that no future agent can find or trust.
+2. *No mid-session capture prompts.* Triage-intake and inbox-zero both surface durable signal (a recurring pattern across messages, a reference pointer worth keeping) and then drop it. Nothing tells the agent "that was worth a node."
+3. *The only contribution prompt is too late.* Wrap-up's KB promotion check runs in Step 1, after the session, when the agent is reconstructing learnings from the log rather than capturing them while fresh.
+
+* Goals and Non-Goals
+
+** Goals
+- Curate a best-practices node in the KB that teaches agents how to write good nodes, drawing on established note-taking guidance.
+- Link that node from startup with a light, one-line encouragement to contribute through the session.
+- Add a short end-of-flow KB reminder to triage-intake and inbox-zero.
+- Add an early wrap-up prompt that asks what the agent learned worth remembering, feeding the existing =KB: promoted N= receipt.
+- Keep every prompt light and non-blocking — encouragement, never a gate.
+
+** Non-Goals
+- *Not* changing =knowledge-base.md='s write boundary, schema, or the work/personal classification. The feature builds on that rule unchanged.
+- *Not* adding a blocking gate anywhere. No workflow stalls or fails because a node wasn't written.
+- *Not* automating node creation. The agent decides what's durable; the prompts only ask the question.
+- *Not* a second receipt or metric. Wrap-up's =KB: promoted N / consulted yes-no= line stays the single instrumentation point.
+- *Not* changing the schema of wrap-up's existing Step 1 KB-promotion sub-section. The new early prompt /feeds/ it; it doesn't replace it.
+
+** Scope tiers
+- v1: the four workflow edits + the one curated best-practices node. All synced templates, so the edits propagate to every project on next startup.
+- Out of scope: a contribution-rate dashboard, per-project KB stats, auto-suggesting nodes from session content.
+- vNext: a "consult the KB before this task" prompt in start-work / spec-create (deferred — log to todo.org).
+
+* Design
+
+The feature is four small prompt insertions plus one authored artifact. The design work is mostly about /placement/ and /wording/: these are synced templates, so a prompt that reads as nagging gets paid forward to every project on every run. The governing constraint is "light enough that an agent welcomes it, specific enough that it actually fires."
+
+** The best-practices node (the artifact)
+
+The node lives at =~/org/roam/agents/<timestamp>-agent-kb-best-practices.org=, authored by hand (not agent-generated), with the standard =:agent:reference:= filetags so it's a first-class KB node agents can find by the same =rg= the rule already documents. It is the one node startup links to, and the substance the workflow prompts point at instead of re-explaining note-taking inline.
+
+Its content is curated from the established note-taking literature — Sönke Ahrens' systematization of Luhmann's Zettelkasten, Andy Matuschak's evergreen-notes practice, and the org-roam community's own guidance — distilled to the handful of principles that matter for an /agent/ writing /durable facts/, not a human building a thinking environment. Proposed outline:
+
+1. *Why the KB exists* — one paragraph: a cross-project, cross-machine asset that compounds. Consulting it saves re-deriving; contributing to it pays the next agent forward.
+2. *One idea per node (atomicity).* Each node holds a single durable fact. Atomicity is what makes a note linkable and findable — a node about three things links cleanly to none of them. (Ahrens; zettelkasten.de atomicity guide.)
+3. *Descriptive, declarative titles.* The title states the claim, not the topic: "SSH auth routes through gpg-agent with a separate cache TTL" beats "SSH notes." A title you can read as a standalone statement is one a future agent can scan and trust without opening the node. (Matuschak evergreen notes; org-roam community practice.)
+4. *Link liberally.* Use =[[id:...]]= to connect a new node to related ones; the value is in the network, not the isolated note. Link to Craig's hand-authored nodes, never edit them. (Matuschak "densely linked"; the linking principle.)
+5. *Capture, then promote.* Harness memory is the fast capture layer; the KB is for facts that cleared the durability bar. Don't promote everything — promote what transfers. (Mirrors =knowledge-base.md='s capture-then-promote.)
+6. *What goes in / what stays out.* Restate the rule's inclusion bar tersely (durable, cross-project, the why behind a decision, environment gotchas, reference pointers) and the exclusion bar (session state, task state, high-churn facts, secrets, anything the repo already records).
+7. *The write boundary.* One line pointing at =knowledge-base.md=: personal projects only, work and unknown projects never write — with the refusal contract. The node /defers/ to the rule here rather than restating the denylist, so there's one source of truth for the boundary.
+8. *Sources.* The citations below, as a reference footer.
+
+Two-altitude note: for a /reading/ agent the node is "how do I tell a good node from a bad one before I trust it?"; for a /writing/ agent it's "what shape should this fact take before I commit it?" The outline serves both — principles 2-4 are the writing checklist, 6-7 are the reading/eligibility filter.
+
+** The four workflow prompts (placement + wording)
+
+Each is the minimum that fires reliably without nagging. Exact insertion points are in Implementation phases below; the design rationale per prompt:
+
+- *Startup (link + light encouragement).* Startup already reads =notes.org= and surfaces nudges in Phase C. The KB encouragement rides there as one line, not a new phase: it points at the best-practices node and frames the session's contribution as welcome, not required. It fires once per session at the top, setting the frame that the other three prompts build on.
+- *Triage-intake (end-of-flow reminder).* Placed at the very end of Phase D / Exit Criteria, after actions ship — the moment the agent has just seen a sweep's worth of signal and might recognize a durable pattern. One line, conditional in spirit ("if anything here was durable…"), never a blocking step before close-out.
+- *Inbox-zero (end-of-flow reminder).* Same shape, placed in Phase D (Surface) after the moved/folded/dropped report — the agent has just triaged a batch and may have spotted a reference pointer worth keeping.
+- *Wrap-up (early prompt feeding the existing receipt).* Placed at the /start/ of Step 1, before the Summary is finalized, while the session is fresh — "what did you learn worth remembering, for yourself or a future agent?" The answer flows into the existing Step 1 KB-promotion sub-section and its =KB: promoted N / consulted yes-no= receipt. The early prompt and the existing check are one pipeline: the prompt captures while fresh, the existing sub-section does the promotion and writes the receipt. No second receipt.
+
+** How the early wrap-up prompt feeds the existing receipt
+
+The existing wrap-up Step 1 already has a "KB promotion check" sub-section that asks the promotion question and writes =KB: promoted N / consulted yes-no=. The new early prompt is not a second check — it's a /relocation of the asking/ to the top of Step 1 so the question lands while the session is fresh rather than after the Summary is reconstructed. The existing sub-section keeps ownership of the actual promotion (writing the =agents/= nodes per schema) and the receipt line. Concretely: the early prompt asks and collects candidate facts into the session's working notes; the existing sub-section consumes those candidates, writes the nodes, and emits the one receipt. This avoids duplication by making the early prompt a /capture/ step and the existing check the /commit + receipt/ step of the same pipeline.
+
+* Alternatives Considered
+
+** A blocking gate ("you must write ≥1 node to wrap up")
+- Good, because it would guarantee contributions and grow the KB fast.
+- Bad, because it manufactures junk — agents would write a throwaway node to clear the gate, polluting exactly the asset the feature is meant to grow. It also fights the "light, non-nagging" constraint head-on.
+- Neutral, because the receipt already gives visibility into contribution rate without forcing it.
+
+** Inlining the best-practices guidance into each workflow prompt
+- Good, because the guidance is right there at the point of use; no indirection.
+- Bad, because it's four copies of the same note-taking advice in four synced templates — duplication that drifts, and four times the prompt length, which reads as nagging. One linked node keeps each prompt to one line.
+- Neutral, because a one-node-plus-links shape is exactly what the best-practices node /teaches/, so the design eats its own dogfood.
+
+** Putting the encouragement only in =knowledge-base.md= (no workflow edits)
+- Good, because it's the least change — one rule edit, no template churn.
+- Bad, because that's the status quo that produced the problem: a rule read once at load and then forgotten. Habits in this system come from workflow prompts, not background rules.
+- Neutral, because the rule still carries the authoritative boundary; the workflow prompts are the habit layer on top.
+
+* Decisions [/]
+
+** TODO Where exactly does the startup link land — Phase A read, Phase C nudge, or notes.org?
+- Owner / by-when: Craig / before implementation
+- Context: Startup has three candidate homes for the KB encouragement: a Phase A parallel read of the best-practices node (costs context every session), a Phase C surfaced nudge (one line, conditional, consistent with the existing roam-inbox and task-review nudges), or a static line in each project's =notes.org= Active Reminders (per-project, not synced, drifts). The Phase C nudge matches the established nudge pattern and costs nothing when there's nothing to say.
+- Decision: We will add the encouragement as a one-line Phase C nudge in startup.org, pointing at the best-practices node by its KB path, surfaced once near the other Phase C nudges.
+- Consequences: easier — consistent with existing nudge mechanics, synced to every project, no per-session read cost; harder — one more line competing for attention in the Phase C surface, so the wording has to earn its place and stay terse.
+
+** TODO Is the startup nudge unconditional, or gated on the KB clone being present?
+- Owner / by-when: Craig / before implementation
+- Context: =~/org/roam/= isn't on every machine. The existing roam-inbox nudge already guards on the clone's presence (=[ -f ~/org/roam/inbox.org ]=). An unconditional KB nudge would fire on machines where the agent can't act on it.
+- Decision: We will gate the startup nudge on the roam clone being present, reusing the existing presence check, so the encouragement only appears where the agent can act on it.
+- Consequences: easier — no dead nudge on KB-less machines, mirrors the roam-inbox guard; harder — one more conditional in Phase C, and a machine without the clone gets no encouragement at all (acceptable — it can't contribute there anyway).
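+
+A minimal sketch of the gated nudge, assuming a shell-style check like the existing roam-inbox guard. =ROAM_DIR= is a hypothetical override that exists only so the guard can be exercised against a scratch directory. The message is placeholder copy, not the final line, and the node's timestamped filename is left as a glob because the spec doesn't fix it.
+
```shell
# Hypothetical startup nudge; ROAM_DIR and the message text are placeholders.
kb_nudge() {
  roam=${ROAM_DIR:-$HOME/org/roam}
  # Same presence test as the existing roam-inbox nudge; silent when absent.
  [ -f "$roam/inbox.org" ] || return 0
  # One line only; the glob is printed literally, not expanded.
  echo "Durable learnings are welcome in the org-roam KB this session; see $roam/agents/*-agent-kb-best-practices.org for how to write a good node."
}
```
+
+On a machine without the clone the function prints nothing, which is the "no dead nudge" consequence above.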
+
+** TODO Does the early wrap-up prompt stop and ask Craig, or self-answer silently?
+- Owner / by-when: Craig / before implementation
+- Context: Wrap-up is meant to be quick: Craig already authorized the wrap, and the existing KB-promotion check self-answers (the agent decides what's durable; work projects skip the write). An early prompt that /stops and asks Craig/ "what did you learn?" would add an interactive turn to a flow designed to have none. But a purely silent self-answer risks the agent skipping the reflection.
+- Decision: We will have the agent self-answer the early prompt — reflect on session learnings and stage candidate facts — without stopping to ask Craig, matching the wrap-up's no-extra-turns design; the candidates flow into the existing promotion check which writes the nodes and receipt.
+- Consequences: easier — preserves wrap-up cadence, no new interactive gate, one pipeline from reflect to receipt; harder — relies on the agent actually reflecting rather than rubber-stamping "nothing learned," which the receipt makes visible over time but doesn't enforce.
+
+** TODO Do triage-intake and inbox-zero reminders fire every run, or only when the run surfaced something durable?
+- Owner / by-when: Craig / before implementation
+- Context: Both workflows run frequently (triage-intake between meetings, inbox-zero twice a session). A reminder on /every/ run is the textbook nag-fatigue failure — a line the agent learns to skip. A reminder gated on "this run surfaced a pattern / reference pointer worth keeping" fires rarely and stays meaningful, but requires the agent to make that judgment, which is softer than a mechanical condition.
+- Decision: We will make both reminders conditional in spirit — a single line phrased as "if anything here was durable, write it to the KB" that the agent acts on only when the run actually surfaced something, rather than an unconditional step; an all-quiet triage sweep or an empty inbox-zero run emits no KB line.
+- Consequences: easier — the reminder stays rare and credible, never pads a no-change sweep, fits triage-intake's deltas-only discipline; harder — "durable-looking" is an agent judgment with no mechanical check, so the reminder's effectiveness rides on the best-practices node teaching that judgment well.
+
+** TODO Best-practices node: agent-authored once, or hand-authored by Craig?
+- Owner / by-when: Craig / before implementation
+- Context: =knowledge-base.md= says agents never edit Craig's hand-authored nodes. The best-practices node is /about/ how agents write nodes — if an agent authors it, future agents may treat it as fair game to edit; if Craig hand-authors it, it's protected and stable but he writes it. Given it's a foundational reference the whole feature points at, stability matters.
+- Decision: We will have Craig hand-author the best-practices node from the outline in this spec, so it's a protected, stable reference; the Design outline supplies the sections and sources he writes from before reviewing and committing it.
+- Consequences: easier — the node is stable and protected from agent edits, one authoritative reference; harder — Craig writes (or reviews-and-commits) it rather than delegating, and updates to it are his call, not an agent's.
+
+* Implementation phases
+
+** Phase 1 — Author the best-practices node
+Write =~/org/roam/agents/<timestamp>-agent-kb-best-practices.org= from the outline in Design, with a generated =:ID:= in the top property drawer, =#+title:=, =#+filetags: :agent:reference:=, the eight outline sections (the last being the sources footer), and =[[id:...]]= links to any existing related =:agent:= nodes. Commit + push the roam repo per =knowledge-base.md='s session discipline. Leaves the KB with one new reference node and nothing else touched.
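+If it helps, the file-node skeleton can be scaffolded before Craig writes the content. This is a hypothetical helper: =new_kb_node= and the placeholder =#+title:= are illustrative, not existing tooling. It follows the usual org-roam file-node shape, with an =:ID:= property drawer at the top, then =#+title:= and =#+filetags:=, and one empty heading per outline section left for hand-authoring. The ID is random UUID-shaped hex rather than a strict v4 UUID.
+
```shell
# Hypothetical scaffold: writes an empty best-practices node into directory $1
# and prints its path. Headings are left empty for Craig to hand-author.
new_kb_node() {
  dir=$1
  ts=$(date +%Y%m%d%H%M%S)
  file="$dir/$ts-agent-kb-best-practices.org"
  # 16 random bytes as 32 hex chars, split 8-4-4-4-12 (UUID-shaped, not strict v4).
  hex=$(od -An -tx1 -N16 /dev/urandom | tr -d ' \n')
  id=$(printf '%s\n' "$hex" |
    sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
  {
    # File-level node: property drawer first, then title and filetags.
    printf '%s\n' ':PROPERTIES:' ":ID:       $id" ':END:'
    printf '%s\n' '#+title: Agent KB best practices' '#+filetags: :agent:reference:' ''
    # One empty heading per outline section (printf repeats the format per arg).
    printf '* %s\n' 'Why the KB exists' 'One idea per node' \
      'Descriptive, declarative titles' 'Link liberally' 'Capture, then promote' \
      'What goes in / what stays out' 'The write boundary' 'Sources'
  } > "$file"
  printf '%s\n' "$file"
}
```
+
+Running =new_kb_node ~/org/roam/agents= prints the new path, and the acceptance check =rg '#\+filetags:.*:agent:' ~/org/roam/= should then list it.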
+
+** Phase 2 — Wire the startup encouragement
+Add the one-line Phase C nudge to =claude-templates/.ai/workflows/startup.org= (canonical side), gated on the roam-clone presence check, pointing at the node by path. Run =scripts/sync-check.sh --fix=, commit both canonical + mirror. Propagates to every project on next startup.
+
+** Phase 3 — Wire the three remaining prompts
+Add the end-of-flow KB reminder to =triage-intake.org= (end of Phase D / Exit Criteria) and =inbox-zero.org= (Phase D Surface), and the early KB prompt to =wrap-it-up.org= (top of Step 1, feeding the existing promotion check). All on the canonical side, then sync-check + commit. Each edit is one short block; the tree stays working after each.
+
+** Phase 4 — Verify propagation + receipt linkage
+Confirm the four edits survive a startup sync into a test project, the wrap-up early prompt's output reaches the existing =KB: promoted N / consulted yes-no= receipt (no duplicate receipt), and the best-practices node is reachable by the =rg= the rule documents.
+
+* Acceptance criteria
+- [ ] Best-practices node exists at =~/org/roam/agents/= with =:agent:reference:= tags, is found by =rg '#\+filetags:.*:agent:' ~/org/roam/=, and cites its sources.
+- [ ] Startup surfaces a single KB-contribution line in Phase C, gated on the roam clone, pointing at the node — and stays silent when the clone is absent.
+- [ ] Triage-intake and inbox-zero each emit one KB reminder line only when the run surfaced something durable; an all-quiet run emits none.
+- [ ] Wrap-up asks the "what did you learn?" reflection early in Step 1, and its candidates feed the existing promotion check — producing exactly one =KB: promoted N / consulted yes-no= receipt, not two.
+- [ ] No workflow blocks, stalls, or fails because a node wasn't written.
+- [ ] All four workflow edits are on the canonical =claude-templates/.ai/= side, mirror synced, sync-check clean.
+
+* Readiness dimensions
+- Data model & ownership: KB nodes are agent-written under =agents/=; the best-practices node is Craig-authored and protected. No new persisted state beyond the one node and the four template edits. Wrap-up receipt ownership unchanged.
+- Errors, empty states & failure: roam clone absent → all KB prompts silently no-op (reuse existing presence guards). Work/unknown project → write boundary in =knowledge-base.md= still refuses with its contract; prompts fire but the agent declines to write per the rule. No silent data loss — nothing is deleted.
+- Security & privacy: no secrets in nodes (rule's exclusion bar). Work-confidential facts never written (the boundary). The best-practices node is reference-only, no sensitive content.
+- Observability: the existing =KB: promoted N / consulted yes-no= receipt is the single metric; grepping session archives for =KB:= answers "are agents using this?" No new instrumentation added.
+- Performance & scale: four one-line prompts; negligible. The startup nudge is a Phase C surface line, not a Phase A read, so no per-session context cost from loading the node.
+- Reuse & lost opportunities: reuses the existing Phase C nudge pattern, the roam-clone presence guard, the wrap-up promotion check + receipt, and =knowledge-base.md='s boundary. Nothing reinvented.
+- Architecture fit & weak points: the four workflows are synced templates; canonical-vs-mirror edit discipline applies (CLAUDE.md). Weak point — nag fatigue if the reminders fire unconditionally; mitigated by the conditional-in-spirit decision. Weak point — the reminders rely on agent judgment ("durable-looking"); mitigated by the best-practices node teaching that judgment.
+- Config surface: none. No new knobs; the prompts are fixed copy, gated only on the existing roam-clone check (the triage-intake and inbox-zero reminders are conditional on agent judgment, not on a setting).
+- Documentation plan: the best-practices node /is/ the user-facing doc. =knowledge-base.md= stays the authoritative rule; this feature adds no new rule file. No migration doc needed.
+- Dev tooling: =scripts/sync-check.sh --fix= keeps canonical + mirror aligned (enforced by =githooks/pre-commit=). =make test= covers the repo's existing gates; no new test target needed for prose-only workflow edits.
+- Rollout, compatibility & rollback: edits propagate via the startup rsync to every project on next session — no migration. Rollback is reverting the four template edits + deleting the node; nothing persisted depends on them. Fully reversible.
+- External APIs & deps: none — no API calls, no new dependencies. The only external surface is the =~/org/roam/= git repo, already in use by the rule.
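+
+The "grep session archives for =KB:=" check can be sketched as a small tally. The archive directory is a hypothetical argument because the spec doesn't name it, and the sketch assumes the receipt is written literally as =KB: promoted <N> / consulted yes= or =... consulted no=. It prints how many sessions carried a receipt, the total nodes promoted, and how many sessions consulted the KB.
+
```shell
# Hypothetical tally over plain-text session archives under directory $1.
kb_receipt_tally() {
  # -o prints only the receipt text; fields: $3 = N, $6 = yes/no.
  grep -rhoE 'KB: promoted [0-9]+ / consulted (yes|no)' "$1" |
    awk '{ sessions++; promoted += $3; if ($6 == "yes") consulted++ }
         END { printf "sessions=%d promoted=%d consulted=%d\n", sessions, promoted, consulted }'
}
```
+
+Sessions whose receipt keeps reading =promoted 0= are the rubber-stamping signal the wrap-up decision says the receipt makes visible.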
+
+* Risks, Rabbit Holes, and Drawbacks
+- *Nag fatigue* — the central risk. Four prompts across four frequently-run workflows can train agents to skip them. Dodge: one line each, conditional in spirit, the startup line gated, the triage/inbox reminders firing only on real signal. If the receipt shows agents tuning them out, cut the lowest-value prompt rather than adding more.
+- *Junk-node accumulation* — encouraging contribution without a quality bar grows a junk drawer. Dodge: the best-practices node /is/ the quality bar, and the exclusion list keeps high-churn / session-state facts out. Craig prunes at will (the rule already grants this).
+- *Receipt double-counting* — if the early wrap-up prompt writes its own receipt, the metric breaks. Dodge: the early prompt is explicitly a capture step feeding the existing check; only the existing sub-section emits the receipt. Acceptance criterion guards this.
+
+* References / Appendix
+Sources for the best-practices node's curated content:
+- Sönke Ahrens, /How to Take Smart Notes/ — atomicity, own-words, linking: [[https://www.soenkeahrens.de/en/takesmartnotes][soenkeahrens.de]]; principle of atomicity: [[https://zettelkasten.de/atomicity/guide/][zettelkasten.de atomicity guide]].
+- Andy Matuschak, /Evergreen notes/ — concept-oriented, densely linked, write for yourself: [[https://notes.andymatuschak.org/Evergreen_notes_should_be_concept-oriented][notes.andymatuschak.org]].
+- Org-roam community practice — declarative titles, atomic nodes, capture-then-refine: [[https://www.orgroam.com/manual.html][Org-roam manual]]; [[https://lucidmanager.org/productivity/taking-notes-with-emacs-org-mode-and-org-roam/][lucidmanager.org org-roam guide]].
+- Existing rule this builds on: =~/code/rulesets/claude-rules/knowledge-base.md=.
+
+* Review and iteration history
+** 2026-06-16 Tue — author
+- What: initial draft.
+- Why: Craig wants the org-roam KB to compound into a cross-project asset; needs the workflow wiring + curated best-practices node speced before building.
+- Artifacts: this spec; four target workflows (startup, triage-intake, inbox-zero, wrap-it-up); =knowledge-base.md=.
diff --git a/todo.org b/todo.org
index 4b9f452..481eb83 100644
--- a/todo.org
+++ b/todo.org
@@ -34,36 +34,7 @@ Tags are assigned and refreshed by =task-audit=; =task-review= keeps them honest
* Rulesets Open Work
-** VERIFY [#B] Parked: Phase E (autonomous task execution) for inbox-zero.org (from .emacs.d)
-:PROPERTIES:
-:CREATED: [2026-06-16 Tue]
-:END:
-What arrived: .emacs.d proposes adding a "Phase E — Execute actionable tagged tasks" to the synced =inbox-zero.org= so the on-demand/loop callers autonomously implement eligible =:next:= / =:quick:+:solo:= tasks after routing the roam inbox. Arrived in no-approvals mode, so it defers-and-stages per the Skeptical Review gate rather than self-applying.
-
-Recommendation: don't apply as-is — work it as a spec. It assumes .emacs.d's per-project commit waiver (most projects lack it, so canonical Phase E must default to file-only / surface-diff and gate auto-commit on an explicit per-project opt-in), hardcodes the eligibility tags instead of reading the project's priority/tag scheme, leaves the do-not-implement marker set and a kill-switch / per-run cap undefined, and the sender flags the seam question: autonomous execution may belong in a separate =work-the-backlog.org= chained after inbox-zero, keeping inbox-zero's three callers clean. It overlaps the "fix speedrun" autonomous-batch task filed 2026-06-15 ([#D] below) — both encode autonomous backlog execution and should be reconciled, likely into one spec.
-
-Prepared change: [[file:working/inbox-zero-phase-e/proposed-inbox-zero.org]] + [[file:working/inbox-zero-phase-e/proposed.diff]] + [[file:working/inbox-zero-phase-e/sender-note.org]]. Sender notified it's parked. Say "approve the parked Phase E" (or "spec it" / adjust / reject) to work it.
-
-** TODO [#C] Encourage org-roam KB contribution across workflows :feature:
-:PROPERTIES:
-:CREATED: [2026-06-16 Tue]
-:END:
-From the roam global inbox (Craig, 2026-06-16). Encourage agents to keep durable, strategic knowledge in the org-roam KB so it compounds into a cross-project asset:
-- Curate a best-practices node (good note-taking + org-roam practices, drawing on established advice) and link it from =startup.org= with encouragement to contribute through the session.
-- Add a reminder at the end of =triage-intake.org= and =inbox-zero.org= to store strategic / durable / useful info in the KB.
-- Add an early =wrap-it-up.org= prompt asking the agent what it learned worth remembering, then to write it to the KB before proceeding.
-Touches four synced template workflows and needs a curation pass on the best-practices content, so it's a design task — not a loop auto-implement. Filed from a =:next:=-tagged roam item; the eligibility tag was dropped on filing because the work needs a design decision (see the loop guardrail). Pairs with [[file:claude-rules/knowledge-base.md]] and the agent-knowledge-base spec.
-
-** DOING [#B] Wrap-up inbox/transcript routing to destination projects :feature:spec:
-:PROPERTIES:
-:CREATED: [2026-06-13 Sat]
-:LAST_REVIEWED: 2026-06-15
-:END:
-Optional wrap-up step that surfaces filed keepers belonging to another project, recommends a destination, and batch-moves them into that project's =todo.org= Open Work section (transcript filing deferred to vNext). All six decisions resolved (Reading B: the router acts on session-filed keepers, separate from the inbox gate and from defer-and-stage). Spec ready for review.
-
-Spec: [[file:docs/design/wrapup-routing-spec.org]]. Source proposal: [[file:docs/design/2026-06-13-wrapup-inbox-transcript-routing-proposal.org]] (archsetup handoff 2026-06-13). Next: =spec-review=.
-
-** DOING [#B] Helper-instance support — concurrent same-project Claude :feature:spec:
+** VERIFY [#B] Helper-instance support — concurrent same-project Claude :feature:spec:
:PROPERTIES:
:CREATED: [2026-06-11 Thu]
:LAST_REVIEWED: 2026-06-15
@@ -104,7 +75,43 @@ Stand up a drill rig before the gated work; build against it, don't touch synced
OPEN QUESTION to answer first (Craig, 2026-06-15): doesn't helper-instance support depend on generic agent runtime support? Resolve before treating the wiring as unblocked. Starting point: the spec frames this work as Phase 1.5, "Independent of the spec's phases 2-6 (runtime-neutral refactor), which stay gated on their own go/no-go," and the body claims it sits only on the already-shipped session-context split. The separate =Generic agent runtime support — Codex spec v0= task (#C, below) is that phases-2-6 arc. So the spec's stated answer is "no, 1.5 is independent" — but confirm that's actually true for every wiring slice (does ai --helper, the roster branch, or helper-mode routing secretly assume any runtime-manifest / multi-runtime machinery from 2-6?), or whether helper-instance should be sequenced after, or merged into, the generic-runtime task. Don't build the gated wiring until this is settled.
-** DOING [#C] Check that memories are sync'd across machines via git :spec:
+** 2026-06-16 Tue @ 00:53:36 -0500 Phase E spec'd — folded into the autonomous-batch spec
+:PROPERTIES:
+:CREATED: [2026-06-16 Tue]
+:END:
+Craig's answer (2026-06-16): spec it. Phase E reconciles with the "fix speedrun" proposal into one feature — see [[file:docs/design/2026-06-16-autonomous-batch-execution-spec.org][the autonomous-batch execution spec]]: a dedicated =work-the-backlog.org= holds the execution loop, inbox-zero keeps its A-D routing, and "fix speedrun" is a thin preset over the same loop. The prepared Phase E change stays under [[file:working/inbox-zero-phase-e/]] as a source. Tracked from here under the "fix speedrun" / autonomous-batch task below, where the spec-review VERIFY lives.
+
+
+** DOING [#B] Wrap-up inbox/transcript routing to destination projects :feature:spec:
+:PROPERTIES:
+:CREATED: [2026-06-13 Sat]
+:LAST_REVIEWED: 2026-06-15
+:END:
+Optional wrap-up step that surfaces filed keepers belonging to another project, recommends a destination, and batch-moves them into that project's =todo.org= Open Work section (transcript filing deferred to vNext). All six decisions resolved (Reading B: the router acts on session-filed keepers, separate from the inbox gate and from defer-and-stage). Spec ready for review.
+
+Spec: [[file:docs/design/wrapup-routing-spec.org]]. Source proposal: [[file:docs/design/2026-06-13-wrapup-inbox-transcript-routing-proposal.org]] (archsetup handoff 2026-06-13). Next: =spec-review=.
+
+#+begin_src cj: comment
+ I approved the spec in the spec document. please take it through the rest of the spec response process to implementation. bp
+#+end_src
+
+** DOING [#C] Encourage org-roam KB contribution across workflows :feature:
+:PROPERTIES:
+:CREATED: [2026-06-16 Tue]
+:END:
+From the roam global inbox (Craig, 2026-06-16). Encourage agents to keep durable, strategic knowledge in the org-roam KB so it compounds into a cross-project asset:
+- Curate a best-practices node (good note-taking + org-roam practices, drawing on established advice) and link it from =startup.org= with encouragement to contribute through the session.
+- Add a reminder at the end of =triage-intake.org= and =inbox-zero.org= to store strategic / durable / useful info in the KB.
+- Add an early =wrap-it-up.org= prompt asking the agent what it learned worth remembering, then to write it to the KB before proceeding.
+Touches four synced template workflows and needs a curation pass on the best-practices content, so it's a design task, not a candidate for loop auto-implementation. Filed from a =:next:=-tagged roam item; the eligibility tag was dropped on filing because the work needs a design decision (see the loop guardrail). Pairs with [[file:claude-rules/knowledge-base.md]] and the agent-knowledge-base spec.
+
+*** 2026-06-16 Tue @ 00:53:36 -0500 Spec written for review
+Drafted [[file:docs/design/2026-06-16-encourage-kb-contribution-spec.org][the KB-contribution spec]]: four light workflow prompts (startup nudge, triage-intake + inbox-zero end-of-flow reminders, an early wrap-up reflection feeding the existing KB receipt) plus one Craig-authored best-practices node curated from Ahrens / Matuschak / org-roam guidance. Five open sub-decisions filed as decisions-as-TODO in the spec.
+*** VERIFY Review the KB-contribution spec
+Review [[file:docs/design/2026-06-16-encourage-kb-contribution-spec.org]] and ratify (or adjust) its five open decisions. Implementation-ready once no decision is still TODO.
+
+
+** VERIFY [#C] Check that memories are sync'd across machines via git :spec:
:PROPERTIES:
:LAST_REVIEWED: 2026-06-15
:END:
@@ -188,7 +195,7 @@ First of the 10 broadcast projects to report Phase 1.5 done (handoff 18:23). Inv
*** 2026-06-12 Fri @ 02:25:12 -0500 Five more sweeps complete via the home folds
Overnight handoffs from home closed five more broadcast targets, each swept at fold-time triage with Craig's approval: jr-estate 2 promoted (forms name-with-number, PDF-editing tooling split; roam 45d8e6c) / 3 kept with area attribution / 2 deleted as rule-encoded or duplicate; finances 0/1/0 (rosalea-daly contact fact kept local); elibrary 0/0/2, health 0/0/1, kit 1/0/2 (hand-prep-items-to-work-inbox promoted into home's memory; the rest duplicated rules or home memories). Nothing from these five met the KB bar that wasn't already encoded. All folded projects' session archives merged area-prefixed into home's .ai/sessions/, so session-harvest's first run sees them. Home covers its own and remaining areas' sweeps through ongoing discipline; still pending from the broadcast: archsetup and work.
-*** TODO Agent KB — manual testing and validation :test:
+*** VERIFY Agent KB — manual testing and validation :test:
What we're verifying: the v1 acceptance surface that needs Craig's eyes or a live cross-project session. Run after Phases 0-2 land.
- Seed node appears in org-roam (autosync) and in the =rg '#\+filetags:.*:agent:'= inventory.
- In the work project, a durable-storage request produces no write in the KB and the refusal report names the fact.
@@ -208,6 +215,100 @@ A scheduled headless morning run chaining the existing pieces: startup checks, t
The triage limb can reuse triage-intake's *auto mode* (added 2026-06-15, see [[file:.ai/workflows/triage-intake.org]]) — its accumulate-don't-mutate sweep is the propose-only behavior this orchestrator wants. Auto mode itself runs in-session (inherited MCP auth); the orchestrator is the durable headless schedule, so the headless-auth blocker above is the part still on this task to solve.
+** TODO [#C] Token-rotation helper for =@a-bonus/google-docs-mcp= OAuth refresh :feature:quick:
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-15
+:END:
+
+When a Google refresh token gets revoked (re-grant scopes, removed Connected App, account password reset), recovery is currently manual: run =npx -y @a-bonus/google-docs-mcp= with the right env, follow the URL in a browser, kill the process, base64-encode the new =token.json=, decrypt =secrets.env.gpg=, replace the var, re-encrypt. A small =mcp/refresh-google-docs-token.sh <profile>= would chain that into one command.
+
+*** Sketch
+
+#+begin_src bash
+# usage: mcp/refresh-google-docs-token.sh personal
+profile="${1:?usage: $0 <profile>}"
+var="GOOGLE_DOCS_${profile^^}_TOKEN_B64"
+# Decrypted secrets go to a private mktemp file (mode 0600), removed on any exit.
+tmp="$(mktemp)"
+trap 'rm -f "$tmp"' EXIT
+gpg -d ... | grep -v "^${var}=" > "$tmp"
+GOOGLE_MCP_PROFILE="$profile" npx -y @a-bonus/google-docs-mcp &
+mcp_pid=$!
+xdg-open <captured-url>
+# wait for ~/.config/google-docs-mcp/$profile/token.json to land
+kill "$mcp_pid"
+echo "${var}=$(base64 -w0 ~/.config/google-docs-mcp/"$profile"/token.json)" >> "$tmp"
+gpg -c --cipher-algo AES256 -o mcp/secrets.env.gpg.new "$tmp"
+mv mcp/secrets.env.gpg.new mcp/secrets.env.gpg
+#+end_src
+
+The flow tonight worked but took a handful of manual steps. One script collapses it.
+
+Decision (Craig, 2026-05-31): *hold until a token rotation is imminent.* The OAuth re-grant is a browser step that can't be triggered without revoking a live token, so the script can't be verified in isolation. Not marked =:solo:= — when a token actually needs rotating, write and verify in one pass (solo at that point).
+
+** TODO [#C] Generic agent runtime support — Codex spec v0 :spec:design:
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-15
+:END:
+Codex drafted a v0 design doc for making rulesets runtime-neutral rather than Claude-Code-specific. Motivating cases: offline operation with a local LLM, and two LLMs running in the same project at the same time without trampling each other's session-context.
+
+Spec at [[file:docs/design/2026-05-28-generic-agent-runtime-spec.org]] (moved here from inbox on intake).
+
+Immediate correctness issue Codex flagged: the singleton .ai/session-context.org is unsafe under simultaneous agents. Codex recommends starting with Phase 1 only — add AI_AGENT_ID + session-context.d/<id>.org without renaming the rest.
+
+Broader refactor proposes runtimes/ adapter manifests, generic install commands, language-bundle split (common/ + runtimes/<runtime>/), launcher refactor, local model service via llama.cpp/ollama. Big surface area, six phases.
+
+2026-06-12 spec review complete: [[file:docs/design/2026-05-28-generic-agent-runtime-spec-review.org][Codex review]] rubric for the whole spec is =Not ready=. Phase 1 is already shipped, and Phase 1.5 is tracked separately as the helper-instance task. Before any phases 2-5 implementation, decide whether to commit to the larger arc and answer the blocker decisions: generic instruction-file strategy, default local runtime/server, first supported local editing CLI, adapter scope, and compatibility behavior for existing =CLAUDE.md= / =.claude/= projects.
+
+*** 2026-06-10 Wed @ 14:13:55 -0500 Noted Phase 1 already shipped; narrowed scope to the phases 2-6 decision
+Phase 1 (the correctness fix) is live: protocols.org documents the AI_AGENT_ID-scoped session-context path (=.ai/session-context.d/<id>.org=) and =.ai/scripts/session-context-path= resolves it. The singleton race Codex flagged is closed. What remains is the spec review plus a go/no-go on the broader runtime-neutral refactor: runtimes/ adapter manifests, generic install commands, language-bundle split, launcher refactor, local model service.
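+
+A minimal sketch of the per-agent lookup, for reviewers who haven't opened the script. It is hypothetical, not the contents of =.ai/scripts/session-context-path=: the function name, the fallback to the singleton file when =AI_AGENT_ID= is unset, and the rejection of IDs containing =/= are all assumptions.
+
+#+begin_src bash
+# Hypothetical sketch of the lookup session-context-path performs.
+# Assumptions: singleton fallback when AI_AGENT_ID is unset, and
+# rejecting IDs with "/" so an ID can't escape session-context.d/.
+session_context_path() {
+  local root="${1:-.}"          # project root; defaults to the current directory
+  local id="${AI_AGENT_ID:-}"
+  if [[ -z "$id" ]]; then
+    # No agent identity: legacy singleton path (assumed fallback).
+    printf '%s/.ai/session-context.org\n' "$root"
+  elif [[ "$id" == */* ]]; then
+    echo "session-context-path: AI_AGENT_ID must not contain '/'" >&2
+    return 1
+  else
+    # One file per agent, so concurrent agents never share a context file.
+    printf '%s/.ai/session-context.d/%s.org\n' "$root" "$id"
+  fi
+}
+#+end_src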
+
+*** 2026-06-11 Thu @ 19:26:26 -0500 Spec amended with the helper-instance slice; implementation split out
+Craig's motivating case (a second Claude in the same project for lookups and safe task updates) was under-specified in v0 — it had identity and message targeting but no spawn mechanics and no write-safety contract for the shared files the session-context split doesn't isolate. Added the "Concurrent same-project agents (helper instances)" section (subagent boundary, identity/spawn via =ai --helper=, the tiered read/write contract, light startup, helper wrap-up) and Phase 1.5 to the migration plan. Implementation filed as its own [#B] task ("Helper-instance support"); this task stays scoped to the phases 2-6 go/no-go.
+
+*** 2026-06-12 Fri @ 02:09:10 -0500 Independent spec review complete
+Codex ran the spec-review workflow. Outcome: the combined spec is =Not ready= because phases 2-5 still require product decisions and current external-runtime/model verification. Phase 1.5 can proceed only as the already-split helper task, with rollout/manual-validation caveats accepted and no accidental template-wide release before sandbox/pilot drills pass. Review file: [[file:docs/design/2026-05-28-generic-agent-runtime-spec-review.org]].
+
+*** 2026-06-12 Fri @ 02:39:38 -0500 Second review after response pass
+Codex re-ran spec-review after the dispositions were folded in. Outcome by arc: Phase 1.5 helper instances =Ready with caveats=; phases 2-5 remain =Not ready= behind the explicit decisions/reverification gate. No new blocking findings for the helper slice. Review file updated in place: [[file:docs/design/2026-05-28-generic-agent-runtime-spec-review.org]].
+
+** TODO [#C] Spec storage location + lifecycle-status convention :spec:
+:PROPERTIES:
+:CREATED: [2026-06-15 Mon]
+:END:
+Two coupled documentation conventions for rulesets to adopt, surfaced by .emacs.d while triaging ~28 design docs. Both land in =spec-create= ([[file:.ai/workflows/spec-create.org]]) and likely a new =docs-lifecycle= rule under =claude-rules/=. Source proposal: [[file:docs/design/2026-06-15-spec-storage-lifecycle-proposal.org]] (.emacs.d handoff 2026-06-15).
+
+The two conventions:
+- *Location split* — formal specs live in =docs/specs/=; =docs/design/= keeps working notes, brainstorms, inventories, reviews. A spec is a doc proposing a buildable change with a Decisions section and phases; everything else is a note.
+- *Glanceable lifecycle status* — a spec's state (draft / doing / implemented / superseded / cancelled) is visible without opening the file, plus an authoritative in-file record.
+
+Recommendation captured now so the thinking isn't lost; it migrates into the spec when this is worked. We handle the task in priority order.
+
+*** Recommendation (draft — decide when worked, migrate into the spec)
+1. *Location split — adopt.* Low controversy, clear payoff. =docs/specs/= for formal specs, =docs/design/= for notes. Document in spec-create and the docs-lifecycle rule.
+2. *Status mechanism — the real fork.* Two options: filename suffix (=-spec-doing.org=, Craig's idea, ls-visible but every transition is a rename that breaks =[[file:...]]= links) vs the org-TODO keyword on the spec's top heading (specs already carry =#+TODO: TODO | DONE SUPERSEDED CANCELLED=; link-stable, zero-rename, org-agenda-scannable, but not visible in =ls=). My lean is the org-keyword as authoritative + a Status field in the Metadata table, dropping the filename suffix — the suffix is redundant with the Status field and adds rename churn across a heavily cross-linked, template-synced doc set. This diverges from Craig's stated filename-suffix preference, so it's teed up as a decision, not settled. Decide deliberately before building.
+3. *Link safety — adopt =org-id= ([[id:...]]) for cross-doc spec links* regardless of which status mechanism wins. It decouples link stability from the status decision. Mandatory if the filename suffix wins; good hygiene either way. The alternative — a move/rename/relink/stamp helper run on each transition — is only needed if the suffix wins and org-id is rejected.
+4. *Generalize after the mechanism settles.* The shape (lifecycle state in name-or-location, authoritative in-artifact status, rename-safe links, formal-vs-notes split) is reusable beyond specs. Capture it as a general =docs-lifecycle= convention in =claude-rules/= with spec-create as the first instance — but don't generalize an unsettled convention.
+
+Follow-up once decided: update spec-create to emit into =docs/specs/= with the chosen status mechanism; retrofit existing specs; optionally add the relink helper as a =.ai/scripts/= addition (downstream projects get it via template sync); send a note back if .emacs.d should pilot before generalizing.
+
+** DOING [#C] "fix speedrun" cross-project autonomous-batch mode :feature:spec:
+:PROPERTIES:
+:CREATED: [2026-06-15 Mon]
+:END:
+A named mode for coding projects: Craig names an ordered task set and says "fix speedrun"; the set is worked autonomously, each task held to the full quality bar (TDD red→green, =/review-code=, =/voice= on the commit) and committed + pushed as its own logical commit, with a VERIFY filed instead of guessing on anything underspecified, and an end-of-set page listing completed + remaining tasks. Surfaced by .emacs.d from a 2026-06-15 theme-studio session where the shape worked. Source proposal: [[file:docs/design/2026-06-15-fix-speedrun-workflow-proposal.org]] (.emacs.d handoff 2026-06-15). Build via =spec-create= when worked; we handle the task in priority order.
+
+Skeptical-review read (open design questions to resolve in the spec, not settled here):
+- *Is it a new workflow or a documented preset?* The proposal frames it as no-approvals + always-push session modes plus an end page. Decide whether it needs its own workflow file or is mostly documentation of a preset over the two existing modes.
+- *Where/how the page fires* — every task vs end-of-set, and via what. The paging surface is in flux (=page-signal= removed 2026-06-12), so reconcile against =notify --persist= or whatever paging stands now.
+- *Auto-pull vs explicit list* — whether the set comes from an explicit ordered list or a tag/priority query.
+- *Guardrails* — must refuse to speedrun tasks needing design decisions or carrying data-loss risk without a checkpoint (the sender's biased-safe unused-tile flag is the worked example).
+
+*** 2026-06-16 Tue @ 00:53:36 -0500 Spec written; design questions answered
+Craig's "your call" (2026-06-16) answered in [[file:docs/design/2026-06-16-autonomous-batch-execution-spec.org][the autonomous-batch execution spec]], which reconciles this with Phase E into one feature:
+- *Most effective / workflow-vs-preset:* one dedicated =work-the-backlog.org= workflow holds the execution loop; "fix speedrun" is a thin named preset (no-approvals + always-push + end page) feeding it an explicit list, and the inbox-zero loop feeds it a tag query. Pros of the shared workflow: one execution loop to audit, inbox-zero's three callers stay clean, both input shapes reuse one guardrail set. Cons: one more workflow file and a caller-to-workflow indirection. Both cons weigh less than the cost of duplicating the loop across two separate features, and the pros carry the more important entries (single audit surface, clean seam), so the shared workflow wins.
+- *Paging:* end-of-set only, via =notify ... --persist= (reconciled past the removed page-signal wrapper).
+- *Auto-pull vs explicit list:* both — explicit list for the preset, tag/priority query for the loop.
+- *Effectiveness measurement (the trial Craig asked for):* the spec designs a per-task JSONL metrics log (=.ai/metrics/work-the-backlog.jsonl=), a corrections-in-next-session signal, and a periodic synthesis step that writes =:agent:metrics:= org-roam articles for later review — the "gather data + create org-roam articles" loop.
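+
+A minimal sketch of appending one per-task record to the JSONL log, assuming =jq= is available. The function name =log_task=, the =METRICS_LOG= override, and the field names (=ts=, =task=, =outcome=, =verify_filed=) are hypothetical illustrations, not the spec's schema.
+
+#+begin_src bash
+# Hypothetical sketch: append one per-task record to the metrics log.
+# Field names are illustrative assumptions, not the spec's schema.
+log_task() {
+  local log="${METRICS_LOG:-.ai/metrics/work-the-backlog.jsonl}"
+  local task="$1" outcome="$2" verify_filed="${3:-false}"
+  mkdir -p "$(dirname "$log")"
+  # -n builds the object from the --arg values alone; -c prints it on one
+  # line, which keeps the file valid JSONL (one object per line).
+  jq -nc \
+    --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
+    --arg task "$task" \
+    --arg outcome "$outcome" \
+    --argjson verify_filed "$verify_filed" \
+    '{ts: $ts, task: $task, outcome: $outcome, verify_filed: $verify_filed}' \
+    >> "$log"
+}
+#+end_src
+
+Appending one compact object per line keeps the log append-only and greppable; a synthesis pass can read the whole file at once with =jq -s=.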
+*** VERIFY Review the autonomous-batch execution spec
+Review [[file:docs/design/2026-06-16-autonomous-batch-execution-spec.org]] (covers both this and Phase E) and ratify (or adjust) its six open decisions. Implementation-ready once no decision is still TODO.
+
** TODO [#D] Build =create-documentation= skill for high-quality project/product docs :feature:
:PROPERTIES:
:LAST_REVIEWED: 2026-06-15
@@ -1048,91 +1149,6 @@ having a skill to generate or check OV-1-shaped artifacts. Don't build
speculatively — defense-specific notations are narrow enough that each
skill should be driven by a concrete contract need, not aspiration.
-** TODO [#C] Token-rotation helper for =@a-bonus/google-docs-mcp= OAuth refresh :feature:quick:
-:PROPERTIES:
-:LAST_REVIEWED: 2026-06-15
-:END:
-
-When a Google refresh token gets revoked (re-grant scopes, removed Connected App, account password reset), recovery is currently manual: run =npx -y @a-bonus/google-docs-mcp= with the right env, follow the URL in a browser, kill the process, base64-encode the new =token.json=, decrypt =secrets.env.gpg=, replace the var, re-encrypt. A small =mcp/refresh-google-docs-token.sh <profile>= would chain that into one command.
-
-*** Sketch
-
-#+begin_src bash
-# usage: mcp/refresh-google-docs-token.sh personal
-profile="$1"
-gpg -d ... | grep -v "GOOGLE_DOCS_${profile^^}_TOKEN_B64" > /tmp/secrets.env.tmp
-GOOGLE_MCP_PROFILE="$profile" npx -y @a-bonus/google-docs-mcp &
-xdg-open <captured-url>
-# wait for ~/.config/google-docs-mcp/$profile/token.json to land
-kill %1
-echo "GOOGLE_DOCS_${profile^^}_TOKEN_B64=$(base64 -w0 ~/.config/google-docs-mcp/$profile/token.json)" >> /tmp/secrets.env.tmp
-gpg -c --cipher-algo AES256 -o mcp/secrets.env.gpg.new /tmp/secrets.env.tmp
-mv mcp/secrets.env.gpg.new mcp/secrets.env.gpg
-rm /tmp/secrets.env.tmp
-#+end_src
-
-The flow tonight worked but took a handful of manual steps. One script collapses it.
-
-Decision (Craig, 2026-05-31): *hold until a token rotation is imminent.* The OAuth re-grant is a browser step that can't be triggered without revoking a live token, so the script can't be verified in isolation. Not marked =:solo:= — when a token actually needs rotating, write and verify in one pass (solo at that point).
-
-** TODO [#C] Generic agent runtime support — Codex spec v0 :spec:design:
-:PROPERTIES:
-:LAST_REVIEWED: 2026-06-15
-:END:
-Codex drafted a v0 design doc for making rulesets runtime-neutral rather than Claude-Code-specific. Motivating cases: offline operation with a local LLM, and two LLMs running in the same project at the same time without trampling each other's session-context.
-
-Spec at [[file:docs/design/2026-05-28-generic-agent-runtime-spec.org]] (moved here from inbox on intake).
-
-Immediate correctness issue Codex flagged: the singleton .ai/session-context.org is unsafe under simultaneous agents. Codex recommends starting with Phase 1 only — add AI_AGENT_ID + session-context.d/<id>.org without renaming the rest.
-
-Broader refactor proposes runtimes/ adapter manifests, generic install commands, language-bundle split (common/ + runtimes/<runtime>/), launcher refactor, local model service via llama.cpp/ollama. Big surface area, six phases.
-
-2026-06-12 spec review complete: [[file:docs/design/2026-05-28-generic-agent-runtime-spec-review.org][Codex review]] rubric for the whole spec is =Not ready=. Phase 1 is already shipped, and Phase 1.5 is tracked separately as the helper-instance task. Before any phases 2-5 implementation, decide whether to commit to the larger arc and answer the blocker decisions: generic instruction-file strategy, default local runtime/server, first supported local editing CLI, adapter scope, and compatibility behavior for existing =CLAUDE.md= / =.claude/= projects.
-
-*** 2026-06-10 Wed @ 14:13:55 -0500 Noted Phase 1 already shipped; narrowed scope to the phases 2-6 decision
-Phase 1 (the correctness fix) is live: protocols.org documents the AI_AGENT_ID-scoped session-context path (=.ai/session-context.d/<id>.org=) and =.ai/scripts/session-context-path= resolves it. The singleton race Codex flagged is closed. What remains is the spec review plus a go/no-go on the broader runtime-neutral refactor: runtimes/ adapter manifests, generic install commands, language-bundle split, launcher refactor, local model service.
-
-*** 2026-06-11 Thu @ 19:26:26 -0500 Spec amended with the helper-instance slice; implementation split out
-Craig's motivating case (a second Claude in the same project for lookups and safe task updates) was under-specified in v0 — it had identity and message targeting but no spawn mechanics and no write-safety contract for the shared files the session-context split doesn't isolate. Added the "Concurrent same-project agents (helper instances)" section (subagent boundary, identity/spawn via =ai --helper=, the tiered read/write contract, light startup, helper wrap-up) and Phase 1.5 to the migration plan. Implementation filed as its own [#B] task ("Helper-instance support"); this task stays scoped to the phases 2-6 go/no-go.
-
-*** 2026-06-12 Fri @ 02:09:10 -0500 Independent spec review complete
-Codex ran the spec-review workflow. Outcome: the combined spec is =Not ready= because phases 2-5 still require product decisions and current external-runtime/model verification. Phase 1.5 can proceed only as the already-split helper task, with rollout/manual-validation caveats accepted and no accidental template-wide release before sandbox/pilot drills pass. Review file: [[file:docs/design/2026-05-28-generic-agent-runtime-spec-review.org]].
-
-*** 2026-06-12 Fri @ 02:39:38 -0500 Second review after response pass
-Codex re-ran spec-review after the dispositions were folded in. Outcome by arc: Phase 1.5 helper instances =Ready with caveats=; phases 2-5 remain =Not ready= behind the explicit decisions/reverification gate. No new blocking findings for the helper slice. Review file updated in place: [[file:docs/design/2026-05-28-generic-agent-runtime-spec-review.org]].
-
-** TODO [#C] Spec storage location + lifecycle-status convention :spec:
-:PROPERTIES:
-:CREATED: [2026-06-15 Mon]
-:END:
-Two coupled documentation conventions for rulesets to adopt, surfaced by .emacs.d while triaging ~28 design docs. Both land in =spec-create= ([[file:.ai/workflows/spec-create.org]]) and likely a new =docs-lifecycle= rule under =claude-rules/=. Source proposal: [[file:docs/design/2026-06-15-spec-storage-lifecycle-proposal.org]] (.emacs.d handoff 2026-06-15).
-
-The two conventions:
-- *Location split* — formal specs live in =docs/specs/=; =docs/design/= keeps working notes, brainstorms, inventories, reviews. A spec is a doc proposing a buildable change with a Decisions section and phases; everything else is a note.
-- *Glanceable lifecycle status* — a spec's state (draft / doing / implemented / superseded / cancelled) is visible without opening the file, plus an authoritative in-file record.
-
-Recommendation captured now so the thinking isn't lost; it migrates into the spec when this is worked. We handle the task in priority order.
-
-*** Recommendation (draft — decide when worked, migrate into the spec)
-1. *Location split — adopt.* Low controversy, clear payoff. =docs/specs/= for formal specs, =docs/design/= for notes. Document in spec-create and the docs-lifecycle rule.
-2. *Status mechanism — the real fork.* Two options: filename suffix (=-spec-doing.org=, Craig's idea, ls-visible but every transition is a rename that breaks =[[file:...]]= links) vs the org-TODO keyword on the spec's top heading (specs already carry =#+TODO: TODO | DONE SUPERSEDED CANCELLED=; link-stable, zero-rename, org-agenda-scannable, but not visible in =ls=). My lean is the org-keyword as authoritative + a Status field in the Metadata table, dropping the filename suffix — the suffix is redundant with the Status field and adds rename churn across a heavily cross-linked, template-synced doc set. This diverges from Craig's stated filename-suffix preference, so it's teed up as a decision, not settled. Decide deliberately before building.
-3. *Link safety — adopt =org-id= ([[id:...]]) for cross-doc spec links* regardless of which status mechanism wins. It decouples link stability from the status decision. Mandatory if the filename suffix wins; good hygiene either way. The alternative — a move/rename/relink/stamp helper run on each transition — is only needed if the suffix wins and org-id is rejected.
-4. *Generalize after the mechanism settles.* The shape (lifecycle state in name-or-location, authoritative in-artifact status, rename-safe links, formal-vs-notes split) is reusable beyond specs. Capture it as a general =docs-lifecycle= convention in =claude-rules/= with spec-create as the first instance — but don't generalize an unsettled convention.
-
-Follow-up once decided: update spec-create to emit into =docs/specs/= with the chosen status mechanism; retrofit existing specs; optionally add the relink helper as a =.ai/scripts/= addition (downstream projects get it via template sync); send a note back if .emacs.d should pilot before generalizing.
-
-** TODO [#C] "fix speedrun" cross-project autonomous-batch mode :feature:spec:
-:PROPERTIES:
-:CREATED: [2026-06-15 Mon]
-:END:
-A named mode for coding projects: Craig names an ordered task set and says "fix speedrun"; the set is worked autonomously, each task held to the full quality bar (TDD red→green, =/review-code=, =/voice= on the commit) and committed + pushed as its own logical commit, with a VERIFY filed instead of guessing on anything underspecified, and an end-of-set page listing completed + remaining tasks. Surfaced by .emacs.d from a 2026-06-15 theme-studio session where the shape worked. Source proposal: [[file:docs/design/2026-06-15-fix-speedrun-workflow-proposal.org]] (.emacs.d handoff 2026-06-15). Build via =spec-create= when worked; we handle the task in priority order.
-
-Skeptical-review read (open design questions to resolve in the spec, not settled here):
-- *Is it a new workflow or a documented preset?* The proposal frames it as no-approvals + always-push session modes plus an end page. Decide whether it needs its own workflow file or is mostly documentation of a preset over the two existing modes.
-- *Where/how the page fires* — every task vs end-of-set, and via what. The paging surface is in flux (=page-signal= removed 2026-06-12), so reconcile against =notify --persist= or whatever paging stands now.
-- *Auto-pull vs explicit list* — whether the set comes from an explicit ordered list or a tag/priority query.
-- *Guardrails* — must refuse to speedrun tasks needing design decisions or carrying data-loss risk without a checkpoint (the sender's biased-safe unused-tile flag is the worked example).
-
* Rulesets Resolved
** DONE [#C] Fix =cj-scan= false positives on cj fences nested inside other =#+begin_*= blocks :bug:
CLOSED: [2026-05-15 Fri]