| author | Craig Jennings <c@cjennings.net> | 2026-07-02 00:19:56 -0400 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-07-02 00:19:56 -0400 |
| commit | f4b64d6141156cf0ee2a2c2a13cda256f0bf0c84 (patch) | |
| tree | 76534c13c9b8f07d8f5315cf437c1aed0e0e1b67 /docs/design | |
| parent | 80ca5d00c4ddd481308ed8ce0c2f270bd34604c0 (diff) | |
docs(specs): pilot the spec-sort retrofit on rulesets' own pile
Five specs moved to docs/specs/ with confirmed lifecycle keywords: agent-knowledge-base IMPLEMENTED (with the stated reason in its history line), inbox-workflow-consolidation READY, autonomous-batch-execution READY, encourage-kb-contribution READY, and wrapup-routing DOING. spec-sort recomputed twelve todo.org links and the moved specs' own outbound links, and stamped :LAST_SPEC_SORT:. The status board now answers "what's live" in one grep. The four -spec.org-named files in docs/design without a spec spine stay put as notes.
Diffstat (limited to 'docs/design')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | docs/design/2026-06-16-autonomous-batch-execution-spec.org | 384 |
| -rw-r--r-- | docs/design/2026-06-16-encourage-kb-contribution-spec.org | 199 |
| -rw-r--r-- | docs/design/wrapup-routing-spec.org | 218 |
3 files changed, 0 insertions, 801 deletions
diff --git a/docs/design/2026-06-16-autonomous-batch-execution-spec.org b/docs/design/2026-06-16-autonomous-batch-execution-spec.org
deleted file mode 100644
index 84cefe3..0000000
--- a/docs/design/2026-06-16-autonomous-batch-execution-spec.org
+++ /dev/null
@@ -1,384 +0,0 @@
-#+TITLE: Autonomous-Batch Task Execution — Spec
-#+AUTHOR: Craig Jennings & Claude
-#+DATE: 2026-06-16
-#+TODO: TODO | DONE SUPERSEDED CANCELLED
-
-* Metadata
-| Status   | ready |
-|----------+--------------------------------------------------------------------|
-| Owner    | Craig Jennings |
-|----------+--------------------------------------------------------------------|
-| Reviewer | Craig Jennings |
-|----------+--------------------------------------------------------------------|
-| Date     | 2026-06-16 |
-|----------+--------------------------------------------------------------------|
-| Related  | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] |
-|----------+--------------------------------------------------------------------|
-
-* Summary
-
-Two proposals arrived within a day of each other describing the same capability: have Claude work a batch of small, well-marked tasks autonomously, with a full quality bar per task and no per-step approval gate. The inbox-zero "Phase E" proposal drives it from a tag/priority query on a recurring loop; the "speedrun" proposal drives it from an explicit ordered list a human dictates in-session. This spec reconciles both into one feature: a single dedicated workflow, =work-the-backlog.org=, that holds the task-execution logic, with two thin callers feeding it. It also designs the instrumentation that measures whether the autonomy is actually paying off.
-
-* Problem / Context
-
-Craig has a standing backlog of small, solo-doable fixes across several projects, already marked with a tag convention (=:next:=, =:quick:+:solo:=).
Doing them by hand one at a time is the bottleneck — the context-switch and the per-commit approval ceremony dominate the actual work. He wants Claude to burn these down unattended: on a recurring loop for the routed inbox case, and on demand when he batches a named list and says "speedrun, no approvals until done." The speedrun is the away-from-desk / working-on-something-else mode, so it must be able to take on larger tasks too — not only sub-30-minute ones — or it forces him to stay at the desk for anything non-trivial. - -Two separate proposals tried to answer this: - -- *Phase E* (in =inbox-zero.org=, edited in =.emacs.d= as a stopgap) bolted autonomous execution onto the inbox-zero workflow's on-demand and loop callers. The sender flagged the seam as the open question: coupling capture-routing with autonomous-implementation pollutes inbox-zero's three existing callers (startup, wrap-up, on-demand), two of which must never execute anything. -- *speedrun* (a =.emacs.d= theme-studio session that worked well) is the same execution loop driven by an explicit ordered task set, with end-of-set paging and always-push. - -They overlap almost entirely. The execution loop — eligibility gate, act-vs-file decision, per-task quality bar, bounded run — is identical. Only the *input* differs (tag query vs explicit list) and the *session mode* differs (loop default vs no-approvals + always-push + page). Building them as two features would duplicate the execution logic and let the two copies drift. The forces: keep inbox-zero's callers clean, share one execution loop, and make the autonomy safe enough to run unattended on a 30-minute timer without Craig watching. - -A second, explicit ask from Craig: instrument this so its effectiveness is measurable. "Gather data on this and create some org-roam articles we can look at later." 
Autonomous execution that silently makes bad commits is worse than no autonomy; the only way to know which it is, is to measure tasks completed vs deferred vs reverted, and human corrections in the following session, over time. - -* Goals and Non-Goals - -** Goals -- One workflow, =work-the-backlog.org=, owns the task-execution loop. Both input shapes (tag query, explicit list) and both session modes feed it. -- inbox-zero's three existing callers stay clean: the loop caller chains into =work-the-backlog= *after* routing; startup and wrap-up never touch it. -- The *no-approvals speedrun* is a thin named preset, not a second implementation: autonomous-commit + always-push + end-of-set page, fed an explicit ordered list, with all approvals front-loaded into a single pre-flight step (below) so the run itself is uninterrupted. -- Eligibility is decided by *crisp, checkable criteria*, not adjectives: a mechanical tag/status gate (=:solo:= + status =TODO=), then a per-task defer checklist whose keystone is "can I write the failing test from the task text without inventing a requirement?" Task *size* is explicitly not a gate — a large task is decomposed into per-logical-commit chunks, not deferred. -- The autonomy tags (=:solo:=, =:quick:=) carry hard definitions in =todo-format.md= and are applied + enforced as a mandatory step in the task-review and task-audit workflows, so the run-time gate trusts the author's tag instead of re-deriving it. -- Commit autonomy defaults to file-only (surface a diff, no auto-commit). A project opts into autonomous commit+push explicitly via its per-project waiver. -- Hard guardrails: refuse any task carrying data-loss / irreversible / external-state risk without a checkpoint; gather any one-or-two quick decisions a task needs *up front* (speedrun) rather than guessing; file a =VERIFY= for anything underspecified or needing design deliberation; a per-run cap / kill switch beyond "one task per run." 
-- A lightweight per-run metrics log plus a periodic synthesis step that writes org-roam KB articles summarizing the trend. - -** Non-Goals -- *Not* a replacement for =/start-work=. Tasks needing deliberation or design stay with =/start-work= and its approval gates. This feature only touches the marked, solo set — regardless of size. -- *Not* a new tag convention. It reads the project's own priority/tag scheme header; it never invents or hardcodes tags across projects. -- *Not* an inbox-routing change. =inbox-zero.org= keeps its A-D phases. The Phase E text added in =.emacs.d= as a stopgap is *removed* and its logic moves here. -- *Not* a multi-project orchestrator. One run works one project's backlog. Cross-project handoff stays with =inbox-send= and the paging reply. -- *Not* a credential-handling or external-API feature. Tasks that touch secrets or external mutations are out of the eligible set by the guardrail. - -** Scope tiers -- *v1:* =work-the-backlog.org=; crisp =:solo:= / =:quick:= definitions in =todo-format.md= plus their mandatory application in task-review and task-audit; the eligibility gate (=:solo:= + status =TODO=, read against the project's scheme header); the act-vs-file *defer checklist* (test-writability keystone, enumerated data-loss list, already-satisfied, design-deliberation); the no-approvals speedrun's pre-flight decision-gathering step; file-only commit default with per-project opt-in; the loop caller wiring and inbox-zero Phase E removal; the speedrun preset with end-of-set =notify --persist= page; the per-run metrics log (structured JSONL). -- *Out of scope:* a token-budget kill switch (cap is a task count in v1); cross-project batch runs; a dashboard or live UI over the metrics. 
-- *vNext (log to todo.org):* the periodic org-roam synthesis step if it doesn't make v1; a token/cost budget alongside the task-count cap (more pressing now that task size is uncapped — a single large task can run long in the unattended loop); auto-detection of "human corrected my autonomous commit" from the next session's diff.
-
-* Design
-
-** Overview
-
-The architecture is one execution workflow with two callers and one preset, plus an instrumentation sidecar.
-
-#+begin_example
-  inbox-zero loop caller ──(after Phase D routing)──┐
-                                                    ├──▶ work-the-backlog.org ──▶ metrics log (JSONL)
-  no-approvals speedrun  ──(explicit ordered list)──┘                                  │
-    = pre-flight Q&A + autonomous-commit + push + page                                 ▼
-                                                                              periodic synthesis ──▶ org-roam KB articles
-#+end_example
-
-=work-the-backlog.org= is the only place the execution loop lives. It takes a *task set* (however assembled) and a *session mode* (which gates commit autonomy and paging), and works the set under a fixed safety contract. The two callers differ only in how they build the task set and which session mode they pass.
-
-This is the seam the Phase E sender asked for: separating capture-routing (inbox-zero) from autonomous-implementation (work-the-backlog) keeps inbox-zero's startup and wrap-up callers — which must never execute anything — untouched. The loop caller is the only one of inbox-zero's callers that chains forward into execution, and it does so as an explicit second step after routing completes, not as a phase buried inside inbox-zero.
-
-** The execution loop (two-altitude: caller's view)
-
-A caller hands =work-the-backlog= three things:
-
-1. *A task set* — either an explicit ordered list of task headings (speedrun), or the result of a tag/priority query against =todo.org= (the loop). The workflow does not care which; it receives an ordered list of candidate tasks.
-2. *A session mode* — =file-only= (default) or =autonomous-commit= (requires the project's per-project waiver), and a paging flag.
-3.
*A run cap* — the maximum number of tasks to complete this run. - -It returns: per-task outcome (implemented+committed / implemented+diff-surfaced / deferred-VERIFY / dropped-by-craig / skipped-ineligible), and a metrics record per task. - -** The execution loop (implementer's view) - -For the task set, in order, until the run cap is hit: - -1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task. -2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist. -3. *Defer checklist* (below). Any hit → record the deferral reason (or, under the speedrun preset, route the quick-question gap to the pre-flight Q&A), next task. -4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important, then close the task per =todo-format.md=. Decompose into as many logical commits as the change needs — size is not capped. -5. *Commit autonomy branch:* - - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=. - - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=. -6. *Record metrics* for the task (the JSONL append, below). -7. Decrement the cap. At zero, stop. - -After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary. - -** Eligibility gate (mechanical — no judgment) - -A task is autonomous-safe when *both* hold. This layer is a lookup, not a judgment; all the judgment lives in the defer checklist below. - -1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= is the "awaiting Craig's manual confirmation" marker; auto-implementing one defeats the manual check it represents. The do-not-implement set is safe-by-omission: anything not plainly =TODO= (plus any project-declared "hold" marker) is out. -2. 
*Tagged =:solo:=* — the autonomy tag, resolved against the project's priority/tag scheme header (not hardcoded). =:solo:= carries a hard definition (see Tag definitions, below): the task is completable without Craig's involvement beyond at most one or two quick decisions answerable up front, with no design deliberation. A project whose scheme declares a different autonomous-safe tag set overrides the default. Priority / =:next:= drive *ordering* within the eligible set, not eligibility. - -Task *size* is deliberately absent from this gate. The old "≤ ~30 minutes / one logical commit" criterion is removed: a large but well-specified, decision-free task is in scope and is decomposed into per-logical-commit chunks during implementation. Size never sends a task to =/start-work=; only *deliberation* or *risk* does (the checklist below). This is what makes the speedrun usable as an away-from-desk mode rather than a sub-30-minute-only mode. - -*** Tag definitions (land in =todo-format.md=, enforced in task-review + task-audit) - -- *=:solo:= — autonomy.* The task can be completed without Craig's involvement, except for at most one or two quick decisions that can be stated and answered before the run starts. No open design question, no "weigh these approaches," no waiting on Craig mid-task. This is the eligibility tag. -- *=:quick:= — effort hint only.* A small, fast task. Informational for batching and estimating a run's duration; *not* an eligibility gate (size no longer gates). - -Both tags are applied at task creation and *re-checked as a mandatory step* in the task-review and task-audit workflows, so the run-time gate can trust the author's tag rather than re-derive autonomy and effort from the task body. A task-review or task-audit that skips the =:solo:= / =:quick:= assessment is incomplete. - -** Act-vs-file decision (the defer checklist) - -After the scope read, run each eligible candidate through the checklist below. 
Each item is a concrete, answerable question, not an adjective. *Any* hit — or any "unsure" — sends the task to defer (or, for a quick-decision gap under the speedrun preset, to the pre-flight Q&A). Only a task that clears every item is implemented. - -1. *Test-writability (the keystone).* Can I write the failing test from the task text — plus any decisions gathered up front — without inventing a requirement? *No / unsure* → underspecified. Under the speedrun preset, if the gap is one or two quick answerable questions, route it to the pre-flight Q&A; otherwise file a =VERIFY= noting what's missing. Under the unattended loop, file the =VERIFY= (no one to ask). This replaces the old "clear / bounded / underspecified" adjectives with an action that fails loudly: if the red test isn't writable, the task isn't ready. -2. *Data-loss / irreversible / external operation.* Does implementing it require any of: =rm= of non-scratch data, =git reset --hard= / force-push, =DROP= / =DELETE= / =TRUNCATE=, file truncate/overwrite of persisted content, a schema or data migration, any external or shared-state mutation, any credential touch? *Yes* → do NOT implement; file a =VERIFY= naming the risk. This is the hard safety gate; an upfront answer never overrides it without an explicit checkpoint. Replaces the vague "data-loss risk" with an enumerated, greppable set. -3. *Already-satisfied.* Does the scope read show the desired end-state already holds? *Yes* → file a =VERIFY= noting it (the "raise max spans to 5 — every cap was already 8" case) and move on. Don't make a no-op change. -4. *Design deliberation.* Does the task carry an unresolved design question, a "weigh these approaches" with real tradeoffs, or a TBD that isn't a quick factual answer? *Yes* → under the speedrun preset, if it collapses to one or two quick questions, route to pre-flight Q&A; otherwise file and surface as a =/start-work= candidate. Under the loop, file. 
The discriminator is now *quick-answerable question* vs *deliberation* — not task size. - -A task that clears 1–4 is implemented under the project's commit discipline, decomposed into as many logical commits as the change needs. When genuinely unsure which side a task falls on, defer — a wrong auto-implement costs a revert *and* the next-session correction the metrics are designed to catch. - -** Pre-flight decision gathering (the no-approvals speedrun's only interaction) - -The speedrun preset front-loads every approval into one step before the run, so the run itself is uninterrupted — that is what "no approvals" means. It is *not* "no input ever"; it is "all input first, then hands-off." - -When Craig kicks off a speedrun over an explicit list: - -1. *Gather* the named task set. -2. *Scope-read and classify* each task against the eligibility gate + defer checklist: ready (clears the checklist), needs-quick-decisions (one or two upfront-answerable questions — checklist item 1 or 4), or drop (data-loss / irreversible, or design deliberation that isn't a quick question). -3. *Order* the list (priority, then the author's ordering / =:next:=). -4. *Intro the work* — present the ordered plan: what will run, what was dropped and why, and the batched questions for the needs-quick-decisions tasks. -5. *Craig answers each question, or says "skip this"* → a skipped task is removed from the run (recorded =dropped-by-craig=); an answered task has the answer recorded so implementation works from the decision, not a guess. -6. *Run the finalized list autonomously* — no further approvals until done. -7. *End-of-set page* with completed + remaining + skipped. - -The unattended *loop* caller has no human at kickoff, so it cannot gather decisions: there, a needs-quick-decisions task simply defers (files its note) like any other checklist hit. The pre-flight Q&A is a speedrun-preset capability, not a loop one. 
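The pre-flight triage described in the deleted spec text above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the workflow's implementation: the `Task` fields (`risky`, `needs_deliberation`, `open_questions`), the `classify` and `finalize` names, and the rule that a missing answer counts as "skip this" are all hypothetical stand-ins for judgments the spec leaves to the scope read. Tasks are assumed to have already passed the eligibility gate.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    READY = "ready"
    NEEDS_QUICK_DECISIONS = "needs-quick-decisions"
    DROP = "drop"


@dataclass
class Task:
    heading: str
    risky: bool = False               # hypothetical: hits the enumerated data-loss list
    needs_deliberation: bool = False  # hypothetical: open design question
    open_questions: list = field(default_factory=list)


def classify(task: Task) -> Verdict:
    """Sort one gate-eligible task into ready / needs-quick-decisions / drop."""
    if task.risky or task.needs_deliberation:
        return Verdict.DROP
    # Assumption: more than two open questions is deliberation, not a quick decision.
    if len(task.open_questions) > 2:
        return Verdict.DROP
    if task.open_questions:
        return Verdict.NEEDS_QUICK_DECISIONS
    return Verdict.READY


def finalize(tasks, answers):
    """Build the run list. `answers` maps heading -> list of answers, or None for "skip this"."""
    run, dropped = [], []
    for task in tasks:
        verdict = classify(task)
        if verdict is Verdict.DROP:
            dropped.append((task.heading, "drop"))
        elif verdict is Verdict.NEEDS_QUICK_DECISIONS and answers.get(task.heading) is None:
            dropped.append((task.heading, "dropped-by-craig"))
        else:
            run.append(task)
    return run, dropped
```

The input order is preserved, so a caller that has already sorted by priority gets the finalized list back in run order.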
- -** Session modes and the no-approvals speedrun preset - -Two orthogonal session-mode dimensions feed the loop: - -- *Commit autonomy:* =file-only= (default) or =autonomous-commit=. =autonomous-commit= is honored only when the project carries the per-project waiver (=.emacs.d= and =rulesets= have it; most projects do not). Absent the waiver, a request for =autonomous-commit= degrades to =file-only= and says so. -- *Paging:* on or off. End-of-set only. - -The *no-approvals speedrun* is the named preset = =autonomous-commit= + always-push + paging-on, fed an *explicit ordered list*, run after the pre-flight decision-gathering step above. It is not a separate code path; it is a label for that combination of mode flags plus the explicit-list input, with the pre-flight Q&A as its only interactive moment. The loop caller, by contrast, runs =file-only= (unless the project has the waiver and opts the loop into commits) with paging off, fed the *tag query*, with no pre-flight step. - -** Bounding the run and the kill switch - -Default cap: one task per run for the loop caller — implement the highest-priority eligible candidate (=[#A]= before =[#B]= before =[#C]=), record, then stop and let the next tick continue. The speedrun preset works the whole explicit list in order (the human bounded it by naming it), still one commit per logical change. - -The kill switch is a hard per-run task cap passed by the caller, independent of "one per run": even the speedrun stops at the cap and pages with the remainder listed. A loop that fires every 30 minutes and commits unattended needs a ceiling that a runaway can't exceed. With task size now uncapped, the count cap no longer bounds *cost* — a single large task can run long — so a token/cost budget is the most pressing vNext addition. 
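A small sketch of the cap acting as a kill switch, assuming tasks are plain dicts with `heading` and `priority` keys and that `work_one` is a hypothetical callback returning an outcome string. It also assumes that only implemented tasks count against the cap, which is one reading of "the maximum number of tasks to complete this run". The speedrun ceiling is a parameter because the spec text does not give a value.

```python
PRIORITY_ORDER = {"A": 0, "B": 1, "C": 2}


def order_by_priority(tasks):
    """[#A] before [#B] before [#C]; the stable sort keeps the author's order within a priority."""
    return sorted(tasks, key=lambda t: PRIORITY_ORDER.get(t.get("priority"), len(PRIORITY_ORDER)))


def effective_cap(caller, task_count, ceiling):
    """Loop default is 1; a speedrun works the whole list, bounded by a ceiling."""
    if caller == "loop":
        return 1
    return min(task_count, ceiling)


def run_bounded(tasks, cap, work_one):
    """Work tasks in order until `cap` tasks are implemented; return (outcomes, remaining)."""
    outcomes = []
    queue = list(tasks)
    implemented = 0
    while queue and implemented < cap:
        task = queue.pop(0)
        outcome = work_one(task)
        outcomes.append((task["heading"], outcome))
        # Assumption: deferred or skipped tasks do not consume the cap.
        if outcome.startswith("implemented"):
            implemented += 1
    return outcomes, queue
```

Whatever is left in `remaining` is what the end-of-set page would list as not yet done.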
-
-** End-of-set paging
-
-When the set is done (or the cap is hit), if paging is on, fire one page — end-of-set only, never per-task:
-
-#+begin_src sh
-notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>" --persist
-#+end_src
-
-=--persist= keeps it on screen until dismissed (the page-me convention). The message carries the project name, the completed count, and the remaining count, so Craig can reply confirming ready + naming the next project in one turn. The page-signal wrapper removed 2026-06-12 is reconciled to =notify= here — there is no separate page-signal call.
-
-* Alternatives Considered
-
-** Fold execution into inbox-zero (the Phase E stopgap shape)
-- Good, because it's the smallest diff — the loop caller already runs inbox-zero, so execution is "one more phase."
-- Bad, because it couples capture-routing with implementation. inbox-zero has three callers; startup and wrap-up must never execute. A Phase E inside inbox-zero forces both to carry a "skip Phase E" caveat and risks a future caller running it by accident.
-- Neutral, because the eligibility-gate and defer-checklist text is identical either way — only its *home* differs.
-
-** Two separate features (keep Phase E and speedrun distinct)
-- Good, because each proposal ships as written with no reconciliation work.
-- Bad, because the execution loop is duplicated in two places and will drift; a guardrail tightened in one won't reach the other. Two ways to do autonomous execution is two things to audit.
-- Neutral, because the input and session-mode differences are real — but they're thin caller-level differences, not a reason to fork the engine.
-
-** Keep the task-size gate (defer anything over ~30 minutes)
-- Good, because it bounds per-task cost and blast radius with a single number.
-- Bad, because it defeats the away-from-desk use case — anything non-trivial bounces back to Craig, so he can't actually leave.
Size correlates poorly with risk; a large mechanical refactor is safer than a tiny change to persisted state. -- Neutral, because the things size was a proxy for (risk, cost) are covered directly — risk by the data-loss checklist, cost by the run cap (and the vNext token budget). The defer checklist's deliberation item, not size, is what routes genuine =/start-work= tasks out. - -** Autonomous-commit as the default -- Good, because it's faster end-to-end with no diff to review. -- Bad, because most projects lack the per-project waiver, and an unattended loop committing to a project that never opted in is exactly the failure the file-only default prevents. The blast radius of a bad autonomous commit is a revert plus lost trust in the loop. -- Neutral, because the projects that *do* want it (=.emacs.d=, =rulesets=) opt in explicitly, so the capability is available where it's wanted without being the default everywhere. - -* Decisions [8/8] - -** DONE Eligibility tag set and where it's read -- Owner / by-when: Craig / spec-review -- Context: Projects' priority/tag schemes vary, and the =todo-format.md= scheme header is the declared per-project source of truth. Task size is no longer a gate, so eligibility rests on the autonomy tag, not an effort cap. -- Decision: Eligibility = status =TODO= AND the =:solo:= autonomy tag, resolved against the project's scheme header (a project may declare a different autonomous-safe set). Priority / =:next:= drive ordering, not eligibility. =:quick:= is an effort hint, never a gate. -- Consequences: easier — one workflow works across projects with different vocab, and the gate is a pure lookup; harder — a project with no/malformed scheme header needs a fallback, and the default (=:solo:=) must be defined precisely enough that two projects agree. - -** DONE Crisp =:solo:= / =:quick:= definitions, enforced in task-review + task-audit -- Owner / by-when: Craig / spec-review -- Context: The run-time gate is only as crisp as the tags. 
Today =:quick:= / =:solo:= are listed in the scheme header with no hard definition, and nothing enforces that tasks get assessed for them. -- Decision: Define =:solo:= (completable without Craig beyond at most one-or-two upfront-answerable quick decisions; no design deliberation) and =:quick:= (small/fast effort hint only) in =todo-format.md=, and make assessing both a *mandatory step* in the task-review and task-audit workflows. A review/audit that skips the assessment is incomplete. -- Consequences: easier — authoring-time judgment by the human who knows the answer, and the run-time gate trusts the tag; harder — task-review and task-audit grow a required step, and existing untagged tasks need a back-fill pass. - -** DONE The do-not-auto-implement marker set -- Owner / by-when: Craig / spec-review -- Context: =VERIFY= means "awaiting Craig's manual confirmation"; other projects may use markers differently. -- Decision: Do-not-implement = any status that is not =TODO=, plus any project-declared "hold" marker. Safe-by-omission: exclude anything not plainly =TODO=. -- Consequences: easier — portable, and manual-check tasks can't auto-run; harder — richer per-project overrides need marker semantics in the scheme header, which most lack, so the default must stay conservative. - -** DONE Pre-flight decision gathering for the speedrun preset -- Owner / by-when: Craig / spec-review -- Context: Forcing every decision-needing task to defer wastes the away-from-desk use case — many tasks need only one or two quick answers Craig could give at kickoff. The speedrun is interactive at its start but must be hands-off after. -- Decision: The speedrun preset gathers + orders the set, intros the work, and batches all needed quick decisions into one pre-flight Q&A; Craig answers or says "skip this" (drops the task); the run then proceeds with zero further approvals. The unattended loop has no kickoff human, so it defers decision-needing tasks instead. 
-- Consequences: easier — "no approvals" becomes "all approvals first," which fits working-while-away, and larger / lightly-underspecified tasks become runnable; harder — the classifier must reliably split quick-question vs real-deliberation, and the recorded answers must reach the implementer so it works from the decision, not a guess. - -** DONE Commit-autonomy opt-in mechanism -- Owner / by-when: Craig / spec-review -- Context: =file-only= is the default; =.emacs.d= and =rulesets= have a per-project waiver allowing autonomous commits. Where does the workflow *read* that a project has opted in? -- Decision: Read the opt-in from the project's existing per-project waiver location (=notes.org= Workflow State or =CLAUDE.md=), not a new config file. Two flags: "has commit waiver" and "loop may commit" can differ. -- Consequences: easier — no new config surface, reuses the existing waiver concept; harder — the waiver location/format must be pinned for deterministic detection, and "waiver yes, loop-commit no" needs the two-flag split. - -** DONE Run-cap default and the kill switch shape -- Owner / by-when: Craig / spec-review -- Context: The loop default is one task per run; the speedrun works an explicit list. Both need a hard ceiling. Task size is now uncapped, so a single task can be large. -- Decision: The caller passes a hard per-run task cap (loop default 1; speedrun = length of the explicit list, capped at a ceiling); stop + page with the remainder when the cap is hit. v1 caps by task count, not token budget. -- Consequences: easier — a simple caller-controlled integer with a bounded task count; harder — a count cap doesn't bound *cost*, and with size uncapped a single large task can run long, so a token budget is vNext and more pressing than before. 
- -** DONE Metrics log location and format -- Owner / by-when: Craig / spec-review -- Context: Per-run metrics must land somewhere structured and queryable, per-project, and survive across sessions for the synthesis step to read. -- Decision: Append one JSONL record per task to a per-project log at =.ai/metrics/work-the-backlog.jsonl=, git-tracked, with the synthesis step reading the union across projects. -- Consequences: easier — append-only JSONL is trivial to write and =jq=-queryable, and per-project keeps it local to the work; harder — a git-tracked log adds commit churn, and "union across projects" needs the synthesis step to know where every log lives. - -** DONE Synthesis cadence and trigger -- Owner / by-when: Craig / spec-review -- Context: Craig wants periodic org-roam articles summarizing the data. What triggers synthesis, and how often? -- Decision: Run synthesis on an explicit trigger ("synthesize backlog metrics") and optionally a weekly scheduled run, writing one KB node per synthesis under =~/org/roam/agents/= per the knowledge-base rule. -- Consequences: easier — an explicit trigger means no surprise writes, and the KB rule already governs node shape; harder — a weekly run needs a scheduler entry, and the personal-only write-classification must gate it so work-project metrics never land in the KB. - -* Implementation phases - -** Phase 0 — Tag definitions + task-review/audit enforcement -Add the hard =:solo:= / =:quick:= definitions to =todo-format.md=, and add the mandatory tag-assessment step to the task-review and task-audit workflows. Independent of the workflow build; lands first so the eligibility gate has crisp tags to read and existing tasks start getting assessed. Tree stays working: these are rule + workflow prose additions. 
- -** Phase 1 — Extract the execution loop into work-the-backlog.org -Write =work-the-backlog.org= holding the eligibility gate, defer checklist, per-task quality bar, and run-cap logic — taking a task set + session mode + cap as input. Remove the stopgap "Phase E" text from =inbox-zero.org= (restore it to its A-D shape) in the same change so there's one home, not two. Tree stays working: inbox-zero reverts to routing-only, and the new workflow is callable but not yet wired to the loop. - -** Phase 2 — Wire the two callers -Add the loop caller's chain step (after inbox-zero Phase D, invoke work-the-backlog with the tag query + file-only + cap 1) and the no-approvals speedrun preset (pre-flight decision-gathering → explicit list + autonomous-commit + always-push + paging-on). Both go through the same workflow; only the speedrun runs the pre-flight Q&A. Tree stays working: each caller is independently testable. - -** Phase 3 — File-only vs autonomous-commit gate -Implement the commit-autonomy branch: read the per-project waiver, degrade =autonomous-commit= to =file-only= when absent, surface the degrade. Tree stays working: default file-only behavior is the safe path even before the waiver-read lands. - -** Phase 4 — The defer checklist, pre-flight Q&A, and the page -Implement the act-vs-file defer checklist (test-writability keystone, enumerated data-loss list, already-satisfied, design-deliberation), the speedrun pre-flight decision-gathering (gather → classify → order → intro → batch-ask → skip/answer), the =VERIFY=-on-ambiguity filing, and the end-of-set =notify alarm ... --persist= page. Tree stays working: the checklist only ever *reduces* what runs, and the pre-flight step only runs under the speedrun preset. - -** Phase 5 — Metrics log -Append the per-task JSONL record at each task outcome. Tree stays working: logging is a side effect that doesn't alter execution. 
- -** Phase 6 — Synthesis to org-roam -Write the synthesis step: read the JSONL union, compute the per-run and trend metrics (below), write a KB node under =~/org/roam/agents/= per the knowledge-base rule, personal-projects-only classification enforced. Tree stays working: synthesis is read-only over the logs plus a KB write. - -* Acceptance criteria -- [ ] =work-the-backlog.org= exists and is the only home for the execution loop; =inbox-zero.org= is back to its A-D routing-only shape with no Phase E. -- [ ] The loop caller chains into work-the-backlog after routing; startup and wrap-up never invoke it. -- [ ] The no-approvals speedrun runs as the preset (pre-flight Q&A → autonomous-commit + always-push + end-page) over an explicit ordered list, one commit per logical change. -- [ ] =:solo:= and =:quick:= carry hard definitions in =todo-format.md=, and task-review + task-audit both refuse to complete without assessing them. -- [ ] Eligibility = status =TODO= AND =:solo:=, read from the project's scheme header, not hardcoded; a =VERIFY= / =DOING= / =DONE= / =CANCELLED= task is skipped by the gate. -- [ ] Task size never sends a task to =/start-work=; a large but =:solo:=, well-specified task runs and is decomposed into per-logical-commit chunks. -- [ ] The defer checklist fires correctly: a task whose red test isn't writable (and isn't a quick-question gap), one carrying an enumerated data-loss operation, an already-satisfied one, and one needing design deliberation are each deferred (or routed to pre-flight Q&A under the speedrun), not implemented. -- [ ] Under the speedrun preset, a task needing one or two quick decisions is surfaced in the pre-flight Q&A; "skip this" drops it, an answer is recorded and used; the run then proceeds with no further approvals. -- [ ] Under the unattended loop, a decision-needing task defers (no pre-flight Q&A). 
-- [ ] In a project without the commit waiver, an =autonomous-commit= request degrades to file-only and says so; no commit is made. -- [ ] The run stops at the per-run cap and pages with the remaining tasks listed. -- [ ] Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl=. -- [ ] The synthesis step reads the logs and writes a KB node under =~/org/roam/agents/=; it refuses to write for work-classified projects. - -* Effectiveness measurement - -This section answers Craig's explicit ask: measure whether autonomous-batch execution is actually effective, and build the "gather data → org-roam articles" loop. - -** What "effective" means here - -The autonomy is effective if it completes real work that *stays* completed — i.e. tasks land green and the next session doesn't have to undo or fix them. The two failure modes to catch are (1) the loop defers everything (over-cautious, no value delivered) and (2) the loop implements badly (commits that get reverted or hand-corrected next session). Both are measurable. 
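Both failure modes reduce to two ratios over a run's task records. The sketch below is one possible reading, using the `outcome` and `commit_sha` fields of the per-task record this spec defines. The function name `run_health`, the choice of which outcomes count as "attempted", and the `corrected_shas` input (the set of commits the corrections proxy flagged) are assumptions for illustration only.

```python
# Outcomes treated as "attempted" here; this grouping is an assumption.
ATTEMPTED = {"implemented-committed", "implemented-diff", "deferred-verify"}


def run_health(records, corrected_shas=frozenset()):
    """Summarize one run: how much was deferred, how much was corrected later."""
    attempted = [r for r in records if r.get("outcome") in ATTEMPTED]
    deferred = [r for r in attempted if r["outcome"] == "deferred-verify"]
    committed = [r for r in attempted if r["outcome"] == "implemented-committed"]
    corrected = [r for r in committed if r.get("commit_sha") in corrected_shas]
    return {
        "attempted": len(attempted),
        # Near 1.0 suggests the over-cautious failure mode.
        "defer_rate": len(deferred) / len(attempted) if attempted else 0.0,
        # Above 0.0 suggests commits that did not stay completed.
        "correction_rate": len(corrected) / len(committed) if committed else 0.0,
    }
```

A run with a high `defer_rate` delivered little; a run with a non-zero `correction_rate` is a candidate for human review rather than a verdict.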
-
-** Per-run metrics (the JSONL record)
-
-One record per task, appended to =.ai/metrics/work-the-backlog.jsonl= at each task outcome:
-
-| Field              | Meaning                                                             |
-|--------------------+---------------------------------------------------------------------|
-| =ts=               | ISO timestamp of the task outcome                                   |
-|--------------------+---------------------------------------------------------------------|
-| =run_id=           | UUID shared by all tasks in one run                                 |
-|--------------------+---------------------------------------------------------------------|
-| =project=          | project basename                                                    |
-|--------------------+---------------------------------------------------------------------|
-| =caller=           | =loop= or =speedrun=                                                |
-|--------------------+---------------------------------------------------------------------|
-| =task=             | task heading (slug)                                                 |
-|--------------------+---------------------------------------------------------------------|
-| =outcome=          | implemented-committed / implemented-diff / deferred-verify /        |
-|                    | skipped-ineligible / dropped-by-craig (skipped at pre-flight)       |
-|--------------------+---------------------------------------------------------------------|
-| =defer_reason=     | underspecified / data-loss / already-satisfied / needs-deliberation |
-|--------------------+---------------------------------------------------------------------|
-| =upfront_decision= | true if a pre-flight answer was recorded and used for this task     |
-|--------------------+---------------------------------------------------------------------|
-| =wall_clock_s=     | seconds from task start to outcome                                  |
-|--------------------+---------------------------------------------------------------------|
-| =commit_sha=       | for committed tasks; empty otherwise                                |
-|--------------------+---------------------------------------------------------------------|
-| =review_findings=  | count of /review-code Critical+Important findings on this task      |
-|--------------------+---------------------------------------------------------------------|
-
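A minimal sketch of appending one record with the fields listed in the table above. The function name `append_record`, its keyword defaults, and the validation of `caller` and `outcome` against fixed sets are assumptions; the spec defines the field names and the log path, not an API.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Allowed values, taken from the field table in this spec.
OUTCOMES = {
    "implemented-committed", "implemented-diff", "deferred-verify",
    "skipped-ineligible", "dropped-by-craig",
}
CALLERS = {"loop", "speedrun"}


def append_record(log_path, *, run_id, project, caller, task, outcome,
                  defer_reason="", upfront_decision=False,
                  wall_clock_s=0.0, commit_sha="", review_findings=0):
    """Append one task-outcome record as a single JSON line and return it."""
    if caller not in CALLERS:
        raise ValueError(f"unknown caller: {caller!r}")
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    record = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "run_id": run_id,
        "project": project,
        "caller": caller,
        "task": task,
        "outcome": outcome,
        "defer_reason": defer_reason,
        "upfront_decision": upfront_decision,
        "wall_clock_s": wall_clock_s,
        "commit_sha": commit_sha,
        "review_findings": review_findings,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps the log append-only; one JSON object per line.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Usage would look like `append_record(".ai/metrics/work-the-backlog.jsonl", run_id=run_id, project="rulesets", caller="loop", task="fix-typo", outcome="implemented-diff")`, where the task slug and project name are placeholders.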
-Per-run rollups computed at synthesis (not stored per record): tasks attempted, completed, VERIFY-deferred, dropped-by-craig, reverted; wall-clock total; commits landed; review findings per commit. - -** The corrections signal (the key metric) - -The hardest and most valuable metric is *human corrections in the following session* — did Craig revert or hand-fix an autonomous commit? v1 captures the cheap proxy: at synthesis, for each =commit_sha=, check whether a later commit touching the same files reverted it or carries a "fix"/"revert" of that change within N days. A clean run is one where the autonomous commits survive untouched. (Auto-detecting "this later commit corrected that autonomous one" precisely is a vNext refinement; the proxy — reverted-or-touched-soon-after — is good enough to flag a problem run for human review.) - -** Where the data lands - -Per-project git-tracked JSONL at =.ai/metrics/work-the-backlog.jsonl=. Append-only, =jq=-queryable, survives across sessions and machines via the normal project sync. Git-tracked so the history is auditable and the synthesis step can read it from any clone. - -** The synthesis loop (gather → article) - -On the "synthesize backlog metrics" trigger (and optionally a weekly scheduled run): - -1. Read the JSONL union across the personal projects the synthesizer can see. -2. Compute the rollups and the trend: completion rate over time, defer-reason distribution, review-findings-per-commit trend, and the corrections-signal flag count. -3. Write one org-roam KB node under =~/org/roam/agents/YYYYMMDDHHMMSS-backlog-metrics-<window>.org= per the knowledge-base rule — filetags =:agent:metrics:=, a concise title, the rollup table, the trend narrative, and =[[id:...]]= links to prior synthesis nodes so the series is traceable. -4. Enforce the KB write-classification: *personal projects only*. 
A work-classified project's metrics never write to the KB — they stay in that project's own =.ai/metrics/= log and the synthesizer reports the refusal per the KB refusal contract. - -The KB node is the artifact Craig reviews later — "are the autonomous runs completing more and getting corrected less over the last month?" reads off the trend table without re-querying raw logs. - -* Readiness dimensions - -- *Data model & ownership:* The task set is read from =todo.org= (project-owned, user-authored). The metrics JSONL is generated, append-only, git-tracked, project-owned. KB nodes are agent-generated under =~/org/roam/agents/= (never overwriting Craig's hand-authored nodes — link only). No editable region is co-owned. -- *Errors, empty states & failure:* Empty task set → report "nothing eligible" and stop. Malformed scheme header → fall back to the default tag reading and surface the fallback. A task that fails mid-implementation → leave the tree working (don't commit a broken state), record the failure outcome, surface it, continue to the next task. No silent data loss: the data-loss guardrail refuses irreversible tasks outright. -- *Security & privacy:* Tasks touching credentials or external mutations are excluded by the data-loss / external-state checklist item. The KB write is personal-projects-only; work metrics never leave the project. No secrets in the JSONL (task slugs and SHAs only). -- *Observability:* The end-of-set page surfaces the run outcome. The per-task surface (implemented / deferred + reason / dropped / skipped) is the live progress view. The metrics log + KB synthesis is the long-run observability. A bad run is isolable from the JSONL (which task, which outcome, which review findings). -- *Performance & scale:* Expected counts are small — a handful of tasks per run, one run per 30-min tick. No bottleneck at this scale. 
The cap bounds the worst case on task count; with size uncapped, a single large task is the cost outlier the vNext token budget addresses. Synthesis over months of JSONL is still a small file (one record per task). -- *Reuse & lost opportunities:* Reuses =todo-format.md= for task close + the tag definitions, =/review-code= and =/voice personal= for the quality bar, =notify= for paging, the knowledge-base rule for KB writes, the per-project waiver for commit-autonomy, and task-review / task-audit for tag enforcement. No new config file (the opt-in rides the existing waiver). The execution loop is the one new shared asset. -- *Architecture fit & weak points:* Integration points — inbox-zero loop caller (chain after Phase D), the per-project waiver location, =todo.org= scheme header, task-review / task-audit, =~/org/roam/agents/=. Weak point: the commit-autonomy gate depends on deterministically reading the waiver; mitigated by defaulting to file-only when the read is ambiguous (fail safe, not open). Second weak point: a 30-min loop committing unattended with uncapped task size; mitigated by the hard count cap and file-only default, with the token budget as the vNext backstop. -- *Config surface:* Per-project — commit-autonomy opt-in (via existing waiver), optional loop-commit flag, optional autonomous-safe tag override in the scheme header. Per-call — task set, session mode, run cap. Defaults: file-only, paging-off (loop) / paging-on (speedrun), cap 1 (loop). -- *Documentation plan:* The workflow file itself is the user/operator doc (matches inbox-zero.org's self-documenting style). The =.emacs.d= stopgap note and the speedrun proposal are superseded by this spec; no separate migration doc needed beyond removing the Phase E text. -- *Dev tooling:* N/A for new build targets — the workflows are prose, exercised by invocation. 
The metrics JSONL is =jq=-inspectable by hand; a tiny rollup helper may be added under =.ai/scripts/= if the synthesis prose proves to need it (decided at Phase 6, not a v1 prerequisite). -- *Rollout, compatibility & rollback:* Rollout is removing Phase E from inbox-zero and adding work-the-backlog — both prose changes, instantly reversible. Compatibility: inbox-zero's three callers are unchanged except the loop caller gaining a forward chain. Rollback: delete work-the-backlog and the loop chain step; inbox-zero is already back to A-D. The file-only default means the worst pre-rollback state is surfaced diffs, not committed changes. -- *External APIs & deps:* =notify alarm "Page" "<msg>" --persist= verified against =/home/cjennings/.local/bin/notify= and the page-me workflow. =~/org/roam/= KB write path and node shape verified against the knowledge-base rule. No external API calls. - -* Risks, Rabbit Holes, and Drawbacks - -- *The corrections signal is a proxy, not ground truth.* "A later commit touched the same files" over-counts (legitimate follow-up work) and under-counts (a correction in a different file). It's a flag for human review, not a verdict. Don't rabbit-hole on making it precise in v1 — the proxy plus a human glance is the design. -- *Waiver detection drift.* If the per-project waiver location moves or its format changes, the commit-autonomy gate could mis-read. Mitigation: fail safe to file-only. Pin the waiver format in the Phase 3 decision before building. -- *Unattended-commit blast radius.* The headline risk. Mitigated four ways: file-only default, the hard cap, the data-loss checklist item, and the metrics loop (which makes a bad run visible after the fact even if the first three let something through). With task size uncapped, the cost dimension of this risk grows — the vNext token budget is the planned fifth layer. -- *Scope creep into /start-work territory.* Size is intentionally no longer the brake. 
The brake is the defer checklist's design-deliberation item plus the "when unsure, defer" rule — keep item 4 strict so genuine deliberation-class tasks still route out even when they're tagged =:solo:= by mistake. -- *Pre-flight classifier error.* The speedrun's gather step has to split quick-answerable-question from real-deliberation. Misclassifying a deliberation task as a quick question puts a half-baked decision into an autonomous run. Mitigation: when the question isn't answerable in one or two lines, treat it as deliberation and drop it from the run, not as a pre-flight question. - -* Testing / Verification / Rollout - -Verification is by invocation against a project's real =todo.org=: run the loop caller in file-only mode and confirm it surfaces diffs without committing; run the speedrun against a small explicit list in a waiver-carrying project and confirm the pre-flight Q&A fires, "skip this" drops a task, an answer is recorded and used, then one commit per logical change + the end page; plant a =VERIFY=-status task, a data-loss task, an already-satisfied task, and a large-but-=:solo:= task and confirm the first three are skipped/refused while the large one runs and decomposes; confirm the JSONL grows one record per task; run synthesis and confirm a KB node lands (personal project) or is refused (work project). Rollout is the Phase 0-6 sequence, each leaving the tree working; the file-only default makes early phases safe to ship before the commit and paging phases land. - -* References / Appendix - -- [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal (inbox-zero stopgap)]] and [[file:../../working/inbox-zero-phase-e/sender-note.org][its sender note with the 5 open questions]]. -- [[file:2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] (file retains its original on-disk name pending a rename pass). 
-- [[file:../../.ai/workflows/inbox-zero.org][inbox-zero.org (canonical, A-D)]] — the routing workflow this feature decouples from. -- =~/code/rulesets/claude-rules/knowledge-base.md= — the org-roam write contract the synthesis step follows. - -* Review and iteration history -** 2026-06-16 Tue — author -- What: initial draft reconciling the Phase E and fix-speedrun proposals into one work-the-backlog.org feature, plus the effectiveness-measurement instrumentation. -- Why: two overlapping proposals arrived within a day; building them separately would duplicate the execution loop and let it drift. Craig also asked explicitly for measurement + org-roam synthesis. -- Artifacts: this spec; the two source proposals under docs/design/ and working/inbox-zero-phase-e/. -** 2026-06-28 Sun — revision (Craig) -- What: removed the task-size gate (size no longer defers; large tasks decompose into per-commit chunks); recast the act-vs-file rule as a crisp four-item defer checklist keyed on test-writability; added crisp =:solo:= / =:quick:= definitions destined for =todo-format.md= and made their assessment mandatory in task-review + task-audit; added the speedrun's pre-flight decision-gathering step (batch the quick questions up front, "skip this" drops a task, then run hands-off); renamed "fix speedrun" → "no-approvals speedrun" in prose. Status stays draft pending ratification of the revised decisions. -- Why: the original criteria were adjectives, not checkable; the size gate forced Craig to stay at his desk for anything non-trivial, defeating the away-from-desk use case; and decision-needing tasks were over-deferred when many need only a quick upfront answer. -** 2026-06-29 Mon — ratified -- What: Craig ratified all eight revised decisions; Status → ready. Implementation-ready across Phase 0 (tag definitions + task-review/audit enforcement) through Phase 6 (synthesis). 
-- Why: the crisp defer checklist and the pre-flight-Q&A design resolved the "criteria too soft" and "size shouldn't gate" concerns that held the spec in draft. diff --git a/docs/design/2026-06-16-encourage-kb-contribution-spec.org b/docs/design/2026-06-16-encourage-kb-contribution-spec.org deleted file mode 100644 index cf8111b..0000000 --- a/docs/design/2026-06-16-encourage-kb-contribution-spec.org +++ /dev/null @@ -1,199 +0,0 @@ -#+TITLE: Encourage Org-Roam KB Contribution Across Workflows — Spec -#+AUTHOR: Craig Jennings & Claude -#+DATE: 2026-06-16 -#+TODO: TODO | DONE SUPERSEDED CANCELLED - -* Metadata -| Status | approved (decisions ratified 2026-06-20) | -|----------+------------------------------------------------| -| Owner | Craig Jennings | -|----------+------------------------------------------------| -| Reviewer | Craig Jennings | -|----------+------------------------------------------------| -| Date | 2026-06-16 | -|----------+------------------------------------------------| -| Related | [[file:../../todo.org][rulesets todo.org]] | -|----------+------------------------------------------------| - -* Summary - -The org-roam KB already exists (=knowledge-base.md=: =~/org/roam/agents/=, =:agent:= filetag, capture-then-promote, personal-vs-work write boundary), but nothing in the daily workflow loop encourages agents to use it. The wrap-up's =KB: promoted N / consulted yes-no= receipt is the only touchpoint, and it fires at the very end when the session's learnings have already faded. This feature wires four light prompts into the synced template workflows — startup, triage-intake, inbox-zero, wrap-it-up — plus one curated best-practices node in the KB, so contributing durable knowledge becomes a habit the workflows nudge rather than a rule agents forget. - -* Problem / Context - -The KB rule is sound but passive. 
An agent reads =knowledge-base.md= once at rule-load and then never gets reminded to consult or contribute, so the KB stays nearly empty and never reaches the critical mass where consulting it pays off. The compounding asset Craig wants — a cross-project store that gets more valuable as it grows — needs a contribution habit, and habits in this system come from workflow prompts, not from a rule sitting in the background. - -Three gaps: - -1. *No quality guidance.* =knowledge-base.md= says what goes in (durable facts) and where (=agents/= nodes), but not /how/ to write a good node — atomic, descriptively titled, linked. An agent following the rule literally can still produce a junk drawer of vague, unlinked notes that no future agent can find or trust. -2. *No mid-session capture prompts.* Triage-intake and inbox-zero both surface durable signal (a recurring pattern across messages, a reference pointer worth keeping) and then drop it. Nothing tells the agent "that was worth a node." -3. *The only contribution prompt is too late.* Wrap-up's KB promotion check runs in Step 1, after the session, when the agent is reconstructing learnings from the log rather than capturing them while fresh. - -* Goals and Non-Goals - -** Goals -- Curate a best-practices node in the KB that teaches agents how to write good nodes, drawing on established note-taking guidance. -- Link that node from startup with a light, one-line encouragement to contribute through the session. -- Add a short end-of-flow KB reminder to triage-intake and inbox-zero. -- Add an early wrap-up prompt that asks what the agent learned worth remembering, feeding the existing =KB: promoted N= receipt. -- Keep every prompt light and non-blocking — encouragement, never a gate. - -** Non-Goals -- *Not* changing =knowledge-base.md='s write boundary, schema, or the work/personal classification. The feature builds on that rule unchanged. -- *Not* adding a blocking gate anywhere. 
No workflow stalls or fails because a node wasn't written. -- *Not* automating node creation. The agent decides what's durable; the prompts only ask the question. -- *Not* a second receipt or metric. Wrap-up's =KB: promoted N / consulted yes-no= line stays the single instrumentation point. -- *Not* touching the wrap-up's existing Step 1 KB-promotion sub-section's schema — the new early prompt /feeds/ it, it doesn't replace it. - -** Scope tiers -- v1: the four workflow edits + the one curated best-practices node. All synced templates, so the edits propagate to every project on next startup. -- Out of scope: a contribution-rate dashboard, per-project KB stats, auto-suggesting nodes from session content. -- vNext: a "consult the KB before this task" prompt in start-work / spec-create (deferred — log to todo.org). - -* Design - -The feature is four small prompt insertions plus one authored artifact. The design work is mostly about /placement/ and /wording/: these are synced templates, so a prompt that reads as nagging gets paid forward to every project on every run. The governing constraint is "light enough that an agent welcomes it, specific enough that it actually fires." - -** The best-practices node (the artifact) - -The node lives at =~/org/roam/agents/<timestamp>-agent-kb-best-practices.org=, authored by hand (not agent-generated), with the standard =:agent:reference:= filetags so it's a first-class KB node agents can find by the same =rg= the rule already documents. It is the one node startup links to, and the substance the workflow prompts point at instead of re-explaining note-taking inline. - -Its content is curated from the established note-taking literature — Sönke Ahrens' systematization of Luhmann's Zettelkasten, Andy Matuschak's evergreen-notes practice, and the org-roam community's own guidance — distilled to the handful of principles that matter for an /agent/ writing /durable facts/, not a human building a thinking environment. 
Proposed outline: - -1. *Why the KB exists* — one paragraph: a cross-project, cross-machine asset that compounds. Consulting it saves re-deriving; contributing to it pays the next agent forward. -2. *One idea per node (atomicity).* Each node holds a single durable fact. Atomicity is what makes a note linkable and findable — a node about three things links cleanly to none of them. (Ahrens; zettelkasten.de atomicity guide.) -3. *Descriptive, declarative titles.* The title states the claim, not the topic: "SSH auth routes through gpg-agent with a separate cache TTL" beats "SSH notes." A title you can read as a standalone statement is one a future agent can scan and trust without opening the node. (Matuschak evergreen notes; org-roam community practice.) -4. *Link liberally.* Use =[[id:...]]= to connect a new node to related ones; the value is in the network, not the isolated note. Link to Craig's hand-authored nodes, never edit them. (Matuschak "densely linked"; the linking principle.) -5. *Capture, then promote.* Harness memory is the fast capture layer; the KB is for facts that cleared the durability bar. Don't promote everything — promote what transfers. (Mirrors =knowledge-base.md='s capture-then-promote.) -6. *What goes in / what stays out.* Restate the rule's inclusion bar tersely (durable, cross-project, the why behind a decision, environment gotchas, reference pointers) and the exclusion bar (session state, task state, high-churn facts, secrets, anything the repo already records). -7. *The write boundary.* One line pointing at =knowledge-base.md=: personal projects only, work and unknown projects never write — with the refusal contract. The node /defers/ to the rule here rather than restating the denylist, so there's one source of truth for the boundary. -8. *Sources.* The citations below, as a reference footer. 
- -Two-altitude note: for a /reading/ agent the node is "how do I tell a good node from a bad one before I trust it?"; for a /writing/ agent it's "what shape should this fact take before I commit it?" The outline serves both — principles 2-4 are the writing checklist, 6-7 are the reading/eligibility filter. - -** The four workflow prompts (placement + wording) - -Each is the minimum that fires reliably without nagging. Exact insertion points and proposed copy are in Implementation phases below; the design rationale per prompt: - -- *Startup (link + light encouragement).* Startup already reads =notes.org= and surfaces nudges in Phase C. The KB encouragement rides there as one line, not a new phase — it points at the best-practices node and frames the session's contribution as welcome, not required. It fires once per session at the top, setting the frame; the other three prompts collect on it. -- *Triage-intake (end-of-flow reminder).* Placed at the very end of Phase D / Exit Criteria, after actions ship — the moment the agent has just seen a sweep's worth of signal and might recognize a durable pattern. One line, conditional in spirit ("if anything here was durable…"), never a blocking step before close-out. -- *Inbox-zero (end-of-flow reminder).* Same shape, placed in Phase D (Surface) after the moved/folded/dropped report — the agent has just triaged a batch and may have spotted a reference pointer worth keeping. -- *Wrap-up (early prompt feeding the existing receipt).* Placed at the /start/ of Step 1, before the Summary is finalized, while the session is fresh — "what did you learn worth remembering, for yourself or a future agent?" The answer flows into the existing Step 1 KB-promotion sub-section and its =KB: promoted N / consulted yes-no= receipt. The early prompt and the existing check are one pipeline: the prompt captures while fresh, the existing sub-section does the promotion and writes the receipt. No second receipt. 
- -** How the early wrap-up prompt feeds the existing receipt - -The existing wrap-up Step 1 already has a "KB promotion check" sub-section that asks the promotion question and writes =KB: promoted N / consulted yes-no=. The new early prompt is not a second check — it's a /relocation of the asking/ to the top of Step 1 so the question lands while the session is fresh rather than after the Summary is reconstructed. The existing sub-section keeps ownership of the actual promotion (writing the =agents/= nodes per schema) and the receipt line. Concretely: the early prompt asks and collects candidate facts into the session's working notes; the existing sub-section consumes those candidates, writes the nodes, and emits the one receipt. This avoids duplication by making the early prompt a /capture/ step and the existing check the /commit + receipt/ step of the same pipeline. - -* Alternatives Considered - -** A blocking gate ("you must write ≥1 node to wrap up") -- Good, because it would guarantee contributions and grow the KB fast. -- Bad, because it manufactures junk — agents would write a throwaway node to clear the gate, polluting exactly the asset the feature is meant to grow. It also fights the "light, non-nagging" constraint head-on. -- Neutral, because the receipt already gives visibility into contribution rate without forcing it. - -** Inlining the best-practices guidance into each workflow prompt -- Good, because the guidance is right there at the point of use; no indirection. -- Bad, because it's four copies of the same note-taking advice in four synced templates — duplication that drifts, and four times the prompt length, which reads as nagging. One linked node keeps each prompt to one line. -- Neutral, because a one-node-plus-links shape is exactly what the best-practices node /teaches/, so the design eats its own dogfood. 
- -** Putting the encouragement only in =knowledge-base.md= (no workflow edits) -- Good, because it's the least change — one rule edit, no template churn. -- Bad, because that's the status quo that produced the problem: a rule read once at load and then forgotten. Habits in this system come from workflow prompts, not background rules. -- Neutral, because the rule still carries the authoritative boundary; the workflow prompts are the habit layer on top. - -* Decisions [6/6] - -** DONE Where exactly does the startup link land — Phase A read, Phase C nudge, or notes.org? -- Owner / by-when: Craig / before implementation -- Context: Startup has three candidate homes for the KB encouragement: a Phase A parallel read of the best-practices node (costs context every session), a Phase C surfaced nudge (one line, conditional, consistent with the existing roam-inbox and task-review nudges), or a static line in each project's =notes.org= Active Reminders (per-project, not synced, drifts). The Phase C nudge matches the established nudge pattern and costs nothing when there's nothing to say. -- Decision: We will add the encouragement as a one-line Phase C nudge in startup.org, pointing at the best-practices node by its KB path, surfaced once near the other Phase C nudges. -- Consequences: easier — consistent with existing nudge mechanics, synced to every project, no per-session read cost; harder — one more line competing for attention in the Phase C surface, so the wording has to earn its place and stay terse. - -** DONE Is the startup nudge unconditional, or gated on the KB clone being present? -- Owner / by-when: Craig / before implementation -- Context: =~/org/roam/= isn't on every machine. The existing roam-inbox nudge already guards on the clone's presence ([ -f ~/org/roam/inbox.org ]). An unconditional KB nudge would fire on machines where the agent can't act on it. 
-- Decision: We will gate the startup nudge on the roam clone being present, reusing the existing presence check, so the encouragement only appears where the agent can act on it. -- Consequences: easier — no dead nudge on KB-less machines, mirrors the roam-inbox guard; harder — one more conditional in Phase C, and a machine without the clone gets no encouragement at all (acceptable — it can't contribute there anyway). - -** DONE Does the early wrap-up prompt stop and ask Craig, or self-answer silently? -- Owner / by-when: Craig / before implementation -- Context: Wrap-up is meant to be quick — Craig already authorized the wrap, and the existing KB-promotion check self-answers (the agent decides what's durable; work projects skip the write). An early prompt that /stops and asks Craig/ "what did you learn?" would add an interactive turn to a flow designed not to have them. But a purely silent self-answer risks the agent skipping the reflection. -- Decision: We will have the agent self-answer the early prompt — reflect on session learnings and stage candidate facts — without stopping to ask Craig, matching the wrap-up's no-extra-turns design; the candidates flow into the existing promotion check which writes the nodes and receipt. -- Consequences: easier — preserves wrap-up cadence, no new interactive gate, one pipeline from reflect to receipt; harder — relies on the agent actually reflecting rather than rubber-stamping "nothing learned," which the receipt makes visible over time but doesn't enforce. - -** DONE Do triage-intake and inbox-zero reminders fire every run, or only when the run surfaced something durable? -- Owner / by-when: Craig / before implementation -- Context: Both workflows run frequently (triage-intake between meetings, inbox-zero twice a session). A reminder on /every/ run is the textbook nag-fatigue failure — a line the agent learns to skip. 
A reminder gated on "this run surfaced a pattern / reference pointer worth keeping" fires rarely and stays meaningful, but requires the agent to make that judgment, which is softer than a mechanical condition. -- Decision: We will make both reminders conditional in spirit — a single line phrased as "if anything here was durable, write it to the KB" that the agent acts on only when the run actually surfaced something, rather than an unconditional step; an all-quiet triage sweep or an empty inbox-zero run emits no KB line. -- Consequences: easier — the reminder stays rare and credible, never pads a no-change sweep, fits triage-intake's deltas-only discipline; harder — "durable-looking" is an agent judgment with no mechanical check, so the reminder's effectiveness rides on the best-practices node teaching that judgment well. - -** DONE Best-practices node: agent-authored once, or hand-authored by Craig? -- Owner / by-when: Craig / before implementation -- Context: =knowledge-base.md= says agents never edit Craig's hand-authored nodes. The best-practices node is /about/ how agents write nodes — if an agent authors it, future agents may treat it as fair game to edit; if Craig hand-authors it, it's protected and stable but he writes it. Given it's a foundational reference the whole feature points at, stability matters. -- Decision: We will have Craig hand-author the best-practices node from the outline in this spec, so it's a protected, stable reference; the spec supplies the full drafted content for him to review and commit. -- Consequences: easier — the node is stable and protected from agent edits, one authoritative reference; harder — Craig writes (or reviews-and-commits) it rather than delegating, and updates to it are his call, not an agent's. - -** DONE Read side: how does startup surface lessons to consult, not just encourage contribution? 
-- Owner / by-when: Craig / ratified 2026-06-20 -- Context: The original spec only strengthened the /write/ side — startup encourages contributing (D1) but never surfaces existing KB lessons to /read/. The wrap-up receipt data shows "consulted no" across recent sessions: agents don't reach for the KB because nothing brings it to their attention at the moment work starts. =knowledge-base.md='s "search the KB first" is reactive and read-once-at-rule-load. A proactive surfacing at startup is the missing counterpart to D1. The cost constraint is the same one D1 dodged: a full Phase A read of matching nodes would spend context every session. -- Decision: We will add a second startup Phase C nudge (alongside D1's contribute-link, gated on the same roam-clone presence check) that surfaces KB lessons relevant to the current project — a count plus the nodes' declarative /titles only/ (no full-node read), capped at ~5. Relevance is matched cheaply on the project basename and obvious topic words against node titles/filetags/paths, with a most-recent fallback when nothing matches. The agent opens a node on demand. Titles are declarative by the best-practices node's own rule, so a title alone tells the agent whether to open it. -- Consequences: easier — closes the "consulted no" half with near-zero context cost (titles only), reuses the Phase C nudge pattern and the roam guard, and the consult and contribute nudges sit together as one KB surface; harder — relevance matching is a heuristic that can miss or mis-surface, and it adds a second KB line to Phase C, so both must stay terse to avoid nudge fatigue. If the receipt shows consults rising but the surfaced titles are noise, tighten the match. 
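The consult nudge's matching rule (project basename and topic words against titles, filetags, and paths; most-recent fallback; cap of about five; titles only) can be sketched as below. The node shape (a dict with `title`, `path`, `filetags`, `mtime`), the function name `surface_titles`, and the use of a modification time for "most recent" are all assumptions; the decision text does not fix any of them.

```python
def surface_titles(nodes, project, topic_words=(), cap=5):
    """Return a count plus up to `cap` node titles relevant to the project.

    nodes: iterable of dicts with "title", "path", "mtime" and optional
    "filetags" (a hypothetical in-memory shape, not an org-roam API).
    """
    nodes = list(nodes)
    needles = [w.lower() for w in (project, *topic_words) if w]

    def haystack(node):
        parts = [node["title"], node["path"], *node.get("filetags", ())]
        return " ".join(parts).lower()

    # Cheap substring match; no node body is ever read.
    matches = [n for n in nodes if any(w in haystack(n) for w in needles)]
    # Most-recent fallback: when nothing matches, consider every node.
    pool = matches or nodes
    newest_first = sorted(pool, key=lambda n: n["mtime"], reverse=True)
    return {
        "count": len(pool),
        "titles": [n["title"] for n in newest_first[:cap]],
        "fallback": not matches and bool(pool),
    }
```

Because only titles are returned, the context cost stays near zero and the agent opens a node on demand, which is the trade the decision describes.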
- -* Implementation phases - -** Phase 1 — Author the best-practices node -Write =~/org/roam/agents/<timestamp>-agent-kb-best-practices.org= from the outline in Design, with a generated =:ID:=, =#+title:=, =:filetags: :agent:reference:=, the eight content sections, =[[id:...]]= links to any existing related =:agent:= nodes, and the sources footer. Commit + push the roam repo per =knowledge-base.md='s session discipline. Leaves the KB with one new reference node and nothing else touched. - -** Phase 2 — Wire the startup encouragement (contribute + consult) -Add two one-line Phase C nudges to =claude-templates/.ai/workflows/startup.org= (canonical side), both gated on the roam-clone presence check: (1) D1's contribute-link pointing at the best-practices node by path, and (2) D6's consult-surface listing project-relevant KB node titles (count + titles only, capped ~5, project-basename match with recent fallback). A Phase A read counts =:agent:= nodes cheaply so Phase C only does the title surfacing when there's something to show. Run =scripts/sync-check.sh --fix=, commit both canonical + mirror. Propagates to every project on next startup. - -** Phase 3 — Wire the three remaining prompts -Add the end-of-flow KB reminder to =triage-intake.org= (end of Phase D / Exit Criteria) and =inbox-zero.org= (Phase D Surface), and the early KB prompt to =wrap-it-up.org= (top of Step 1, feeding the existing promotion check). All on the canonical side, then sync-check + commit. Each edit is one short block; the tree stays working after each. - -** Phase 4 — Verify propagation + receipt linkage -Confirm the four edits survive a startup sync into a test project, the wrap-up early prompt's output reaches the existing =KB: promoted N / consulted yes-no= receipt (no duplicate receipt), and the best-practices node is reachable by the =rg= the rule documents. 
- -* Acceptance criteria -- [ ] Best-practices node exists at =~/org/roam/agents/= with =:agent:reference:= tags, is found by =rg '#\+filetags:.*:agent:' ~/org/roam/=, and cites its sources. -- [ ] Startup surfaces a single KB-contribution line in Phase C, gated on the roam clone, pointing at the node — and stays silent when the clone is absent. -- [ ] Startup also surfaces a KB-consult line in Phase C (D6): project-relevant node titles (count + titles only, capped ~5), gated on the clone, silent when nothing matches and the clone is absent. -- [ ] Triage-intake and inbox-zero each emit one KB reminder line only when the run surfaced something durable; an all-quiet run emits none. -- [ ] Wrap-up asks the "what did you learn?" reflection early in Step 1, and its candidates feed the existing promotion check — producing exactly one =KB: promoted N / consulted yes-no= receipt, not two. -- [ ] No workflow blocks, stalls, or fails because a node wasn't written. -- [ ] All four workflow edits are on the canonical =claude-templates/.ai/= side, mirror synced, sync-check clean. - -* Readiness dimensions -- Data model & ownership: KB nodes are agent-written under =agents/=; the best-practices node is Craig-authored and protected. No new persisted state beyond the one node and the four template edits. Wrap-up receipt ownership unchanged. -- Errors, empty states & failure: roam clone absent → all KB prompts silently no-op (reuse existing presence guards). Work/unknown project → write boundary in =knowledge-base.md= still refuses with its contract; prompts fire but the agent declines to write per the rule. No silent data loss — nothing is deleted. -- Security & privacy: no secrets in nodes (rule's exclusion bar). Work-confidential facts never written (the boundary). The best-practices node is reference-only, no sensitive content. 
-- Observability: the existing =KB: promoted N / consulted yes-no= receipt is the single metric; grepping session archives for =KB:= answers "are agents using this?" No new instrumentation added. -- Performance & scale: four one-line prompts; negligible. The startup nudge is a Phase C surface line, not a Phase A read, so no per-session context cost from loading the node. -- Reuse & lost opportunities: reuses the existing Phase C nudge pattern, the roam-clone presence guard, the wrap-up promotion check + receipt, and =knowledge-base.md='s boundary. Nothing reinvented. -- Architecture fit & weak points: the four workflows are synced templates; canonical-vs-mirror edit discipline applies (CLAUDE.md). Weak point — nag fatigue if the reminders fire unconditionally; mitigated by the conditional-in-spirit decision. Weak point — the reminders rely on agent judgment ("durable-looking"); mitigated by the best-practices node teaching that judgment. -- Config surface: none. No new knobs; the prompts are unconditional copy gated only on the existing roam-clone check. -- Documentation plan: the best-practices node /is/ the user-facing doc. =knowledge-base.md= stays the authoritative rule; this feature adds no new rule file. No migration doc needed. -- Dev tooling: =scripts/sync-check.sh --fix= keeps canonical + mirror aligned (enforced by =githooks/pre-commit=). =make test= covers the repo's existing gates; no new test target needed for prose-only workflow edits. -- Rollout, compatibility & rollback: edits propagate via the startup rsync to every project on next session — no migration. Rollback is reverting the four template edits + deleting the node; nothing persisted depends on them. Fully reversible. -- External APIs & deps: none — no API calls, no new dependencies. The only external surface is the =~/org/roam/= git repo, already in use by the rule. - -* Risks, Rabbit Holes, and Drawbacks -- *Nag fatigue* — the central risk. 
Four prompts across four frequently-run workflows can train agents to skip them. Dodge: one line each, conditional in spirit, the startup line gated, the triage/inbox reminders firing only on real signal. If the receipt shows agents tuning them out, cut the lowest-value prompt rather than adding more. -- *Junk-node accumulation* — encouraging contribution without a quality bar grows a junk drawer. Dodge: the best-practices node /is/ the quality bar, and the exclusion list keeps high-churn / session-state facts out. Craig prunes at will (the rule already grants this). -- *Receipt double-counting* — if the early wrap-up prompt writes its own receipt, the metric breaks. Dodge: the early prompt is explicitly a capture step feeding the existing check; only the existing sub-section emits the receipt. Acceptance criterion guards this. - -* References / Appendix -Sources for the best-practices node's curated content: -- Sönke Ahrens, /How to Take Smart Notes/ — atomicity, own-words, linking: [[https://www.soenkeahrens.de/en/takesmartnotes][soenkeahrens.de]]; principle of atomicity: [[https://zettelkasten.de/atomicity/guide/][zettelkasten.de atomicity guide]]. -- Andy Matuschak, /Evergreen notes/ — concept-oriented, densely linked, write for yourself: [[https://notes.andymatuschak.org/Evergreen_notes_should_be_concept-oriented][notes.andymatuschak.org]]. -- Org-roam community practice — declarative titles, atomic nodes, capture-then-refine: [[https://www.orgroam.com/manual.html][Org-roam manual]]; [[https://lucidmanager.org/productivity/taking-notes-with-emacs-org-mode-and-org-roam/][lucidmanager.org org-roam guide]]. -- Existing rule this builds on: =~/code/rulesets/claude-rules/knowledge-base.md=. - -* Review and iteration history -** 2026-06-16 Tue — author -- What: initial draft. -- Why: Craig wants the org-roam KB to compound into a cross-project asset; needs the workflow wiring + curated best-practices node specced before building. 
-- Artifacts: this spec; four target workflows (startup, triage-intake, inbox-zero, wrap-it-up); =knowledge-base.md=. -** 2026-06-20 Sat — ratified + read-side added -- What: ratified all five original decisions; added decision D6 (read-side startup consult-nudge) and threaded it through Design, Phase 2, and acceptance. Status draft → approved. -- Why: receipt data showed the write-only design left "consulted no" across recent sessions. Craig asked for the reverse of contribution — surfacing relevant lessons to read at startup. D6 is that counterpart. -- Artifacts: this spec; startup.org (now two Phase C nudges); the lint level-2-dated-header checker tracked separately. diff --git a/docs/design/wrapup-routing-spec.org b/docs/design/wrapup-routing-spec.org deleted file mode 100644 index 434f8d9..0000000 --- a/docs/design/wrapup-routing-spec.org +++ /dev/null @@ -1,218 +0,0 @@ -#+TITLE: Wrap-Up Inbox/Transcript Routing — Spec -#+AUTHOR: Craig Jennings -#+DATE: 2026-06-13 -#+TODO: TODO | DONE SUPERSEDED CANCELLED - -* Metadata -| Status | Ready — review incorporated (spec-review, 2026-06-21) | -|----------+-----------------------------------------------------| -| Owner | Craig Jennings | -|----------+-----------------------------------------------------| -| Reviewer | Codex (spec-review) | -|----------+-----------------------------------------------------| -| Related | [[file:../../todo.org][todo.org: wrap-up routing task]] · [[file:2026-06-13-wrapup-inbox-transcript-routing-proposal.org][archsetup proposal]] | -|----------+-----------------------------------------------------| - -* Summary - -At wrap-up, an inbox handoff that belongs to another project, once accepted and filed locally, has no clean home in the current project's =todo.org=. 
This adds an optional routing step to =wrap-it-up.org=: surface the filed keepers whose home is elsewhere, recommend a destination for each, and on one confirmation deliver each to that project's =inbox/= via =inbox-send= (one handoff per task), removing it from the local =todo.org=. The destination's own next session files it through =process-inbox=, applying that project's value gate, priority scheme, and =todo-format.md=. A parallel step (vNext) files meeting-transcript recordings into the right project's =assets/=. - -* Problem / Context - -=process-inbox.org= dispositions each handoff as act / fold / file / reject, and "file as TODO" lands the task in the *current* project's =todo.org=. When the real home is a different project, the choices today are: file it locally and let it rot in the wrong tracker, hand-edit two projects' =todo.org= files, or defer it and carry the debt to next session. - -The wrap-up's existing Step 3 "Inbox sanity check" only counts unprocessed items and blocks the wrap until they clear. It answers "is the inbox clean?" — it doesn't route anything. - -Meeting transcripts have the same homelessness: a recording dropped during a session belongs in some project's =assets/=, but nothing moves it there at wrap. - -The friction is small per-item but recurring, and the manual cross-project edit is error-prone (two files, two repos, easy to leave one half-done). - -* Goals and Non-Goals - -** Goals -- At wrap-up, surface filed keepers whose home is a different project, with a recommended destination each. -- Route the whole batch on one confirmation ("go with recommendations") or leave it entirely ("skip"). No per-item triage. -- Deliver each routable keeper to the destination's =inbox/= via =inbox-send=, one handoff per task, and remove the keeper from the local =todo.org= on send. The destination files it through its own =process-inbox=. 
-- Provenance is automatic: =inbox-send= stamps the source project and date on every handoff (the =from-<source>= filename and =#+SOURCE:= line). The delivery shows in the destination inbox; the removal shows in the source's git diff. -- The destination set is any project with an =inbox/= — reuse =inbox-send='s existing discovery. - -** Non-Goals -- Not a wrap gate. A skip is a clean, complete wrap. -- Not per-item triage. The interaction is batch-level: go or skip. -- Not a replacement for =process-inbox.org='s value gate. Routing assumes the item is already an accepted keeper. -- Not a confidence-free auto-mover. A low-confidence destination recommendation says so, and the batch "go" stays trustworthy because the surfaced list is reviewable before the keystroke. - -** Scope tiers -- v1: task/event routing by =inbox-send= delivery to the destination's =inbox/=. The interaction, the recommendation engine, the candidate-set marker stamped at file time, reusing =inbox-send='s discovery and delivery. -- Out of scope: per-item destination editing, an interactive correction loop, moving items that aren't accepted keepers, a new cross-repo =todo.org= move primitive (the superseded direct-move design). -- vNext: meeting-transcript filing (gated on the unresolved source-location decision and the file-vs-file+extract question — see Decisions). - -* Design - -** User-facing (the wrap interaction) - -The router is a new sub-step of =wrap-it-up.org='s Step 3, running after the existing inbox sanity check. Its input is filed keepers, not raw inbox files (decision: Reading B): tasks =process-inbox= accepted and filed into the local =todo.org= this session whose inferred home is a different project. When the router finds such a keeper, it surfaces it in a list, one line each: the task, the recommended destination project, and a confidence marker when the inference is weak. Then two options, batch-level: - -1. 
Go with the recommendations — route every recommended item (inbox-send to the destination + local removal). -2. Skip — leave the whole batch in place. A skip is a clean wrap. - -That is the entire interaction. No per-item walk. The surfaced list is the review surface; the single keystroke is trustworthy because the list was reviewable and low-confidence recommendations flagged themselves. - -On "go", each routable keeper is delivered to its recommended destination's =inbox/= via =inbox-send= (one handoff per task) and removed from the local =todo.org=; the destination's own next session files it through =process-inbox=. A skipped or no-match item stays where it is; the existing sanity check still governs whether the wrap is clean. - -** Implementer (the mechanics) - -*Candidate set (what the router considers).* Reading B means the router does not scan the whole local backlog — it would otherwise suggest moving legitimate local tasks every wrap. The candidate set is keepers =process-inbox= filed this session whose inferred home differs from the current project, identified by a marker stamped at file time (decision D8): =process-inbox='s "file as TODO" step stamps =:ROUTE_CANDIDATE: <inferred-project>= on any keeper whose inferred home is not the current project. At wrap, the router's candidate set is exactly the local tasks carrying that property — never the standing backlog. - -*Destination discovery.* Reuse =inbox-send.py='s existing =discover_projects= (a project is a directory with =.ai/= AND =inbox/=). The destination must have an =inbox/= to receive a handoff, so that is the natural destination set — no new discovery code. A project with a =todo.org= but no =inbox/= cannot receive an inbox handoff and must be bootstrapped first; in practice every active project has an =inbox/=. 
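
The discovery rule stated above (a project is a directory with both =.ai/= and =inbox/=) can be illustrated with a short sketch. This is not the code of =inbox-send.py='s =discover_projects=, whose body is not shown in this spec; the function name `find_inbox_projects` and the one-level scan under each root are assumptions made for the example.

```python
from pathlib import Path
from typing import Iterable, List


def find_inbox_projects(roots: Iterable[str]) -> List[Path]:
    """Return directories directly under each root that hold both .ai/ and inbox/.

    Hypothetical stand-in for the discovery rule; the real helper may scan
    differently (depth, ordering, exclusions).
    """
    found: List[Path] = []
    for root in roots:
        base = Path(root).expanduser()
        if not base.is_dir():
            continue  # a missing root is skipped, not an error
        for child in sorted(base.iterdir()):
            if child.is_dir() and (child / ".ai").is_dir() and (child / "inbox").is_dir():
                found.append(child)
    return found
```

A directory with a =todo.org= but no =inbox/= is excluded by this filter, which is the case the paragraph above says must be bootstrapped before it can receive a handoff.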
- -*Delivery.* For each candidate, on "go": (1) =inbox-send <destination> --file= a one-task handoff into the destination's =inbox/= (one file per task, so the destination's =process-inbox= dispositions it as a single item), then (2) remove the keeper from the local =todo.org=. Step 1 is a cross-project write, but it uses the =cross-project.md=-sanctioned path (dropping a file in another project's inbox needs no confirmation); step 2 is a single-file edit in the current project's own =todo.org=, which the wrap is already committing. No new cross-repo move primitive, no foreign =todo.org= edit. - -*Provenance and filing.* =inbox-send= stamps the source and date automatically (=from-<source>= filename + =#+SOURCE:= line), so the destination's session knows where the item came from. That session files it through its own =process-inbox= — value gate, priority scheme, =todo-format.md= — so the task lands per the destination's conventions rather than as an externally-authored insertion. - -*Recovery (mis-route).* If the recommendation engine picks a wrong destination, the receiving session rejects it via =process-inbox='s reject-from-another-project flow (write a response, =inbox-send= it back to the source named in the provenance, delete the local copy). The task returns to the source project's inbox; nothing is lost or corrupted. This is why removing the source on send is safe — the reject path is the undo. - -*Recommendation engine.* Infer the destination from the item's content — project names, file paths, topic words — matched against the discovered project list, with a confidence tier: *strong* = a destination project's name or path appears literally in the item; *weak* = topic-word overlap only; *none* = no match, the item stays put and is never surfaced as a route. "Go" routes strong and weak items (weak visibly labeled); a no-match item is left in place. Pure function =(item, project-list) → (destination, confidence)=, unit-tested directly. 
The engine is the interesting, uncertain part; it earns the spec. - -* Alternatives Considered - -** Per-item triage instead of batch go/skip -- Good, because it gives precise control over each destination. -- Bad, because it taxes the common case (a batch that's all-correct, or all-stay) with a walk. Craig explicitly asked for two options, not a triage loop. -- Neutral, because per-item correction could return as a vNext refinement if batch-only proves too blunt. - -** Fold the router into the existing Inbox sanity check step -- Good, because one inbox step is simpler than two. -- Bad, because the sanity check *gates* the wrap (blocks until clean) and the router is *optional* (skip is clean). Merging a blocking check with an optional action muddies both. -- Neutral, because the two share discovery code while staying separate steps. (Resolved: D1 keeps them separate, with the router acting on filed keepers rather than inbox files.) - -** Reuse process-inbox's "file as TODO" with a destination argument -- Good, because it avoids a second mechanism. -- Bad, because =process-inbox= runs per-item mid-session against the local project; the router runs at wrap, batch-level, cross-project. Different cadence, different scope. -- Neutral, because both ultimately call the same atomic move helper — the helper is the shared primitive, the two callers stay distinct. - -* Decisions [9/9] - -** DONE Reuse the Open Work matcher for destination anchoring -- Context: the move needs a reliable insertion point in the destination =todo.org=; guessing risks corrupting another project's file. -- Decision: We will reuse =todo-cleanup.el='s =tc--find-section "open work"= matcher, which already handles the unique / missing / ambiguous cases, and skip+surface any destination without a clean Open Work heading. -- Consequences: easier — no new parser, consistent with =--archive-done=. 
Harder — destinations must carry the "Open Work" heading convention, so a project with a differently-named section is silently unroutable until it conforms. - -** SUPERSEDED Move atomically through a helper, never hand-edit two repos -Superseded 2026-06-21 by "Deliver via inbox-send" below. The original plan built a new atomic helper to insert a subtree into a foreign =todo.org= and remove the source. The inbox-route delivers the keeper to the destination's inbox instead, so no cross-repo move primitive is built. -- Context: a move touches two files in two repos; a half-done move loses or duplicates a task. -- Decision (superseded): route every move through one helper that inserts under the destination's Open Work heading and removes the source as one operation. - -** SUPERSEDED Cross-project writes stay visible and carry provenance -Superseded 2026-06-21 by "Deliver via inbox-send" below. =inbox-send= already stamps provenance (=from-<source>= filename + =#+SOURCE:= line), so the hand-stamped note is unnecessary; the destination files the item through its own gate rather than receiving an externally-authored insertion. -- Context: writing into another project's =todo.org= crosses the =cross-project.md= scope boundary. -- Decision (superseded): treat the batch "go" as authorization, leave the move visible in the destination's git diff, and stamp a one-line provenance note on each moved task. - -** DONE Separate router step, operating on filed keepers (Reading B) -- Context: the sanity check gates the wrap on inbox/ contents; the router is optional. The deeper question was the router's input — raw inbox files (Reading A, which overlaps the sanity check) or already-filed keepers that belong elsewhere (Reading B, a todo-routing concern). -- Decision: We will keep the router a separate optional sub-step after the sanity check, and its input is Reading B: accepted keepers process-inbox filed into the local =todo.org= whose inferred home is another project. 
The sanity check stays a pure inbox gate; the router is a todo-routing action that shares only the destination-discovery code. -- Consequences: easier — each step has one job, the gate can't be muddied by an optional action, and the router never competes with the inbox gate over the same files. Harder — the candidate set (which local tasks the router considers) needs a marking mechanism (see the Implementer "candidate set" note); Reading A's "dispose raw inbox files at wrap" convenience is given up. - -** DONE Transcript routing deferred to vNext -- Context: transcripts file as artifacts, not tasks, and a meeting usually produces both a recording to keep and action items to track. Two unknowns block it: where recordings accumulate (a recordings inbox, a downloads dir, wherever the meeting tooling drops them), and whether filing should also extract action items into the destination's =todo.org=. -- Decision: We will defer transcript routing to vNext. Both the source-location dependency and the file-only-vs-extract-action-items question are deferred with it, to be settled when the vNext work is specced. v1 ships task routing only. -- Consequences: easier — v1 isn't blocked on the unresolved source location. Harder — until vNext, a meeting recording still has no automatic home; only its action items (if filed as tasks) route through v1. - -** DONE Keep defer-and-stage and the router as distinct policies -- Context: the 2026-06-12 Skeptical Review added a defer-and-stage path in =process-inbox.org= that files a =[#B]= VERIFY for shared-asset proposals parked for review. That also turns an inbox item into a =todo.org= task — overlapping surface with this router. -- Decision: We will keep them distinct. Defer-and-stage parks a proposal-under-review locally as a VERIFY; the router moves an accepted keeper to its home project as a TODO. 
They differ on review status (proposal vs accepted) and destination (local vs cross-project), and share only the atomic move helper, not the policy. Reading B makes the split clean: the router acts on accepted keepers, never on proposals under review. -- Consequences: easier — two clear, non-competing policies on one shared primitive. Harder — the workflow prose must name the boundary so a future reader doesn't collapse them and reintroduce the ambiguity. - -** DONE Deliver via inbox-send to the destination's inbox, not a direct todo.org move (supersedes D2/D3) -- Owner / by-when: Craig / ratified 2026-06-21 (spec-response) -- Context: D2/D3 built a new atomic helper that edits a foreign =todo.org= and removes the source, with a hand-stamped provenance note. =inbox-send= + =process-inbox= already do cross-project delivery: inbox-send writes the handoff with =from-<source>= provenance, and the destination's process-inbox files it through that project's own gate. =cross-project.md= names the inbox as the sanctioned cross-scope write path. A verified precondition reversed the old assumption — some projects have =inbox/= but no =todo.org=, so direct-move's discovery silently drops keepers headed there while inbox-route delivers. -- Decision: We will route each keeper by =inbox-send= into the destination's =inbox/= (one handoff per task) and let the destination's own =process-inbox= file it; we will not edit the destination's =todo.org= directly. D2 (atomic move helper) and D3 (hand-stamped provenance) are superseded — the helper isn't built, and provenance is inbox-send's by construction. -- Consequences: easier — no new cross-repo write primitive, no foreign-tracker corruption risk, provenance and per-project filing for free, graceful when the destination lacks a =todo.org=. 
Harder — filing is deferred to the destination's next session (self-resolving, since startup auto-runs =process-inbox= on a non-empty inbox), and a project never opened accumulates a visible inbox backlog rather than a silent foreign insertion. - -** DONE Candidate-set marking: tag :ROUTE_CANDIDATE: at process-inbox file time (Option A) -- Owner / by-when: Craig / ratified 2026-06-21 (spec-response) -- Context: the router must consider only this-session-filed inbox keepers whose home is elsewhere, never the standing backlog. Two options: tag at file time (process-inbox stamps a marker) or infer from a =CREATED=-this-session stamp + content. =process-inbox= does not stamp =:CREATED:= today, so the inference option would need that paired edit anyway, removing its only advantage. -- Decision: We will tag at file time. =process-inbox='s "file as TODO" step stamps =:ROUTE_CANDIDATE: <inferred-project>= on any keeper whose inferred home differs from the current project; the router's candidate set is the local tasks carrying it. -- Consequences: easier — precise (zero standing-backlog false positives), the inference happens once where context is richest, and the marker doubles as the router's "go" trigger. Harder — a paired edit to =process-inbox.org= Phase D ships coupled with the router. - -** DONE Source removal is a local todo.org edit on send; recovery via the reject flow -- Owner / by-when: Craig / ratified 2026-06-21 (spec-response) -- Context: the review left source-handling vague ("leave the source until the destination confirms by filing"), but there is no confirmation callback, so leaving it duplicates the task once the destination files. The keeper was filed into the *current* project this session and doesn't belong there. -- Decision: On "go" we will remove the routed keeper from the *current* project's =todo.org= (a local single-file edit, not a cross-repo write) right after the =inbox-send=. 
If the destination rejects the handoff, =process-inbox='s reject-from-another-project flow returns it to the source's inbox, so the removal is reversible. -- Consequences: easier — no duplication, the only deletion is from a file we own and are already committing, the reject path is the undo. Harder — a brief window exists where the task lives only as an in-flight inbox handoff (between send and the destination's filing); acceptable because the handoff file is durable and the reject path recovers a mis-route. - -* Implementation phases - -** Phase 1 — Destination discovery (reuse inbox-send) -Reuse =inbox-send.py='s =discover_projects= (a directory with =.ai/= AND =inbox/=) as the destination set — no new discovery code. Confirm the destination universe: if a real destination has a =todo.org= but no =.ai/+inbox/=, name it and bootstrap its inbox; otherwise the existing filter already covers it. Leaves the tree working. - -** Phase 2 — Candidate-set marking in process-inbox -Extend =process-inbox.org='s "file as TODO" step (Phase D) to stamp =:ROUTE_CANDIDATE: <inferred-project>= on any keeper whose inferred home differs from the current project (decision D8). Sync the =.ai/= mirror. This is the paired workflow edit that lets the wrap-up router find candidates without scanning the standing backlog. (Replaces the superseded atomic-move helper.) - -** Phase 3 — Recommendation engine -Infer destination from item content against the discovered list, with a confidence tier. Pure function =(item, project-list) → (destination, confidence)=. Unit-tested: strong match (destination project named or path present literally → high), weak match (topic-word overlap only → low, still routed but labeled), no match (stays put, never surfaced), two-project tie (lowest-confidence / tie-break), empty project list (all stay put). The engine is shared by process-inbox's file-time marker (Phase 2) and the wrap-up router (Phase 4), so it lives where both can call it. 
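
The pure function named in Phase 3 might look like the following sketch. Several details are assumptions not settled by this spec: the `Project` record and its `topics` field (the spec does not say where topic words come from), the whole-word name match, and the tie-break rule (a tie between two literally-named projects is downgraded to weak and resolved alphabetically, one reading of "lowest-confidence / tie-break").

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Project:
    """Hypothetical project record; `topics` is an assumed per-project word list."""
    name: str
    path: str
    topics: FrozenSet[str] = frozenset()


def recommend(item: str, projects: Sequence[Project]) -> Tuple[Optional[str], str]:
    """Map (item, project-list) to (destination, confidence).

    Confidence is "strong", "weak", or "none"; a "none" result means the
    item stays put and is never surfaced as a route.
    """
    text = item.lower()
    words = set(re.findall(r"[a-z0-9_-]+", text))

    # Strong: the project's name (as a whole word) or its path appears literally.
    strong = sorted(
        p.name
        for p in projects
        if p.name.lower() in words or (p.path and p.path.lower() in text)
    )
    if len(strong) == 1:
        return strong[0], "strong"
    if len(strong) > 1:
        # Assumed tie-break: ambiguous literal matches are labeled weak.
        return strong[0], "weak"

    # Weak: topic-word overlap only; highest overlap wins, ties go alphabetical.
    scored = sorted(
        ((len(words & p.topics), p.name) for p in projects),
        key=lambda pair: (-pair[0], pair[1]),
    )
    if scored and scored[0][0] > 0:
        return scored[0][1], "weak"
    return None, "none"
```

Because the function takes only the item text and the project list, both callers (the file-time marker and the wrap-up router) can share it, and each listed test case reduces to a direct call.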
- -** Phase 4 — Wrap-up step wiring -Add the optional router sub-step to =wrap-it-up.org= Step 3, after the inbox sanity check: surface the candidate batch (one line each: task, destination, delivery mode, confidence), the two options (go / skip). On "go", for each candidate, =inbox-send= a one-task handoff to the destination's =inbox/= and remove the keeper from the local =todo.org=. Empty candidate set = zero interaction (silent). Name the gate-vs-optional split in the prose (the sanity check gates; the router is optional). Sync the =.ai/= mirror. - -** Phase 5 — Transcript routing (vNext, gated on the transcript decision) -Only after the transcript-scope decision resolves. File a recording into the destination =assets/= per =working-files.md=, batch go/skip mirroring the task router. - -* Acceptance criteria -- [ ] At wrap, a filed keeper naming another project is surfaced with that project as the recommended destination. -- [ ] "Go" delivers every recommended item as a one-task =from-<source>= handoff into its destination's =inbox/= and removes it from the local =todo.org=. -- [ ] "Skip" leaves every item in place and the wrap completes cleanly. -- [ ] An empty candidate set produces zero interaction (no prompt, no "0 items" line). -- [ ] A weak (low-confidence) recommendation is visibly labeled in the surfaced list; a no-match item is never surfaced as a route. -- [ ] A candidate whose destination has an =inbox/= but no =todo.org= still delivers (degrades gracefully). -- [ ] A mis-routed handoff is recoverable via =process-inbox='s reject-from-another-project flow, returning it to the source's inbox. -- [ ] The router considers only =:ROUTE_CANDIDATE:=-tagged keepers, never the standing backlog. - -* Readiness dimensions -- Data model & ownership: items are org subtrees; the destination owns the moved task after the move (provenance note records origin). N/A for remote/cached state — all local files. 
-- Errors, empty states & failure: missing/ambiguous Open Work heading → skip+surface; failed move → atomic no-op; empty routable set → router stays silent (no prompt). -- Security & privacy: N/A — local org files, no credentials or external services. -- Observability: the move shows in the destination's git diff plus the provenance line; the surfaced batch list is the pre-move view. -- Performance & scale: bounded by inbox size (single digits) and project count (tens); no hot path. -- Reuse & lost opportunities: reuses =tc--find-section= and todo-cleanup's subtree-move; widens existing discovery rather than adding a parallel one. -- Architecture fit & weak points: the recommendation engine is the weak point (a wrong-confident destination is the worst failure) — mitigated by the confidence label and reviewable batch list. -- Config surface: possibly a discovery-root list (defaults to =~/projects/=, =~/code/=, matching =inbox-send.py=). Name it if it needs to be user-visible. -- Documentation plan: =wrap-it-up.org= step prose; a note in =cross-project.md= that the router is a sanctioned cross-project write path. -- Dev tooling: ERT for the elisp helper + discovery; the existing =make test= picks up new test files by glob. -- Rollout, compatibility & rollback: additive workflow step; rollback is removing the sub-step. No persisted-data migration. -- External APIs & deps: none. - -* Risks, Rabbit Holes, and Drawbacks -- *Recommendation accuracy is the rabbit hole.* A confidently-wrong destination silently files a task in the wrong project. Dodge: keep the engine conservative, label low confidence, and keep the batch list reviewable before the keystroke. Don't chase a clever inference model in v1. -- *Two inbox-touching steps* (sanity check + router) risk reading as redundant. Dodge: the D1 decision states the gate-vs-optional split in the workflow prose. -- *Scope creep into transcripts* before the source-location question is answered would stall v1. 
Dodge: transcripts are explicitly vNext behind decision D4. - -* Review dispositions - -Everything in the 2026-06-21 review was accepted, with one modify: - -- *Modified — H1 source-handling.* The review proposed leaving the source keeper in place "until the destination confirms by filing." There is no confirmation callback, so leaving it would duplicate the task once the destination files. Resolved instead (decision D9) to remove the keeper from the *local* =todo.org= on send — a single-file edit in the project we already own and are committing, with =process-inbox='s reject flow as the undo for a mis-route. Keeps the no-foreign-write safety win without the duplication. - -Everything else accepted as written: H1 (inbox-route supersedes direct-move; D2/D3 superseded), H1a (one handoff per task), H1b (reuse =inbox-send= discovery; Phase 1), H2 (tag at file time; D8), M1 (confidence tiers defined in Phase 3 + acceptance), M2 (empty-set silence; acceptance), M3 (paired =process-inbox= edit; Phase 2), M4 (=cross-project.md= note adjusted to "the router uses the sanctioned inbox path"). - -* Review and iteration history - -** 2026-06-13 Sat @ 01:23:13 -0500 — Claude Code (rulesets) — author -- What: initial draft. Problem, goals/scope tiers, two-altitude design, alternatives, six decisions (three DONE from grounding, three TODO for Craig), five implementation phases, acceptance criteria, readiness dimensions, risks. -- Why: the archsetup 2026-06-13 handoff cleared the spec bar in inbox triage and was filed spec-bound rather than applied. This draft turns the proposal into a reviewable design with the open questions isolated as decision tasks. -- Artifacts: proposal source at =docs/design/2026-06-13-wrapup-inbox-transcript-routing-proposal.org=; grounded against =wrap-it-up.org= Step 3, =todo-cleanup.el= =tc--find-section=, and =inbox-send.py= discovery. 
- -** 2026-06-13 Sat @ 01:36:28 -0500 — Craig Jennings + Claude Code (rulesets) — author -- What: resolved all three open decisions. The router's input is Reading B (filed keepers that belong elsewhere, not raw inbox files), so D1 keeps it a separate sub-step from the inbox gate and D5 keeps it distinct from the defer-and-stage router; D4 defers transcript routing to vNext. Reworked the design (input definition, a candidate-set note bounding the router to session-filed keepers) and Phase 3 to match. Cookie now [6/6]; Status moved to ready-for-review. -- Why: Craig chose Reading B after the A-vs-B input ambiguity surfaced as the root under D1 and D5. Reading B keeps the inbox gate, the router, and defer-and-stage each simple instead of entangling three mechanisms. -- Artifacts: this spec; the candidate-set marking mechanism is the one detail flagged for spec-review to pin. - -** 2026-06-21 Sun @ 01:58:41 -0400 — Claude Code (rulesets) — reviewer -- What: spec-review pass. Rubric *Not ready*, two blocking findings. H1: the inbox-route alternative (inbox-send each routable keeper to the destination's inbox/, let its own process-inbox file it) supersedes the direct-move design — reshape D2, drop Phase 2 and D3's provenance burden. H2: pin the candidate-set marking to Option A (tag =:ROUTE_CANDIDATE:= at process-inbox file time). Four medium findings (M1 confidence tiers, M2 empty-set silence, M3 paired process-inbox edit phase, M4 cross-project.md note). Full review + drop-in implementation tasks in the review file. -- Why: Craig challenged D2 directly (why edit a foreign todo.org rather than use the sanctioned inbox-send path). 
The review confirmed it: inbox-send already emits the exact provenance D3 reinvents, process-inbox already files per-item with the destination's own gate, cross-project.md sanctions the inbox path, and a verified precondition reverses the spec's assumption — chime and yt-sync have inbox/ but no todo.org, so direct-move silently drops keepers headed there while inbox-route degrades gracefully. -- Artifacts: [[file:wrapup-routing-spec-review.org][review file]]. Next: spec-response to disposition H1/H2 (recommend accept both), which moves the rubric to Ready. - -** 2026-06-21 Sun @ 02:06:37 -0400 — Craig Jennings + Claude Code (rulesets) — responder -- What: folded the spec-review in. Accepted H1 (inbox-route) and H2 (tag at file time); superseded D2 and D3; added D7 (deliver via =inbox-send=), D8 (=:ROUTE_CANDIDATE:= marker at file time), D9 (local source removal + reject-flow recovery). Rewrote Summary, Goals, Design mechanics, Implementation phases (dropped the atomic-move helper — Phase 2 is now the =process-inbox= marker edit), and Acceptance criteria for the inbox-route. One modify (D9) refines H1's vague source-handling. Cookie [9/9]; Status → Ready. -- Why: Craig's inbox-route challenge held up under review — it reuses the sanctioned cross-project path, gets provenance and per-project filing for free, and degrades gracefully where direct-move drops the task. D9 closes the duplication gap the review left open. -- Artifacts: review file deleted on this pass. Next: Phase 6 implementation-task breakdown into =todo.org= on the author's go. |

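For readers skimming the removed wrapup-routing spec, the Phase 4 router step and its acceptance criteria can be sketched roughly as follows. This is a minimal illustration, not the implementation from the repository: the `Keeper` record, the `"high"`/`"low"` tier names, and the injected `send` callable (standing in for whatever `inbox-send` actually does) are all assumptions. The "delivery mode" column is omitted because the excerpt does not state its values.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

ROUTE_TAG = "ROUTE_CANDIDATE"  # marker named in the spec (decision D8)


@dataclass
class Keeper:
    """Hypothetical stand-in for a filed todo.org subtree."""
    title: str
    tags: frozenset
    destination: Optional[str]  # None means the engine found no match
    confidence: str             # assumed tier names: "high" or "low"


def candidate_batch(keepers: List[Keeper]) -> List[Keeper]:
    # Only tagged keepers are considered, never the standing backlog,
    # and a no-match item is never surfaced as a route.
    return [k for k in keepers
            if ROUTE_TAG in k.tags and k.destination is not None]


def surface(batch: List[Keeper]) -> List[str]:
    # One line per candidate; weak recommendations are visibly labeled.
    # An empty batch yields no lines at all (zero interaction).
    lines = []
    for k in batch:
        label = " (low confidence)" if k.confidence == "low" else ""
        lines.append(f"{k.title} -> {k.destination}/inbox/{label}")
    return lines


def route(keepers: List[Keeper], choice: str,
          send: Callable[[Keeper], None]) -> List[Keeper]:
    """Return the keepers that remain in the local todo list."""
    batch = candidate_batch(keepers)
    if not batch or choice != "go":
        return list(keepers)  # "skip" or empty set: everything stays put
    for k in batch:
        send(k)               # one handoff per task (review finding H1a)
    sent = {id(k) for k in batch}
    return [k for k in keepers if id(k) not in sent]
```

Passing `send` in as a parameter keeps the sketch independent of the real `inbox-send` interface, which this excerpt does not show.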