diff options
| author | Craig Jennings <c@cjennings.net> | 2026-07-02 01:29:37 -0400 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-07-02 01:29:37 -0400 |
| commit | 44c8cc2f0657653b2173faaa3518fa74f931d468 (patch) | |
| tree | 0e199070382728d3c0bdf9be38630f0e41f1b3cf | |
| parent | eea93f152460b9624b3b863fa2b7a4901b391eb0 (diff) | |
| download | rulesets-44c8cc2f0657653b2173faaa3518fa74f931d468.tar.gz rulesets-44c8cc2f0657653b2173faaa3518fa74f931d468.zip | |
feat(backlog): add the metrics synthesis step to org-roam
"synthesize backlog metrics" reads the JSONL union across personal projects and computes the per-run rollups, the trends, and the corrections signal (a later revert or fix touching an autonomous commit's files within ~14 days, a flag for review rather than a conviction). It writes one :agent:metrics: KB node linking back to prior synthesis nodes. Work and unknown projects are excluded by the denylist classification and reported per the refusal contract.
The step is read-only over the logs plus the single KB write. It never mutates the JSONL, todo.org, or any project tree.
| -rw-r--r-- | .ai/workflows/INDEX.org | 1 | ||||
| -rw-r--r-- | .ai/workflows/work-the-backlog.org | 14 | ||||
| -rw-r--r-- | claude-templates/.ai/workflows/INDEX.org | 1 | ||||
| -rw-r--r-- | claude-templates/.ai/workflows/work-the-backlog.org | 14 | ||||
| -rw-r--r-- | todo.org | 4 |
5 files changed, 30 insertions, 4 deletions
diff --git a/.ai/workflows/INDEX.org b/.ai/workflows/INDEX.org index 441d6c2..88721ed 100644 --- a/.ai/workflows/INDEX.org +++ b/.ai/workflows/INDEX.org @@ -57,6 +57,7 @@ This index must list every =.org= file in =.ai/workflows/= except this one and e - =work-the-backlog.org= — the autonomous task-execution loop, the single home for working a batch of marked tasks unattended: takes an ordered task set (explicit list or tag query) + session mode (=file-only= default / =autonomous-commit= + paging) + a hard run cap; each candidate passes the mechanical eligibility gate (status =TODO= + =:solo:= per the project's scheme header) and the four-item defer checklist, then is implemented to the full quality bar (TDD, =/review-code=, =/voice=) as its own logical commits. Fed by the inbox auto-loop's chain step (yes-gated, file-only, cap 1) and the no-approvals speedrun preset (pre-flight Q&A → autonomous-commit + always-push + end-of-set page over an explicit ordered list). - Speedrun triggers: "speedrun", "no approvals speedrun", "speedrun these: <task set>" — any phrase containing "speedrun" routes here (the preset), never to =no-approvals.org= - Manual triggers: "work the backlog", "work the backlog with <task set>" (file-only defaults) + - Synthesis trigger: "synthesize backlog metrics" — read the per-project metrics logs, compute trends + the corrections signal, write one =:agent:metrics:= KB node (personal projects only) ** Calendar diff --git a/.ai/workflows/work-the-backlog.org b/.ai/workflows/work-the-backlog.org index d3e77c0..284935b 100644 --- a/.ai/workflows/work-the-backlog.org +++ b/.ai/workflows/work-the-backlog.org @@ -212,6 +212,18 @@ When Craig names a task set and says "speedrun": The batch-ask (step 4-5) is one message: each question names its task, puts the recommended answer at item 1 when there is one (per =interaction.md= — inline numbered, no popup), and offers "skip this" as the last option. Before the run starts, write each answer into its task's body in =todo.org= as a dated line — the implementation works from the recorded decision, and the record survives the session. The Q&A fires only under this preset; the loop caller never asks (its decision-needing tasks defer). +* Synthesis: metrics → org-roam KB + +Trigger: "synthesize backlog metrics" (optionally a weekly scheduled run). This is the read side of the metrics log — Craig's ask was "gather data and create org-roam articles we can look at later," and this step is the second half. It is read-only over the logs plus exactly one KB write. + +1. *Gather the JSONL union.* Discover =.ai/metrics/work-the-backlog.jsonl= across the project roots (dirs carrying =.ai/protocols.org= under =~/code=, =~/projects=, =~/.emacs.d=). Classify each project per =knowledge-base.md= (work-root denylist, never inference) before reading it into the union. +2. *Enforce personal-only.* A work-classified or unknown project's metrics never enter the KB write — they stay in that project's own log. Report the exclusion per the KB refusal contract: the classification, a one-line redacted summary, and where the data stayed. +3. *Compute the rollups and trends.* Per run: attempted / completed / deferred (by reason) / dropped / failed, wall-clock total, commits landed, review findings per commit. Trends across runs: completion rate over time, defer-reason distribution, findings-per-commit trend. +4. *Compute the corrections signal* — the key metric. For each =commit_sha= in the window, check that project's history for a later commit (within ~14 days) that reverts it or carries a fix touching the same files. A clean run is one whose autonomous commits survive untouched; a flagged run is what Craig reviews by hand. This is a cheap proxy, not proof — it flags candidates, it doesn't convict. +5. *Write one KB node* at =~/org/roam/agents/YYYYMMDDHHMMSS-backlog-metrics-<window>.org= per =knowledge-base.md=: =:agent:metrics:= filetags, a concise title, the rollup table, the trend narrative, and =[[id:...]]= links to prior synthesis nodes so the series is traceable. Pull before writing, commit and push after — the normal KB session discipline. + +The KB node is the artifact Craig reads later: "are the runs completing more and getting corrected less?" should read off the trend table without touching raw logs. Synthesis never mutates the JSONL, todo.org, or any project tree. + * Common Mistakes 1. *Implementing a =VERIFY= or =DOING= task.* The gate is status =TODO= only — a =VERIFY= exists precisely because Craig's input is pending. @@ -234,4 +246,4 @@ Refine as the dogfooding signal arrives — the metrics log and the corrections- * History -Created 2026-07-02 as Phase 1 of the autonomous-batch execution spec, reconciling the inbox-zero "Phase E" proposal and the =.emacs.d= speedrun proposal into one execution loop. The auto-inbox-zero execute step in =inbox.org= reverted to routing-only in the same change so this file is the loop's only home. Phase 2 (same day) wired both callers: the auto-loop chain step and the speedrun preset's trigger phrases. +Created 2026-07-02 as Phase 1 of the autonomous-batch execution spec, reconciling the inbox-zero "Phase E" proposal and the =.emacs.d= speedrun proposal into one execution loop. The auto-inbox-zero execute step in =inbox.org= reverted to routing-only in the same change so this file is the loop's only home. Phases 2-6 (same day) wired both callers, pinned the commit-autonomy waiver markers, fleshed the defer/Q&A/page mechanics, and added the metrics record + KB synthesis step. diff --git a/claude-templates/.ai/workflows/INDEX.org b/claude-templates/.ai/workflows/INDEX.org index 441d6c2..88721ed 100644 --- a/claude-templates/.ai/workflows/INDEX.org +++ b/claude-templates/.ai/workflows/INDEX.org @@ -57,6 +57,7 @@ This index must list every =.org= file in =.ai/workflows/= except this one and e - =work-the-backlog.org= — the autonomous task-execution loop, the single home for working a batch of marked tasks unattended: takes an ordered task set (explicit list or tag query) + session mode (=file-only= default / =autonomous-commit= + paging) + a hard run cap; each candidate passes the mechanical eligibility gate (status =TODO= + =:solo:= per the project's scheme header) and the four-item defer checklist, then is implemented to the full quality bar (TDD, =/review-code=, =/voice=) as its own logical commits. Fed by the inbox auto-loop's chain step (yes-gated, file-only, cap 1) and the no-approvals speedrun preset (pre-flight Q&A → autonomous-commit + always-push + end-of-set page over an explicit ordered list). - Speedrun triggers: "speedrun", "no approvals speedrun", "speedrun these: <task set>" — any phrase containing "speedrun" routes here (the preset), never to =no-approvals.org= - Manual triggers: "work the backlog", "work the backlog with <task set>" (file-only defaults) + - Synthesis trigger: "synthesize backlog metrics" — read the per-project metrics logs, compute trends + the corrections signal, write one =:agent:metrics:= KB node (personal projects only) ** Calendar diff --git a/claude-templates/.ai/workflows/work-the-backlog.org b/claude-templates/.ai/workflows/work-the-backlog.org index d3e77c0..284935b 100644 --- a/claude-templates/.ai/workflows/work-the-backlog.org +++ b/claude-templates/.ai/workflows/work-the-backlog.org @@ -212,6 +212,18 @@ When Craig names a task set and says "speedrun": The batch-ask (step 4-5) is one message: each question names its task, puts the recommended answer at item 1 when there is one (per =interaction.md= — inline numbered, no popup), and offers "skip this" as the last option. Before the run starts, write each answer into its task's body in =todo.org= as a dated line — the implementation works from the recorded decision, and the record survives the session. The Q&A fires only under this preset; the loop caller never asks (its decision-needing tasks defer). +* Synthesis: metrics → org-roam KB + +Trigger: "synthesize backlog metrics" (optionally a weekly scheduled run). This is the read side of the metrics log — Craig's ask was "gather data and create org-roam articles we can look at later," and this step is the second half. It is read-only over the logs plus exactly one KB write. + +1. *Gather the JSONL union.* Discover =.ai/metrics/work-the-backlog.jsonl= across the project roots (dirs carrying =.ai/protocols.org= under =~/code=, =~/projects=, =~/.emacs.d=). Classify each project per =knowledge-base.md= (work-root denylist, never inference) before reading it into the union. +2. *Enforce personal-only.* A work-classified or unknown project's metrics never enter the KB write — they stay in that project's own log. Report the exclusion per the KB refusal contract: the classification, a one-line redacted summary, and where the data stayed. +3. *Compute the rollups and trends.* Per run: attempted / completed / deferred (by reason) / dropped / failed, wall-clock total, commits landed, review findings per commit. Trends across runs: completion rate over time, defer-reason distribution, findings-per-commit trend. +4. *Compute the corrections signal* — the key metric. For each =commit_sha= in the window, check that project's history for a later commit (within ~14 days) that reverts it or carries a fix touching the same files. A clean run is one whose autonomous commits survive untouched; a flagged run is what Craig reviews by hand. This is a cheap proxy, not proof — it flags candidates, it doesn't convict. +5. *Write one KB node* at =~/org/roam/agents/YYYYMMDDHHMMSS-backlog-metrics-<window>.org= per =knowledge-base.md=: =:agent:metrics:= filetags, a concise title, the rollup table, the trend narrative, and =[[id:...]]= links to prior synthesis nodes so the series is traceable. Pull before writing, commit and push after — the normal KB session discipline. + +The KB node is the artifact Craig reads later: "are the runs completing more and getting corrected less?" should read off the trend table without touching raw logs. Synthesis never mutates the JSONL, todo.org, or any project tree. + * Common Mistakes 1. *Implementing a =VERIFY= or =DOING= task.* The gate is status =TODO= only — a =VERIFY= exists precisely because Craig's input is pending. @@ -234,4 +246,4 @@ Refine as the dogfooding signal arrives — the metrics log and the corrections- * History -Created 2026-07-02 as Phase 1 of the autonomous-batch execution spec, reconciling the inbox-zero "Phase E" proposal and the =.emacs.d= speedrun proposal into one execution loop. The auto-inbox-zero execute step in =inbox.org= reverted to routing-only in the same change so this file is the loop's only home. Phase 2 (same day) wired both callers: the auto-loop chain step and the speedrun preset's trigger phrases. +Created 2026-07-02 as Phase 1 of the autonomous-batch execution spec, reconciling the inbox-zero "Phase E" proposal and the =.emacs.d= speedrun proposal into one execution loop. The auto-inbox-zero execute step in =inbox.org= reverted to routing-only in the same change so this file is the loop's only home. Phases 2-6 (same day) wired both callers, pinned the commit-autonomy waiver markers, fleshed the defer/Q&A/page mechanics, and added the metrics record + KB synthesis step. @@ -474,8 +474,8 @@ The four-item checklist (in since Phase 1) gained its mechanics: a VERIFY-filing *** 2026-07-02 Thu @ 01:24:50 -0400 Phase 5 landed — per-task JSONL metrics log Metrics section written into work-the-backlog.org: one record per task at outcome time, appended to the project's .ai/metrics/work-the-backlog.jsonl (git-tracked, append-only, dir+file created on first append). Full field table per the spec (ts, run_id, project, caller, task, outcome, defer_reason, upfront_decision, wall_clock_s, commit_sha, review_findings), outcome slugs mapped to the prose vocabulary, commit_sha flagged as the corrections-signal key (comma-separated when a task decomposed into several commits). Added the sixth outcome the spec's readiness section demanded but the enum missed: failed (tree left working, surfaced, run continues) — wired into the Outcomes vocabulary and loop step 4. A failed append warns in the run summary but never blocks, reorders, or aborts execution. -*** TODO [#C] Phase 6 — synthesis to org-roam :feature: -Read the JSONL union, compute per-run + trend metrics, write a KB node under ~/org/roam/agents/ per knowledge-base.md (personal-projects-only classification enforced). Spec Phase 6. Verify: read-only over the logs plus one KB write. +*** 2026-07-02 Thu @ 01:27:43 -0400 Phase 6 landed — synthesis step to org-roam +Synthesis section written into work-the-backlog.org (trigger "synthesize backlog metrics", INDEX row added): discover the JSONL union across project roots, classify each project per knowledge-base.md's denylist before reading, exclude work/unknown projects with the refusal contract, compute per-run rollups + trends, compute the corrections signal (later revert/fix commit touching the same files within ~14 days — a flag for human review, not a conviction), write one :agent:metrics: KB node under ~/org/roam/agents/ with [[id:...]] links to prior synthesis nodes, pull-before/commit-push-after. Read-only over the logs plus the single KB write; never mutates JSONL, todo.org, or any tree. *** TODO [#C] Speedrun — live trial validation :test: What we're verifying: the whole loop under a real run. Craig names a small ordered set in a coding project and says "no approvals speedrun": pre-flight Q&A fires once up front, each task lands as its own reviewed commit, ineligible/underspecified tasks get VERIFYs instead of half-work, the end-of-set page arrives via notify --persist, and the metrics JSONL carries one record per task. Not :solo: — needs Craig's set and his read on the run. |
