From eea93f152460b9624b3b863fa2b7a4901b391eb0 Mon Sep 17 00:00:00 2001
From: Craig Jennings
Date: Thu, 2 Jul 2026 01:26:47 -0400
Subject: feat(backlog): add the per-task JSONL metrics record

One record per task at outcome time, appended to the project's
.ai/metrics/work-the-backlog.jsonl. The field table follows the spec,
with commit_sha called out as the corrections-signal key and
comma-separated when a task decomposes into several commits. A failed
append warns in the run summary but never blocks or aborts the run.

I added the "failed" outcome the spec's error-handling section required
but its enum missed: a mid-implementation failure leaves the tree
working, gets surfaced, and the run continues.
---
 .ai/workflows/work-the-backlog.org                 | 39 ++++++++++++++++++++--
 .../.ai/workflows/work-the-backlog.org             | 39 ++++++++++++++++++++--
 todo.org                                           |  4 +--
 3 files changed, 74 insertions(+), 8 deletions(-)

diff --git a/.ai/workflows/work-the-backlog.org b/.ai/workflows/work-the-backlog.org
index 418267a..d3e77c0 100644
--- a/.ai/workflows/work-the-backlog.org
+++ b/.ai/workflows/work-the-backlog.org
@@ -43,6 +43,7 @@ Every task in the set ends in exactly one of:
 - =deferred-VERIFY= — a defer-checklist hit; a =VERIFY= filed naming what's missing or risky.
 - =dropped-by-craig= — removed from the run at the speedrun pre-flight Q&A ("skip this").
 - =skipped-ineligible= — failed the mechanical eligibility gate.
+- =failed= — implementation was attempted and abandoned: the tree is left working (never commit a broken state), the failure is surfaced in the run summary, and the run continues to the next task.
 
 The run summary lists each task with its outcome, plus the remaining set when the cap stopped the run.
 
@@ -53,11 +54,11 @@ For the task set, in order, until the run cap is hit:
 1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
 2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist.
 3. *Defer checklist* (below). Any hit → defer: file the =VERIFY= naming the gap and record =deferred-VERIFY= (or, under the speedrun preset, route a quick-question gap to the pre-flight Q&A), next task.
-4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped.
+4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped. If implementation fails partway, leave the tree working, record =failed=, surface it, and continue to the next task.
 5. *Commit autonomy branch:*
    - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
    - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
-6. *Record metrics* for the task (the JSONL append — lands with the feature's Phase 5).
+6. *Record metrics* for the task (the JSONL append — see Metrics below).
 7. Decrement the cap. At zero, stop.
 
 After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary either way.
@@ -151,7 +152,39 @@ notify alarm "Page" ": done, remaining — "
 
 * Metrics
 
-Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl= (git-tracked). Logging is a side effect that never alters execution. The record schema lands with the feature's Phase 5.
+Each task outcome appends one JSON line to the project's =.ai/metrics/work-the-backlog.jsonl= — git-tracked, append-only, =jq=-queryable. Create the directory and file on the first append. Logging is a side effect only: a failed append surfaces a warning in the run summary but never blocks, reorders, or aborts execution.
+
+One record per task, written at the moment its outcome is decided:
+
+| Field              | Meaning                                                                                         |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =ts=               | ISO-8601 timestamp of the task outcome                                                          |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =run_id=           | UUID shared by every record in one run (=uuidgen= at run start)                                 |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =project=          | project basename                                                                                |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =caller=           | =loop= / =speedrun= / =manual=                                                                  |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =task=             | the task heading (slug)                                                                         |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =outcome=          | =implemented-committed= / =implemented-diff= / =deferred-verify= / =skipped-ineligible= /       |
+|                    | =dropped-by-craig= / =failed=                                                                   |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =defer_reason=     | =underspecified= / =data-loss= / =already-satisfied= / =needs-deliberation= — set on            |
+|                    | =deferred-verify= records only                                                                  |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =upfront_decision= | =true= when a pre-flight answer was recorded and used for this task                             |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =wall_clock_s=     | seconds from task start to outcome                                                              |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =commit_sha=       | committed tasks: the commit SHA (comma-separated when the task decomposed into several); empty |
+|                    | otherwise                                                                                       |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =review_findings=  | count of =/review-code= Critical + Important findings on this task                              |
+|--------------------+-------------------------------------------------------------------------------------------------|
+
+The =outcome= slugs map one-to-one onto the outcome vocabulary above (=implemented-diff= is =implemented-diff-surfaced=; =deferred-verify= is =deferred-VERIFY=). Per-run rollups (attempted / completed / deferred / dropped, wall-clock total, findings per commit) are computed at synthesis, not stored per record. The =commit_sha= field is what the synthesis step's corrections signal keys on — whether a later commit reverted or hand-fixed an autonomous one — so never omit it on a committed task.
 
 * Caller: the inbox auto-loop
 
diff --git a/claude-templates/.ai/workflows/work-the-backlog.org b/claude-templates/.ai/workflows/work-the-backlog.org
index 418267a..d3e77c0 100644
--- a/claude-templates/.ai/workflows/work-the-backlog.org
+++ b/claude-templates/.ai/workflows/work-the-backlog.org
@@ -43,6 +43,7 @@ Every task in the set ends in exactly one of:
 - =deferred-VERIFY= — a defer-checklist hit; a =VERIFY= filed naming what's missing or risky.
 - =dropped-by-craig= — removed from the run at the speedrun pre-flight Q&A ("skip this").
 - =skipped-ineligible= — failed the mechanical eligibility gate.
+- =failed= — implementation was attempted and abandoned: the tree is left working (never commit a broken state), the failure is surfaced in the run summary, and the run continues to the next task.
 
 The run summary lists each task with its outcome, plus the remaining set when the cap stopped the run.
 
@@ -53,11 +54,11 @@ For the task set, in order, until the run cap is hit:
 1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
 2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist.
 3. *Defer checklist* (below). Any hit → defer: file the =VERIFY= naming the gap and record =deferred-VERIFY= (or, under the speedrun preset, route a quick-question gap to the pre-flight Q&A), next task.
-4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped.
+4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped. If implementation fails partway, leave the tree working, record =failed=, surface it, and continue to the next task.
 5. *Commit autonomy branch:*
    - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
    - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
-6. *Record metrics* for the task (the JSONL append — lands with the feature's Phase 5).
+6. *Record metrics* for the task (the JSONL append — see Metrics below).
 7. Decrement the cap. At zero, stop.
 
 After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary either way.
@@ -151,7 +152,39 @@ notify alarm "Page" ": done, remaining — "
 
 * Metrics
 
-Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl= (git-tracked). Logging is a side effect that never alters execution. The record schema lands with the feature's Phase 5.
+Each task outcome appends one JSON line to the project's =.ai/metrics/work-the-backlog.jsonl= — git-tracked, append-only, =jq=-queryable. Create the directory and file on the first append. Logging is a side effect only: a failed append surfaces a warning in the run summary but never blocks, reorders, or aborts execution.
+
+One record per task, written at the moment its outcome is decided:
+
+| Field              | Meaning                                                                                         |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =ts=               | ISO-8601 timestamp of the task outcome                                                          |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =run_id=           | UUID shared by every record in one run (=uuidgen= at run start)                                 |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =project=          | project basename                                                                                |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =caller=           | =loop= / =speedrun= / =manual=                                                                  |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =task=             | the task heading (slug)                                                                         |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =outcome=          | =implemented-committed= / =implemented-diff= / =deferred-verify= / =skipped-ineligible= /       |
+|                    | =dropped-by-craig= / =failed=                                                                   |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =defer_reason=     | =underspecified= / =data-loss= / =already-satisfied= / =needs-deliberation= — set on            |
+|                    | =deferred-verify= records only                                                                  |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =upfront_decision= | =true= when a pre-flight answer was recorded and used for this task                             |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =wall_clock_s=     | seconds from task start to outcome                                                              |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =commit_sha=       | committed tasks: the commit SHA (comma-separated when the task decomposed into several); empty |
+|                    | otherwise                                                                                       |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =review_findings=  | count of =/review-code= Critical + Important findings on this task                              |
+|--------------------+-------------------------------------------------------------------------------------------------|
+
+The =outcome= slugs map one-to-one onto the outcome vocabulary above (=implemented-diff= is =implemented-diff-surfaced=; =deferred-verify= is =deferred-VERIFY=). Per-run rollups (attempted / completed / deferred / dropped, wall-clock total, findings per commit) are computed at synthesis, not stored per record. The =commit_sha= field is what the synthesis step's corrections signal keys on — whether a later commit reverted or hand-fixed an autonomous one — so never omit it on a committed task.
 
 * Caller: the inbox auto-loop
 
diff --git a/todo.org b/todo.org
index eadd5d9..5135bc1 100644
--- a/todo.org
+++ b/todo.org
@@ -471,8 +471,8 @@ Pinned the waiver format per D5: two marker lines in .ai/notes.org Workflow Stat
 *** 2026-07-02 Thu @ 01:21:47 -0400 Phase 4 landed — checklist mechanics, pre-flight Q&A contract, page
 The four-item checklist (in since Phase 1) gained its mechanics: a VERIFY-filing subsection (dedup against an existing sibling first — the deferred task stays TODO, so without the check every run re-files; placement/heading/body per todo-format.md) and a quick-question routing subsection (discriminator: one-line factual/preference pick vs tradeoff-weighing; three-plus questions = underspecified = file; item 2 data-loss never routes to Q&A). Preset section gained the batch-ask contract (one message, recommendation-first numbered options per interaction.md, answers recorded as dated lines in the task bodies before the run). Page section finalized (fires once on set-done or cap-hit; notify --persist is the paging surface). Common Mistakes 12-13 added. Checklist only ever reduces what runs; pre-flight fires only under the preset.
 
-*** TODO [#C] Phase 5 — per-task JSONL metrics log :feature:solo:
-Append the per-task record to .ai/metrics/work-the-backlog.jsonl at each task outcome. Spec Phase 5. Verify: logging never alters execution.
+*** 2026-07-02 Thu @ 01:24:50 -0400 Phase 5 landed — per-task JSONL metrics log
+Metrics section written into work-the-backlog.org: one record per task at outcome time, appended to the project's .ai/metrics/work-the-backlog.jsonl (git-tracked, append-only, dir+file created on first append). Full field table per the spec (ts, run_id, project, caller, task, outcome, defer_reason, upfront_decision, wall_clock_s, commit_sha, review_findings), outcome slugs mapped to the prose vocabulary, commit_sha flagged as the corrections-signal key (comma-separated when a task decomposed into several commits). Added the sixth outcome the spec's readiness section demanded but the enum missed: failed (tree left working, surfaced, run continues) — wired into the Outcomes vocabulary and loop step 4. A failed append warns in the run summary but never blocks, reorders, or aborts execution.
 
 *** TODO [#C] Phase 6 — synthesis to org-roam :feature:
 Read the JSONL union, compute per-run + trend metrics, write a KB node under ~/org/roam/agents/ per knowledge-base.md (personal-projects-only classification enforced). Spec Phase 6. Verify: read-only over the logs plus one KB write.
-- 
cgit v1.2.3
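
Reviewer's illustration (not part of the patch): the Metrics section describes an append that creates the directory and file on first use and turns any failure into a run-summary warning instead of an error. The workflow itself is prose, so the Python below is only a sketch of that contract under stated assumptions. The field names come from the table in the patch; `build_record`, `append_record`, the keyword defaults, and the warning text are hypothetical names chosen for the example.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path

# Path named in the patch, relative to the project root.
METRICS_PATH = Path(".ai/metrics/work-the-backlog.jsonl")


def build_record(run_id, project, caller, task, outcome, *,
                 defer_reason="", upfront_decision=False,
                 wall_clock_s=0, commit_shas=(), review_findings=0):
    """Assemble one per-task record using the field names from the table.

    Hypothetical helper: the defaults are assumptions, not spec values.
    """
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "project": project,
        "caller": caller,
        "task": task,
        "outcome": outcome,
        "defer_reason": defer_reason,
        "upfront_decision": upfront_decision,
        "wall_clock_s": wall_clock_s,
        # Several commits for one task are joined with commas; empty otherwise.
        "commit_sha": ",".join(commit_shas),
        "review_findings": review_findings,
    }


def append_record(record, path=METRICS_PATH):
    """Append one JSON line. Return None on success, a warning string on failure.

    The warning is meant for the run summary; nothing is raised, so a failed
    append cannot block, reorder, or abort the run.
    """
    try:
        path.parent.mkdir(parents=True, exist_ok=True)  # first-append creation
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
    except (OSError, TypeError, ValueError) as exc:
        return f"metrics append failed: {exc}"
    return None


if __name__ == "__main__":
    # Demo against a throwaway directory rather than a real project tree.
    demo_path = Path(tempfile.mkdtemp()) / METRICS_PATH
    rec = build_record(str(uuid.uuid4()), "demo-project", "manual",
                       "example-task", "implemented-committed",
                       commit_shas=["abc1234", "def5678"])
    warning = append_record(rec, demo_path)
    print(warning or f"appended to {demo_path}")
```

Returning the warning instead of raising keeps the logging a pure side effect: the caller can collect the strings and print them with the run summary, and the task loop never has to wrap the call in its own error handling.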