feat(backlog): add the per-task JSONL metrics record

One record per task at outcome time, appended to the project's .ai/metrics/work-the-backlog.jsonl. The field table follows the spec and calls out commit_sha as the key the corrections signal uses; it is comma-separated when a task decomposes into several commits. A failed append warns in the run summary but never blocks or aborts the run. I also added the "failed" outcome, which the spec's error-handling section requires but its outcome enum omitted: a mid-implementation failure leaves the tree working, is surfaced in the run summary, and the run continues.
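The append behavior described above can be sketched in Python: create on first append, one JSON object per line, and a failure reported rather than raised. This is a minimal sketch, not the workflow's implementation. `append_metric`, `commit_sha_field`, and `METRICS_REL_PATH` are hypothetical names. Returning the warning as a string for the run summary to collect is also an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional

# Hypothetical helpers: a sketch of the described behavior, not the workflow's code.
METRICS_REL_PATH = Path(".ai") / "metrics" / "work-the-backlog.jsonl"


def commit_sha_field(shas: List[str]) -> str:
    """Comma-separated SHAs when a task decomposed into several commits; "" if none."""
    return ",".join(shas)


def append_metric(project_root: Path, record: dict) -> Optional[str]:
    """Append one JSON line. Return a warning for the run summary on failure; never raise."""
    path = project_root / METRICS_REL_PATH
    try:
        line = json.dumps(record) + "\n"  # serialize first so a bad record writes nothing
        path.parent.mkdir(parents=True, exist_ok=True)  # create the directory on first append
        with path.open("a", encoding="utf-8") as fh:  # mode "a" creates the file if missing
            fh.write(line)
    except (OSError, TypeError, ValueError) as exc:
        # Logging is a side effect only: report the failure and let the run continue.
        return f"metrics append failed for task {record.get('task', '?')!r}: {exc}"
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        warning = append_metric(
            Path(tmp),
            {"task": "add-metrics", "outcome": "implemented-committed",
             "commit_sha": commit_sha_field(["abc1234", "def5678"])},
        )
        print(warning or "appended")
```

Serializing before touching the filesystem means an unencodable record only produces a warning and writes nothing to the log.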
author: Craig Jennings <c@cjennings.net> 2026-07-02 01:26:47 -0400
committer: Craig Jennings <c@cjennings.net> 2026-07-02 01:26:47 -0400
commit: eea93f152460b9624b3b863fa2b7a4901b391eb0 (patch)
tree: 0cc0265661d97e56f173a72ac5a71f1958c62078 /.ai/workflows/work-the-backlog.org
parent: 04561b2ac0829597ccca2a5f0e4f0319eb5c7cef (diff)
download: rulesets-eea93f152460b9624b3b863fa2b7a4901b391eb0.tar.gz
rulesets-eea93f152460b9624b3b863fa2b7a4901b391eb0.zip
1 file changed, 36 insertions, 3 deletions
diff --git a/.ai/workflows/work-the-backlog.org b/.ai/workflows/work-the-backlog.org
index 418267a..d3e77c0 100644
--- a/.ai/workflows/work-the-backlog.org
+++ b/.ai/workflows/work-the-backlog.org
@@ -43,6 +43,7 @@ Every task in the set ends in exactly one of:
 - =deferred-VERIFY= — a defer-checklist hit; a =VERIFY= filed naming what's missing or risky.
 - =dropped-by-craig= — removed from the run at the speedrun pre-flight Q&A ("skip this").
 - =skipped-ineligible= — failed the mechanical eligibility gate.
+- =failed= — implementation was attempted and abandoned: the tree is left working (never commit a broken state), the failure is surfaced in the run summary, and the run continues to the next task.
 
 The run summary lists each task with its outcome, plus the remaining set when the cap stopped the run.
 
@@ -53,11 +54,11 @@ For the task set, in order, until the run cap is hit:
 1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task.
 2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist.
 3. *Defer checklist* (below). Any hit → defer: file the =VERIFY= naming the gap and record =deferred-VERIFY= (or, under the speedrun preset, route a quick-question gap to the pre-flight Q&A), next task.
-4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped.
+4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped. If implementation fails partway, leave the tree working, record =failed=, surface it, and continue to the next task.
 5. *Commit autonomy branch:*
    - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=.
    - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=.
-6. *Record metrics* for the task (the JSONL append — lands with the feature's Phase 5).
+6. *Record metrics* for the task (the JSONL append — see Metrics below).
 7. Decrement the cap. At zero, stop.
 
 After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary either way.
@@ -151,7 +152,39 @@ notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>"
 
 * Metrics
 
-Each task outcome appends one JSONL record to =.ai/metrics/work-the-backlog.jsonl= (git-tracked). Logging is a side effect that never alters execution. The record schema lands with the feature's Phase 5.
+Each task outcome appends one JSON line to the project's =.ai/metrics/work-the-backlog.jsonl= — git-tracked, append-only, =jq=-queryable. Create the directory and file on the first append. Logging is a side effect only: a failed append surfaces a warning in the run summary but never blocks, reorders, or aborts execution.
+
+One record per task, written at the moment its outcome is decided:
+
+| Field              | Meaning                                                                                         |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =ts=               | ISO-8601 timestamp of the task outcome                                                          |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =run_id=           | UUID shared by every record in one run (=uuidgen= at run start)                                 |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =project=          | project basename                                                                                |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =caller=           | =loop= / =speedrun= / =manual=                                                                  |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =task=             | the task heading (slug)                                                                         |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =outcome=          | =implemented-committed= / =implemented-diff= / =deferred-verify= / =skipped-ineligible= /       |
+|                    | =dropped-by-craig= / =failed=                                                                   |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =defer_reason=     | =underspecified= / =data-loss= / =already-satisfied= / =needs-deliberation= — set on            |
+|                    | =deferred-verify= records only                                                                  |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =upfront_decision= | =true= when a pre-flight answer was recorded and used for this task                             |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =wall_clock_s=     | seconds from task start to outcome                                                              |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =commit_sha=       | committed tasks: the commit SHA (comma-separated when the task decomposed into several); empty  |
+|                    | otherwise                                                                                       |
+|--------------------+-------------------------------------------------------------------------------------------------|
+| =review_findings=  | count of =/review-code= Critical + Important findings on this task                              |
+|--------------------+-------------------------------------------------------------------------------------------------|
+
+The =outcome= slugs map one-to-one onto the outcome vocabulary above (=implemented-diff= is =implemented-diff-surfaced=; =deferred-verify= is =deferred-VERIFY=). Per-run rollups (attempted / completed / deferred / dropped, wall-clock total, findings per commit) are computed at synthesis, not stored per record. The =commit_sha= field is what the synthesis step's corrections signal keys on — whether a later commit reverted or hand-fixed an autonomous one — so never omit it on a committed task.
 
 * Caller: the inbox auto-loop
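The Metrics field table in the diff above can be read as a typed record. The following is a minimal Python sketch under stated assumptions, not the workflow's implementation. `MetricsRecord`, `make_record`, `OUTCOME_SLUGS`, `CALLERS`, and `DEFER_REASONS` are hypothetical names. "Empty" is assumed to be `""` for `commit_sha` and JSON `null` for `defer_reason`. `ts` is assumed to be UTC, and `uuid.uuid4()` stands in for `uuidgen`. The sketch applies the two slug renames the section spells out, and it rejects a committed record without a `commit_sha` because the corrections signal keys on that field.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence

# Hypothetical sketch of the per-task record in the Metrics table; not the workflow's code.
# Outcome vocabulary -> JSONL slug. Only two entries are renamed, per the Metrics section.
OUTCOME_SLUGS = {
    "implemented-committed": "implemented-committed",
    "implemented-diff-surfaced": "implemented-diff",
    "deferred-VERIFY": "deferred-verify",
    "skipped-ineligible": "skipped-ineligible",
    "dropped-by-craig": "dropped-by-craig",
    "failed": "failed",
}
CALLERS = {"loop", "speedrun", "manual"}
DEFER_REASONS = {"underspecified", "data-loss", "already-satisfied", "needs-deliberation"}


@dataclass
class MetricsRecord:
    ts: str
    run_id: str
    project: str
    caller: str
    task: str
    outcome: str
    defer_reason: Optional[str]  # assumption: JSON null unless outcome is deferred-verify
    upfront_decision: bool
    wall_clock_s: float
    commit_sha: str  # assumption: "" encodes "empty otherwise"
    review_findings: int

    def validate(self) -> None:
        if self.caller not in CALLERS:
            raise ValueError(f"unknown caller {self.caller!r}")
        if self.outcome not in OUTCOME_SLUGS.values():
            raise ValueError(f"unknown outcome {self.outcome!r}")
        if self.defer_reason is not None:
            if self.outcome != "deferred-verify":
                raise ValueError("defer_reason is set on deferred-verify records only")
            if self.defer_reason not in DEFER_REASONS:
                raise ValueError(f"unknown defer_reason {self.defer_reason!r}")
        # The corrections signal keys on commit_sha, so a committed task must carry it.
        if (self.outcome == "implemented-committed") != bool(self.commit_sha):
            raise ValueError("commit_sha is required on committed tasks and empty otherwise")

    def to_json_line(self) -> str:
        self.validate()
        return json.dumps(asdict(self)) + "\n"


def make_record(*, run_id: str, project: str, caller: str, task: str,
                vocab_outcome: str, wall_clock_s: float, review_findings: int = 0,
                commit_shas: Sequence[str] = (), defer_reason: Optional[str] = None,
                upfront_decision: bool = False) -> MetricsRecord:
    """Build and validate one record; vocab_outcome uses the outcome vocabulary's spelling."""
    record = MetricsRecord(
        ts=datetime.now(timezone.utc).isoformat(timespec="seconds"),  # ISO-8601, UTC assumed
        run_id=run_id,
        project=project,
        caller=caller,
        task=task,
        outcome=OUTCOME_SLUGS[vocab_outcome],  # KeyError for an outcome outside the vocabulary
        defer_reason=defer_reason,
        upfront_decision=upfront_decision,
        wall_clock_s=wall_clock_s,
        commit_sha=",".join(commit_shas),
        review_findings=review_findings,
    )
    record.validate()
    return record


if __name__ == "__main__":
    run_id = str(uuid.uuid4())  # stands in for `uuidgen` at run start
    rec = make_record(run_id=run_id, project="rulesets", caller="loop", task="add-metrics",
                      vocab_outcome="implemented-committed", wall_clock_s=412.0,
                      review_findings=1, commit_shas=["abc1234", "def5678"])
    print(rec.to_json_line(), end="")
```

Validating at construction time makes a malformed record fail before any append is attempted. Under the never-blocks rule, a caller would report that failure as a warning rather than stop the run.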