docs(task-audit): add tag-vocabulary enforcement and verify-then-close

Two Phase C behaviors, both surfaced while auditing an Emacs-config todo.org. Enforce a project's declared closed tag set (strip tags outside it) where the legend marks the set exhaustive, leaving open-vocabulary projects untouched. For a task whose code shipped but awaits a manual or visual check, file that check under the project's manual-testing parent (dedup first) and close the implementation task, rather than letting "done but unverified" linger half-open.
author: Craig Jennings <c@cjennings.net> 2026-06-12 07:30:57 -0500
committer: Craig Jennings <c@cjennings.net> 2026-06-12 07:30:57 -0500
commit: fbeef0c83b15eb7a82cdb8c2b999e232e4f997b6
tree: 29d3d299a41229478ae07f04b0ca05de1329f4d4
parent: 8e18033ba47e9b143ce141898cde909080a299ec
2 files changed, 28 insertions, 0 deletions
diff --git a/.ai/workflows/task-audit.org b/.ai/workflows/task-audit.org
index 2b0ac29..67ce496 100644
--- a/.ai/workflows/task-audit.org
+++ b/.ai/workflows/task-audit.org
@@ -73,8 +73,12 @@ For every STALE task, edit it in the main thread:
 - Mark statuses that moved; rewrite "waiting on X" lines whose X resolved.
 - Fix dead/renamed =file:= links. When you fix or verify a link that has a matching dead-link entry in the project's lint-followups file, reap that entry in the same edit so the two artifacts don't drift. Scope this strictly to dead-link entries. Do not pull general lint cleanup into the audit, which mixes two concerns and slows it.
 - *Consolidate duplicates* — when several tasks track the same thing, fold them into one home and delete the duplicates (per the user's call on which is canonical).
+- *Close tasks the evidence shows are done.* When sessions, commits, tests, or tickets clearly show the work landed, close the task per =todo-format.md= (depth-based: top-level → =DONE= + =CLOSED:=; sub-task → dated event-log rewrite). Split the done case:
+  - *Done and verified, or needing no human check* — close it directly.
+  - *Code-complete but awaiting a human verification* (a visual/aesthetic sign-off, a live-device or live-remote check, an interactive walk) — do not leave it lingering half-open, and do not silently mark the feature verified. Instead: (1) file the pending check as a child under the project's manual-testing parent (e.g. =Manual testing and validation= in =todo.org=), using the structured shape from =verification.md= (descriptive title, "What we're verifying", numbered steps, "Expected") — but search that parent first and skip filing when an equivalent check already exists; then (2) close the original implementation task. The work is done; the human check lives on as a tracked manual-test that promotes to a bug if it fails. This keeps "done but unverified" from piling up as half-open tasks while still honoring "never claim a fix verified before the user confirms."
 - *Ensure priority is set per the project scheme.* The top of the project's =todo.org= should carry the priority legend (=[#A]= through =[#D]=). Every task should carry an explicit priority cookie. If a cookie is missing, or no longer matches the reconciled facts, assign the right level per the legend. If the level is unambiguous from the body, do it autonomously; if it's a judgment call (especially the [#A] / [#B] line for important-but-not-urgent work), flag NEEDS-USER. Also enforce the [#A]-discipline rule from the legend — an [#A] task without a =SCHEDULED:= or =DEADLINE:= line is mis-graded and is either down-graded to [#B] (when reconciled facts say "important but not urgent") or surfaced as NEEDS-USER for the user to date.
 - *Ensure a type tag is set.* Every task carries one type tag from the project's tag legend (typically =:feature:= / =:chore:= / =:spec:= / =:bug:=). If missing or wrong, assign or correct it from the body when the type is unambiguous. If two tags fit (a refactor that also fixes a bug; a spec that's also a chore), flag NEEDS-USER rather than picking one silently.
+- *Enforce the project's declared tag vocabulary.* If the project's tag legend declares an *exhaustive* set of allowed tags, strip from each task any tag outside that set — the heading and parent section already carry topic/scope context, so ad-hoc tags only fragment the vocabulary and defeat tag-based filtering. Normalize near-duplicate spellings to the canonical tag (a plural to its singular, say). Where the legend does not declare the set closed, leave existing tags alone; this step applies only where the allowed set is exhaustive by design.
 - *Re-assess the =:quick:= and =:solo:= tags* — reconciliation can change a task's effort or autonomy: a resolved dependency may make a stuck task =:solo:=, a scope cut may make it =:quick:=, and new complexity surfaced by the sources can invalidate either. Add or remove the tags per the definitions in the project's tag legend (and [[file:task-review.org][task-review.org]]) when the reconciled facts make the call clear. When they don't — an effort estimate you can't pin down, a =:solo:= gate you can't confirm — it's a NEEDS-USER flag, not a guess.
 - Bump =:LAST_REVIEWED:= on each edited task.
 
@@ -114,7 +118,17 @@ Skip the chain only when the user scoped the run to "audit only," or when =task-
 4. *Concurrent sub-agent writes to =todo.org=.* Investigation fans out; edits stay serial in the main thread.
 5. *Reconciling against memory instead of the live sources.* Email/chat/ticketing/recordings move independently of the task text — check them.
 6. *Doing relevance grooming here.* Keep/kill/priority is =task-review='s job; this workflow is about factual accuracy.
+7. *Leaving a code-complete task open because it awaits manual verification.* File the human check under the manual-testing parent (dedup first) and close the implementation task; "done but unverified" should not linger as a half-open task.
 
 * Validation
 
 Created and validated 2026-05-22 against the DeepSat work =todo.org=: reconciled the open-work tasks one at a time, applied factual updates (a contract task's gating, a hiring task whose interview outcome was recovered by transcribing an un-processed meeting recording, a partnership task's stale comms-hold + consolidation of two duplicate trackers, plus several "waiting on X resolved" updates), and surfaced the judgment-call subset (shelve-vs-push, awaiting-team-feedback, who-initiated-this) for the user to adjudicate.
+
+* Revision Notes
+
+** 2026-06-12 — Tag-vocabulary enforcement + code-complete-but-unverified closing
+
+Two Phase C behaviors added, both surfaced by an Emacs-config =todo.org= audit:
+
+- *Tag-vocabulary enforcement.* That project declares a closed tag set (=bug=, =feature=, =refactor=, =test=, =quick=, =solo=); the audit had to strip ~44 ad-hoc tags that had accumulated across the file. The prior workflow only checked that a type tag was *present* — it had no concept of an exhaustive allowed set. The new bullet enforces a declared closed vocabulary and leaves open-vocabulary projects untouched.
+- *Code-complete-but-unverified closing.* Many tasks had shipped (tests green, live in the daemon) but stayed open awaiting a manual or visual verification, so they accumulated as half-open. Leaving them open is noise; auto-closing them would violate "never claim a fix verified before the user confirms." The fix routes the pending human check into the project's =Manual testing and validation= parent (dedup-checked) per =verification.md='s manual-verification hand-off, then closes the implementation task. The work is done and the check is tracked; a failed check promotes to a bug.
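The tag-vocabulary bullet added above can be read as a small filter over each heading's tag group. The following is a minimal sketch under stated assumptions, not the workflow's actual implementation: the allowed set is copied from the example legend in the revision notes, and the names `ALLOWED_TAGS`, `canonical`, and `enforce_vocabulary`, as well as the plural-to-singular rule, are hypothetical illustrations.

```python
import re

# Hypothetical closed vocabulary, copied from the example legend in the revision notes.
ALLOWED_TAGS = {"bug", "feature", "refactor", "test", "quick", "solo"}

# Matches the trailing tag group of an Org heading, e.g. "  :bug:quick:".
TAG_GROUP = re.compile(r"[ \t]+(:[\w@#%:]+:)[ \t]*$")


def canonical(tag, allowed):
    """Return the canonical allowed tag for `tag`, or None if it is outside the set."""
    if tag in allowed:
        return tag
    # Assumed normalization rule: a plural maps to its singular.
    if tag.endswith("s") and tag[:-1] in allowed:
        return tag[:-1]
    return None


def enforce_vocabulary(heading, allowed, exhaustive):
    """Strip tags outside `allowed` from one Org heading line."""
    if not exhaustive:
        return heading  # open vocabulary: leave existing tags alone
    match = TAG_GROUP.search(heading)
    if match is None:
        return heading  # heading carries no tags
    kept = []
    for tag in match.group(1).strip(":").split(":"):
        mapped = canonical(tag, allowed)
        if mapped is not None and mapped not in kept:
            kept.append(mapped)
    title = heading[: match.start()]
    return f"{title} :{':'.join(kept)}:" if kept else title
```

Called with `exhaustive=True`, a heading tagged `:bugs:ui:quick:` would come back tagged `:bug:quick:`; with `exhaustive=False` the line is returned unchanged, which mirrors the "leave open-vocabulary projects untouched" rule.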
diff --git a/claude-templates/.ai/workflows/task-audit.org b/claude-templates/.ai/workflows/task-audit.org
index 2b0ac29..67ce496 100644
--- a/claude-templates/.ai/workflows/task-audit.org
+++ b/claude-templates/.ai/workflows/task-audit.org
@@ -73,8 +73,12 @@ For every STALE task, edit it in the main thread:
 - Mark statuses that moved; rewrite "waiting on X" lines whose X resolved.
 - Fix dead/renamed =file:= links. When you fix or verify a link that has a matching dead-link entry in the project's lint-followups file, reap that entry in the same edit so the two artifacts don't drift. Scope this strictly to dead-link entries. Do not pull general lint cleanup into the audit, which mixes two concerns and slows it.
 - *Consolidate duplicates* — when several tasks track the same thing, fold them into one home and delete the duplicates (per the user's call on which is canonical).
+- *Close tasks the evidence shows are done.* When sessions, commits, tests, or tickets clearly show the work landed, close the task per =todo-format.md= (depth-based: top-level → =DONE= + =CLOSED:=; sub-task → dated event-log rewrite). Split the done case:
+  - *Done and verified, or needing no human check* — close it directly.
+  - *Code-complete but awaiting a human verification* (a visual/aesthetic sign-off, a live-device or live-remote check, an interactive walk) — do not leave it lingering half-open, and do not silently mark the feature verified. Instead: (1) file the pending check as a child under the project's manual-testing parent (e.g. =Manual testing and validation= in =todo.org=), using the structured shape from =verification.md= (descriptive title, "What we're verifying", numbered steps, "Expected") — but search that parent first and skip filing when an equivalent check already exists; then (2) close the original implementation task. The work is done; the human check lives on as a tracked manual-test that promotes to a bug if it fails. This keeps "done but unverified" from piling up as half-open tasks while still honoring "never claim a fix verified before the user confirms."
 - *Ensure priority is set per the project scheme.* The top of the project's =todo.org= should carry the priority legend (=[#A]= through =[#D]=). Every task should carry an explicit priority cookie. If a cookie is missing, or no longer matches the reconciled facts, assign the right level per the legend. If the level is unambiguous from the body, do it autonomously; if it's a judgment call (especially the [#A] / [#B] line for important-but-not-urgent work), flag NEEDS-USER. Also enforce the [#A]-discipline rule from the legend — an [#A] task without a =SCHEDULED:= or =DEADLINE:= line is mis-graded and is either down-graded to [#B] (when reconciled facts say "important but not urgent") or surfaced as NEEDS-USER for the user to date.
 - *Ensure a type tag is set.* Every task carries one type tag from the project's tag legend (typically =:feature:= / =:chore:= / =:spec:= / =:bug:=). If missing or wrong, assign or correct it from the body when the type is unambiguous. If two tags fit (a refactor that also fixes a bug; a spec that's also a chore), flag NEEDS-USER rather than picking one silently.
+- *Enforce the project's declared tag vocabulary.* If the project's tag legend declares an *exhaustive* set of allowed tags, strip from each task any tag outside that set — the heading and parent section already carry topic/scope context, so ad-hoc tags only fragment the vocabulary and defeat tag-based filtering. Normalize near-duplicate spellings to the canonical tag (a plural to its singular, say). Where the legend does not declare the set closed, leave existing tags alone; this step applies only where the allowed set is exhaustive by design.
 - *Re-assess the =:quick:= and =:solo:= tags* — reconciliation can change a task's effort or autonomy: a resolved dependency may make a stuck task =:solo:=, a scope cut may make it =:quick:=, and new complexity surfaced by the sources can invalidate either. Add or remove the tags per the definitions in the project's tag legend (and [[file:task-review.org][task-review.org]]) when the reconciled facts make the call clear. When they don't — an effort estimate you can't pin down, a =:solo:= gate you can't confirm — it's a NEEDS-USER flag, not a guess.
 - Bump =:LAST_REVIEWED:= on each edited task.
 
@@ -114,7 +118,17 @@ Skip the chain only when the user scoped the run to "audit only," or when =task-
 4. *Concurrent sub-agent writes to =todo.org=.* Investigation fans out; edits stay serial in the main thread.
 5. *Reconciling against memory instead of the live sources.* Email/chat/ticketing/recordings move independently of the task text — check them.
 6. *Doing relevance grooming here.* Keep/kill/priority is =task-review='s job; this workflow is about factual accuracy.
+7. *Leaving a code-complete task open because it awaits manual verification.* File the human check under the manual-testing parent (dedup first) and close the implementation task; "done but unverified" should not linger as a half-open task.
 
 * Validation
 
 Created and validated 2026-05-22 against the DeepSat work =todo.org=: reconciled the open-work tasks one at a time, applied factual updates (a contract task's gating, a hiring task whose interview outcome was recovered by transcribing an un-processed meeting recording, a partnership task's stale comms-hold + consolidation of two duplicate trackers, plus several "waiting on X resolved" updates), and surfaced the judgment-call subset (shelve-vs-push, awaiting-team-feedback, who-initiated-this) for the user to adjudicate.
+
+* Revision Notes
+
+** 2026-06-12 — Tag-vocabulary enforcement + code-complete-but-unverified closing
+
+Two Phase C behaviors added, both surfaced by an Emacs-config =todo.org= audit:
+
+- *Tag-vocabulary enforcement.* That project declares a closed tag set (=bug=, =feature=, =refactor=, =test=, =quick=, =solo=); the audit had to strip ~44 ad-hoc tags that had accumulated across the file. The prior workflow only checked that a type tag was *present* — it had no concept of an exhaustive allowed set. The new bullet enforces a declared closed vocabulary and leaves open-vocabulary projects untouched.
+- *Code-complete-but-unverified closing.* Many tasks had shipped (tests green, live in the daemon) but stayed open awaiting a manual or visual verification, so they accumulated as half-open. Leaving them open is noise; auto-closing them would violate "never claim a fix verified before the user confirms." The fix routes the pending human check into the project's =Manual testing and validation= parent (dedup-checked) per =verification.md='s manual-verification hand-off, then closes the implementation task. The work is done and the check is tracked; a failed check promotes to a bug.
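The verify-then-close rule (search the manual-testing parent first, file the check only if no equivalent exists, then close the implementation task) could be sketched as follows. This is an illustration under assumptions: the `Task` model and `verify_then_close` are hypothetical, "equivalent check" is approximated by a case- and whitespace-insensitive title match, and only the top-level close (`DONE` plus a closed date) is modeled, not the sub-task event-log rewrite.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical in-memory stand-in for one todo.org entry."""
    title: str
    state: str = "TODO"
    closed: Optional[date] = None
    children: List["Task"] = field(default_factory=list)


def _norm(title):
    # Assumed equivalence test: same title ignoring case and extra whitespace.
    return " ".join(title.lower().split())


def verify_then_close(impl, manual_parent, check_title, today):
    """File the pending human check (unless one exists), then close `impl`.

    Returns True when a new check was filed, False when an equivalent was found.
    """
    exists = any(_norm(c.title) == _norm(check_title) for c in manual_parent.children)
    if not exists:
        # The filed check stays open; it is the tracked manual test.
        manual_parent.children.append(Task(title=check_title))
    # Close the implementation task without marking the feature verified.
    impl.state = "DONE"
    impl.closed = today
    return not exists
```

The implementation task is closed in both branches; only the filing step is conditional, so a repeated audit does not pile duplicate checks under the manual-testing parent.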