diff options
| author | Craig Jennings <c@cjennings.net> | 2026-07-01 22:10:59 -0400 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-07-01 22:10:59 -0400 |
| commit | 2a45f079749ad08f47eb9debc457ed3bc45fae38 (patch) | |
| tree | f1bb1dbb4da677ab8173363cbf916c729cdb65d8 | |
| parent | c15b695b650c276670948cf7944feefa16a96d11 (diff) | |
| download | rulesets-2a45f079749ad08f47eb9debc457ed3bc45fae38.tar.gz rulesets-2a45f079749ad08f47eb9debc457ed3bc45fae38.zip | |
feat(tags): hard :solo:/:quick: definitions + mandatory review/audit assessment
Phase 0 of the autonomous-batch (speedrun) build. todo-format.md now carries fixed cross-project definitions: :solo: is the autonomy/eligibility tag (buildable, agent-verifiable, no design deliberation — at most one or two quick upfront-answerable decisions, which the speedrun pre-flight Q&A batches), and :quick: is a ≤30-minute effort hint that never gates eligibility. task-review and task-audit now treat the tag assessment as mandatory — a pass that skips it is incomplete.
task-review's :solo: gate 3 also moves from "no upfront decision" to the no-deliberation form: the stricter wording predated the pre-flight Q&A decision and would have wrongly excluded tasks with a quick answerable question.
| -rw-r--r-- | .ai/workflows/task-audit.org | 2 | ||||
| -rw-r--r-- | .ai/workflows/task-review.org | 6 | ||||
| -rw-r--r-- | claude-rules/todo-format.md | 31 | ||||
| -rw-r--r-- | claude-templates/.ai/workflows/task-audit.org | 2 | ||||
| -rw-r--r-- | claude-templates/.ai/workflows/task-review.org | 6 | ||||
| -rw-r--r-- | todo.org | 3 |
6 files changed, 44 insertions, 6 deletions
diff --git a/.ai/workflows/task-audit.org b/.ai/workflows/task-audit.org index b6e59dc..8dd2fb2 100644 --- a/.ai/workflows/task-audit.org +++ b/.ai/workflows/task-audit.org @@ -79,7 +79,7 @@ For every STALE task, edit it in the main thread: - *Ensure priority is set per the project scheme.* The top of the project's =todo.org= should carry the priority legend (=[#A]= through =[#D]=). Every task should carry an explicit priority cookie. If a cookie is missing, or no longer matches the reconciled facts, assign the right level per the legend. If the level is unambiguous from the body, do it autonomously; if it's a judgment call (especially the [#A] / [#B] line for important-but-not-urgent work), flag NEEDS-USER. Also enforce the [#A]-discipline rule from the legend — an [#A] task without a =SCHEDULED:= or =DEADLINE:= line is mis-graded and is either down-graded to [#B] (when reconciled facts say "important but not urgent") or surfaced as NEEDS-USER for the user to date. - *Ensure a type tag is set.* Every task carries one type tag from the project's tag legend (typically =:feature:= / =:chore:= / =:spec:= / =:bug:=). If missing or wrong, assign or correct it from the body when the type is unambiguous. If two tags fit (a refactor that also fixes a bug; a spec that's also a chore), flag NEEDS-USER rather than picking one silently. - *Enforce the project's declared tag vocabulary.* If the project's tag legend declares an *exhaustive* set of allowed tags, strip from each task any tag outside that set — the heading and parent section already carry topic/scope context, so ad-hoc tags only fragment the vocabulary and defeat tag-based filtering. Normalize near-duplicate spellings to the canonical tag (a plural to its singular, say). Where the legend does not declare the set closed, leave existing tags alone; this step applies only where the allowed set is exhaustive by design. -- *Re-assess the =:quick:= and =:solo:= tags* — reconciliation can change a task's effort or autonomy: a resolved dependency may make a stuck task =:solo:=, a scope cut may make it =:quick:=, and new complexity surfaced by the sources can invalidate either. Add or remove the tags per the definitions in the project's tag legend (and [[file:task-review.org][task-review.org]]) when the reconciled facts make the call clear. When they don't — an effort estimate you can't pin down, a =:solo:= gate you can't confirm — it's a NEEDS-USER flag, not a guess. +- *Re-assess the =:quick:= and =:solo:= tags (mandatory — an audit that skips this is incomplete).* Reconciliation can change a task's effort or autonomy: a resolved dependency may make a stuck task =:solo:=, a scope cut may make it =:quick:=, and new complexity surfaced by the sources can invalidate either. Add or remove the tags per the hard definitions in [[file:../../claude-rules/todo-format.md][todo-format.md]] ("Hard definitions: :solo: and :quick:"; task-review carries the same three-gate walk). Autonomous execution reads =:solo:= as its eligibility gate and trusts the tag, so a stale one is a run-time hazard, not cosmetic drift. When the call isn't clear — an effort estimate you can't pin down, a =:solo:= gate you can't confirm — it's a NEEDS-USER flag, not a guess. - Bump =:LAST_REVIEWED:= on each edited task. Follow =todo-format.md= for completion mechanics (depth-based DONE vs dated-rewrite) and the working-files / link-hygiene rules when moving artifacts. diff --git a/.ai/workflows/task-review.org b/.ai/workflows/task-review.org index 7b6327b..ba1571a 100644 --- a/.ai/workflows/task-review.org +++ b/.ai/workflows/task-review.org @@ -57,7 +57,9 @@ Keep is the common case — most tasks are still right and just need re-stamping *** Tagging =:quick:= — small tasks -While reviewing each task, estimate its effort. If you judge it *30 minutes or less* and it doesn't already carry =:quick:=, add the tag to the heading line. If the heading and body don't tell you how long it'll take, *ask Craig* — don't guess. A wrong =:quick:= is worse than none: the tag exists so Craig can grab a genuinely small task in a spare moment, and a mislabeled one wastes that moment. +The =:quick:= and =:solo:= assessments (this section and the next) are *mandatory* for every reviewed task except a Kill — a review that skips them is incomplete. The hard definitions live in [[file:../../claude-rules/todo-format.md][todo-format.md]] ("Hard definitions: :solo: and :quick:"); autonomous execution (work-the-backlog / the no-approvals speedrun) reads =:solo:= as its eligibility gate and trusts the author's tag, so the run-time gate is only as trustworthy as this pass. + +While reviewing each task, estimate its effort. If you judge it *30 minutes or less* and it doesn't already carry =:quick:=, add the tag to the heading line. If the heading and body don't tell you how long it'll take, *ask Craig* — don't guess. A wrong =:quick:= is worse than none: the tag exists so Craig can grab a genuinely small task in a spare moment, and a mislabeled one wastes that moment. =:quick:= is an effort hint only, never an eligibility gate — size does not decide what runs autonomously. This is orthogonal to the action chosen — a task can be kept (or re-graded, or marked DOING) *and* tagged =:quick:= in the same pass. Skip the assessment on a Kill, since it's leaving the pool. Tags go on the heading line per [[file:../../claude-rules/todo-format.md][todo-format.md]], sharing one =:tag1:tag2:= cluster. @@ -67,7 +69,7 @@ While reviewing each task, judge whether Claude could build *and* verify it with 1. *Buildable* — Claude has the capability and access to do the work. 2. *Verifiable by Claude* — an objective or local check exists that Claude can run itself. Craig's routine spot-checking does not count against this, and neither does handing off a residual human-in-the-loop confirmation as a structured manual-testing reminder (the =verification.md= "Handing Off Manual Verification" pattern). The disqualifier is having no verification path of Claude's own at all — when the success criterion is only judgeable by Craig's eyes or subjective taste. -3. *No upfront decision* — no design or preference call Craig must make before Claude can begin. +3. *No deliberation* — no open design question and no "weigh these approaches" with real tradeoffs. At most one or two *quick, upfront-answerable* factual decisions are allowed — the speedrun preset batches those into its pre-flight Q&A, so they don't break the hands-off run. A genuine design or preference call disqualifies. If any gate is shaky, leave the tag off. Like =:quick:=, a wrong =:solo:= is worse than none — it tells Craig he can hand the task off and walk away, so a mislabeled one wastes that trust. When the heading and body don't make all three gates clear, ask Craig instead of guessing. diff --git a/claude-rules/todo-format.md b/claude-rules/todo-format.md index 90d801f..5e9ca32 100644 --- a/claude-rules/todo-format.md +++ b/claude-rules/todo-format.md @@ -33,6 +33,37 @@ When a project's `todo.org` lacks the section, add it before filing or grading further tasks — propose the priority semantics and tag set from the project's existing usage, and confirm with Craig. +### Hard definitions: `:solo:` and `:quick:` (fixed across projects) + +A project's scheme may add or rename its other tags, but these two carry +fixed definitions everywhere, because autonomous execution +(work-the-backlog / the no-approvals speedrun) reads `:solo:` as its +eligibility gate and trusts the author's tag rather than re-deriving +autonomy at run time. + +- **`:solo:` — autonomy.** The task can be completed *and verified* without + Craig's involvement beyond at most one or two quick decisions that can be + stated and answered before work starts. No open design question, no + "weigh these approaches," no waiting on Craig mid-task. Three gates, all + must hold: *buildable* (the agent has the capability and access), + *verifiable by the agent* (an objective or local check it can run itself — + handing off a residual human-in-the-loop confirmation as a structured + manual-testing reminder does not disqualify), and *no deliberation* (a + quick, upfront-answerable factual question is allowed — it gets batched + into the speedrun's pre-flight Q&A; a genuine design or preference call + is not). A wrong `:solo:` is worse than none: it tells Craig he can hand + the task off and walk away when he can't. +- **`:quick:` — effort hint only.** Likely 30 minutes or less from start + through verification. Informational, for batching and estimating a run's + duration; never an eligibility gate. `:quick:` and `:solo:` are + orthogonal — a bounded refactor can be `:solo:` but slow; a five-minute + change hinging on a preference call is `:quick:` but not `:solo:`. + +Both tags are applied at task creation and **re-checked as a mandatory +step** in the task-review and task-audit workflows, so the run-time gate +can trust the tag. A review or audit that skips the `:solo:`/`:quick:` +assessment is incomplete. + ### Bug priority from severity × frequency (mandatory where a codebase exists) Some projects carry a codebase — source the project maintains under version diff --git a/claude-templates/.ai/workflows/task-audit.org b/claude-templates/.ai/workflows/task-audit.org index b6e59dc..8dd2fb2 100644 --- a/claude-templates/.ai/workflows/task-audit.org +++ b/claude-templates/.ai/workflows/task-audit.org @@ -79,7 +79,7 @@ For every STALE task, edit it in the main thread: - *Ensure priority is set per the project scheme.* The top of the project's =todo.org= should carry the priority legend (=[#A]= through =[#D]=). Every task should carry an explicit priority cookie. If a cookie is missing, or no longer matches the reconciled facts, assign the right level per the legend. If the level is unambiguous from the body, do it autonomously; if it's a judgment call (especially the [#A] / [#B] line for important-but-not-urgent work), flag NEEDS-USER. Also enforce the [#A]-discipline rule from the legend — an [#A] task without a =SCHEDULED:= or =DEADLINE:= line is mis-graded and is either down-graded to [#B] (when reconciled facts say "important but not urgent") or surfaced as NEEDS-USER for the user to date. - *Ensure a type tag is set.* Every task carries one type tag from the project's tag legend (typically =:feature:= / =:chore:= / =:spec:= / =:bug:=). If missing or wrong, assign or correct it from the body when the type is unambiguous. If two tags fit (a refactor that also fixes a bug; a spec that's also a chore), flag NEEDS-USER rather than picking one silently. - *Enforce the project's declared tag vocabulary.* If the project's tag legend declares an *exhaustive* set of allowed tags, strip from each task any tag outside that set — the heading and parent section already carry topic/scope context, so ad-hoc tags only fragment the vocabulary and defeat tag-based filtering. Normalize near-duplicate spellings to the canonical tag (a plural to its singular, say). Where the legend does not declare the set closed, leave existing tags alone; this step applies only where the allowed set is exhaustive by design. -- *Re-assess the =:quick:= and =:solo:= tags* — reconciliation can change a task's effort or autonomy: a resolved dependency may make a stuck task =:solo:=, a scope cut may make it =:quick:=, and new complexity surfaced by the sources can invalidate either. Add or remove the tags per the definitions in the project's tag legend (and [[file:task-review.org][task-review.org]]) when the reconciled facts make the call clear. When they don't — an effort estimate you can't pin down, a =:solo:= gate you can't confirm — it's a NEEDS-USER flag, not a guess. +- *Re-assess the =:quick:= and =:solo:= tags (mandatory — an audit that skips this is incomplete).* Reconciliation can change a task's effort or autonomy: a resolved dependency may make a stuck task =:solo:=, a scope cut may make it =:quick:=, and new complexity surfaced by the sources can invalidate either. Add or remove the tags per the hard definitions in [[file:../../claude-rules/todo-format.md][todo-format.md]] ("Hard definitions: :solo: and :quick:"; task-review carries the same three-gate walk). Autonomous execution reads =:solo:= as its eligibility gate and trusts the tag, so a stale one is a run-time hazard, not cosmetic drift. When the call isn't clear — an effort estimate you can't pin down, a =:solo:= gate you can't confirm — it's a NEEDS-USER flag, not a guess. - Bump =:LAST_REVIEWED:= on each edited task. Follow =todo-format.md= for completion mechanics (depth-based DONE vs dated-rewrite) and the working-files / link-hygiene rules when moving artifacts. diff --git a/claude-templates/.ai/workflows/task-review.org b/claude-templates/.ai/workflows/task-review.org index 7b6327b..ba1571a 100644 --- a/claude-templates/.ai/workflows/task-review.org +++ b/claude-templates/.ai/workflows/task-review.org @@ -57,7 +57,9 @@ Keep is the common case — most tasks are still right and just need re-stamping *** Tagging =:quick:= — small tasks -While reviewing each task, estimate its effort. If you judge it *30 minutes or less* and it doesn't already carry =:quick:=, add the tag to the heading line. If the heading and body don't tell you how long it'll take, *ask Craig* — don't guess. A wrong =:quick:= is worse than none: the tag exists so Craig can grab a genuinely small task in a spare moment, and a mislabeled one wastes that moment. +The =:quick:= and =:solo:= assessments (this section and the next) are *mandatory* for every reviewed task except a Kill — a review that skips them is incomplete. The hard definitions live in [[file:../../claude-rules/todo-format.md][todo-format.md]] ("Hard definitions: :solo: and :quick:"); autonomous execution (work-the-backlog / the no-approvals speedrun) reads =:solo:= as its eligibility gate and trusts the author's tag, so the run-time gate is only as trustworthy as this pass. + +While reviewing each task, estimate its effort. If you judge it *30 minutes or less* and it doesn't already carry =:quick:=, add the tag to the heading line. If the heading and body don't tell you how long it'll take, *ask Craig* — don't guess. A wrong =:quick:= is worse than none: the tag exists so Craig can grab a genuinely small task in a spare moment, and a mislabeled one wastes that moment. =:quick:= is an effort hint only, never an eligibility gate — size does not decide what runs autonomously. This is orthogonal to the action chosen — a task can be kept (or re-graded, or marked DOING) *and* tagged =:quick:= in the same pass. Skip the assessment on a Kill, since it's leaving the pool. Tags go on the heading line per [[file:../../claude-rules/todo-format.md][todo-format.md]], sharing one =:tag1:tag2:= cluster. @@ -67,7 +69,7 @@ While reviewing each task, judge whether Claude could build *and* verify it with 1. *Buildable* — Claude has the capability and access to do the work. 2. *Verifiable by Claude* — an objective or local check exists that Claude can run itself. Craig's routine spot-checking does not count against this, and neither does handing off a residual human-in-the-loop confirmation as a structured manual-testing reminder (the =verification.md= "Handing Off Manual Verification" pattern). The disqualifier is having no verification path of Claude's own at all — when the success criterion is only judgeable by Craig's eyes or subjective taste. -3. *No upfront decision* — no design or preference call Craig must make before Claude can begin. +3. *No deliberation* — no open design question and no "weigh these approaches" with real tradeoffs. At most one or two *quick, upfront-answerable* factual decisions are allowed — the speedrun preset batches those into its pre-flight Q&A, so they don't break the hands-off run. A genuine design or preference call disqualifies. If any gate is shaky, leave the tag off. Like =:quick:=, a wrong =:solo:= is worse than none — it tells Craig he can hand the task off and walk away, so a mislabeled one wastes that trust. When the heading and body don't make all three gates clear, ask Craig instead of guessing. @@ -393,6 +393,9 @@ Skeptical-review read (open design questions to resolve in the spec, not settled - *Auto-pull vs explicit list* — whether the set comes from an explicit ordered list or a tag/priority query. - *Guardrails* — must refuse to speedrun tasks needing design decisions or carrying data-loss risk without a checkpoint (the sender's biased-safe unused-tile flag is the worked example). +*** 2026-07-01 Wed @ 22:10:35 -0400 Phase 0 landed — hard tag definitions + review/audit enforcement +todo-format.md gained the "Hard definitions: :solo: and :quick:" subsection under the scheme header (fixed across projects: :solo: = buildable + agent-verifiable + no deliberation, with one-or-two upfront-answerable quick decisions allowed per the ratified spec; :quick: = ≤30-min effort hint, never a gate). task-review.org: the two tagging sections are now explicitly mandatory ("a review that skips them is incomplete") and gate 3 was realigned from "no upfront decision" to the spec's no-deliberation form — the stricter old wording predated the pre-flight-Q&A decision and would have wrongly excluded quick-question tasks. task-audit.org: the re-assess bullet is marked mandatory and points at the todo-format hard definitions as canonical. Phases 1-6 (work-the-backlog extraction, callers, commit gate, checklist/Q&A/page, metrics, synthesis) remain. + *** 2026-06-16 Tue @ 00:53:36 -0500 Spec written; design questions answered Craig's "your call" (2026-06-16) answered in [[file:docs/design/2026-06-16-autonomous-batch-execution-spec.org][the autonomous-batch execution spec]], which reconciles this with Phase E into one feature: - *Most effective / workflow-vs-preset:* one dedicated =work-the-backlog.org= workflow holds the execution loop; "fix speedrun" is a thin named preset (no-approvals + always-push + end page) feeding it an explicit list, and the inbox-zero loop feeds it a tag query. Pros of the shared workflow: one execution loop to audit, inbox-zero's three callers stay clean, both input shapes reuse one guardrail set. Cons: one more workflow file and a caller-to-workflow indirection. The con list is shorter and lighter than the duplication cost of two separate features, which is why the shared workflow wins. The pros carry the more important entries (single audit surface, clean seam). |
