| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
verification.md required running tests/lint/typecheck/build before claiming done, but said nothing about what to do when a command can't run. A new "When You Cannot Verify" section requires a four-part report (command attempted, why it couldn't run, risk left unverified, and the smallest next command for the user) and states the principle that a check that didn't run is never reported as a pass.
|
| | |
|
| |
|
|
| |
Two audit fixes. elisp.md's "prefer Write over Edits" advice was tool-specific. It's now framed around intent: edit cohesively, then run paren and byte-compile checks immediately, whatever the editing mechanism. elisp-testing.md gains batch-mode reproducibility (emacs --batch as source of truth, no interactive state, no blocking prompts), state isolation (temp user-emacs-directory, explicit load-path, declared deps only), and byte-compile/native-comp warning handling, with native-comp gated on availability and kept opt-in.
|
| |
|
|
| |
Two audit fixes. The "prefer in-memory SQLite" advice is risky when prod is Postgres or MySQL — SQLite diverges on query semantics, constraints, transactions, JSON, time zones, and indexes, so a test can pass on SQLite and fail in prod. ORM/query tests now use a production-like DB, with SQLite reserved for pure unit tests. The "never mock the ORM" rule is also clarified: don't mock ORM internals, but a thin orchestration unit can inject a fake at a deliberate data-access port it owns.
|
| | |
|
| |
|
|
| |
Two audit fixes. Framework-agnostic findings (Claude reading import graphs) now carry a confidence level (High/Medium/Low) and how it was determined, with a required "not fully checked because" note when scale or dynamic imports cap certainty, so a partial read isn't presented as exhaustive. Unconfigured language tools are no longer skipped silently: each detected language whose tool didn't run gets an Info finding, so the audit shows what was and wasn't verified.
|
| |
|
|
| |
Two audit fixes. Section 10's thin quality-scenario template becomes the arc42/Q42 six-part form (source, stimulus, environment, artifact, response, response measure), making each scenario testable. Generated docs now carry staleness and ownership metadata: a per-section header (owner, generated-against commit and date, review cadence, stale-when conditions) and a whole-document Doc Status table, so a reader can tell whether a section still matches the code.
|
| |
|
|
| |
Two audit fixes. A new Phase 4 (Trust, Data, and Compliance) surfaces trust boundaries, data classification, abuse cases, privacy, compliance evidence, and ownership before the paradigm shortlist, so the architecture is drawn around them rather than retrofitted by a later security-check. Phase 5 now splits the choice in two: pick one paradigm (monolith, microservices, event-driven, and so on), then compose tactical patterns onto it (DDD, hexagonal, CQRS, event sourcing), with composition examples and an anti-pattern against treating a pattern as an alternative to a paradigm.
|
| |
|
|
| |
Two audit fixes. Sample ADRs asserted technical claims as timeless fact (MongoDB transactions). The example is now dated and sourced, and a "cite, don't assert" rule requires a link, version, or checked-date for any concrete technical claim. The status vocabulary was mixed (Accepted, Decided, Superseded, Not Accepted). It's now a canonical five (Proposed, Accepted, Rejected, Deprecated, Superseded) with an explicit immutability rule: an accepted ADR's body is frozen, and a changed decision gets a new superseding ADR while the old one stays as the record.
|
| | |
|
| |
|
|
| |
Two audit fixes to c4-analyze and c4-diagram. Both treated draw.io as the only output. They now offer draw.io, Structurizr DSL, Mermaid (native C4 types), and PlantUML, with a headless fallback that emits a text notation instead of failing when drawio or a GUI is absent. Both also gained an abstraction-boundary section (a Container is a deployable unit, not a Docker container, and a Component isn't separately deployable) and a check that every element and relationship stays at the diagram's single C4 level.
|
| |
|
|
| |
Two audit fixes. A "Tool availability" section degrades gracefully when Linear MCP, gh, /voice, or Playwright are missing: do what the available tools allow, surface what couldn't run, don't fail the phase. A "Ceremony scale" section sizes the process to the work, so a trivial fix skips the ticket, branch, and gates unless asked. Separately, the claim now splits by tracker: personal todo claims wait until after the Justify gate (no rollback needed if killed), while Linear and GitHub claims happen first to signal intent but record prior state so a killed task restores cleanly.
|
| |
|
|
| |
Two audit fixes. Base-branch detection returned a merge-base SHA where a branch name was needed. Phase 2 now resolves the branch name (open PR base, then origin/HEAD, then ask) and computes the merge-base SHA separately. Option 1's merge gained pre-flight safety: a dirty-tree refusal with no auto-stash, protected-branch awareness, an upstream-gated ff-only pull, and merge-commit-vs-rebase as a team-policy choice. Worktree detection moved from grepping branch names to a git-dir vs git-common-dir comparison.
|
| | |
|
| |
|
|
| |
codify now runs two mandatory checks before writing a CLAUDE.md entry: a stale-entry scan (update or remove a no-longer-true entry in place rather than appending a contradiction around it) and a privacy check asking "safe if the project were public?" and "belongs in private memory instead?", routing private content to auto-memory. These are gates, not background guidance.
|
| |
|
|
| |
Two audit fixes. The Meincke citation had the wrong title and was used to imply persuasion framing improves prompt quality. It now reads as the safety caution it is: applying the principles raised an LLM's compliance with objectionable requests from ~33% to ~72%, a reason for care, not a recipe. The correct title ("Call Me A Jerk...") and SSRN id are fixed in all three spots. Critique mode also gains an eval-harness step: for fragile or production prompts, run 3-5 adversarial examples against the old and new prompt and record the delta, so quality is verified rather than asserted.
|
| |
|
|
| |
Three audit fixes. Renamed Metrics to Measures throughout to match Salesforce's term (the "vanity metrics" idiom stays, since that's the anti-pattern name). Phase 8 task migration no longer transplants the task tree into the V2MOM — tasks stay in the backlog grouped by method, and each method links to where they live, keeping strategy and execution as separate sources of truth. Obstacles now carry a mitigation, owner, and review cadence, so the section is operational rather than just candid.
|
| | |
|
| |
|
|
| |
Two audit fixes. A t-way escalation section says to start pairwise, then raise specific high-risk clusters to 3-way or higher when history, safety, security, or coupling warrants it, with the sub-model order syntax shown. A second note explains PICT's ~ marker (a negative or invalid-value tag, not an operator) and adds an honest stop-at-the-model rule: if no PICT generator is installed, produce the model and stop rather than hand-writing rows and calling them generated.
|
| |
|
|
| |
Two audit fixes to the OWASP review. It now maps each finding to an OWASP Top 10 2021 category or a WSTG area, adding the four that were missing (Insecure Design, Software and Data Integrity Failures, Security Logging and Monitoring Failures, SSRF) with explicit checks for object and function-level authorization, SSRF URL fetches, update and dependency integrity, and logging gaps. A new optional-scanners step adds gitleaks/trufflehog, semgrep, OSV, and lockfile-diff review, with a network caveat: a scan that can't run reports "not run", never a silent pass.
|
| |
|
|
| |
Two audit fixes. Accessibility moves from an optional reference for interactive components to a required WCAG 2.2 gate before handoff, covering keyboard, focus visibility and not-obscured, target size, contrast, reduced motion, labels, and semantics, for all frontend work. references/accessibility.md gained the backing detail and moved from 2.1 to 2.2. A new "creative but bounded" section keeps the maximalist directions as tools within guardrails: domain fit, readability, responsive stability, and no decoration that degrades the workflow.
|
| | |
|
| |
|
|
|
|
| |
The Normal/Boundary/Error rule forced all three categories on every function, including pure adapters, generated code, and framework glue where a category would only re-test the framework. Step 7 now lets you skip a category in those cases, as long as you state and justify the skip in the plan and cover the behavior at integration or E2E level — a stated decision, never a silent omission. Step 12 points back to it.
The companion audit item about a missing typescript-testing.md reference turned out moot: that ruleset now exists and add-tests already references it correctly.
|
| | |
|
| |
|
|
|
|
|
|
|
|
| |
Three audit-pass fixes across the debugging skills.
debug now captures environment and recent-change context (versions, flags, dataset, seed/clock, concurrency, recent commits) as a Phase-1 step. Many intermittent bugs live in state or environment, not a local code path, and "what changed recently" is often the fastest route to the cause.
root-cause-trace's defense-in-depth said to add a check at every layer that could have caught the bad value, which breeds validation spam. It now adds checks only at boundary-owning layers (ingress, persistence, the invariant owner, final render), and says a pass-through function that owns neither a boundary nor an invariant shouldn't get a duplicate null check.
five-whys now makes each link carry an evidence field and a counterfactual: if you remove this cause, does the symptom above still happen? That's the guard against a tidy chain that reads well but wouldn't have prevented the failure.
|
| | |
|
| |
|
|
| |
playwright-js defaulted headed, playwright-py headless, with no explanation, so an agent could flip modes by habit. I added a matching purpose-based decision table to each (headed for interactive debugging, headless for CI/pytest), and made each skill name its own default and point at the other. I also softened the absolutist "always headless" comment in the py example.
|
| | |
|
| |
|
|
|
|
|
|
|
|
| |
Two audit-pass fixes to respond-to-review, plus a stale-reference tidy in respond-to-cj-comments.
The Gather step fetched a flat comment list, which misses thread resolution and re-processes feedback that's already settled. It now pulls unresolved review threads via gh api graphql (skipping resolved ones), keeps REST only for top-level conversation comments, and resolves a thread only after the fix is verified.
The commit guidance suggested "fix: Address review — [description]", which puts the review process into git log against commits.md and used a non-ASCII dash. It now names the actual fix and leaves the how-it-surfaced detail in the PR thread.
respond-to-cj-comments lost a stale "humanizer" mention, since that skill is voice now. Two adjacent audit items came out moot: it no longer embeds an absolute path, and its humanizer/emacsclient fallback was overtaken by the /voice migration and the in-place VERIFY pattern.
|
| | |
|
| |
|
|
|
|
|
|
| |
Two clarifications to review-code where it appeared to contradict other rules.
The "trust CI, don't run builds" rule read as a blanket license to skip verification. I scoped it to reviewing a diff, not shipping one. A pre-commit or pre-push flow still owes the local verification verification.md requires. Reading a PR doesn't duplicate CI. Producing one doesn't get to skip it.
The CLAUDE.md-adherence audit could put a CLAUDE.md citation into a team-visible PR comment, which commits.md says not to do. I added two modes. A private review cites CLAUDE.md directly. A public review translates the rule into the engineering reason and doesn't name the file, since a teammate can act on the reason but not on a file they can't reach.
|
| |
|
|
|
|
| |
The .ai/-tracking check used to decide two things at once: which voice patterns ran and whether the approval gate fired. In a team repo that meant losing the 8 personal patterns and the gate together. I split them. Publish artifacts (commit messages, PR titles and bodies, PR review comments) always run /voice personal now, because they go out under my name regardless of the repo. The .ai/ check decides only the gate: applied in my personal repos, skipped for velocity in shared ones.
The gap this closes: a team-repo PR comment used to skip pattern #39, the public-artifact scope flag, which is exactly the check that matters most when teammates can read it.
|
| | |
|
| |
|
|
|
|
| |
Two related changes to review-code's strengths guidance. The mandatory "three minimum" could force filler on a tiny diff or padded praise on a weak PR, so I relaxed it to up to three specific strengths, with an honest "nothing notable" allowed when the diff doesn't earn them. I also reframed the old "No Strengths section" anti-pattern as "skipping strengths out of laziness": a substantive diff still demands them, a weak one doesn't.
The other change tells reviewers to name the good thing and stop, without explaining why it's good. Explaining praise reads as sycophantic since the author already knows the rationale. Elaboration is for findings, not compliments.
|
| |
|
|
|
|
| |
Step 3.5's Linear ticket-state sweep uses gh to find the merged PR for a Dev-Review ticket, which assumes a GitHub-family host. That holds today because DeepSat, the only Linear-using project, lives on GHE where gh talks to the API.
I added a note flagging the assumption rather than rewriting the step to be provider-agnostic. A future Linear project on GitLab, Gitea, or Bitbucket would need a different PR lookup, but none exists yet, so documenting the boundary beats building for a host we don't have.
|
| |
|
|
|
|
|
|
|
|
| |
Startup synced the .ai/ templates into the current project every session but never checked the language bundle (elisp, python) installed in .claude/. Bundle drift went unnoticed until someone re-ran make install-lang by hand: a generic rule added to claude-rules/ after the last install, or a changed validator hook.
scripts/sync-language-bundle.sh closes that gap. It fingerprints which bundle a project has by the presence of the language's own rule files (elisp.md, python-testing.md), then reconciles against the canonical source: auto-fix for rulesets-owned files (.claude/rules/*.md, .claude/hooks/*, githooks/*), surface-only for settings.json, which a project may have customized. CLAUDE.md is left alone. It's seed-only in install-lang and project-owned afterward, the same reason diff-lang skips it.
Startup Phase A step 12 calls it for the current project, guarded so older checkouts that lack the script still boot. It writes only under .claude/ and githooks/, disjoint from the .ai/ rsync paths, so the parallel batch stays safe. A script rather than a make target keeps the Makefile-parse layer off the boot path. The absolute rulesets path it depends on is the same one the rsyncs already carry.
Tested: 11 bats cases (no-bundle, clean, drifted rule/hook auto-fixed, surfaced settings.json asserted unmodified, absent CLAUDE.md not flagged, python detection, $PWD default, bad path). A smoke run against a copy of a real elisp project's .claude/ caught a perpetual "CLAUDE.md missing" alarm, which is what drove dropping CLAUDE.md from the surface set.
|
| |
|
|
| |
The lint sweep re-flagged the line-2070 misplaced-heading false positive (a ** inside verbatim markers in a DONE body), and the staleness check counted 12 top-level tasks unreviewed past 30 days.
|
| |
|
|
|
|
| |
I added a per-task effort check to the task-review walk. If a task looks like 30 minutes or less and isn't already tagged, mark it :quick: on the heading. When the heading and body don't make the effort clear, ask instead of guessing. A mislabeled :quick: defeats the point, since the tag exists so Craig can grab a genuinely small task in a spare moment.
I edited the canonical claude-templates copy and the synced project mirror together, so the next startup rsync won't revert them.
|
| |
|
|
| |
Archive the DONE task-review implementation and the cancelled OV-1 skill from Open Work to Resolved. The follow-ups file picks up one lint judgment and the review-habit staleness line for the next daily-prep.
|
| |
|
|
| |
The habit is built and smoke-tested — the staleness script with count and --list modes, the wrap-up health check, the task-review workflow, and the startup nudge all shipped, and the first review cycle ran clean against the live list. The elisp component was dropped under Shape B. The daily habit carries on through the startup nudge and the wrap-up watchdog.
|
| |
|
|
| |
This is the first task-review cycle. I re-graded create-documentation, the 2026-05-04 audit review pass, and /update-skills from [#A] to [#C]; bumped the wrap-it-up GitHub-remote chore to [#A] and tagged it :quick:; and cancelled the OV-1 DoDAF skill. The kept and re-graded tasks get a :LAST_REVIEWED: stamp so the staleness watchdog and the rotation know they've been looked at.
|
| |
|
|
|
|
| |
The spec recommended an Emacs keystroke mode (task-review.el). Implementation went the other way — a pure Claude workflow, no elisp — because the interactive mode would couple a rulesets-owned file to archsetup's init.el, and the daily Claude touchpoint already exists in daily-prep. I added a Revision section at the top recording the change: pure workflow, rulesets-owned, the task-review.org / open-tasks.org name swap, the staleness --list selection, and the startup nudge promoted to template-level. The elisp architecture and ERT sections stay as a record of the abandoned approach, flagged superseded.
The todo task moves to DOING with per-component status: everything but the smoke test is done, and component 3 (the elisp) is dropped.
|
| |
|
|
|
|
|
|
|
|
| |
The new task-review.org workflow is the daily habit that retires the old date-coverage scan. It surfaces the oldest-unreviewed top-level tasks, walks them one at a time, and records each outcome — keep, re-grade, kill, mark DOING, or edit — stamping :LAST_REVIEWED: as it goes. It's a pure Claude workflow, no elisp. open-tasks.org displays the list; this one changes it.
task-review-staleness.sh gains a --list mode that emits the N oldest-unreviewed tasks (line, review date, heading), oldest first, so the workflow walks a deterministic batch instead of eyeballing todo.org. Never-reviewed and unparseable-date tasks sort oldest. Seven new bats cases cover ordering, the count limit, exclusions, and output format; count mode is unchanged.
startup.org gains the matching nudge. Phase A counts tasks unreviewed for >7 days and Phase C surfaces one line when that count is non-zero, pointing at the workflow. It lives in the template startup.org rather than the project-only startup-extras layer, so every project picks it up the same way it picks up the wrap-up health check.
The INDEX entry is added with the "task review" triggers the rename freed up.
|
| |
|
|
|
|
|
|
| |
The list-and-pick-next workflow was named task-review.org, but "task review" better describes a list-hygiene habit that re-grades and prunes tasks, not one that just displays them. I'm freeing the task-review.org name (and the "task review" trigger) for that habit, which lands next.
This workflow goes back to open-tasks.org — the name it carried before it merged with whats-next.org. Its content and INDEX entry drop the "task review" trigger and point at task-review.org for the hygiene habit. Behavior is unchanged; only the name and the routing phrases move.
The rename touches both the canonical workflow and the project mirror.
|
| |
|
|
|
|
|
|
| |
The date-coverage scan flagged every open [#A]/[#B] task with no DEADLINE or SCHEDULED, on the assumption that high-priority work needs a date. That assumption is wrong (research and watch-list tasks are legitimately dateless), so the scan generated dismiss-or-paper-over cleanup at every wrap-up.
The replacement watches the daily task-review habit instead. It calls task-review-staleness.sh todo.org 30 and, when the count is non-zero, writes one summary line to the follow-ups file: N top-level tasks unreviewed for >30 days. There's no per-task dump, because the per-task walk is the review habit's job. Staleness, not datelessness, is the signal worth surfacing.
The change lands in both the canonical workflow and the project mirror.
|
| | |
|
| |
|
|
|
|
|
|
| |
First component of the daily task-review habit from docs/design/task-review.org. The staleness count is the shared primitive both the wrap-up health check (threshold 30) and the startup reminder (threshold 7) call, so it lives in one tested script rather than being reimplemented in each workflow.
The script counts top-level todo.org tasks whose review has gone stale: depth-2 headings with a TODO/DOING/VERIFY keyword and an [#A]/[#B]/[#C] cookie, where LAST_REVIEWED is missing, unparseable, or older than the threshold. Age uses a strict greater-than, so a task reviewed exactly N days ago is still fresh. Today normalizes to local midnight before the diff, and the day count rounds to the nearest day, so a DST hour can't push a boundary task across the line.
Twelve bats cases cover the normal, boundary, and error categories. Dates are generated relative to the current date rather than hardcoded. The script path resolves as the sibling-of-parent of the test file, so the suite runs identically from the canonical claude-templates tree and the rsync'd project mirror. Makefile test target now globs .ai/scripts/tests for bats alongside scripts/tests.
|
| | |
|
| |
|
|
| |
The startup rsync propagated the Working-Files Convention section from canonical claude-templates/.ai/protocols.org into the in-repo mirror. Mechanical catch-up, no content authored here.
|
| |
|
|
|
|
| |
review-code was a command with disable-model-invocation set, so the model could never reach for it on its own. Every time a review fit the moment, the agent had to hand back to me to type the slash command. Moving it to a skill makes it model-invocable while it stays slash-invocable as /review-code.
git mv keeps the file history (99% rename). The frontmatter drops disable-model-invocation and gains name: review-code; the body is unchanged. It also drops the discovery-check paragraph in commits.md, which only existed to find the command on disk when it was missing from the skills list, moot now that the skill shows up there.
|