| Commit message (Collapse) | Author | Age | Files | Lines |
| | |
|
| |
|
|
| |
codify now runs two mandatory checks before writing a CLAUDE.md entry: a stale-entry scan (update or remove a no-longer-true entry in place rather than appending a contradiction around it) and a privacy check asking "safe if the project were public?" and "belongs in private memory instead?", routing private content to auto-memory. These are gates, not background guidance.
|
| |
|
|
| |
Two audit fixes. The Meincke citation had the wrong title and was used to imply persuasion framing improves prompt quality. It now reads as the safety caution it is: applying the principles raised an LLM's compliance with objectionable requests from ~33% to ~72%, a reason for care, not a recipe. The correct title ("Call Me A Jerk...") and SSRN id are fixed in all three spots. Critique mode also gains an eval-harness step: for fragile or production prompts, run 3-5 adversarial examples against the old and new prompt and record the delta, so quality is verified rather than asserted.
|
| |
|
|
| |
Three audit fixes. Renamed Metrics to Measures throughout to match Salesforce's term (the "vanity metrics" idiom stays, since that's the anti-pattern name). Phase 8 task migration no longer transplants the task tree into the V2MOM — tasks stay in the backlog grouped by method, and each method links to where they live, keeping strategy and execution as separate sources of truth. Obstacles now carry a mitigation, owner, and review cadence, so the section is operational rather than just candid.
|
| | |
|
| |
|
|
| |
Two audit fixes. A t-way escalation section says to start pairwise, then raise specific high-risk clusters to 3-way or higher when history, safety, security, or coupling warrants it, with the sub-model order syntax shown. A second note explains PICT's ~ marker (a negative or invalid-value tag, not an operator) and adds an honest stop-at-the-model rule: if no PICT generator is installed, produce the model and stop rather than hand-writing rows and calling them generated.
|
| |
|
|
| |
Two audit fixes to the OWASP review. It now maps each finding to an OWASP Top 10 2021 category or a WSTG area, adding the four that were missing (Insecure Design, Software and Data Integrity Failures, Security Logging and Monitoring Failures, SSRF) with explicit checks for object and function-level authorization, SSRF URL fetches, update and dependency integrity, and logging gaps. A new optional-scanners step adds gitleaks/trufflehog, semgrep, OSV, and lockfile-diff review, with a network caveat: a scan that can't run reports "not run", never a silent pass.
|
| |
|
|
| |
Two audit fixes. Accessibility moves from an optional reference for interactive components to a required WCAG 2.2 gate before handoff, covering keyboard, focus visibility and not-obscured, target size, contrast, reduced motion, labels, and semantics, for all frontend work. references/accessibility.md gained the backing detail and moved from 2.1 to 2.2. A new "creative but bounded" section keeps the maximalist directions as tools within guardrails: domain fit, readability, responsive stability, and no decoration that degrades the workflow.
|
| | |
|
| |
|
|
|
|
| |
The Normal/Boundary/Error rule forced all three categories on every function, including pure adapters, generated code, and framework glue where a category would only re-test the framework. Step 7 now lets you skip a category in those cases, as long as you state and justify the skip in the plan and cover the behavior at integration or E2E level — a stated decision, never a silent omission. Step 12 points back to it.
The companion audit item about a missing typescript-testing.md reference turned out moot: that ruleset now exists and add-tests already references it correctly.
|
| | |
|
| |
|
|
|
|
|
|
|
|
| |
Three audit-pass fixes across the debugging skills.
debug now captures environment and recent-change context (versions, flags, dataset, seed/clock, concurrency, recent commits) as a Phase-1 step. Many intermittent bugs live in state or environment, not a local code path, and "what changed recently" is often the fastest route to the cause.
root-cause-trace's defense-in-depth said to add a check at every layer that could have caught the bad value, which breeds validation spam. It now adds checks only at boundary-owning layers (ingress, persistence, the invariant owner, final render), and says a pass-through function that owns neither a boundary nor an invariant shouldn't get a duplicate null check.
five-whys now makes each link carry an evidence field and a counterfactual: if you remove this cause, does the symptom above still happen? That's the guard against a tidy chain that reads well but wouldn't have prevented the failure.
|
| | |
|
| |
|
|
| |
playwright-js defaulted headed, playwright-py headless, with no explanation, so an agent could flip modes by habit. I added a matching purpose-based decision table to each (headed for interactive debugging, headless for CI/pytest), and made each skill name its own default and point at the other. I also softened the absolutist "always headless" comment in the py example.
|
| | |
|
| |
|
|
|
|
|
|
|
|
| |
Two audit-pass fixes to respond-to-review, plus a stale-reference tidy in respond-to-cj-comments.
The Gather step fetched a flat comment list, which misses thread resolution and re-processes feedback that's already settled. It now pulls unresolved review threads via gh api graphql (skipping resolved ones), keeps REST only for top-level conversation comments, and resolves a thread only after the fix is verified.
The commit guidance suggested "fix: Address review — [description]", which puts the review process into git log against commits.md and used a non-ASCII dash. It now names the actual fix and leaves the how-it-surfaced detail in the PR thread.
respond-to-cj-comments lost a stale "humanizer" mention, since that skill is voice now. Two adjacent audit items came out moot: it no longer embeds an absolute path, and its humanizer/emacsclient fallback was overtaken by the /voice migration and the in-place VERIFY pattern.
|
| | |
|
| |
|
|
|
|
|
|
| |
Two clarifications to review-code where it appeared to contradict other rules.
The "trust CI, don't run builds" rule read as a blanket license to skip verification. I scoped it to reviewing a diff, not shipping one. A pre-commit or pre-push flow still owes the local verification verification.md requires. Reading a PR doesn't duplicate CI. Producing one doesn't get to skip it.
The CLAUDE.md-adherence audit could put a CLAUDE.md citation into a team-visible PR comment, which commits.md says not to do. I added two modes. A private review cites CLAUDE.md directly. A public review translates the rule into the engineering reason and doesn't name the file, since a teammate can act on the reason but not on a file they can't reach.
|
| |
|
|
|
|
| |
The .ai/-tracking check used to decide two things at once: which voice patterns ran and whether the approval gate fired. In a team repo that meant losing the 8 personal patterns and the gate together. I split them. Publish artifacts (commit messages, PR titles and bodies, PR review comments) always run /voice personal now, because they go out under my name regardless of the repo. The .ai/ check decides only the gate: applied in my personal repos, skipped for velocity in shared ones.
The gap this closes: a team-repo PR comment used to skip pattern #39, the public-artifact scope flag, which is exactly the check that matters most when teammates can read it.
|
| | |
|
| |
|
|
|
|
| |
Two related changes to review-code's strengths guidance. The mandatory "three minimum" could force filler on a tiny diff or padded praise on a weak PR, so I relaxed it to up to three specific strengths, with an honest "nothing notable" allowed when the diff doesn't earn them. I also reframed the old "No Strengths section" anti-pattern as "skipping strengths out of laziness": a substantive diff still demands them, a weak one doesn't.
The other change tells reviewers to name the good thing and stop, without explaining why it's good. Explaining praise reads as sycophantic since the author already knows the rationale. Elaboration is for findings, not compliments.
|
| |
|
|
|
|
| |
Step 3.5's Linear ticket-state sweep uses gh to find the merged PR for a Dev-Review ticket, which assumes a GitHub-family host. That holds today because DeepSat, the only Linear-using project, lives on GHE where gh talks to the API.
I added a note flagging the assumption rather than rewriting the step to be provider-agnostic. A future Linear project on GitLab, Gitea, or Bitbucket would need a different PR lookup, but none exists yet, so documenting the boundary beats building for a host we don't have.
|
| |
|
|
|
|
|
|
|
|
| |
Startup synced the .ai/ templates into the current project every session but never checked the language bundle (elisp, python) installed in .claude/. Bundle drift went unnoticed until someone re-ran make install-lang by hand: a generic rule added to claude-rules/ after the last install, or a changed validator hook.
scripts/sync-language-bundle.sh closes that gap. It fingerprints which bundle a project has by the presence of the language's own rule files (elisp.md, python-testing.md), then reconciles against the canonical source: auto-fix for rulesets-owned files (.claude/rules/*.md, .claude/hooks/*, githooks/*), surface-only for settings.json, which a project may have customized. CLAUDE.md is left alone. It's seed-only in install-lang and project-owned afterward, the same reason diff-lang skips it.
Startup Phase A step 12 calls it for the current project, guarded so older checkouts that lack the script still boot. It writes only under .claude/ and githooks/, disjoint from the .ai/ rsync paths, so the parallel batch stays safe. A script rather than a make target keeps the Makefile-parse layer off the boot path. The absolute rulesets path it depends on is the same one the rsyncs already carry.
Tested: 11 bats cases (no-bundle, clean, drifted rule/hook auto-fixed, surfaced settings.json asserted unmodified, absent CLAUDE.md not flagged, python detection, $PWD default, bad path). A smoke run against a copy of a real elisp project's .claude/ caught a perpetual "CLAUDE.md missing" alarm, which is what drove dropping CLAUDE.md from the surface set.
|
| |
|
|
| |
The lint sweep re-flagged the line-2070 misplaced-heading false positive (a ** inside verbatim markers in a DONE body), and the staleness check counted 12 top-level tasks unreviewed past 30 days.
|
| |
|
|
|
|
| |
I added a per-task effort check to the task-review walk. If a task looks like 30 minutes or less and isn't already tagged, mark it :quick: on the heading. When the heading and body don't make the effort clear, ask instead of guessing. A mislabeled :quick: defeats the point, since the tag exists so Craig can grab a genuinely small task in a spare moment.
I edited the canonical claude-templates copy and the synced project mirror together, so the next startup rsync won't revert them.
|
| |
|
|
| |
Archive the DONE task-review implementation and the cancelled OV-1 skill from Open Work to Resolved. The follow-ups file picks up one lint judgment and the review-habit staleness line for the next daily-prep.
|
| |
|
|
| |
The habit is built and smoke-tested — the staleness script with count and --list modes, the wrap-up health check, the task-review workflow, and the startup nudge all shipped, and the first review cycle ran clean against the live list. The elisp component was dropped under Shape B. The daily habit carries on through the startup nudge and the wrap-up watchdog.
|
| |
|
|
| |
This is the first task-review cycle. I re-graded create-documentation, the 2026-05-04 audit review pass, and /update-skills from [#A] to [#C]; bumped the wrap-it-up GitHub-remote chore to [#A] and tagged it :quick:; and cancelled the OV-1 DoDAF skill. The kept and re-graded tasks get a :LAST_REVIEWED: stamp so the staleness watchdog and the rotation know they've been looked at.
|
| |
|
|
|
|
| |
The spec recommended an Emacs keystroke mode (task-review.el). Implementation went the other way — a pure Claude workflow, no elisp — because the interactive mode would couple a rulesets-owned file to archsetup's init.el, and the daily Claude touchpoint already exists in daily-prep. I added a Revision section at the top recording the change: pure workflow, rulesets-owned, the task-review.org / open-tasks.org name swap, the staleness --list selection, and the startup nudge promoted to template-level. The elisp architecture and ERT sections stay as a record of the abandoned approach, flagged superseded.
The todo task moves to DOING with per-component status: everything but the smoke test is done, and component 3 (the elisp) is dropped.
|
| |
|
|
|
|
|
|
|
|
| |
The new task-review.org workflow is the daily habit that retires the old date-coverage scan. It surfaces the oldest-unreviewed top-level tasks, walks them one at a time, and records each outcome — keep, re-grade, kill, mark DOING, or edit — stamping :LAST_REVIEWED: as it goes. It's a pure Claude workflow, no elisp. open-tasks.org displays the list; this one changes it.
task-review-staleness.sh gains a --list mode that emits the N oldest-unreviewed tasks (line, review date, heading), oldest first, so the workflow walks a deterministic batch instead of eyeballing todo.org. Never-reviewed and unparseable-date tasks sort oldest. Seven new bats cases cover ordering, the count limit, exclusions, and output format; count mode is unchanged.
startup.org gains the matching nudge. Phase A counts tasks unreviewed for >7 days and Phase C surfaces one line when that count is non-zero, pointing at the workflow. It lives in the template startup.org rather than the project-only startup-extras layer, so every project picks it up the same way it picks up the wrap-up health check.
The INDEX entry is added with the "task review" triggers the rename freed up.
|
| |
|
|
|
|
|
|
| |
The list-and-pick-next workflow was named task-review.org, but "task review" better describes a list-hygiene habit that re-grades and prunes tasks, not one that just displays them. I'm freeing the task-review.org name (and the "task review" trigger) for that habit, which lands next.
This workflow goes back to open-tasks.org — the name it carried before it merged with whats-next.org. Its content and INDEX entry drop the "task review" trigger and point at task-review.org for the hygiene habit. Behavior is unchanged; only the name and the routing phrases move.
The rename touches both the canonical workflow and the project mirror.
|
| |
|
|
|
|
|
|
| |
The date-coverage scan flagged every open [#A]/[#B] task with no DEADLINE or SCHEDULED, on the assumption that high-priority work needs a date. That assumption is wrong (research and watch-list tasks are legitimately dateless), so the scan generated dismiss-or-paper-over cleanup at every wrap-up.
The replacement watches the daily task-review habit instead. It calls task-review-staleness.sh todo.org 30 and, when the count is non-zero, writes one summary line to the follow-ups file: N top-level tasks unreviewed for >30 days. There's no per-task dump, because the per-task walk is the review habit's job. Staleness, not datelessness, is the signal worth surfacing.
The change lands in both the canonical workflow and the project mirror.
|
| | |
|
| |
|
|
|
|
|
|
| |
First component of the daily task-review habit from docs/design/task-review.org. The staleness count is the shared primitive both the wrap-up health check (threshold 30) and the startup reminder (threshold 7) call, so it lives in one tested script rather than being reimplemented in each workflow.
The script counts top-level todo.org tasks whose review has gone stale: depth-2 headings with a TODO/DOING/VERIFY keyword and an [#A]/[#B]/[#C] cookie, where LAST_REVIEWED is missing, unparseable, or older than the threshold. Age uses a strict greater-than, so a task reviewed exactly N days ago is still fresh. Today normalizes to local midnight before the diff, and the day count rounds to the nearest day, so a DST hour can't push a boundary task across the line.
Twelve bats cases cover the normal, boundary, and error categories. Dates are generated relative to the current date rather than hardcoded. The script path resolves as the sibling-of-parent of the test file, so the suite runs identically from the canonical claude-templates tree and the rsync'd project mirror. Makefile test target now globs .ai/scripts/tests for bats alongside scripts/tests.
|
| | |
|
| |
|
|
| |
The startup rsync propagated the Working-Files Convention section from canonical claude-templates/.ai/protocols.org into the in-repo mirror. Mechanical catch-up, no content authored here.
|
| |
|
|
|
|
| |
review-code was a command with disable-model-invocation set, so the model could never reach for it on its own. Every time a review fit the moment, the agent had to hand back to me to type the slash command. Moving it to a skill makes it model-invocable while it stays slash-invocable as /review-code.
git mv keeps the file history (99% rename). The frontmatter drops disable-model-invocation and gains name: review-code; the body is unchanged. It also drops the discovery-check paragraph in commits.md, which only existed to find the command on disk when it was missing from the skills list, moot now that the skill shows up there.
|
| |
|
|
| |
Records this session's process-rule additions (Discovery check in commits.md Step 1; mechanical primary trigger for session-context writes), ai launcher polish (per-project opening line + explicit end-of-session window placement), and the new triggers.md for cross-project launch phrases. Lint-followups carries the recurring misplaced-heading judgment at line 2143 (false positive: `**` inside `=...=` verbatim, leave alone) plus a date-coverage list, both deferred per the task-review spec.
|
| |
|
|
|
|
| |
The bug was real on 2026-05-15 but is moot in current state. inbox-send.py's discovery scans ~/code/* and ~/projects/* single-level, so claude-templates/ (two levels under ~/code/) is never routable as a target. The 2026-05-15 incident was a one-time manual workaround because rulesets/inbox/ didn't exist yet; 470085f added that root inbox and the same session removed claude-templates/inbox/. Nothing routes to the subtree inbox now, and the subtree inbox doesn't exist.
Phase A scanning only ./inbox/ is correct given current state. No code change needed; this is just a depth-based completion flip per the todo-format convention.
|
| |
|
|
|
|
|
|
| |
Adds claude-rules/triggers.md as the home for phrases the user says from any cwd to invoke a cross-project action. First entry: "launch project X" → run the ai script in single-project mode targeting the matched basename. Ambiguity handling: list candidates and ask rather than guess.
The trigger phrases already in protocols.org ("Let's run the [X] workflow", "Wrap it up") are project-scoped. They assume an active .ai/ session. Cross-project launchers don't fit that layer; they need to work from any cwd, including outside any project. None of the existing claude-rules files (commits.md, testing.md, verification.md, subagents.md, interaction.md, cross-project.md, todo-format.md) had a clean fit either. A new file is the smallest architectural change.
Makefile picks up new claude-rules/*.md files via wildcard, so no Makefile change needed. make install created the symlink at ~/.claude/rules/triggers.md; make doctor reports 39/0/0.
|
| |
|
|
|
|
| |
tmux's default new-window placement chooses the lowest free index, which can land between existing windows when the session's window indexes have gaps. The -a flag plus the :{end} target makes it explicit: insert after the existing last window, every time.
The change only touches create_window. The two tmux new-session sites (single_mode and multi_mode first window) create fresh sessions; the first window's position is whatever base-index dictates, and there's nothing to append after.
|
| |
|
|
|
|
|
|
|
|
| |
The ai launcher used to send claude a fixed instruction string ('Read .ai/protocols.org and follow all instructions.'). Every window opened the same way, with no signal in the conversation about which machine or project it belonged to. That's fine when there's one window, less fine when an ai-session has 5+ windows across two machines.
The new build_instructions helper formats the opener per project: "This is <host> <name> project. Follow all instructions in .ai/protocols.org." Host comes from uname -n (POSIX, no dependency on the hostname binary which isn't installed by default on Arch). The project name distinguishes windows; the .ai/ prefix on the path stays so claude resolves the file reliably on first read.
The three send-keys call sites all use the helper now (create_window, single_mode new-session, multi_mode first-window). Smoke-tested with rulesets, .emacs.d, jr-estate basenames.
Also fixes the stale Source/Install header comments. They still pointed at ~/projects/claude-templates/bin/ai, the pre-fold path the subtree merge replaced.
|
| |
|
|
|
|
| |
I added claude-rules/working-files.md as the canonical convention. In-progress task artifacts live in working/<task-slug>/ under the project root. On task completion the files get renamed individually with a YYYY-MM-DD-<slug>-<descriptor> shape and moved flat into assets/ (or the area-specific assets/). Never rename the directory as a substitute for filing, since that loses the flat-searchable property of assets/.
claude-templates/.ai/protocols.org gained a short reference to the rule so it propagates to every project's .ai/protocols.org on the next startup rsync.
|
| |
|
|
|
|
|
|
| |
The "If this session crashed right now..." heuristic was the trigger. It pushes the decision onto the agent every turn, and the agent's bias is to defer when no obvious milestone just happened. Four recent sessions in the work project showed the same drift. The pattern: substantive work, no mid-session writes, a wrap-time reconstruction afterward.
The new primary trigger is mechanical. A turn that called any state-modifying tool (Edit, Write, Agent dispatch, MCP write, or Bash that mutates state) writes to the Session Log before the closing user-facing message. Pure-read turns (Read, Glob, Grep, read-only Bash) don't trigger. The existing high-loss bullets stay as elaboration. They aren't the trigger.
The 5-turn safety net remains. The judgment heuristic is gone. The primary trigger replaces it.
|
| | |
|
| |
|
|
|
|
|
|
| |
Step 1 told the agent to run /review-code but didn't say what to do when the skill exists on disk yet isn't in the session's available-skills list. The list covers plugin-installed skills only. User commands under ~/.claude/commands/ are routable as slash-commands but don't appear in it, so the agent could declare /review-code unavailable and fall through to the trivial-one-liner exception in Step 2.
The new Discovery check tells the agent to verify both ~/.claude/commands/review-code.md and ./.claude/commands/review-code.md on disk before declaring the skill unavailable, and surface the mismatch rather than auto-skipping.
Also drops three absorbed or stale inbox files: the skill-discovery handoff (signal absorbed by this edit), the missing-inbox-dir handoff (already resolved by 470085f), and a stale date-coverage scan output (deferred until the task-review habit lands).
|
| | |
|
| | |
|
| |
|
|
|
|
| |
The `[ ] && echo` shortcut propagates the test's exit status out of the
while loop, which can muddy the pipeline's overall exit. The `if` form
keeps the loop body's status decoupled from the filter check.
|
| |
|
|
| |
Records today's task-review spec + google-keep MCP install work. Lint-followups carries one misplaced-heading judgment (line 2153, false positive — `**` inside `=...=` verbatim) and a 9-entry date-coverage list, both deferred per the task-review spec's plan to retire the date-coverage scan once implemented.
|