Commit message | Author | Age | Files | Lines
...
* chore(todo): close playwright networkidle + emoji audit items | Craig Jennings | 2026-05-22 | 1 | -13/+6
* refactor(skills): locator-first playwright guidance, drop emoji markers | Craig Jennings | 2026-05-22 | 8 | -76/+119

    Two cleanups to the playwright skills, landed together since they overlap the same files.

    The skills taught networkidle as the readiness check and leaned on raw page.click/fill/waitForSelector. Playwright discourages networkidle for readiness, so the guidance in both SKILL.md files now waits for a visible app landmark via a web assertion or locator, the login and form examples use getByLabel/getByRole plus expect, the API reference leads with that pattern, and lib/helpers.js defaults waitForPageReady to load (preferring a caller-supplied landmark) and races the success indicator in authenticate instead of waiting on networkidle.

    The second cleanup strips emoji console markers across run.js, helpers.js, both SKILL.md files, and the py examples, replacing each with a plain ASCII tag like [ok], [error], or [scan].

    node --check and py_compile pass, and an emoji grep comes back clean.

* chore(todo): close hooks code audit items | Craig Jennings | 2026-05-22 | 1 | -13/+6
* feat(hooks): scan file-backed messages and harden rm parsing | Craig Jennings | 2026-05-22 | 10 | -13/+550

    Two audit gaps in the confirmation hooks, plus the test harness they were missing.

    The git-commit and gh-pr-create hooks scanned for AI attribution but only saw inline messages. A commit made with -F/--file or a PR made with --body-file slipped through, since the hook stored a placeholder instead of the file's text, and the publish flow uses -F constantly. A new read_referenced_file helper in _common.py reads the referenced local file (missing, oversized, or non-UTF-8 returns None, which means "couldn't inspect" and never "clean"), so attribution scanning now sees the real committed and posted text. An unreadable file falls through to the existing ask-anyway path.

    destructive-bash-confirm.py parsed rm flags by splitting on whitespace, which mangled quoted paths and missed flag variants. detect_rm_rf now tokenizes with shlex, so quoted or spaced paths and combined, separate, or reordered flags all parse. It fails toward asking (a sentinel that still fires the modal) on unbalanced quotes, or when a forced recursive rm sits alongside a pipeline, compound command, substitution, or redirect, since target attribution isn't trustworthy there. The supported and unsupported shell constructs are documented in the docstrings.

    These hooks had no tests and weren't in make test. Added a pytest harness under hooks/tests (an importlib-by-path loader, since the hook filenames are hyphenated) with 54 tests across the three hooks and the shared helper, and wired hooks/tests into make test. Full suite green.

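    For readers who want the shape of the two hook changes above, here is a minimal sketch, not the actual hook code. The names read_referenced_file and detect_rm_rf come from the commit message; the signatures, the MAX_BYTES cap, the ASK sentinel, the return shapes, and the list of risky shell markers are all assumptions made for illustration.

```python
import shlex
from pathlib import Path
from typing import Optional, Union

MAX_BYTES = 1_000_000  # assumed size cap; the real limit is not stated in the log
ASK = "ASK"            # assumed sentinel meaning "can't attribute targets, still ask"

# Assumed markers for pipelines, compound commands, substitutions, and redirects.
_RISKY = ("|", ";", "&", "$(", "`", ">", "<")


def read_referenced_file(path: str, max_bytes: int = MAX_BYTES) -> Optional[str]:
    """Return the file's text, or None when it could not be inspected.

    None means "couldn't inspect", never "clean".
    """
    p = Path(path)
    try:
        if not p.is_file() or p.stat().st_size > max_bytes:
            return None
        return p.read_bytes().decode("utf-8")
    except (OSError, UnicodeDecodeError):
        return None


def detect_rm_rf(command: str) -> Union[None, str, list]:
    """Return None (no forced recursive rm), a list of targets, or ASK."""
    try:
        tokens = shlex.split(command)  # respects quotes, unlike str.split
    except ValueError:                 # unbalanced quotes: fail toward asking
        return ASK
    if "rm" not in tokens:
        return None

    recursive = force = options_done = False
    targets = []
    for tok in tokens[tokens.index("rm") + 1:]:
        if options_done or not tok.startswith("-") or tok == "-":
            targets.append(tok)
        elif tok == "--":
            options_done = True
        elif tok.startswith("--"):
            recursive |= tok == "--recursive"
            force |= tok == "--force"
        else:  # short flags, possibly combined (-rf, -fr, -Rf)
            recursive |= "r" in tok or "R" in tok
            force |= "f" in tok

    if not (recursive and force):
        return None
    if any(mark in command for mark in _RISKY):
        return ASK  # targets can't be trusted next to other shell constructs
    return targets
```

    Under these assumptions, a caller treats both a target list and ASK as "show the confirmation", and treats a None from read_referenced_file as a reason to ask rather than as a clean scan.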
* chore(todo): close hooks README audit item | Craig Jennings | 2026-05-22 | 1 | -5/+3
* docs(hooks): sync README install snippets with the opt-in destructive hook | Craig Jennings | 2026-05-22 | 1 | -1/+9

    The README's manual-install and settings-JSON snippets omitted destructive-bash-confirm.py, while the canonical hooks/settings-snippet.json already wires all three. Brought the README in line: added the opt-in symlink step, added the settings entry, and reworded the note so all three read as no-op-safe with the destructive gate flagged as opt-in (make install-hooks excludes it by default).

* chore(todo): close verification/testing/subagents audit items | Craig Jennings | 2026-05-22 | 1 | -21/+12
* docs(rules): add pre-dispatch availability and cost checks | Craig Jennings | 2026-05-22 | 1 | -0/+27

    subagents.md assumed the Agent tool exists. A new "Pre-Dispatch Checks" section adds two gates before any spawn: Availability (no Agent capability means do the work in the main thread under the same scope and constraints the contract would enforce) and Cost (when writing the full contract costs more than the task, do it inline). Both cross-reference the existing "Don't Subagent At All" guidance rather than duplicating it.

* docs(rules): add escalation testing and a spike protocol to testing.md | Craig Jennings | 2026-05-22 | 1 | -1/+61

    Two additions. An "Escalation Beyond Category and Pairwise" section adds property-based testing (for invariants across a broad input domain) and mutation testing (for when high coverage hides thin assertions), both as escalation paths rather than always-on gates. And the "I need to spike first" excuse is formalized into a disciplined spike protocol: TDD stays the default, but a spike is sanctioned only when timeboxed, not committed, and followed by the first failing test before productionizing.

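    To make the property-based escalation in the commit above concrete, here is a minimal sketch using only the standard library. A real project would more likely reach for a dedicated library such as Hypothesis; the function under test (dedupe_keep_order), the check_property helper, and the two properties are hypothetical names invented for this illustration and do not come from testing.md.

```python
import random
from typing import Callable, List


def dedupe_keep_order(items: List[int]) -> List[int]:
    """Hypothetical function under test: drop repeats, keep first occurrences."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def check_property(prop: Callable[[List[int]], bool],
                   runs: int = 200, seed: int = 0) -> None:
    """Run `prop` on many random lists; raise with the first failing input."""
    rng = random.Random(seed)  # fixed seed keeps a failure reproducible
    for _ in range(runs):
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        if not prop(data):
            raise AssertionError(f"property failed for input {data!r}")


def same_members_no_repeats(data: List[int]) -> bool:
    """Invariant: the output has the input's members, each exactly once."""
    result = dedupe_keep_order(data)
    return set(result) == set(data) and len(result) == len(set(result))


def idempotent(data: List[int]) -> bool:
    """Invariant: deduplicating twice changes nothing further."""
    once = dedupe_keep_order(data)
    return dedupe_keep_order(once) == once


check_property(same_members_no_repeats)
check_property(idempotent)
```

    The point of the sketch is the shape: the assertions describe invariants over a generated input domain instead of enumerating hand-picked cases.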
* docs(rules): add an unable-to-verify reporting standard | Craig Jennings | 2026-05-22 | 1 | -0/+13

    verification.md required running tests/lint/typecheck/build before claiming done, but said nothing about what to do when a command can't run. A new "When You Cannot Verify" section requires a four-part report (command attempted, why it couldn't run, risk left unverified, and the smallest next command for the user) and states the principle that a check that didn't run is never reported as a pass.

* chore(todo): close language-ruleset audit items | Craig Jennings | 2026-05-22 | 1 | -24/+12
* docs(languages): tighten elisp coding and testing rules | Craig Jennings | 2026-05-22 | 2 | -1/+45

    Two audit fixes. elisp.md's "prefer Write over Edits" advice was tool-specific. It's now framed around intent: edit cohesively, then run paren and byte-compile checks immediately, whatever the editing mechanism. elisp-testing.md gains batch-mode reproducibility (emacs --batch as source of truth, no interactive state, no blocking prompts), state isolation (temp user-emacs-directory, explicit load-path, declared deps only), and byte-compile/native-comp warning handling, with native-comp gated on availability and kept opt-in.

* docs(languages): revise python-testing SQLite and ORM-mocking guidance | Craig Jennings | 2026-05-22 | 1 | -2/+13

    Two audit fixes. The "prefer in-memory SQLite" advice is risky when prod is Postgres or MySQL: SQLite diverges on query semantics, constraints, transactions, JSON, time zones, and indexes, so a test can pass on SQLite and fail in prod. ORM/query tests now use a production-like DB, with SQLite reserved for pure unit tests. The "never mock the ORM" rule is also clarified: don't mock ORM internals, but a thin orchestration unit can inject a fake at a deliberate data-access port it owns.

* chore(todo): close architecture audit items | Craig Jennings | 2026-05-22 | 1 | -45/+24
* docs(commands): make arch-evaluate findings honest about certainty | Craig Jennings | 2026-05-22 | 1 | -3/+30

    Two audit fixes. Framework-agnostic findings (Claude reading import graphs) now carry a confidence level (High/Medium/Low) and how it was determined, with a required "not fully checked because" note when scale or dynamic imports cap certainty, so a partial read isn't presented as exhaustive. Unconfigured language tools are no longer skipped silently: each detected language whose tool didn't run gets an Info finding, so the audit shows what was and wasn't verified.

* docs(commands): add Q42 scenarios and staleness metadata to arch-document | Craig Jennings | 2026-05-22 | 1 | -5/+52

    Two audit fixes. Section 10's thin quality-scenario template becomes the arc42/Q42 six-part form (source, stimulus, environment, artifact, response, response measure), making each scenario testable. Generated docs now carry staleness and ownership metadata: a per-section header (owner, generated-against commit and date, review cadence, stale-when conditions) and a whole-document Doc Status table, so a reader can tell whether a section still matches the code.

* docs(commands): strengthen arch-design security inputs and paradigm model | Craig Jennings | 2026-05-22 | 1 | -16/+78

    Two audit fixes. A new Phase 4 (Trust, Data, and Compliance) surfaces trust boundaries, data classification, abuse cases, privacy, compliance evidence, and ownership before the paradigm shortlist, so the architecture is drawn around them rather than retrofitted by a later security-check. Phase 5 now splits the choice in two: pick one paradigm (monolith, microservices, event-driven, and so on), then compose tactical patterns onto it (DDD, hexagonal, CQRS, event sourcing), with composition examples and an anti-pattern against treating a pattern as an alternative to a paradigm.

* docs(commands): make arch-decide examples timeless, standardize ADR statuses | Craig Jennings | 2026-05-22 | 1 | -4/+9

    Two audit fixes. Sample ADRs asserted technical claims as timeless fact (MongoDB transactions). The example is now dated and sourced, and a "cite, don't assert" rule requires a link, version, or checked-date for any concrete technical claim. The status vocabulary was mixed (Accepted, Decided, Superseded, Not Accepted). It's now a canonical five (Proposed, Accepted, Rejected, Deprecated, Superseded) with an explicit immutability rule: an accepted ADR's body is frozen, and a changed decision gets a new superseding ADR while the old one stays as the record.

* chore(todo): close branch-workflow and c4 audit items | Craig Jennings | 2026-05-22 | 1 | -36/+18
* docs(commands): make c4 skills notation-independent and level-consistent | Craig Jennings | 2026-05-22 | 2 | -10/+70

    Two audit fixes to c4-analyze and c4-diagram. Both treated draw.io as the only output. They now offer draw.io, Structurizr DSL, Mermaid (native C4 types), and PlantUML, with a headless fallback that emits a text notation instead of failing when drawio or a GUI is absent. Both also gained an abstraction-boundary section (a Container is a deployable unit, not a Docker container, and a Component isn't separately deployable) and a check that every element and relationship stays at the diagram's single C4 level.

* docs(commands): add tool-availability and ceremony scale to start-work | Craig Jennings | 2026-05-22 | 1 | -5/+41

    Two audit fixes. A "Tool availability" section degrades gracefully when Linear MCP, gh, /voice, or Playwright are missing: do what the available tools allow, surface what couldn't run, don't fail the phase. A "Ceremony scale" section sizes the process to the work, so a trivial fix skips the ticket, branch, and gates unless asked. Separately, the claim now splits by tracker: personal todo claims wait until after the Justify gate (no rollback needed if killed), while Linear and GitHub claims happen first to signal intent but record prior state so a killed task restores cleanly.

* docs(commands): fix finish-branch base detection and merge safety | Craig Jennings | 2026-05-22 | 1 | -10/+68

    Two audit fixes. Base-branch detection returned a merge-base SHA where a branch name was needed. Phase 2 now resolves the branch name (open PR base, then origin/HEAD, then ask) and computes the merge-base SHA separately. Option 1's merge gained pre-flight safety: a dirty-tree refusal with no auto-stash, protected-branch awareness, an upstream-gated ff-only pull, and merge-commit-vs-rebase as a team-policy choice. Worktree detection moved from grepping branch names to a git-dir vs git-common-dir comparison.

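    The git-dir vs git-common-dir comparison mentioned above can be sketched as follows. This is an illustration of the idea, not the finish-branch implementation: git rev-parse --git-dir and --git-common-dir are real git options, but the Python function names and the path handling here are assumptions. In the main working tree the two paths resolve to the same directory; in a linked worktree the git dir sits under the common dir's worktrees/ folder, so they differ.

```python
import subprocess
from pathlib import Path


def _rev_parse(flag: str, cwd: str) -> Path:
    """Run `git rev-parse <flag>` in cwd and return the result as an absolute path."""
    out = subprocess.run(
        ["git", "rev-parse", flag],
        cwd=cwd, check=True, capture_output=True, text=True,
    ).stdout.strip()
    # git may print a path relative to cwd; joining leaves absolute paths unchanged.
    return (Path(cwd) / out).resolve()


def is_linked_worktree(git_dir: Path, common_dir: Path) -> bool:
    """True when the per-worktree git dir differs from the shared git dir."""
    return git_dir.resolve() != common_dir.resolve()


def in_linked_worktree(cwd: str = ".") -> bool:
    """Ask git for both directories and compare them (hypothetical helper)."""
    return is_linked_worktree(
        _rev_parse("--git-dir", cwd),
        _rev_parse("--git-common-dir", cwd),
    )
```

    Compared with grepping branch names, this check does not depend on any naming convention: it only asks git where the repository metadata lives.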
* chore(todo): close v2mom, prompt-engineering, and codify audit items | Craig Jennings | 2026-05-22 | 1 | -36/+18
* docs(commands): add stale-entry and privacy pre-write checks to codify | Craig Jennings | 2026-05-22 | 1 | -0/+5

    codify now runs two mandatory checks before writing a CLAUDE.md entry: a stale-entry scan (update or remove a no-longer-true entry in place rather than appending a contradiction around it) and a privacy check asking "safe if the project were public?" and "belongs in private memory instead?", routing private content to auto-memory. These are gates, not background guidance.

* docs(commands): fix prompt-engineering citation and add an eval harness | Craig Jennings | 2026-05-22 | 1 | -3/+5

    Two audit fixes. The Meincke citation had the wrong title and was used to imply persuasion framing improves prompt quality. It now reads as the safety caution it is: applying the principles raised an LLM's compliance with objectionable requests from ~33% to ~72%, a reason for care, not a recipe. The correct title ("Call Me A Jerk...") and SSRN id are fixed in all three spots. Critique mode also gains an eval-harness step: for fragile or production prompts, run 3-5 adversarial examples against the old and new prompt and record the delta, so quality is verified rather than asserted.

* docs(commands): align create-v2mom with Salesforce V2MOM conventions | Craig Jennings | 2026-05-22 | 1 | -38/+49

    Three audit fixes. Renamed Metrics to Measures throughout to match Salesforce's term (the "vanity metrics" idiom stays, since that's the anti-pattern name). Phase 8 task migration no longer transplants the task tree into the V2MOM. Tasks stay in the backlog grouped by method, and each method links to where they live, keeping strategy and execution as separate sources of truth. Obstacles now carry a mitigation, owner, and review cadence, so the section is operational rather than just candid.

* chore(todo): close frontend, security, and pairwise audit items | Craig Jennings | 2026-05-22 | 1 | -44/+18
* docs(skills): add t-way escalation and honest generator path to pairwise-tests | Craig Jennings | 2026-05-22 | 1 | -0/+15

    Two audit fixes. A t-way escalation section says to start pairwise, then raise specific high-risk clusters to 3-way or higher when history, safety, security, or coupling warrants it, with the sub-model order syntax shown. A second note explains PICT's ~ marker (a negative or invalid-value tag, not an operator) and adds an honest stop-at-the-model rule: if no PICT generator is installed, produce the model and stop rather than hand-writing rows and calling them generated.

* docs(commands): update security-check to OWASP 2021 + scanner tooling | Craig Jennings | 2026-05-22 | 1 | -9/+22

    Two audit fixes to the OWASP review. It now maps each finding to an OWASP Top 10 2021 category or a WSTG area, adding the four that were missing (Insecure Design, Software and Data Integrity Failures, Security Logging and Monitoring Failures, SSRF) with explicit checks for object and function-level authorization, SSRF URL fetches, update and dependency integrity, and logging gaps. A new optional-scanners step adds gitleaks/trufflehog, semgrep, OSV, and lockfile-diff review, with a network caveat: a scan that can't run reports "not run", never a silent pass.

* docs(skills): gate accessibility and bound aesthetics in frontend-design | Craig Jennings | 2026-05-22 | 2 | -5/+42

    Two audit fixes. Accessibility moves from an optional reference for interactive components to a required WCAG 2.2 gate before handoff, covering keyboard, focus visibility and not-obscured, target size, contrast, reduced motion, labels, and semantics, for all frontend work. references/accessibility.md gained the backing detail and moved from 2.1 to 2.2. A new "creative but bounded" section keeps the maximalist directions as tools within guardrails: domain fit, readability, responsive stability, and no decoration that degrades the workflow.

* chore(todo): close add-tests audit items (1 fixed, 1 moot) | Craig Jennings | 2026-05-22 | 2 | -12/+75
* docs(skills): add a category-exception protocol to add-tests | Craig Jennings | 2026-05-22 | 1 | -1/+2

    The Normal/Boundary/Error rule forced all three categories on every function, including pure adapters, generated code, and framework glue where a category would only re-test the framework. Step 7 now lets you skip a category in those cases, as long as you state and justify the skip in the plan and cover the behavior at integration or E2E level: a stated decision, never a silent omission. Step 12 points back to it.

    The companion audit item about a missing typescript-testing.md reference turned out moot: that ruleset now exists and add-tests already references it correctly.

* chore(todo): close debugging-skills audit items | Craig Jennings | 2026-05-22 | 1 | -17/+9
* docs(skills): tighten debug, root-cause-trace, and five-whys | Craig Jennings | 2026-05-22 | 3 | -7/+16

    Three audit-pass fixes across the debugging skills.

    debug now captures environment and recent-change context (versions, flags, dataset, seed/clock, concurrency, recent commits) as a Phase-1 step. Many intermittent bugs live in state or environment, not a local code path, and "what changed recently" is often the fastest route to the cause.

    root-cause-trace's defense-in-depth said to add a check at every layer that could have caught the bad value, which breeds validation spam. It now adds checks only at boundary-owning layers (ingress, persistence, the invariant owner, final render), and says a pass-through function that owns neither a boundary nor an invariant shouldn't get a duplicate null check.

    five-whys now makes each link carry an evidence field and a counterfactual: if you remove this cause, does the symptom above still happen? That's the guard against a tidy chain that reads well but wouldn't have prevented the failure.

* chore(todo): close playwright headed/headless item | Craig Jennings | 2026-05-22 | 1 | -6/+3
* docs(skills): document headed vs headless choice in playwright skills | Craig Jennings | 2026-05-22 | 2 | -2/+13

    playwright-js defaulted headed, playwright-py headless, with no explanation, so an agent could flip modes by habit. I added a matching purpose-based decision table to each (headed for interactive debugging, headless for CI/pytest), and made each skill name its own default and point at the other. I also softened the absolutist "always headless" comment in the py example.

* chore(todo): close PR-response audit items (2 fixed, 2 moot) | Craig Jennings | 2026-05-22 | 1 | -24/+12
* docs(commands): tighten respond-to-review PR-feedback workflow | Craig Jennings | 2026-05-22 | 2 | -5/+7

    Two audit-pass fixes to respond-to-review, plus a stale-reference tidy in respond-to-cj-comments.

    The Gather step fetched a flat comment list, which misses thread resolution and re-processes feedback that's already settled. It now pulls unresolved review threads via gh api graphql (skipping resolved ones), keeps REST only for top-level conversation comments, and resolves a thread only after the fix is verified.

    The commit guidance suggested "fix: Address review — [description]", which puts the review process into git log against commits.md and used a non-ASCII dash. It now names the actual fix and leaves the how-it-surfaced detail in the PR thread.

    respond-to-cj-comments lost a stale "humanizer" mention, since that skill is voice now. Two adjacent audit items came out moot: it no longer embeds an absolute path, and its humanizer/emacsclient fallback was overtaken by the /voice migration and the in-place VERIFY pattern.

* chore(todo): close review-code CI-trust and CLAUDE.md-citation items | Craig Jennings | 2026-05-22 | 1 | -15/+7
* docs(skills): scope review-code's CI-trust and CLAUDE.md-citation rules | Craig Jennings | 2026-05-22 | 1 | -1/+8

    Two clarifications to review-code where it appeared to contradict other rules.

    The "trust CI, don't run builds" rule read as a blanket license to skip verification. I scoped it to reviewing a diff, not shipping one. A pre-commit or pre-push flow still owes the local verification that verification.md requires. Reading a PR doesn't duplicate CI. Producing one doesn't get to skip it.

    The CLAUDE.md-adherence audit could put a CLAUDE.md citation into a team-visible PR comment, which commits.md says not to do. I added two modes. A private review cites CLAUDE.md directly. A public review translates the rule into the engineering reason and doesn't name the file, since a teammate can act on the reason but not on a file they can't reach.

* docs(commits): decouple voice patterns from the approval gate | Craig Jennings | 2026-05-22 | 1 | -5/+9

    The .ai/-tracking check used to decide two things at once: which voice patterns ran and whether the approval gate fired. In a team repo that meant losing the 8 personal patterns and the gate together. I split them. Publish artifacts (commit messages, PR titles and bodies, PR review comments) always run /voice personal now, because they go out under my name regardless of the repo. The .ai/ check decides only the gate: applied in my personal repos, skipped for velocity in shared ones. The gap this closes: a team-repo PR comment used to skip pattern #39, the public-artifact scope flag, which is exactly the check that matters most when teammates can read it.

* chore(todo): close GH-assumption and review-code strengths tasks | Craig Jennings | 2026-05-22 | 1 | -5/+5
* docs(skills): keep review-code praise honest and unforced | Craig Jennings | 2026-05-22 | 1 | -3/+5

    Two related changes to review-code's strengths guidance. The mandatory "three minimum" could force filler on a tiny diff or padded praise on a weak PR, so I relaxed it to up to three specific strengths, with an honest "nothing notable" allowed when the diff doesn't earn them. I also reframed the old "No Strengths section" anti-pattern as "skipping strengths out of laziness": a substantive diff still demands them, a weak one doesn't. The other change tells reviewers to name the good thing and stop, without explaining why it's good. Explaining praise reads as sycophantic since the author already knows the rationale. Elaboration is for findings, not compliments.

* docs(workflows): document GitHub-family assumption in wrap-it-up Step 3.5 | Craig Jennings | 2026-05-22 | 2 | -2/+10

    Step 3.5's Linear ticket-state sweep uses gh to find the merged PR for a Dev-Review ticket, which assumes a GitHub-family host. That holds today because DeepSat, the only Linear-using project, lives on GHE where gh talks to the API. I added a note flagging the assumption rather than rewriting the step to be provider-agnostic. A future Linear project on GitLab, Gitea, or Bitbucket would need a different PR lookup, but none exists yet, so documenting the boundary beats building for a host we don't have.

* feat(startup): sync language bundles per project on session launch | Craig Jennings | 2026-05-22 | 4 | -0/+299

    Startup synced the .ai/ templates into the current project every session but never checked the language bundle (elisp, python) installed in .claude/. Bundle drift went unnoticed until someone re-ran make install-lang by hand: a generic rule added to claude-rules/ after the last install, or a changed validator hook.

    scripts/sync-language-bundle.sh closes that gap. It fingerprints which bundle a project has by the presence of the language's own rule files (elisp.md, python-testing.md), then reconciles against the canonical source: auto-fix for rulesets-owned files (.claude/rules/*.md, .claude/hooks/*, githooks/*), surface-only for settings.json, which a project may have customized. CLAUDE.md is left alone. It's seed-only in install-lang and project-owned afterward, the same reason diff-lang skips it.

    Startup Phase A step 12 calls it for the current project, guarded so older checkouts that lack the script still boot. It writes only under .claude/ and githooks/, disjoint from the .ai/ rsync paths, so the parallel batch stays safe. A script rather than a make target keeps the Makefile-parse layer off the boot path. The absolute rulesets path it depends on is the same one the rsyncs already carry.

    Tested: 11 bats cases (no-bundle, clean, drifted rule/hook auto-fixed, surfaced settings.json asserted unmodified, absent CLAUDE.md not flagged, python detection, $PWD default, bad path). A smoke run against a copy of a real elisp project's .claude/ caught a perpetual "CLAUDE.md missing" alarm, which is what drove dropping CLAUDE.md from the surface set.

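    The fingerprint-then-reconcile idea above can be sketched in a few lines. The real tool is a shell script (scripts/sync-language-bundle.sh) whose internals are not shown in this log, so the Python below is only an illustration: the marker file names and the .claude/rules location come from the commit message, while the function names, the MARKERS mapping, and the drift rule (missing or byte-different) are assumptions.

```python
import filecmp
from pathlib import Path
from typing import List, Optional

# Marker rule files named in the commit message; the mapping itself is assumed.
MARKERS = {"elisp": "elisp.md", "python": "python-testing.md"}


def detect_bundle(project: Path) -> Optional[str]:
    """Return the language whose marker rule file is installed, else None."""
    rules = project / ".claude" / "rules"
    for lang, marker in MARKERS.items():
        if (rules / marker).is_file():
            return lang
    return None


def drifted_rules(project: Path, canonical_rules: Path) -> List[str]:
    """List canonical rule files that are missing or differ in the project."""
    installed = project / ".claude" / "rules"
    drift = []
    for src in sorted(canonical_rules.glob("*.md")):
        dst = installed / src.name
        # shallow=False compares file contents, not just size and mtime.
        if not dst.is_file() or not filecmp.cmp(src, dst, shallow=False):
            drift.append(src.name)
    return drift
```

    In this sketch a caller would copy each drifted rule file from the canonical directory (the auto-fix case) and only print a notice for settings.json (the surface-only case), leaving CLAUDE.md untouched.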
* chore(inbox): queue wrap-up lint and task-review follow-ups | Craig Jennings | 2026-05-21 | 2 | -0/+57

    The lint sweep re-flagged the line-2070 misplaced-heading false positive (a ** inside verbatim markers in a DONE body), and the staleness check counted 12 top-level tasks unreviewed past 30 days.

* feat(workflows): tag <=30min tasks :quick: during task review | Craig Jennings | 2026-05-21 | 2 | -2/+16

    I added a per-task effort check to the task-review walk. If a task looks like 30 minutes or less and isn't already tagged, mark it :quick: on the heading. When the heading and body don't make the effort clear, ask instead of guessing. A mislabeled :quick: defeats the point, since the tag exists so Craig can grab a genuinely small task in a spare moment. I edited the canonical claude-templates copy and the synced project mirror together, so the next startup rsync won't revert them.

* chore(ai): archive session record and task-review todo cleanup | Craig Jennings | 2026-05-20 | 3 | -96/+261

    Archive the DONE task-review implementation and the cancelled OV-1 skill from Open Work to Resolved. The follow-ups file picks up one lint judgment and the review-habit staleness line for the next daily-prep.

* chore(todo): close the task-review implementation as done | Craig Jennings | 2026-05-20 | 1 | -2/+3

    The habit is built and smoke-tested: the staleness script with count and --list modes, the wrap-up health check, the task-review workflow, and the startup nudge all shipped, and the first review cycle ran clean against the live list. The elisp component was dropped under Shape B. The daily habit carries on through the startup nudge and the wrap-up watchdog.

* chore(todo): re-grade and prune tasks in a review pass | Craig Jennings | 2026-05-20 | 1 | -5/+26

    This is the first task-review cycle. I re-graded create-documentation, the 2026-05-04 audit review pass, and /update-skills from [#A] to [#C]; bumped the wrap-it-up GitHub-remote chore to [#A] and tagged it :quick:; and cancelled the OV-1 DoDAF skill. The kept and re-graded tasks get a :LAST_REVIEWED: stamp so the staleness watchdog and the rotation know they've been looked at.