diff options
| author | Craig Jennings <c@cjennings.net> | 2026-05-14 21:47:30 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-05-14 21:47:30 -0500 |
| commit | 77eaaf814ace3fdb456d9683898d5693d402b3df (patch) | |
| tree | 8cac42e3953fba0983f792f6f5b27ca638c524ff | |
| parent | 372fb766a91e00428abbcc521b26b546e38704c9 (diff) | |
| download | rulesets-77eaaf814ace3fdb456d9683898d5693d402b3df.tar.gz rulesets-77eaaf814ace3fdb456d9683898d5693d402b3df.zip | |
docs(todo): start memory-sync investigation, bump audit priorities
Memory-sync [#A] TODO flipped to DOING. Added a dated work-log
subheader recording what's on disk under ~/.claude/projects/ (plain
files, no symlinks, no git) and the proposed fix (stow
~/.claude/projects via archsetup/dotfiles/common/). Added a VERIFY
child awaiting approval on the stow approach before any moves.
Side effect of the wrap-up's --sync-child-priority pass: 43 [#B]
children of the [#A] "Review pass: tighten skills and rulesets after
2026-05-04 audit" parent bumped to [#A] to match the parent. Tag any
of them :no-sync: if they should stay at [#B].
Archived 2026-05-14-21-43-lint-org-build-and-memory-sync-investigation
to .ai/sessions/. The day shipped /lint-org as a four-commit feature
across claude-templates and rulesets, then this short follow-on pass
processed the line-11 src-block via /respond-to-cj-comments.
| -rw-r--r-- | .ai/sessions/2026-05-14-21-43-lint-org-build-and-memory-sync-investigation.org | 91 | ||||
| -rw-r--r-- | todo.org | 116 |
2 files changed, 160 insertions, 47 deletions
diff --git a/.ai/sessions/2026-05-14-21-43-lint-org-build-and-memory-sync-investigation.org b/.ai/sessions/2026-05-14-21-43-lint-org-build-and-memory-sync-investigation.org new file mode 100644 index 0000000..cee2868 --- /dev/null +++ b/.ai/sessions/2026-05-14-21-43-lint-org-build-and-memory-sync-investigation.org @@ -0,0 +1,91 @@ +#+TITLE: Session Context — /lint-org skill build +#+AUTHOR: Craig Jennings & Claude +#+DATE: 2026-05-14 + +* Summary + +** Active Goal + +Two arcs. First (main): build the =/lint-org= command end to end per the spec at =.ai/specs/lint-org-skill-spec.md= — TDD elisp script with mechanical fixers + ERT suite, a =.claude/commands/lint-org.md= orchestrator that walks judgments inline, wrap-up integration so each evening's wrap-it-up runs the mechanical pass on =todo.org=, ship across claude-templates + rulesets in 4 commits. Second (follow-on): run =/respond-to-cj-comments= against the one cj annotation on todo.org line 11 — turned out to be a clarifying note on the memory-sync TODO, investigated the actual storage layout, filed a VERIFY for the proposed stow-based fix. + +** Decisions + +- Build as a *command* (=.claude/commands/lint-org.md=) not a model-invocable skill. Convention post =aa69245= — user-invoked entry points are commands. +- Script emits structured stdout (one summary line + plist per issue) so the command layer parses cleanly without re-running org-lint. +- =--check= for preview, =--followups-file=PATH= for the wrap-up's deferred-judgment routing. The script handles the append itself, so wrap-up doesn't need bash glue. +- Followups path defaults to =~/projects/work/inbox/lint-followups.org= (where daily-prep merges in) with project-local =.ai/lint-followups.org= fallback. Override via =$LINT_ORG_FOLLOWUPS=. +- Mechanical fixers process in descending line order so additions/deletions don't perturb earlier line numbers. +- =misplaced-heading= is conditionally mechanical: =**X.**= at line start → =*X.*= is auto-fixed; =*** Foo= inside =verbatim= markup stays judgment. Classifier checks line N and N-1 since org-lint reports the blank line *after* the offender, not the offender itself. +- Backup before any write (=/tmp/<basename>.before-lint-pass.<timestamp>=). Skipped in =--check= mode. +- Canonical-source procedure followed: script + workflow edits in claude-templates first, then rsync to rulesets, then commit in both (saved memory =project_ai_scripts_canonical_source.md=). +- Lint-org walk on the live todo.org: picked option 5 (skip) for the line-11 =cj:= block warnings since the block is actually a personal-note convention, not source code — the right tool was =/respond-to-cj-comments=, not =/lint-org=. + +** Data Collected / Findings + +- org-lint result shape on emacs 30.2: =(id [marker trust msg checker])=. Marker is a propertized string of the line number (text property =org-lint-marker= holds the actual marker); checker is the =org-lint-checker= struct (=org-lint-checker-name= → symbol). =org-lint--get-line-number= doesn't exist in this version — was a wrong guess from training data. +- org-lint's =misplaced-heading= marker points at the *blank line after* the heading-like text, not the offending line. Both the markdown-bold case and the verbatim-asterisk case behave this way. The classifier has to look at LINE-1 first, then LINE. +- =timestamp-syntax= warning fires on bare YYYY-MM-DD timestamps without a day-name (=<2026-05-20>= → "Parsed as: <2026-05-20 Wed>"). Not in the mechanical category list; falls through to judgment. +- =todo.org= currently has 3 lint warnings, all on line 11: =empty-header-argument=, =wrong-header-argument=, =suspicious-language-in-src-block= — all from the same =#+begin_src cj: comment= block. Left in place by user choice (option 5 in the live walk); the cj block was handled separately via =/respond-to-cj-comments=. +- =~/.claude/projects/-home-cjennings-code-rulesets/memory/= contains four files (=MEMORY.md=, =feedback_never_guess.md=, =project_ai_scripts_canonical_source.md=, =reference_pdftools_venv.md=). Plain dir, no symlinks, no enclosing git checkout. Memory does *not* currently sync across machines. Proposed fix: stow =~/.claude/projects= via =archsetup/dotfiles/common/.claude/projects/=. Filed as a VERIFY for Craig's approval before any moves. +- Test totals after this session: 240 pytest + 22 lint-org ERT + 23 todo-cleanup ERT = 285 green. Byte-compile clean on the new elisp. + +** Files Modified + +claude-templates (canonical, both commits pushed to =origin/main=): +- =138f35f feat(lint-org): add script with mechanical fixers and ERT suite= — =.ai/scripts/lint-org.el= (365 lines) + =.ai/scripts/tests/test-lint-org.el= (465 lines). +- =4eba98c docs(wrap-it-up): run lint-org on todo.org at wrap-up= — =.ai/workflows/wrap-it-up.org= new =*** Lint org files= subsection + validation-checklist line. + +rulesets (both pushed to =origin/main=): +- =f5b8688 chore(ai): sync lint-org script and wrap-it-up from claude-templates= — byte-identical pull of the script, tests, and wrap-it-up section. +- =9f62a7c feat(lint-org): add /lint-org command + file design spec= — =.claude/commands/lint-org.md= (162 lines), =.ai/specs/lint-org-skill-spec.md= (moved from =inbox/=), =todo.org= new =[#A] Build /lint-org= entry. +- =todo.org= (uncommitted before wrap): cj-source-block on line 11 resolved via =/respond-to-cj-comments=. Parent =[#A]= memory-sync TODO flipped to =DOING=, dated work-log subheader added under it with the memory storage investigation findings, =VERIFY= child added asking for approval on the proposed stow-based sync. + +** Next Steps + +- Carryover: =/update-skills= skill, =create-documentation= skill (large, ~600 lines of research notes already drafted in todo.org), 2026-05-04 audit review pass (12 [#A] + 38 [#B] sub-items), memory-sync setup pending the =VERIFY= answer on the proposed stow approach, =[#B]= fold-claude-templates-into-rulesets, =[#B]= =make audit=. +- =/lint-org= goes live in the next wrap-up: =.ai/workflows/wrap-it-up.org= Step 3 runs the mechanical pass on =todo.org= and routes judgments to the follow-ups file (this session's wrap is the first one to exercise it). +- The 3 lint warnings on todo.org line 11 stay until Craig either fixes the =cj:= src-block by hand or registers =cj:= as a known org-babel language in Emacs init. Won't auto-fix. + +* Session Log + +** Startup + spec triage (13:05 CDT) + +Clean startup — no interrupted session, no cross-agent traffic, no reminders. Inbox had one file: =lint-org-skill-spec.md= dropped today at 12:54, a full spec for a =/lint-org= skill that wraps =org-lint=. Spec defines two modes (interactive, mechanical-only), four mechanical-fix categories (item-number, missing-language-in-src-block, misplaced-planning-info, markdown-bold), four judgment categories (link-to-local-file, invalid-fuzzy-link, verbatim-asterisk, suspicious-language), and a wrap-up integration that runs the mechanical pass on todo.org each night. + +Filed: moved spec to =.ai/specs/lint-org-skill-spec.md= and added =[#A] Build =/lint-org= skill + wrap-up integration= to todo.org (just before the =/update-skills= entry). Craig picked "build it now." + +** API probe + repo conventions + +Probed =org-lint= output format in =emacs --batch -Q=. Each report item is =(id [marker trust msg checker])= — the marker is a propertized string of the line number, the checker is an =org-lint-checker= struct (name is the field that identifies the category). Earlier attempt used =org-lint--get-line-number= which doesn't exist in this Emacs version (30.2); the line number is just the marker string content. + +Current todo.org state: 3 warnings, all on line 11, from a single =cj:= src block (empty header argument "comment", missing colon, unknown source-block language). These are =suspicious-language-in-src-block= — judgment category per the spec. + +Conventions confirmed: +- Commands live in =.claude/commands/= (rulesets-only). Patterns: =start-work.md=, =review-code.md=, =respond-to-cj-comments.md=, etc. +- Elisp scripts live in =.ai/scripts/=, canonical in =~/projects/claude-templates/=, rsync'd to rulesets per the canonical-source procedure. +- ERT tests live in =.ai/scripts/tests/test-<name>.el=. Makefile target =test-scripts= globs =tests/test-*.el= and runs each. +- =todo-cleanup.el= is the precedent — =lexical-binding=, =defconst= for module constants, =defvar= for state, CLI dispatch behind a =tc--cli-invocation-p= guard so =require= from tests doesn't fire it. + +** Build plan (in flight) + +Eight tasks tracked. Currently on #1 (creating this file). Next: ERT tests first (TDD), then implementation, then command file, then smoke test, then wrap-up integration, then commit/push in both repos. + +Key design calls: +- It's a command, not a model-invocable skill. Convention post =aa69245=. +- Script emits structured stdout (s-expressions or JSON-ish lines) so the command layer can walk judgments without re-parsing org-lint output. +- For mechanical-only mode, the script applies fixes and emits remaining judgments on stdout; the command layer (or wrap-up workflow) routes them to a carry-forward file. Don't bake the carry-forward path into the script — keep routing decoupled. +- Defer multi-file invocation. Defer the carry-forward-file location decision (spec lean: per-project =.ai/lint-followups.org= or similar; finalize during wrap-up integration). + +** Build complete (18:31 CDT) + +Tasks 1-7 done. Quick log: + +- =.ai/scripts/lint-org.el= shipped — 4 mechanical fixers (item-number, missing-language-in-src-block, misplaced-planning-info, misplaced-heading markdown-bold case), judgment-emission for the rest, =--check= preview mode, =--followups-file=PATH= for daily-prep handoff. Descending line-order processing so fixes don't perturb earlier line numbers. +- =.ai/scripts/tests/test-lint-org.el= shipped — 22 ERT cases: Normal/Boundary/Error per category, idempotency for each mechanical fixer, =--check= mode, mixed-fixture integration, backup-file creation, followups-file behavior (append on judgments, no-op when none, skipped in check mode). +- Caught two implementation bugs via the TDD loop: (1) =org-lint= reports =misplaced-heading= at the *blank line after* the offender, not the offender itself — fixer now scans LINE and LINE-1; (2) =lo-fix-misplaced-planning= wasn't positioning point before =re-search-backward= — now goes to the reported line first. +- =.claude/commands/lint-org.md= shipped — interactive mode walks each judgment with inline numbered options (per =interaction.md=); mechanical-only mode defers via =--followups-file=. Edge cases documented. +- =.ai/workflows/wrap-it-up.org= updated — new =*** Lint org files= subsection in Step 3 runs the script with =--followups-file=$LINT_ORG_FOLLOWUPS= defaulting to =~/projects/work/inbox/lint-followups.org=, project-local fallback. Validation checklist gained one line. +- All canonical edits in claude-templates first, then rsync to rulesets. =make test=: 240 pytest + 22 lint-org ERT + 23 todo-cleanup ERT = 285 green. +- Smoke against live todo.org: =--check= reports 3 judgments on line 11 (the =cj:= src block — empty-header-argument, wrong-header-argument, suspicious-language), file MD5 unchanged. End-to-end smoke on a synthetic fixture: 4 mechanicals applied correctly, 1 judgment emitted, backup landed in =/tmp/=. + +Task 8 (commits in both repos) — pausing here to confirm before pushing. @@ -7,10 +7,32 @@ Project-scoped (not the global =~/sync/org/roam/inbox.org= list). * Rulesets Open Work -** TODO [#A] Check that memories are sync'd across machines via git.m -#+begin_src cj: comment -this means we need to link the memory file in ~/.claude if it's not already -#+end_src +** DOING [#A] Check that memories are sync'd across machines via git.m + +*** 2026-05-14 Thu @ 19:14:11 -0500 Investigate current memory storage + +Memory files live at +[[file:/home/cjennings/.claude/projects/-home-cjennings-code-rulesets/memory/][~/.claude/projects/-home-cjennings-code-rulesets/memory/]] +— four files including =MEMORY.md= and three individual entries +(=feedback_never_guess.md=, =project_ai_scripts_canonical_source.md=, +=reference_pdftools_venv.md=). The directory is a plain unmanaged dir +(no symlink, no enclosing git checkout). Neither +[[file:/home/cjennings/.claude/][~/.claude/]] itself nor any subtree +containing the project-memory dirs is tracked in +[[file:/home/cjennings/code/archsetup/][archsetup]] or +[[file:/home/cjennings/code/rulesets/][rulesets]]. Without a symlink +into a stowed or tracked location, memory files don't survive a new +machine setup or a dotfiles restore. + +Proposed setup: stow =~/.claude/projects= → +=archsetup/dotfiles/common/.claude/projects/= (path doesn't exist yet +— it's the target location pending VERIFY). +Create the destination in archsetup, move existing per-project +=projects/<encoded-cwd>/memory/= dirs there, run =stow= to link, then +commit + push archsetup. After that, every machine running =stow= +picks up the same memory tree. + +*** VERIFY Approve stow-based sync of ~/.claude/projects via archsetup/dotfiles/common/ ** TODO [#B] Document rulesets + claude-templates pull-before-project ordering in protocols.org Startup currently pulls claude-templates in Phase A.0 and fast-forwards the @@ -815,20 +837,20 @@ calls =networkidle= discouraged for testing. Keep reconnaissance, but revise it to wait for a visible app-specific landmark instead of treating network quiet as readiness. -*** TODO [#B] =playwright-js= and =playwright-py=: reconcile headless/visible-browser defaults +*** TODO [#A] =playwright-js= and =playwright-py=: reconcile headless/visible-browser defaults =playwright-js= says visible Chromium by default; =playwright-py= says headless by default. That may be intentional, but the difference should be explicit: interactive visual debugging -> headed, CI/pytest smoke tests -> headless. Add a small decision table so agents don't flip modes by habit. -*** TODO [#B] =playwright-js= and =playwright-py=: remove emoji console markers from examples +*** TODO [#A] =playwright-js= and =playwright-py=: remove emoji console markers from examples The broader rules discourage emojis in shared engineering output. The Playwright examples print camera/check/cross emoji. Replace with plain ASCII status prefixes. -*** TODO [#B] =frontend-design=: make accessibility non-optional and align with WCAG 2.2 +*** TODO [#A] =frontend-design=: make accessibility non-optional and align with WCAG 2.2 The workflow only loads =references/accessibility.md= for interactive components. Accessibility should be a baseline for all frontend work: keyboard @@ -836,7 +858,7 @@ operation, focus visibility/not-obscured, target size, contrast, reduced motion, labels, and semantic structure. Add WCAG 2.2-oriented gates before handoff. -*** TODO [#B] =frontend-design=: harmonize aesthetic guidance with current UI anti-pattern rules +*** TODO [#A] =frontend-design=: harmonize aesthetic guidance with current UI anti-pattern rules The skill encourages gradient meshes, heavy texture, custom cursors, overlap, and maximalist directions. Those can conflict with the repo's newer frontend @@ -854,7 +876,7 @@ each finding maps to either OWASP Top 10 2021 or a WSTG area, and add explicit checks for authorization object/function-level access, SSRF URL fetches, integrity of update/plugin paths, and security-relevant logging gaps. -*** TODO [#B] =security-check=: add practical tooling and offline/network caveats +*** TODO [#A] =security-check=: add practical tooling and offline/network caveats Add optional use of project-configured scanners such as =gitleaks= or =trufflehog= for secrets, =semgrep= for source patterns, =pip-audit= / =npm @@ -862,7 +884,7 @@ audit= / OSV where configured, and lockfile diff review. Note that dependency audits may need network access and should report "not run" clearly rather than silently passing. -*** TODO [#B] =pairwise-tests=: add t-way escalation guidance beyond pairwise +*** TODO [#A] =pairwise-tests=: add t-way escalation guidance beyond pairwise Pairwise is a pragmatic default, but NIST's combinatorial testing work covers higher-strength t-way arrays too. Add a rule: start with pairwise for broad @@ -870,7 +892,7 @@ coverage, escalate selected high-risk parameter clusters to 3-way or higher when history, safety, security, or domain reasoning suggests faults require more than two interacting factors. -*** TODO [#B] =pairwise-tests=: clarify negative value syntax and actual generator availability +*** TODO [#A] =pairwise-tests=: clarify negative value syntax and actual generator availability The examples use =~0= style values that are PICT-specific and easy to misread. Add a short "negative testing values are labels, not operators unless @@ -884,14 +906,14 @@ V2MOM's final M is officially "Measures." The skill uses "Metrics" throughout. Either rename the section and description to "Measures" or add a clear note that this fork intentionally says "Metrics" while preserving the V2MOM concept. -*** TODO [#B] =create-v2mom=: prevent task migration from turning V2MOM into a backlog +*** TODO [#A] =create-v2mom=: prevent task migration from turning V2MOM into a backlog Salesforce presents V2MOM as a simple alignment framework. This skill's optional task-migration phase can make the V2MOM the entire todo system. Split strategy from execution: keep the V2MOM concise, and link to method-specific backlogs instead of embedding every task under the strategic document. -*** TODO [#B] =create-v2mom=: add mitigation/owner fields for Obstacles +*** TODO [#A] =create-v2mom=: add mitigation/owner fields for Obstacles The current Obstacles phase captures barriers but not consistently how each will be overcome. Add "mitigation, owner, and review cadence" per obstacle so @@ -906,14 +928,14 @@ result: it shows persuasion can raise compliance with objectionable requests, which is a cautionary prompt-safety finding, not broad evidence that persuasion principles improve engineering prompt quality. -*** TODO [#B] =prompt-engineering=: add an evaluation harness requirement for production prompts +*** TODO [#A] =prompt-engineering=: add an evaluation harness requirement for production prompts Prompt critique currently ends with a rewrite and checklist. Add a requirement for fragile or reusable prompts: create 3-5 adversarial/edge examples, run the old and new prompt against them, and record the observed behavioral delta. Without examples, prompt quality remains asserted rather than verified. -*** TODO [#B] =codify=: add stale-entry review and privacy checks before writing project =CLAUDE.md= +*** TODO [#A] =codify=: add stale-entry review and privacy checks before writing project =CLAUDE.md= The skill has good gates, but it should explicitly scan for stale entries, private context, and team-visible leakage before appending. Add "would this be @@ -928,7 +950,7 @@ before completion. Clarify: code review should not duplicate CI while reading a PR, but pre-commit/pre-push workflows still need local verification or a clear "not run because..." statement. -*** TODO [#B] =review-code=: handle public-artifact scope when citing =CLAUDE.md= +*** TODO [#A] =review-code=: handle public-artifact scope when citing =CLAUDE.md= The skill requires auditing and reporting =CLAUDE.md= adherence, while =commits.md= says personal tooling files should not be cited as authority in @@ -936,7 +958,7 @@ public artifacts. Add two output modes: private/internal review may cite =CLAUDE.md= directly; public/team review should translate the rule into the underlying engineering reason without naming personal rulesets. -*** TODO [#B] =review-code=: relax mandatory "three strengths" for tiny or failing diffs +*** TODO [#A] =review-code=: relax mandatory "three strengths" for tiny or failing diffs "Three minimum" strengths can force filler on small diffs or bad PRs. Adjust to "up to three specific strengths; say none found when appropriate" so the review @@ -949,21 +971,21 @@ conflicts with =commits.md='s "what changed and why, not the process" rule and also uses a non-ASCII dash. Replace with conventional subjects that name the actual fix, e.g. =fix: validate export filename=. -*** TODO [#B] =respond-to-review=: use unresolved review threads and resolution state, not only flat comments +*** TODO [#A] =respond-to-review=: use unresolved review threads and resolution state, not only flat comments Fetching inline and top-level comments via REST misses thread resolution and can re-process already-resolved feedback. Add the same thread-level workflow as the GitHub comment-addressing skill: gather unresolved threads, group by requested change, implement, reply, and resolve only after verification. -*** TODO [#B] =respond-to-cj-comments=: remove personal absolute path references from public-writing instructions +*** TODO [#A] =respond-to-cj-comments=: remove personal absolute path references from public-writing instructions The skill embeds =/home/cjennings/code/rulesets/claude-rules/commits.md= in the public-writing section. That contradicts the public-artifact scope rule. Refer to "the commit/public-writing rules" internally, and ensure any emitted public text never cites the local path. -*** TODO [#B] =respond-to-cj-comments=: add fallback when =humanizer= or =emacsclient= is unavailable +*** TODO [#A] =respond-to-cj-comments=: add fallback when =humanizer= or =emacsclient= is unavailable The workflow requires =/humanizer= and opens long summaries in =emacsclient=. Neither is guaranteed in a fresh environment. Add tool-availability checks and @@ -978,14 +1000,14 @@ base. Replace with explicit branch detection: upstream PR base if present, configured default branch from =origin/HEAD=, or user-selected branch, then compute merge-base separately. -*** TODO [#B] =finish-branch=: make pull/merge steps safer and worktree-aware +*** TODO [#A] =finish-branch=: make pull/merge steps safer and worktree-aware Option 1 runs =git pull= and =git merge --no-ff= after checkout. Add checks for dirty worktree, upstream tracking, protected branches, and rebase-vs-merge team policy. Worktree detection via grepping branch names is fragile; use =git worktree list --porcelain= or =git rev-parse --git-common-dir= based checks. -*** TODO [#B] =start-work=: add tool-availability and ceremony-scaling rules +*** TODO [#A] =start-work=: add tool-availability and ceremony-scaling rules The workflow assumes Linear MCP, GitHub CLI, =humanizer=, Playwright skills, and multi-commit TDD ceremony. Add a first-class "tools unavailable" path and a @@ -993,102 +1015,102 @@ ceremony scale: trivial local fixes should not require the full ticket, branch, three approval gates, and commit-per-phase flow unless the user wants that process. -*** TODO [#B] =start-work=: resolve the "claim before justify" rollback risk +*** TODO [#A] =start-work=: resolve the "claim before justify" rollback risk The skill marks Linear/GitHub/todo tasks in progress before the Justify gate, then says rolling back is required if justification fails. Consider moving claiming after Gate 1 for personal todo tasks, or make the rollback steps explicit per tracker with stored prior state. -*** TODO [#B] =add-tests=: fix missing =typescript-testing.md= reference or add the ruleset +*** TODO [#A] =add-tests=: fix missing =typescript-testing.md= reference or add the ruleset Phase 3 references =typescript-testing.md=, but this repo currently has Python and Elisp testing rules only. Either add the TypeScript ruleset or change the skill to discover project-local JS/TS testing conventions instead of pointing to a missing file. -*** TODO [#B] =add-tests=: add explicit exceptions to "all three categories per function" +*** TODO [#A] =add-tests=: add explicit exceptions to "all three categories per function" The Normal/Boundary/Error rule is useful, but some functions are pure adapters, generated code, tiny wrappers, or framework glue. Add an exception protocol: state why a category does not apply, and cover the behavior at the integration or E2E level when unit categories would test framework behavior. -*** TODO [#B] =debug=: capture environment and recent-change context before hypotheses +*** TODO [#A] =debug=: capture environment and recent-change context before hypotheses The debugging workflow covers reproduction and logs, but should explicitly record environment, versions, feature flags, data set, seed/time, concurrency, and recent commits/config changes. Many intermittent failures are environment or state transitions, not just local code paths. -*** TODO [#B] =root-cause-trace=: constrain defense-in-depth to trust boundaries and invariants +*** TODO [#A] =root-cause-trace=: constrain defense-in-depth to trust boundaries and invariants The skill says add defense at each intermediate layer that could have caught the bad value. That risks validation spam. Tighten it: add checks at ingress, trust boundaries, persistence boundaries, and invariant-owning layers; avoid duplicative null checks in every pass-through function. -*** TODO [#B] =five-whys=: require evidence and counterfactual validation per why +*** TODO [#A] =five-whys=: require evidence and counterfactual validation per why The skill says "one best-supported answer" but should require an evidence field for each link and a counterfactual check: if this cause were removed, would the next symptom plausibly disappear? This reduces monocausal storytelling. -*** TODO [#B] =brainstorm=: add timebox and research/source rules for high-stakes designs +*** TODO [#A] =brainstorm=: add timebox and research/source rules for high-stakes designs The one-question-at-a-time flow can run long. Add a timebox and a rule that claims about markets, regulations, tools, vendors, or current APIs require fresh sources. The design doc should distinguish researched facts from assumptions. -*** TODO [#B] =arch-decide=: make examples technically timeless and avoid unverifiable claims +*** TODO [#A] =arch-decide=: make examples technically timeless and avoid unverifiable claims The sample ADRs include claims such as MongoDB lacking ACID for multi-document transactions "at decision time." Examples age and can teach stale facts. Replace with either clearly dated examples or domain-neutral placeholders, and require references for real technical claims in generated ADRs. -*** TODO [#B] =arch-decide=: standardize statuses and immutability language +*** TODO [#A] =arch-decide=: standardize statuses and immutability language The skill mixes Accepted, Decided, Deprecated, Superseded, Rejected, and "Not Accepted." Pick a canonical status set and state that accepted ADR content is not edited except for status/link metadata; changed decisions get new ADRs that supersede old ones. -*** TODO [#B] =arch-design=: add threat modeling and privacy/compliance as first-class design inputs +*** TODO [#A] =arch-design=: add threat modeling and privacy/compliance as first-class design inputs Security appears as one quality attribute, but architecture design should also ask about trust boundaries, data classification, abuse cases, privacy constraints, compliance evidence, and operational ownership. These influence architecture early and should not wait for =security-check=. -*** TODO [#B] =arch-design=: separate architecture paradigms from tactical patterns +*** TODO [#A] =arch-design=: separate architecture paradigms from tactical patterns The candidate table mixes paradigms (modular monolith, microservices, event-driven) with tactical or partial patterns (DDD, CQRS, event sourcing). Revise the matrix so candidates can compose patterns rather than treating each as a mutually exclusive architecture choice. -*** TODO [#B] =arch-document=: strengthen quality scenarios using arc42/Q42 structure +*** TODO [#A] =arch-document=: strengthen quality scenarios using arc42/Q42 structure Section 10 currently says "Under [condition], the system should [response] within [measure]." Expand to a compact quality-scenario template: source, stimulus, environment, artifact, response, response measure. This better matches architecture-quality practice and makes requirements testable. -*** TODO [#B] =arch-document=: add staleness and ownership metadata to generated docs +*** TODO [#A] =arch-document=: add staleness and ownership metadata to generated docs arc42 docs are living documents. Add owner, source commit/date, review cadence, and "known stale when..." notes per section or in the README so generated docs do not become authoritative after the code has moved on. -*** TODO [#B] =arch-evaluate=: add confidence levels for framework-agnostic findings +*** TODO [#A] =arch-evaluate=: add confidence levels for framework-agnostic findings Claude-read import graphs and public API comparisons can be incomplete in large or dynamic languages. Add confidence/provenance per finding and require "not fully checked because..." when scale or dynamic imports limit certainty. -*** TODO [#B] =arch-evaluate=: report skipped tool checks explicitly +*** TODO [#A] =arch-evaluate=: report skipped tool checks explicitly The workflow says skip unconfigured language-specific tools silently, but the review checklist also wants checks run. For audit usefulness, list detected @@ -1100,13 +1122,13 @@ C4 is notation-independent. These skills hard-require draw.io XML, PNG export, and opening draw.io desktop. Add supported outputs (Structurizr DSL, Mermaid, PlantUML, draw.io) and a fallback path when =drawio= or a GUI is unavailable. -*** TODO [#B] =c4-analyze= and =c4-diagram=: clarify C4 abstraction boundaries +*** TODO [#A] =c4-analyze= and =c4-diagram=: clarify C4 abstraction boundaries Emphasize that C4 Containers are deployable/runnable units, not necessarily Docker containers, and that Components are not separately deployable. Add a check that every relationship and element stays at one abstraction level. -*** TODO [#B] =commits.md=: split DeepSat/Linear/Slack-specific publishing rules from global commit rules +*** TODO [#A] =commits.md=: split DeepSat/Linear/Slack-specific publishing rules from global commit rules The global commit rule file includes Linear status transitions and a hard-coded Slack channel. That is team-specific and may leak or misfire in unrelated @@ -1119,26 +1141,26 @@ Several workflows make =humanizer= mandatory, but no =humanizer= skill exists in this repo. Either add the skill, install instructions, or a fallback plain-English pass that satisfies the same checks without an external skill. -*** TODO [#B] =verification.md=: add explicit "unable to verify" reporting standard +*** TODO [#A] =verification.md=: add explicit "unable to verify" reporting standard The rule says run tests/lint/typecheck/build before claiming done. Add the required final wording when a command cannot be run: command attempted, reason it could not run, risk left unverified, and the smallest next command for the user to run. -*** TODO [#B] =testing.md=: add property-based and mutation testing as escalation paths +*** TODO [#A] =testing.md=: add property-based and mutation testing as escalation paths The testing rules cover categories and pairwise matrices. Add guidance for property-based testing when invariants matter across broad input domains, and mutation testing when test quality is suspect despite high coverage. -*** TODO [#B] =testing.md=: soften absolute TDD with an explicit spike protocol +*** TODO [#A] =testing.md=: soften absolute TDD with an explicit spike protocol The rule currently treats TDD as non-negotiable. Keep TDD as the default, but define a disciplined spike exception: timebox, do not commit spike code, write the first failing test before productionizing the discovered approach. -*** TODO [#B] =subagents.md=: add capability/availability and cost checks +*** TODO [#A] =subagents.md=: add capability/availability and cost checks The rule assumes subagents exist and should handle failures. Add "if the environment lacks subagents, continue locally and preserve the same scope @@ -1152,21 +1174,21 @@ semantics, constraints, transactions, JSON, time zones, and indexes differ. Recommend production-like DBs for ORM/query behavior and reserve SQLite for pure unit tests that do not depend on database semantics. -*** TODO [#B] =languages/python/claude/rules/python-testing.md=: separate "never mock ORM" from true unit-test boundaries +*** TODO [#A] =languages/python/claude/rules/python-testing.md=: separate "never mock ORM" from true unit-test boundaries For domain services, real model methods and validation are usually right. For thin orchestration units, a repository/interface fake may be cleaner than hitting a real database. Clarify the boundary: do not mock ORM internals, but do inject fakes at deliberate data-access ports. -*** TODO [#B] =languages/elisp/claude/rules/elisp.md=: update editing workflow to avoid tool-specific advice +*** TODO [#A] =languages/elisp/claude/rules/elisp.md=: update editing workflow to avoid tool-specific advice The rule says prefer Write over repeated Edits. That advice is Claude-tooling specific and can conflict with environments that require patch-based edits. Rephrase around the intent: for nontrivial Elisp, make cohesive edits and run paren/byte-compile checks immediately. -*** TODO [#B] =languages/elisp/claude/rules/elisp-testing.md=: add batch-mode and native-comp caveats +*** TODO [#A] =languages/elisp/claude/rules/elisp-testing.md=: add batch-mode and native-comp caveats ERT guidance is solid, but add rules for =emacs --batch= reproducibility, isolating =user-emacs-directory= / package state, and optionally catching @@ -1186,7 +1208,7 @@ unparseable or just display the file path, so attribution scanning may miss the actual committed/posted text. Read safe local files referenced by =-F=, =--file=, and =--body-file= before deciding whether the command is clean. -*** TODO [#B] =hooks/destructive-bash-confirm.py=: replace regex command parsing with shell-aware parsing where possible +*** TODO [#A] =hooks/destructive-bash-confirm.py=: replace regex command parsing with shell-aware parsing where possible The hook's regexes can miss quoted paths, variables, aliases, =env= wrappers, or compound commands, and can misidentify targets. Use =shlex= for simple |
