#+TITLE: Rulesets — Work
#+AUTHOR: Craig Jennings
#+DATE: 2026-04-19

Tracking TODOs for the rulesets repo that span more than one commit. Project-scoped (not the global =~/sync/org/roam/inbox.org= list).

* Priority and Tag Scheme

** Priority

- =[#A]= *Urgent risk or current workflow blocker.* Credential exposure, data loss, destructive behavior, startup breakage, failing tests that block work, or a feature/refactor that unblocks a core daily workflow. =[#A]= requires a =SCHEDULED:= or =DEADLINE:= date — if it can't be dated, it isn't really =[#A]=.
- =[#B]= *Important planned work.* Concrete bugs, high-leverage architecture cleanup, brittle load-order or test gaps, dependency failures, or feature work with clear design and expected near-term use.
- =[#C]= *Useful but optional.* Low-risk cleanup, ergonomics, smoke tests, investigations with limited current impact, or feature work that would improve the setup but isn't yet a committed workflow.
- =[#D]= *Someday or watchlist.* Speculative features, tiny polish, upstream tracking, optimizations without current pain, deferred ideas that shouldn't compete with active maintenance.

The scheme is importance-driven with optional urgency lift. Priority signals "does this matter and when," not "how big" — effort lives in the tags.

** Tags

Every task carries one *type tag* from this set:

- =:feature:= — adds new capability.
- =:chore:= — meta or housekeeping (tooling, sync, version bump, mechanical cleanup).
- =:spec:= — design document, brainstorm output, or research-backed proposal that precedes implementation.
- =:bug:= — fix to incorrect behavior.

Optional *effort and autonomy tags* — orthogonal to type, both can apply on the same task:

- =:quick:= — likely to take ≤30 minutes from start through verification.
- =:solo:= — Claude can complete the work end to end, including verification, without input from Craig.

Tags are assigned and refreshed by =task-audit=; =task-review= keeps them honest in passing.
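The two mechanical rules in the scheme above (exactly one type tag per task, and a date on every =[#A]=) could be checked with something like the sketch below. It is not how =task-audit= works; the function name =check_task=, the heading regex, and the message wording are all hypothetical and only illustrate the rules.

```python
import re

# Type tags named in the scheme above; effort/autonomy tags are ignored here.
TYPE_TAGS = {"feature", "chore", "spec", "bug"}

# Hypothetical pattern: stars, TODO, a priority cookie, title, optional trailing tags.
HEADING = re.compile(r"^\*+ TODO \[#([A-D])\] .*?((?::[\w@]+)+:)?\s*$")


def check_task(heading, body):
    """Return a list of rule violations for one TODO entry (empty means OK)."""
    match = HEADING.match(heading)
    if not match:
        return ["not a prioritized TODO heading"]
    priority = match.group(1)
    tag_string = match.group(2) or ""
    tags = set(tag_string.strip(":").split(":")) - {""}

    problems = []
    type_tags = tags & TYPE_TAGS
    if len(type_tags) != 1:
        problems.append(f"expected exactly one type tag, found {sorted(type_tags)}")
    # [#A] must carry a planning date somewhere in the entry body.
    if priority == "A" and not re.search(r"^\s*(SCHEDULED|DEADLINE):", body, re.M):
        problems.append("[#A] task has no SCHEDULED: or DEADLINE: date")
    return problems
```

A heading such as =** TODO [#A] Rotate leaked token :bug:quick:= with an empty body would come back with the missing-date message; adding a =DEADLINE:= line clears it.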
* Rulesets Open Work

** TODO [#C] Check that memories are sync'd across machines via git :spec:
:PROPERTIES:
:LAST_REVIEWED: 2026-05-28
:END:

*** 2026-05-16 Sat @ 01:12:52 -0500 Spec
#+begin_src cj: comment
write the spec here.
#+end_src

*** 2026-05-14 Thu @ 19:14:11 -0500 Investigate current memory storage

Memory files live at [[file:/home/cjennings/.claude/projects/-home-cjennings-code-rulesets/memory/][~/.claude/projects/-home-cjennings-code-rulesets/memory/]] — four files including =MEMORY.md= and three individual entries (=feedback_never_guess.md=, =project_ai_scripts_canonical_source.md=, =reference_pdftools_venv.md=). The directory is a plain unmanaged dir (no symlink, no enclosing git checkout).

Neither [[file:/home/cjennings/.claude/][~/.claude/]] itself nor any subtree containing the project-memory dirs is tracked in [[file:/home/cjennings/code/archsetup/][archsetup]] or [[file:/home/cjennings/code/rulesets/][rulesets]]. Without a symlink into a stowed or tracked location, memory files don't survive a new machine setup or a dotfiles restore.

Proposed setup: stow =~/.claude/projects= → =archsetup/dotfiles/common/.claude/projects/= (path doesn't exist yet — it's the target location pending VERIFY). Create the destination in archsetup, move existing per-project =projects//memory/= dirs there, run =stow= to link, then commit + push archsetup. After that, every machine running =stow= picks up the same memory tree.

*** 2026-05-23 Sat @ 16:12:48 -0500 Decided: dedicated private repo, not stow

Worked through dotfiles → rulesets → dedicated repo. Dropped stow/dotfiles (machine config, wrong cadence) and rulesets (it's pulled first in every session, so memory edits would dirty its tree and skip the startup =git pull --ff-only=).
Chose a dedicated private repo on cjennings.net: storage is unified there while recall stays per-project (the encoded-cwd subdirs), since pooling recall would hurt relevance and risk work-private facts surfacing in personal-project artifacts.

*** 2026-05-23 Sat @ 16:12:48 -0500 Shipped: claude-memory.git + folded symlinks

Created bare =git@cjennings.net:claude-memory.git=, cloned to =~/.claude-memory= (later deleted in the reversal below), moved all 7 per-project =memory/= dirs in (54 files; work has 40) and replaced each live =~/.claude/projects//memory= with a folded dir-symlink so new memory lands in the clone and a push syncs it. Added =link-claude-memory.sh= (idempotent — recreates the symlinks on a new machine after clone) + README. Private repo, never GitHub (carries work/DeepSat memory). Initial import pushed (=f496370=).

*** 2026-05-24 Sun @ 01:53:35 -0500 Reversed the migration — back to unmanaged per-project memory

Cancelled the follow-up brainstorm and undid the dedicated-repo migration at Craig's call. Moved all 7 memory dirs back to =~/.claude/projects//memory/= (content preserved), deleted the =~/.claude-memory= clone, and deleted the bare =claude-memory.git= on the server. Memory is back to its original at-risk state, so the task reopens at [#C] pending a direction.

The brainstorm landed on a two-tier idea for whenever this resumes: promote general lessons into a rulesets-tracked file symlinked into =~/.claude/rules/= (loaded into every project natively, one repo), and keep project-specific memory under each project's own =.ai/memory/= (committed where =.ai/= is tracked, at-risk where it's gitignored). Not implemented.
** TODO [#C] Build =create-documentation= skill for high-quality project/product docs :feature:
:PROPERTIES:
:LAST_REVIEWED: 2026-05-28
:END:

Create a Claude skill named =create-documentation= that can plan, write, refresh, and review software documentation across README files, project docs, developer guides, API docs, operational docs, and generated/published doc sites.

This is broader than =arch-document=. =arch-document= should remain the architecture-specific arc42 skill. =create-documentation= should know when to delegate to it for architecture documentation, but its main job is the full documentation system around a product or repo: onboarding, tutorials, how-to guides, reference, explanation, operations, troubleshooting, contribution, release/upgrade, and publication format.

*** Why this matters

The repo currently has strong skills for architecture, testing, review, debugging, and workflow. It does not have a general documentation skill that:

- Chooses the right documentation type for the user need.
- Audits existing docs against code and expected user journeys.
- Creates a coherent doc map instead of dumping everything into =README.md=.
- Writes in a consistent technical style.
- Decides source/publish format intentionally (=.md=, =.org=, generated =.html=, OpenAPI, etc.).
- Treats docs as a maintained product surface with verification, ownership, navigation, accessibility, and freshness checks.

*** Research notes

**** Documentation frameworks and best-practice sources

- Diataxis separates documentation by reader need:
  - Tutorials: learning-oriented, take the reader by the hand.
  - How-to guides: task-oriented, solve a specific real problem.
  - Reference: information-oriented, accurate and complete lookup material.
  - Explanation: understanding-oriented, concepts, background, tradeoffs.
  Source: [[https://diataxis.fr/][Diataxis]] and the official guidance around tutorials/how-to/reference/explanation.
- Django explicitly documents this same organization and teaches readers how to navigate it: tutorials for beginners, topic guides for concepts, reference for APIs, how-to guides for recipes. This is a major reason the docs feel navigable despite large scope.
  Source: [[https://docs.djangoproject.com/en/5.2/][Django documentation]]
- Kubernetes separates concepts, tasks, tutorials, and reference. It also has current/previous-version docs, localization, contribution paths, and task-focused landing pages. Its docs are good at answering "what is this?" separately from "how do I do one thing?"
  Sources: [[https://kubernetes.io/docs/home/][Kubernetes docs home]], [[https://kubernetes.io/docs/tasks/][Kubernetes tasks]], [[https://kubernetes.io/docs/tutorials/][Kubernetes tutorials]]
- Write the Docs emphasizes docs that are precursory, participatory, exemplary, consistent, current, discoverable, addressable, cumulative, and comprehensive. Especially important: incorrect docs are worse than missing docs, and examples should cover common use cases without overwhelming the reference.
  Source: [[https://www.writethedocs.org/guide/writing/docs-principles/][Write the Docs principles]]
- Google developer docs guidance emphasizes project-specific style first, clarity and consistency, conversational but not frivolous tone, active voice, second person, descriptive links, global audience, accessibility, sentence case headings, numbered lists for procedures, code font for code, and alt text for images.
  Sources: [[https://developers.google.com/style/][Google developer documentation style guide]], [[https://developers.google.com/style/highlights][Google style highlights]], [[https://developers.google.com/style/accessibility][Google accessible docs]]
- Google's doc best-practices page adds a pragmatic maintenance principle: minimum viable documentation, update docs with code, delete dead docs, prefer good over perfect, tell the story of code, and avoid duplication.
  Source: [[https://google.github.io/styleguide/docguide/best_practices.html][Google documentation best practices]]
- The Good Docs Project is useful as a template source, especially for README, how-to, tutorial, concept, reference, troubleshooting, contributor, and release-note patterns. Do not vendor wholesale; use as prior art.
  Source: [[https://www.thegooddocsproject.dev/][The Good Docs Project]]

**** Praised project docs to analyze and steal from

***** Django

Why it works:

- It labels the doc types directly and explains when to use each.
- It has a beginner path, advanced tutorials, topic guides, API reference, how-to recipes, deployment, security, testing, release notes, and community help in one coherent index.
- It is versioned, so readers know which framework version the docs target.
- It cross-links introductory material to deeper references without making the first page a wall of every detail.

Patterns to use:

- Make the top-level docs home a routing page by reader intent.
- Put "How these docs are organized" near the top when the doc set is large.
- Split concept, task, tutorial, and reference instead of mixing them.
- Include "getting help" and "not found?" paths so the docs have an exit ramp.

Source: [[https://docs.djangoproject.com/en/5.2/][Django documentation]]

***** Kubernetes

Why it works:

- It has a large, complex product but maintains separate lanes for Concepts, Tasks, Tutorials, Reference, and Contribute.
- Task pages are short sequences for one operation; tutorials are larger goals with several sections. This prevents "one page tries to teach everything."
- It exposes version state clearly, including static old versions and current docs.
- It supports localization and documentation contribution, which makes the docs a product surface rather than a side artifact.

Patterns to use:

- For platform or infrastructure docs, include Concepts / Tasks / Tutorials / Reference as first-class folders.
- Create version/freshness metadata when docs are tied to released software.
- Add doc contribution guidance for projects with external contributors.
- Make operational tasks discoverable by category, not just search.

Sources: [[https://kubernetes.io/docs/home/][Kubernetes docs home]], [[https://kubernetes.io/docs/tasks/][Kubernetes tasks]]

***** Rust

Why it works:

- Rust has a "bookshelf" rather than one overloaded manual: The Book, Rust by Example, standard library API reference, Reference, Cargo Guide, Error Index, Rustonomicon, release notes, platform support, policies, etc.
- The learning path is honest about audience: "assume programmed before, not in any specific language."
- Reference and learning material are separated. Advanced unsafe guidance gets its own book.
- Offline docs via =rustup doc= are treated as part of the product.

Patterns to use:

- For broad ecosystems, create a documentation bookshelf rather than a single mega-doc.
- Separate beginner path, examples, formal reference, advanced/unsafe topics, tooling docs, error index, release notes, and policies.
- Document assumptions about reader experience.
- Consider offline/local docs for CLI/library ecosystems.

Source: [[https://doc.rust-lang.org/][Rust documentation]]

***** Stripe API docs

Why it works:

- The API reference is organized around resources and common cross-cutting concerns: authentication, errors, idempotency, pagination, request IDs, versioning, metadata, connected accounts.
- It pairs prose with concrete request/response examples and client-library language selection.
- It exposes test-mode vs live-mode distinctions early.
- It offers "Copy for LLM" / "View as Markdown", which acknowledges modern consumption patterns without sacrificing normal docs UX.
- Its reputation comes from matching developer mental models and making the common path implementable quickly, not just visual polish.
Patterns to use:

- API docs should be generated from or checked against OpenAPI/JSON schema or source annotations wherever possible.
- Keep cross-cutting API behavior near the front, before endpoint lists.
- Include runnable examples, auth, errors, pagination, versioning, idempotency, and sandbox/test data.
- Consider LLM-friendly exports (=llms.txt=, "view as Markdown", stable anchors), but do not make the docs only for AI.

Source: [[https://docs.stripe.com/api][Stripe API Reference]]

***** FastAPI

Why it works:

- Documentation is part of the framework's value proposition: OpenAPI and JSON Schema drive interactive Swagger UI and ReDoc automatically.
- It reduces manual drift for API reference by deriving docs from typed code.
- It integrates examples and tutorial-style explanations with standards-based generated reference.

Patterns to use:

- Prefer generated API reference from code/specs over hand-maintained endpoint tables.
- Generated docs need human-written overview, concepts, authentication, examples, and operational guidance around them.
- The skill should identify when an OpenAPI/Swagger/ReDoc/Scalar route already exists and improve metadata/schema quality instead of creating duplicate manual docs.

Source: [[https://fastapi.tiangolo.com/features/][FastAPI features]]

*** Format and presentation decisions

**** Default source format: Markdown

Use =.md= as the default for shared project documentation when:

- The repo is on GitHub/GitLab/Forgejo and readers browse docs in the web UI.
- The project already uses MkDocs, Docusaurus, VitePress, Sphinx+MyST, Jekyll, GitHub Pages, or plain README-driven docs.
- Contributors are expected to edit docs without Emacs-specific tooling.
- The docs need easy static-site publishing.
- The content is README, tutorial, how-to, reference, troubleshooting, contributing, release notes, runbooks, or ordinary prose + code blocks.
Markdown source works well because it is low-friction, reviewable in diffs, rendered by repository hosts, and supported by documentation site generators. MkDocs is a good reference point: Markdown source, YAML config, built-in dev server, static HTML output, and easy hosting.

Source: [[https://www.mkdocs.org/][MkDocs]]

**** Use Org when the document is Emacs-native or personal/planning-heavy

Use =.org= when:

- The user's workflow is explicitly Emacs/org-mode.
- The document contains TODO states, schedules, priorities, tags, agenda integration, property drawers, clocking, or personal planning.
- The document is an internal strategy/planning artifact such as V2MOM, research notes, meeting notes, task triage, or a living personal operating document.
- The output may later be exported, but the source of truth is intended to be edited in org-mode.

Do not default team-facing documentation to =.org= unless the team already uses org-mode. Org can export to HTML, but that does not make it the right authoring format for non-Emacs contributors.

Sources: [[https://orgmode.org/org.html][Org manual]], [[https://orgmode.org/worg/org-tutorials/org-publish-html-tutorial.html][Org publish HTML tutorial]]

**** Use HTML as generated/published output, rarely as hand-authored source

Use =.html= when:

- The deliverable is a published static documentation site.
- The document needs interactive widgets, embedded API consoles, custom layout, or generated navigation/search.
- The project already publishes docs as a website.
- The target audience needs searchable, browsable, linkable pages rather than repo-local files.

Prefer generated HTML from Markdown/Org/reStructuredText/AsciiDoc/OpenAPI over hand-authored HTML. Hand-edit HTML only for standalone artifacts, custom landing pages, or cases where the project already treats HTML templates as docs source.
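The three format rules above reduce to a small decision function. The sketch below is one possible reading, not a settled design: =DocContext=, its field names, and =choose_source_format= are hypothetical, and the precedence order (HTML-as-source first, then Org, then Markdown) is an assumption drawn from the wording of the sections above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DocContext:
    """Hypothetical intake facts; field names are illustrative only."""
    emacs_native: bool = False        # workflow is explicitly Emacs/org-mode
    has_planning_state: bool = False  # TODO states, schedules, agenda, clocking
    team_facing: bool = True          # read and edited by more than one person
    team_uses_org: bool = False       # the team already works in org-mode
    published_site: bool = False      # deliverable is a published docs site
    html_is_source: bool = False      # project already treats HTML as docs source


def choose_source_format(ctx: DocContext) -> Tuple[str, Optional[str]]:
    """Return (source_format, published_output) for one document."""
    # HTML is an output format unless the project already authors in it.
    published = "html" if ctx.published_site else None
    if ctx.html_is_source:
        return "html", published
    # Org only when the document is Emacs-native or planning-heavy, and
    # never as a team default unless the team already uses org-mode.
    wants_org = ctx.emacs_native or ctx.has_planning_state
    if wants_org and (not ctx.team_facing or ctx.team_uses_org):
        return "org", published
    # Markdown is the default for shared project documentation.
    return "md", published
```

A personal planning file (=emacs_native=True, team_facing=False=) maps to Org, while a team-facing doc with TODO states but no org-using team still falls back to Markdown.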
**** Consider generated/spec-backed formats

Use generated reference when possible:

- API reference: OpenAPI/Swagger/ReDoc/Scalar from code/spec.
- CLI reference: generated from command parser/help output.
- Library API reference: language-native doc tools such as rustdoc, pydoc, TypeDoc, JSDoc, Go doc, Sphinx autodoc, etc.
- Config reference: generated from schema, types, or validated defaults.

The skill should not duplicate generated reference by hand. It should improve source comments, schema descriptions, examples, front matter, and surrounding guides.

**** Presentation requirements

Every generated doc set should have:

- A docs home or README that routes by reader intent.
- Stable headings and anchors for addressability.
- Descriptive link text, no "click here."
- Search/navigation plan when docs exceed a handful of pages.
- Version/freshness metadata when tied to released software.
- Ownership/review cadence for docs likely to rot.
- Accessible structure: semantic headings, alt text, no image-only info, tables only when appropriate, left-aligned text, readable code blocks.
- Copyable commands and code examples.
- "What changed?" / release notes / migration path when docs describe a new or changed behavior.
- Troubleshooting path for common failures.
- Clear prerequisites before procedures.
- Verification steps after procedures.
- Support/escalation path when the docs do not answer the question.
- Optional LLM-friendly surfaces for larger doc sets: =llms.txt=, "copy as Markdown" equivalents, concise page summaries, and stable anchors.
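Some of the requirements above can be checked mechanically. As a minimal sketch, the function below flags Markdown links whose text is non-descriptive. Only "click here" comes from the list above; the other phrases in =VAGUE=, the function name =vague_links=, and the inline-link regex (which ignores reference-style links and links containing spaces) are assumptions for illustration.

```python
import re
from typing import List, Tuple

# Inline Markdown links only: [text](target). Reference-style links are skipped.
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

# "click here" is named in the requirements; the rest are assumed examples.
VAGUE = {"click here", "here", "link", "this", "read more", "more"}


def vague_links(markdown: str) -> List[Tuple[int, str, str]]:
    """Return (line number, link text, target) for each non-descriptive link."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for text, target in MD_LINK.findall(line):
            if text.strip().lower() in VAGUE:
                hits.append((lineno, text, target))
    return hits
```

A check like this fits in the review phase as a cheap first pass. It cannot judge whether the remaining link text is actually descriptive, so a human read is still needed.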
*** Proposed skill design

**** Skill name and trigger

Name: =create-documentation=

Trigger when the user asks to:

- create documentation, docs, README, guide, manual, runbook, tutorial, quickstart, API docs, CLI docs, troubleshooting docs, contributor docs, architecture-adjacent docs, release notes, upgrade guide, or doc site;
- improve, audit, reorganize, or publish existing docs;
- decide documentation structure or format for a project.

Do not trigger for:

- architecture-only arc42 docs when =arch-document= is the direct fit;
- ADR creation (=arch-decide=);
- design docs before implementation shape is known (=brainstorm= or =arch-design=);
- prose polishing only (future writing/humanizer skill);
- inline code comments/docstrings only, unless the user asks to create docs from them.

**** V1 should be one orchestrating skill, not many separate skills

Build v1 as one skill with explicit phases and subcommands rather than a set of separate skills. Rationale:

- Documentation tasks often start ambiguous; the first job is classification.
- Splitting too early creates command-discovery burden.
- A single skill can dispatch to existing specialized skills (=arch-document=, =c4-diagram=, =security-check=, =playwright-js/py= for doc-site verification) without making users choose the internal pipeline.

Support discoverable subcommands inside one skill:

#+begin_example
/create-documentation audit
/create-documentation plan
/create-documentation write
/create-documentation refresh
/create-documentation publish
/create-documentation review
#+end_example

The default =/create-documentation = runs audit -> plan -> write -> review, asking for confirmation before broad rewrites.

**** Future split if v1 gets too large

If the skill grows past a manageable size, split into a discoverable =documentation-*= chain. Names and order:

1. =documentation-audit= — inventory existing docs, code/docs drift, reader journeys, missing doc types, stale/generated docs.
2. =documentation-plan= — choose audiences, doc map, formats, source of truth, publishing path, ownership, and freshness policy.
3. =documentation-write= — write or update the selected docs.
4. =documentation-reference= — generate or improve API/CLI/config/library reference from source/spec.
5. =documentation-publish= — configure MkDocs/Docusaurus/Sphinx/GitHub Pages or equivalent, build static HTML, verify links/search.
6. =documentation-review= — quality gate for accuracy, style, navigation, accessibility, examples, and freshness.

Keep =create-documentation= as the orchestrator and user-facing entry point. The chain is discoverable because every helper starts with =documentation-= and the orchestrator prints the next command at each handoff.

*** V1 workflow details

**** Phase 1: Intake and classification

Ask only what is missing from local context:

- Who is the reader? New user, evaluator, integrator, maintainer, operator, contributor, auditor, support engineer?
- What is the reader trying to do or understand?
- Is this for a public project, internal team, personal workflow, regulated audience, or customer-facing product?
- Is the output repo-browsed, web-published, printed/exported, or Emacs-native?
- Is there existing code, existing docs, an API spec, generated reference, or only a concept?
- What is the maintenance expectation? One-off, release-maintained, continuously updated?

Classify the work into one or more doc types:

- README / landing page.
- Quickstart.
- Tutorial.
- How-to guide.
- Concept/explanation.
- API reference.
- CLI reference.
- Configuration reference.
- Architecture docs (delegate to =arch-document= if arc42/C4/ADR-driven).
- Operations/runbook.
- Troubleshooting/FAQ.
- Upgrade/migration/release notes.
- Contributor/development docs.
- Security/compliance docs.
- Examples/cookbook.
**** Phase 2: Audit existing material

Inventory:

- =README*=, =docs/=, =doc/=, =site/=, =mkdocs.yml=, =docusaurus.config.*=, =vitepress=, =sphinx=, =docs.rs=, =pkg.go.dev=, OpenAPI specs, generated docs folders, GitHub Pages config, ADRs, architecture docs, examples, scripts, CLI help, package metadata.
- Existing doc type coverage: tutorial/how-to/reference/explanation.
- Broken links, stale version numbers, commands that no longer exist, screenshots that may be stale, code snippets not exercised, doc/code drift.
- Source of truth for generated docs. Flag generated files; do not hand-edit them until source is known.
- Reader journey gaps: "new user can install?", "first success path?", "operator can recover?", "contributor can run tests?", "API consumer can authenticate and handle errors?"

Use =rg= first. For API/CLI reference, prefer structured sources: OpenAPI/JSON Schema, package metadata, command =--help= output, docstrings, or language-native documentation tooling.

**** Phase 3: Documentation plan

Write a short plan before broad edits:

- Audiences and priority order.
- Proposed doc map/tree.
- Doc type for each page.
- Source format decision: =.md= / =.org= / generated spec / generated HTML.
- Publishing target, if any.
- Existing docs to preserve, move, merge, or delete.
- Generated-reference strategy.
- Ownership and freshness policy.
- Verification plan.

Stop for confirmation when the plan moves or rewrites more than one file.

**** Phase 4: Write or update docs

Writing rules:

- Lead with the reader's goal, not the implementation history.
- Put prerequisites before steps.
- Use numbered lists for procedures.
- Use bullets for non-ordered choices.
- Use active voice and second person for instructions.
- Keep sentences short and globally readable.
- Define acronyms on first use.
- Use code font for commands, file names, env vars, API names, and literals.
- Use descriptive links.
- Prefer examples that cover the common path and one meaningful edge/error path.
- Separate examples/tutorials from dense reference.
- Avoid stale duplication: link to canonical generated reference instead of copying it.
- Include expected output after commands where it helps verification.
- Include cleanup/rollback steps when procedures change state.
- Include troubleshooting for common failures.
- Avoid marketing voice in technical docs. State capability and constraints plainly.
- No AI attribution in docs, examples, comments, generated pages, footers, or screenshots.

Page skeletons:

README / docs home:

#+begin_example
#

## Start here

- New user:
- Existing user with a task:
- API lookup:
- Maintainer/operator:

## Quick example
...

## Documentation map
...

## Support / contributing
...
#+end_example

Tutorial:

#+begin_example
# Tutorial:

## What you'll build
## Prerequisites
## Step 1
...
## Checkpoint
## Step 2
...
## What you learned
## Next
#+end_example

How-to:

#+begin_example
# How to

## When to use this
## Prerequisites
## Steps
## Verify
## Troubleshooting
## Related
#+end_example

Reference:

#+begin_example
# reference

## Summary
## Parameters / options / fields
## Behavior
## Errors
## Examples
## Version notes
#+end_example

Explanation:

#+begin_example
#

## Problem it solves
## Mental model
## How it fits with related concepts
## Tradeoffs and constraints
## Further reading
#+end_example

Runbook:

#+begin_example
# Runbook:

## Scope
## Preconditions
## Normal procedure
## Verification
## Rollback
## Alerts and escalation
## Post-incident notes
#+end_example

**** Phase 5: Presentation and publishing

If docs are repo-local only:

- Ensure links render on GitHub/GitLab.
- Keep relative links stable.
- Add an index if more than 4-5 docs exist.

If docs are web-published:

- Detect existing generator and follow it.
- Prefer project-native tooling over introducing MkDocs/Docusaurus/Sphinx.
- If no tooling exists and user wants a site, choose conservatively:
  - Python/simple repo: MkDocs Material is a pragmatic default.
  - JS/React ecosystem: Docusaurus or VitePress if already in stack.
  - Python libraries: Sphinx or MkDocs depending on existing ecosystem.
  - API docs: ReDoc/Swagger/Scalar from OpenAPI.
- Build locally if dependencies exist.
- Check links, nav, search, mobile viewport, and accessibility basics.
- Do not commit generated =site/= output unless the project already does.

**** Phase 6: Verification

Verification should match doc type:

- Commands in quickstarts/how-tos: run them or mark not run with reason.
- Code snippets: compile/run where feasible, or use fenced language and note assumptions.
- API docs: validate OpenAPI/spec if tooling exists.
- Links: run link checker if configured; otherwise sample-check changed links.
- Published site: build docs and inspect output.
- Screenshots: verify current UI if included.
- Generated docs: regenerate from source and confirm no unexpected diff.

Final report must say:

- Files created/changed.
- Doc types covered.
- Format/source-of-truth decisions.
- What was verified.
- What could not be verified.
- Known gaps/follow-ups.

*** Relationship to existing skills

- =arch-document=: use when the requested docs are specifically architecture docs from brief + ADRs + C4/arc42. =create-documentation= may call it, then wrap the output in a broader docs map.
- =c4-analyze= / =c4-diagram=: use for diagrams in architecture or concept docs when visual structure helps.
- =brainstorm=: use before =create-documentation= when the product/feature itself is still unclear.
- =arch-design= / =arch-decide=: use when documentation reveals missing architectural choices.
- =security-check=: use when docs include security guidance, auth, secrets, deployment, or compliance claims.
- =playwright-js= / =playwright-py=: use to verify published doc sites, interactive docs, screenshots, and browser-rendered examples.
- =codify=: use after a documentation session reveals reusable project-specific documentation rules.
*** Quality bar and anti-patterns

The skill should reject:

- A giant README that mixes tutorial, reference, architecture, and operations.
- Duplicating generated API/CLI/config reference by hand.
- Unverified commands in quickstarts without a "not run" note.
- Screenshots with no alt text or no update path.
- Tables used for layout instead of actual tabular data.
- "Overview" pages that do not route readers to tasks.
- Tutorials that become reference dumps.
- How-to guides that explain concepts for pages before giving steps.
- Reference pages that hide required options in prose.
- Marketing claims without concrete examples.
- Docs that mention local private paths, personal tooling, or AI attribution in public artifacts.
- Publishing generated HTML as source unless the project explicitly owns HTML docs that way.

*** Acceptance criteria for building the skill

- [ ] Directory =create-documentation/= with =SKILL.md=.
- [ ] Frontmatter description includes positive and negative triggers.
- [ ] Skill body includes the V1 phases above.
- [ ] Includes a source-format decision table for =.md= / =.org= / =.html= / generated spec/reference.
- [ ] Includes doc-type classifier based on Diataxis plus README/runbook/API additions.
- [ ] Includes examples/skeletons for README, tutorial, how-to, reference, explanation, runbook, troubleshooting, contributor docs, and API overview.
- [ ] Includes audit checklist for existing repos.
- [ ] Includes publishing guidance without hardcoding one static-site tool.
- [ ] Includes verification checklist and "unable to verify" reporting.
- [ ] Cross-references =arch-document=, =brainstorm=, =security-check=, =playwright-js=, =playwright-py=, and =codify=.
- [ ] Adds =references/= only if needed; suggested files:
  - =references/doc-type-decision.md=
  - =references/style-guide.md=
  - =references/format-decision.md=
  - =references/page-skeletons.md=
  - =references/doc-audit-checklist.md=
- [ ] Keep =SKILL.md= concise enough to load; move long skeletons/checklists to references for progressive disclosure.
- [ ] Run =./scripts/lint.sh= after adding the skill.

*** Open design questions before implementation

- Should the user-facing command be exactly =/create-documentation= while internal helper names use =documentation-*=, or should all names share the =create-documentation = form? Recommendation: one skill with subcommands for v1.
- Should Markdown be the hard default for team docs? Recommendation: yes, unless the project already uses org/reST/AsciiDoc or the output is personal Emacs-native planning.
- Should the skill create a docs site automatically? Recommendation: no. It should propose a site when the doc set exceeds README-scale or when search, versioning, or public publishing is required. Ask before adding tooling.
- Should it write docs before code exists? Recommendation: yes for specs, user journeys, and design docs, but route unclear feature/product decisions through =brainstorm= or =arch-design= first.
- Should it include LLM-specific docs surfaces? Recommendation: optional for public/library/API docs: =llms.txt= or markdown export is valuable, but normal human navigation remains primary.

** TODO [#C] Build =/update-skills= skill for keeping forks in sync with upstream :feature:
:PROPERTIES:
:LAST_REVIEWED: 2026-05-28
:END:

The rulesets repo has a growing set of forks (=arch-decide= from wshobson/agents, =playwright-js= from lackeyjb/playwright-skill, =playwright-py= from anthropics/skills/webapp-testing). Over time, upstream releases fixes, new templates, or scope expansions that we'd want to pull in without losing our local modifications. A skill should handle this deliberately rather than by manual re-cloning.
*** 2026-05-16 Sat @ 01:14:40 -0500 Specification
#+begin_src cj: comment
write the specification here.
#+end_src

*** 2026-05-16 Sat @ 01:14:20 -0500 original goals and decisions

**** Design decisions (agreed)

- *Upstream tracking:* per-fork manifest =.skill-upstream= (YAML or JSON):
  - =url= (GitHub URL)
  - =ref= (branch or tag)
  - =subpath= (path inside the upstream repo when it's a monorepo)
  - =last_synced_commit= (updated on successful sync)
- *Local modifications:* 3-way merge. Requires a pristine baseline snapshot of the upstream-at-time-of-fork. Store under =.skill-upstream/baseline/= or similar; committed to the rulesets repo so the merge base is reproducible.
- *Apply changes:* skill edits files directly with per-file confirmation.
- *Conflict policy:* per-hunk prompt inside the skill. When a 3-way merge produces a conflict, the skill walks each conflicting hunk and asks Craig: keep-local / take-upstream / both / skip. Editor-independent; works on machines where Emacs isn't available.

Fallback when baseline is missing or corrupt (can't run 3-way merge): write =.local=, =.upstream=, =.baseline= files side-by-side and surface as manual review.
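The 3-way decision above depends on comparing each file against the stored baseline before any merge runs. The sketch below shows that comparison under stated assumptions: =classify=, =read_or_none=, and =classify_file= are hypothetical names, comparison is byte-for-byte, and a missing file is represented as =None=. The four labels are the ones this task uses for its file-level classification.

```python
from pathlib import Path
from typing import Optional


def classify(base: Optional[bytes], local: Optional[bytes],
             upstream: Optional[bytes]) -> str:
    """Classify one file by which side differs from the baseline snapshot."""
    local_changed = local != base
    upstream_changed = upstream != base
    if local_changed and upstream_changed:
        # Includes the case where both sides made the identical change;
        # a real 3-way merge would resolve that one cleanly.
        return "both-changed"
    if upstream_changed:
        return "upstream-only"
    if local_changed:
        return "local-only"
    return "unchanged"


def read_or_none(path: Path) -> Optional[bytes]:
    """Return the file's bytes, or None when the file does not exist."""
    return path.read_bytes() if path.is_file() else None


def classify_file(rel: str, baseline_root: Path, local_root: Path,
                  upstream_root: Path) -> str:
    """Classify the file at relative path `rel` across the three trees."""
    base, local, upstream = (
        read_or_none(root / rel)
        for root in (baseline_root, local_root, upstream_root)
    )
    return classify(base, local, upstream)
```

Only files that come back =both-changed= need the merge and the per-hunk prompt; =upstream-only= files can be offered as straight replacements and =local-only= files left alone.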
**** V1 Scope - [ ] Skill at =~/code/rulesets/update-skills/= - [ ] Discovery: scan sibling skill dirs for =.skill-upstream= manifests - [ ] Helper script (bash or python) to: - Clone each upstream at =ref= shallowly into =/tmp/= - Compare current skill state vs latest upstream vs stored baseline - Classify each file: =unchanged= / =upstream-only= / =local-only= / =both-changed= - For =both-changed=: run =git merge-file --stdout =; if clean, write result directly; if conflicts, parse the conflict-marker output and feed each hunk into the per-hunk prompt loop - [ ] Per-hunk prompt loop: - Show base / local / upstream side-by-side for each conflicting hunk - Ask: keep-local / take-upstream / both (concatenate) / skip (leave marker) - Assemble resolved hunks into the final file content - [ ] Per-fork summary output with file-level classification table - [ ] Per-file confirmation flow (yes / no / show-diff) BEFORE per-hunk loop - [ ] On successful sync: update =last_synced_commit= in the manifest - [ ] =--dry-run= to preview without writing **** V2+ (deferred) - [ ] Track upstream *releases* (tags) not just branches, so skill can propose "upgrade from v1.2 to v1.3" with release notes pulled in - [ ] Generate patch files as an alternative apply method (for users who prefer =git apply= / =patch= over in-place edits) - [ ] Non-interactive mode (=--non-interactive= / CI): skip conflict resolution, emit side-by-side files for later manual review - [ ] Auto-run on a schedule via Claude Code background agent - [ ] Summary of aggregate upstream activity across all forks (which forks have upstream changes waiting, which don't) - [ ] Optional editor integration: on machines with Emacs, offer =M-x smerge-ediff= as an alternate path for users who prefer ediff over per-hunk prompts **** Initial forks to enumerate (for manifest bootstrap) - [ ] =arch-decide= → =wshobson/agents= :: =plugins/documentation-generation/skills/architecture-decision-records= :: MIT - [ ] =playwright-js= → 
=lackeyjb/playwright-skill= :: =skills/playwright-skill= :: MIT - [ ] =playwright-py= → =anthropics/skills= :: =skills/webapp-testing= :: Apache-2.0 **** Open questions - [ ] What happens when upstream *renames* a file we fork? Skill would see "file gone from upstream, still present locally" — drop, keep, or prompt? - [ ] What happens when upstream splits into multiple forks (e.g., a plugin reshuffles its structure)? Probably out of scope for v1; manual migration. - [ ] Rate-limit / offline mode: if GitHub is unreachable, should skill fail or degrade gracefully? Likely degrade; print warning per fork. ** TODO [#C] Build /research-writer — clean-room synthesis for research-backed long-form :feature: :PROPERTIES: :LAST_REVIEWED: 2026-05-28 :END: Gap in current rulesets: between =brainstorm= (idea refinement → design doc) and =arch-document= (arc42 technical docs), there's no skill for research-backed long-form prose — blog posts, essays, white papers, proposals with data backing, article-length content with citations. Craig writes documents across many contexts (defense-contractor work, personal, technical, proposals). The gap is real. *Evaluated 2026-04-19:* ComposioHQ/awesome-claude-skills has a =content-research-writer= skill (540 lines, 14 KB) that attempts this. 
*Not adopting:* - Parent repo has no LICENSE file — reuse legally ambiguous - Bloated: 540 lines of prose-scaffolding with no tooling - No citation-style enforcement (APA/Chicago/IEEE/MLA) - No source-quality heuristics (primary vs secondary, peer-review, recency) - Fictional example citations in the skill itself (models the hallucination failure mode a citation-focused skill should prevent) - No citation-verification step - Overlaps with =humanizer= at polish with no composition guidance *Patterns worth lifting clean-room (from their better parts):* - Folder convention =~/writing//= with =outline.md=, =research.md=, versioned drafts, =sources/= - Section-by-section feedback loop (outline validated → per-section research validated → per-section draft validated) - Hook alternatives pattern (generate three hook variants with rationale) *Additions for the clean-room version (v1):* - Citation-style selection (APA / Chicago / MLA / IEEE / custom) with style-specific examples and a pick-one step up front - Source-quality heuristics: primary > secondary; peer-reviewed; recency thresholds by domain; publisher reputation; funding transparency - Citation-verification discipline: fetch real sources, never fabricate, mark unverifiable claims with =[citation needed]= rather than inventing - Composition hand-off to =/humanizer= at the polish stage - Classification awareness: if the working directory or context signals defense / regulated territory, flag any sentence that might touch CUI or classified material before emission *Target:* ~150-200 lines, clean-room per blanket policy. *When to build:* wait for a real research-writing task to validate the design against actual document patterns. Building preemptively risks tuning for my guess at Craig's workflow rather than his real one. 
Triggers that would prompt "let's build it now":
- Starting a white paper / proposal that needs citation discipline
- Writing a technical blog post with external references
- A pattern of hitting the same research-writing friction 3+ times

Upstream reference (do not vendor): ComposioHQ/awesome-claude-skills =content-research-writer/SKILL.md=.

** TODO [#C] Try Skill Seekers on a real DeepSat docs-briefing need :chore:
:PROPERTIES:
:LAST_REVIEWED: 2026-05-28
:END:
=Skill Seekers= ([[https://github.com/yusufkaraaslan/Skill_Seekers]]) is a Python CLI + MCP server that ingests 18 source types (docs sites, PDFs, GitHub repos, YouTube videos, Confluence, Notion, OpenAPI specs, etc.) and exports to 20+ AI targets including Claude skills. MIT licensed, 12.9k stars, active as of 2026-04-12.

*Evaluated 2026-04-19: not adopted for rulesets.* Generates *reference-style* skills (encyclopedic dumps of scraped source material), not *operational* skills (opinionated how-we-do-things content). Doesn't fit the rulesets curation pattern.

*Next-trigger experiment (this TODO):* the next time a DeepSat task needs Claude briefed deeply on a specific library, API, or docs site, try:

#+begin_src bash
pip install skill-seekers
skill-seekers create --target claude
#+end_src

Measure output quality vs hand-curated briefing. If usable, consider installing as a persistent tool. If output is bloated / under-structured, discard and stick with hand briefing.

*Candidate first experiments (pick one from an actual need, don't invent):*
- A Django ORM reference skill scoped to the version DeepSat pins
- An OpenAPI-to-skill conversion for a partner-vendor API
- A React hooks reference skill for the frontend team's current patterns
- A specific AWS service's docs (e.g. GovCloud-flavored)

*Patterns worth borrowing into rulesets even without adopting the tool:*
- Enhancement-via-agent pipeline (scrape raw → LLM pass → structured SKILL.md). Applicable if we ever build internal-docs-to-skill tooling.
- Multi-target export abstraction (one knowledge extraction → many output formats). Clean design for any future multi-AI-tool workflow. *Concerns to verify on actual use:* - =LICENSE= has an unfilled =[Your Name/Username]= placeholder (MIT is unambiguous, but sloppy for a 12k-star project) - Default branch is =development=, not =main= — pin with care - Heavy commercialization signals (website at skillseekersweb.com, Trendshift promo, branded badges) — license might shift later; watch - Companion =skill-seekers-configs= community repo has only 8 stars despite main's 12.9k — ecosystem thinner than headline adoption ** TODO [#C] Revisit =c4-*= rename if a second notation skill ships :chore: :PROPERTIES: :LAST_REVIEWED: 2026-05-28 :END: Current naming keeps =c4-analyze= and =c4-diagram= as-is (framework prefix encodes the notation; "C4" is a discoverable brand). Suite membership is surfaced via the description footer, not the name. If a second notation-specific skill ever lands (=uml-*=, =erd-*=, =arc42-*=), the compound pattern =arch-analyze-= / =arch-diagram-= starts paying off: alphabetical clustering under 'a' amortizes across three+ skills, and the hierarchy becomes regular. At that point, rename all notation skills together in one pass. Trigger: adding skill #2 in the notation family. Don't pre-rename. Candidate future notation skills (not yet in scope — noted for when a real need arrives, not pre-emptively): - *UML* (Unified Modeling Language): OO design notation, 14 diagram types in practice dominated by class / sequence / state / component. Common in DoD / safety-critical / enterprise-architecture contexts. Tooling: PlantUML (text-to-diagram), Mermaid UML, draw.io. Would likely split into =uml-class=, =uml-sequence=, =uml-state= rather than one monolith — different audiences, different inputs. - *ERD* (Entity-Relationship Diagram): database schema modeling — entities, attributes, cardinality. 
Crow's Foot notation dominates practice; Chen is academic; IDEF1X is DoD-standard. Tooling: dbdiagram.io, Mermaid ERD, PlantUML, ERAlchemy (code-to-ERD for SQL). Natural fit as =erd-analyze= (extract from schema/migrations) and =erd-diagram= (generate from prose/model definitions). - *arc42*: already partially covered by =arch-document= (which emits arc42-structured docs). A standalone =arc42-*= skill would be redundant unless the arc42-specific visualizations need separation. Each answers a different question: - C4 → "What systems exist and how do they talk, at what zoom?" - UML class/sequence → "What does the code look like / what happens when X runs?" - ERD → "What's the database shape?" - arc42 → "What's the full architecture document?" Deferred pending an actual need that's blocked on not having one of these. *** DoD-specific notations (DeepSat context) Defense-contractor work uses a narrower, different notation set than commercial software. Document the trigger conditions and starting point so a future decision to build doesn't have to re-derive the landscape. **** SysML (Systems Modeling Language) UML 2 profile, dominant in DoD systems engineering. Six diagrams account for ~all practical use: - *Block Definition Diagram (BDD)* — structural; like UML class but for system blocks (components, subsystems, hardware). - *Internal Block Diagram (IBD)* — parts within a block and how they connect (flow ports, interfaces). - *Requirement diagram* — unique to SysML; traces requirements to satisfying blocks. Essential in regulated environments. - *Activity diagram* — behavioral flow. - *State machine* — same shape as UML. - *Sequence diagram* — same shape as UML. SysML v1.x is in the field; v2 is emerging but not yet adopted at scale (as of 2026-04). Tooling dominated by Cameo Systems Modeler / MagicDraw and Enterprise Architect. Text-based option: PlantUML + =plantuml-sysml= (git-friendly, growing niche). 
*Candidate skills*: =sysml-bdd=, =sysml-ibd=, =sysml-requirement=, =sysml-sequence=. Three or more in this cluster triggers the =arch-*-= rename discussion from the parent entry. **** DoDAF / UAF (architecture frameworks) Not notations themselves — frameworks that specify *which* viewpoints a program must deliver. Viewpoints are rendered using UML/SysML diagrams. - *DoDAF (DoD Architecture Framework)* — legacy but still contract-required on many programs. - *UAF (Unified Architecture Framework)* — DoDAF/MODAF successor, SysML-based. Gaining adoption on newer contracts. Common required viewpoints (formal CDRL deliverables or PDR/CDR review packages): - *OV-1* — High-Level Operational Concept Graphic. The "cartoon" showing the system in operational context with icons, arrows, surrounding actors/environment. *Universally asked for — informal or formal.* Starting point for any DoD diagram skill. - *OV-2* — Operational resource flows (nodes and flows). - *OV-5a/b* — Operational activities. - *SV-1* — Systems interfaces. Maps closely to C4 Container. - *SV-2* — Systems resource flows. - *SV-4* — Systems functionality. - *SV-10b* — Systems state transitions. *Informal ask ("send me an architecture diagram") → OV-1 + SV-1 satisfies 90% of the time.* Formal CDRL asks specify the viewpoint set contractually. *C4 gap*: C4 is rare in DoD. C4 System Context ≈ OV-1 in intent but not in visual convention. C4 Container ≈ SV-1. Expect a mapping step or reviewer pushback if delivering C4-shaped artifacts to a DoD audience. *Candidate skills*: =dodaf-ov1=, =dodaf-sv1= first (highest-value); =uaf-viewpoint= if newer contracts require UAF. **** IDEF1X (data modeling) FIPS 184 — federal standard for data modeling. Used in classified DoD data systems, intelligence databases, and anywhere the government specifies the data model. Same shape language as Crow's Foot but with different adornments and notation conventions. 
*Rule of thumb*: classified DoD data work → IDEF1X; unclassified contractor work → Crow's Foot unless the contract specifies otherwise. *Candidate skills*: =idef1x-diagram= / =idef1x-analyze= (parallel to a future =erd-diagram= / =erd-analyze= pair). **** Tooling baseline - *Cameo Systems Modeler / MagicDraw* (Dassault) — commercial SysML dominant in DoD programs. - *Enterprise Architect (Sparx)* — widely used for UML + SysML + DoDAF. - *Rhapsody (IBM)* — SysML with code generation; strong in avionics / embedded (FACE, ARINC). - *Papyrus (Eclipse)* — open source SysML; free but clunkier. - *PlantUML + plantuml-sysml* — text-based, version-controllable. Fits a git-centric workflow better than any GUI tool. **** Highest-value starting point If DeepSat contracts regularly require architecture deliverables, the highest-ROI first skill is =dodaf-ov1= (or whatever naming convention the rename discussion lands on). OV-1 is the universal currency in briefings, proposals, and reviews; it's the one artifact that shows up in every program regardless of contract specifics. Trigger for building: an actual DoD deliverable that's blocked on not having a skill to generate or check OV-1-shaped artifacts. Don't build speculatively — defense-specific notations are narrow enough that each skill should be driven by a concrete contract need, not aspiration. ** TODO [#C] Token-rotation helper for =@a-bonus/google-docs-mcp= OAuth refresh :feature:quick: :PROPERTIES: :LAST_REVIEWED: 2026-05-28 :END: When a Google refresh token gets revoked (re-grant scopes, removed Connected App, account password reset), recovery is currently manual: run =npx -y @a-bonus/google-docs-mcp= with the right env, follow the URL in a browser, kill the process, base64-encode the new =token.json=, decrypt =secrets.env.gpg=, replace the var, re-encrypt. A small =mcp/refresh-google-docs-token.sh = would chain that into one command. 
*** Sketch
#+begin_src bash
#!/usr/bin/env bash
# usage: mcp/refresh-google-docs-token.sh personal
profile="$1"
var="GOOGLE_DOCS_${profile^^}_TOKEN_B64"
token="$HOME/.config/google-docs-mcp/$profile/token.json"
umask 077                                  # keep the plaintext temp file owner-only
tmp="$(mktemp)"; trap 'rm -f "$tmp"' EXIT  # removed even if a later step fails
# decrypt current secrets, dropping the stale token line
gpg -d ... | grep -v "$var" > "$tmp"
GOOGLE_MCP_PROFILE="$profile" npx -y @a-bonus/google-docs-mcp &
mcp_pid=$!
xdg-open  # auth URL printed by the server goes here
# wait for "$token" to land
kill "$mcp_pid"
echo "$var=$(base64 -w0 "$token")" >> "$tmp"
gpg -c --cipher-algo AES256 -o mcp/secrets.env.gpg.new "$tmp"
mv mcp/secrets.env.gpg.new mcp/secrets.env.gpg
#+end_src

The flow tonight worked but took a handful of manual steps. One script collapses it.

** TODO [#C] Decide on category-3 rule copies in the deepsat tree :chore:quick:
:PROPERTIES:
:LAST_REVIEWED: 2026-05-28
:END:
While symlinking personal-project =.claude/rules/= mirrors to the rulesets canonical on 2026-05-07, two locations didn't fit the "personal mirror → symlink" pattern and were left untouched pending judgment:
- =~/projects/work/deepsat/code/coding-rulesets/claude-rules/{testing,verification}.md=: looks like a vendored team-shared copy.
- =~/projects/work/deepsat/code/orchestration_dashboard_mvp/.claude/rules/{testing,verification}.md=: could be project-specific overrides.

For each: read the file, diff it against the rulesets canonical, and decide whether it is an intentional divergence (leave alone), stale (sync content), or should be canonicalized (replace with a symlink and accept the cross-repo dependency).

The orchestration_dashboard_mvp pair is the project where Vrezh's PR review surfaced this whole thread, so any decision there has team-visibility implications.

** TODO [#C] Audit language-specific rule files for cross-project duplication :chore:
:PROPERTIES:
:LAST_REVIEWED: 2026-05-28
:END:
The four canonical rules (=commits=, =testing=, =verification=, =subagents=) are now symlinked across the five personal-project mirrors as of 2026-05-07.
But several language-specific rule files exist in multiple project mirrors and may be duplicated or drifted: - =python-testing.md= in =~/projects/work/.claude/rules/= - =typescript-testing.md= in =~/projects/work/deepsat/code/.claude/rules/= - =elisp-testing.md= and =elisp.md= in =~/.emacs.d/=, =~/code/gloss/=, =~/code/chime/= The Elisp pair is the most suspicious — three repos using essentially the same rules. Audit: diff these across the projects, check for drift, then decide whether to canonicalize them under =~/code/rulesets/claude-rules/languages//= and symlink, or leave them as project-local. ** TODO [#C] Refactor =daily-prep.org= to delegate to =triage-intake.org= for the triage section :chore: :PROPERTIES: :LAST_REVIEWED: 2026-05-28 :END: =daily-prep.org= still does its own inline triage (Gmail × 3 accounts, Slack, Linear, GHE PRs, calendars) as part of the full prep flow. =triage-intake.org= is now a source-agnostic engine that loads =triage-intake..org= plugins (refactored 2026-05-26), so daily-prep could call the engine and consume its synthesis instead of duplicating the source-scan logic. That DRYs up a large workflow and keeps both flows in sync when sources change — a source change now lives in one plugin that both flows pick up. Scope: - Identify the sections in =daily-prep.org= that do the inline triage (the email / Slack / Linear / PR / calendar fan-out, plus the "Sources checked: ..." footer at the top of each generated prep doc). - Replace those sections with "run the =triage-intake.org= engine" and adapt the downstream sections (Heads-up, Day's Priorities, Carry-forwards) to read the engine's synthesis output rather than the inline scan results. - Verify the generated prep doc still has the same shape (Heads-up + Day's Priorities + Carry-forwards + Sources checked). 
- Reconcile source coverage: daily-prep's inline triage scans work accounts (3 Gmail, Slack, Linear, GHE PRs) that are project-specific plugins under =.ai/project-workflows/=, not general plugins. The delegation must ensure the engine loads those project plugins (Phase 0 globs both dirs) so nothing daily-prep currently scans drops out. Origin: came up while authoring =triage-intake.org= on 2026-05-11; body refreshed after the engine/plugin refactor on 2026-05-26. ** TODO [#C] Templatize =make coverage-summary= into the language bundles :feature: :PROPERTIES: :LAST_REVIEWED: 2026-05-28 :END: Borrow dotemacs's =make coverage-summary= into the language bundles. After =make coverage= writes a coverage file, =coverage-summary= prints per-unit covered/total with percentages, a unit-weighted project number, and a list of source files present on disk but missing from the coverage report. *The kernel — the only part worth building.* Weight the project number by file/module rather than by line, and count a source file absent from the report as 0% instead of omitting it. A module no test imports just doesn't appear in coverage.py or nyc output, so it silently fails to drag the number down. That missing-file detection is the value; everything else (per-file table, total) the built-in reporters already print, so don't reimplement those. *Scope Elisp-first.* Port the proven dotemacs version into the elisp bundle, prove the pattern end-to-end, then fan out. Don't open all four bundles at once. *Delivery (settled 2026-05-25).* Two rulesets-owned pieces per language: - The summary *script* ships in the bundle under =.claude/= (inside the now-gitignored tooling footprint), copied in on install and auto-fixed on drift by =sync-language-bundle.sh=, never committed by the project. - One *text file per language* holding the Makefile fragment (the =coverage-summary= target plus its =coverage= prerequisite) and a block recommending how to set up coverage for that language. 
The bundle never edits the project's own Makefile. - *New project:* install copies that file in for the project to own. - *Existing project:* sync drops the fragment into the project's =inbox/= rather than touching its Makefile — the project adopts it deliberately. *Prerequisite caveat.* The summary presumes a coverage harness exists (undercover, coverage.py, nyc, =go cover=). Several bundles may have no =make coverage= yet, so for those this task implies adding the harness first — or the per-language file documents it as a prereq. Per-language parser (the script is ~40 lines over each tool's output): - Elisp: undercover SimpleCov JSON (=.coverage/simplecov.json=) — dotemacs/auto-dim scripts already parse this. - Go: =go test -coverprofile=cover.out=; parse =cover.out= (simple text), or lean on =go tool cover -func=. - Python: =coverage json= per-file JSON, or lean on =coverage report=. - TypeScript/JS: nyc/Istanbul =coverage-final.json= / json-summary. Reference (dotemacs): =scripts/coverage-summary.el=, =modules/coverage-core.el=, and the =coverage= / =coverage-summary= Makefile targets. Origin: handoff from the .emacs.d session, 2026-05-25. ** TODO [#B] Cross-project pattern catalog :spec:thinking: :PROPERTIES: :LAST_REVIEWED: 2026-05-28 :END: From pearl handoffs [[file:docs/design/2026-05-27-pattern-catalog-pearl-notes.org][2026-05-27]] + [[file:docs/design/2026-05-28-pattern-catalog-no-empty-input.org][2026-05-28 follow-up]]. Meta-question: how do good patterns travel from project A to project B? Pearl shipped three worked examples worth capturing — one-prompt picker with typed prefix (pearl-pick-source), magit-transient state buttons, and "no empty input as meaningful" (none-sentinel as first candidate). Each is a small principle with wide surface area; without a catalog, every project re-derives them from scratch. 
Open design questions before any implementation:
- Catalog format: structured (one pattern per file with frontmatter) vs free-form doc
- Surfacing mechanism: agent-driven (model spots opportunity) vs human-driven (Craig grep-searches)
- Anti-patterns included or only what worked
- Intake cadence: every time one lands, or batch review
- Home: rulesets repo (agent visibility) vs Linear doc vs per-project cross-links

Pearl recommends a one-page spec (problem + design + open questions + acceptance) before implementation. Pearl available to come back for spec-review iterations.

*** 2026-05-28 Thu @ 08:12:55 -0500 Pearl shipped patterns 4-6, filed alongside the prior two
Three more pearl handoffs landed and were filed during this audit. Filed: [[file:docs/design/2026-05-28-pattern-catalog-prompt-labels-and-defaults.org][prompt-labels-and-defaults]] (patterns 4-5: label-matches-behavior, default-most-common with friction-proportional-to-consequence) and [[file:docs/design/2026-05-28-pattern-catalog-prompt-collapse.org][prompt-collapse]] (pattern 6: collapse N orthogonal prompts into one enriched prompt).

The catalog's evidence base is now four pearl notes in =docs/design/= covering six patterns plus the synthesizing principle Pearl articulated: "choices on screen, accurately labeled, ordered by what the user most often wants, friction sized to the cost of being wrong."

** TODO [#C] Generic agent runtime support — Codex spec v0 :spec:design:
:PROPERTIES:
:LAST_REVIEWED: 2026-05-28
:END:
Codex drafted a v0 design doc for making rulesets runtime-neutral rather than Claude-Code-specific. Motivating cases: offline operation with a local LLM, and two LLMs running in the same project at the same time without trampling each other's session-context.

Spec at [[file:docs/design/2026-05-28-generic-agent-runtime-spec.org]] (moved here from inbox on intake).

Immediate correctness issue Codex flagged: the singleton .ai/session-context.org is unsafe under simultaneous agents. Codex recommends starting with Phase 1 only: add AI_AGENT_ID + session-context.d/.org without renaming the rest.

Broader refactor proposes runtimes/ adapter manifests, generic install commands, language-bundle split (common/ + runtimes//), launcher refactor, local model service via llama.cpp/ollama. Big surface area, six phases.

Before any implementation: needs a real review pass on the spec, and a decision on whether to do Phase 1 alone (low risk, fixes the race) vs commit to the larger arc.

** TODO [#B] Codex Phase 1 — AI_AGENT_ID + session-context.d/.org :feature:
:PROPERTIES:
:CREATED: [2026-05-28 Thu]
:LAST_REVIEWED: 2026-05-28
:END:
Lifted from the broader codex runtime spec ([[file:docs/design/2026-05-28-generic-agent-runtime-spec.org]]) as the immediate-correctness slice independent of the larger arc. The singleton =.ai/session-context.org= is unsafe under simultaneous agents: two LLMs running in the same project at the same time would overwrite each other's session state.

Scope: introduce an =AI_AGENT_ID= environment variable and split the single =session-context.org= into a per-agent =session-context.d/.org= directory. No other phases of the runtime refactor are in this task. Keep the surface small, fix the race, ship.

Touches: =.ai/protocols.org= (rename rule + recovery anchor), =.ai/workflows/startup.org= (Phase A check), wrap-up workflow (rename target), per-project session record discoverability.

Verification: simulate two agents sharing a project (separate AI_AGENT_ID values) and confirm session-context writes land in distinct files without interleaving.

Parent: see [[#16 Generic agent runtime support][Generic agent runtime support — Codex spec v0]] above for the larger arc this is sliced from.
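A sketch of how a script might resolve the per-agent file, assuming the file is named after the =AI_AGENT_ID= value. The helper name =session_context_path=, the =default= fallback for an unset variable, and the character whitelist are all assumptions for illustration; the spec may settle these differently.

#+begin_src bash
# Hypothetical helper: print the session-context file for the current agent.
# Assumes the per-agent file is named after AI_AGENT_ID (not confirmed by the spec).
session_context_path() {
  local root="${1:-.ai}"
  local agent_id="${AI_AGENT_ID:-default}"
  # Replace anything outside a conservative whitelist so an ID containing "/"
  # cannot point outside session-context.d/.
  agent_id="$(printf '%s' "$agent_id" | tr -c 'A-Za-z0-9._-' '_')"
  printf '%s/session-context.d/%s.org\n' "$root" "$agent_id"
}
#+end_src

Two shells that export different =AI_AGENT_ID= values resolve to different paths under the same project root, which is the property the verification step is meant to confirm.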
** TODO [#B] Add Signal MCP server (rymurr/signal-mcp) :feature: SCHEDULED: <2026-05-29 Fri> :PROPERTIES: :CREATED: [2026-05-29 Fri] :LAST_REVIEWED: 2026-05-29 :END: Install [[https://github.com/rymurr/signal-mcp][rymurr/signal-mcp]] so Claude can call =send_message_to_user=, =send_message_to_group=, and =receive_message= natively rather than shelling out to the =page-signal= wrapper. Python, MCP framework, depends on =signal-cli= being configured locally. Two-way capability is the differentiator over the CLI: =receive_message= lets the agent listen for replies on the phone, enabling page-as-confirm flows, "should I proceed?" loops over Signal, and structured Q&A across devices. *** Dependency This depends on the Google Voice account being registered with =signal-cli= first. Sending from Craig's primary number to itself doesn't notify (Signal treats it as one account on linked devices). The MCP server takes =--user-id= at startup, one account per instance, so it has to point at the GV account, with the primary as the per-send recipient. If GV registration is still pending when this task runs, block here and surface that. *** Implementation - =mcp/servers.json= — add =signal-mcp= entry under stdio transport (=command=, =args=, optional =env= for the user-id pointer). - =mcp/README.org= — document the signal-cli + GV-registration dependency and the user-id pattern. - =mcp/secrets.env.gpg= — only if the MCP server's user-id needs to be encrypted (probably not; the GV number isn't a secret beyond being personal). - Verify: =make install-mcp= followed by =make check-mcp= shows =signal-mcp ok=; smoke-test via a Claude tool call sending a message + waiting on =receive_message=. *** Why this matters =page-signal= is the fast path (a hook, a script, a make recipe can call it without an MCP round-trip). The MCP server is the smart path. When Claude wants to send and then *react to the reply*, the CLI can't do that — only the MCP server can. 
The two complement each other; this task adds the second half. ** TODO [#C] Build Craig's writing voice profile from real corpora :spec: :PROPERTIES: :CREATED: [2026-05-29 Fri] :LAST_REVIEWED: 2026-05-29 :END: Build a grounded profile of Craig's actual writing voice by mining the corpora he's produced over time. The =voice/SKILL.md= patterns today are observation-derived (em-dash zero-tolerance, semicolon → period, contractions kept, sentence-fragment rewrite, felt-experience cut, etc.). Some are spot-on; others are intuition. A real corpus pass would tell us which patterns are genuinely Craig's voice and which were guesses, plus surface idioms, sentence structures, and vocabulary the current ruleset misses. *** Sources to mine - *Email* — sent folders across all three accounts (=gmail=, =dmail/DeepSat=, =cmail/Proton=). Filter to Craig-authored (not forwards or replies-just-quoting). Separate work voice (=dmail=) from personal voice (=gmail=, =cmail=) since they're likely distinct registers. - *Commit messages* — =git log --author= across his repos. Captures terse-imperative voice. - *PR descriptions and review comments* — same corpora. More deliberate prose than commits. - *Org files he authored* — =notes.org=, todo bodies he typed, design docs in =docs/design/=, journal entries. Heavier on first-person voice than emails. - *Slack/messages* — DeepSat work slack, family group, friends. Casual register. - *Long-form artifacts* — résumé, proposals, white papers, blog posts (if any). Skip session-context files, which are Claude-co-written and would muddy the signal. *** Output - =voice/references/voice-profile.org= (or =.md=) — the canonical reference doc: - Vocabulary tendencies (preferred verbs, avoided cliché classes, technical-vs-plain word choice). - Sentence structures (typical length, conjunction patterns, parenthetical use). - Punctuation patterns (em-dash actual frequency, semicolon vs period split, contraction rate). 
- Register markers (signs of formal vs casual mode, work vs personal). - Idioms and recurring phrasings. - "Anti-patterns" — phrasings Craig consistently avoids that show up in AI-generated prose. - Updated =voice/SKILL.md= patterns grounded in evidence rather than intuition. Patterns that the corpus confirms get strengthened; patterns the corpus contradicts get rewritten or removed. Each finding should cite at least two evidence samples from the corpora so the basis for a rule is reviewable. *** Approach Phase 1 (corpus assembly) — pull the relevant slices: sent-mail dumps, =git log --author --no-merges --pretty=format:'%B'=, =gh pr list --author= bodies, org-file extracts. Strip headers, replies-quoted blocks, signatures. Land in =voice/corpus/= (gitignored if the project's =.ai/= is gitignored, tracked if private repo with private remote). Phase 2 (analysis) — pass over the corpus with focused queries: distribution of em-dashes per 1000 words, semicolon count, contraction frequency by register, sentence-length histogram, top-N adjectives/adverbs, etc. Subagent dispatch fits here. Phase 3 (draft profile) — write =voice-profile.org= with findings + evidence. Surface contradictions with the current ruleset. Phase 4 (reconcile with voice/SKILL.md) — present the deltas to Craig. Each delta is one of: confirm existing rule with evidence, strengthen rule, weaken rule, add new pattern, remove unsupported pattern. Apply approved deltas. *** Privacy Email and Slack content is private. The corpus must NOT enter any commit unless rulesets stays on the private cjennings.net remote (which it does today). If a future move to a public remote is on the table, the corpus and any direct quotes have to go before that happens. The profile doc itself can stay (it's analysis, not raw content), but cite by pattern not by verbatim quote. 
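Aggregate counts are one way to cite by pattern without quoting. A minimal sketch of the Phase 2 em-dash metric, assuming one plain-text UTF-8 corpus file per call; the function name =dash_rate= is hypothetical and the word count is whatever =wc -w= reports.

#+begin_src bash
# Hypothetical helper: print em-dashes per 1000 words for the file in $1.
dash_rate() {
  local file="$1" emdash words dashes
  emdash="$(printf '\342\200\224')"          # UTF-8 bytes of U+2014
  words="$(wc -w < "$file")"
  dashes="$(grep -o "$emdash" "$file" | wc -l)"
  awk -v d="$dashes" -v w="$words" 'BEGIN {
    if (w == 0) print "0.0"                  # empty file: avoid dividing by zero
    else printf "%.1f\n", d * 1000 / w
  }'
}
#+end_src

Running it separately over the work and personal slices would give the per-register split the Approach asks for. Only the numbers need to land in the profile doc.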
*** Why this matters The voice skill earns its place when Craig sees the rewrite and recognizes it as his own voice rather than a "clean" AI voice that approximates him. Today the skill catches common AI tells (em-dashes, semicolons, the felt-experience tic), which is useful. Corpus-grounding would make it catch the absence of *Craig-specific positive traits* — the phrasings he actually reaches for — not just the AI traits he doesn't. Likely improves =/voice personal= output quality on PR bodies, commit messages, and email drafts. Compound interest over the long run. ** TODO [#C] Enumerate implementation tasks in =spec-review.org= Phase 6 :feature:solo: :PROPERTIES: :CREATED: [2026-05-28 Thu] :LAST_REVIEWED: 2026-05-28 :END: From pearl handoff 2026-05-28. =spec-review.org= Phase 6 currently says "log deferred work to =todo.org=: v1 implementation = [#B] ... vNext/someday = [#D]." That covers deferred and v1 in passing but doesn't lift the spec's =Implementation phases= section into a drop-in =todo.org= block. Proposed addition to Phase 6: a structured step that reads the spec's =Implementation phases= section and produces a =[#B] TODO= entry per phase (subject line, tags, one-line body, pointer back to spec), plus a final entry for the test surface (unit / integration / e2e / manual-verify mirroring the spec's =Acceptance criteria= when present). Emit under a new section "Implementation tasks (drop-in for todo.org)" in the review file. Format follows =todo-format.md= (terse heading, body holds context, tags on heading). Three wins: handoff is one paste not a re-read; forces specs to be implementable in pieces (a spec without a phase decomposition fails this step, surfacing the shape problem); closes the loop on =Acceptance criteria= as manual-verify entries. If the spec lacks an =Implementation phases= section, the step is the prompt to ask the author to add one before =Ready=. 
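A rough sketch of the proposed Phase 6 step, assuming the spec is an Org file whose phases are the direct child headings of a heading titled exactly =Implementation phases=. The function name =phases_to_todos=, the =:feature:= tag, and the two-line entry shape are assumptions, not the settled format.

#+begin_src bash
# Hypothetical helper: emit one [#B] TODO per direct child heading of
# "Implementation phases" in the Org spec given as $1.
phases_to_todos() {
  local spec="$1"
  awk -v spec="$spec" '
    /^\*+ / {
      n = index($0, " ") - 1                 # heading depth = number of stars
      title = substr($0, n + 2)
      if (title == "Implementation phases") { depth = n; next }
      if (depth && n <= depth) depth = 0     # a sibling or parent heading ends the section
      if (depth && n == depth + 1) {
        printf "** TODO [#B] %s :feature:\n", title
        printf "Spec: [[file:%s]]\n", spec   # pointer back to the spec
      }
    }
  ' "$spec"
}
#+end_src

A spec with no =Implementation phases= heading produces no output, which is the signal to ask the author for a phase decomposition before =Ready=.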
** TODO [#C] Add =.aiignore= for agent inventory exclusions :chore: :PROPERTIES: :CREATED: [2026-05-28 Thu] :LAST_REVIEWED: 2026-05-28 :END: From the codex enhancement backlog (item #8). Filesystem scans by agents and helper scripts pick up =node_modules=, =__pycache__=, =.pytest_cache=, lockfiles, generated OAuth artifacts, and test caches, even when those are gitignored. Token waste during exploration and skewed project summaries. Scope: add a shared =.aiignore= file (or =rulesets-ignore.json= if a more structured format helps) listing default exclusions. Teach the scripts that walk the project (=audit.sh=, =diff-lang.sh=, =sync-language-bundle.sh=, future =catalog= work if any) to honor it. Document in =protocols.org= so agents know to consult it before naive recursive reads. Keep the lockfile policy explicit: ignored when a local skill dependency cache, tracked when reproducibility matters. ** TODO [#C] Workflow test harness — drift + integrity tests :feature: :PROPERTIES: :CREATED: [2026-05-28 Thu] :LAST_REVIEWED: 2026-05-28 :END: From the codex enhancement backlog (item #10). Startup's drift check catches index-vs-directory mismatches but not deeper integrity: a workflow that references a script that's been renamed, a plugin whose parent engine has been deleted, a required section missing from a newly-added workflow. Scope: add =scripts/tests/workflow-integrity.bats= (or pytest equivalent) verifying: - Every =.org= file in =.ai/workflows/= is either indexed in =INDEX.org= or classifiable as a source plugin under an indexed engine. - Every indexed workflow file actually exists. - Every =file:= or shell-command reference inside a workflow to a script under =.ai/scripts/= or =scripts/= resolves to an existing file. - Every source plugin maps to a parent workflow that exists and is indexed. - Required sections (Overview, When to Use, the workflow's main phases) are present in each workflow. 
- Workflow trigger phrases are unique enough to route — no two workflows claim the same exact trigger. Wire into =make test=. Run on the canonical =claude-templates/.ai/workflows/= as the source of truth. ** TODO [#C] Token-tier pilot on largest workflows :feature: :PROPERTIES: :CREATED: [2026-05-28 Thu] :LAST_REVIEWED: 2026-05-28 :END: From the codex enhancement backlog (item #5), scope-limited to a pilot rather than a universal template change. Apply a standardized section structure to the largest workflow files first — =startup.org= and =triage-intake.org= are the prime candidates. Sections: - *Summary* / *Quick Contract* — one-screen purpose and outputs. - *Execution* — the steps an agent must follow. - *Reference* — examples, edge cases, rationale, old decisions. - *History* / *Design Notes* — durable context not needed every run. Teach startup/routing to read =Summary= only at routing time, then =Execution= only for the selected workflow. Other sections become opt-in. After the pilot, evaluate: did the savings show up in real session token use? Did the structure constrain the workflow expressiveness too much? If yes to savings and no to constraint, expand to the next-largest workflows. If not, document why and stop. Don't templatize universally — shorter workflows don't need tiering. * Rulesets Resolved ** DONE [#C] Fix =cj-scan= false positives on cj fences nested inside other =#+begin_*= blocks :bug: CLOSED: [2026-05-15 Fri] =cj-scan.py= was matching =#+begin_src cj:= / =#+end_src= line-by-line without awareness of enclosing block scopes. A cj fence embedded inside a =#+begin_example= block (typically when documenting what the == (for any == other than =cj:= via the more-specific cj-open regex, which is checked first), it enters a wrapper state where every line is treated as content until the matching =#+end_= closer fires. Inside a wrapper, cj fence patterns and legacy inline =cj:= lines are both suppressed. 
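A minimal sketch of the wrapper-state idea, not the actual =cj-scan.py= code. The regexes, the function name, and the separate in-fence flag are assumptions, and legacy inline =cj:= lines are left out for brevity.

#+begin_src python
import re

CJ_OPEN = re.compile(r"^\s*#\+begin_src\s+cj:", re.IGNORECASE)
WRAPPER_OPEN = re.compile(r"^\s*#\+begin_(\w+)", re.IGNORECASE)
SRC_CLOSE = re.compile(r"^\s*#\+end_src\b", re.IGNORECASE)


def find_cj_fences(lines):
    """Return 1-based line numbers of cj fence openers outside any wrapper."""
    hits = []
    wrapper = None  # block type of the enclosing wrapper, e.g. "example"
    in_cj = False
    for number, line in enumerate(lines, start=1):
        if wrapper is not None:
            # Inside a wrapper every line is content until its own closer.
            closer = rf"^\s*#\+end_{re.escape(wrapper)}\b"
            if re.match(closer, line, re.IGNORECASE):
                wrapper = None
            continue
        if in_cj:
            if SRC_CLOSE.match(line):
                in_cj = False
            continue
        if CJ_OPEN.match(line):  # the more specific regex is checked first
            hits.append(number)
            in_cj = True
            continue
        opened = WRAPPER_OPEN.match(line)
        if opened:
            wrapper = opened.group(1).lower()
    return hits
#+end_src

With this shape an unclosed wrapper suppresses everything after it rather than producing false-positive cj blocks.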
Tests: added =TestCjScanNestedFencesIgnored= (6 tests) to =claude-templates/.ai/scripts/tests/test_cj_scan.py= covering nesting inside =#+begin_example=, =#+begin_src =, and =#+begin_quote=, plus regression guards that a wrapper closes cleanly (a subsequent real cj fence is still detected) and that an unclosed wrapper doesn't silently swallow later content into false-positive cj blocks. Full =make test-scripts= equivalent (=python3 -m pytest=): 302 passed, 1 skipped, 0 failures. ** DONE [#A] Add =make doctor= — verify ~/.claude/ matches repo + settings.json :feature: A drift detector that scans =~/.claude/= and reports anything inconsistent with what the repo expects. Single-command answer to "is my machine consistent with rulesets?" *** Why this matters A 2026-05-06 sweep found =~/.claude/hooks/= didn't exist on this machine even though =settings.json= referenced =~/.claude/hooks/precompact-priorities.sh= as a PreCompact hook. Compaction would have silently failed to invoke the hook. The fix was =make install-hooks=, but the breakage was invisible until I happened to grep for it. =make doctor= run regularly (or even as part of session start) would catch this kind of drift in seconds instead of after the fact. *** Checks - Every entry in =settings.json= ="hooks"= block points at a file that exists. - Every entry in =enabledPlugins= has a matching install under =~/.claude/plugins/data/=. - Every skill in =$(SKILLS)= has a working symlink at =~/.claude/skills/=. - Every rule in =$(RULES)= has a working symlink at =~/.claude/rules/=. - Every default hook has a symlink at =~/.claude/hooks/= (warn-only — opt-out is legitimate). - =settings.json= and =.mcp.json= symlinks resolve to the rulesets versions. - =mcp/install.py= state matches =claude mcp list= (every server in =servers.json= is registered). - No dangling symlinks anywhere under =~/.claude/=. *** Output One line per check: =ok= / =WARN= / =FAIL=. Final summary: =N ok, M warnings, K failures=. 
Exit non-zero on any failure so it can ride a pre-flight check. ** DONE [#A] Build =voice= skill — combine =humanizer= with universal + personal style passes :feature: Combine =humanizer= with universal good-writing passes (Strunk & White, Orwell, Plain English) and the personal-style passes from =commits.md=. Two modes — =general= for arbitrary writing, =personal= for commits/PRs/comments — share a foundation and diverge on register. Built and shipped 2026-05-07: =voice/SKILL.md= with 39 numbered patterns walked sequentially. Patterns 1-25 carried over from humanizer, 26-31 are universal good-writing additions, 32-39 are personal-only. Migrated three callers (=commits.md=, =respond-to-cj-comments.md=, =start-work.md=). Removed the standalone =humanizer= skill since voice supersedes it. *** Why this matters Three transformations want to run together for personal-mode artifacts (commits, PR titles + bodies, PR comments) but lived in three places: =humanizer= as a skill, S&W-style universal rules nowhere (applied ad-hoc), and the personal-style passes as prose steps in =commits.md= that got re-applied by hand each time. Costs: (1) the "I forgot pass (e)" failure mode — skipping a pass without flagging is a defect but happens in practice. (2) No single-call invocation of the full transform. (3) General-mode writing (research notes, philosophy, history) got only humanizer with no universal-prose pass at all. Combining brings them under one skill with one invocation. *** Design Two modes: - *general* (default) — for arbitrary writing not bound for commit/PR/comment publishing (research notes, philosophy/history essays, emails, README prose). Runs: - humanizer (current behavior — strip AI-generated-writing fingerprints) - tier-1 universal passes (canonical good-writing rules) - the 2 personal-style passes that have no register conflict (jargon-fragment rewrite, noun-ified verbs) - *personal* — for commits, PR titles + bodies, PR comments. 
Runs general PLUS: - 8 personal-only passes (first-person rewrite, semicolons, contractions, sentence-split, felt-experience, sentence fragments, terse cut, public-artifact scope check) The 8 personal-only passes are explicitly *not* in general mode. They conflict with academic / literary / philosophical register. Forcing first-person on a Foucault essay or stripping felt-experience from a journal entry would damage the writing. *** Tier 1 universals (v1) From Strunk & White, Orwell's "Politics and the English Language", Plain English Campaign, and Garner's Modern English Usage. Each is a detection-pattern + rewrite-rule pair, mechanical enough to apply consistently across runs. - *Omit needless words* — curated phrase list (=the fact that= → =that=/=because=, =in order to= → =to=, =at this point in time= → =now=, =due to the fact that= → =because=, =for the purpose of= → =to=, =in spite of= → =despite=, etc.) - *Long word → short word* — Plain English wordlist (~150 entries: =utilize=→=use=, =commence=→=start=, =terminate=→=end=, =facilitate=→=help=, =demonstrate=→=show=, =sufficient=→=enough=, =prior to=→=before=, =subsequent to=→=after=, =in the event that=→=if=, =a great deal of=→=much=) - *Active over passive voice* — detect "to be + past-participle" patterns. Suggestion-only in v1 (auto-rewrite is risky in technical contexts where passive is appropriate); graduate to auto-rewrite for unambiguous cases in v2. - *Comma splices* — detect independent clauses joined only by comma; rewrite to period or semicolon-then-period. - *Cliché flag* — small curated list (=at the end of the day=, =moving forward=, =going forward=, =at this juncture=, =circle back=, =low-hanging fruit=, =deep dive=, =leverage= as verb). 
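The detection-pattern + rewrite-rule shape for the two wordlist passes could look like the sketch below. The table holds a handful of pairs from the lists above, the names are hypothetical, and the sketch does not preserve capitalization at sentence start.

#+begin_src python
import re

# A few pairs from the phrase lists above; the real lists are longer.
REWRITES = {
    "due to the fact that": "because",
    "at this point in time": "now",
    "in order to": "to",
    "prior to": "before",
    "utilize": "use",
}

# Longest phrases first so a long phrase wins over a shorter overlap.
_PHRASES = sorted(REWRITES, key=len, reverse=True)
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in _PHRASES) + r")\b",
    re.IGNORECASE,
)


def apply_rewrites(text):
    """Return (rewritten_text, fired), where fired lists the matched phrases."""
    fired = []

    def swap(match):
        phrase = match.group(1).lower()
        fired.append(phrase)
        return REWRITES[phrase]

    return PATTERN.sub(swap, text), fired
#+end_src

Returning the fired list alongside the text is what would feed a "which patterns fired" summary. Entries with more than one valid rewrite (=the fact that= → =that=/=because=) need a judgment step this table can't express.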
*** Tier 2 universals (v2) - *Positive over negative form* (S&W) — =not unlike= → =like=, =do not fail to= → =remember to=, =did not pay any attention= → =ignored= - *Garner-style word-pair corrections* — comprise/compose, less/fewer, that/which (restrictive vs nonrestrictive), affect/effect, principal/principle - *Parallelism in lists* — detect mismatched grammar in bullet items - *Tense consistency* — flag mid-paragraph tense shifts - *Acronym definition on first use* — detect uppercase tokens used before being expanded *** Tier 3 (v3, may not land) - *Concrete-over-abstract* preference - *Emphatic word at sentence end* (S&W rule 18) - *Vary sentence length / rhythm* - *Reading-grade-level scoring* (Hemingway-style) *** Personal-style pass placement | # | Pass | Mode | Why | |---|------|------|-----| | 1 | First-person voice rewrite | personal only | Forces "I" voice; wrong for academic prose where third-person and "we" are conventional | | 2 | Jargon-fragment → complete sentence | both | Universal clarity, no genre conflict | | 3 | Semicolon → period/comma | personal only | Semicolons are conventional in long-form / academic prose | | 4 | Contractions ("it's", "don't") | personal only | Academic and formal writing typically avoids contractions | | 5 | Sentence split on conjunctions | personal only | Foucault, Hegel, Adorno deliberately use long compound sentences | | 6 | Felt-experience narration ("I'll feel this every time") | personal only | Personal essays *use* felt-experience as content | | 7 | Noun-ified verbs ("the ask", "a learn", "the spend") | both | Targets corporate-speak with curated wordlist; doesn't catch philosophical nominalizations like "the becoming" | | 8 | Sentence fragments → complete (in prose) | personal only | Fragments are valid stylistic devices in literary prose | | 9 | Terse cut (rhetorical padding: "worth noting", "it's important to understand") | personal only | Tier 1 omit-needless-words covers the worst offenders universally; 
aggressive cut conflicts with academic register | | 10 | Public-artifact scope check (local paths, private repos, personal tooling) | personal only — *flag-only*, no auto-rewrite | Operational/safety check, not stylistic; auto-masking risks silently editing meaningful text | *** Inclusive-language pass — explicitly excluded Considered and rejected. Conflicts with planned writing on philosophy/history topics (Foucault on sexuality and gender, history of slavery in New Orleans). Wordlist substitutions would override deliberate vocabulary choices in those genres. *** V1 scope - [ ] Skill at =~/code/rulesets/voice/= with =SKILL.md= - [ ] Frontmatter with positive triggers (commit, PR, comment, "humanize", "voice pass") and negative triggers (code, structured data, plain bullet lists) - [X] Mode invocation: default = =general= when invoked bare; =personal= invoked explicitly by publish-context callers - [X] humanizer content migrated from =humanizer/= → =voice/= - [X] Tier 1 universal passes implemented (5 patterns: #26-30, plus #31 noun-ified verbs as a universal personal addition) - [X] 2 personal passes that run in both modes (#30 jargon-fragment, #31 noun-ified verbs) - [X] 8 personal passes that run in personal mode only (#32 first-person, #33 semicolons, #34 contractions, #35 sentence-split, #36 felt-experience, #37 fragments, #38 terse cut, #39 scope check) - [X] Each pass = detection-pattern + rewrite-rule pair (#39 is detection + flag-only) - [X] Total v1 pattern count: 31 in general mode (humanizer's 25 + 4 tier-1 + 2 universal personal); +8 personal-only = 39 in personal mode - [X] Update =commits.md= to invoke =/voice personal= instead of "run =humanizer= and apply five passes manually" - [X] Remove the existing =humanizer/= skill (no callers outside this repo, all migrated) - [X] =make doctor= still passes - [X] =make lint= clean *** v2 (deferred) - [ ] Tier 2 universals (positive form, word-pair corrections, parallelism, tense consistency, acronym 
definition) - [ ] Per-pass severity flags for Tier 1 active-voice (suggestion-only when actor is implicit; auto-rewrite when actor is named) - [ ] Reporting mode: list which passes fired and which were no-ops *** v3 (aspirational, may not land) - [ ] Tier 3 (concrete-over-abstract, emphatic-word position, sentence-length variation, reading-grade scoring) - [ ] Progressive disclosure split: =voice/SKILL.md= orchestrator + =voice/passes/.md= per pass with worked examples *** Migration (resolved) Decision: deleted =humanizer/= entirely. Three callers (=commits.md=, =respond-to-cj-comments.md=, =start-work.md=) all updated to invoke =/voice= directly. No alias needed since nothing outside the repo invoked humanizer. *** Naming alternatives considered - =voice= — chosen. Captures both modes; broad enough. - =polish= — descriptive of multi-pass nature; less prescriptive about whose voice. - =house-style= — signals "this is the house style"; appropriate for personal repo. - =commit-voice= — too narrow (passes apply to research notes, emails, etc. in general mode). - =humanize= (extending current) — undersells the universal + personal additions. *** Open questions before implementation Resolved during implementation: - Default mode when =/voice= is invoked bare: =general=. Personal-context callers (=commits.md= publish flow, =respond-to-cj-comments.md=) invoke =/voice personal= explicitly. Avoids accidentally first-person-ifying research notes. - Reporting: skill prints "Summary of changes" listing which patterns fired (audit value). - Public-artifact scope check (#39): flag-only, user resolves manually. Blocking would frustrate on legitimate path mentions. - Tier 1 active-voice detection: suggestion-only in v1. Auto-rewrite for unambiguous cases deferred to v2. 
** DONE [#B] Add =--archive-done= mode to =.ai/scripts/todo-cleanup.el= :feature: Opt-in mode that moves every level-2 subtree whose TODO state is DONE or CANCELLED out of the "Open Work" section and into the "Resolved" section of the same org file, subtree intact. - *Section matching.* Key on a top-level heading containing "Open Work" and one containing "Resolved" — that pairing is the only naming consistent across projects (=Work Open Work= / =Work Resolved= here; bare =Open Work= / =Resolved= elsewhere). Require exactly one match for each; otherwise skip with a clear message, no crash. - *Modes.* =--check= previews and writes nothing, same as the existing hygiene pass. Idempotent. Not run by default in the wrap-up flow — archiving is consequential, so it stays opt-in: =emacs --batch -q -l todo-cleanup.el --archive-done FILE=. - *Edge cases.* Source or target section missing; subtree at EOF; nested DONE subtree under an open parent stays put (only level-2 entries move); nothing to move → clean no-op. - *Tests.* TDD with ERT — the project's first elisp tests. Fixtures (synthetic) under =.ai/scripts/tests/=; run via =make test= (rulesets) or =make test-scripts= (claude-templates), which run pytest + every =tests/test-*.el= ERT suite. Cases: one DONE level-2 moves; multiple; CANCELLED also moves; structural (no-state) headings don't move; nested DONE under an open parent stays; level-2 DONE with open level-3 children moves intact; subtree at EOF; missing source/target section; ambiguous "Resolved"; lowercase headings; nothing-to-do; idempotency; =--check= preview + its idempotency; realistic-sample integration. Origin: came up while scrubbing a project's todo.org on 2026-05-11 — moving a big completed PROJECT subtree (plus a few smaller ones) into the Resolved section by hand was the cue to build a reusable tool. 
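The move itself, sketched in Python for illustration. The shipped tool is elisp, so none of these names exist in =todo-cleanup.el=, and appending at the end of the Resolved section is an assumption about placement.

#+begin_src python
import re

DONE_L2 = re.compile(r"^\*\* (DONE|CANCELLED)\b")


def archive_done(lines):
    """Move level-2 DONE/CANCELLED subtrees from Open Work into Resolved."""
    tops = [i for i, line in enumerate(lines) if line.startswith("* ")]
    open_idx = [i for i in tops if "open work" in lines[i].lower()]
    res_idx = [i for i in tops if "resolved" in lines[i].lower()]
    if len(open_idx) != 1 or len(res_idx) != 1:
        return lines  # missing or ambiguous section: skip, no crash

    start = open_idx[0]
    end = next((i for i in tops if i > start), len(lines))
    kept, moved, moving = [], [], False
    for line in lines[start + 1:end]:
        if line.startswith("** "):  # only level-2 headings flip the state
            moving = bool(DONE_L2.match(line))
        (moved if moving else kept).append(line)

    body = lines[:start + 1] + kept + lines[end:]
    # Locate Resolved again, since the list is shorter now.
    res = next(i for i, line in enumerate(body)
               if line.startswith("* ") and "resolved" in line.lower())
    res_end = next((i for i in range(res + 1, len(body))
                    if body[i].startswith("* ")), len(body))
    return body[:res_end] + moved + body[res_end:]
#+end_src

Because only level-2 headings change the =moving= flag, a DONE entry carries its open level-3 children along, and a nested DONE under an open parent stays put. A second run finds nothing to move and returns the same list.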
Built and shipped 2026-05-11: =--archive-done= added to =.ai/scripts/todo-cleanup.el= test-first; 13-test ERT suite (=tests/test-todo-cleanup.el=) + realistic synthetic fixture (=tests/fixtures/todo-sample.org=), wired into =make test= / =make test-scripts= alongside pytest. The CLI dispatch moved into =tc-main= behind a guard so the suite can =require= the file without firing it. Section matching is case-insensitive and tolerates the = Open Work= / = Resolved= naming variants. Opt-in only — not wired into the wrap-up flow. Source of truth is =~/projects/claude-templates/=; rsync'd into this repo. ** DONE [#B] Encode follow-up filing rules into =/start-work= CLOSED: [2026-05-15 Fri] Phase 4 step 5 of =/start-work= ("refactor audit") says any candidate that isn't fix-now must land in one of three buckets: fold-into-related-commit, separate =refactor:= commit, or "file a ticket or todo.org entry." The third disposition doesn't say *where* — which leaves the orchestrator picking a location ad-hoc. Result: follow-ups buried under children of an epic parent get orphaned when the parent closes, or follow-ups for standalone tasks scatter across the file with no convention. Proposed placement rule (already memorized for this project as =feedback_followups_as_siblings.md=, generalizing): - *Epic-style parent task* (level-2 with multiple level-3 children) → follow-ups file as level-2 *siblings* of the parent. Stays visible after parent closure. - *Standalone task* (level-2 with no children, or a level-3 inside another structure) → follow-up files as a new level-2 top-level entry in the same =* Open Work= section. Don't nest under the originating task. Both cases: include a "Triggered by: " line so a future reader sees what surfaced it. Update =.claude/commands/start-work.md= Phase 4 step 5's "Disposition for each candidate" section to spell this out. Update any cross-references in =commits.md= or other files that touch the discipline. 
Triggered by: 2026-05-15 fold-epic session — Craig flagged the gap mid-flight after I'd surfaced a follow-up but hadn't filed it. ** DONE [#A] Consolidate =.ai/= template infrastructure (fold + audit + install-ai + ratio) :feature: CLOSED: [2026-05-15 Fri] End-state: one repo (=rulesets=) is the single source of truth for =.ai/= template content. =make audit= verifies and applies drift across every =.ai/=-using project on the machine. =make install-ai= bootstraps new projects. Same setup propagated to ratio so both machines run the same way. Today (2026-05-15) the canonical-source rule got violated again: rulesets commit =372fb76= added a wrap-up subsection to =rulesets= without going through =claude-templates= first, and the next session's startup rsync was about to silently undo it. Two-repo coordination is the root cause; fold solves it. Build order: fold first (others depend on the new canonical path), then audit + install-ai in parallel, then test, then propagate to ratio. *** DONE [#A] Fold =claude-templates= into rulesets CLOSED: [2026-05-15 Fri] Two repos, one source of truth. =~/projects/claude-templates/= is the canonical =.ai/= template that gets rsync'd into every project at session start. Keeping it standalone means a second =git pull= in startup Phase A.0, a second remote to push to at wrap-up, and a split history any time a change touches both. Folding it into =rulesets/claude-templates/= gives one repo to clone on a fresh machine and one place to edit templates. **** Open design choices - *History.* =git subtree add --prefix=claude-templates ~/projects/claude-templates main= preserves the 84-commit history under the new prefix. Plain content copy (=cp -a= + =git add=) is simpler but loses history. Either is fine since the standalone repo stays archived on =cjennings.net=. - *Layout.* =rulesets/claude-templates/= mirrors the old repo name and sits next to =claude-rules/= cleanly. 
Alternative: absorb =.ai/= directly under a different name (=rulesets/.ai-template/= or similar). First option is clearer. - *bin/ai.* The standalone Makefile symlinks =$HOME/.local/bin/ai → bin/ai=. After the move, fold that into rulesets' Makefile as another install target. **** Mechanical steps 1. Subtree-merge or copy =~/projects/claude-templates/= into =rulesets/claude-templates/=. 2. Update 3 references in rulesets: - =.ai/protocols.org= line 163 — pointer in the "Let's run/do the X workflow" section. - =.ai/workflows/cross-agent-comms.org= line 8 — promotion-target path. - =.ai/workflows/startup.org= lines 22, 96-98 — Phase A.0 pull + Phase A rsync sources. 3. Update Phase A.0 of =startup.org= to pull rulesets instead of claude-templates. Inside rulesets sessions, the existing project-repo pull already covers it. Outside rulesets (every other project's session), Phase A.0 needs an explicit =git pull= on =~/code/rulesets/= before the rsync — otherwise the templates will be stale. 4. Replace =~/projects/claude-templates/= with a symlink to =~/code/rulesets/claude-templates/= for transition continuity. 5. After every active project has had one session start (and rsync'd the new =startup.org=), drop the symlink and archive =cjennings.net:git/claude-templates.git=. **** Bootstrap gap Every project on the machine has a =.ai/workflows/startup.org= that rsyncs from =~/projects/claude-templates/=. Until each project's startup.org gets refreshed (which happens via the rsync itself), the old path needs to keep resolving. The symlink at step 4 is the bridge: old paths resolve into the new location, the rsync delivers the updated startup.org, next session uses the new path directly. *** DONE [#A] Add =make audit= — drift detector across all =.ai/=-using projects CLOSED: [2026-05-15 Fri] Companion to =make doctor= (single-machine scope, checks =~/.claude/=). 
=audit= is cross-project scope: walks every directory on the machine that has a =.ai/=, diffs the synced template files against the canonical source, and reports drift. =--apply= flag rsyncs the drift into the project's working tree (no auto-commit). Catches stale projects without forcing a session start in each one. **** Open design choices - *Scope.* Template-sync drift is the useful flavor: for each project, diff =.ai/protocols.org=, =.ai/workflows/=, =.ai/scripts/= against the canonical source. - *Source path.* Post-fold: =~/code/rulesets/claude-templates/.ai/=. Build =audit= against the new path from day one. - *Project discovery.* Walk =~/code/=, =~/projects/=, =~/.emacs.d/= up to depth 3 for any directory containing =.ai/=. Skip the canonical source itself. - *Default mode is report-only.* =--apply= triggers rsync; =--force= overrides the dirty-skip safety. **** Per-project flow (designed 2026-05-15) For each discovered project, in order: 1. Verify =.ai/= exists (path probe). If missing → =FAIL=, skip, continue loop. 2. Detect git tracking via =git check-ignore .ai/= → =tracked= or =gitignored=. 3. Verify no uncommitted =.ai/= changes (=git status --porcelain .ai/=). Dirty → =WARN=, skip rsync unless =--force=. 4. Verify content matches canonical via three =rsync -a --dry-run --itemize-changes= calls (=protocols.org=, =workflows/=, =scripts/=). Zero items = clean. 5. Action (=--apply= only, drift detected): three =rsync -a [--delete]= calls. 6. Verify rsync converged (re-run the dry-runs; zero now). 7. Verify working-tree state after rsync (tracked projects). Report deltas. Do not auto-commit. 8. Verify no unpushed =.ai/= commits (=git log @{u}..HEAD -- .ai/=). Informational only. 
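The decision part of the per-project flow, reduced to a pure function so the classification paths are easy to test. =Probe=, =classify=, and =exit_code= are hypothetical names; the probes themselves (=git status=, the rsync dry-runs) are assumed to have run already. Treating =applied= as a non-zero exit is one reading of "all clean" and is also an assumption.

#+begin_src python
from dataclasses import dataclass


@dataclass
class Probe:
    ai_exists: bool   # step 1: path probe
    dirty: bool       # step 3: git status --porcelain .ai/ printed something
    drift_items: int  # step 4: itemized lines from the three rsync dry-runs


def classify(probe, apply=False, force=False):
    """Map one project's probe results to a report status."""
    if not probe.ai_exists:
        return "FAIL"
    if probe.dirty and not force:
        return "skipped"
    if probe.drift_items == 0:
        return "ok"
    return "applied" if apply else "drift"


def exit_code(statuses):
    """0 only when every project is ok (assumption: applied counts as 1)."""
    return 0 if all(s == "ok" for s in statuses) else 1
#+end_src

Steps 6 through 8 (convergence re-check, working-tree report, unpushed commits) are informational and sit outside this sketch.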
**** Output format (mirrors =doctor=) #+begin_example Claude-templates source: ok rulesets/claude-templates is current (origin/main) Per-project .ai/ drift: ok ~/projects/work applied ~/projects/homelab 3 files changed skipped ~/code/winvm uncommitted .ai/ (use --force) ok ~/projects/clipper Summary: 18 ok, 3 applied, 1 skipped, 0 failed #+end_example Exit code: =0= if all clean, no skips, no failures. =1= otherwise. **** Why not extend =make doctor= instead =doctor= has a clean meaning today: "is this machine's =~/.claude/= consistent with rulesets?" Mixing in cross-project =.ai/= drift muddies the exit code. Keep them separate. =audit= can optionally invoke =doctor= as its last check since both ask "did the symlinks keep up with the source?". A future =make all-checks= can wrap both. *** DONE [#A] Add =make install-ai PROJECT== — bootstrap =.ai/= in a fresh project CLOSED: [2026-05-15 Fri] Separate target from =audit= because operating on projects that lack =.ai/= is a distinct action. The absence might be intentional, so =audit= skips them. Bootstrap is explicit opt-in. **** Flow 1. Refuse if =.ai/= already exists in =PROJECT=. Message: "already installed; use =make audit --apply= to update." 2. Verify =PROJECT= is a git checkout (warn if not — works without git, loses some lifecycle benefits). 3. Create =PROJECT/.ai/= directory. 4. Rsync canonical content: =protocols.org=, =workflows/=, =scripts/= (same three rsyncs as =audit=). 5. Seed =PROJECT/.ai/notes.org= from a canonical template with project-name placeholder. 6. Create empty =PROJECT/.ai/sessions/= (with =.gitkeep= for tracked projects). 7. Track or gitignore =.ai/=? Default: ask. Flag: =--track= / =--gitignore=. 8. Print next-steps banner: =make install-lang LANG= PROJECT==; open Claude Code in the project. 
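Steps 1 and 3 through 6 of the flow, sketched with the standard library in place of rsync. The function name is hypothetical, the seeded =notes.org= title line is a stand-in for the canonical template, and the git check, the track/gitignore prompt, and the banner are left out.

#+begin_src python
import shutil
from pathlib import Path


def install_ai(project, canonical, tracked=True):
    """Bootstrap PROJECT/.ai/ from the canonical template directory."""
    project, canonical = Path(project), Path(canonical)
    target = project / ".ai"
    if target.exists():
        raise FileExistsError(
            "already installed; use make audit --apply to update"
        )
    target.mkdir(parents=True)
    # Same three sources audit syncs: protocols.org, workflows/, scripts/.
    shutil.copy2(canonical / "protocols.org", target / "protocols.org")
    for sub in ("workflows", "scripts"):
        shutil.copytree(canonical / sub, target / sub)
    # Stand-in for the canonical notes template (assumption).
    (target / "notes.org").write_text(f"#+TITLE: {project.name} notes\n")
    sessions = target / "sessions"
    sessions.mkdir()
    if tracked:
        (sessions / ".gitkeep").touch()
#+end_src

Refusing up front keeps the target from clobbering a project that already has =.ai/=. Updating an existing install stays the job of =audit --apply=.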
**** Symmetry with existing install targets #+begin_example make install-lang LANG=python PROJECT=/path # language bundle (existing) make install-ai PROJECT=/path # .ai/ template (new) make install-lang # no args → fzf-pick make install-ai # no args → fzf-pick from # ~/projects/* + ~/code/* dirs # without an existing .ai/ #+end_example *** DONE [#A] Test plan for audit + install-ai before propagating to ratio CLOSED: [2026-05-15 Fri] Test against the current state of this machine before pushing changes to ratio. **** =make audit= tests 1. Dry-run report only (no =--apply=). Should show: claude-templates current; per-project drift; correct =ok=/=drift= classifications; summary line and exit code match. 2. After the fold lands, every project should be reported as drift (their =startup.org= still points at the old path). Run =--apply= → rsync converges. Re-run audit → all =ok=. 3. Manually edit one =.ai/workflows/foo.org= in a tracked project. Re-run audit → should report =skipped: uncommitted .ai/=. Run =--apply --force= → rsync clobbers the edit. Verify the edit is gone. 4. Manually delete one =.ai/= dir. Re-run audit → =FAIL: .ai/ missing=. Loop continues. 5. Idempotency: =--apply= twice in a row converges to all =ok= on the second pass. **** =make install-ai= tests 1. Create =/tmp/test-fresh-project= as a git repo. Run =make install-ai PROJECT=/tmp/test-fresh-project=. Verify =.ai/= structure matches canonical, =notes.org= has placeholder, =sessions/= exists. 2. Run =make install-ai PROJECT=/tmp/test-fresh-project= again → should refuse (=.ai/= already exists). 3. Open Claude Code in the new project. Startup workflow runs cleanly (Phase A.0 + Phase A rsync should be a no-op since the install just ran). 4. fzf form: =make install-ai= with no args. Lists candidate dirs (=~/projects/*=, =~/code/*= without =.ai/=). **** Pass criteria - =audit= behavior matches the per-project flow spec for every classification path. 
- =install-ai= produces a project indistinguishable from one that's been running sessions for a while. - =make doctor= still passes 36/0/0 after all the work. - =make test= (pytest + ERT) passes. *** DONE [#A] Migrate projects on ratio (second machine) CLOSED: [2026-05-15 Fri] After local fold + audit + install-ai are working, propagate to ratio. **** Steps 1. On ratio: =git -C ~/code/rulesets pull= — picks up the folded =claude-templates/= subdir and updated =Makefile= targets. 2. On ratio: archive or =mv= the standalone =~/projects/claude-templates/= aside, replace with symlink to =~/code/rulesets/claude-templates/= (same bridge mechanic as local). 3. On ratio: =make audit= → see drift across ratio's projects. 4. On ratio: =make audit --apply= → rsync into each tracked/gitignored project. Surface projects with uncommitted =.ai/= drift for manual handling. 5. On ratio: =make doctor= → catch any =~/.claude/= install drift (likely some, since ratio hasn't seen recent rulesets updates). 6. Verify by opening Claude Code in a few ratio projects. Startup should be a no-op or near-zero rsync. **** Known unknowns - Ratio may have its own project list overlapping with this machine's but not identical. =audit= discovers projects via the walk, so this is automatic. - Ratio might have uncommitted =.ai/= work in some projects that this machine doesn't. =audit= surfaces them; handle case-by-case. - If anything goes wrong, ratio's archived =~/projects/claude-templates/= is the safety net — restore the symlink target and re-run audit. **** Adjacent: cross-machine memory sync The =[#A] DOING= memory-sync investigation (todo.org:10) is adjacent. Both involve "make my Claude setup portable across machines." Coordinate so the memory-sync stow approach (if approved) doesn't conflict with this fold's symlink mechanics. 
** DONE [#B] Document startup pull-ordering rule in protocols.org CLOSED: [2026-05-15 Fri] Phase A.0 of =startup.org= now pulls rulesets ff-only before the project repo (shipped 2026-05-15 as part of the claude-templates fold — after the subtree merge, there's no separate claude-templates pull, just rulesets-then-project). The protocols.org paragraph stating the ordering and "resolve any issues before proceeding" rule shipped 2026-05-15 in the =** Startup Pull Ordering= subsection under =IMPORTANT - MUST DO=. ** DONE [#A] Build =/lint-org= skill + wrap-up integration CLOSED: [2026-05-14 Thu] Spec: [[file:.ai/specs/lint-org-skill-spec.md]] A two-mode skill (=interactive=, =mechanical-only=) that runs =org-lint=, auto-fixes safe categories (item-number, missing-language-in-src-block, misplaced-planning-info, markdown-bold → single-asterisk), and walks judgment items (broken local-file links, invalid fuzzy links, verbatim-asterisk false positives, suspicious-language blocks) inline. Wrap-up integration: =wrap-it-up.org= invokes =/lint-org todo.org --mode=mechanical-only= after the existing =todo-cleanup.el --archive-done= pass. Judgment items defer to a carry-forward file that the next morning's daily-prep merges in, so wrap-up never blocks on a judgment call. Baseline that motivated this: the 2026-05-14 manual pass took =todo.org= from 55 → 1 lint warnings across two commits (=0d10458= signal, =9ad5b30= cosmetic). A nightly mechanical sweep keeps the count near zero forever — each day's drift is small. ** DONE [#C] Test harness for =make audit= + =make install-ai= edge cases :test: CLOSED: [2026-05-15 Fri] Three edge cases from the fold-epic test plan were not exercised because they're destructive on real projects: - =audit --force= clobbers uncommitted =.ai/= work — needs a project with intentionally dirty =.ai/= to verify the override path. 
- =audit= reports =FAIL= when =.ai/= is missing — needs a project where the directory was deleted to verify the loop continues past the failure.
- =install-ai= fzf-pick form (no =PROJECT= arg) — needs interactive testing.

Build a self-contained test harness under =.ai/scripts/tests/= that spins up =/tmp/audit-test-projects/= with a known matrix of project states (clean, dirty, missing =.ai/=, pristine, etc.), runs the audit + install-ai targets against it, and asserts expected outputs. The harness should clean up after itself.

Pattern reference: bats or shell-based assertions (similar to the elisp ERT suites for =todo-cleanup= and =lint-org=, but for shell scripts).

Triggered by: 2026-05-15 fold-epic, child 4 test plan; commits =94782ee= (audit) + =d364cf2= (install-ai).

** DONE [#A] wrap it up mentions github, which isn't the remote for many projects. :chore:
CLOSED: [2026-05-16 Sat]
For many of them, git.cjennings.net mirrors to github.com, and github.com isn't the remote. For many others, git.cjennings.net is the remote with no mirror. Remove or replace the reference to github.com.

** DONE [#B] Phase A startup blind to =claude-templates/inbox/= post-fold :bug:fold:
CLOSED: [2026-05-19 Tue]
Resolved on inspection: the bug is moot in current state. =inbox-send.py='s discovery scans =~/code/*= and =~/projects/*= single-level only, so =claude-templates/= (two levels under =~/code/=) is never a routable target; the 2026-05-15 incident was a one-time manual workaround because =rulesets/inbox/= didn't exist yet, and that root inbox was added in =470085f=. =claude-templates/inbox/= was removed 2026-05-15 and is no longer on disk.

Phase A's inbox check at =startup.org:107= runs =\ls -la inbox/= against the project root. Post-fold, the canonical's inbox sits inside the subtree at =claude-templates/inbox/= and never gets scanned.
A 2026-05-15 cross-project handoff from a dotemacs session dropped a record there; the next rulesets session (this one) missed it at startup entirely. Picked up only when the working-tree drift surfaced during the publish flow.

Fix: extend Phase A's discovery to also scan =claude-templates/inbox/= when the canonical lives in-repo (i.e., when =claude-templates/.ai/= exists alongside =./.ai/=). The Phase B/C inbox-processing flow already handles per-file routing once a file is surfaced; the gap is only in discovery.

Adjacent question worth answering at the same time: should cross-project handoffs file into =./inbox/= at the project root (matching what Phase A already scans), or stay in =claude-templates/inbox/= and rely on the discovery fix? The =inbox-send= script's target-project logic is the place to settle that.

Triggered by: 2026-05-15 evening session, surfaced when committing the test-harness work.

** DONE [#A] Implement task-review daily-habit per spec
CLOSED: [2026-05-20 Wed]
:PROPERTIES:
:LAST_REVIEWED: 2026-05-20
:END:
Spec: [[file:docs/design/task-review.org]]

Retires =wrap-it-up.org='s date-coverage scan and replaces it with a daily list-hygiene review (N=7 oldest-unreviewed top-level =[#A]= / =[#B]= / =[#C]= tasks per session, ~12-day rotation). Built as a pure Claude workflow — Shape B, no elisp; see the spec's Revision section for why the elisp approach was dropped.

Status:
1. [X] =task-review-staleness.sh= + bats (count + =--list= modes).
2. [X] =wrap-it-up.org= health check (threshold 30).
3. [-] =task-review.el= — dropped (Shape B is a pure workflow, not an Emacs mode).
4. [X] New =task-review.org= workflow + INDEX entry (the existing listing workflow was renamed to =open-tasks.org= to free the name).
5. [X] Startup nudge in template =startup.org= (threshold 7), not the project-only startup-extras layer.
6. [X] Smoke test against live =todo.org= — first cycle run 2026-05-20 (7 tasks reviewed: 3 re-grades, 1 cancellation, 1 bump-and-tag).
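
The selection rule for this task (N=7 oldest-unreviewed tasks among =[#A]= / =[#B]= / =[#C]=) can be sketched in a few lines of Python. This illustrates the ordering only; the shipped review is a pure Claude workflow with no code behind the selection. The =Task= record, the =pick_for_review= name, and the choice to rank never-reviewed tasks ahead of dated ones are assumptions made for the example.

#+begin_src python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

REVIEWABLE = {"A", "B", "C"}  # the review covers [#A]/[#B]/[#C] only


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a top-level todo.org heading."""
    title: str
    priority: str                  # "A", "B", "C", or "D"
    last_reviewed: Optional[date]  # None when no LAST_REVIEWED property exists


def pick_for_review(tasks: List[Task], n: int = 7) -> List[Task]:
    """Return the n tasks most overdue for review.

    Assumption: a task that has never been reviewed sorts ahead of any dated task.
    """
    eligible = [t for t in tasks if t.priority in REVIEWABLE]
    # False sorts before True, so never-reviewed tasks lead; dated ones follow, oldest first.
    eligible.sort(key=lambda t: (t.last_reviewed is not None, t.last_reviewed or date.min))
    return eligible[:n]
#+end_src

Assuming each reviewed task is stamped with the current date as its =LAST_REVIEWED=, a picked task drops to the back of the queue, which is what turns the daily pick into a rotation.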
Triggered by: 2026-05-16 brainstorm on retiring the date-coverage scan.

** CANCELLED [#B] Build =ov-1= skill for DoDAF OV-1 (High-Level Operational Concept Graphic)
CLOSED: [2026-05-20 Wed]
Cancelled during the 2026-05-20 task review.

Triggered by SOFWeek (May 2026, Tampa) — DeepSat attending; DoD attendees may ask for architecture diagrams. OV-1 is the universal informal currency in DoD briefings ("show me the architecture" → OV-1 by default). Priority upgrades to =[#A]= if Craig confirms scenario 2 below (personal load-bearing need at the event); stays =[#B]= or drops to =[#C]= if scenario 1 (team already covers it, future asset only).

*** Prior art (searched 2026-04-19)
No existing Claude Code skill exists for DoDAF / OV-1 / SV-1 / SysML.
- =anthropics/skills= — 17 skills, zero DoDAF/SysML/defense coverage.
- =awesome-claude-code= list — zero hits for DoDAF/OV-1/SysML/UAF.
- =mfsgr/sysml2dodaf= — empty repo (0 stars, no code). Vapor.
- =HowardKao-1130/mini-NEXEN= — broad SE methodology skill that name-drops DoDAF as a trigger keyword; no artifact generation. 0 stars.
- =gaphor/gaphor= (Apache-2.0, 2.2k stars) — mature UML/SysML GUI modeler. Not a skill; not a pipeline. Useful reference only.

Nearest prior art to lean on when building:
- DoDAF 2.02 Viewpoints & Models reference (dodcio.defense.gov) — canonical OV-1 exemplars. Embed 3-5 layouts as skill =references/=.
- Pattern from existing =c4-diagram= skill — same shape (prose → diagram spec), swap the viewpoint vocabulary to DoDAF.
- PlantUML for SV-1 (when that skill comes later); Mermaid or draw.io XML for OV-1 lightweight visuals.

*** Build scope (when triggered)
*In scope:*
- Input: prose description of a system + its operational context.
- Output: structured OV-1 *spec* — performers, external actors (other systems, forces, adversaries), relationships (data/control flows), narrative captions, classification marking, legend requirements.
- DoDAF 2.02 completeness checklist as a quality gate — verify the produced spec contains every element a correct OV-1 requires.
- Optional lightweight visual: draw.io XML or Mermaid approximation for quick review; NOT a finished rendering.

*Out of scope:*
- Icon libraries, pictorial assets, finished PowerPoint export. OV-1 final art belongs to a designer or Craig in Visio/PowerPoint; the skill's job is the spec and the check, not the slide.
- SV-1, SV-2, UAF, IDEF1X, other viewpoints. Build only when a concrete need triggers each.

Estimate: 4-6 hours.

*** Craig's investigation before kickoff
1. Does DeepSat's systems-engineering or marketing team already have an OV-1 (or the equivalent briefing artifact) for SOFWeek?
2. If yes (scenario 1) — skill is a future asset, not event-load-bearing. Ship after SOFWeek. Priority drops to =[#C]=.
3. If no, or if the scenario is "Craig may need to produce/iterate an OV-1 on the fly during the event" (scenario 2) — skill is load-bearing for the event. Priority upgrades to =[#A]=; build before SOFWeek.
4. Confirm the classification level the skill needs to handle (unclassified-only? or FOUO markings? affects the classification block in the spec).
5. Confirm the target rendering format DeepSat uses for OV-1 deliverables (PowerPoint slide? Cameo? Visio? affects whether the skill emits draw.io XML vs Mermaid vs pure structured spec).

*** Related
See also the DoD-specific notations section under the later TODO (=c4-*= rename revisit) — OV-1 is flagged there as the highest-value starting point across the DoD notation landscape (SysML, DoDAF/UAF, IDEF1X). This entry is the execution plan for that starting point.

** DONE [#A] Split team-specific publishing rules out of commits.md :commits:
CLOSED: [2026-05-22 Fri]
Shipped 3cb467e.
Moved the DeepSat publishing steps (Linear ticket-state, the Slack notification protocol + channel ID, the GHE host, the team merge norm, the Linear ticket-body structure) out of the global =claude-rules/commits.md= into =teams/deepsat/claude/rules/publishing.md=. The global file keeps the universal skeleton and uses seams ("run the project's publishing overlay here if present") like startup-extras. Added =install-team= (targeted per-project copy, keyed on PROJECT, never globally symlinked) and generalized =sync-language-bundle.sh= to keep team overlays fresh at startup (3 new bats; make test green).

Remaining deploy step (cross-project, surfaced to Craig): install the overlay into the DeepSat work project — =make install-team TEAM=deepsat PROJECT== — so it actually loads there.

** DONE [#A] Define a /voice-unavailable fallback in the commits.md publish flow :commits:
CLOSED: [2026-05-22 Fri]
Added an "If =/voice= is unavailable" paragraph to the Single-skill gate in =commits.md=: walk the same patterns inline (the flow already names which matter), state the skill was unavailable and the pass was applied by hand ("/voice unavailable — patterns walked inline"), and flag the missing skill for install. The gate is the pattern walk, not the tooling. The original "=humanizer= unavailable" framing was moot (humanizer → /voice).

** DONE [#A] wrap-it-up Step 3.5 assumes GitHub-family remote :chore:quick:
CLOSED: [2026-05-22 Fri]
:PROPERTIES:
:LAST_REVIEWED: 2026-05-20
:END:
Documented the assumption inline at =wrap-it-up.org= Step 3.5 (chose the lightweight path over a provider-agnostic rewrite): the =gh= lookup expects a GitHub-family host, holds today via DeepSat on GHE, flagged for update if a future Linear project lands on GitLab/Gitea/Bitbucket.

Triggered by: 2026-05-16 wrap-it-up github.com cleanup (audit of the same file).

Step 3.5 (Linear ticket-state hygiene) at =wrap-it-up.org:207= says "the project's GitHub remote — use =gh pr list ...=".
Currently fine in practice: the step is Linear-gated, and the only Linear-using project is DeepSat (on =deepsat.ghe.com=, a GitHub-family host where =gh= works). Would break if a future Linear-using project lived on a non-GitHub host (gitlab, gitea, bitbucket).

Either drop the GitHub-family assumption (provider-agnostic lookup, harder) or document the assumption explicitly so future projects know the step needs an update if they don't fit.

** DONE [#C] Review pass: tighten skills and rulesets after 2026-05-04 audit
CLOSED: [2026-05-22 Fri]
:PROPERTIES:
:LAST_REVIEWED: 2026-05-20
:END:
All 55 grouped-index items dispositioned (2026-05-22): ~49 edited across skills, commands, rule files, hooks, and the two playwright skills; several came out moot post-audit (humanizer→voice, skills→commands, typescript ruleset added); the two commits.md items shipped as the team-overlay split + /voice fallback. Freshness-checked each item against current reality before editing.

Source notes used in this pass:
- C4 official docs: C4 is notation-independent; System Context and Container diagrams are enough for most teams; every diagram needs title, key/legend, explicit element types, and audience-appropriate abstraction. [[https://c4model.com/diagrams][C4 diagrams]], [[https://c4model.com/diagrams/notation][C4 notation]], [[https://c4model.com/abstractions/component][C4 component]]
- arc42 docs: quality requirements need measurable scenarios; section 10 should reference top quality goals and capture lesser quality requirements with specific measures. [[https://docs.arc42.org/section-10/][arc42 section 10]], [[https://quality.arc42.org/articles/specify-quality-requirements][specifying quality requirements]]
- ADR references: ADRs capture one justified architecturally significant decision and its rationale; Nygard's original guidance emphasizes short, numbered, repository-stored records and superseding rather than rewriting old decisions.
  [[https://adr.github.io/][adr.github.io]], [[https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions][Nygard ADR article]]
- Playwright docs: prefer user-visible locators and web assertions; locators auto-wait and retry; =networkidle= is discouraged for testing readiness. [[https://playwright.dev/docs/best-practices][Playwright best practices]], [[https://playwright.dev/docs/locators][Playwright locators]], [[https://playwright.dev/docs/next/api/class-page][Playwright page API]]
- OWASP references: Top 10 2021 includes Broken Access Control, Cryptographic Failures, Injection, Insecure Design, Security Misconfiguration, Vulnerable and Outdated Components, Identification and Authentication Failures, Software and Data Integrity Failures, Security Logging and Monitoring Failures, and SSRF; WSTG adds a broader testing map across configuration, identity, authn/z, sessions, input validation, error handling, cryptography, business logic, client-side, and API testing. [[https://owasp.org/Top10/2021/][OWASP Top 10 2021]], [[https://owasp.org/www-project-web-security-testing-guide/latest/4-Web_Application_Security_Testing/][OWASP WSTG]]
- V2MOM references: Salesforce calls the last M "Measures" and emphasizes a simple alignment document with prioritized Methods, explicit Obstacles, and measurable outcomes. [[https://trailhead.salesforce.com/content/learn/modules/selfmotivation/get-focused-with-your-personal-v2mom][Salesforce Trailhead personal V2MOM]], [[https://www.salesforce.com/blog/?p=12][Salesforce V2MOM alignment]]
- Prompt research: the cited Meincke paper is titled "Call Me A Jerk: Persuading AI to Comply with Objectionable Requests"; its scope is persuasion increasing compliance with objectionable requests, not a general proof that persuasion framing improves prompt quality.
  [[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5357179][SSRN paper]]
- Combinatorial testing references: NIST supports t-way combinatorial testing and notes pairwise is one covering strength, with higher-strength arrays useful for failures requiring more interacting factors. [[https://www.nist.gov/publications/practical-combinatorial-testing-beyond-pairwise][NIST beyond pairwise]], [[https://www.nist.gov/publications/combinatorial-software-testing][NIST combinatorial testing]]

*** Grouped index (for batching by area)
Each item below is a one-line summary of a sub-TODO further down. Tick the box when the matching sub-TODO is moved to =DONE=. Items are grouped by area so they can be batched (e.g., "do all Playwright items in one session").

**** Browser testing
- [X] [#A] =playwright-js=: locator/assertion-first guidance (replace raw CSS, =networkidle=)
- [X] [#B] =playwright-js= + =playwright-py=: reconcile headless/visible defaults
- [X] [#B] =playwright-js= + =playwright-py=: remove emoji console markers from examples

**** Frontend / UI
- [X] [#B] =frontend-design=: WCAG 2.2 alignment, accessibility non-optional
- [X] [#B] =frontend-design=: harmonize aesthetic guidance with anti-pattern rules

**** Security
- [X] [#A] =security-check=: OWASP 2021 + WSTG coverage
- [X] [#B] =security-check=: tooling and offline/network caveats

**** Combinatorial testing
- [X] [#B] =pairwise-tests=: t-way escalation guidance beyond pairwise
- [X] [#B] =pairwise-tests=: clarify negative value syntax + generator availability

**** V2MOM
- [X] [#A] =create-v2mom=: rename Metrics → Measures (Salesforce alignment)
- [X] [#B] =create-v2mom=: prevent task migration from turning V2MOM into a backlog
- [X] [#B] =create-v2mom=: mitigation/owner fields for Obstacles

**** Prompt engineering
- [X] [#A] =prompt-engineering=: correct/narrow Meincke citation
- [X] [#B] =prompt-engineering=: eval-harness requirement for production prompts

**** Codify
- [X] [#B] =codify=: stale-entry review +
  privacy checks before writing project =CLAUDE.md=

**** Code review
- [X] [#A] =review-code=: resolve local-verification vs CI boundary
- [X] [#B] =review-code=: =CLAUDE.md= citation scope for public artifacts
- [X] [#B] =review-code=: relax three-strengths rule for tiny/failing diffs

**** PR / review responses
- [X] [#A] =respond-to-review=: remove review-process language from commit messages
- [X] [#B] =respond-to-review=: use unresolved threads + resolution state
- [X] [#B] =respond-to-cj-comments=: drop personal absolute paths from public-writing (moot — already clean)
- [X] [#B] =respond-to-cj-comments=: fallback when =humanizer= or =emacsclient= unavailable (moot — superseded by /voice + VERIFY pattern)

**** Branch workflow
- [X] [#A] =finish-branch=: fix base-branch detection
- [X] [#B] =finish-branch=: worktree-aware pull/merge safety
- [X] [#B] =start-work=: tool-availability + ceremony-scaling rules
- [X] [#B] =start-work=: claim-before-justify rollback risk

**** Tests / TDD
- [X] [#B] =add-tests=: fix missing =typescript-testing.md= reference or add ruleset (moot — ruleset now exists)
- [X] [#B] =add-tests=: explicit exceptions to "all three categories per function"

**** Debugging / RCA
- [X] [#B] =debug=: capture environment + recent-change context before hypotheses
- [X] [#B] =root-cause-trace=: constrain defense-in-depth to trust boundaries
- [X] [#B] =five-whys=: require evidence + counterfactual validation per why

**** Brainstorming
- [X] [#B] =brainstorm=: timebox + research/source rules for high-stakes designs

**** Architecture
- [X] [#B] =arch-decide=: timeless examples, drop unverifiable claims
- [X] [#B] =arch-decide=: standardize statuses + immutability language
- [X] [#B] =arch-design=: threat modeling + privacy/compliance as first-class inputs
- [X] [#B] =arch-design=: separate paradigms from tactical patterns
- [X] [#B] =arch-document=: arc42/Q42 quality scenarios
- [X] [#B] =arch-document=: staleness + ownership metadata for generated docs
- [X] [#B] =arch-evaluate=: confidence levels for framework-agnostic findings
- [X] [#B] =arch-evaluate=: report skipped tool checks explicitly

**** C4 modeling
- [X] [#A] =c4-analyze= + =c4-diagram=: notation/output fallback (not draw.io-only)
- [X] [#B] =c4-analyze= + =c4-diagram=: clarify abstraction boundaries

**** Global rules
- [X] [#B] =commits.md=: split DeepSat/Linear/Slack-specific from global rules → promoted to a top-level task (deferred for Craig)
- [X] [#A] =commits.md= + publish flows: =humanizer=-unavailable fallback → promoted to a top-level task (deferred; humanizer premise moot)
- [X] [#B] =verification.md=: explicit "unable to verify" reporting standard
- [X] [#B] =testing.md=: property-based + mutation testing as escalation paths
- [X] [#B] =testing.md=: soften absolute TDD with explicit spike protocol
- [X] [#B] =subagents.md=: capability/availability + cost checks

**** Languages
- [X] [#A] =python-testing.md=: revisit in-memory SQLite guidance
- [X] [#B] =python-testing.md=: separate "never mock ORM" from unit-test boundaries
- [X] [#B] =elisp.md=: drop tool-specific advice
- [X] [#B] =elisp-testing.md=: batch-mode + native-comp caveats

**** Hooks
- [X] [#A] =hooks/README.md=: include =destructive-bash-confirm.py= in install/settings snippets
- [X] [#A] =hooks/git-commit-confirm.py= + =gh-pr-create-confirm.py=: inspect message/body files referenced by =-F= / =--body-file=
- [X] [#B] =hooks/destructive-bash-confirm.py=: shell-aware command parsing (not regex)

*** 2026-05-22 Fri @ 15:47:10 -0500 Made playwright guidance locator/assertion-first, dropped networkidle-as-readiness
Rewrote the readiness guidance in both =playwright-js/SKILL.md= and =playwright-py/SKILL.md=: reconnaissance now waits for a visible app landmark via a web assertion or locator (=expect(...).toBeVisible()= / =get_by_role(...).wait_for()=), not =networkidle= (which Playwright discourages).
Updated the login/form examples to =getByLabel=/=getByRole= + web assertions, the API_REFERENCE.md waiting section, and =lib/helpers.js= defaults (=waitForPageReady= now defaults to =load= and prefers a caller-supplied landmark; =authenticate= races the success indicator over a =load= navigation). node --check passes.

*** 2026-05-22 Fri @ 14:23:02 -0500 Added headed/headless decision tables to both playwright skills
Added matching purpose-based decision tables to =playwright-js/SKILL.md= (was "always visible") and =playwright-py/SKILL.md= Best Practices (was "always headless"). Each names its own default and points at the other skill, so the difference is deliberate, not a habit-flip: headed for interactive debugging, headless for CI/pytest. Also softened the absolutist "Always launch... headless" comment in the py example.

*** 2026-05-22 Fri @ 15:47:10 -0500 Removed emoji console markers from the playwright skills
Replaced every emoji status marker with a plain ASCII prefix across =playwright-js/= (run.js, lib/helpers.js, SKILL.md) and =playwright-py/= (SKILL.md, examples/*.py): 📦/⚡/📄/📥/🎭/🚀/📋/✅/❌/🔍/📸/✓/✗ → =[setup]=/=[run]=/=[ok]=/=[error]=/=[fail]= etc. Post-change emoji grep is clean (excluding node_modules); node --check and py_compile pass.

*** 2026-05-22 Fri @ 14:35:16 -0500 Made accessibility a non-optional WCAG 2.2 gate in frontend-design
Added an "Accessibility Gate (required before handoff)" section to =frontend-design/SKILL.md= covering keyboard operation, focus visibility, focus-not-obscured (2.2), target size (2.2), contrast, reduced motion, labels, and semantic structure — a baseline for all frontend work, not just interactive components. Rewrote the Build/Review phases to build accessibly as you go and clear the gate before handoff, and bumped =references/accessibility.md= from WCAG 2.1 to 2.2 with backing detail for the new criteria.
*** 2026-05-22 Fri @ 14:35:16 -0500 Added a "creative but bounded" section to frontend-design
Added a subsection under Frontend Aesthetics framing the bold/maximalist directions as tools, not obligations: domain fit, readability first, responsive stability, and no decorative effect that degrades the workflow. Reconciles rather than contradicts the maximalist encouragement (maximalism stays on the table as deliberate usable density), and ties the readability bullet to the new accessibility gate.

*** 2026-05-22 Fri @ 14:35:16 -0500 Updated security-check to OWASP Top 10 2021 + WSTG mapping
Replaced the older six-category list in =.claude/commands/security-check.md= with the full Top 10 2021 set, each finding mapped to a 2021 category or WSTG area. Added the four missing categories (Insecure Design, Software and Data Integrity Failures, Security Logging and Monitoring Failures, SSRF) plus explicit checks for object/function-level authorization, SSRF on URL-fetch paths, update/plugin/dependency integrity, and logging/monitoring gaps.

*** 2026-05-22 Fri @ 14:35:16 -0500 Added scanner tooling + network caveats to security-check
Added an optional configured-scanners step (=gitleaks=/=trufflehog= secrets, =semgrep= source patterns, OSV scanner, lockfile-diff review) that supplements the manual scans, plus a network caveat: dependency audits that can't run (offline, tool absent, DB unreachable) must report "not run" naming the tool and reason, never read as a pass. Carried that into the no-issues summary.

*** 2026-05-22 Fri @ 14:35:16 -0500 Added t-way escalation guidance to pairwise-tests
Added an "Escalating Beyond Pairwise (t-way)" subsection: start with pairwise across the whole space, then escalate specific high-risk clusters to 3-way+ when history, safety, security, or domain coupling says a fault needs more than two interacting factors.
Lists escalation triggers and shows the sub-model order syntax (={ A, B, C } @ 3=) vs a blanket =/o:3= bump, stressing targeted not uniform escalation. Cites NIST combinatorial-testing work.

*** 2026-05-22 Fri @ 14:35:16 -0500 Clarified PICT ~ syntax + honest generator-availability path in pairwise-tests
Added a "~ prefix" explanation (PICT marker tagging a value as negative/invalid, not an arithmetic operator; PICT pairs negatives with valid values once and strips the marker before the SUT) and a stop-at-the-model rule: if neither the =pict= binary nor =pypict= is present, produce the model and stop rather than hand-writing a table and passing it off as PICT output.

*** 2026-05-22 Fri @ 14:43:17 -0500 Renamed Metrics → Measures throughout create-v2mom
Full rename across =.claude/commands/create-v2mom.md= (acronym expansions, Phase 7 heading, the "Measures must be measurable" principle, exit criteria, review questions, red flags, examples) to match Salesforce's official term. Kept the "vanity metrics" idiom intact — it's the anti-pattern term, not a section reference.

*** 2026-05-22 Fri @ 14:43:17 -0500 Split strategy from execution in create-v2mom task migration
Rewrote Phase 8 (and tightened Phase 5.5): tasks stay in the backlog grouped by method, and each method gains a one-line link to where its tasks live, instead of transplanting the task tree into the V2MOM. Strategy (V2MOM) and execution (backlog) are now explicitly separate sources of truth, keeping the V2MOM concise.

*** 2026-05-22 Fri @ 14:43:17 -0500 Made create-v2mom obstacles operational (mitigation/owner/cadence)
Phase 6 now captures, per obstacle: name, manifestation, stakes, mitigation, owner, and review cadence — with a worked example per domain (health/finance/software), a "good obstacle" characteristic, a Phase 9 review question, and a red flag for candid-but-not-operational obstacles. An obstacle without a countermove is now flagged as an observation, not a plan.
*** 2026-05-22 Fri @ 14:43:17 -0500 Corrected and narrowed the Meincke citation in prompt-engineering
Fixed the title to "Call Me A Jerk: Persuading AI to Comply with Objectionable Requests" (SSRN abstract_id=5357179) in all three spots (frontmatter, Seven Principles intro, References). Reframed the ~33%→72% result as what it is — a prompt-safety caution that persuasion raises compliance with objectionable requests — explicitly not evidence that persuasion framing improves engineering prompt quality. Kept the seven principles as a tone vocabulary.

*** 2026-05-22 Fri @ 14:43:17 -0500 Added an eval-harness requirement to prompt-engineering critique mode
Added critique step 7 + a checklist line: for fragile or reusable/production prompts, write 3-5 adversarial/edge inputs, run both the old and new prompt against each, and record the behavioral delta. A throwaway prompt can ship on the rewrite alone; a discipline/reused/production one can't. Without examples, "the rewrite is better" is an assertion, not a result.

*** 2026-05-22 Fri @ 14:43:17 -0500 Added mandatory stale-entry + privacy pre-write checks to codify
Added a "Mandatory pre-write checks" block at the top of Phase 3 (Write) in =.claude/commands/codify.md=: a stale-entry scan (update/remove no-longer-true entries in place, don't append contradictions around them) and a privacy/leak check carrying both questions verbatim — "safe if the project were public?" and "belongs in private memory instead?" — routing private content to auto-memory. Gates, not background guidance.

*** 2026-05-22 Fri @ 14:06:41 -0500 Scoped review-code's CI-trust rule to reviewing, not shipping
Expanded the False-Positive Filter bullet in =review-code/SKILL.md=: "trust CI, don't run builds" applies to reading a diff, not producing one. A pre-commit/pre-push flow still owes the local verification =verification.md= requires (run the suite or state "not run because...").
Closes the apparent contradiction with =verification.md= / =finish-branch=.

*** 2026-05-22 Fri @ 14:06:41 -0500 Added private-vs-public CLAUDE.md citation modes to review-code
Expanded the Content scope section in =review-code/SKILL.md= with two modes: a private/internal review cites =CLAUDE.md= directly; a public/team review translates the rule into the engineering reason it encodes and doesn't name the rules file (a teammate can act on the reason, not on a file they can't reach). Same principle =commits.md= states for personal tooling in public artifacts.

*** 2026-05-22 Fri @ 13:48:14 -0500 Relaxed review-code "three strengths" to up-to-three-or-none
Changed all three "three minimum" spots in =review-code/SKILL.md= (Strengths section, Critical Rules DO list, Anti-Patterns) to "up to three specific; say none found on a tiny or weak diff." Reframed the old "No Strengths section" anti-pattern as "Skipping strengths out of laziness" so a substantive diff still demands them while a weak one can honestly report nothing notable. Landed alongside Craig's adjacent edit telling reviewers not to explain why a strength is good (sycophantic padding).

*** 2026-05-22 Fri @ 14:12:24 -0500 Removed review-process language from respond-to-review commit guidance
Replaced the =fix: Address review — [description]= example (and the matching description-line phrasing) in =.claude/commands/respond-to-review.md= with "name the actual fix (=fix: validate export filename=), not the review that prompted it." Killed the non-ASCII dash and the process-in-commit pattern that conflicted with =commits.md=.

*** 2026-05-22 Fri @ 14:12:24 -0500 Made respond-to-review fetch unresolved threads + resolve after verification
Rewrote section 1 (Gather) in =.claude/commands/respond-to-review.md= to pull =reviewThreads= via =gh api graphql= with =isResolved=, skipping already-resolved threads so settled feedback isn't re-processed; top-level conversation comments still come from REST.
Added a section-4 step: reply and resolve a thread only after the fix is verified, never before.

*** 2026-05-22 Fri @ 14:12:24 -0500 Verified respond-to-cj-comments no longer embeds an absolute path (moot)
Already resolved by a prior migration: =grep= for =/home/= and =/Users/= in =.claude/commands/respond-to-cj-comments.md= returns nothing. The public-writing section refers to the rules by name, not by local path. No edit needed.

*** 2026-05-22 Fri @ 14:12:24 -0500 Closed respond-to-cj-comments humanizer/emacsclient fallback (largely moot)
Overtaken by two later changes: =/humanizer= was replaced by =/voice personal= (no =/humanizer= invocation remains), and the mandatory =emacsclient= summary-open was replaced by the in-place VERIFY-task pattern (workflow line ~262, Craig's 2026-05-12 standing instruction). Only a stale descriptive phrase remained — tidied "humanizer's signs of AI writing" to "the signs of AI writing." The original fresh-environment-fallback concern no longer applies as written.

*** 2026-05-22 Fri @ 14:51:37 -0500 Fixed finish-branch base-branch detection
Rewrote Phase 2: resolve the base *branch name* in priority order (open PR's =baseRefName=, then =git symbolic-ref --short refs/remotes/origin/HEAD= stripped, then ask), and compute the merge-base *SHA* separately only where a commit range is needed. Made the branch-name-vs-merge-base distinction explicit, since the old command returned a SHA where a branch name was needed.

*** 2026-05-22 Fri @ 14:51:37 -0500 Made finish-branch merge safer + worktree-aware
Added pre-flight checks to Option 1 (Merge Locally): dirty-tree refusal with no auto-stash, protected-branch awareness, upstream-gated =git pull --ff-only=, and merge-commit-vs-rebase as a team-policy choice instead of a hardcoded =--no-ff=. Replaced the fragile =git worktree list | grep = detection with a =git rev-parse --git-dir= vs =--git-common-dir= comparison plus =git worktree list --porcelain= for the path.
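
A minimal Python sketch of the worktree check in the finish-branch entry: compare the two =git rev-parse= answers, then read the checkout path from =git worktree list --porcelain=. The function names are hypothetical and this is not the text of the skill; it only illustrates why the comparison is sturdier than grepping human-readable output.

#+begin_src python
import subprocess
from pathlib import Path
from typing import Optional


def rev_parse(flag: str, cwd: str = ".") -> str:
    """Run `git rev-parse <flag>` in cwd and return the trimmed output."""
    result = subprocess.run(
        ["git", "rev-parse", flag],
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def is_linked_worktree(git_dir: str, common_dir: str, cwd: str = ".") -> bool:
    """True when the per-worktree git dir differs from the shared one.

    Git may print either value relative to cwd, so both are resolved first.
    """
    base = Path(cwd)
    return (base / git_dir).resolve() != (base / common_dir).resolve()


def worktree_path_for(porcelain: str, branch: str) -> Optional[str]:
    """Find the checkout path for a branch in `git worktree list --porcelain` output."""
    path = None
    for line in porcelain.splitlines():
        if line.startswith("worktree "):
            path = line[len("worktree "):]  # start of a new worktree record
        elif line == f"branch refs/heads/{branch}":
            return path
    return None


def in_linked_worktree(cwd: str = ".") -> bool:
    """Convenience wrapper that asks git directly (needs a git checkout at cwd)."""
    return is_linked_worktree(
        rev_parse("--git-dir", cwd), rev_parse("--git-common-dir", cwd), cwd
    )
#+end_src

In the main checkout both commands name the same directory; in a linked worktree =--git-dir= points under the common directory's =worktrees/= folder, so the two differ.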
*** 2026-05-22 Fri @ 14:51:37 -0500 Added tool-availability + ceremony-scale paths to start-work
Added a "Tool availability" section (graceful degradation when Linear MCP / =gh= / =/voice= / Playwright are missing — do what's available, surface what isn't, don't block) and a "Ceremony scale" section (trivial / small / standard tiers so a two-line fix skips ticket+branch+gates unless asked). The =humanizer= reference in the original item is moot — the file already uses =/voice= throughout.

*** 2026-05-22 Fri @ 14:51:37 -0500 Resolved start-work claim-before-justify rollback risk
Split the claim by tracker type: personal todo.org claims defer to after the Justify gate (a killed task needs no rollback), while team trackers (Linear/GitHub) still claim first to signal intent but record prior state (status, assignee, label) so the Phase 2 rollback restores exactly it. Updated the per-tracker rollback steps and the matching anti-pattern.

*** 2026-05-22 Fri @ 14:28:41 -0500 Verified add-tests typescript-testing.md reference resolves (moot)
Resolved since the audit: =languages/typescript/claude/rules/typescript-testing.md= now exists, and =add-tests/SKILL.md:68= references it by bare filename, the same way it references =python-testing.md= (both get copied into a project's =.claude/rules/=). The "missing file" premise no longer holds. No edit needed.

*** 2026-05-22 Fri @ 14:28:41 -0500 Added a category-exception protocol to add-tests
Added an exception note to step 7 (proposal) in =add-tests/SKILL.md=: pure adapters, generated code, tiny pass-through wrappers, and framework glue may skip a category that would only re-test the framework, but the skip must be stated and justified in the plan and the behavior covered at integration/E2E level — never a silent omission. Step 12 (write) now points back to "honor documented category exceptions."
*** 2026-05-22 Fri @ 14:25:37 -0500 Added environment + recent-change capture to debug Phase 1
Added a fourth Phase-1 step in =debug/SKILL.md=: record versions, feature-flag/config state, dataset/fixture, seed/clock, concurrency, and recent commits/config-infra changes. Noted that intermittent bugs usually live in environment/state transitions (and "what changed recently" is often the fastest route), while a deterministic local bug only needs a one-liner. Updated the phase's closing recap to include the context.

*** 2026-05-22 Fri @ 14:25:37 -0500 Constrained root-cause-trace defense-in-depth to boundaries
Rewrote step b in =root-cause-trace/SKILL.md=: instead of "add a check at each layer that could have caught it," add one only at a layer that owns a boundary or invariant — ingress/trust, persistence, invariant-owning service, final render. Added the explicit rule that a pass-through function owning neither shouldn't get a duplicate null check (validation spam). Recast the three example layers as the boundary types.

*** 2026-05-22 Fri @ 14:25:37 -0500 Required evidence + counterfactual per why in five-whys
Expanded step 2 in =five-whys/SKILL.md=: each link now owes an evidence field (a log/commit/metric/config you can point to) and a counterfactual check (remove this cause — does the symptom above plausibly not happen?). Framed the counterfactual as the main guard against monocausal storytelling, and updated the worked example to show both fields.

*** 2026-05-22 Fri @ 15:51:59 -0500 Added timebox + fresh-sources rules to brainstorm
Phase 1 gained a "Timebox the dialogue" rule (aim for the one-sentence restatement in ~5-8 questions, then move on and park the rest as open questions). Phase 2 gained "Ground high-stakes claims in fresh sources" (check load-bearing claims about markets/regulations/tools/vendors/APIs against a current source; mark unverified ones as assumptions).
The design-doc skeleton gained an "## Assumptions" section that distinguishes researched facts (with source) from assumptions (to confirm before building).

*** 2026-05-22 Fri @ 14:59:32 -0500 Made arch-decide examples timeless + required citations
Dated the MongoDB multi-document-transaction example (scoped to 2024-01) with a backing reference, and added a "Cite, don't assert" Do: every concrete technical claim about a tool/version/platform carries a link, doc, version, or "checked YYYY-MM" date, or gets a domain-neutral placeholder — so unsourced "X can't do Y" doesn't rot into stale fact.

*** 2026-05-22 Fri @ 14:59:32 -0500 Standardized arch-decide ADR statuses + immutability rule
Declared a canonical five-status set (Proposed, Accepted, Rejected, Deprecated, Superseded) with an explicit "no synonyms" line, and spelled out the immutability rule in the Don'ts: an accepted ADR's body is frozen, only status/link metadata changes, a changed decision gets a new superseding ADR and the old one stays as the historical record.

*** 2026-05-22 Fri @ 14:59:32 -0500 Added Trust/Data/Compliance phase to arch-design
Added a new Phase 4 (Trust, Data, and Compliance) before the paradigm shortlist: trust boundaries, data classification, abuse/misuse cases, privacy constraints, compliance evidence, and operational ownership — surfaced early so the architecture is drawn around them, not retrofitted by a downstream =security-check=. Threaded into the workflow list, brief template (new §6), review checklist, and anti-patterns.

*** 2026-05-22 Fri @ 14:59:32 -0500 Split paradigms from tactical patterns in arch-design
Split Phase 5's single mixed table into Step 1 (pick one paradigm: monolith/microservices/layered/event-driven/serverless/pipeline/space-based) and Step 2 (compose tactical patterns: DDD, hexagonal, CQRS, event sourcing — several or none, often per-module), with composition examples and an anti-pattern against treating DDD/CQRS as alternatives to a paradigm.
Recommendation + brief now name a paradigm plus composed patterns.

*** 2026-05-22 Fri @ 14:59:32 -0500 Expanded arch-document quality scenarios to the Q42 six-part template
Replaced §10's thin "Under [condition]..." template with the arc42/Q42 six-part structure (source, stimulus, environment, artifact, response, response measure), each glossed, with the cart-checkout example rewritten across all six parts. A one-line prose form stays acceptable once all six parts are recoverable.

*** 2026-05-22 Fri @ 14:59:32 -0500 Added staleness/ownership metadata to arch-document output
Added a per-section metadata block (owner, generated-against SHA + date, review cadence, "stale-when" conditions) as an HTML-comment header plus a visible Doc-status note, with field-fill guidance, and a whole-document Doc Status table replacing the README's "Last Updated" stub. Wired into the review checklist and an "Undated docs" anti-pattern.

*** 2026-05-22 Fri @ 14:59:32 -0500 Added confidence levels to arch-evaluate findings
Added a "Confidence and Provenance" subsection: every framework-agnostic finding carries High/Medium/Low + how it was determined, with a required "Not fully checked because..." note when scale, runtime imports, reflection, or dynamic dispatch cap certainty. Updated the example findings and review checklist; a finding with no note now asserts a full read.

*** 2026-05-22 Fri @ 14:59:32 -0500 Made arch-evaluate report skipped tool checks explicitly
Replaced "skip silently" with explicit reporting: for each detected language whose tool isn't configured or can't run, emit an Info "tool not configured / not run" finding (with an example) so the audit shows what was and wasn't verified. A check that didn't run no longer reads as a pass. Updated workflow step 4 and the review checklist.
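The explicit-reporting rule in the arch-evaluate entry above can be sketched roughly as follows. This is an illustration only: the =Finding= shape, the function name, and the tool names in the usage note are hypothetical, and the real skill is a prompt document rather than Python code.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Finding:
    severity: str
    message: str


def report_tool_checks(
    tools_by_language: Dict[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[Finding]:
    """Emit an Info finding for every detected language whose tool is unavailable.

    A missing tool yields an explicit finding instead of being skipped silently,
    so a check that never ran cannot be mistaken for a pass.
    """
    findings = []
    for language, tool in sorted(tools_by_language.items()):
        if which(tool) is None:
            findings.append(
                Finding("Info", f"{language}: {tool} not configured / not run")
            )
    return findings
```

Called as =report_tool_checks({"python": "py-arch-tool"})= with a made-up tool name, it returns one Info finding unless that executable happens to be on the PATH.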
*** 2026-05-22 Fri @ 14:51:37 -0500 Added notation/output fallback to c4-analyze + c4-diagram
Both commands now treat C4 as notation-independent: a "Choosing a notation" section (draw.io XML, Structurizr DSL, Mermaid with native C4 types, PlantUML/C4-PlantUML) and a headless fallback that emits a text notation (Mermaid or Structurizr DSL) and skips PNG-export/desktop-open when =drawio= or a GUI is absent, rather than failing. draw.io is now one option, not the only one.

*** 2026-05-22 Fri @ 14:51:37 -0500 Clarified C4 abstraction boundaries in c4-analyze + c4-diagram
Added an "Abstraction boundaries" section to both: a Container is a separately deployable/runnable unit (not synonymous with a Docker container — a SPA or managed DB counts), a Component lives inside one Container and isn't separately deployable. Added a 4e "Verify single abstraction level" check that walks every element and relationship to confirm it stays at the diagram's level, notation-independent.

*** 2026-05-22 Fri @ 15:10:35 -0500 Added "When You Cannot Verify" standard to verification.md
Added a section requiring, when a verification command can't run, a four-part report: command attempted, why it couldn't run, risk left unverified, and the smallest next command for the user. States the principle that a check that didn't run is never reported as a pass — "unable to verify" is a required honest outcome, not silence. Placed after Red Flags.

*** 2026-05-22 Fri @ 15:10:35 -0500 Added property-based + mutation testing escalation to testing.md
Added an "Escalation Beyond Category and Pairwise" section: property-based testing for invariants over a broad input domain (round-trips, idempotence, ordering — Hypothesis/fast-check/proptest) and mutation testing for when high line coverage hides thin assertions (mutmut/cosmic-ray/Stryker). Both framed as escalation paths to reach for on a gap, not gates on every unit.
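To make the round-trip and idempotence invariants in the testing.md entry above concrete, here is a stdlib-only sketch of the property-based style. It is a hand-rolled stand-in, not Hypothesis: a real property-based library would also generate a wider input domain and shrink failing cases. The function names are hypothetical.

```python
import json
import random
import string


def random_record(rng: random.Random) -> dict:
    """Generate a small random dict of string keys to integers."""
    keys = [
        "".join(rng.choices(string.ascii_letters, k=rng.randint(1, 8)))
        for _ in range(rng.randint(0, 5))
    ]
    return {key: rng.randint(-10**6, 10**6) for key in keys}


def check_properties(trials: int = 200, seed: int = 0) -> int:
    """Assert two invariants over many generated inputs; return the trial count."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        record = random_record(rng)
        # Round-trip: decoding an encoded record gives the record back.
        assert json.loads(json.dumps(record)) == record
        # Idempotence: sorting an already-sorted list changes nothing.
        values = sorted(record.values())
        assert sorted(values) == values
    return trials
```

The point of the style is that the assertions describe a rule that must hold for every input, so no individual expected value is written by hand.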
*** 2026-05-22 Fri @ 15:10:35 -0500 Added a disciplined spike protocol to testing.md
Formalized the existing "I need to spike first" excuse-table row into a "Spike Exception (Disciplined)" subsection under TDD Discipline: TDD stays the default, but a spike is sanctioned when all three hold — timeboxed, spike code not committed, and the first failing test written before productionizing the discovered approach. Built on the existing row rather than contradicting it.

*** 2026-05-22 Fri @ 15:10:35 -0500 Added pre-dispatch availability + cost checks to subagents.md
Added a "Pre-Dispatch Checks" section with two gates: Availability (no Agent capability → do the work in the main thread under the same scope/constraints/output discipline the contract would enforce) and Cost (when writing the full contract costs more than the task, do it inline). Cross-references the existing "Don't Subagent At All" section and "Subagenting trivial work" anti-pattern rather than duplicating.

*** 2026-05-22 Fri @ 15:06:04 -0500 Revised python-testing SQLite guidance toward production-like DBs
Replaced "prefer in-memory SQLite for speed" with: run ORM/query tests against a production-like DB (same engine as prod, often containerized), since SQLite diverges from Postgres/MySQL on query semantics, constraints, transactions, JSON, time zones, and indexes (a test can pass on SQLite and fail in prod). SQLite stays only for pure unit tests with no DB-semantics dependency.

*** 2026-05-22 Fri @ 15:06:04 -0500 Clarified python-testing ORM-mocking boundary
Changed the "never mock" bullet from "ORM queries" to "ORM internals (querysets, sessions, model internals)" and added a paragraph: domain services use real model methods/validation, but a thin orchestration unit can inject a fake at a deliberate data-access port (a repository/interface the code owns). That's still mocking at a boundary, not at ORM internals.
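A small sketch of the "fake at a data-access port" idea from the ORM-mocking entry above. Everything named here (=User=, =UserRepository=, =FakeUserRepository=, =deactivate_user=) is hypothetical and chosen for illustration; the assumption is that the code under test owns the repository interface, so the fake replaces that port and never touches querysets or sessions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Protocol


@dataclass
class User:
    id: int
    email: str
    active: bool = True


class UserRepository(Protocol):
    """The port: the only data-access surface the orchestration code sees."""

    def get(self, user_id: int) -> Optional[User]: ...

    def save(self, user: User) -> None: ...


class FakeUserRepository:
    """In-memory fake for tests; a production adapter would wrap the ORM."""

    def __init__(self, users: Iterable[User] = ()) -> None:
        self._users: Dict[int, User] = {user.id: user for user in users}

    def get(self, user_id: int) -> Optional[User]:
        return self._users.get(user_id)

    def save(self, user: User) -> None:
        self._users[user.id] = user


def deactivate_user(repo: UserRepository, user_id: int) -> bool:
    """Thin orchestration unit: returns True only if a user was deactivated."""
    user = repo.get(user_id)
    if user is None or not user.active:
        return False
    user.active = False
    repo.save(user)
    return True
```

Because the fake implements the same small interface as the real adapter, the test exercises the orchestration logic without asserting anything about how the ORM builds its queries.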
*** 2026-05-22 Fri @ 15:06:04 -0500 Made elisp.md editing advice tool-agnostic
Rephrased the "prefer Write over repeated Edits" bullet around intent: land nontrivial Elisp as one cohesive change rather than dribbling it in over tiny partial edits (which accumulate paren mismatches), and run paren-balance + byte-compile checks immediately after, whatever editing mechanism the environment uses.

*** 2026-05-22 Fri @ 15:06:04 -0500 Added batch-mode + native-comp caveats to elisp-testing.md
Added three sections: Batch-Mode Reproducibility (=emacs --batch= as source of truth, no interactive-session state, no blocking prompts, deterministic), Isolating Emacs State (temp =user-emacs-directory=, explicit load-path, declared deps only, with an unwind-protect sandbox example), and Byte-Compile/Native-Comp Warnings (=byte-compile-error-on-warn=, native-comp gated on =native-comp-available-p= and kept opt-in/version-aware).

*** 2026-05-22 Fri @ 15:16:22 -0500 Synced hooks/README install snippets with the destructive hook (opt-in)
Brought the README's manual-install and settings-JSON snippets in line with the canonical =hooks/settings-snippet.json= (which already wires all three) and the Makefile's opt-in design: added the destructive-bash-confirm.py symlink as an opt-in step, added its settings entry, and reworded the note to say all three are no-op-safe but the destructive gate is opt-in (=make install-hooks= excludes it by default — link manually before relying on the snippet entry).

*** 2026-05-22 Fri @ 15:35:06 -0500 Hooks now scan file-backed commit/PR messages
Added =read_referenced_file()= to =_common.py= (safe local read: missing/oversize/non-UTF-8 → None) and wired it in: =git-commit-confirm.py= =extract_commit_message= now handles =-F=/=--file=/=--file==== (reads + scans the file, falls through to UNPARSEABLE → asks if unreadable), and =gh-pr-create-confirm.py= reads =--body-file= content instead of a placeholder.
Attribution scanning now sees the real committed/posted text. Built a pytest harness (=hooks/tests/=, importlib-by-path loader for the hyphen-named hooks) and wired =hooks/tests= into =make test=. 54 hook tests pass; full suite green.

*** 2026-05-22 Fri @ 15:35:06 -0500 Rewrote destructive-bash rm parsing on shlex
=detect_rm_rf= now tokenizes with =shlex.split= instead of a whitespace split, so quoted/spaced paths and combined/separate/reordered flags (=-rf=, =-r -f=, =-fr=, =--recursive=/=--force=) all parse. Fails toward asking — returns a sentinel that still fires the modal — on unbalanced quotes or when a forced recursive rm coexists with a compound/pipeline/substitution/redirect construct. Documented the supported/unsupported shell constructs in the docstrings, and extended the dangerous-path banner to =$HOME=-prefixed and wildcard targets. Covered by 25 new tests. (Pre-existing, out-of-scope: path-prefixed =rm= like =/bin/rm= still isn't matched.)

** DONE [#B] Add =make remove= for interactive ruleset removal via fzf
CLOSED: [2026-05-22 Fri]

Shipped: =scripts/remove.sh= (three modes — =--list=, =--remove-selected= reading stdin, and the default fzf-multi interactive flow) + =make remove= target + =scripts/tests/remove.bats= (5 cases). Lists only symlinks resolving into the repo (foreign links left alone); rm's picked links while leaving repo sources untouched; reports-and-continues on a missing target; quiet no-op on empty selection. shellcheck clean, make test green. Dropped the stale =bridge= entry per the note below.

Add a Makefile target that lists every currently-installed ruleset entry and lets me pick one or more to remove via fzf. Granular alternative to =make uninstall= (removes everything) and =make uninstall-hooks= (removes only hooks).
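The shipped behavior noted above (list only symlinks that resolve into the repo, remove the link but not its source, report and continue on a missing target) can be illustrated with a short sketch. The real implementation is the shell script =scripts/remove.sh=; this Python version is not a translation of it, and both function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def installed_links(install_dir: Path, repo_root: Path) -> List[Path]:
    """Return symlinks in install_dir whose targets live inside repo_root."""
    repo_root = repo_root.resolve()
    links = []
    for entry in sorted(install_dir.iterdir()):
        if not entry.is_symlink():
            continue  # regular files and directories are never candidates
        target = entry.resolve()
        if target == repo_root or repo_root in target.parents:
            links.append(entry)  # foreign links are left alone
    return links


def remove_links(links: Iterable[Path]) -> List[Path]:
    """Unlink each symlink; report and continue if one is already gone."""
    removed = []
    for link in links:
        try:
            link.unlink()  # removes the link itself, never the repo source
            removed.append(link)
        except FileNotFoundError:
            print(f"already gone: {link}")
    return removed
```

An empty selection falls out naturally: =remove_links([])= does nothing and returns an empty list.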
*** Why this matters

Tearing down a single skill, rule, hook, or config file currently means either running =make uninstall= and re-installing what I want to keep, or =rm=ing the symlink directly and remembering the exact path. Both are friction. An interactive picker lets me filter, multi-select with Tab, and confirm with Enter — the typical fzf flow. Costs about 3-5 seconds per teardown instead of 15+ seconds of "what's the exact name?".

*** Design

The recipe builds a tab-separated list of every currently-installed item, categorized by type, and pipes it to =fzf --multi=. The user filters, marks with Tab, and confirms with Enter. The recipe parses the selections and =rm=s the matching symlinks.

#+begin_example
skill	debug
rule	commits.md
hook	destructive-bash-confirm.py
config	settings.json
commands	commands
bridge	claude-rules
#+end_example

Each line is =\t=. The recipe maps == to the right path:

- =skill= → =$(SKILLS_DIR)/=
- =rule= → =$(RULES_DIR)/=
- =hook= → =$(HOOKS_DIR)/=
- =config= → =$(CLAUDE_DIR)/=
- =commands= → =$(CLAUDE_DIR)/commands=
- =bridge= → =$(SKILLS_DIR)/claude-rules=

Source files in =rulesets/= stay untouched. =make install= re-creates the removed links if needed (the install loop is idempotent).

*** Edge cases

- Esc instead of Enter → empty selection → clean exit, no removal.
- Filter to nothing then Enter → same as Esc.
- Selected item already gone → =rm= fails visibly, processing continues on the rest.
- =fzf= not installed → fail fast with a clear error (matches the pattern used by =install-lang=).

*** Possible extensions

- Parallel =make pick-install= target that lists not-yet-installed items and installs the chosen ones. Symmetric UX, same fzf flow.
- Confirmation prompt when more than N items selected (defense against accidental select-all).
- =--source= flag that also runs =git rm= against the rulesets source for the selected item. Probably bad idea — too easy to lose work.
- The =bridge → $(SKILLS_DIR)/claude-rules= entry above is stale — the bridge symlink got removed in a later commit. Drop that bullet when the recipe lands.

** DONE [#B] Document the =mcp/= install pipeline in =mcp/README.org=
CLOSED: [2026-05-22 Fri]

Wrote =mcp/README.org= covering everything in the "what to cover" list: the file layout (tracked vs gitignored), the secrets-bundle shape (plain =${VAR}= secrets + base64-bundled OAuth artifacts, AES256 symmetric =gpg -c=), the install flow (decrypt → materialize keys/token caches at mode 600 → expand → register unregistered, idempotent), the http/sse-vs-stdio transport split, token rotation when a Google refresh token is revoked, and adding a new server. Grounded in a read of the actual =install.py= + =servers.json=.

=mcp/= has =install.py=, =servers.json=, =secrets.env.gpg=, =gcp-oauth.keys.json= (gitignored, regenerated at install). No README.

Coming back to this in three months I'll re-discover how the bundle is structured, what =install.py= does, and how to rotate tokens. Saving that re-discovery is the whole point.

*** What to cover

- Layout: what each file is, which are tracked vs gitignored.
- Secrets bundle shape: how vars are listed in =secrets.env=, the symmetric-encryption pattern (=gpg -c --cipher-algo AES256=), the base64-bundled OAuth artifacts (=GCP_OAUTH_KEYS_JSON_B64=, =GOOGLE_DOCS_PERSONAL_TOKEN_B64=, =GOOGLE_DOCS_WORK_TOKEN_B64=).
- Install flow: =make install-mcp= → =install.py= decrypts, writes the keys file and Google Docs token caches at mode 600, expands =${VAR}= in =servers.json=, calls =claude mcp add --scope user= for unregistered servers. Idempotent.
- Token rotation: when a refresh token gets revoked, the recovery flow (re-auth on one machine, re-bundle, recommit).
- Adding a new server: edit =servers.json=, add any new =${VAR}= placeholders to the bundle, re-encrypt.
- The OAuth dance for HTTP-transport servers (linear, notion) versus stdio (google-docs-*) — different paths, different gotchas.

** DONE [#C] Add =make uninstall-mcp= + =mcp/install.py --check= for symmetry :feature:solo:quick:
CLOSED: [2026-05-28 Thu]
:PROPERTIES:
:LAST_REVIEWED: 2026-05-28
:END:

Currently the MCP install pipeline only flows one direction. No way to remove rulesets-managed MCP servers in one command. No way to ask "what's the drift between =servers.json= and =claude mcp list=" without eyeballing.

*** =make uninstall-mcp=

Iterate over =servers.json=, run =claude mcp remove -s user= for each. Ignore "not registered" errors. Idempotent.

*** =mcp/install.py --check=

Dry-run mode. Decrypt secrets, but instead of registering, print the drift report:

- Servers in =servers.json= not in =claude mcp list= → =MISSING=
- Servers in =claude mcp list= not in =servers.json= → =EXTRA=
- Servers in both → =ok=

Useful for diagnosing connection failures and for the eventual =make doctor= integration.

** DONE [#C] Update =README.org= with MCP install pipeline section :chore:solo:quick:
CLOSED: [2026-05-28 Thu]
:PROPERTIES:
:LAST_REVIEWED: 2026-05-28
:END:

=README.org= covers global install, per-project language bundles, and design principles, but doesn't mention =make install-mcp= or the =mcp/= directory. Add a short section after "Per-project language bundles" describing the user-scope MCP install pattern (decrypt → expand → register) and pointing at the eventual =mcp/README.org=.

** DONE [#C] Consolidate =claude-templates/Makefile= after fold :chore:quick:solo:
CLOSED: [2026-05-28 Thu]
:PROPERTIES:
:LAST_REVIEWED: 2026-05-28
:END:

Sibling follow-up from the fold child (2026-05-15). After the subtree merge, =rulesets/claude-templates/Makefile= still has its standalone =install= / =uninstall= / =list= / =test-scripts= targets. The =install= target's =bin/ai= logic is now duplicated in =rulesets/Makefile=. Both work; the redundancy is harmless but worth cleaning up.
Options:

- *Delete* =claude-templates/Makefile= entirely — forces all install through rulesets root. Cleaner.
- *Strip down* to just =test-scripts= — the one piece not redundant with =rulesets/Makefile=.
- *Leave it* — slight redundancy, no functional harm.

Triggered by: 2026-05-15 fold session's refactor audit (commit =2d645fc=).

** DONE [#C] Run =--archive-done= sweep at start of =open-tasks.org= Phase A :chore:quick:solo:
CLOSED: [2026-05-28 Thu]
:PROPERTIES:
:CREATED: [2026-05-28 Thu]
:LAST_REVIEWED: 2026-05-28
:END:

From pearl handoff 2026-05-28. =open-tasks.org= Next Mode reads =* Project Open Work= and skips =* Project Resolved= correctly, but a level-2 task that completed during a session sits as =** DONE= under Open Work until something archives it. Between cleanups, a freshly-DONE task can surface as a "what's next" candidate.

Proposed fix: as the first step of =open-tasks.org= Phase A, run =emacs --batch -q -l .ai/scripts/todo-cleanup.el --archive-done todo.org=, then read =todo.org=. The cleanup tool already exists; this is wiring it into the workflow.

Cost: a few hundred ms at the start of every "what's next" invocation. Win: recommendations never include DONE work.

Optional refinement: gate behind a check for read-only / dry-run mode if that's ever introduced. The default invocation archives.

** DONE [#C] Triage Codex enhancement backlog :spec:
CLOSED: [2026-05-28 Thu]
:PROPERTIES:
:CREATED: [2026-05-28 Thu]
:LAST_REVIEWED: 2026-05-28
:END:

Triaged interactively 2026-05-28. Disposition table for all 14 items lives at [[file:docs/design/2026-05-28-rulesets-enhancement-backlog.org][2026-05-28-rulesets-enhancement-backlog.org]] under "Triage Dispositions": 3 accepted (filed below as TODOs), 3 pilot/scope-limited (filed below), 2 marked as conventions rather than tracked tasks, 6 rejected with rationale. Items #1 and #2 already had homes (#16 and the Phase-1 codex TODO).
** DONE [#C] Canonical/mirror drift detection via pre-commit hook or =make sync-check= :feature:quick:solo:
CLOSED: [2026-05-28 Thu]
:PROPERTIES:
:CREATED: [2026-05-28 Thu]
:LAST_REVIEWED: 2026-05-28
:END:

From the codex enhancement backlog (item #7), reframed: don't dedupe the dual source — the canonical-in-=claude-templates/= + mirror-in-=.ai/= pattern is a feature (other projects rsync from the canonical; the mirror lets rulesets-as-a-project have a working copy). The real pain is sync-discipline overhead — every workflow edit needs both copies updated, and forgetting one leaves the next startup's rsync to surface the drift.

Scope: write a small =scripts/sync-check.sh= (or fold into the existing Makefile) that diffs =claude-templates/.ai/workflows/= against =.ai/workflows/=, exits non-zero on drift. Wire as a pre-commit hook (=githooks/pre-commit= or equivalent) so the discipline is enforced before publish, not at the next startup. =make sync-check= as a manual entry point.

Verification: introduce a deliberate diff, commit, hook should block. Restore parity, hook should pass.

** DONE [#C] Add =make status= — compose audit + doctor + open-task count :feature:quick:solo:
CLOSED: [2026-05-28 Thu]
:PROPERTIES:
:CREATED: [2026-05-28 Thu]
:LAST_REVIEWED: 2026-05-28
:END:

From the codex enhancement backlog (item #12), scope-limited: =make status= only. Reject the rest of #12 (=make sync= duplicates the existing sync flow; =make health= wraps existing checks without adding signal; =make bootstrap-project= duplicates =install-ai= + =install-lang=).

Scope: one Makefile target that prints a compact summary of:

- Install audit state (clean / drift, calling =make audit=).
- Machine-global doctor state (calling =make doctor=).
- Open-task count (top-level entries in =todo.org= under =* Rulesets Open Work=).
- Inbox count (files in =inbox/= excluding =.gitkeep= and =PROCESSED-= prefixes).
- Git working-tree status (clean / dirty, ahead/behind upstream).
Output should be roughly 10 lines, scannable in one glance. Composes the existing checks; no new logic except the summary formatting.

** DONE [#C] Iteration-history backfill for spec-review and spec-response :docs:followup:
CLOSED: [2026-05-28 Thu]
:PROPERTIES:
:LAST_REVIEWED: 2026-05-28
:END:

Source: org-drill inbox 2026-05-28.

Once the in-flight WIP lands (the requirement that specs carry a bottom =Review and iteration history= section, with iteration / date / contributor / role / what / why / artifacts), backfill the two workflow files themselves using rulesets' session history as evidence.

Files to update:

- =claude-templates/.ai/workflows/spec-review.org=
- =claude-templates/.ai/workflows/spec-response.org=

Investigation: search =.ai/sessions/=, =.ai/notes.org=, inbox archive, and git log for mentions of these workflow docs. Identify review/response/design iterations, dates, and contributors (including agents where known: Claude Code, Codex, local models). Distinguish high-confidence history (commits, dated session entries) from inferred (chat-only context). Recommend whether enough evidence exists to populate the section, and draft the entries if so.

Dependency: spec-review.org and spec-response.org have uncommitted edits in flight. Wait for those to land before writing to the files. The read-only research portion (search sessions, identify iterations, draft entries to a scratch file) can run in parallel without conflict.