#+TITLE: Rulesets — Open Work
#+AUTHOR: Craig Jennings
#+DATE: 2026-04-19

Tracking TODOs for the rulesets repo that span more than one commit. Project-scoped (not the global =~/sync/org/roam/inbox.org= list).

* Rulesets Open Work
** TODO [#A] Check that memories are sync'd across machines via git.
#+begin_src cj: comment
this means we need to link the memory file in ~/.claude if it's not already
#+end_src
** TODO [#B] Document rulesets + claude-templates pull-before-project ordering in protocols.org
Startup currently pulls claude-templates in Phase A.0 and fast-forwards the project repo, but the rulesets repo (=~/code/rulesets/=) isn't pulled at all, so rule changes there don't reach the agent without a manual pull. The ordering and the "resolve any issues before proceeding" expectation also live in =startup.org= rather than =protocols.org= (the single entry point).

Required ordering: rulesets first, then claude-templates, then the local project. Resolve dirty-tree / merge issues at each step before moving on.

Goal: every session starts against the freshest behavioral rules and workflow templates, not a stale local snapshot.

Changes needed:
1. Add a rulesets pull step to =startup.org= Phase A.0, mirroring the existing claude-templates ff-only pull logic.
2. State the ordering and the "resolve before proceeding" rule early in =protocols.org= itself, not buried in a workflow file.

** TODO [#A] Build =create-documentation= skill for high-quality project/product docs
Create a Claude skill named =create-documentation= that can plan, write, refresh, and review software documentation across README files, project docs, developer guides, API docs, operational docs, and generated/published doc sites.

This is broader than =arch-document=. =arch-document= should remain the architecture-specific arc42 skill.
=create-documentation= should know when to delegate to it for architecture documentation, but its main job is the full documentation system around a product or repo: onboarding, tutorials, how-to guides, reference, explanation, operations, troubleshooting, contribution, release/upgrade, and publication format.

*** Why this matters
The repo currently has strong skills for architecture, testing, review, debugging, and workflow. It does not have a general documentation skill that:
- Chooses the right documentation type for the user need.
- Audits existing docs against code and expected user journeys.
- Creates a coherent doc map instead of dumping everything into =README.md=.
- Writes in a consistent technical style.
- Decides source/publish format intentionally (=.md=, =.org=, generated =.html=, OpenAPI, etc.).
- Treats docs as a maintained product surface with verification, ownership, navigation, accessibility, and freshness checks.

*** Research notes
**** Documentation frameworks and best-practice sources
- Diataxis separates documentation by reader need:
  - Tutorials: learning-oriented, take the reader by the hand.
  - How-to guides: task-oriented, solve a specific real problem.
  - Reference: information-oriented, accurate and complete lookup material.
  - Explanation: understanding-oriented, concepts, background, tradeoffs.
  Source: [[https://diataxis.fr/][Diataxis]] and the official guidance around tutorials/how-to/reference/explanation.
- Django explicitly documents this same organization and teaches readers how to navigate it: tutorials for beginners, topic guides for concepts, reference for APIs, how-to guides for recipes. This is a major reason the docs feel navigable despite large scope.
  Source: [[https://docs.djangoproject.com/en/5.2/][Django documentation]]
- Kubernetes separates concepts, tasks, tutorials, and reference. It also has current/previous-version docs, localization, contribution paths, and task-focused landing pages.
  Its docs are good at answering "what is this?" separately from "how do I do one thing?"
  Sources: [[https://kubernetes.io/docs/home/][Kubernetes docs home]], [[https://kubernetes.io/docs/tasks/][Kubernetes tasks]], [[https://kubernetes.io/docs/tutorials/][Kubernetes tutorials]]
- Write the Docs emphasizes docs that are precursory, participatory, exemplary, consistent, current, discoverable, addressable, cumulative, and complete. Especially important: incorrect docs are often worse than missing docs, and examples should cover common use cases without overwhelming the reference.
  Source: [[https://www.writethedocs.org/guide/writing/docs-principles/][Write the Docs principles]]
- Google developer docs guidance emphasizes project-specific style first, clarity and consistency, conversational but not frivolous tone, active voice, second person, descriptive links, global audience, accessibility, sentence case headings, numbered lists for procedures, code font for code, and alt text for images.
  Sources: [[https://developers.google.com/style/][Google developer documentation style guide]], [[https://developers.google.com/style/highlights][Google style highlights]], [[https://developers.google.com/style/accessibility][Google accessible docs]]
- Google's doc best-practices page adds pragmatic maintenance principles: minimum viable documentation, update docs with code, delete dead docs, prefer good over perfect, tell the story of code, and avoid duplication.
  Source: [[https://google.github.io/styleguide/docguide/best_practices.html][Google documentation best practices]]
- The Good Docs Project is useful as a template source, especially for README, how-to, tutorial, concept, reference, troubleshooting, contributor, and release-note patterns. Do not vendor wholesale; use as prior art.
  Source: [[https://www.thegooddocsproject.dev/][The Good Docs Project]]

**** Praised project docs to analyze and steal from
***** Django
Why it works:
- It labels the doc types directly and explains when to use each.
- It has a beginner path, advanced tutorials, topic guides, API reference, how-to recipes, deployment, security, testing, release notes, and community help in one coherent index.
- It is versioned, so readers know which framework version the docs target.
- It cross-links introductory material to deeper references without making the first page a wall of every detail.

Patterns to use:
- Make the top-level docs home a routing page by reader intent.
- Put "How these docs are organized" near the top when the doc set is large.
- Split concept, task, tutorial, and reference instead of mixing them.
- Include "getting help" and "not found?" paths so the docs have an exit ramp.

Source: [[https://docs.djangoproject.com/en/5.2/][Django documentation]]

***** Kubernetes
Why it works:
- It has a large, complex product but maintains separate lanes for Concepts, Tasks, Tutorials, Reference, and Contribute.
- Task pages are short sequences for one operation; tutorials are larger goals with several sections. This prevents "one page tries to teach everything."
- It exposes version state clearly, including static old versions and current docs.
- It supports localization and documentation contribution, which makes the docs a product surface rather than a side artifact.

Patterns to use:
- For platform or infrastructure docs, include Concepts / Tasks / Tutorials / Reference as first-class folders.
- Create version/freshness metadata when docs are tied to released software.
- Add doc contribution guidance for projects with external contributors.
- Make operational tasks discoverable by category, not just search.
Sources: [[https://kubernetes.io/docs/home/][Kubernetes docs home]], [[https://kubernetes.io/docs/tasks/][Kubernetes tasks]]

***** Rust
Why it works:
- Rust has a "bookshelf" rather than one overloaded manual: The Book, Rust by Example, standard library API reference, Reference, Cargo Guide, Error Index, Rustonomicon, release notes, platform support, policies, etc.
- The learning path is honest about audience: The Book assumes the reader has programmed before, but not in any specific language.
- Reference and learning material are separated. Advanced unsafe guidance gets its own book.
- Offline docs via =rustup doc= are treated as part of the product.

Patterns to use:
- For broad ecosystems, create a documentation bookshelf rather than a single mega-doc.
- Separate beginner path, examples, formal reference, advanced/unsafe topics, tooling docs, error index, release notes, and policies.
- Document assumptions about reader experience.
- Consider offline/local docs for CLI/library ecosystems.

Source: [[https://doc.rust-lang.org/][Rust documentation]]

***** Stripe API docs
Why it works:
- The API reference is organized around resources and common cross-cutting concerns: authentication, errors, idempotency, pagination, request IDs, versioning, metadata, connected accounts.
- It pairs prose with concrete request/response examples and client-library language selection.
- It exposes test-mode vs live-mode distinctions early.
- It offers "Copy for LLM" / "View as Markdown", which acknowledges modern consumption patterns without sacrificing normal docs UX.
- Its reputation comes from matching developer mental models and making the common path implementable quickly, not just visual polish.

Patterns to use:
- API docs should be generated from or checked against OpenAPI/JSON schema or source annotations wherever possible.
- Keep cross-cutting API behavior near the front, before endpoint lists.
- Include runnable examples, auth, errors, pagination, versioning, idempotency, and sandbox/test data.
- Consider LLM-friendly exports (=llms.txt=, "view as Markdown", stable anchors), but do not make the docs only for AI.

Source: [[https://docs.stripe.com/api][Stripe API Reference]]

***** FastAPI
Why it works:
- Documentation is part of the framework's value proposition: OpenAPI and JSON Schema drive interactive Swagger UI and ReDoc automatically.
- It reduces manual drift for API reference by deriving docs from typed code.
- It integrates examples and tutorial-style explanations with standards-based generated reference.

Patterns to use:
- Prefer generated API reference from code/specs over hand-maintained endpoint tables.
- Generated docs need human-written overview, concepts, authentication, examples, and operational guidance around them.
- The skill should identify when an OpenAPI/Swagger/ReDoc/Scalar route already exists and improve metadata/schema quality instead of creating duplicate manual docs.

Source: [[https://fastapi.tiangolo.com/features/][FastAPI features]]

*** Format and presentation decisions
**** Default source format: Markdown
Use =.md= as the default for shared project documentation when:
- The repo is on GitHub/GitLab/Forgejo and readers browse docs in the web UI.
- The project already uses MkDocs, Docusaurus, VitePress, Sphinx+MyST, Jekyll, GitHub Pages, or plain README-driven docs.
- Contributors are expected to edit docs without Emacs-specific tooling.
- The docs need easy static-site publishing.
- The content is README, tutorial, how-to, reference, troubleshooting, contributing, release notes, runbooks, or ordinary prose + code blocks.

Markdown source works well because it is low-friction, reviewable in diffs, rendered by repository hosts, and supported by documentation site generators. MkDocs is a good reference point: Markdown source, YAML config, built-in dev server, static HTML output, and easy hosting.
Source: [[https://www.mkdocs.org/][MkDocs]]

**** Use Org when the document is Emacs-native or personal/planning-heavy
Use =.org= when:
- The user's workflow is explicitly Emacs/org-mode.
- The document contains TODO states, schedules, priorities, tags, agenda integration, property drawers, clocking, or personal planning.
- The document is an internal strategy/planning artifact such as V2MOM, research notes, meeting notes, task triage, or a living personal operating document.
- The output may later be exported, but the source of truth is intended to be edited in org-mode.

Do not default team-facing documentation to =.org= unless the team already uses org-mode. Org can export to HTML, but that does not make it the right authoring format for non-Emacs contributors.

Sources: [[https://orgmode.org/org.html][Org manual]], [[https://orgmode.org/worg/org-tutorials/org-publish-html-tutorial.html][Org publish HTML tutorial]]

**** Use HTML as generated/published output, rarely as hand-authored source
Use =.html= when:
- The deliverable is a published static documentation site.
- The document needs interactive widgets, embedded API consoles, custom layout, or generated navigation/search.
- The project already publishes docs as a website.
- The target audience needs searchable, browsable, linkable pages rather than repo-local files.

Prefer generated HTML from Markdown/Org/reStructuredText/AsciiDoc/OpenAPI over hand-authored HTML. Hand-edit HTML only for standalone artifacts, custom landing pages, or cases where the project already treats HTML templates as docs source.

**** Consider generated/spec-backed formats
Use generated reference when possible:
- API reference: OpenAPI/Swagger/ReDoc/Scalar from code/spec.
- CLI reference: generated from command parser/help output.
- Library API reference: language-native doc tools such as rustdoc, pydoc, TypeDoc, JSDoc, Go doc, Sphinx autodoc, etc.
- Config reference: generated from schema, types, or validated defaults.
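
As a sketch only, the source-format rules in this section reduce to a small decision function. The =DocContext= fields are hypothetical signals an agent would gather during intake and audit; they are not part of any existing skill, and HTML is modeled purely as build output.

#+begin_src python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DocContext:
    """Hypothetical intake/audit signals for one document."""
    team_facing: bool = False         # read and edited by a team, not just the user
    team_uses_org: bool = False       # team already authors docs in org-mode
    emacs_native: bool = False        # user's workflow is explicitly Emacs/org-mode
    planning_heavy: bool = False      # TODO states, schedules, agenda, clocking
    is_reference: bool = False        # API, CLI, library, or config reference
    has_spec_or_source: bool = False  # OpenAPI spec, parser help, docstrings, schema
    publish_as_site: bool = False     # deliverable is a published website


def choose_formats(ctx: DocContext) -> Tuple[str, Optional[str]]:
    """Return (source format, published output) for one document.

    HTML appears only as build output; it is never chosen as the source.
    """
    published = "html" if ctx.publish_as_site else None

    # Spec- or code-backed reference is generated, not copied by hand.
    if ctx.is_reference and ctx.has_spec_or_source:
        return "generated", published

    # Team-facing docs default to Markdown unless the team already uses org.
    if ctx.team_facing:
        return ("org" if ctx.team_uses_org else "md"), published

    # Personal, Emacs-native, or planning-heavy documents stay in org.
    if ctx.emacs_native or ctx.planning_heavy:
        return "org", published

    return "md", published
#+end_src

For example, =choose_formats(DocContext(team_facing=True, emacs_native=True))= returns =("md", None)=: the team rule wins over the author's personal Emacs habit.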
The skill should not duplicate generated reference by hand. It should improve source comments, schema descriptions, examples, front matter, and surrounding guides.

**** Presentation requirements
Every generated doc set should have:
- A docs home or README that routes by reader intent.
- Stable headings and anchors for addressability.
- Descriptive link text, no "click here."
- Search/navigation plan when docs exceed a handful of pages.
- Version/freshness metadata when tied to released software.
- Ownership/review cadence for docs likely to rot.
- Accessible structure: semantic headings, alt text, no image-only info, tables only when appropriate, left-aligned text, readable code blocks.
- Copyable commands and code examples.
- "What changed?" / release notes / migration path when docs describe a new or changed behavior.
- Troubleshooting path for common failures.
- Clear prerequisites before procedures.
- Verification steps after procedures.
- Support/escalation path when the docs do not answer the question.
- Optional LLM-friendly surfaces for larger doc sets: =llms.txt=, "copy as Markdown" equivalents, concise page summaries, and stable anchors.

*** Proposed skill design
**** Skill name and trigger
Name: =create-documentation=

Trigger when the user asks to:
- create documentation, docs, README, guide, manual, runbook, tutorial, quickstart, API docs, CLI docs, troubleshooting docs, contributor docs, architecture-adjacent docs, release notes, upgrade guide, or doc site;
- improve, audit, reorganize, or publish existing docs;
- decide documentation structure or format for a project.

Do not trigger for:
- architecture-only arc42 docs when =arch-document= is the direct fit;
- ADR creation (=arch-decide=);
- design docs before implementation shape is known (=brainstorm= or =arch-design=);
- prose polishing only (future writing/humanizer skill);
- inline code comments/docstrings only, unless the user asks to create docs from them.
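
The "Descriptive link text, no 'click here'" presentation requirement is mechanical enough for the review phase to check with a script. A rough sketch, assuming Markdown inline links only; the =GENERIC_LINK_TEXT= set is a hypothetical starting point, not a published standard.

#+begin_src python
import re
from typing import List, Tuple

# Matches Markdown inline links: [text](target) or [text](target "title").
# Reference-style links, nested brackets, and code spans are not handled.
INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)[^)]*\)")

# Hypothetical list of link texts that say nothing about the destination.
GENERIC_LINK_TEXT = {"click here", "here", "this", "this link", "link", "read more", "more"}


def find_vague_links(markdown: str) -> List[Tuple[int, str]]:
    """Return (line number, link markup) for links whose text is generic."""
    findings = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for match in INLINE_LINK.finditer(line):
            # Normalize case and trailing punctuation before comparing.
            text = match.group(1).strip().lower().rstrip(".!:")
            if text in GENERIC_LINK_TEXT:
                findings.append((lineno, match.group(0)))
    return findings
#+end_src

A regex is deliberately crude here; a project with a Markdown parser or an existing link checker should hook the same rule into that instead.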
**** V1 should be one orchestrating skill, not many separate skills
Build v1 as one skill with explicit phases and subcommands rather than a set of separate skills.

Rationale:
- Documentation tasks often start ambiguous; the first job is classification.
- Splitting too early creates command-discovery burden.
- A single skill can dispatch to existing specialized skills (=arch-document=, =c4-diagram=, =security-check=, =playwright-js/py= for doc-site verification) without making users choose the internal pipeline.

Support discoverable subcommands inside one skill:
#+begin_example
/create-documentation audit
/create-documentation plan
/create-documentation write
/create-documentation refresh
/create-documentation publish
/create-documentation review
#+end_example

The default =/create-documentation = runs audit -> plan -> write -> review, asking for confirmation before broad rewrites.

**** Future split if v1 gets too large
If the skill grows past a manageable size, split into a discoverable =documentation-*= chain. Names and order:
1. =documentation-audit= — inventory existing docs, code/docs drift, reader journeys, missing doc types, stale/generated docs.
2. =documentation-plan= — choose audiences, doc map, formats, source of truth, publishing path, ownership, and freshness policy.
3. =documentation-write= — write or update the selected docs.
4. =documentation-reference= — generate or improve API/CLI/config/library reference from source/spec.
5. =documentation-publish= — configure MkDocs/Docusaurus/Sphinx/GitHub Pages or equivalent, build static HTML, verify links/search.
6. =documentation-review= — quality gate for accuracy, style, navigation, accessibility, examples, and freshness.

Keep =create-documentation= as the orchestrator and user-facing entry point. The chain is discoverable because every helper starts with =documentation-= and the orchestrator prints the next command at each handoff.
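
The orchestration rules above are instructions for an LLM skill rather than code, but modeling them in a few lines makes the default pipeline and the future handoff order unambiguous. This is a hypothetical sketch; none of these names exist as code in the repo.

#+begin_src python
from typing import List, Optional

# Subcommands listed for the v1 skill.
SUBCOMMANDS = ["audit", "plan", "write", "refresh", "publish", "review"]

# What a bare /create-documentation runs. The "confirm before broad rewrites"
# gate is a separate step and is not modeled here.
DEFAULT_PIPELINE = ["audit", "plan", "write", "review"]

# Order of the future documentation-* chain.
CHAIN = [
    "documentation-audit",
    "documentation-plan",
    "documentation-write",
    "documentation-reference",
    "documentation-publish",
    "documentation-review",
]


def phases_for(subcommand: Optional[str]) -> List[str]:
    """Phases a /create-documentation invocation would run."""
    if subcommand is None:
        return list(DEFAULT_PIPELINE)
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    return [subcommand]


def next_command(current: str) -> Optional[str]:
    """Command the orchestrator would print at a handoff, or None at the end."""
    position = CHAIN.index(current)
    if position + 1 < len(CHAIN):
        return "/" + CHAIN[position + 1]
    return None
#+end_src

With this model, =phases_for(None)= gives the four-phase default and =next_command("documentation-audit")= gives =/documentation-plan=.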
*** V1 workflow details
**** Phase 1: Intake and classification
Ask only what is missing from local context:
- Who is the reader? New user, evaluator, integrator, maintainer, operator, contributor, auditor, support engineer?
- What is the reader trying to do or understand?
- Is this for a public project, internal team, personal workflow, regulated audience, or customer-facing product?
- Is the output repo-browsed, web-published, printed/exported, or Emacs-native?
- Is there existing code, existing docs, an API spec, generated reference, or only a concept?
- What is the maintenance expectation? One-off, release-maintained, continuously updated?

Classify the work into one or more doc types:
- README / landing page.
- Quickstart.
- Tutorial.
- How-to guide.
- Concept/explanation.
- API reference.
- CLI reference.
- Configuration reference.
- Architecture docs (delegate to =arch-document= if arc42/C4/ADR-driven).
- Operations/runbook.
- Troubleshooting/FAQ.
- Upgrade/migration/release notes.
- Contributor/development docs.
- Security/compliance docs.
- Examples/cookbook.

**** Phase 2: Audit existing material
Inventory:
- =README*=, =docs/=, =doc/=, =site/=, =mkdocs.yml=, =docusaurus.config.*=, =vitepress=, =sphinx=, =docs.rs=, =pkg.go.dev=, OpenAPI specs, generated docs folders, GitHub Pages config, ADRs, architecture docs, examples, scripts, CLI help, package metadata.
- Existing doc type coverage: tutorial/how-to/reference/explanation.
- Broken links, stale version numbers, commands that no longer exist, screenshots that may be stale, code snippets not exercised, doc/code drift.
- Source of truth for generated docs. Flag generated files; do not hand-edit them until source is known.
- Reader journey gaps: "new user can install?", "first success path?", "operator can recover?", "contributor can run tests?", "API consumer can authenticate and handle errors?"

Use =rg= first.
For API/CLI reference, prefer structured sources: OpenAPI/JSON Schema, package metadata, command =--help= output, docstrings, or language-native documentation tooling.

**** Phase 3: Documentation plan
Write a short plan before broad edits:
- Audiences and priority order.
- Proposed doc map/tree.
- Doc type for each page.
- Source format decision: =.md= / =.org= / generated spec / generated HTML.
- Publishing target, if any.
- Existing docs to preserve, move, merge, or delete.
- Generated-reference strategy.
- Ownership and freshness policy.
- Verification plan.

Stop for confirmation when the plan moves or rewrites more than one file.

**** Phase 4: Write or update docs
Writing rules:
- Lead with the reader's goal, not the implementation history.
- Put prerequisites before steps.
- Use numbered lists for procedures.
- Use bullets for non-ordered choices.
- Use active voice and second person for instructions.
- Keep sentences short and globally readable.
- Define acronyms on first use.
- Use code font for commands, file names, env vars, API names, and literals.
- Use descriptive links.
- Prefer examples that cover the common path and one meaningful edge/error path.
- Separate examples/tutorials from dense reference.
- Avoid stale duplication: link to canonical generated reference instead of copying it.
- Include expected output after commands where it helps verification.
- Include cleanup/rollback steps when procedures change state.
- Include troubleshooting for common failures.
- Avoid marketing voice in technical docs. State capability and constraints plainly.
- No AI attribution in docs, examples, comments, generated pages, footers, or screenshots.

Page skeletons:

README / docs home:
#+begin_example
#
## Start here
- New user:
- Existing user with a task:
- API lookup:
- Maintainer/operator:
## Quick example
...
## Documentation map
...
## Support / contributing
...
#+end_example

Tutorial:
#+begin_example
# Tutorial:
## What you'll build
## Prerequisites
## Step 1
...
## Checkpoint
## Step 2
...
## What you learned
## Next
#+end_example

How-to:
#+begin_example
# How to
## When to use this
## Prerequisites
## Steps
## Verify
## Troubleshooting
## Related
#+end_example

Reference:
#+begin_example
# reference
## Summary
## Parameters / options / fields
## Behavior
## Errors
## Examples
## Version notes
#+end_example

Explanation:
#+begin_example
#
## Problem it solves
## Mental model
## How it fits with related concepts
## Tradeoffs and constraints
## Further reading
#+end_example

Runbook:
#+begin_example
# Runbook:
## Scope
## Preconditions
## Normal procedure
## Verification
## Rollback
## Alerts and escalation
## Post-incident notes
#+end_example

**** Phase 5: Presentation and publishing
If docs are repo-local only:
- Ensure links render on GitHub/GitLab.
- Keep relative links stable.
- Add an index if more than 4-5 docs exist.

If docs are web-published:
- Detect existing generator and follow it.
- Prefer project-native tooling over introducing MkDocs/Docusaurus/Sphinx.
- If no tooling exists and user wants a site, choose conservatively:
  - Python/simple repo: MkDocs Material is a pragmatic default.
  - JS ecosystem: Docusaurus (React) or VitePress (Vue), matching the framework already in the stack.
  - Python libraries: Sphinx or MkDocs depending on existing ecosystem.
  - API docs: ReDoc/Swagger/Scalar from OpenAPI.
- Build locally if dependencies exist.
- Check links, nav, search, mobile viewport, and accessibility basics.
- Do not commit generated =site/= output unless the project already does.

**** Phase 6: Verification
Verification should match doc type:
- Commands in quickstarts/how-tos: run them, or mark them as not run with the reason.
- Code snippets: compile/run where feasible, or use fenced language and note assumptions.
- API docs: validate OpenAPI/spec if tooling exists.
- Links: run link checker if configured; otherwise sample-check changed links.
- Published site: build docs and inspect output.
- Screenshots: verify current UI if included.
- Generated docs: regenerate from source and confirm no unexpected diff.

Final report must say:
- Files created/changed.
- Doc types covered.
- Format/source-of-truth decisions.
- What was verified.
- What could not be verified.
- Known gaps/follow-ups.

*** Relationship to existing skills
- =arch-document=: use when the requested docs are specifically architecture docs from brief + ADRs + C4/arc42. =create-documentation= may call it, then wrap the output in a broader docs map.
- =c4-analyze= / =c4-diagram=: use for diagrams in architecture or concept docs when visual structure helps.
- =brainstorm=: use before =create-documentation= when the product/feature itself is still unclear.
- =arch-design= / =arch-decide=: use when documentation reveals missing architectural choices.
- =security-check=: use when docs include security guidance, auth, secrets, deployment, or compliance claims.
- =playwright-js= / =playwright-py=: use to verify published doc sites, interactive docs, screenshots, and browser-rendered examples.
- =codify=: use after a documentation session reveals reusable project-specific documentation rules.

*** Quality bar and anti-patterns
The skill should reject:
- A giant README that mixes tutorial, reference, architecture, and operations.
- Duplicating generated API/CLI/config reference by hand.
- Unverified commands in quickstarts without a "not run" note.
- Screenshots with no alt text or no update path.
- Tables used for layout instead of actual tabular data.
- "Overview" pages that do not route readers to tasks.
- Tutorials that become reference dumps.
- How-to guides that spend pages explaining concepts before giving steps.
- Reference pages that hide required options in prose.
- Marketing claims without concrete examples.
- Docs that mention local private paths, personal tooling, or AI attribution in public artifacts.
- Publishing generated HTML as source unless the project explicitly owns HTML docs that way.

*** Acceptance criteria for building the skill
- [ ] Directory =create-documentation/= with =SKILL.md=.
- [ ] Frontmatter description includes positive and negative triggers.
- [ ] Skill body includes the V1 phases above.
- [ ] Includes a source-format decision table for =.md= / =.org= / =.html= / generated spec/reference.
- [ ] Includes doc-type classifier based on Diataxis plus README/runbook/API additions.
- [ ] Includes examples/skeletons for README, tutorial, how-to, reference, explanation, runbook, troubleshooting, contributor docs, and API overview.
- [ ] Includes audit checklist for existing repos.
- [ ] Includes publishing guidance without hardcoding one static-site tool.
- [ ] Includes verification checklist and "unable to verify" reporting.
- [ ] Cross-references =arch-document=, =brainstorm=, =security-check=, =playwright-js=, =playwright-py=, and =codify=.
- [ ] Adds =references/= only if needed; suggested files:
  - =references/doc-type-decision.md=
  - =references/style-guide.md=
  - =references/format-decision.md=
  - =references/page-skeletons.md=
  - =references/doc-audit-checklist.md=
- [ ] Keep =SKILL.md= concise enough to load; move long skeletons/checklists to references for progressive disclosure.
- [ ] Run =./scripts/lint.sh= after adding the skill.

*** Open design questions before implementation
- Should the user-facing command be exactly =/create-documentation= while internal helper names use =documentation-*=, or should all names share the =create-documentation = form? Recommendation: one skill with subcommands for v1.
- Should Markdown be the hard default for team docs? Recommendation: yes, unless the project already uses org/reST/AsciiDoc or the output is personal Emacs-native planning.
- Should the skill create a docs site automatically? Recommendation: no.
  It should propose a site when the doc set exceeds README-scale or when search, versioning, or public publishing is required. Ask before adding tooling.
- Should it write docs before code exists? Recommendation: yes for specs, user journeys, and design docs, but route unclear feature/product decisions through =brainstorm= or =arch-design= first.
- Should it include LLM-specific docs surfaces? Recommendation: optional for public/library/API docs, where =llms.txt= or a Markdown export is valuable, but normal human navigation remains primary.

** TODO [#A] Review pass: tighten skills and rulesets after 2026-05-04 audit
Source notes used in this pass:
- C4 official docs: C4 is notation-independent; System Context and Container diagrams are enough for most teams; every diagram needs title, key/legend, explicit element types, and audience-appropriate abstraction.
  [[https://c4model.com/diagrams][C4 diagrams]], [[https://c4model.com/diagrams/notation][C4 notation]], [[https://c4model.com/abstractions/component][C4 component]]
- arc42 docs: quality requirements need measurable scenarios; section 10 should reference top quality goals and capture lesser quality requirements with specific measures.
  [[https://docs.arc42.org/section-10/][arc42 section 10]], [[https://quality.arc42.org/articles/specify-quality-requirements][specifying quality requirements]]
- ADR references: ADRs capture one justified architecturally significant decision and its rationale; Nygard's original guidance emphasizes short, numbered, repository-stored records and superseding rather than rewriting old decisions.
  [[https://adr.github.io/][adr.github.io]], [[https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions][Nygard ADR article]]
- Playwright docs: prefer user-visible locators and web assertions; locators auto-wait and retry; =networkidle= is discouraged for testing readiness.
  [[https://playwright.dev/docs/best-practices][Playwright best practices]], [[https://playwright.dev/docs/locators][Playwright locators]], [[https://playwright.dev/docs/next/api/class-page][Playwright page API]]
- OWASP references: Top 10 2021 includes Broken Access Control, Cryptographic Failures, Injection, Insecure Design, Security Misconfiguration, Vulnerable and Outdated Components, Identification and Authentication Failures, Software and Data Integrity Failures, Security Logging and Monitoring Failures, and SSRF; WSTG adds a broader testing map across configuration, identity, authn/z, sessions, input validation, error handling, cryptography, business logic, client-side, and API testing.
  [[https://owasp.org/Top10/2021/][OWASP Top 10 2021]], [[https://owasp.org/www-project-web-security-testing-guide/latest/4-Web_Application_Security_Testing/][OWASP WSTG]]
- V2MOM references: Salesforce calls the last M "Measures" and emphasizes a simple alignment document with prioritized Methods, explicit Obstacles, and measurable outcomes.
  [[https://trailhead.salesforce.com/content/learn/modules/selfmotivation/get-focused-with-your-personal-v2mom][Salesforce Trailhead personal V2MOM]], [[https://www.salesforce.com/blog/?p=12][Salesforce V2MOM alignment]]
- Prompt research: the cited Meincke paper is titled "Call Me A Jerk: Persuading AI to Comply with Objectionable Requests"; its scope is persuasion increasing compliance with objectionable requests, not a general proof that persuasion framing improves prompt quality.
  [[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5357179][SSRN paper]]
- Combinatorial testing references: NIST supports t-way combinatorial testing and notes pairwise is one covering strength, with higher-strength arrays useful for failures requiring more interacting factors.
  [[https://www.nist.gov/publications/practical-combinatorial-testing-beyond-pairwise][NIST beyond pairwise]], [[https://www.nist.gov/publications/combinatorial-software-testing][NIST combinatorial testing]]

*** Grouped index (for batching by area)
Each item below is a one-line summary of a sub-TODO further down. Tick the box when the matching sub-TODO is moved to =DONE=. Items are grouped by area so they can be batched (e.g., "do all Playwright items in one session").

**** Browser testing
- [ ] [#A] =playwright-js=: locator/assertion-first guidance (replace raw CSS, =networkidle=)
- [ ] [#B] =playwright-js= + =playwright-py=: reconcile headless/visible defaults
- [ ] [#B] =playwright-js= + =playwright-py=: remove emoji console markers from examples
**** Frontend / UI
- [ ] [#B] =frontend-design=: WCAG 2.2 alignment, accessibility non-optional
- [ ] [#B] =frontend-design=: harmonize aesthetic guidance with anti-pattern rules
**** Security
- [ ] [#A] =security-check=: OWASP 2021 + WSTG coverage
- [ ] [#B] =security-check=: tooling and offline/network caveats
**** Combinatorial testing
- [ ] [#B] =pairwise-tests=: t-way escalation guidance beyond pairwise
- [ ] [#B] =pairwise-tests=: clarify negative value syntax + generator availability
**** V2MOM
- [ ] [#A] =create-v2mom=: rename Metrics → Measures (Salesforce alignment)
- [ ] [#B] =create-v2mom=: prevent task migration from turning V2MOM into a backlog
- [ ] [#B] =create-v2mom=: mitigation/owner fields for Obstacles
**** Prompt engineering
- [ ] [#A] =prompt-engineering=: correct/narrow Meincke citation
- [ ] [#B] =prompt-engineering=: eval-harness requirement for production prompts
**** Codify
- [ ] [#B] =codify=: stale-entry review + privacy checks before writing project =CLAUDE.md=
**** Code review
- [ ] [#A] =review-code=: resolve local-verification vs CI boundary
- [ ] [#B] =review-code=: =CLAUDE.md= citation scope for public artifacts
- [ ] [#B] =review-code=: relax three-strengths rule for tiny/failing diffs
**** PR / review responses
- [ ] [#A] =respond-to-review=: remove review-process language from commit messages
- [ ] [#B] =respond-to-review=: use unresolved threads + resolution state
- [ ] [#B] =respond-to-cj-comments=: drop personal absolute paths from public-writing
- [ ] [#B] =respond-to-cj-comments=: fallback when =humanizer= or =emacsclient= unavailable
**** Branch workflow
- [ ] [#A] =finish-branch=: fix base-branch detection
- [ ] [#B] =finish-branch=: worktree-aware pull/merge safety
- [ ] [#B] =start-work=: tool-availability + ceremony-scaling rules
- [ ] [#B] =start-work=: claim-before-justify rollback risk
**** Tests / TDD
- [ ] [#B] =add-tests=: fix missing =typescript-testing.md= reference or add ruleset
- [ ] [#B] =add-tests=: explicit exceptions to "all three categories per function"
**** Debugging / RCA
- [ ] [#B] =debug=: capture environment + recent-change context before hypotheses
- [ ] [#B] =root-cause-trace=: constrain defense-in-depth to trust boundaries
- [ ] [#B] =five-whys=: require evidence + counterfactual validation per why
**** Brainstorming
- [ ] [#B] =brainstorm=: timebox + research/source rules for high-stakes designs
**** Architecture
- [ ] [#B] =arch-decide=: timeless examples, drop unverifiable claims
- [ ] [#B] =arch-decide=: standardize statuses + immutability language
- [ ] [#B] =arch-design=: threat modeling + privacy/compliance as first-class inputs
- [ ] [#B] =arch-design=: separate paradigms from tactical patterns
- [ ] [#B] =arch-document=: arc42/Q42 quality scenarios
- [ ] [#B] =arch-document=: staleness + ownership metadata for generated docs
- [ ] [#B] =arch-evaluate=: confidence levels for framework-agnostic findings
- [ ] [#B] =arch-evaluate=: report skipped tool checks explicitly
**** C4 modeling
- [ ] [#A] =c4-analyze= + =c4-diagram=: notation/output fallback (not draw.io-only)
- [ ] [#B] =c4-analyze= + =c4-diagram=: clarify abstraction boundaries
**** Global rules
- [ ] [#B] =commits.md=: split DeepSat/Linear/Slack-specific from global rules
- [ ] [#A] =commits.md= + publish flows: =humanizer=-unavailable fallback
- [ ] [#B] =verification.md=: explicit "unable to verify" reporting standard
- [ ] [#B] =testing.md=: property-based + mutation testing as escalation paths
- [ ] [#B] =testing.md=: soften absolute TDD with explicit spike protocol
- [ ] [#B] =subagents.md=: capability/availability + cost checks
**** Languages
- [ ] [#A] =python-testing.md=: revisit in-memory SQLite guidance
- [ ] [#B] =python-testing.md=: separate "never mock ORM" from unit-test boundaries
- [ ] [#B] =elisp.md=: drop tool-specific advice
- [ ] [#B] =elisp-testing.md=: batch-mode + native-comp caveats
**** Hooks
- [ ] [#A] =hooks/README.md=: include =destructive-bash-confirm.py= in install/settings snippets
- [ ] [#A] =hooks/git-commit-confirm.py= + =gh-pr-create-confirm.py=: inspect message/body files referenced by =-F= / =--body-file=
- [ ] [#B] =hooks/destructive-bash-confirm.py=: shell-aware command parsing (not regex)
*** TODO [#A] =playwright-js=: replace raw CSS/page actions and =networkidle= defaults with locator/assertion-first guidance
Current examples lean on =page.click=, =page.fill=, =waitForSelector=, and =waitForLoadState('networkidle')=. Official Playwright guidance prefers locators based on user-visible attributes and web assertions for readiness, and marks =networkidle= as discouraged for testing. Keep reconnaissance, but revise it to wait for a visible app-specific landmark instead of treating network quiet as readiness.
*** TODO [#B] =playwright-js= and =playwright-py=: reconcile headless/visible-browser defaults
=playwright-js= says visible Chromium by default; =playwright-py= says headless by default. That may be intentional, but the difference should be explicit: interactive visual debugging -> headed, CI/pytest smoke tests -> headless. Add a small decision table so agents don't flip modes by habit.
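A minimal sketch of that decision table as code. The two rows come from the TODO above; the purpose names, the =choose_headless= helper, and the assumption that a non-empty =CI= environment variable means "running on a CI runner" are all hypothetical. Unknown purposes raise instead of defaulting, so the mode is always a deliberate choice.

#+begin_src python
import os
from typing import Mapping, Optional

# Hypothetical purpose labels; only these two rows come from the TODO.
HEADLESS_BY_PURPOSE = {
    "interactive-debug": False,  # headed: a human is watching the browser
    "smoke-test": True,          # headless: CI or pytest smoke run
}


def choose_headless(purpose: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return the headless flag for a browser launch.

    Raises ValueError for an unlisted purpose so nobody falls back
    to a mode out of habit.
    """
    env = os.environ if env is None else env
    if purpose not in HEADLESS_BY_PURPOSE:
        raise ValueError(f"no headless/headed rule for purpose {purpose!r}")
    if env.get("CI"):
        # Assumption: any non-empty CI value means no display is available.
        return True
    return HEADLESS_BY_PURPOSE[purpose]
#+end_src

In =playwright-py=, the result would be passed as the =headless= argument of =p.chromium.launch()=.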
*** TODO [#B] =playwright-js= and =playwright-py=: remove emoji console markers from examples
The broader rules discourage emojis in shared engineering output. The Playwright examples print camera/check/cross emoji. Replace them with plain ASCII status prefixes.
*** TODO [#B] =frontend-design=: make accessibility non-optional and align with WCAG 2.2
The workflow only loads =references/accessibility.md= for interactive components. Accessibility should be a baseline for all frontend work: keyboard operation, focus visibility/not-obscured, target size, contrast, reduced motion, labels, and semantic structure. Add WCAG 2.2-oriented gates before handoff.
*** TODO [#B] =frontend-design=: harmonize aesthetic guidance with current UI anti-pattern rules
The skill encourages gradient meshes, heavy texture, custom cursors, overlap, and maximalist directions. Those can conflict with the repo's newer frontend discipline against generic gradients, decorative blobs/orbs, text overlap, single-hue palettes, unreadable layouts, and marketing-style dashboards. Add a "creative but bounded" section: domain fit, readability, responsive stability, and no decorative effects that degrade the task workflow.
*** TODO [#A] =security-check=: update OWASP coverage to the 2021 categories and WSTG test areas
The current security checklist uses older category names and misses several current Top 10 items: Insecure Design, Software and Data Integrity Failures, Security Logging and Monitoring Failures, and SSRF. Expand the review table so each finding maps to either an OWASP Top 10 2021 category or a WSTG area, and add explicit checks for object- and function-level authorization, SSRF via URL fetches, integrity of update/plugin paths, and security-relevant logging gaps.
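One way to force every finding onto a category is to key the review table by the OWASP Top 10 2021 IDs and reject unmapped rows. A minimal sketch; the =Finding= record and =table_row= helper are hypothetical, and WSTG areas are free-text labels here rather than a full catalogue.

#+begin_src python
from dataclasses import dataclass
from typing import Optional

OWASP_TOP10_2021 = {
    "A01": "Broken Access Control",
    "A02": "Cryptographic Failures",
    "A03": "Injection",
    "A04": "Insecure Design",
    "A05": "Security Misconfiguration",
    "A06": "Vulnerable and Outdated Components",
    "A07": "Identification and Authentication Failures",
    "A08": "Software and Data Integrity Failures",
    "A09": "Security Logging and Monitoring Failures",
    "A10": "Server-Side Request Forgery (SSRF)",
}


@dataclass
class Finding:
    title: str
    owasp_id: Optional[str] = None   # e.g. "A10"
    wstg_area: Optional[str] = None  # free-text WSTG area label (assumption)


def table_row(f: Finding) -> str:
    """Render one review-table row; refuse findings with no mapping."""
    if f.owasp_id is None and f.wstg_area is None:
        raise ValueError(f"{f.title!r} maps to neither OWASP 2021 nor WSTG")
    if f.owasp_id is not None and f.owasp_id not in OWASP_TOP10_2021:
        raise ValueError(f"unknown OWASP 2021 id {f.owasp_id!r}")
    if f.owasp_id is not None:
        category = f"{f.owasp_id} {OWASP_TOP10_2021[f.owasp_id]}"
    else:
        category = f"WSTG: {f.wstg_area}"
    return f"| {f.title} | {category} |"
#+end_src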
*** TODO [#B] =security-check=: add practical tooling and offline/network caveats
Add optional use of project-configured scanners such as =gitleaks= or =trufflehog= for secrets, =semgrep= for source patterns, =pip-audit= / =npm audit= / OSV where configured, and lockfile diff review. Note that dependency audits may need network access and should report "not run" clearly rather than silently passing.
*** TODO [#B] =pairwise-tests=: add t-way escalation guidance beyond pairwise
Pairwise is a pragmatic default, but NIST's combinatorial testing work covers higher-strength t-way arrays too. Add a rule: start with pairwise for broad coverage, escalate selected high-risk parameter clusters to 3-way or higher when history, safety, security, or domain reasoning suggests faults require more than two interacting factors.
*** TODO [#B] =pairwise-tests=: clarify negative value syntax and actual generator availability
The examples use =~0= style values that are PICT-specific and easy to misread. Add a short "negative testing values are labels, not operators unless PICT treats them specially" explanation, and make the run path honest: if PICT or =pypict= is unavailable, produce the model and stop instead of implying cases were generated.
*** TODO [#A] =create-v2mom=: rename "Metrics" to Salesforce's "Measures" or explicitly justify the deviation
V2MOM's final M is officially "Measures." The skill uses "Metrics" throughout. Either rename the section and description to "Measures" or add a clear note that this fork intentionally says "Metrics" while preserving the V2MOM concept.
*** TODO [#B] =create-v2mom=: prevent task migration from turning V2MOM into a backlog
Salesforce presents V2MOM as a simple alignment framework. This skill's optional task-migration phase can make the V2MOM the entire todo system. Split strategy from execution: keep the V2MOM concise, and link to method-specific backlogs instead of embedding every task under the strategic document.
*** TODO [#B] =create-v2mom=: add mitigation/owner fields for Obstacles
The current Obstacles phase captures barriers but does not consistently capture how each will be overcome. Add "mitigation, owner, and review cadence" per obstacle so the section becomes operational instead of just candid.
*** TODO [#A] =prompt-engineering=: correct and narrow the Meincke citation
The skill cites "Persuasion and Compliance in Large Language Models", but the paper found during research is "Call Me A Jerk: Persuading AI to Comply with Objectionable Requests." Revise the reference and avoid overgeneralizing the result: it shows persuasion can raise compliance with objectionable requests, which is a cautionary prompt-safety finding, not broad evidence that persuasion principles improve engineering prompt quality.
*** TODO [#B] =prompt-engineering=: add an evaluation harness requirement for production prompts
Prompt critique currently ends with a rewrite and checklist. Add a requirement for fragile or reusable prompts: create 3-5 adversarial/edge examples, run the old and new prompt against them, and record the observed behavioral delta. Without examples, prompt quality remains asserted rather than verified.
*** TODO [#B] =codify=: add stale-entry review and privacy checks before writing project =CLAUDE.md=
The skill has good gates, but it should explicitly scan for stale entries, private context, and team-visible leakage before appending. Add "would this be safe if the project were public?" and "does this belong in private memory instead?" as mandatory checks rather than background notes in a table.
*** TODO [#A] =review-code=: resolve the local-verification vs CI boundary
=review-code= says "Trust CI for lint, typecheck, test runs; don't re-run them." =verification.md= and =finish-branch= require fresh local evidence before completion. Clarify: code review should not duplicate CI while reading a PR, but pre-commit/pre-push workflows still need local verification or a clear "not run because..." statement.
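A minimal sketch of a local-verification step that can never silently pass: it returns =PASS=, =FAIL=, or =NOT RUN= with a reason. The =verify= helper is hypothetical and only checks whether the executable is on =PATH=; a real workflow would also need to cover timeouts and missing network access.

#+begin_src python
import shutil
import subprocess
from typing import Sequence


def verify(cmd: Sequence[str]) -> str:
    """Run one verification command and return a one-line status."""
    if shutil.which(cmd[0]) is None:
        # Report the gap explicitly instead of treating it as success.
        return f"NOT RUN: {cmd[0]!r} not found on PATH"
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode == 0:
        return "PASS: " + " ".join(cmd)
    return f"FAIL (exit {result.returncode}): " + " ".join(cmd)
#+end_src

For example, =verify(["pytest", "-q"])= on a machine without pytest yields a =NOT RUN= line that can go straight into the completion report.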
*** TODO [#B] =review-code=: handle public-artifact scope when citing =CLAUDE.md=
The skill requires auditing and reporting =CLAUDE.md= adherence, while =commits.md= says personal tooling files should not be cited as authority in public artifacts. Add two output modes: private/internal review may cite =CLAUDE.md= directly; public/team review should translate the rule into the underlying engineering reason without naming personal rulesets.
*** TODO [#B] =review-code=: relax mandatory "three strengths" for tiny or failing diffs
The "three minimum" strengths rule can force filler on small diffs or bad PRs. Adjust to "up to three specific strengths; say none found when appropriate" so the review stays honest and avoids synthetic praise.
*** TODO [#A] =respond-to-review=: remove review-process language from commit messages
The skill suggests commits like =fix: Address review — [description]=, which conflicts with =commits.md='s "what changed and why, not the process" rule and also uses a non-ASCII dash. Replace with conventional subjects that name the actual fix, e.g. =fix: validate export filename=.
*** TODO [#B] =respond-to-review=: use unresolved review threads and resolution state, not only flat comments
Fetching inline and top-level comments via REST misses thread resolution and can re-process already-resolved feedback. Add the same thread-level workflow as the GitHub comment-addressing skill: gather unresolved threads, group by requested change, implement, reply, and resolve only after verification.
*** TODO [#B] =respond-to-cj-comments=: remove personal absolute path references from public-writing instructions
The skill embeds =/home/cjennings/code/rulesets/claude-rules/commits.md= in the public-writing section. That contradicts the public-artifact scope rule. Refer to "the commit/public-writing rules" internally, and ensure any emitted public text never cites the local path.
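A minimal sketch of the "never cite the local path" check, run over text before it is posted publicly. The patterns are an assumption covering common Linux/macOS home directories and =~/=; they will not catch every private path, and =find_local_paths= is a hypothetical helper name.

#+begin_src python
import re
from typing import List

# Assumed patterns for personal absolute paths; extend per environment.
LOCAL_PATH_RE = re.compile(r"(?:/home/|/Users/)[^/\s]+/\S*|~/\S+")


def find_local_paths(text: str) -> List[str]:
    """Return every personal-looking path that appears in text."""
    return LOCAL_PATH_RE.findall(text)
#+end_src

A non-empty result would block emission until the path is replaced with a neutral description such as "the commit/public-writing rules".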
*** TODO [#B] =respond-to-cj-comments=: add fallback when =humanizer= or =emacsclient= is unavailable
The workflow requires =/humanizer= and opens long summaries in =emacsclient=. Neither is guaranteed in a fresh environment. Add tool-availability checks and fallbacks: apply the style passes inline if =humanizer= is absent, and write the summary file path without opening an editor if =emacsclient= fails.
*** TODO [#A] =finish-branch=: fix base-branch detection
Phase 2 says "determine base branch", but the command shown returns a merge-base commit SHA, not the branch name to check out, pull, merge into, or pass as PR base. Replace with explicit branch detection (upstream PR base if present, configured default branch from =origin/HEAD=, or user-selected branch), then compute the merge-base separately.
*** TODO [#B] =finish-branch=: make pull/merge steps safer and worktree-aware
Option 1 runs =git pull= and =git merge --no-ff= after checkout. Add checks for dirty worktree, upstream tracking, protected branches, and rebase-vs-merge team policy. Worktree detection via grepping branch names is fragile; use checks based on =git worktree list --porcelain= or =git rev-parse --git-common-dir=.
*** TODO [#B] =start-work=: add tool-availability and ceremony-scaling rules
The workflow assumes Linear MCP, GitHub CLI, =humanizer=, Playwright skills, and multi-commit TDD ceremony. Add a first-class "tools unavailable" path and a ceremony scale: trivial local fixes should not require the full ticket, branch, three approval gates, and commit-per-phase flow unless the user wants that process.
*** TODO [#B] =start-work=: resolve the "claim before justify" rollback risk
The skill marks Linear/GitHub/todo tasks in progress before the Justify gate, then says rolling back is required if justification fails. Consider moving claiming after Gate 1 for personal todo tasks, or make the rollback steps explicit per tracker with stored prior state.
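A sketch of the explicit-rollback option, assuming each tracker can report and set a task's status. =InMemoryTracker= is a hypothetical stand-in for Linear, GitHub, or the todo file; no real tracker API is shown. The point is that the prior state is captured before claiming, so rollback restores it exactly instead of guessing.

#+begin_src python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InMemoryTracker:
    """Hypothetical stand-in for a real tracker (Linear, GitHub, todo file)."""
    statuses: Dict[str, str] = field(default_factory=dict)

    def get_status(self, task_id: str) -> str:
        return self.statuses[task_id]

    def set_status(self, task_id: str, status: str) -> None:
        self.statuses[task_id] = status


@dataclass
class Claim:
    tracker: InMemoryTracker
    task_id: str
    prior_status: str

    def rollback(self) -> None:
        # Restore exactly what was recorded before the claim.
        self.tracker.set_status(self.task_id, self.prior_status)


def claim(tracker: InMemoryTracker, task_id: str) -> Claim:
    """Record the prior status, then mark the task in progress."""
    prior = tracker.get_status(task_id)
    tracker.set_status(task_id, "In Progress")
    return Claim(tracker, task_id, prior)
#+end_src

If Gate 1 rejects the work, calling =rollback()= on the stored =Claim= returns the task to its recorded prior status.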
*** TODO [#B] =add-tests=: fix missing =typescript-testing.md= reference or add the ruleset
Phase 3 references =typescript-testing.md=, but this repo currently has Python and Elisp testing rules only. Either add the TypeScript ruleset or change the skill to discover project-local JS/TS testing conventions instead of pointing to a missing file.
*** TODO [#B] =add-tests=: add explicit exceptions to "all three categories per function"
The Normal/Boundary/Error rule is useful, but some functions are pure adapters, generated code, tiny wrappers, or framework glue. Add an exception protocol: state why a category does not apply, and cover the behavior at the integration or E2E level when unit categories would only test framework behavior.
*** TODO [#B] =debug=: capture environment and recent-change context before hypotheses
The debugging workflow covers reproduction and logs, but should explicitly record environment, versions, feature flags, data set, seed/time, concurrency, and recent commits/config changes. Many intermittent failures come from environment or state transitions, not just local code paths.
*** TODO [#B] =root-cause-trace=: constrain defense-in-depth to trust boundaries and invariants
The skill says to add defense at each intermediate layer that could have caught the bad value. That risks validation spam. Tighten it: add checks at ingress, trust boundaries, persistence boundaries, and invariant-owning layers; avoid duplicative null checks in every pass-through function.
*** TODO [#B] =five-whys=: require evidence and counterfactual validation per why
The skill says "one best-supported answer" but should require an evidence field for each link and a counterfactual check: if this cause were removed, would the next symptom plausibly disappear? This reduces monocausal storytelling.
*** TODO [#B] =brainstorm=: add timebox and research/source rules for high-stakes designs
The one-question-at-a-time flow can run long. Add a timebox and a rule that claims about markets, regulations, tools, vendors, or current APIs require fresh sources. The design doc should distinguish researched facts from assumptions.
*** TODO [#B] =arch-decide=: make examples technically timeless and avoid unverifiable claims
The sample ADRs include claims such as MongoDB lacking ACID for multi-document transactions "at decision time." Examples age and can teach stale facts. Replace with either clearly dated examples or domain-neutral placeholders, and require references for real technical claims in generated ADRs.
*** TODO [#B] =arch-decide=: standardize statuses and immutability language
The skill mixes Accepted, Decided, Deprecated, Superseded, Rejected, and "Not Accepted." Pick a canonical status set and state that accepted ADR content is not edited except for status/link metadata; changed decisions get new ADRs that supersede old ones.
*** TODO [#B] =arch-design=: add threat modeling and privacy/compliance as first-class design inputs
Security appears as one quality attribute, but architecture design should also ask about trust boundaries, data classification, abuse cases, privacy constraints, compliance evidence, and operational ownership. These influence architecture early and should not wait for =security-check=.
*** TODO [#B] =arch-design=: separate architecture paradigms from tactical patterns
The candidate table mixes paradigms (modular monolith, microservices, event-driven) with tactical or partial patterns (DDD, CQRS, event sourcing). Revise the matrix so candidates can compose patterns rather than treating each as a mutually exclusive architecture choice.
*** TODO [#B] =arch-document=: strengthen quality scenarios using arc42/Q42 structure
Section 10 currently says "Under [condition], the system should [response] within [measure]." Expand to a compact quality-scenario template: source, stimulus, environment, artifact, response, response measure. This better matches architecture-quality practice and makes requirements testable.
*** TODO [#B] =arch-document=: add staleness and ownership metadata to generated docs
arc42 docs are living documents. Add owner, source commit/date, review cadence, and "known stale when..." notes per section or in the README so generated docs do not remain authoritative after the code has moved on.
*** TODO [#B] =arch-evaluate=: add confidence levels for framework-agnostic findings
Claude-read import graphs and public API comparisons can be incomplete in large codebases or dynamic languages. Add confidence/provenance per finding and require "not fully checked because..." when scale or dynamic imports limit certainty.
*** TODO [#B] =arch-evaluate=: report skipped tool checks explicitly
The workflow says to skip unconfigured language-specific tools silently, but the review checklist also expects those checks to run. For audit usefulness, list detected languages and "tool not configured" entries under Info instead of skipping silently.
*** TODO [#A] =c4-analyze= and =c4-diagram=: add notation/output fallback instead of draw.io-only
C4 is notation-independent. These skills hard-require draw.io XML, PNG export, and opening draw.io desktop. Add supported outputs (Structurizr DSL, Mermaid, PlantUML, draw.io) and a fallback path when =drawio= or a GUI is unavailable.
*** TODO [#B] =c4-analyze= and =c4-diagram=: clarify C4 abstraction boundaries
Emphasize that C4 Containers are deployable/runnable units, not necessarily Docker containers, and that Components are not separately deployable. Add a check that every relationship and element stays at one abstraction level.
*** TODO [#B] =commits.md=: split DeepSat/Linear/Slack-specific publishing rules from global commit rules
The global commit rule file includes Linear status transitions and a hard-coded Slack channel. That is team-specific and may leak or misfire in unrelated projects. Move those steps to a project/team overlay, leaving global rules for author identity, attribution, commit format, review gate, and verification.
*** TODO [#A] =commits.md= and publish flows: define fallback when =humanizer= is unavailable
Several workflows make =humanizer= mandatory, but no =humanizer= skill exists in this repo. Either add the skill, add install instructions, or define a fallback plain-English pass that satisfies the same checks without an external skill.
*** TODO [#B] =verification.md=: add explicit "unable to verify" reporting standard
The rule says to run tests/lint/typecheck/build before claiming done. Add the required final wording when a command cannot be run: command attempted, reason it could not run, risk left unverified, and the smallest next command for the user to run.
*** TODO [#B] =testing.md=: add property-based and mutation testing as escalation paths
The testing rules cover categories and pairwise matrices. Add guidance for property-based testing when invariants matter across broad input domains, and mutation testing when test quality is suspect despite high coverage.
*** TODO [#B] =testing.md=: soften absolute TDD with an explicit spike protocol
The rule currently treats TDD as non-negotiable. Keep TDD as the default, but define a disciplined spike exception: timebox, do not commit spike code, write the first failing test before productionizing the discovered approach.
*** TODO [#B] =subagents.md=: add capability/availability and cost checks
The rule assumes subagents exist and should handle failures. Add "if the environment lacks subagents, continue locally and preserve the same scope boundaries" plus a cost check for tasks where the context handoff costs more than the work itself.
*** TODO [#A] =languages/python/claude/rules/python-testing.md=: revisit in-memory SQLite guidance
"Prefer in-memory SQLite for speed in unit tests" is risky for Django or SQLAlchemy projects whose production database is PostgreSQL/MySQL; query semantics, constraints, transactions, JSON, time zones, and indexes differ. Recommend production-like DBs for ORM/query behavior and reserve SQLite for pure unit tests that do not depend on database semantics.
*** TODO [#B] =languages/python/claude/rules/python-testing.md=: separate "never mock ORM" from true unit-test boundaries
For domain services, real model methods and validation are usually right. For thin orchestration units, a repository/interface fake may be cleaner than hitting a real database. Clarify the boundary: do not mock ORM internals, but do inject fakes at deliberate data-access ports.
*** TODO [#B] =languages/elisp/claude/rules/elisp.md=: update editing workflow to avoid tool-specific advice
The rule says to prefer Write over repeated Edits. That advice is Claude-tooling specific and can conflict with environments that require patch-based edits. Rephrase around the intent: for nontrivial Elisp, make cohesive edits and run paren/byte-compile checks immediately.
*** TODO [#B] =languages/elisp/claude/rules/elisp-testing.md=: add batch-mode and native-comp caveats
ERT guidance is solid, but add rules for =emacs --batch= reproducibility, isolating =user-emacs-directory= / package state, and optionally catching native-comp or byte-compile warnings depending on the project's Emacs version.
*** TODO [#A] =hooks/README.md=: include =destructive-bash-confirm.py= in install/settings snippets
The table documents the destructive-command hook, but the manual install and settings JSON snippets only include the commit and PR hooks. Add the destructive hook to both snippets so documented installation matches the listed hooks.
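A sketch of a consistency check that would catch this kind of drift: compare the hook scripts listed in the README table against the commands in a settings snippet. It assumes the Claude Code settings shape =hooks= → event name → list of ={matcher, hooks: [{type, command}]}= entries; the =missing_hooks= helper and the example paths are hypothetical.

#+begin_src python
import json
from typing import Iterable, List


def missing_hooks(settings_json: str, documented: Iterable[str]) -> List[str]:
    """Return documented hook scripts that no settings command references."""
    settings = json.loads(settings_json)
    # Assumed layout: {"hooks": {"PreToolUse": [{"matcher": ..., "hooks": [...]}]}}
    commands = [
        hook.get("command", "")
        for entries in settings.get("hooks", {}).values()
        for entry in entries
        for hook in entry.get("hooks", [])
    ]
    return [name for name in documented
            if not any(name in cmd for cmd in commands)]
#+end_src

Given the situation described above, running this against the README's JSON snippet would report =destructive-bash-confirm.py= until the snippet is updated.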
*** TODO [#A] =hooks/git-commit-confirm.py= and =hooks/gh-pr-create-confirm.py=: inspect message/body files
=commits.md= uses =git commit -F /tmp/commit-*.md= and =gh pr create --body-file ...=. The hooks currently treat file-backed messages as unparseable or just display the file path, so attribution scanning may miss the actual committed/posted text. Read safe local files referenced by =-F=, =--file=, and =--body-file= before deciding whether the command is clean.
*** TODO [#B] =hooks/destructive-bash-confirm.py=: replace regex command parsing with shell-aware parsing where possible
The hook's regexes can miss quoted paths, variables, aliases, =env= wrappers, or compound commands, and can misidentify targets. Use =shlex= for simple commands, document unsupported shell constructs, and fail toward asking when a destructive pattern is ambiguous.
** TODO [#B] Build =ov-1= skill for DoDAF OV-1 (High-Level Operational Concept Graphic)
Triggered by SOFWeek (May 2026, Tampa) — DeepSat attending; DoD attendees may ask for architecture diagrams. OV-1 is the universal informal currency in DoD briefings ("show me the architecture" → OV-1 by default).

Priority upgrades to =[#A]= if Craig confirms scenario 2 below (personal load-bearing need at the event); stays =[#B]= or drops to =[#C]= if scenario 1 (team already covers it, future asset only).
*** Prior art (searched 2026-04-19)
No Claude Code skill exists for DoDAF / OV-1 / SV-1 / SysML.
- =anthropics/skills= — 17 skills, zero DoDAF/SysML/defense coverage.
- =awesome-claude-code= list — zero hits for DoDAF/OV-1/SysML/UAF.
- =mfsgr/sysml2dodaf= — empty repo (0 stars, no code). Vapor.
- =HowardKao-1130/mini-NEXEN= — broad SE methodology skill that name-drops DoDAF as a trigger keyword; no artifact generation. 0 stars.
- =gaphor/gaphor= (Apache-2.0, 2.2k stars) — mature UML/SysML GUI modeler. Not a skill; not a pipeline. Useful reference only.
Nearest prior art to lean on when building:
- DoDAF 2.02 Viewpoints & Models reference (dodcio.defense.gov) — canonical OV-1 exemplars. Embed 3-5 layouts as skill =references/=.
- Pattern from existing =c4-diagram= skill — same shape (prose → diagram spec), swap the viewpoint vocabulary to DoDAF.
- PlantUML for SV-1 (when that skill comes later); Mermaid or draw.io XML for OV-1 lightweight visuals.
*** Build scope (when triggered)
*In scope:*
- Input: prose description of a system + its operational context.
- Output: structured OV-1 *spec* — performers, external actors (other systems, forces, adversaries), relationships (data/control flows), narrative captions, classification marking, legend requirements.
- DoDAF 2.02 completeness checklist as a quality gate — verify the produced spec contains every element a correct OV-1 requires.
- Optional lightweight visual: draw.io XML or Mermaid approximation for quick review; NOT a finished rendering.
*Out of scope:*
- Icon libraries, pictorial assets, finished PowerPoint export. OV-1 final art belongs to a designer or Craig in Visio/PowerPoint; the skill's job is the spec and the check, not the slide.
- SV-1, SV-2, UAF, IDEF1X, other viewpoints. Build only when a concrete need triggers each.

Estimate: 4-6 hours.
*** Craig's investigation before kickoff
1. Does DeepSat's systems-engineering or marketing team already have an OV-1 (or the equivalent briefing artifact) for SOFWeek?
2. If yes (scenario 1) — skill is a future asset, not event-load-bearing. Ship after SOFWeek. Priority drops to =[#C]=.
3. If no, or if the scenario is "Craig may need to produce/iterate an OV-1 on the fly during the event" (scenario 2) — skill is load-bearing for the event. Priority upgrades to =[#A]=; build before SOFWeek.
4. Confirm the classification level the skill needs to handle (unclassified-only? or FOUO markings? affects the classification block in the spec).
5. Confirm the target rendering format DeepSat uses for OV-1 deliverables (PowerPoint slide? Cameo? Visio? affects whether the skill emits draw.io XML vs Mermaid vs pure structured spec).
*** Related
See also the DoD-specific notations section under the later TODO (=c4-*= rename revisit) — OV-1 is flagged there as the highest-value starting point across the DoD notation landscape (SysML, DoDAF/UAF, IDEF1X). This entry is the execution plan for that starting point.
** TODO [#A] Build =/update-skills= skill for keeping forks in sync with upstream
The rulesets repo has a growing set of forks (=arch-decide= from wshobson/agents, =playwright-js= from lackeyjb/playwright-skill, =playwright-py= from anthropics/skills/webapp-testing). Over time, upstream projects release fixes, new templates, or scope expansions that we'd want to pull in without losing our local modifications. A skill should handle this deliberately rather than by manual re-cloning.
*** Design decisions (agreed)
- *Upstream tracking:* per-fork manifest =.skill-upstream= (YAML or JSON):
  - =url= (GitHub URL)
  - =ref= (branch or tag)
  - =subpath= (path inside the upstream repo when it's a monorepo)
  - =last_synced_commit= (updated on successful sync)
- *Local modifications:* 3-way merge. Requires a pristine baseline snapshot of the upstream as it was at fork time. Store under =.skill-upstream/baseline/= or similar; committed to the rulesets repo so the merge base is reproducible.
- *Apply changes:* skill edits files directly with per-file confirmation.
- *Conflict policy:* per-hunk prompt inside the skill. When a 3-way merge produces a conflict, the skill walks each conflicting hunk and asks Craig: keep-local / take-upstream / both / skip. Editor-independent; works on machines where Emacs isn't available. Fallback when the baseline is missing or corrupt (can't run a 3-way merge): write =.local=, =.upstream=, =.baseline= files side-by-side and surface them for manual review.
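A minimal sketch of why the stored baseline matters: with it, each file can be classified by comparing both sides against the same merge base. It assumes contents are compared byte-for-byte and that a file absent from a snapshot is passed as =None=; =classify= is a hypothetical helper, and its four labels are the ones named in the V1 scope.

#+begin_src python
from typing import Optional


def classify(baseline: Optional[bytes],
             local: Optional[bytes],
             upstream: Optional[bytes]) -> str:
    """Classify one file against the shared baseline (None = file absent)."""
    local_changed = local != baseline
    upstream_changed = upstream != baseline
    if local_changed and upstream_changed:
        # Identical edits on both sides are still routed to the merge step,
        # which should resolve them without a conflict.
        return "both-changed"
    if upstream_changed:
        return "upstream-only"
    if local_changed:
        return "local-only"
    return "unchanged"
#+end_src

A file deleted upstream (=upstream= is =None=) also lands in =upstream-only=, which is where the rename question under Open questions would need a decision.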
*** V1 Scope
- [ ] Skill at =~/code/rulesets/update-skills/=
- [ ] Discovery: scan sibling skill dirs for =.skill-upstream= manifests
- [ ] Helper script (bash or python) to:
  - Clone each upstream at =ref= shallowly into =/tmp/=
  - Compare current skill state vs latest upstream vs stored baseline
  - Classify each file: =unchanged= / =upstream-only= / =local-only= / =both-changed=
  - For =both-changed=: run =git merge-file --stdout =; if clean, write result directly; if conflicts, parse the conflict-marker output and feed each hunk into the per-hunk prompt loop
- [ ] Per-hunk prompt loop:
  - Show base / local / upstream side-by-side for each conflicting hunk
  - Ask: keep-local / take-upstream / both (concatenate) / skip (leave marker)
  - Assemble resolved hunks into the final file content
- [ ] Per-fork summary output with file-level classification table
- [ ] Per-file confirmation flow (yes / no / show-diff) BEFORE per-hunk loop
- [ ] On successful sync: update =last_synced_commit= in the manifest
- [ ] =--dry-run= to preview without writing
*** V2+ (deferred)
- [ ] Track upstream *releases* (tags), not just branches, so the skill can propose "upgrade from v1.2 to v1.3" with release notes pulled in
- [ ] Generate patch files as an alternative apply method (for users who prefer =git apply= / =patch= over in-place edits)
- [ ] Non-interactive mode (=--non-interactive= / CI): skip conflict resolution, emit side-by-side files for later manual review
- [ ] Auto-run on a schedule via Claude Code background agent
- [ ] Summary of aggregate upstream activity across all forks (which forks have upstream changes waiting, which don't)
- [ ] Optional editor integration: on machines with Emacs, offer =M-x smerge-ediff= as an alternate path for users who prefer ediff over per-hunk prompts
*** Initial forks to enumerate (for manifest bootstrap)
- [ ] =arch-decide= → =wshobson/agents= :: =plugins/documentation-generation/skills/architecture-decision-records= :: MIT
- [ ] =playwright-js= → =lackeyjb/playwright-skill= :: =skills/playwright-skill= :: MIT
- [ ] =playwright-py= → =anthropics/skills= :: =skills/webapp-testing= :: Apache-2.0
*** Open questions
- [ ] What happens when upstream *renames* a file we fork? Skill would see "file gone from upstream, still present locally" — drop, keep, or prompt?
- [ ] What happens when upstream splits into multiple forks (e.g., a plugin reshuffles its structure)? Probably out of scope for v1; manual migration.
- [ ] Rate-limit / offline mode: if GitHub is unreachable, should the skill fail or degrade gracefully? Likely degrade; print a warning per fork.
** TODO [#B] Build /research-writer — clean-room synthesis for research-backed long-form
SCHEDULED: <2026-05-15 Fri>
Gap in current rulesets: between =brainstorm= (idea refinement → design doc) and =arch-document= (arc42 technical docs), there's no skill for research-backed long-form prose — blog posts, essays, white papers, proposals with data backing, article-length content with citations.

Craig writes documents across many contexts (defense-contractor work, personal, technical, proposals). The gap is real.

*Evaluated 2026-04-19:* ComposioHQ/awesome-claude-skills has a =content-research-writer= skill (540 lines, 14 KB) that attempts this.
*Not adopting:*
- Parent repo has no LICENSE file — reuse is legally ambiguous
- Bloated: 540 lines of prose-scaffolding with no tooling
- No citation-style enforcement (APA/Chicago/IEEE/MLA)
- No source-quality heuristics (primary vs secondary, peer-review, recency)
- Fictional example citations in the skill itself (models the hallucination failure mode a citation-focused skill should prevent)
- No citation-verification step
- Overlaps with =humanizer= at the polish stage with no composition guidance

*Patterns worth lifting clean-room (from their better parts):*
- Folder convention =~/writing//= with =outline.md=, =research.md=, versioned drafts, =sources/=
- Section-by-section feedback loop (outline validated → per-section research validated → per-section draft validated)
- Hook alternatives pattern (generate three hook variants with rationale)

*Additions for the clean-room version (v1):*
- Citation-style selection (APA / Chicago / MLA / IEEE / custom) with style-specific examples and a pick-one step up front
- Source-quality heuristics: primary > secondary; peer-reviewed; recency thresholds by domain; publisher reputation; funding transparency
- Citation-verification discipline: fetch real sources, never fabricate, mark unverifiable claims with =[citation needed]= rather than inventing
- Composition hand-off to =/humanizer= at the polish stage
- Classification awareness: if the working directory or context signals defense / regulated territory, flag any sentence that might touch CUI or classified material before emission

*Target:* ~150-200 lines, clean-room per blanket policy.

*When to build:* wait for a real research-writing task to validate the design against actual document patterns. Building preemptively risks tuning for my guess at Craig's workflow rather than his real one.
Triggers that would prompt "let's build it now":

- Starting a white paper / proposal that needs citation discipline
- Writing a technical blog post with external references
- A pattern of hitting the same research-writing friction 3+ times

Upstream reference (do not vendor): ComposioHQ/awesome-claude-skills =content-research-writer/SKILL.md=.

** TODO [#C] Try Skill Seekers on a real DeepSat docs-briefing need
SCHEDULED: <2026-05-15 Fri>

=Skill Seekers= ([[https://github.com/yusufkaraaslan/Skill_Seekers]]) is a Python CLI + MCP server that ingests 18 source types (docs sites, PDFs, GitHub repos, YouTube videos, Confluence, Notion, OpenAPI specs, etc.) and exports to 20+ AI targets, including Claude skills. MIT licensed, 12.9k stars, active as of 2026-04-12.

*Evaluated 2026-04-19 -- not adopted for rulesets.* It generates *reference-style* skills (encyclopedic dumps of scraped source material), not *operational* skills (opinionated how-we-do-things content). That doesn't fit the rulesets curation pattern.

*Next-trigger experiment (this TODO):* the next time a DeepSat task needs Claude briefed deeply on a specific library, API, or docs site, try:

#+begin_src bash
pip install skill-seekers
skill-seekers create --target claude
#+end_src

Measure output quality against a hand-curated briefing. If usable, consider installing it as a persistent tool. If the output is bloated or under-structured, discard it and stick with hand briefing.

*Candidate first experiments (pick one from an actual need, don't invent):*

- A Django ORM reference skill scoped to the version DeepSat pins
- An OpenAPI-to-skill conversion for a partner-vendor API
- A React hooks reference skill for the frontend team's current patterns
- A specific AWS service's docs (e.g. GovCloud-flavored)

*Patterns worth borrowing into rulesets even without adopting the tool:*

- Enhancement-via-agent pipeline (scrape raw → LLM pass → structured SKILL.md). Applicable if we ever build internal-docs-to-skill tooling.
- Multi-target export abstraction (one knowledge extraction → many output formats). Clean design for any future multi-AI-tool workflow.

*Concerns to verify on actual use:*

- =LICENSE= has an unfilled =[Your Name/Username]= placeholder (MIT is unambiguous, but sloppy for a 12k-star project)
- Default branch is =development=, not =main=; pin with care
- Heavy commercialization signals (website at skillseekersweb.com, Trendshift promo, branded badges); the license might shift later, so watch it
- Companion =skill-seekers-configs= community repo has only 8 stars despite main's 12.9k; the ecosystem is thinner than headline adoption suggests

** TODO [#C] Revisit =c4-*= rename if a second notation skill ships

Current naming keeps =c4-analyze= and =c4-diagram= as-is (the framework prefix encodes the notation, and "C4" is a discoverable brand). Suite membership is surfaced via the description footer, not the name.

If a second notation-specific skill ever lands (=uml-*=, =erd-*=, =arc42-*=), the compound pattern =arch-analyze-= / =arch-diagram-= starts paying off: alphabetical clustering under 'a' amortizes across three+ skills, and the hierarchy becomes regular. At that point, rename all notation skills together in one pass.

Trigger: adding skill #2 in the notation family. Don't pre-rename.

Candidate future notation skills (not yet in scope; noted for when a real need arrives, not pre-emptively):

- *UML* (Unified Modeling Language): OO design notation with 14 diagram types, in practice dominated by class / sequence / state / component. Common in DoD / safety-critical / enterprise-architecture contexts. Tooling: PlantUML (text-to-diagram), Mermaid UML, draw.io. Would likely split into =uml-class=, =uml-sequence=, =uml-state= rather than one monolith, since each has different audiences and different inputs.
- *ERD* (Entity-Relationship Diagram): database schema modeling (entities, attributes, cardinality). Crow's Foot notation dominates practice; Chen is academic; IDEF1X is DoD-standard.
  Tooling: dbdiagram.io, Mermaid ERD, PlantUML, ERAlchemy (code-to-ERD for SQL). Natural fit as =erd-analyze= (extract from schema/migrations) and =erd-diagram= (generate from prose/model definitions).
- *arc42*: already partially covered by =arch-document= (which emits arc42-structured docs). A standalone =arc42-*= skill would be redundant unless the arc42-specific visualizations need separation.

Each answers a different question:

- C4 → "What systems exist and how do they talk, at what zoom?"
- UML class/sequence → "What does the code look like / what happens when X runs?"
- ERD → "What's the database shape?"
- arc42 → "What's the full architecture document?"

Deferred pending an actual need that's blocked on not having one of these.

*** DoD-specific notations (DeepSat context)

Defense-contractor work uses a narrower, different notation set than commercial software. Document the trigger conditions and starting point so a future decision to build doesn't have to re-derive the landscape.

**** SysML (Systems Modeling Language)

UML 2 profile, dominant in DoD systems engineering. Six diagrams account for nearly all practical use:

- *Block Definition Diagram (BDD)*: structural; like a UML class diagram but for system blocks (components, subsystems, hardware).
- *Internal Block Diagram (IBD)*: parts within a block and how they connect (flow ports, interfaces).
- *Requirement diagram*: unique to SysML; traces requirements to satisfying blocks. Essential in regulated environments.
- *Activity diagram*: behavioral flow.
- *State machine*: same shape as UML.
- *Sequence diagram*: same shape as UML.

SysML v1.x is in the field; v2 is emerging but not yet adopted at scale (as of 2026-04). Tooling is dominated by Cameo Systems Modeler / MagicDraw and Enterprise Architect. Text-based option: PlantUML + =plantuml-sysml= (git-friendly, growing niche).

*Candidate skills*: =sysml-bdd=, =sysml-ibd=, =sysml-requirement=, =sysml-sequence=.
Three or more in this cluster triggers the =arch-*-= rename discussion from the parent entry.

**** DoDAF / UAF (architecture frameworks)

Not notations themselves, but frameworks that specify *which* viewpoints a program must deliver. Viewpoints are rendered using UML/SysML diagrams.

- *DoDAF (DoD Architecture Framework)*: legacy but still contract-required on many programs.
- *UAF (Unified Architecture Framework)*: DoDAF/MODAF successor, SysML-based. Gaining adoption on newer contracts.

Common required viewpoints (formal CDRL deliverables or PDR/CDR review packages):

- *OV-1*: High-Level Operational Concept Graphic. The "cartoon" showing the system in operational context with icons, arrows, and surrounding actors/environment. *Universally asked for, informally or formally.* Starting point for any DoD diagram skill.
- *OV-2*: Operational resource flows (nodes and flows).
- *OV-5a/b*: Operational activities.
- *SV-1*: Systems interfaces. Maps closely to C4 Container.
- *SV-2*: Systems resource flows.
- *SV-4*: Systems functionality.
- *SV-10b*: Systems state transitions.

*Informal ask ("send me an architecture diagram") → OV-1 + SV-1 satisfies it 90% of the time.* Formal CDRL asks specify the viewpoint set contractually.

*C4 gap*: C4 is rare in DoD. C4 System Context ≈ OV-1 in intent but not in visual convention. C4 Container ≈ SV-1. Expect a mapping step or reviewer pushback if delivering C4-shaped artifacts to a DoD audience.

*Candidate skills*: =dodaf-ov1=, =dodaf-sv1= first (highest value); =uaf-viewpoint= if newer contracts require UAF.

**** IDEF1X (data modeling)

FIPS 184, the federal standard for data modeling. Used in classified DoD data systems, intelligence databases, and anywhere the government specifies the data model. Same shape language as Crow's Foot but with different adornments and notation conventions.

*Rule of thumb*: classified DoD data work → IDEF1X; unclassified contractor work → Crow's Foot unless the contract specifies otherwise.
*Candidate skills*: =idef1x-diagram= / =idef1x-analyze= (parallel to a future =erd-diagram= / =erd-analyze= pair).

**** Tooling baseline

- *Cameo Systems Modeler / MagicDraw* (Dassault): commercial SysML tool, dominant in DoD programs.
- *Enterprise Architect (Sparx)*: widely used for UML + SysML + DoDAF.
- *Rhapsody (IBM)*: SysML with code generation; strong in avionics / embedded (FACE, ARINC).
- *Papyrus (Eclipse)*: open-source SysML; free but clunkier.
- *PlantUML + plantuml-sysml*: text-based, version-controllable. Fits a git-centric workflow better than any GUI tool.

**** Highest-value starting point

If DeepSat contracts regularly require architecture deliverables, the highest-ROI first skill is =dodaf-ov1= (or whatever naming convention the rename discussion lands on). OV-1 is the universal currency in briefings, proposals, and reviews; it's the one artifact that shows up in every program regardless of contract specifics.

Trigger for building: an actual DoD deliverable that's blocked on not having a skill to generate or check OV-1-shaped artifacts. Don't build speculatively. Defense-specific notations are narrow enough that each skill should be driven by a concrete contract need, not aspiration.

** TODO [#B] Add =make remove= for interactive ruleset removal via fzf

Add a Makefile target that lists every currently installed ruleset entry and lets me pick one or more to remove via fzf. It's a granular alternative to =make uninstall= (removes everything) and =make uninstall-hooks= (removes only hooks).

*** Why this matters

Tearing down a single skill, rule, hook, or config file currently means either running =make uninstall= and re-installing what I want to keep, or running =rm= on the symlink directly and remembering the exact path. Both are friction. An interactive picker lets me filter, multi-select with Tab, and confirm with Enter (the typical fzf flow). That costs about 3-5 seconds per teardown instead of 15+ seconds of "what's the exact name?".
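Here is a minimal bash sketch of a helper script the target could call. It follows the Design and Edge cases subsections of this entry. The directory variables mirror the Makefile's =$(SKILLS_DIR)= / =$(RULES_DIR)= / =$(HOOKS_DIR)= / =$(CLAUDE_DIR)= names. The defaults, the function names (=path_for=, =list_installed=, =remove_selected=, =main=), and passing the directories in as environment variables are all assumptions. The =bridge= type is left out because that symlink no longer exists.

#+begin_src bash
#!/usr/bin/env bash
# Sketch of a `make remove` helper: list installed ruleset links, pick with fzf, remove the picks.
# Directory defaults and all function names are hypothetical; the Makefile would pass real values.
CLAUDE_DIR="${CLAUDE_DIR:-$HOME/.claude}"
SKILLS_DIR="${SKILLS_DIR:-$CLAUDE_DIR/skills}"
RULES_DIR="${RULES_DIR:-$CLAUDE_DIR/rules}"
HOOKS_DIR="${HOOKS_DIR:-$CLAUDE_DIR/hooks}"

# Map one selection (type, name) to the symlink it names; unknown types fail.
path_for() {
  case "$1" in
    skill)    echo "$SKILLS_DIR/$2" ;;
    rule)     echo "$RULES_DIR/$2" ;;
    hook)     echo "$HOOKS_DIR/$2" ;;
    config)   echo "$CLAUDE_DIR/$2" ;;
    commands) echo "$CLAUDE_DIR/commands" ;;
    *)        return 1 ;;
  esac
}

# Print one tab-separated "type name" line per installed symlink.
list_installed() {
  local f
  for f in "$SKILLS_DIR"/*; do [[ -L $f ]] && printf 'skill\t%s\n' "${f##*/}"; done
  for f in "$RULES_DIR"/*;  do [[ -L $f ]] && printf 'rule\t%s\n'  "${f##*/}"; done
  for f in "$HOOKS_DIR"/*;  do [[ -L $f ]] && printf 'hook\t%s\n'  "${f##*/}"; done
  for f in "$CLAUDE_DIR"/*.json "$CLAUDE_DIR"/.mcp.json; do
    [[ -L $f ]] && printf 'config\t%s\n' "${f##*/}"
  done
  [[ -L $CLAUDE_DIR/commands ]] && printf 'commands\tcommands\n'
  return 0
}

# Remove each selected link read from stdin; report failures but keep going.
remove_selected() {
  local type name target status=0
  while IFS=$'\t' read -r type name; do
    [[ -n $type ]] || continue            # empty selection (Esc) is a no-op
    if ! target=$(path_for "$type" "$name"); then
      echo "unknown type: $type" >&2; status=1; continue
    fi
    rm -- "$target" || status=1           # plain rm: an already-gone link fails visibly
  done
  return "$status"
}

main() {
  command -v fzf >/dev/null || { echo "fzf not installed" >&2; return 1; }
  list_installed | fzf --multi | remove_selected
}
#+end_src

The recipe could source this file and call =main=. Keeping the list, mapping, and removal steps as separate functions means they can be tested without an interactive fzf session.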
*** Design

The recipe builds a tab-separated list of every currently installed item, categorized by type, and pipes it to =fzf --multi=. The user filters, marks with Tab, and confirms with Enter. The recipe parses the selections and runs =rm= on the matching symlinks.

#+begin_example
skill     debug
rule      commits.md
hook      destructive-bash-confirm.py
config    settings.json
commands  commands
bridge    claude-rules
#+end_example

Each line is =\t=. The recipe maps == to the right path:

- =skill= → =$(SKILLS_DIR)/=
- =rule= → =$(RULES_DIR)/=
- =hook= → =$(HOOKS_DIR)/=
- =config= → =$(CLAUDE_DIR)/=
- =commands= → =$(CLAUDE_DIR)/commands=
- =bridge= → =$(SKILLS_DIR)/claude-rules=

Source files in =rulesets/= stay untouched. =make install= re-creates the removed links if needed (the install loop is idempotent).

*** Edge cases

- Esc instead of Enter → empty selection → clean exit, no removal.
- Filter to nothing, then Enter → same as Esc.
- Selected item already gone → =rm= fails visibly, and processing continues on the rest.
- =fzf= not installed → fail fast with a clear error (matches the pattern used by =install-lang=).

*** Possible extensions

- A parallel =make pick-install= target that lists not-yet-installed items and installs the chosen ones. Symmetric UX, same fzf flow.
- A confirmation prompt when more than N items are selected (defense against accidental select-all).
- A =--source= flag that also runs =git rm= against the rulesets source for the selected item. Probably a bad idea, since it's too easy to lose work.
- The =bridge → $(SKILLS_DIR)/claude-rules= entry above is stale, because the bridge symlink got removed in a later commit. Drop that bullet when the recipe lands.

** TODO [#B] Document the =mcp/= install pipeline in =mcp/README.org=

=mcp/= has =install.py=, =servers.json=, =secrets.env.gpg=, and =gcp-oauth.keys.json= (gitignored, regenerated at install). No README. Coming back to this in three months, I'll re-discover how the bundle is structured, what =install.py= does, and how to rotate tokens.
Saving that re-discovery is the whole point.

*** What to cover

- Layout: what each file is, and which are tracked vs gitignored.
- Secrets bundle shape: how vars are listed in =secrets.env=, the symmetric-encryption pattern (=gpg -c --cipher-algo AES256=), and the base64-bundled OAuth artifacts (=GCP_OAUTH_KEYS_JSON_B64=, =GOOGLE_DOCS_PERSONAL_TOKEN_B64=, =GOOGLE_DOCS_WORK_TOKEN_B64=).
- Install flow: =make install-mcp= → =install.py= decrypts, writes the keys file and Google Docs token caches at mode 600, expands =${VAR}= in =servers.json=, and calls =claude mcp add --scope user= for unregistered servers. Idempotent.
- Token rotation: when a refresh token gets revoked, the recovery flow (re-auth on one machine, re-bundle, recommit).
- Adding a new server: edit =servers.json=, add any new =${VAR}= placeholders to the bundle, re-encrypt.
- The OAuth dance for HTTP-transport servers (linear, notion) versus stdio (google-docs-*): different paths, different gotchas.

** TODO [#C] Add =make uninstall-mcp= + =mcp/install.py --check= for symmetry

Currently the MCP install pipeline only flows one direction. There's no way to remove rulesets-managed MCP servers in one command, and no way to ask "what's the drift between =servers.json= and =claude mcp list=?" without eyeballing.

*** =make uninstall-mcp=

Iterate over =servers.json= and run =claude mcp remove -s user= for each server. Ignore "not registered" errors. Idempotent.

*** =mcp/install.py --check=

Dry-run mode. Decrypt secrets, but instead of registering, print the drift report:

- Servers in =servers.json= not in =claude mcp list= → =MISSING=
- Servers in =claude mcp list= not in =servers.json= → =EXTRA=
- Servers in both → =ok=

Useful for diagnosing connection failures and for the eventual =make doctor= integration.

** TODO [#C] Update =README.org= with MCP install pipeline section

=README.org= covers global install, per-project language bundles, and design principles, but doesn't mention =make install-mcp= or the =mcp/= directory.
Add a short section after "Per-project language bundles" describing the user-scope MCP install pattern (decrypt → expand → register) and pointing at the eventual =mcp/README.org=.

** TODO [#C] Token-rotation helper for =@a-bonus/google-docs-mcp= OAuth refresh

When a Google refresh token gets revoked (re-granted scopes, removed Connected App, account password reset), recovery is currently manual: run =npx -y @a-bonus/google-docs-mcp= with the right env, follow the URL in a browser, kill the process, base64-encode the new =token.json=, decrypt =secrets.env.gpg=, replace the var, and re-encrypt. A small =mcp/refresh-google-docs-token.sh = would chain that into one command.

*** Sketch

#+begin_src bash
# usage: mcp/refresh-google-docs-token.sh personal
set -euo pipefail
profile="${1:?usage: $0 personal|work}"
var="GOOGLE_DOCS_${profile^^}_TOKEN_B64"
token="$HOME/.config/google-docs-mcp/$profile/token.json"
umask 077                          # the temp file holds decrypted secrets
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT           # clean up even if a step fails
gpg -d ... | grep -v "^${var}=" > "$tmp"
GOOGLE_MCP_PROFILE="$profile" npx -y @a-bonus/google-docs-mcp &
server_pid=$!
xdg-open
# wait for a fresh token.json (newer than the temp file) to land
until [[ -s $token && $token -nt $tmp ]]; do sleep 1; done
kill "$server_pid"
echo "${var}=$(base64 -w0 "$token")" >> "$tmp"
gpg -c --cipher-algo AES256 -o mcp/secrets.env.gpg.new "$tmp"
mv mcp/secrets.env.gpg.new mcp/secrets.env.gpg
#+end_src

The flow tonight worked but took a handful of manual steps. One script collapses it.

** TODO [#C] Decide on category-3 rule copies in the deepsat tree

While symlinking personal-project =.claude/rules/= mirrors to the rulesets canonical on 2026-05-07, I found two locations that didn't fit the "personal mirror → symlink" pattern and left them untouched pending judgment:

- =~/projects/work/deepsat/code/coding-rulesets/claude-rules/{testing,verification}.md=: looks like a vendored, team-shared copy.
- =~/projects/work/deepsat/code/orchestration_dashboard_mvp/.claude/rules/{testing,verification}.md=: could be project-specific overrides.
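A quick first pass over the locations above could sort each copy before the human read. This sketch assumes the canonical rules live directly under =~/code/rulesets/claude-rules/=. That =CANONICAL_DIR= default is an assumption, as is pairing each copy with a canonical file of the same name. The script only reports =symlinked=, =identical=, =differs=, or =no-canonical=. Deciding between intentional divergence and stale content still needs a =diff -u= read.

#+begin_src bash
#!/usr/bin/env bash
# Classify copies of a rule file against the rulesets canonical version (first-pass triage only).
# CANONICAL_DIR default and same-file-name matching are assumptions.
CANONICAL_DIR="${CANONICAL_DIR:-$HOME/code/rulesets/claude-rules}"

classify_copy() {
  local copy="$1"
  local canonical="$CANONICAL_DIR/${copy##*/}"   # pair by file name, e.g. testing.md
  if [[ ! -e $canonical ]]; then
    echo "no-canonical $copy"
  elif [[ -L $copy && $(readlink -f -- "$copy") == "$(readlink -f -- "$canonical")" ]]; then
    echo "symlinked $copy"                        # already canonicalized
  elif cmp -s -- "$copy" "$canonical"; then
    echo "identical $copy"                        # safe to replace with a symlink
  else
    echo "differs $copy"                          # needs a human diff read
  fi
}

for copy in "$@"; do
  classify_copy "$copy"
done
#+end_src

Run it as, for example, =classify-rule-copies ~/projects/work/deepsat/code/coding-rulesets/claude-rules/{testing,verification}.md= (the script name is hypothetical). Only the =differs= results need the full read.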
For each: read the file, diff it against the rulesets canonical, and decide whether it's an intentional divergence (leave alone), stale (sync content), or should be canonicalized (replace with a symlink and accept the cross-repo dependency). The orchestration_dashboard_mvp pair is the project where Vrezh's PR review surfaced this whole thread, so any decision there has team-visibility implications.

** TODO [#C] Audit language-specific rule files for cross-project duplication

The four canonical rules (=commits=, =testing=, =verification=, =subagents=) are now symlinked across the five personal-project mirrors as of 2026-05-07. But several language-specific rule files exist in multiple project mirrors and may be duplicated or drifted:

- =python-testing.md= in =~/projects/work/.claude/rules/=
- =typescript-testing.md= in =~/projects/work/deepsat/code/.claude/rules/=
- =elisp-testing.md= and =elisp.md= in =~/.emacs.d/=, =~/code/gloss/=, =~/code/chime/=

The Elisp pair is the most suspicious: three repos use essentially the same rules. Audit: diff these across the projects, check for drift, then decide whether to canonicalize them under =~/code/rulesets/claude-rules/languages//= and symlink, or leave them as project-local.

** TODO [#B] Fold =claude-templates= into rulesets

Two repos, one source of truth. =~/projects/claude-templates/= is the canonical =.ai/= template that gets rsync'd into every project at session start. Keeping it standalone means a second =git pull= in startup Phase A.0, a second remote to push to at wrap-up, and a split history any time a change touches both. Folding it into =rulesets/claude-templates/= gives one repo to clone on a fresh machine and one place to edit templates.

*** Open design choices

- *History.* =git subtree add --prefix=claude-templates ~/projects/claude-templates main= preserves the 84-commit history under the new prefix. A plain content copy (=cp -a= + =git add=) is simpler but loses history.
  Either is fine, since the standalone repo stays archived on =cjennings.net=.
- *Layout.* =rulesets/claude-templates/= mirrors the old repo name and sits cleanly next to =claude-rules/=. Alternative: absorb =.ai/= directly under a different name (=rulesets/.ai-template/= or similar). The first option is clearer.
- *bin/ai.* The standalone Makefile symlinks =$HOME/.local/bin/ai → bin/ai=. After the move, fold that into the rulesets Makefile as another install target.

*** Mechanical steps

1. Subtree-merge or copy =~/projects/claude-templates/= into =rulesets/claude-templates/=.
2. Update 3 references in rulesets:
   - =.ai/protocols.org= line 163: pointer in the "Let's run/do the X workflow" section.
   - =.ai/workflows/cross-agent-comms.org= line 8: promotion-target path.
   - =.ai/workflows/startup.org= lines 22, 96-98: Phase A.0 pull + Phase A rsync sources.
3. Update Phase A.0 of =startup.org= to pull rulesets instead of claude-templates. Inside rulesets sessions, the existing project-repo pull already covers it. Outside rulesets (every other project's session), Phase A.0 needs an explicit =git pull= on =~/code/rulesets/= before the rsync; otherwise the templates will be stale.
4. Replace =~/projects/claude-templates/= with a symlink to =~/code/rulesets/claude-templates/= for transition continuity.
5. After every active project has had one session start (and rsync'd the new =startup.org=), drop the symlink and archive =cjennings.net:git/claude-templates.git=.

*** Bootstrap gap

Every project on the machine has a =.ai/workflows/startup.org= that rsyncs from =~/projects/claude-templates/=. Until each project's =startup.org= gets refreshed (which happens via the rsync itself), the old path needs to keep resolving. The symlink at step 4 is the bridge: old paths resolve into the new location, the rsync delivers the updated =startup.org=, and the next session uses the new path directly.
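Step 4's bridge can be sketched as a guarded shell function. The two default paths come from the steps above. The function name is hypothetical, and so are the safety checks and the =.pre-fold= backup name. The function refuses to run before step 1 has populated the new location. It's a no-op if the bridge already exists. It moves the old checkout aside rather than deleting it, so step 5's archiving stays a deliberate manual act.

#+begin_src bash
#!/usr/bin/env bash
# Step 4 bridge: make the old claude-templates path resolve into rulesets.
# Default paths come from the note; the function name and .pre-fold backup name are hypothetical.
bridge_claude_templates() {
  local old="${1:-$HOME/projects/claude-templates}"
  local new="${2:-$HOME/code/rulesets/claude-templates}"

  # Refuse to bridge to a location that doesn't exist yet (step 1 not done).
  [[ -d $new ]] || { echo "missing $new; run step 1 first" >&2; return 1; }

  # Already bridged: nothing to do, so the function is safe to re-run.
  if [[ -L $old && $(readlink -- "$old") == "$new" ]]; then
    return 0
  fi

  # Keep the old checkout instead of deleting it; archive it by hand later.
  if [[ -e $old || -L $old ]]; then
    [[ -e $old.pre-fold ]] && { echo "$old.pre-fold already exists" >&2; return 1; }
    mv -- "$old" "$old.pre-fold"
  fi
  ln -s -- "$new" "$old"
}
#+end_src

Because it's idempotent, the function could sit in a Makefile target without special-casing machines that are already migrated.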
** TODO [#B] Add =make audit= -- drift detector across all =.ai/=-using projects

Companion to =make doctor= (single-machine scope, checks =~/.claude/=). =audit= has cross-project scope: it walks every directory on the machine that has a =.ai/=, diffs the synced template files against the canonical source, and reports drift. It catches stale projects without forcing a session start in each one.

*** Open design choices

- *Scope.* Template-sync drift is the useful flavor: for each project, diff =.ai/protocols.org=, =.ai/workflows/=, and =.ai/scripts/= against the canonical source and report =ok= / =behind= / =diverged=. Other interpretations (per-project health check, alias for =doctor=) add less value.
- *Source path.* Today: =~/projects/claude-templates/.ai/=. After the "Fold claude-templates into rulesets" task lands: =~/code/rulesets/claude-templates/.ai/=. Build =audit= against whichever path is canonical when the work happens.
- *Project discovery.* Walk =~/code/=, =~/projects/=, and =~/.emacs.d/= up to depth 3 for any directory containing =.ai/=. Skip the canonical source itself.
- *Output and exit code.* Per-project line: =ok=, =behind = (canonical newer; rsync would update), or =diverged= (project has local edits an rsync would overwrite). Exit 0 on all-=ok=, 1 on any =diverged=.

*** Why not extend =make doctor= instead

=doctor= currently has a clean meaning: "is this machine's =~/.claude/= consistent with rulesets?" Mixing in cross-project =.ai/= drift muddies the exit code. Keep them separate; a future =make all-checks= can wrap both.

** TODO [#C] Refactor =daily-prep.org= to delegate to =triage-intake.org= for the triage section

=daily-prep.org= still does its own inline triage (Gmail × 3 accounts, Slack, Linear, GHE PRs, calendars) as part of the full prep flow.
Now that =triage-intake.org= exists as a standalone scan over the same source set, daily-prep could call it and consume its synthesis instead of duplicating the source-scan logic. That DRYs up a 57k-line workflow and keeps both flows in sync when sources change.

Scope:

- Identify the sections in =daily-prep.org= that do the inline triage (the email / Slack / Linear / PR / calendar fan-out, plus the "Sources checked: ..." footer at the top of each generated prep doc).
- Replace those sections with "run =triage-intake.org=" and adapt the downstream sections (Heads-up, Day's Priorities, Carry-forwards) to read triage-intake's synthesis output rather than the inline scan results.
- Verify the generated prep doc still has the same shape (Heads-up + Day's Priorities + Carry-forwards + Sources checked).

Origin: came up while authoring =triage-intake.org= on 2026-05-11.

* Rulesets Resolved

** DONE [#A] Add =make doctor= -- verify ~/.claude/ matches repo + settings.json :feature:

A drift detector that scans =~/.claude/= and reports anything inconsistent with what the repo expects. It's a single-command answer to "is my machine consistent with rulesets?"

*** Why this matters

A 2026-05-06 sweep found that =~/.claude/hooks/= didn't exist on this machine, even though =settings.json= referenced =~/.claude/hooks/precompact-priorities.sh= as a PreCompact hook. Compaction would have silently failed to invoke the hook. The fix was =make install-hooks=, but the breakage was invisible until I happened to grep for it. =make doctor=, run regularly (or even as part of session start), would catch this kind of drift in seconds instead of after the fact.

*** Checks

- Every entry in the =settings.json= ="hooks"= block points at a file that exists.
- Every entry in =enabledPlugins= has a matching install under =~/.claude/plugins/data/=.
- Every skill in =$(SKILLS)= has a working symlink at =~/.claude/skills/=.
- Every rule in =$(RULES)= has a working symlink at =~/.claude/rules/=.
- Every default hook has a symlink at =~/.claude/hooks/= (warn-only, since opt-out is legitimate).
- =settings.json= and =.mcp.json= symlinks resolve to the rulesets versions.
- =mcp/install.py= state matches =claude mcp list= (every server in =servers.json= is registered).
- No dangling symlinks anywhere under =~/.claude/=.

*** Output

One line per check: =ok= / =WARN= / =FAIL=. Final summary: =N ok, M warnings, K failures=. Exit non-zero on any failure so it can ride a pre-flight check.

** DONE [#A] Build =voice= skill -- combine =humanizer= with universal + personal style passes :feature:

Combine =humanizer= with universal good-writing passes (Strunk & White, Orwell, Plain English) and the personal-style passes from =commits.md=. Two modes share a foundation and diverge on register: =general= for arbitrary writing, and =personal= for commits/PRs/comments.

Built and shipped 2026-05-07: =voice/SKILL.md= with 39 numbered patterns walked sequentially. Patterns 1-25 carried over from humanizer, 26-31 are universal good-writing additions, and 32-39 are personal-only. Migrated three callers (=commits.md=, =respond-to-cj-comments.md=, =start-work.md=). Removed the standalone =humanizer= skill, since voice supersedes it.

*** Why this matters

Three transformations want to run together for personal-mode artifacts (commits, PR titles + bodies, PR comments) but lived in three places: =humanizer= as a skill, S&W-style universal rules nowhere (applied ad hoc), and the personal-style passes as prose steps in =commits.md= that got re-applied by hand each time.

Costs: (1) the "I forgot pass (e)" failure mode, since skipping a pass without flagging it is a defect but happens in practice. (2) No single-call invocation of the full transform. (3) General-mode writing (research notes, philosophy, history) got only humanizer, with no universal-prose pass at all.

Combining brings them under one skill with one invocation.
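The "detection-pattern + rewrite-rule pair" idea behind the Tier 1 passes in the Design section can be made concrete with a hypothetical standalone sketch of a few rules as GNU =sed= and =grep= commands. This is not how =voice/SKILL.md= works, because the skill walks its patterns as instructions. The function names are illustrative, and so is the small rule subset taken from the phrase and word lists in this entry. Blind substitution ignores context, which is why =for the purpose of= → =to= is left out here: applied mechanically, it turns "for the purpose of testing" into "to testing".

#+begin_src bash
#!/usr/bin/env bash
# Illustration only: a few Tier 1 rewrite rules as GNU sed substitutions.
# \b (word boundary) is a GNU extension; matches are lowercase-only in this sketch.
voice_tier1() {
  sed -E \
    -e 's/\bdue to the fact that\b/because/g' \
    -e 's/\bin order to\b/to/g' \
    -e 's/\bat this point in time\b/now/g' \
    -e 's/\bin spite of\b/despite/g' \
    -e 's/\bprior to\b/before/g' \
    -e 's/\bsubsequent to\b/after/g' \
    -e 's/\bin the event that\b/if/g' \
    -e 's/\butilize\b/use/g' \
    -e 's/\bcommence\b/start/g'
}

# Detection-only heuristic for passive voice: a form of "to be" followed by an -ed word.
# Misses irregular participles ("was written") and flags adjectives ("was tired").
voice_flag_passive() {
  grep -nE '\b(am|is|are|was|were|be|been|being)[[:space:]]+[a-z]+ed\b' || true
}
#+end_src

The passive-voice check only prints line numbers. That matches the suggestion-only stance for active voice in v1, because it leaves the rewrite to judgment.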
*** Design

Two modes:

- *general* (default): for arbitrary writing not bound for commit/PR/comment publishing (research notes, philosophy/history essays, emails, README prose). Runs:
  - humanizer (current behavior: strip AI-generated-writing fingerprints)
  - tier-1 universal passes (canonical good-writing rules)
  - the 2 personal-style passes that have no register conflict (jargon-fragment rewrite, noun-ified verbs)
- *personal*: for commits, PR titles + bodies, PR comments. Runs general PLUS:
  - 8 personal-only passes (first-person rewrite, semicolons, contractions, sentence-split, felt-experience, sentence fragments, terse cut, public-artifact scope check)

The 8 personal-only passes are explicitly *not* in general mode. They conflict with academic / literary / philosophical register. Forcing first-person on a Foucault essay or stripping felt-experience from a journal entry would damage the writing.

*** Tier 1 universals (v1)

Drawn from Strunk & White, Orwell's "Politics and the English Language", the Plain English Campaign, and Garner's Modern English Usage. Each is a detection-pattern + rewrite-rule pair, mechanical enough to apply consistently across runs.

- *Omit needless words*: curated phrase list (=the fact that= → =that=/=because=, =in order to= → =to=, =at this point in time= → =now=, =due to the fact that= → =because=, =for the purpose of= → =to=, =in spite of= → =despite=, etc.)
- *Long word → short word*: Plain English wordlist (~150 entries: =utilize=→=use=, =commence=→=start=, =terminate=→=end=, =facilitate=→=help=, =demonstrate=→=show=, =sufficient=→=enough=, =prior to=→=before=, =subsequent to=→=after=, =in the event that=→=if=, =a great deal of=→=much=)
- *Active over passive voice*: detect "to be + past-participle" patterns. Suggestion-only in v1 (auto-rewrite is risky in technical contexts where passive is appropriate); graduate to auto-rewrite for unambiguous cases in v2.
- *Comma splices*: detect independent clauses joined only by a comma; rewrite to a period or semicolon-then-period.
- *Cliché flag*: small curated list (=at the end of the day=, =moving forward=, =going forward=, =at this juncture=, =circle back=, =low-hanging fruit=, =deep dive=, =leverage= as a verb).

*** Tier 2 universals (v2)

- *Positive over negative form* (S&W): =not unlike= → =like=, =do not fail to= → =remember to=, =did not pay any attention= → =ignored=
- *Garner-style word-pair corrections*: comprise/compose, less/fewer, that/which (restrictive vs nonrestrictive), affect/effect, principal/principle
- *Parallelism in lists*: detect mismatched grammar in bullet items
- *Tense consistency*: flag mid-paragraph tense shifts
- *Acronym definition on first use*: detect uppercase tokens used before being expanded

*** Tier 3 (v3, may not land)

- *Concrete-over-abstract* preference
- *Emphatic word at sentence end* (S&W rule 18)
- *Vary sentence length / rhythm*
- *Reading-grade-level scoring* (Hemingway-style)

*** Personal-style pass placement

| #  | Pass                                                                           | Mode                                        | Why                                                                                                                |
|----+--------------------------------------------------------------------------------+---------------------------------------------+--------------------------------------------------------------------------------------------------------------------|
| 1  | First-person voice rewrite                                                     | personal only                               | Forces "I" voice; wrong for academic prose where third-person and "we" are conventional                            |
| 2  | Jargon-fragment → complete sentence                                            | both                                        | Universal clarity, no genre conflict                                                                               |
| 3  | Semicolon → period/comma                                                       | personal only                               | Semicolons are conventional in long-form / academic prose                                                          |
| 4  | Contractions ("it's", "don't")                                                 | personal only                               | Academic and formal writing typically avoids contractions                                                          |
| 5  | Sentence split on conjunctions                                                 | personal only                               | Foucault, Hegel, Adorno deliberately use long compound sentences                                                   |
| 6  | Felt-experience narration ("I'll feel this every time")                        | personal only                               | Personal essays *use* felt-experience as content                                                                   |
| 7  | Noun-ified verbs ("the ask", "a learn", "the spend")                            | both                                        | Targets corporate-speak with a curated wordlist; doesn't catch philosophical nominalizations like "the becoming"    |
| 8  | Sentence fragments → complete (in prose)                                       | personal only                               | Fragments are valid stylistic devices in literary prose                                                            |
| 9  | Terse cut (rhetorical padding: "worth noting", "it's important to understand") | personal only                               | Tier 1 omit-needless-words covers the worst offenders universally; aggressive cut conflicts with academic register |
| 10 | Public-artifact scope check (local paths, private repos, personal tooling)     | personal only; *flag-only*, no auto-rewrite | Operational/safety check, not stylistic; auto-masking risks silently editing meaningful text                       |

*** Inclusive-language pass -- explicitly excluded

Considered and rejected. It conflicts with planned writing on philosophy/history topics (Foucault on sexuality and gender, the history of slavery in New Orleans). Wordlist substitutions would override deliberate vocabulary choices in those genres.

*** V1 scope

- [ ] Skill at =~/code/rulesets/voice/= with =SKILL.md=
- [ ] Frontmatter with positive triggers (commit, PR, comment, "humanize", "voice pass") and negative triggers (code, structured data, plain bullet lists)
- [X] Mode invocation: default = =general= when invoked bare; =personal= invoked explicitly by publish-context callers
- [X] humanizer content migrated from =humanizer/= → =voice/=
- [X] Tier 1 universal passes implemented (5 patterns: #26-30, plus #31 noun-ified verbs as a universal personal addition)
- [X] 2 personal passes that run in both modes (#30 jargon-fragment, #31 noun-ified verbs)
- [X] 8 personal passes that run in personal mode only (#32 first-person, #33 semicolons, #34 contractions, #35 sentence-split, #36 felt-experience, #37 fragments, #38 terse cut, #39 scope check)
- [X] Each pass = detection-pattern + rewrite-rule pair (#39 is detection + flag-only)
- [X] Total v1 pattern count: 31 in general mode (humanizer's 25 + 4 tier-1 + 2 universal personal); +8 personal-only = 39 in personal mode
- [X] Update =commits.md= to invoke =/voice personal= instead of "run =humanizer= and
  apply five passes manually"
- [X] Remove the existing =humanizer/= skill (no callers outside this repo, all migrated)
- [X] =make doctor= still passes
- [X] =make lint= clean

*** v2 (deferred)

- [ ] Tier 2 universals (positive form, word-pair corrections, parallelism, tense consistency, acronym definition)
- [ ] Per-pass severity flags for Tier 1 active-voice (suggestion-only when the actor is implicit; auto-rewrite when the actor is named)
- [ ] Reporting mode: list which passes fired and which were no-ops

*** v3 (aspirational, may not land)

- [ ] Tier 3 (concrete-over-abstract, emphatic-word position, sentence-length variation, reading-grade scoring)
- [ ] Progressive disclosure split: =voice/SKILL.md= orchestrator + =voice/passes/.md= per pass with worked examples

*** Migration (resolved)

Decision: deleted =humanizer/= entirely. All three callers (=commits.md=, =respond-to-cj-comments.md=, =start-work.md=) were updated to invoke =/voice= directly. No alias was needed, since nothing outside the repo invoked humanizer.

*** Naming alternatives considered

- =voice=: chosen. Captures both modes; broad enough.
- =polish=: describes the multi-pass nature; less prescriptive about whose voice.
- =house-style=: signals "this is the house style"; appropriate for a personal repo.
- =commit-voice=: too narrow (passes apply to research notes, emails, etc. in general mode).
- =humanize= (extending current): undersells the universal + personal additions.

*** Open questions before implementation

Resolved during implementation:

- Default mode when =/voice= is invoked bare: =general=. Personal-context callers (=commits.md= publish flow, =respond-to-cj-comments.md=) invoke =/voice personal= explicitly. This avoids accidentally first-person-ifying research notes.
- Reporting: the skill prints a "Summary of changes" listing which patterns fired (audit value).
- Public-artifact scope check (#39): flag-only; the user resolves manually. Blocking would frustrate on legitimate path mentions.
- Tier 1 active-voice detection: suggestion-only in v1. Auto-rewrite for unambiguous cases deferred to v2.

** DONE [#B] Add =--archive-done= mode to =.ai/scripts/todo-cleanup.el= :feature:
Opt-in mode that moves every level-2 subtree whose TODO state is DONE or CANCELLED out of the "Open Work" section and into the "Resolved" section of the same org file, subtree intact.

- *Section matching.* Key on a top-level heading containing "Open Work" and one containing "Resolved" — that pairing is the only naming consistent across projects (=Work Open Work= / =Work Resolved= here; bare =Open Work= / =Resolved= elsewhere). Require exactly one match for each; otherwise skip with a clear message, no crash.
- *Modes.* =--check= previews and writes nothing, same as the existing hygiene pass. Idempotent. Not run by default in the wrap-up flow — archiving is consequential, so it stays opt-in: =emacs --batch -q -l todo-cleanup.el --archive-done FILE=.
- *Edge cases.* Source or target section missing; subtree at EOF; nested DONE subtree under an open parent stays put (only level-2 entries move); nothing to move → clean no-op.
- *Tests.* TDD with ERT — the project's first elisp tests. Fixtures (synthetic) under =.ai/scripts/tests/=; run via =make test= (rulesets) or =make test-scripts= (claude-templates), which run pytest + every =tests/test-*.el= ERT suite. Cases: one DONE level-2 moves; multiple; CANCELLED also moves; structural (no-state) headings don't move; nested DONE under an open parent stays; level-2 DONE with open level-3 children moves intact; subtree at EOF; missing source/target section; ambiguous "Resolved"; lowercase headings; nothing-to-do; idempotency; =--check= preview + its idempotency; realistic-sample integration.

Origin: came up while scrubbing a project's todo.org on 2026-05-11 — moving a big completed PROJECT subtree (plus a few smaller ones) into the Resolved section by hand was the cue to build a reusable tool.
Built and shipped 2026-05-11: =--archive-done= added to =.ai/scripts/todo-cleanup.el= test-first; 13-test ERT suite (=tests/test-todo-cleanup.el=) + realistic synthetic fixture (=tests/fixtures/todo-sample.org=), wired into =make test= / =make test-scripts= alongside pytest. The CLI dispatch moved into =tc-main= behind a guard so the suite can =require= the file without firing it. Section matching is case-insensitive and tolerates the = Open Work= / = Resolved= naming variants. Opt-in only — not wired into the wrap-up flow. Source of truth is =~/projects/claude-templates/=; rsync'd into this repo.
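
The move rule in this entry is easy to check against a small model. What follows is a minimal Python sketch, not the shipped elisp: =archive_done=, its helpers, and the regex names are hypothetical. It assumes plain one-star-per-level org headings, literal uppercase =DONE= / =CANCELLED= keywords, and that moved subtrees are appended at the end of the Resolved section. File I/O, the =--check= flag, and the skip message are left to a caller.

#+begin_src python
import re

# Hypothetical names; the real tool is .ai/scripts/todo-cleanup.el.
TOP = re.compile(r"^\* ")                              # level-1 heading
LEVEL2 = re.compile(r"^\*\* ")                         # level-2 heading (not "***")
CLOSED = re.compile(r"^\*\* (?:DONE|CANCELLED)\b")     # closed level-2 entry


def _headings(lines, needle):
    """Indexes of level-1 headings whose text contains needle, case-insensitively."""
    return [i for i, ln in enumerate(lines) if TOP.match(ln) and needle in ln.lower()]


def _section_end(lines, start):
    """Index of the next level-1 heading after start, or len(lines) at EOF."""
    for i in range(start + 1, len(lines)):
        if TOP.match(lines[i]):
            return i
    return len(lines)


def archive_done(text):
    """Move closed level-2 subtrees from "Open Work" to the end of "Resolved".

    Returns (new_text, moved_count). The input comes back unchanged when either
    section is missing or ambiguous, or when nothing qualifies.
    """
    # Guarantee a trailing newline so a subtree taken from EOF cannot glue
    # onto the heading that follows it after the move.
    body = text if not text or text.endswith("\n") else text + "\n"
    lines = body.splitlines(keepends=True)
    src, dst = _headings(lines, "open work"), _headings(lines, "resolved")
    if len(src) != 1 or len(dst) != 1:
        return text, 0  # spec: skip with a clear message; the sketch just returns

    start, end = src[0], _section_end(lines, src[0])
    kept, moved, count, i = [], [], 0, start + 1
    while i < end:
        if not LEVEL2.match(lines[i]):
            kept.append(lines[i])  # section preamble text stays in place
            i += 1
            continue
        # A level-2 subtree runs to the next level-2 heading or the section end,
        # so level-3 children travel with their parent.
        j = i + 1
        while j < end and not LEVEL2.match(lines[j]):
            j += 1
        if CLOSED.match(lines[i]):
            moved.extend(lines[i:j])
            count += 1
        else:
            kept.extend(lines[i:j])
        i = j

    if count == 0:
        return text, 0  # clean no-op, which also makes reruns idempotent

    new = lines[:start + 1] + kept + lines[end:]
    dst_start = _headings(new, "resolved")[0]
    dst_end = _section_end(new, dst_start)
    new[dst_end:dst_end] = moved  # append at the end of the Resolved section
    return "".join(new), count
#+end_src

Because the function only transforms text, a =--check=-style preview amounts to reporting the count (or a diff) without writing the file, and a second run that returns a zero count is the idempotency property the test list asks for.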