#+TITLE: Rulesets — Open Work
#+AUTHOR: Craig Jennings
#+DATE: 2026-04-19

Tracking TODOs for the rulesets repo that span more than one commit.
Project-scoped (not the global =~/sync/org/roam/inbox.org= list).

* Rulesets Open Work

** TODO [#A] Check that memories are sync'd across machines via git
#+begin_src cj: comment
this means we need to link the memory file in ~/.claude if it's not already linked.
#+end_src
** TODO [#B] Document rulesets + claude-templates pull-before-project ordering in protocols.org

Startup currently pulls claude-templates in Phase A.0 and fast-forwards the
project repo, but the rulesets repo (=~/code/rulesets/=) isn't pulled at all
-- rule changes there don't reach the agent without a manual pull. The
ordering and the "resolve any issues before proceeding" expectation also live
in =startup.org= rather than =protocols.org= (the single entry point).

Required ordering: rulesets first, then claude-templates, then local project.
Resolve dirty-tree / merge issues at each step before moving on. Goal: every
session starts against the freshest behavioral rules and workflow templates,
not a stale local snapshot.

Changes needed:
1. Add a rulesets pull step to =startup.org= Phase A.0, mirroring the
   existing claude-templates ff-only pull logic.
2. State the ordering and the "resolve before proceeding" rule early in
   =protocols.org= itself, not buried in a workflow file.
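
A minimal sketch of the required ordering, assuming Phase A.0 can shell out to
=git=. Only =~/code/rulesets/= comes from the notes above; the claude-templates
path, the =pull_in_order= helper, and its error messages are hypothetical. It
stops at the first repo with a dirty tree or a failed fast-forward, so the
issue gets resolved before the next pull.

#+begin_src python
import subprocess
from pathlib import Path

# Required order: rulesets, then claude-templates, then the local project.
PULL_ORDER = [
    Path("~/code/rulesets").expanduser(),
    Path("~/code/claude-templates").expanduser(),  # hypothetical location
    Path.cwd(),  # assumes startup runs from the project repo
]


def run_git(args):
    """Run one git command and capture its output as text."""
    return subprocess.run(["git", *args], capture_output=True, text=True)


def pull_in_order(repos, run=run_git):
    """Fast-forward each repo in order; stop at the first one needing attention.

    Returns the repos pulled cleanly. Raises RuntimeError naming the repo
    that must be fixed before startup continues.
    """
    pulled = []
    for repo in repos:
        status = run(["-C", str(repo), "status", "--porcelain"])
        if status.returncode != 0:
            raise RuntimeError(f"{repo}: git status failed: {status.stderr.strip()}")
        if status.stdout.strip():
            raise RuntimeError(f"{repo}: dirty tree, resolve before proceeding")
        pull = run(["-C", str(repo), "pull", "--ff-only"])
        if pull.returncode != 0:
            raise RuntimeError(f"{repo}: ff-only pull failed: {pull.stderr.strip()}")
        pulled.append(str(repo))
    return pulled
#+end_src

Passing the runner as a parameter keeps the ordering logic testable without
touching real repositories; =pull_in_order(PULL_ORDER)= would run it for real.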

** TODO [#A] Build =create-documentation= skill for high-quality project/product docs

Create a Claude skill named =create-documentation= that can plan, write,
refresh, and review software documentation across README files, project docs,
developer guides, API docs, operational docs, and generated/published doc
sites.

This is broader than =arch-document=. =arch-document= should remain the
architecture-specific arc42 skill. =create-documentation= should know when to
delegate to it for architecture documentation, but its main job is the full
documentation system around a product or repo: onboarding, tutorials, how-to
guides, reference, explanation, operations, troubleshooting, contribution,
release/upgrade, and publication format.

*** Why this matters

The repo currently has strong skills for architecture, testing, review,
debugging, and workflow. It does not have a general documentation skill that:

- Chooses the right documentation type for the user need.
- Audits existing docs against code and expected user journeys.
- Creates a coherent doc map instead of dumping everything into =README.md=.
- Writes in a consistent technical style.
- Decides source/publish format intentionally (=.md=, =.org=, generated
  =.html=, OpenAPI, etc.).
- Treats docs as a maintained product surface with verification, ownership,
  navigation, accessibility, and freshness checks.

*** Research notes

**** Documentation frameworks and best-practice sources

- Diataxis separates documentation by reader need:
  - Tutorials: learning-oriented, take the reader by the hand.
  - How-to guides: task-oriented, solve a specific real problem.
  - Reference: information-oriented, accurate and complete lookup material.
  - Explanation: understanding-oriented, concepts, background, tradeoffs.
  Source: [[https://diataxis.fr/][Diataxis]] and the official guidance around
  tutorials/how-to/reference/explanation.
- Django explicitly documents this same organization and teaches readers how
  to navigate it: tutorials for beginners, topic guides for concepts,
  reference for APIs, how-to guides for recipes. This is a major reason the
  docs feel navigable despite large scope.
  Source: [[https://docs.djangoproject.com/en/5.2/][Django documentation]]
- Kubernetes separates concepts, tasks, tutorials, and reference. It also has
  current/previous-version docs, localization, contribution paths, and
  task-focused landing pages. Its docs are good at answering "what is this?"
  separately from "how do I do one thing?"
  Sources: [[https://kubernetes.io/docs/home/][Kubernetes docs home]],
  [[https://kubernetes.io/docs/tasks/][Kubernetes tasks]],
  [[https://kubernetes.io/docs/tutorials/][Kubernetes tutorials]]
- Write the Docs emphasizes docs that are precursory, participatory,
  exemplary, consistent, current, discoverable, addressable, cumulative, and
  comprehensive. Especially important: incorrect docs are worse than missing
  docs, and examples should cover common use cases without overwhelming the
  reference.
  Source: [[https://www.writethedocs.org/guide/writing/docs-principles/][Write the Docs principles]]
- Google developer docs guidance emphasizes project-specific style first,
  clarity and consistency, conversational but not frivolous tone, active voice,
  second person, descriptive links, global audience, accessibility, sentence
  case headings, numbered lists for procedures, code font for code, and alt
  text for images.
  Sources: [[https://developers.google.com/style/][Google developer documentation style guide]],
  [[https://developers.google.com/style/highlights][Google style highlights]],
  [[https://developers.google.com/style/accessibility][Google accessible docs]]
- Google's doc best-practices page adds pragmatic maintenance principles:
  minimum viable documentation, update docs with code, delete dead docs, prefer
  good over perfect, tell the story of code, and avoid duplication.
  Source: [[https://google.github.io/styleguide/docguide/best_practices.html][Google documentation best practices]]
- The Good Docs Project is useful as a template source, especially for
  README, how-to, tutorial, concept, reference, troubleshooting, contributor,
  and release-note patterns. Do not vendor wholesale; use as prior art.
  Source: [[https://www.thegooddocsproject.dev/][The Good Docs Project]]

**** Praised project docs to analyze and steal from

***** Django

Why it works:
- It labels the doc types directly and explains when to use each.
- It has a beginner path, advanced tutorials, topic guides, API reference,
  how-to recipes, deployment, security, testing, release notes, and community
  help in one coherent index.
- It is versioned, so readers know which framework version the docs target.
- It cross-links introductory material to deeper references without making the
  first page a wall of every detail.

Patterns to use:
- Make the top-level docs home a routing page by reader intent.
- Put "How these docs are organized" near the top when the doc set is large.
- Split concept, task, tutorial, and reference instead of mixing them.
- Include "getting help" and "not found?" paths so the docs have an exit ramp.

Source: [[https://docs.djangoproject.com/en/5.2/][Django documentation]]

***** Kubernetes

Why it works:
- It has a large, complex product but maintains separate lanes for Concepts,
  Tasks, Tutorials, Reference, and Contribute.
- Task pages are short sequences for one operation; tutorials are larger goals
  with several sections. This prevents "one page tries to teach everything."
- It exposes version state clearly, including static old versions and current
  docs.
- It supports localization and documentation contribution, which makes the
  docs a product surface rather than a side artifact.

Patterns to use:
- For platform or infrastructure docs, include Concepts / Tasks / Tutorials /
  Reference as first-class folders.
- Create version/freshness metadata when docs are tied to released software.
- Add doc contribution guidance for projects with external contributors.
- Make operational tasks discoverable by category, not just search.

Sources: [[https://kubernetes.io/docs/home/][Kubernetes docs home]],
[[https://kubernetes.io/docs/tasks/][Kubernetes tasks]]

***** Rust

Why it works:
- Rust has a "bookshelf" rather than one overloaded manual: The Book, Rust by
  Example, standard library API reference, Reference, Cargo Guide, Error Index,
  Rustonomicon, release notes, platform support, policies, etc.
- The learning path is honest about audience: it assumes the reader has
  programmed before, but not in any specific language.
- Reference and learning material are separated. Advanced unsafe guidance gets
  its own book.
- Offline docs via =rustup doc= are treated as part of the product.

Patterns to use:
- For broad ecosystems, create a documentation bookshelf rather than a single
  mega-doc.
- Separate beginner path, examples, formal reference, advanced/unsafe topics,
  tooling docs, error index, release notes, and policies.
- Document assumptions about reader experience.
- Consider offline/local docs for CLI/library ecosystems.

Source: [[https://doc.rust-lang.org/][Rust documentation]]

***** Stripe API docs

Why it works:
- The API reference is organized around resources and common cross-cutting
  concerns: authentication, errors, idempotency, pagination, request IDs,
  versioning, metadata, connected accounts.
- It pairs prose with concrete request/response examples and client-library
  language selection.
- It exposes test-mode vs live-mode distinctions early.
- It offers "Copy for LLM" / "View as Markdown", which acknowledges modern
  consumption patterns without sacrificing normal docs UX.
- Its reputation comes from matching developer mental models and making the
  common path implementable quickly, not just visual polish.

Patterns to use:
- API docs should be generated from or checked against OpenAPI/JSON schema or
  source annotations wherever possible.
- Keep cross-cutting API behavior near the front, before endpoint lists.
- Include runnable examples, auth, errors, pagination, versioning, idempotency,
  and sandbox/test data.
- Consider LLM-friendly exports (=llms.txt=, "view as Markdown", stable
  anchors), but do not make the docs only for AI.

Source: [[https://docs.stripe.com/api][Stripe API Reference]]

***** FastAPI

Why it works:
- Documentation is part of the framework's value proposition: OpenAPI and JSON
  Schema drive interactive Swagger UI and ReDoc automatically.
- It reduces manual drift for API reference by deriving docs from typed code.
- It integrates examples and tutorial-style explanations with standards-based
  generated reference.

Patterns to use:
- Prefer generated API reference from code/specs over hand-maintained endpoint
  tables.
- Generated docs need human-written overview, concepts, authentication,
  examples, and operational guidance around them.
- The skill should identify when an OpenAPI/Swagger/ReDoc/Scalar route already
  exists and improve metadata/schema quality instead of creating duplicate
  manual docs.

Source: [[https://fastapi.tiangolo.com/features/][FastAPI features]]

*** Format and presentation decisions

**** Default source format: Markdown

Use =.md= as the default for shared project documentation when:
- The repo is on GitHub/GitLab/Forgejo and readers browse docs in the web UI.
- The project already uses MkDocs, Docusaurus, VitePress, Sphinx+MyST,
  Jekyll, GitHub Pages, or plain README-driven docs.
- Contributors are expected to edit docs without Emacs-specific tooling.
- The docs need easy static-site publishing.
- The content is README, tutorial, how-to, reference, troubleshooting,
  contributing, release notes, runbooks, or ordinary prose + code blocks.

Markdown source works well because it is low-friction, reviewable in diffs,
rendered by repository hosts, and supported by documentation site generators.
MkDocs is a good reference point: Markdown source, YAML config, built-in dev
server, static HTML output, and easy hosting.
Source: [[https://www.mkdocs.org/][MkDocs]]

**** Use Org when the document is Emacs-native or personal/planning-heavy

Use =.org= when:
- The user's workflow is explicitly Emacs/org-mode.
- The document contains TODO states, schedules, priorities, tags, agenda
  integration, property drawers, clocking, or personal planning.
- The document is an internal strategy/planning artifact such as V2MOM,
  research notes, meeting notes, task triage, or a living personal operating
  document.
- The output may later be exported, but the source of truth is intended to be
  edited in org-mode.

Do not default team-facing documentation to =.org= unless the team already uses
org-mode. Org can export to HTML, but that does not make it the right authoring
format for non-Emacs contributors.
Sources: [[https://orgmode.org/org.html][Org manual]],
[[https://orgmode.org/worg/org-tutorials/org-publish-html-tutorial.html][Org publish HTML tutorial]]

**** Use HTML as generated/published output, rarely as hand-authored source

Use =.html= when:
- The deliverable is a published static documentation site.
- The document needs interactive widgets, embedded API consoles, custom layout,
  or generated navigation/search.
- The project already publishes docs as a website.
- The target audience needs searchable, browsable, linkable pages rather than
  repo-local files.

Prefer generated HTML from Markdown/Org/reStructuredText/AsciiDoc/OpenAPI over
hand-authored HTML. Hand-edit HTML only for standalone artifacts, custom landing
pages, or cases where the project already treats HTML templates as docs source.

**** Consider generated/spec-backed formats

Use generated reference when possible:
- API reference: OpenAPI/Swagger/ReDoc/Scalar from code/spec.
- CLI reference: generated from command parser/help output.
- Library API reference: language-native doc tools such as rustdoc, pydoc,
  TypeDoc, JSDoc, Go doc, Sphinx autodoc, etc.
- Config reference: generated from schema, types, or validated defaults.

The skill should not duplicate generated reference by hand. It should improve
source comments, schema descriptions, examples, front matter, and surrounding
guides.
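
The format rules above can be read as a small decision function. This is a
sketch under stated assumptions, not the skill's actual logic: =DocContext=,
its field names, and the returned strings are all hypothetical, and the rare
hand-authored HTML cases are left out.

#+begin_src python
from dataclasses import dataclass


@dataclass
class DocContext:
    """Facts gathered during intake; field names are illustrative."""
    generated_reference: bool = False  # reference derivable from code or spec
    published_site: bool = False       # deliverable is a browsable website
    emacs_native: bool = False         # author or team already works in org-mode
    planning_heavy: bool = False       # TODO states, schedules, agenda, clocking


def choose_source_format(ctx: DocContext) -> str:
    """Pick the authoring format; HTML is treated as build output here."""
    if ctx.generated_reference:
        # Never hand-write what a spec or doc tool can generate.
        return "generated from spec/source"
    source = ".org" if (ctx.emacs_native or ctx.planning_heavy) else ".md"
    if ctx.published_site:
        return f"{source} source, generated .html output"
    return source
#+end_src

With no special context the function falls through to =.md=, which matches
the Markdown default described earlier.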

**** Presentation requirements

Every generated doc set should have:
- A docs home or README that routes by reader intent.
- Stable headings and anchors for addressability.
- Descriptive link text, no "click here."
- Search/navigation plan when docs exceed a handful of pages.
- Version/freshness metadata when tied to released software.
- Ownership/review cadence for docs likely to rot.
- Accessible structure: semantic headings, alt text, no image-only info,
  tables only when appropriate, left-aligned text, readable code blocks.
- Copyable commands and code examples.
- "What changed?" / release notes / migration path when docs describe a new or
  changed behavior.
- Troubleshooting path for common failures.
- Clear prerequisites before procedures.
- Verification steps after procedures.
- Support/escalation path when the docs do not answer the question.
- Optional LLM-friendly surfaces for larger doc sets: =llms.txt=,
  "copy as Markdown" equivalents, concise page summaries, and stable anchors.

*** Proposed skill design

**** Skill name and trigger

Name: =create-documentation=

Trigger when the user asks to:
- create documentation, docs, README, guide, manual, runbook, tutorial,
  quickstart, API docs, CLI docs, troubleshooting docs, contributor docs,
  architecture-adjacent docs, release notes, upgrade guide, or doc site;
- improve, audit, reorganize, or publish existing docs;
- decide documentation structure or format for a project.

Do not trigger for:
- architecture-only arc42 docs when =arch-document= is the direct fit;
- ADR creation (=arch-decide=);
- design docs before implementation shape is known (=brainstorm= or
  =arch-design=);
- prose polishing only (future writing/humanizer skill);
- inline code comments/docstrings only, unless the user asks to create docs
  from them.

**** V1 should be one orchestrating skill, not many separate skills

Build v1 as one skill with explicit phases and subcommands rather than a set
of separate skills. Rationale:
- Documentation tasks often start ambiguous; the first job is classification.
- Splitting too early creates command-discovery burden.
- A single skill can dispatch to existing specialized skills
  (=arch-document=, =c4-diagram=, =security-check=, =playwright-js= /
  =playwright-py= for doc-site verification) without making users choose the
  internal pipeline.

Support discoverable subcommands inside one skill:

#+begin_example
/create-documentation audit <path>
/create-documentation plan <path-or-scope>
/create-documentation write <doc-type> <scope>
/create-documentation refresh <path>
/create-documentation publish <path>
/create-documentation review <path>
#+end_example

The default =/create-documentation <scope>= runs audit -> plan -> write ->
review, asking for confirmation before broad rewrites.
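
The dispatch rule can be sketched as follows. It is illustrative only: the
skill itself would be prose instructions in =SKILL.md=, and =resolve_phases=
is a hypothetical name. An explicit subcommand runs one phase; anything else
is treated as a scope and runs the default pipeline.

#+begin_src python
SUBCOMMANDS = ["audit", "plan", "write", "refresh", "publish", "review"]
DEFAULT_PIPELINE = ["audit", "plan", "write", "review"]


def resolve_phases(argv):
    """Map an invocation to (phases to run, remaining arguments)."""
    if argv and argv[0] in SUBCOMMANDS:
        return [argv[0]], argv[1:]
    # No recognized subcommand: the first argument is the scope.
    return list(DEFAULT_PIPELINE), argv
#+end_src

One consequence of this shape: a scope literally named =audit= or =plan=
would be read as a subcommand, so the skill may need a way to disambiguate.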

**** Future split if v1 gets too large

If the skill grows past a manageable size, split into a discoverable
=documentation-*= chain. Names and order:

1. =documentation-audit= — inventory existing docs, code/docs drift, reader
   journeys, missing doc types, stale/generated docs.
2. =documentation-plan= — choose audiences, doc map, formats, source of truth,
   publishing path, ownership, and freshness policy.
3. =documentation-write= — write or update the selected docs.
4. =documentation-reference= — generate or improve API/CLI/config/library
   reference from source/spec.
5. =documentation-publish= — configure MkDocs/Docusaurus/Sphinx/GitHub Pages
   or equivalent, build static HTML, verify links/search.
6. =documentation-review= — quality gate for accuracy, style, navigation,
   accessibility, examples, and freshness.

Keep =create-documentation= as the orchestrator and user-facing entry point.
The chain is discoverable because every helper starts with =documentation-= and
the orchestrator prints the next command at each handoff.

*** V1 workflow details

**** Phase 1: Intake and classification

Ask only what is missing from local context:
- Who is the reader? New user, evaluator, integrator, maintainer, operator,
  contributor, auditor, support engineer?
- What is the reader trying to do or understand?
- Is this for a public project, internal team, personal workflow, regulated
  audience, or customer-facing product?
- Is the output repo-browsed, web-published, printed/exported, or Emacs-native?
- Is there existing code, existing docs, an API spec, generated reference, or
  only a concept?
- What is the maintenance expectation? One-off, release-maintained,
  continuously updated?

Classify the work into one or more doc types:
- README / landing page.
- Quickstart.
- Tutorial.
- How-to guide.
- Concept/explanation.
- API reference.
- CLI reference.
- Configuration reference.
- Architecture docs (delegate to =arch-document= if arc42/C4/ADR-driven).
- Operations/runbook.
- Troubleshooting/FAQ.
- Upgrade/migration/release notes.
- Contributor/development docs.
- Security/compliance docs.
- Examples/cookbook.

**** Phase 2: Audit existing material

Inventory:
- =README*=, =docs/=, =doc/=, =site/=, =mkdocs.yml=, =docusaurus.config.*=,
  VitePress/Sphinx config (=.vitepress/=, =conf.py=), hosted reference
  (=docs.rs=, =pkg.go.dev=), OpenAPI specs, generated docs folders, GitHub
  Pages config, ADRs, architecture docs, examples, scripts, CLI help, package
  metadata.
- Existing doc type coverage: tutorial/how-to/reference/explanation.
- Broken links, stale version numbers, commands that no longer exist,
  screenshots that may be stale, code snippets not exercised, doc/code drift.
- Source of truth for generated docs. Flag generated files; do not hand-edit
  them until source is known.
- Reader journey gaps: "new user can install?", "first success path?",
  "operator can recover?", "contributor can run tests?", "API consumer can
  authenticate and handle errors?"

Search with =rg= (ripgrep) first. For API/CLI reference, prefer structured sources:
OpenAPI/JSON Schema, package metadata, command =--help= output, docstrings, or
language-native documentation tooling.
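
A rough sketch of the inventory step, assuming the audit starts from a list of
repo-relative paths (for example the output of =git ls-files=). The category
names and the OpenAPI filename patterns are assumptions; the other patterns
come from the inventory list above.

#+begin_src python
from fnmatch import fnmatchcase

# Patterns are matched against the whole repo-relative path.
INVENTORY_PATTERNS = {
    "readme": ["README*"],
    "docs_tree": ["docs/*", "doc/*", "site/*"],
    "site_config": ["mkdocs.yml", "docusaurus.config.*"],
    "api_spec": ["*openapi*.json", "*openapi*.y*ml"],  # assumed naming
}


def inventory(paths):
    """Bucket repo paths into documentation categories for the audit."""
    found = {kind: [] for kind in INVENTORY_PATTERNS}
    for path in sorted(paths):
        for kind, patterns in INVENTORY_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                found[kind].append(path)
    return found
#+end_src

An empty bucket is itself an audit finding: no =readme= or no =docs_tree=
entry points at a reader-journey gap worth flagging.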

**** Phase 3: Documentation plan

Write a short plan before broad edits:
- Audiences and priority order.
- Proposed doc map/tree.
- Doc type for each page.
- Source format decision: =.md= / =.org= / generated spec / generated HTML.
- Publishing target, if any.
- Existing docs to preserve, move, merge, or delete.
- Generated-reference strategy.
- Ownership and freshness policy.
- Verification plan.

Stop for confirmation when the plan moves or rewrites more than one file.

**** Phase 4: Write or update docs

Writing rules:
- Lead with the reader's goal, not the implementation history.
- Put prerequisites before steps.
- Use numbered lists for procedures.
- Use bullets for non-ordered choices.
- Use active voice and second person for instructions.
- Keep sentences short and globally readable.
- Define acronyms on first use.
- Use code font for commands, file names, env vars, API names, and literals.
- Use descriptive links.
- Prefer examples that cover the common path and one meaningful edge/error
  path.
- Separate examples/tutorials from dense reference.
- Avoid stale duplication: link to canonical generated reference instead of
  copying it.
- Include expected output after commands where it helps verification.
- Include cleanup/rollback steps when procedures change state.
- Include troubleshooting for common failures.
- Avoid marketing voice in technical docs. State capability and constraints
  plainly.
- No AI attribution in docs, examples, comments, generated pages, footers, or
  screenshots.
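
Some of these rules can be checked mechanically. As one example, here is a
small sketch that flags non-descriptive link text in Markdown. The
=VAGUE_TEXTS= set and the =vague_links= helper are hypothetical, and the regex
only handles simple inline links.

#+begin_src python
import re

# Matches inline Markdown links of the form [text](target).
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

# Link texts that say nothing about the destination; extend per project.
VAGUE_TEXTS = {"click here", "here", "link", "this", "read more"}


def vague_links(markdown):
    """Return (line number, link text, target) for non-descriptive links."""
    findings = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for text, target in MD_LINK.findall(line):
            if text.strip().lower() in VAGUE_TEXTS:
                findings.append((lineno, text, target))
    return findings
#+end_src

Reference-style links and Org links are out of scope for this sketch; a real
check would more likely reuse the project's existing linter.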

Page skeletons:

README / docs home:
#+begin_example
# <Project>

<One-paragraph purpose>

## Start here
- New user: <quickstart>
- Existing user with a task: <how-to index>
- API lookup: <reference>
- Maintainer/operator: <operations/contributing>

## Quick example
...

## Documentation map
...

## Support / contributing
...
#+end_example

Tutorial:
#+begin_example
# Tutorial: <goal>

## What you'll build
## Prerequisites
## Step 1 ...
## Checkpoint
## Step 2 ...
## What you learned
## Next
#+end_example

How-to:
#+begin_example
# How to <task>

## When to use this
## Prerequisites
## Steps
## Verify
## Troubleshooting
## Related
#+end_example

Reference:
#+begin_example
# <Thing> reference

## Summary
## Parameters / options / fields
## Behavior
## Errors
## Examples
## Version notes
#+end_example

Explanation:
#+begin_example
# <Concept>

## Problem it solves
## Mental model
## How it fits with related concepts
## Tradeoffs and constraints
## Further reading
#+end_example

Runbook:
#+begin_example
# Runbook: <operation>

## Scope
## Preconditions
## Normal procedure
## Verification
## Rollback
## Alerts and escalation
## Post-incident notes
#+end_example

**** Phase 5: Presentation and publishing

If docs are repo-local only:
- Ensure links render on GitHub/GitLab.
- Keep relative links stable.
- Add an index if more than 4-5 docs exist.

If docs are web-published:
- Detect existing generator and follow it.
- Prefer project-native tooling over introducing MkDocs/Docusaurus/Sphinx.
- If no tooling exists and user wants a site, choose conservatively:
  - Python/simple repo: Material for MkDocs is a pragmatic default.
  - JS/React ecosystem: Docusaurus or VitePress if already in stack.
  - Python libraries: Sphinx or MkDocs depending on existing ecosystem.
  - API docs: ReDoc/Swagger/Scalar from OpenAPI.
- Build locally if dependencies exist.
- Check links, nav, search, mobile viewport, and accessibility basics.
- Do not commit generated =site/= output unless the project already does.
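
"Detect existing generator and follow it" could start from config-file
markers. The sketch below assumes a list of repo-relative paths; the marker
table and =detect_generator= are hypothetical, and =conf.py= in particular is
only a weak hint of Sphinx.

#+begin_src python
from fnmatch import fnmatchcase
from typing import Iterable, Optional

# Checked in order; the first generator with a matching marker wins.
GENERATOR_MARKERS = [
    ("MkDocs", ["mkdocs.yml", "*/mkdocs.yml"]),
    ("Docusaurus", ["docusaurus.config.*", "*/docusaurus.config.*"]),
    ("VitePress", ["*.vitepress/config.*"]),
    ("Sphinx", ["conf.py", "*/conf.py"]),  # heuristic, confirm before relying on it
]


def detect_generator(paths: Iterable[str]) -> Optional[str]:
    """Return the name of the first generator whose marker file is present."""
    paths = list(paths)
    for name, patterns in GENERATOR_MARKERS:
        if any(fnmatchcase(p, pat) for p in paths for pat in patterns):
            return name
    return None
#+end_src

A =None= result means no tooling was found, which is the point where the
skill should ask before introducing any.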

**** Phase 6: Verification

Verification should match doc type:
- Commands in quickstarts/how-tos: run them or mark not run with reason.
- Code snippets: compile/run where feasible; otherwise tag the block with its
  language and note assumptions.
- API docs: validate OpenAPI/spec if tooling exists.
- Links: run link checker if configured; otherwise sample-check changed links.
- Published site: build docs and inspect output.
- Screenshots: verify current UI if included.
- Generated docs: regenerate from source and confirm no unexpected diff.

Final report must say:
- Files created/changed.
- Doc types covered.
- Format/source-of-truth decisions.
- What was verified.
- What could not be verified.
- Known gaps/follow-ups.
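
The report could be kept as a small structure so that no section is silently
dropped. This is a sketch; =FinalReport= and its field names are hypothetical,
and the section titles simply mirror the list above.

#+begin_src python
from dataclasses import dataclass, field


@dataclass
class FinalReport:
    files_changed: list[str] = field(default_factory=list)
    doc_types: list[str] = field(default_factory=list)
    format_decisions: list[str] = field(default_factory=list)
    verified: list[str] = field(default_factory=list)
    not_verified: list[str] = field(default_factory=list)
    follow_ups: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render every section, writing 'none' for empty ones."""
        sections = [
            ("Files created/changed", self.files_changed),
            ("Doc types covered", self.doc_types),
            ("Format/source-of-truth decisions", self.format_decisions),
            ("What was verified", self.verified),
            ("What could not be verified", self.not_verified),
            ("Known gaps/follow-ups", self.follow_ups),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in (items or ["none"]))
        return "\n".join(lines)
#+end_src

Printing "none" for an empty section makes the reader see that the question
was considered, rather than wondering whether it was skipped.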

*** Relationship to existing skills

- =arch-document=: use when the requested docs are specifically architecture
  docs from brief + ADRs + C4/arc42. =create-documentation= may call it, then
  wrap the output in a broader docs map.
- =c4-analyze= / =c4-diagram=: use for diagrams in architecture or concept
  docs when visual structure helps.
- =brainstorm=: use before =create-documentation= when the product/feature
  itself is still unclear.
- =arch-design= / =arch-decide=: use when documentation reveals missing
  architectural choices.
- =security-check=: use when docs include security guidance, auth, secrets,
  deployment, or compliance claims.
- =playwright-js= / =playwright-py=: use to verify published doc sites,
  interactive docs, screenshots, and browser-rendered examples.
- =codify=: use after a documentation session reveals reusable project-specific
  documentation rules.

*** Quality bar and anti-patterns

The skill should reject:
- A giant README that mixes tutorial, reference, architecture, and operations.
- Duplicating generated API/CLI/config reference by hand.
- Unverified commands in quickstarts without a "not run" note.
- Screenshots with no alt text or no update path.
- Tables used for layout instead of actual tabular data.
- "Overview" pages that do not route readers to tasks.
- Tutorials that become reference dumps.
- How-to guides that explain concepts for pages before giving steps.
- Reference pages that hide required options in prose.
- Marketing claims without concrete examples.
- Docs that mention local private paths, personal tooling, or AI attribution in
  public artifacts.
- Treating generated HTML as source unless the project explicitly maintains HTML
  docs that way.
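
Of these, the private-path rule is the easiest to automate. A minimal sketch,
assuming home-directory style paths are the main risk; the regex and the
=private_paths= helper are hypothetical and will miss other private
locations.

#+begin_src python
import re

# Home-directory style paths: /home/<user>/..., /Users/<user>/..., or ~/...
# The lookbehind avoids matching inside URLs such as example.com/home/x.
PRIVATE_PATH = re.compile(
    r"(?<![\w.])(?:/home/[\w.-]+|/Users/[\w.-]+|~)/[^\s\"'`)=]*"
)


def private_paths(text):
    """Return (line number, matched path) for each private-looking path."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PRIVATE_PATH.finditer(line):
            findings.append((lineno, match.group(0)))
    return findings
#+end_src

A hit is a prompt for review, not an automatic rejection: internal planning
files may legitimately mention local paths, while public docs should not.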

*** Acceptance criteria for building the skill

- [ ] Directory =create-documentation/= with =SKILL.md=.
- [ ] Frontmatter description includes positive and negative triggers.
- [ ] Skill body includes the V1 phases above.
- [ ] Includes a source-format decision table for =.md= / =.org= / =.html= /
  generated spec/reference.
- [ ] Includes doc-type classifier based on Diataxis plus README/runbook/API
  additions.
- [ ] Includes examples/skeletons for README, tutorial, how-to, reference,
  explanation, runbook, troubleshooting, contributor docs, and API overview.
- [ ] Includes audit checklist for existing repos.
- [ ] Includes publishing guidance without hardcoding one static-site tool.
- [ ] Includes verification checklist and "unable to verify" reporting.
- [ ] Cross-references =arch-document=, =brainstorm=, =security-check=,
  =playwright-js=, =playwright-py=, and =codify=.
- [ ] Adds =references/= only if needed; suggested files:
  - =references/doc-type-decision.md=
  - =references/style-guide.md=
  - =references/format-decision.md=
  - =references/page-skeletons.md=
  - =references/doc-audit-checklist.md=
- [ ] Keep =SKILL.md= concise enough to load; move long skeletons/checklists to
  references for progressive disclosure.
- [ ] Run =./scripts/lint.sh= after adding the skill.

*** Open design questions before implementation

- Should the user-facing command be exactly =/create-documentation= while
  internal helper names use =documentation-*=, or should all names share the
  =create-documentation <subcommand>= form? Recommendation: one skill with
  subcommands for v1.
- Should Markdown be the hard default for team docs? Recommendation: yes,
  unless the project already uses org/reST/AsciiDoc or the output is personal
  Emacs-native planning.
- Should the skill create a docs site automatically? Recommendation: no. It
  should propose a site when the doc set exceeds README-scale or when search,
  versioning, or public publishing is required. Ask before adding tooling.
- Should it write docs before code exists? Recommendation: yes for specs,
  user journeys, and design docs, but route unclear feature/product decisions
  through =brainstorm= or =arch-design= first.
- Should it include LLM-specific docs surfaces? Recommendation: optional for
  public/library/API docs: =llms.txt= or markdown export is valuable, but normal
  human navigation remains primary.

** TODO [#A] Review pass: tighten skills and rulesets after 2026-05-04 audit

Source notes used in this pass:
- C4 official docs: C4 is notation-independent; System Context and Container
  diagrams are enough for most teams; every diagram needs title, key/legend,
  explicit element types, and audience-appropriate abstraction.
  [[https://c4model.com/diagrams][C4 diagrams]],
  [[https://c4model.com/diagrams/notation][C4 notation]],
  [[https://c4model.com/abstractions/component][C4 component]]
- arc42 docs: quality requirements need measurable scenarios; section 10
  should reference top quality goals and capture lesser quality requirements
  with specific measures. [[https://docs.arc42.org/section-10/][arc42 section 10]],
  [[https://quality.arc42.org/articles/specify-quality-requirements][specifying quality requirements]]
- ADR references: ADRs capture one justified architecturally significant
  decision and its rationale; Nygard's original guidance emphasizes short,
  numbered, repository-stored records and superseding rather than rewriting old
  decisions. [[https://adr.github.io/][adr.github.io]],
  [[https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions][Nygard ADR article]]
- Playwright docs: prefer user-visible locators and web assertions; locators
  auto-wait and retry; =networkidle= is discouraged for testing readiness.
  [[https://playwright.dev/docs/best-practices][Playwright best practices]],
  [[https://playwright.dev/docs/locators][Playwright locators]],
  [[https://playwright.dev/docs/next/api/class-page][Playwright page API]]
- OWASP references: Top 10 2021 includes Broken Access Control,
  Cryptographic Failures, Injection, Insecure Design, Security
  Misconfiguration, Vulnerable and Outdated Components, Identification and
  Authentication Failures, Software and Data Integrity Failures, Security
  Logging and Monitoring Failures, and SSRF; WSTG adds a broader testing map
  across configuration, identity, authn/z, sessions, input validation, error
  handling, cryptography, business logic, client-side, and API testing.
  [[https://owasp.org/Top10/2021/][OWASP Top 10 2021]],
  [[https://owasp.org/www-project-web-security-testing-guide/latest/4-Web_Application_Security_Testing/][OWASP WSTG]]
- V2MOM references: Salesforce calls the last M "Measures" and emphasizes a
  simple alignment document with prioritized Methods, explicit Obstacles, and
  measurable outcomes. [[https://trailhead.salesforce.com/content/learn/modules/selfmotivation/get-focused-with-your-personal-v2mom][Salesforce Trailhead personal V2MOM]],
  [[https://www.salesforce.com/blog/?p=12][Salesforce V2MOM alignment]]
- Prompt research: the cited Meincke paper is titled "Call Me A Jerk:
  Persuading AI to Comply with Objectionable Requests"; its scope is
  persuasion increasing compliance with objectionable requests, not a general
  proof that persuasion framing improves prompt quality.
  [[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5357179][SSRN paper]]
- Combinatorial testing references: NIST supports t-way combinatorial testing
  and notes pairwise is one covering strength, with higher-strength arrays
  useful for failures requiring more interacting factors.
  [[https://www.nist.gov/publications/practical-combinatorial-testing-beyond-pairwise][NIST beyond pairwise]],
  [[https://www.nist.gov/publications/combinatorial-software-testing][NIST combinatorial testing]]
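
To make the covering-strength note concrete, here is a small sketch that
lists the t-way value combinations a test suite leaves unexercised. The
parameter names, the four-test suite, and the =uncovered= helper are made up
for illustration; it checks coverage and does not generate covering arrays.

#+begin_src python
from itertools import combinations, product

PARAMS = {
    "browser": ["chrome", "firefox"],
    "os": ["linux", "mac"],
    "auth": ["on", "off"],
}

# Four tests that cover every pair of values across the three parameters.
TESTS = [
    {"browser": "chrome", "os": "linux", "auth": "on"},
    {"browser": "chrome", "os": "mac", "auth": "off"},
    {"browser": "firefox", "os": "linux", "auth": "off"},
    {"browser": "firefox", "os": "mac", "auth": "on"},
]


def uncovered(parameters, tests, t=2):
    """Return the t-way value combinations that no test exercises."""
    missing = []
    for names in combinations(sorted(parameters), t):
        for values in product(*(parameters[n] for n in names)):
            combo = dict(zip(names, values))
            hit = any(all(test.get(k) == v for k, v in combo.items()) for test in tests)
            if not hit:
                missing.append(combo)
    return missing
#+end_src

The suite is complete at strength 2 but leaves four of the eight 3-way
combinations untested, which is the gap that higher-strength arrays close.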

*** Grouped index (for batching by area)

Each item below is a one-line summary of a sub-TODO further down. Tick the box when the matching sub-TODO is moved to =DONE=. Items are grouped by area so they can be batched (e.g., "do all Playwright items in one session").

**** Browser testing
- [ ] [#A] =playwright-js=: locator/assertion-first guidance (replace raw CSS, =networkidle=)
- [ ] [#B] =playwright-js= + =playwright-py=: reconcile headless/visible defaults
- [ ] [#B] =playwright-js= + =playwright-py=: remove emoji console markers from examples

**** Frontend / UI
- [ ] [#B] =frontend-design=: WCAG 2.2 alignment, accessibility non-optional
- [ ] [#B] =frontend-design=: harmonize aesthetic guidance with anti-pattern rules

**** Security
- [ ] [#A] =security-check=: OWASP 2021 + WSTG coverage
- [ ] [#B] =security-check=: tooling and offline/network caveats

**** Combinatorial testing
- [ ] [#B] =pairwise-tests=: t-way escalation guidance beyond pairwise
- [ ] [#B] =pairwise-tests=: clarify negative value syntax + generator availability

**** V2MOM
- [ ] [#A] =create-v2mom=: rename Metrics → Measures (Salesforce alignment)
- [ ] [#B] =create-v2mom=: prevent task migration from turning V2MOM into a backlog
- [ ] [#B] =create-v2mom=: mitigation/owner fields for Obstacles

**** Prompt engineering
- [ ] [#A] =prompt-engineering=: correct/narrow Meincke citation
- [ ] [#B] =prompt-engineering=: eval-harness requirement for production prompts

**** Codify
- [ ] [#B] =codify=: stale-entry review + privacy checks before writing project =CLAUDE.md=

**** Code review
- [ ] [#A] =review-code=: resolve local-verification vs CI boundary
- [ ] [#B] =review-code=: =CLAUDE.md= citation scope for public artifacts
- [ ] [#B] =review-code=: relax three-strengths rule for tiny/failing diffs

**** PR / review responses
- [ ] [#A] =respond-to-review=: remove review-process language from commit messages
- [ ] [#B] =respond-to-review=: use unresolved threads + resolution state
- [ ] [#B] =respond-to-cj-comments=: drop personal absolute paths from public-writing
- [ ] [#B] =respond-to-cj-comments=: fallback when =humanizer= or =emacsclient= unavailable

**** Branch workflow
- [ ] [#A] =finish-branch=: fix base-branch detection
- [ ] [#B] =finish-branch=: worktree-aware pull/merge safety
- [ ] [#B] =start-work=: tool-availability + ceremony-scaling rules
- [ ] [#B] =start-work=: claim-before-justify rollback risk

**** Tests / TDD
- [ ] [#B] =add-tests=: fix missing =typescript-testing.md= reference or add ruleset
- [ ] [#B] =add-tests=: explicit exceptions to "all three categories per function"

**** Debugging / RCA
- [ ] [#B] =debug=: capture environment + recent-change context before hypotheses
- [ ] [#B] =root-cause-trace=: constrain defense-in-depth to trust boundaries
- [ ] [#B] =five-whys=: require evidence + counterfactual validation per why

**** Brainstorming
- [ ] [#B] =brainstorm=: timebox + research/source rules for high-stakes designs

**** Architecture
- [ ] [#B] =arch-decide=: timeless examples, drop unverifiable claims
- [ ] [#B] =arch-decide=: standardize statuses + immutability language
- [ ] [#B] =arch-design=: threat modeling + privacy/compliance as first-class inputs
- [ ] [#B] =arch-design=: separate paradigms from tactical patterns
- [ ] [#B] =arch-document=: arc42/Q42 quality scenarios
- [ ] [#B] =arch-document=: staleness + ownership metadata for generated docs
- [ ] [#B] =arch-evaluate=: confidence levels for framework-agnostic findings
- [ ] [#B] =arch-evaluate=: report skipped tool checks explicitly

**** C4 modeling
- [ ] [#A] =c4-analyze= + =c4-diagram=: notation/output fallback (not draw.io-only)
- [ ] [#B] =c4-analyze= + =c4-diagram=: clarify abstraction boundaries

**** Global rules
- [ ] [#B] =commits.md=: split DeepSat/Linear/Slack-specific from global rules
- [ ] [#A] =commits.md= + publish flows: =humanizer=-unavailable fallback
- [ ] [#B] =verification.md=: explicit "unable to verify" reporting standard
- [ ] [#B] =testing.md=: property-based + mutation testing as escalation paths
- [ ] [#B] =testing.md=: soften absolute TDD with explicit spike protocol
- [ ] [#B] =subagents.md=: capability/availability + cost checks

**** Languages
- [ ] [#A] =python-testing.md=: revisit in-memory SQLite guidance
- [ ] [#B] =python-testing.md=: separate "never mock ORM" from unit-test boundaries
- [ ] [#B] =elisp.md=: drop tool-specific advice
- [ ] [#B] =elisp-testing.md=: batch-mode + native-comp caveats

**** Hooks
- [ ] [#A] =hooks/README.md=: include =destructive-bash-confirm.py= in install/settings snippets
- [ ] [#A] =hooks/git-commit-confirm.py= + =gh-pr-create-confirm.py=: inspect message/body files referenced by =-F= / =--body-file=
- [ ] [#B] =hooks/destructive-bash-confirm.py=: shell-aware command parsing (not regex)

*** TODO [#A] =playwright-js=: replace raw CSS/page actions and =networkidle= defaults with locator/assertion-first guidance

Current examples lean on =page.click=, =page.fill=, =waitForSelector=, and
=waitForLoadState('networkidle')=. Official Playwright guidance prefers
locators based on user-visible attributes and web assertions for readiness,
and it discourages =networkidle= for testing. Keep reconnaissance, but revise
it to wait for a visible app-specific landmark instead of treating network
quiet as readiness.

*** TODO [#B] =playwright-js= and =playwright-py=: reconcile headless/visible-browser defaults

=playwright-js= says visible Chromium by default; =playwright-py= says
headless by default. That may be intentional, but the difference should be
explicit: interactive visual debugging -> headed, CI/pytest smoke tests ->
headless. Add a small decision table so agents don't flip modes by habit.

*** TODO [#B] =playwright-js= and =playwright-py=: remove emoji console markers from examples

The broader rules discourage emojis in shared engineering output. The
Playwright examples print camera/check/cross emoji. Replace with plain ASCII
status prefixes.

*** TODO [#B] =frontend-design=: make accessibility non-optional and align with WCAG 2.2

The workflow only loads =references/accessibility.md= for interactive
components. Accessibility should be a baseline for all frontend work: keyboard
operation, focus visibility/not-obscured, target size, contrast, reduced
motion, labels, and semantic structure. Add WCAG 2.2-oriented gates before
handoff.

*** TODO [#B] =frontend-design=: harmonize aesthetic guidance with current UI anti-pattern rules

The skill encourages gradient meshes, heavy texture, custom cursors, overlap,
and maximalist directions. Those can conflict with the repo's newer frontend
discipline against generic gradients, decorative blobs/orbs, text overlap,
single-hue palettes, unreadable layouts, and marketing-style dashboards. Add a
"creative but bounded" section: domain fit, readability, responsive stability,
and no decorative effects that degrade the task workflow.

*** TODO [#A] =security-check=: update OWASP coverage to the 2021 categories and WSTG test areas

The current security checklist uses older category names and misses several
current Top 10 items: Insecure Design, Software and Data Integrity Failures,
Security Logging and Monitoring Failures, and SSRF. Expand the review table so
each finding maps to either OWASP Top 10 2021 or a WSTG area, and add explicit
checks for authorization object/function-level access, SSRF URL fetches,
integrity of update/plugin paths, and security-relevant logging gaps.

*** TODO [#B] =security-check=: add practical tooling and offline/network caveats

Add optional use of project-configured scanners such as =gitleaks= or
=trufflehog= for secrets, =semgrep= for source patterns, =pip-audit= / =npm
audit= / OSV where configured, and lockfile diff review. Note that dependency
audits may need network access and should report "not run" clearly rather than
silently passing.
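
A minimal sketch of the "report not run, never silently pass" idea, assuming
the check is only whether the scanner executable is on =PATH=. The function
name and the status strings are hypothetical, not part of the skill.

#+begin_src python
import shutil


def scanner_status(tools):
    """Map each scanner name to an explicit status string.

    A missing tool is reported as "not run" with a reason, so the review
    output never implies that a scan passed when it did not happen.
    """
    report = {}
    for tool in tools:
        if shutil.which(tool):
            report[tool] = "available"
        else:
            report[tool] = "not run: executable not found on PATH"
    return report


if __name__ == "__main__":
    for name, status in scanner_status(["gitleaks", "semgrep", "pip-audit"]).items():
        print(f"{name}: {status}")
#+end_src

"available" only means the tool can be started; a network-dependent audit that
fails later should get its own "not run" reason in the same report.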

*** TODO [#B] =pairwise-tests=: add t-way escalation guidance beyond pairwise

Pairwise is a pragmatic default, but NIST's combinatorial testing work covers
higher-strength t-way arrays too. Add a rule: start with pairwise for broad
coverage, escalate selected high-risk parameter clusters to 3-way or higher
when history, safety, security, or domain reasoning suggests faults require
more than two interacting factors.
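
To make the escalation concrete, here is a small sketch that measures which
t-way value combinations a test suite leaves uncovered. The parameter model
(=cache=, =tls=, =retry=) and the function name are hypothetical; the point is
only that a suite can cover every pair while missing half of the triples.

#+begin_src python
from itertools import combinations, product


def uncovered_tuples(params, tests, t):
    """Return the t-way value combinations that no test case exercises.

    params: dict of parameter name -> list of possible values
    tests:  list of dicts, each mapping parameter name -> chosen value
    t:      interaction strength (2 = pairwise, 3 = 3-way, ...)
    """
    missing = []
    for names in combinations(sorted(params), t):
        for values in product(*(params[n] for n in names)):
            wanted = dict(zip(names, values))
            covered = any(
                all(case[n] == v for n, v in wanted.items()) for case in tests
            )
            if not covered:
                missing.append(wanted)
    return missing


# Hypothetical model: three boolean flags.
PARAMS = {"cache": [0, 1], "tls": [0, 1], "retry": [0, 1]}

# Four cases cover every pair of values across the three flags.
TESTS = [
    {"cache": 0, "tls": 0, "retry": 0},
    {"cache": 0, "tls": 1, "retry": 1},
    {"cache": 1, "tls": 0, "retry": 1},
    {"cache": 1, "tls": 1, "retry": 0},
]

if __name__ == "__main__":
    print("uncovered pairs:", len(uncovered_tuples(PARAMS, TESTS, 2)))
    print("uncovered triples:", len(uncovered_tuples(PARAMS, TESTS, 3)))
#+end_src

A fault that needs all three flags set a particular way can slip past this
suite, which is the case the escalation rule is meant to catch for high-risk
parameter clusters.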

*** TODO [#B] =pairwise-tests=: clarify negative value syntax and actual generator availability

The examples use =~0= style values that are PICT-specific and easy to
misread. Add a short "negative testing values are labels, not operators unless
PICT treats them specially" explanation, and make the run path honest: if PICT
or =pypict= is unavailable, produce the model and stop instead of implying
cases were generated.

*** TODO [#A] =create-v2mom=: rename "Metrics" to Salesforce's "Measures" or explicitly justify the deviation

V2MOM's final M is officially "Measures." The skill uses "Metrics" throughout.
Either rename the section and description to "Measures" or add a clear note
that this fork intentionally says "Metrics" while preserving the V2MOM concept.

*** TODO [#B] =create-v2mom=: prevent task migration from turning V2MOM into a backlog

Salesforce presents V2MOM as a simple alignment framework. This skill's
optional task-migration phase can make the V2MOM the entire todo system. Split
strategy from execution: keep the V2MOM concise, and link to method-specific
backlogs instead of embedding every task under the strategic document.

*** TODO [#B] =create-v2mom=: add mitigation/owner fields for Obstacles

The current Obstacles phase captures barriers but not consistently how each
will be overcome. Add "mitigation, owner, and review cadence" per obstacle so
the section becomes operational instead of just candid.

*** TODO [#A] =prompt-engineering=: correct and narrow the Meincke citation

The skill cites "Persuasion and Compliance in Large Language Models" but the
paper found in research is "Call Me A Jerk: Persuading AI to Comply with
Objectionable Requests." Revise the reference and avoid overgeneralizing the
result: it shows persuasion can raise compliance with objectionable requests,
which is a cautionary prompt-safety finding, not broad evidence that persuasion
principles improve engineering prompt quality.

*** TODO [#B] =prompt-engineering=: add an evaluation harness requirement for production prompts

Prompt critique currently ends with a rewrite and checklist. Add a requirement
for fragile or reusable prompts: create 3-5 adversarial/edge examples, run the
old and new prompt against them, and record the observed behavioral delta.
Without examples, prompt quality remains asserted rather than verified.

*** TODO [#B] =codify=: add stale-entry review and privacy checks before writing project =CLAUDE.md=

The skill has good gates, but it should explicitly scan for stale entries,
private context, and team-visible leakage before appending. Add "would this be
safe if the project were public?" and "does this belong in private memory
instead?" as mandatory checks, not just table background.

*** TODO [#A] =review-code=: resolve the local-verification vs CI boundary

=review-code= says "Trust CI for lint, typecheck, test runs; don't re-run
them." =verification.md= and =finish-branch= require fresh local evidence
before completion. Clarify: code review should not duplicate CI while reading a
PR, but pre-commit/pre-push workflows still need local verification or a clear
"not run because..." statement.

*** TODO [#B] =review-code=: handle public-artifact scope when citing =CLAUDE.md=

The skill requires auditing and reporting =CLAUDE.md= adherence, while
=commits.md= says personal tooling files should not be cited as authority in
public artifacts. Add two output modes: private/internal review may cite
=CLAUDE.md= directly; public/team review should translate the rule into the
underlying engineering reason without naming personal rulesets.

*** TODO [#B] =review-code=: relax mandatory "three strengths" for tiny or failing diffs

"Three minimum" strengths can force filler on small diffs or bad PRs. Adjust to
"up to three specific strengths; say none found when appropriate" so the review
stays honest and avoids synthetic praise.

*** TODO [#A] =respond-to-review=: remove review-process language from commit messages

The skill suggests commits like =fix: Address review — [description]=, which
conflicts with =commits.md='s "what changed and why, not the process" rule and
also uses a non-ASCII dash. Replace with conventional subjects that name the
actual fix, e.g. =fix: validate export filename=.

*** TODO [#B] =respond-to-review=: use unresolved review threads and resolution state, not only flat comments

Fetching inline and top-level comments via REST misses thread resolution and
can re-process already-resolved feedback. Add the same thread-level workflow as
the GitHub comment-addressing skill: gather unresolved threads, group by
requested change, implement, reply, and resolve only after verification.
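
A rough sketch of the thread-level filter, assuming the data comes from
GitHub's GraphQL API (for example through =gh api graphql=) and that the
=reviewThreads= connection exposes =isResolved= and =path= as used here. The
query text and helper names are illustrative and should be checked against the
current schema before use.

#+begin_src python
from collections import defaultdict

# Assumed query shape; verify field names against the GitHub GraphQL schema.
QUERY = """
query($owner: String!, $name: String!, $number: Int!) {
  repository(owner: $owner, name: $name) {
    pullRequest(number: $number) {
      reviewThreads(first: 100) {
        nodes {
          isResolved
          path
          comments(first: 20) { nodes { body } }
        }
      }
    }
  }
}
"""


def unresolved_threads(response):
    """Return only the review threads that are still unresolved."""
    nodes = response["data"]["repository"]["pullRequest"]["reviewThreads"]["nodes"]
    return [node for node in nodes if not node["isResolved"]]


def group_by_path(threads):
    """Group threads by file path so related feedback is handled together."""
    grouped = defaultdict(list)
    for thread in threads:
        grouped[thread["path"]].append(thread)
    return dict(grouped)
#+end_src

Grouping by path is a stand-in for "group by requested change"; the real
grouping probably needs a read of the comment bodies.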

*** TODO [#B] =respond-to-cj-comments=: remove personal absolute path references from public-writing instructions

The skill embeds =/home/cjennings/code/rulesets/claude-rules/commits.md= in
the public-writing section. That contradicts the public-artifact scope rule.
Refer to "the commit/public-writing rules" internally, and ensure any emitted
public text never cites the local path.

*** TODO [#B] =respond-to-cj-comments=: add fallback when =humanizer= or =emacsclient= is unavailable

The workflow requires =/humanizer= and opens long summaries in =emacsclient=.
Neither is guaranteed in a fresh environment. Add tool-availability checks and
fallbacks: apply the style passes inline if =humanizer= is absent, and write the
summary file path without opening an editor if =emacsclient= fails.

*** TODO [#A] =finish-branch=: fix base-branch detection

Phase 2 says "determine base branch" but the command shown returns a merge-base
commit SHA, not the branch name to check out, pull, merge into, or pass as PR
base. Replace with explicit branch detection: upstream PR base if present,
configured default branch from =origin/HEAD=, or user-selected branch, then
compute merge-base separately.
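
The precedence above could be sketched as follows, assuming the caller has
already collected the PR base name (if any), the output of
=git symbolic-ref refs/remotes/origin/HEAD=, and an optional user choice. Both
function names are hypothetical.

#+begin_src python
def branch_from_origin_head(symbolic_ref):
    """Turn 'refs/remotes/origin/main' into 'main'; return None otherwise."""
    prefix = "refs/remotes/origin/"
    ref = symbolic_ref.strip()
    if ref.startswith(prefix) and len(ref) > len(prefix):
        return ref[len(prefix):]
    return None


def pick_base_branch(pr_base=None, origin_head=None, user_choice=None):
    """Return a branch *name* using the order: PR base, origin/HEAD, user.

    The merge-base SHA is a separate question and is not computed here.
    """
    candidates = (
        pr_base,
        branch_from_origin_head(origin_head or ""),
        user_choice,
    )
    for candidate in candidates:
        if candidate:
            return candidate
    raise ValueError("base branch unknown; ask the user to choose one")
#+end_src

With the name in hand, the merge-base can then be computed against it as its
own step, which keeps "which branch" and "which commit" from being conflated.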

*** TODO [#B] =finish-branch=: make pull/merge steps safer and worktree-aware

Option 1 runs =git pull= and =git merge --no-ff= after checkout. Add checks for
dirty worktree, upstream tracking, protected branches, and rebase-vs-merge team
policy. Worktree detection via grepping branch names is fragile; use =git
worktree list --porcelain= or =git rev-parse --git-common-dir= based checks.
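
As an illustration of the porcelain-based check, the sketch below parses
=git worktree list --porcelain= output (records separated by blank lines, one
=key value= attribute per line) and looks a branch up by exact ref. The helper
names and sample paths are made up for the example.

#+begin_src python
def parse_worktrees(porcelain):
    """Parse `git worktree list --porcelain` output into a list of dicts."""
    worktrees, current = [], {}
    for line in porcelain.splitlines():
        if not line.strip():
            # A blank line ends the current worktree record.
            if current:
                worktrees.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        # Flag-style attributes such as "detached" carry no value.
        current[key] = value if value else True
    if current:
        worktrees.append(current)
    return worktrees


def worktree_for_branch(worktrees, branch):
    """Return the path of the worktree that has `branch` checked out, if any."""
    ref = f"refs/heads/{branch}"
    for wt in worktrees:
        if wt.get("branch") == ref:
            return wt["worktree"]
    return None
#+end_src

Because the comparison is against the full ref, a branch named =login= does not
match a worktree on =feature/login=, which is the failure mode of grepping
branch names.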

*** TODO [#B] =start-work=: add tool-availability and ceremony-scaling rules

The workflow assumes Linear MCP, GitHub CLI, =humanizer=, Playwright skills, and
multi-commit TDD ceremony. Add a first-class "tools unavailable" path and a
ceremony scale: trivial local fixes should not require the full ticket,
branch, three approval gates, and commit-per-phase flow unless the user wants
that process.

*** TODO [#B] =start-work=: resolve the "claim before justify" rollback risk

The skill marks Linear/GitHub/todo tasks in progress before the Justify gate,
then says rolling back is required if justification fails. Consider moving
claiming after Gate 1 for personal todo tasks, or make the rollback steps
explicit per tracker with stored prior state.

*** TODO [#B] =add-tests=: fix missing =typescript-testing.md= reference or add the ruleset

Phase 3 references =typescript-testing.md=, but this repo currently has Python
and Elisp testing rules only. Either add the TypeScript ruleset or change the
skill to discover project-local JS/TS testing conventions instead of pointing
to a missing file.

*** TODO [#B] =add-tests=: add explicit exceptions to "all three categories per function"

The Normal/Boundary/Error rule is useful, but some functions are pure adapters,
generated code, tiny wrappers, or framework glue. Add an exception protocol:
state why a category does not apply, and cover the behavior at the integration
or E2E level when unit categories would test framework behavior.

*** TODO [#B] =debug=: capture environment and recent-change context before hypotheses

The debugging workflow covers reproduction and logs, but should explicitly
record environment, versions, feature flags, data set, seed/time, concurrency,
and recent commits/config changes. Many intermittent failures are environment
or state transitions, not just local code paths.

*** TODO [#B] =root-cause-trace=: constrain defense-in-depth to trust boundaries and invariants

The skill says add defense at each intermediate layer that could have caught
the bad value. That risks validation spam. Tighten it: add checks at ingress,
trust boundaries, persistence boundaries, and invariant-owning layers; avoid
duplicative null checks in every pass-through function.

*** TODO [#B] =five-whys=: require evidence and counterfactual validation per why

The skill says "one best-supported answer" but should require an evidence
field for each link and a counterfactual check: if this cause were removed,
would the next symptom plausibly disappear? This reduces monocausal storytelling.

*** TODO [#B] =brainstorm=: add timebox and research/source rules for high-stakes designs

The one-question-at-a-time flow can run long. Add a timebox and a rule that
claims about markets, regulations, tools, vendors, or current APIs require
fresh sources. The design doc should distinguish researched facts from
assumptions.

*** TODO [#B] =arch-decide=: make examples technically timeless and avoid unverifiable claims

The sample ADRs include claims such as MongoDB lacking ACID for multi-document
transactions "at decision time." Examples age and can teach stale facts. Replace
with either clearly dated examples or domain-neutral placeholders, and require
references for real technical claims in generated ADRs.

*** TODO [#B] =arch-decide=: standardize statuses and immutability language

The skill mixes Accepted, Decided, Deprecated, Superseded, Rejected, and "Not
Accepted." Pick a canonical status set and state that accepted ADR content is
not edited except for status/link metadata; changed decisions get new ADRs that
supersede old ones.

*** TODO [#B] =arch-design=: add threat modeling and privacy/compliance as first-class design inputs

Security appears as one quality attribute, but architecture design should also
ask about trust boundaries, data classification, abuse cases, privacy
constraints, compliance evidence, and operational ownership. These influence
architecture early and should not wait for =security-check=.

*** TODO [#B] =arch-design=: separate architecture paradigms from tactical patterns

The candidate table mixes paradigms (modular monolith, microservices,
event-driven) with tactical or partial patterns (DDD, CQRS, event sourcing).
Revise the matrix so candidates can compose patterns rather than treating each
as a mutually exclusive architecture choice.

*** TODO [#B] =arch-document=: strengthen quality scenarios using arc42/Q42 structure

Section 10 currently says "Under [condition], the system should [response]
within [measure]." Expand to a compact quality-scenario template: source,
stimulus, environment, artifact, response, response measure. This better
matches architecture-quality practice and makes requirements testable.

*** TODO [#B] =arch-document=: add staleness and ownership metadata to generated docs

arc42 docs are living documents. Add owner, source commit/date, review cadence,
and "known stale when..." notes per section or in the README so generated docs
do not become authoritative after the code has moved on.

*** TODO [#B] =arch-evaluate=: add confidence levels for framework-agnostic findings

Claude-read import graphs and public API comparisons can be incomplete in large
codebases or dynamic languages. Add confidence/provenance per finding and
require "not fully checked because..." when scale or dynamic imports limit
certainty.

*** TODO [#B] =arch-evaluate=: report skipped tool checks explicitly

The workflow says to skip unconfigured language-specific tools silently, but
the review checklist also wants checks run. For audit usefulness, list detected
languages and "tool not configured" entries under Info instead of skipping
silently.

*** TODO [#A] =c4-analyze= and =c4-diagram=: add notation/output fallback instead of draw.io-only

C4 is notation-independent. These skills hard-require draw.io XML, PNG export,
and opening draw.io desktop. Add supported outputs (Structurizr DSL, Mermaid,
PlantUML, draw.io) and a fallback path when =drawio= or a GUI is unavailable.

*** TODO [#B] =c4-analyze= and =c4-diagram=: clarify C4 abstraction boundaries

Emphasize that C4 Containers are deployable/runnable units, not necessarily
Docker containers, and that Components are not separately deployable. Add a
check that every relationship and element stays at one abstraction level.

*** TODO [#B] =commits.md=: split DeepSat/Linear/Slack-specific publishing rules from global commit rules

The global commit rule file includes Linear status transitions and a hard-coded
Slack channel. That is team-specific and may leak or misfire in unrelated
projects. Move those steps to a project/team overlay, leaving global rules for
author identity, attribution, commit format, review gate, and verification.

*** TODO [#A] =commits.md= and publish flows: define fallback when =humanizer= is unavailable

Several workflows make =humanizer= mandatory, but no =humanizer= skill exists
in this repo. Add one of: the skill itself, install instructions, or a fallback
plain-English pass that satisfies the same checks without an external skill.

*** TODO [#B] =verification.md=: add explicit "unable to verify" reporting standard

The rule says run tests/lint/typecheck/build before claiming done. Add the
required final wording when a command cannot be run: command attempted, reason
it could not run, risk left unverified, and the smallest next command for the
user to run.

*** TODO [#B] =testing.md=: add property-based and mutation testing as escalation paths

The testing rules cover categories and pairwise matrices. Add guidance for
property-based testing when invariants matter across broad input domains, and
mutation testing when test quality is suspect despite high coverage.

*** TODO [#B] =testing.md=: soften absolute TDD with an explicit spike protocol

The rule currently treats TDD as non-negotiable. Keep TDD as the default, but
define a disciplined spike exception: timebox, do not commit spike code, write
the first failing test before productionizing the discovered approach.

*** TODO [#B] =subagents.md=: add capability/availability and cost checks

The rule assumes subagents exist and should handle failures. Add "if the
environment lacks subagents, continue locally and preserve the same scope
boundaries" plus a cost check for tasks where context handoff exceeds the work.

*** TODO [#A] =languages/python/claude/rules/python-testing.md=: revisit in-memory SQLite guidance

"Prefer in-memory SQLite for speed in unit tests" is risky for Django or
SQLAlchemy projects whose production database is PostgreSQL/MySQL; query
semantics, constraints, transactions, JSON, time zones, and indexes differ.
Recommend production-like DBs for ORM/query behavior and reserve SQLite for
pure unit tests that do not depend on database semantics.
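
One concrete difference, shown with only the standard-library =sqlite3= module:
an ordinary (non-STRICT) SQLite table accepts text in an =INTEGER= column and
stores it as text, whereas PostgreSQL rejects the same insert. The table and
function names are invented for the example.

#+begin_src python
import sqlite3


def type_stored_for_text_in_integer_column():
    """Insert a non-numeric string into an INTEGER column and report its type."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (quantity INTEGER)")
        # SQLite uses type affinity: this insert succeeds instead of failing.
        conn.execute("INSERT INTO orders (quantity) VALUES ('not-a-number')")
        row = conn.execute("SELECT typeof(quantity) FROM orders").fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(type_stored_for_text_in_integer_column())
#+end_src

A test that relies on the database to reject bad data would pass here and fail
in production, or the reverse, which is why ORM and query behavior is better
exercised against a production-like database.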

*** TODO [#B] =languages/python/claude/rules/python-testing.md=: separate "never mock ORM" from true unit-test boundaries

For domain services, real model methods and validation are usually right. For
thin orchestration units, a repository/interface fake may be cleaner than
hitting a real database. Clarify the boundary: do not mock ORM internals, but
do inject fakes at deliberate data-access ports.
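
A small sketch of a fake at a data-access port, assuming a hypothetical
=UserRepository= interface; none of these names come from the ruleset. The
orchestration function depends on the port, so the unit test needs neither a
database nor any mocking of ORM internals.

#+begin_src python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class User:
    id: int
    email: str
    active: bool


class UserRepository(Protocol):
    """The data-access port; production code would back this with the ORM."""

    def get(self, user_id: int) -> Optional[User]: ...


class InMemoryUserRepository:
    """Fake used in unit tests of thin orchestration code."""

    def __init__(self, users):
        self._users = {user.id: user for user in users}

    def get(self, user_id: int) -> Optional[User]:
        return self._users.get(user_id)


def can_receive_email(repo: UserRepository, user_id: int) -> bool:
    """Thin orchestration unit: unknown or inactive users get no email."""
    user = repo.get(user_id)
    return user is not None and user.active
#+end_src

Model validation and query behavior would still be tested against real models
and a real database; the fake only replaces the port.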

*** TODO [#B] =languages/elisp/claude/rules/elisp.md=: update editing workflow to avoid tool-specific advice

The rule says prefer Write over repeated Edits. That advice is Claude-tooling
specific and can conflict with environments that require patch-based edits.
Rephrase around the intent: for nontrivial Elisp, make cohesive edits and run
paren/byte-compile checks immediately.

*** TODO [#B] =languages/elisp/claude/rules/elisp-testing.md=: add batch-mode and native-comp caveats

ERT guidance is solid, but add rules for =emacs --batch= reproducibility,
isolating =user-emacs-directory= / package state, and optionally catching
native-comp or byte-compile warnings depending on the project's Emacs version.

*** TODO [#A] =hooks/README.md=: include =destructive-bash-confirm.py= in install/settings snippets

The table documents the destructive-command hook, but the manual install and
settings JSON snippets only include the commit and PR hooks. Add the destructive
hook to both snippets so documented installation matches the listed hooks.

*** TODO [#A] =hooks/git-commit-confirm.py= and =hooks/gh-pr-create-confirm.py=: inspect message/body files

=commits.md= uses =git commit -F /tmp/commit-*.md= and =gh pr create
--body-file ...=. The hooks currently treat file-backed messages as
unparseable or just display the file path, so attribution scanning may miss the
actual committed/posted text. Read safe local files referenced by =-F=,
=--file=, and =--body-file= before deciding whether the command is clean.
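
A possible starting point for the file lookup, assuming the hook receives the
command as a single string. It only extracts the referenced paths; deciding
which paths are safe to read, and scanning their contents, is left to the
caller. The function name and the example file names are hypothetical.

#+begin_src python
import shlex

# Flags whose argument is a file holding the commit message or PR body.
FILE_FLAGS = {"-F", "--file", "--body-file"}


def referenced_files(command):
    """Return file paths passed via -F, --file, or --body-file.

    Handles "--flag value" and "--flag=value". A value of "-" means stdin
    and is skipped. The attached short form ("-Fpath") is not handled.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:
        return []  # unbalanced quotes: nothing can be extracted reliably
    paths = []
    for index, token in enumerate(tokens):
        value = None
        if token in FILE_FLAGS and index + 1 < len(tokens):
            value = tokens[index + 1]
        elif token.startswith("--") and "=" in token:
            flag, _, rest = token.partition("=")
            if flag in FILE_FLAGS:
                value = rest
        if value and value != "-":
            paths.append(value)
    return paths
#+end_src

When the list is empty for a command that clearly uses one of these flags, the
hook should probably fall back to asking rather than treating the command as
clean.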

*** TODO [#B] =hooks/destructive-bash-confirm.py=: replace regex command parsing with shell-aware parsing where possible

The hook's regexes can miss quoted paths, variables, aliases, =env= wrappers,
or compound commands, and can misidentify targets. Use =shlex= for simple
commands, document unsupported shell constructs, and fail toward asking when a
destructive pattern is ambiguous.
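
One way this could look, as a sketch rather than the hook's actual logic: use
=shlex.split= for simple commands, treat anything containing shell operators or
expansions as unsupported, and return "ask" whenever a destructive command name
shows up somewhere that cannot be interpreted with confidence. The command
list, the verdict strings, and the function names are assumptions.

#+begin_src python
import os
import shlex

DESTRUCTIVE = {"rm", "rmdir", "shred"}  # hypothetical list for the sketch
UNSUPPORTED = ";|&$`()<>\n"             # constructs this sketch does not parse


def _mentions_destructive(command):
    """Loose scan used only when the command cannot be parsed simply."""
    for ch in UNSUPPORTED:
        command = command.replace(ch, " ")
    words = (word.strip("'\"") for word in command.split())
    return any(os.path.basename(word) in DESTRUCTIVE for word in words)


def classify(command):
    """Return (verdict, targets); verdict is 'safe', 'destructive', or 'ask'."""
    if any(ch in command for ch in UNSUPPORTED):
        # Compound commands and expansions: never guess at targets.
        return ("ask", []) if _mentions_destructive(command) else ("safe", [])
    try:
        tokens = shlex.split(command)
    except ValueError:  # for example, unbalanced quotes
        return ("ask", [])
    # Skip an `env` wrapper and leading VAR=value assignments.
    while tokens and (tokens[0] == "env" or "=" in tokens[0]):
        tokens.pop(0)
    if not tokens:
        return ("safe", [])
    if os.path.basename(tokens[0]) in DESTRUCTIVE:
        targets = [tok for tok in tokens[1:] if not tok.startswith("-")]
        return ("destructive", targets)
    if any(os.path.basename(tok) in DESTRUCTIVE for tok in tokens[1:]):
        return ("ask", [])  # for example `xargs rm` or `sudo rm`
    return ("safe", [])
#+end_src

Aliases and shell functions are invisible to this approach and would need to
be listed as unsupported in the hook's documentation.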

** TODO [#B] Build =ov-1= skill for DoDAF OV-1 (High-Level Operational Concept Graphic)

Triggered by SOFWeek (May 2026, Tampa) — DeepSat attending; DoD attendees
may ask for architecture diagrams. OV-1 is the universal informal
currency in DoD briefings ("show me the architecture" → OV-1 by default).

Priority upgrades to =[#A]= if Craig confirms scenario 2 below (personal
load-bearing need at the event); stays =[#B]= or drops to =[#C]= if
scenario 1 (team already covers it, future asset only).

*** Prior art (searched 2026-04-19)

No Claude Code skill exists yet for DoDAF / OV-1 / SV-1 / SysML.

- =anthropics/skills= — 17 skills, zero DoDAF/SysML/defense coverage.
- =awesome-claude-code= list — zero hits for DoDAF/OV-1/SysML/UAF.
- =mfsgr/sysml2dodaf= — empty repo (0 stars, no code). Vapor.
- =HowardKao-1130/mini-NEXEN= — broad SE methodology skill that
  name-drops DoDAF as a trigger keyword; no artifact generation. 0 stars.
- =gaphor/gaphor= (Apache-2.0, 2.2k stars) — mature UML/SysML GUI
  modeler. Not a skill; not a pipeline. Useful reference only.

Nearest prior art to lean on when building:
- DoDAF 2.02 Viewpoints & Models reference (dodcio.defense.gov) —
  canonical OV-1 exemplars. Embed 3-5 layouts as skill =references/=.
- Pattern from existing =c4-diagram= skill — same shape (prose → diagram
  spec), swap the viewpoint vocabulary to DoDAF.
- PlantUML for SV-1 (when that skill comes later); Mermaid or draw.io
  XML for OV-1 lightweight visuals.

*** Build scope (when triggered)

*In scope:*
- Input: prose description of a system + its operational context.
- Output: structured OV-1 *spec* — performers, external actors (other
  systems, forces, adversaries), relationships (data/control flows),
  narrative captions, classification marking, legend requirements.
- DoDAF 2.02 completeness checklist as a quality gate — verify the
  produced spec contains every element a correct OV-1 requires.
- Optional lightweight visual: draw.io XML or Mermaid approximation for
  quick review; NOT a finished rendering.

*Out of scope:*
- Icon libraries, pictorial assets, finished PowerPoint export. OV-1
  final art belongs to a designer or Craig in Visio/PowerPoint; the
  skill's job is the spec and the check, not the slide.
- SV-1, SV-2, UAF, IDEF1X, other viewpoints. Build only when a
  concrete need triggers each.

Estimate: 4-6 hours.

*** Craig's investigation before kickoff

1. Does DeepSat's systems-engineering or marketing team already have an
   OV-1 (or the equivalent briefing artifact) for SOFWeek?
2. If yes (scenario 1) — skill is a future asset, not event-load-bearing.
   Ship after SOFWeek. Priority drops to =[#C]=.
3. If no, or if the scenario is "Craig may need to produce/iterate an
   OV-1 on the fly during the event" (scenario 2) — skill is load-bearing
   for the event. Priority upgrades to =[#A]=; build before SOFWeek.
4. Confirm the classification level the skill needs to handle
   (unclassified-only? or FOUO markings? affects the classification
   block in the spec).
5. Confirm the target rendering format DeepSat uses for OV-1
   deliverables (PowerPoint slide? Cameo? Visio? affects whether the
   skill emits draw.io XML vs Mermaid vs pure structured spec).

*** Related

See also the DoD-specific notations section under the later TODO
(=c4-*= rename revisit) — OV-1 is flagged there as the highest-value
starting point across the DoD notation landscape (SysML, DoDAF/UAF,
IDEF1X). This entry is the execution plan for that starting point.

** TODO [#A] Build =/update-skills= skill for keeping forks in sync with upstream

The rulesets repo has a growing set of forks (=arch-decide= from
wshobson/agents, =playwright-js= from lackeyjb/playwright-skill, =playwright-py=
from anthropics/skills/webapp-testing). Over time, upstream releases fixes,
new templates, or scope expansions that we'd want to pull in without losing
our local modifications. A skill should handle this deliberately rather than
by manual re-cloning.

*** Design decisions (agreed)

- *Upstream tracking:* per-fork manifest =.skill-upstream= (YAML or JSON):
  - =url= (GitHub URL)
  - =ref= (branch or tag)
  - =subpath= (path inside the upstream repo when it's a monorepo)
  - =last_synced_commit= (updated on successful sync)
- *Local modifications:* 3-way merge. Requires a pristine baseline snapshot of
  the upstream-at-time-of-fork. Store under =.skill-upstream/baseline/= or
  similar (if the manifest is a plain file named =.skill-upstream=, the baseline
  directory needs a different name); committed to the rulesets repo so the
  merge base is reproducible.
- *Apply changes:* skill edits files directly with per-file confirmation.
- *Conflict policy:* per-hunk prompt inside the skill. When a 3-way merge
  produces a conflict, the skill walks each conflicting hunk and asks Craig:
  keep-local / take-upstream / both / skip. Editor-independent; works on
  machines where Emacs isn't available. Fallback when baseline is missing
  or corrupt (can't run 3-way merge): write =.local=, =.upstream=,
  =.baseline= files side-by-side and surface as manual review.

*** V1 Scope

- [ ] Skill at =~/code/rulesets/update-skills/=
- [ ] Discovery: scan sibling skill dirs for =.skill-upstream= manifests
- [ ] Helper script (bash or python) to:
  - Clone each upstream at =ref= shallowly into =/tmp/=
  - Compare current skill state vs latest upstream vs stored baseline
  - Classify each file: =unchanged= / =upstream-only= / =local-only= / =both-changed=
  - For =both-changed=: run =git merge-file --stdout <local> <baseline> <upstream>=;
    if clean, write result directly; if conflicts, parse the conflict-marker
    output and feed each hunk into the per-hunk prompt loop
- [ ] Per-hunk prompt loop:
  - Show base / local / upstream side-by-side for each conflicting hunk
  - Ask: keep-local / take-upstream / both (concatenate) / skip (leave marker)
  - Assemble resolved hunks into the final file content
- [ ] Per-fork summary output with file-level classification table
- [ ] Per-file confirmation flow (yes / no / show-diff) BEFORE per-hunk loop
- [ ] On successful sync: update =last_synced_commit= in the manifest
- [ ] =--dry-run= to preview without writing
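
The file classification step in the helper script might look like the sketch
below. It assumes each version of a file is available as bytes, with =None=
standing for "absent in that tree"; the function names are hypothetical.

#+begin_src python
def classify_file(local, baseline, upstream):
    """Classify one file by comparing local and upstream against the baseline.

    Each argument is the file's content as bytes, or None if the file does
    not exist in that tree.
    """
    local_changed = local != baseline
    upstream_changed = upstream != baseline
    if not local_changed and not upstream_changed:
        return "unchanged"
    if upstream_changed and not local_changed:
        return "upstream-only"
    if local_changed and not upstream_changed:
        return "local-only"
    return "both-changed"


def classify_tree(local, baseline, upstream):
    """Classify every path that appears in any of the three trees.

    Each argument is a dict of relative path -> bytes.
    """
    paths = sorted(set(local) | set(baseline) | set(upstream))
    return {
        path: classify_file(local.get(path), baseline.get(path), upstream.get(path))
        for path in paths
    }
#+end_src

Files that both sides changed identically still land in =both-changed= here;
the 3-way merge should resolve those cleanly, so the sketch leaves that case
to =git merge-file= rather than special-casing it.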

*** V2+ (deferred)

- [ ] Track upstream *releases* (tags) not just branches, so skill can propose
  "upgrade from v1.2 to v1.3" with release notes pulled in
- [ ] Generate patch files as an alternative apply method (for users who prefer
  =git apply= / =patch= over in-place edits)
- [ ] Non-interactive mode (=--non-interactive= / CI): skip conflict resolution,
  emit side-by-side files for later manual review
- [ ] Auto-run on a schedule via Claude Code background agent
- [ ] Summary of aggregate upstream activity across all forks (which forks have
  upstream changes waiting, which don't)
- [ ] Optional editor integration: on machines with Emacs, offer
  =M-x smerge-ediff= as an alternate path for users who prefer ediff over
  per-hunk prompts

*** Initial forks to enumerate (for manifest bootstrap)

- [ ] =arch-decide= → =wshobson/agents= :: =plugins/documentation-generation/skills/architecture-decision-records= :: MIT
- [ ] =playwright-js= → =lackeyjb/playwright-skill= :: =skills/playwright-skill= :: MIT
- [ ] =playwright-py= → =anthropics/skills= :: =skills/webapp-testing= :: Apache-2.0

*** Open questions

- [ ] What happens when upstream *renames* a file we fork? Skill would see
  "file gone from upstream, still present locally" — drop, keep, or prompt?
- [ ] What happens when upstream splits into multiple forks (e.g., a plugin
  reshuffles its structure)? Probably out of scope for v1; manual migration.
- [ ] Rate-limit / offline mode: if GitHub is unreachable, should skill fail
  or degrade gracefully? Likely degrade; print warning per fork.

** TODO [#B] Build /research-writer — clean-room synthesis for research-backed long-form
SCHEDULED: <2026-05-15 Fri>

Gap in current rulesets: between =brainstorm= (idea refinement → design doc)
and =arch-document= (arc42 technical docs), there's no skill for
research-backed long-form prose — blog posts, essays, white papers,
proposals with data backing, article-length content with citations.

Craig writes documents across many contexts (defense-contractor work,
personal, technical, proposals). The gap is real.

*Evaluated 2026-04-19:* ComposioHQ/awesome-claude-skills has a
=content-research-writer= skill (540 lines, 14 KB) that attempts this. *Not
adopting:*
- Parent repo has no LICENSE file — reuse legally ambiguous
- Bloated: 540 lines of prose-scaffolding with no tooling
- No citation-style enforcement (APA/Chicago/IEEE/MLA)
- No source-quality heuristics (primary vs secondary, peer-review, recency)
- Fictional example citations in the skill itself (models the hallucination
  failure mode a citation-focused skill should prevent)
- No citation-verification step
- Overlaps with =humanizer= at the polish stage, with no guidance on composing the two

*Patterns worth lifting clean-room (from their better parts):*
- Folder convention =~/writing/<article-name>/= with =outline.md=,
  =research.md=, versioned drafts, =sources/=
- Section-by-section feedback loop (outline validated → per-section
  research validated → per-section draft validated)
- Hook alternatives pattern (generate three hook variants with rationale)

*Additions for the clean-room version (v1):*
- Citation-style selection (APA / Chicago / MLA / IEEE / custom) with
  style-specific examples and a pick-one step up front
- Source-quality heuristics: primary > secondary; peer-reviewed; recency
  thresholds by domain; publisher reputation; funding transparency
- Citation-verification discipline: fetch real sources, never fabricate,
  mark unverifiable claims with =[citation needed]= rather than inventing
- Composition hand-off to =/humanizer= at the polish stage
- Classification awareness: if the working directory or context signals
  defense / regulated territory, flag any sentence that might touch CUI
  or classified material before emission

*Target:* ~150-200 lines, clean-room per blanket policy.

*When to build:* wait for a real research-writing task to validate the
design against actual document patterns. Building preemptively risks
tuning for my guess at Craig's workflow rather than his real one.
Triggers that would prompt "let's build it now":
- Starting a white paper / proposal that needs citation discipline
- Writing a technical blog post with external references
- A pattern of hitting the same research-writing friction 3+ times

Upstream reference (do not vendor): ComposioHQ/awesome-claude-skills
=content-research-writer/SKILL.md=.

** TODO [#C] Try Skill Seekers on a real DeepSat docs-briefing need
SCHEDULED: <2026-05-15 Fri>

=Skill Seekers= ([[https://github.com/yusufkaraaslan/Skill_Seekers]]) is a Python
CLI + MCP server that ingests 18 source types (docs sites, PDFs, GitHub
repos, YouTube videos, Confluence, Notion, OpenAPI specs, etc.) and
exports to 20+ AI targets including Claude skills. MIT licensed, 12.9k
stars, active as of 2026-04-12.

*Evaluated: 2026-04-19 — not adopted for rulesets.* Generates
*reference-style* skills (encyclopedic dumps of scraped source material),
not *operational* skills (opinionated how-we-do-things content). Doesn't
fit the rulesets curation pattern.

*Next-trigger experiment (this TODO):* the next time a DeepSat task needs
Claude briefed deeply on a specific library, API, or docs site — try:
#+begin_src bash
pip install skill-seekers
skill-seekers create <url> --target claude
#+end_src
Measure output quality vs hand-curated briefing. If usable, consider
installing as a persistent tool. If output is bloated / under-structured,
discard and stick with hand briefing.

*Candidate first experiments (pick one from an actual need, don't invent):*
- A Django ORM reference skill scoped to the version DeepSat pins
- An OpenAPI-to-skill conversion for a partner-vendor API
- A React hooks reference skill for the frontend team's current patterns
- A specific AWS service's docs (e.g. GovCloud-flavored)

*Patterns worth borrowing into rulesets even without adopting the tool:*
- Enhancement-via-agent pipeline (scrape raw → LLM pass → structured
  SKILL.md). Applicable if we ever build internal-docs-to-skill tooling.
- Multi-target export abstraction (one knowledge extraction → many output
  formats). Clean design for any future multi-AI-tool workflow.

*Concerns to verify on actual use:*
- =LICENSE= has an unfilled =[Your Name/Username]= placeholder (MIT is
  unambiguous, but sloppy for a 12k-star project)
- Default branch is =development=, not =main= — pin with care
- Heavy commercialization signals (website at skillseekersweb.com,
  Trendshift promo, branded badges) — license might shift later; watch
- Companion =skill-seekers-configs= community repo has only 8 stars
  despite main's 12.9k — ecosystem thinner than headline adoption

** TODO [#C] Revisit =c4-*= rename if a second notation skill ships

Current naming keeps =c4-analyze= and =c4-diagram= as-is (framework prefix
encodes the notation; "C4" is a discoverable brand). Suite membership is
surfaced via the description footer, not the name.

If a second notation-specific skill ever lands (=uml-*=, =erd-*=, =arc42-*=),
the compound pattern =arch-analyze-<notation>= / =arch-diagram-<notation>=
starts paying off: alphabetical clustering under 'a' amortizes across three+
skills, and the hierarchy becomes regular. At that point, rename all
notation skills together in one pass.

Trigger: adding skill #2 in the notation family. Don't pre-rename.

Candidate future notation skills (not yet in scope — noted for when a
real need arrives, not pre-emptively):

- *UML* (Unified Modeling Language): OO design notation with 14 diagram types,
  dominated in practice by class / sequence / state / component. Common
  in DoD / safety-critical / enterprise-architecture contexts. Tooling:
  PlantUML (text-to-diagram), Mermaid UML, draw.io. Would likely split
  into =uml-class=, =uml-sequence=, =uml-state= rather than one monolith
  — different audiences, different inputs.
- *ERD* (Entity-Relationship Diagram): database schema modeling —
  entities, attributes, cardinality. Crow's Foot notation dominates
  practice; Chen is academic; IDEF1X is DoD-standard. Tooling:
  dbdiagram.io, Mermaid ERD, PlantUML, ERAlchemy (code-to-ERD for SQL).
  Natural fit as =erd-analyze= (extract from schema/migrations) and
  =erd-diagram= (generate from prose/model definitions).
- *arc42*: already partially covered by =arch-document= (which emits
  arc42-structured docs). A standalone =arc42-*= skill would be
  redundant unless the arc42-specific visualizations need separation.

Each answers a different question:

- C4 → "What systems exist and how do they talk, at what zoom?"
- UML class/sequence → "What does the code look like / what happens when X runs?"
- ERD → "What's the database shape?"
- arc42 → "What's the full architecture document?"

Deferred pending an actual need that's blocked on not having one of these.

*** DoD-specific notations (DeepSat context)

Defense-contractor work uses a narrower, different notation set than
commercial software. Document the trigger conditions and starting point
so a future decision to build doesn't have to re-derive the landscape.

**** SysML (Systems Modeling Language)

UML 2 profile, dominant in DoD systems engineering. Six diagrams account
for ~all practical use:

- *Block Definition Diagram (BDD)* — structural; like UML class but for
  system blocks (components, subsystems, hardware).
- *Internal Block Diagram (IBD)* — parts within a block and how they
  connect (flow ports, interfaces).
- *Requirement diagram* — unique to SysML; traces requirements to
  satisfying blocks. Essential in regulated environments.
- *Activity diagram* — behavioral flow.
- *State machine* — same shape as UML.
- *Sequence diagram* — same shape as UML.

SysML v1.x is in the field; v2 is emerging but not yet adopted at scale
(as of 2026-04). Tooling dominated by Cameo Systems Modeler / MagicDraw
and Enterprise Architect. Text-based option: PlantUML + =plantuml-sysml=
(git-friendly, growing niche).

*Candidate skills*: =sysml-bdd=, =sysml-ibd=, =sysml-requirement=,
=sysml-sequence=. Three or more in this cluster triggers the
=arch-*-<notation>= rename discussion from the parent entry.

**** DoDAF / UAF (architecture frameworks)

Not notations themselves — frameworks that specify *which* viewpoints a
program must deliver. Viewpoints are rendered using UML/SysML diagrams.

- *DoDAF (DoD Architecture Framework)* — legacy but still
  contract-required on many programs.
- *UAF (Unified Architecture Framework)* — DoDAF/MODAF successor,
  SysML-based. Gaining adoption on newer contracts.

Common required viewpoints (formal CDRL deliverables or PDR/CDR
review packages):

- *OV-1* — High-Level Operational Concept Graphic. The "cartoon" showing
  the system in operational context with icons, arrows, surrounding
  actors/environment. *Universally asked for — informal or formal.*
  Starting point for any DoD diagram skill.
- *OV-2* — Operational resource flows (nodes and flows).
- *OV-5a/b* — Operational activities.
- *SV-1* — Systems interfaces. Maps closely to C4 Container.
- *SV-2* — Systems resource flows.
- *SV-4* — Systems functionality.
- *SV-10b* — Systems state transitions.

*Informal ask ("send me an architecture diagram") → OV-1 + SV-1 satisfies it
90% of the time.* Formal CDRL asks specify the viewpoint set contractually.

*C4 gap*: C4 is rare in DoD. C4 System Context ≈ OV-1 in intent but not
in visual convention. C4 Container ≈ SV-1. Expect a mapping step or
reviewer pushback if delivering C4-shaped artifacts to a DoD audience.

*Candidate skills*: =dodaf-ov1=, =dodaf-sv1= first (highest-value);
=uaf-viewpoint= if newer contracts require UAF.

**** IDEF1X (data modeling)

FIPS 184 — federal standard for data modeling. Used in classified DoD
data systems, intelligence databases, and anywhere the government
specifies the data model. Same shape language as Crow's Foot but with
different adornments and notation conventions.

*Rule of thumb*: classified DoD data work → IDEF1X; unclassified
contractor work → Crow's Foot unless the contract specifies otherwise.

*Candidate skills*: =idef1x-diagram= / =idef1x-analyze= (parallel to a
future =erd-diagram= / =erd-analyze= pair).

**** Tooling baseline

- *Cameo Systems Modeler / MagicDraw* (Dassault) — commercial SysML
  tooling, dominant in DoD programs.
- *Enterprise Architect (Sparx)* — widely used for UML + SysML + DoDAF.
- *Rhapsody (IBM)* — SysML with code generation; strong in avionics /
  embedded (FACE, ARINC).
- *Papyrus (Eclipse)* — open source SysML; free but clunkier.
- *PlantUML + plantuml-sysml* — text-based, version-controllable. Fits a
  git-centric workflow better than any GUI tool.

**** Highest-value starting point

If DeepSat contracts regularly require architecture deliverables, the
highest-ROI first skill is =dodaf-ov1= (or whatever naming convention
the rename discussion lands on). OV-1 is the universal currency in
briefings, proposals, and reviews; it's the one artifact that shows up
in every program regardless of contract specifics.

Trigger for building: an actual DoD deliverable that's blocked on not
having a skill to generate or check OV-1-shaped artifacts. Don't build
speculatively — defense-specific notations are narrow enough that each
skill should be driven by a concrete contract need, not aspiration.

** TODO [#B] Add =make remove= for interactive ruleset removal via fzf

Add a Makefile target that lists every currently-installed ruleset entry
and lets me pick one or more to remove via fzf. Granular alternative to
=make uninstall= (removes everything) and =make uninstall-hooks= (removes
only hooks).

*** Why this matters

Tearing down a single skill, rule, hook, or config file currently means
either running =make uninstall= and re-installing what I want to keep,
or =rm=ing the symlink directly and remembering the exact path. Both are
friction. An interactive picker lets me filter, multi-select with Tab,
and confirm with Enter — the typical fzf flow. Costs about 3-5 seconds
per teardown instead of 15+ seconds of "what's the exact name?".

*** Design

The recipe builds a tab-separated list of every currently-installed item,
categorized by type, and pipes it to =fzf --multi=. The user filters,
marks with Tab, and confirms with Enter. The recipe parses the selections
and =rm=s the matching symlinks.

#+begin_example
  skill     debug
  rule      commits.md
  hook      destructive-bash-confirm.py
  config    settings.json
  commands  commands
  bridge    claude-rules
#+end_example

Each line is =<kind>\t<name>=. The recipe maps =<kind>= to the right path:

- =skill=    → =$(SKILLS_DIR)/<name>=
- =rule=     → =$(RULES_DIR)/<name>=
- =hook=     → =$(HOOKS_DIR)/<name>=
- =config=   → =$(CLAUDE_DIR)/<name>=
- =commands= → =$(CLAUDE_DIR)/commands=
- =bridge=   → =$(SKILLS_DIR)/claude-rules=

Source files in =rulesets/= stay untouched. =make install= re-creates the
removed links if needed (the install loop is idempotent).
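
The kind-to-path mapping above can be sketched in Python, as a rough illustration only. The =dirs= dictionary is a hypothetical stand-in for the Makefile variables (=$(SKILLS_DIR)=, =$(RULES_DIR)=, =$(HOOKS_DIR)=, =$(CLAUDE_DIR)=), and =target_path= is an invented name. The =bridge= kind is left out because the entry is noted as stale under "Possible extensions".

#+begin_src python
from pathlib import Path


def target_path(line: str, dirs: dict[str, Path]) -> Path:
    """Map one '<kind>\\t<name>' picker line to the symlink it refers to."""
    kind, name = line.rstrip("\n").split("\t", 1)
    if kind in ("skill", "rule", "hook"):
        return dirs[kind] / name
    if kind == "config":
        return dirs["claude"] / name
    if kind == "commands":
        # The commands entry always points at one fixed directory link.
        return dirs["claude"] / "commands"
    raise ValueError(f"unknown kind: {kind!r}")
#+end_src

Raising on an unknown kind keeps a malformed picker line from turning into an =rm= against an unintended path.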

*** Edge cases

- Esc instead of Enter → empty selection → clean exit, no removal.
- Filter to nothing then Enter → same as Esc.
- Selected item already gone → =rm= fails visibly, processing continues
  on the rest.
- =fzf= not installed → fail fast with a clear error (matches the pattern
  used by =install-lang=).
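
A short Python sketch of the removal loop with the edge cases above in mind. It assumes the selections have already been resolved to symlink paths; =fzf_available= and =remove_links= are hypothetical names, and the real recipe may well stay in shell.

#+begin_src python
import shutil
import sys
from pathlib import Path


def fzf_available() -> bool:
    """True when an fzf executable is on PATH (the fail-fast precondition)."""
    return shutil.which("fzf") is not None


def remove_links(paths: list[Path]) -> int:
    """Remove each selected symlink; return how many removals failed."""
    failures = 0
    for path in paths:  # an empty selection (Esc) never enters the loop
        try:
            path.unlink()  # removes the link itself, never the source file
        except OSError as err:
            # Already gone or not removable: report it and keep going.
            print(f"remove failed: {path}: {err}", file=sys.stderr)
            failures += 1
    return failures
#+end_src

Returning a failure count instead of raising lets the caller finish the whole selection and still exit non-zero afterwards.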

*** Possible extensions

- Parallel =make pick-install= target that lists not-yet-installed items
  and installs the chosen ones. Symmetric UX, same fzf flow.
- Confirmation prompt when more than N items selected (defense against
  accidental select-all).
- =--source= flag that also runs =git rm= against the rulesets source for
  the selected item. Probably a bad idea: too easy to lose work.
- The =bridge → $(SKILLS_DIR)/claude-rules= entry above is stale — the
  bridge symlink got removed in a later commit. Drop that bullet when the
  recipe lands.

** TODO [#B] Document the =mcp/= install pipeline in =mcp/README.org=

=mcp/= has =install.py=, =servers.json=, =secrets.env.gpg=, and =gcp-oauth.keys.json= (gitignored, regenerated at install), but no README. Coming back to this in three months, I'd have to re-discover how the bundle is structured, what =install.py= does, and how to rotate tokens. Saving that re-discovery is the whole point.

*** What to cover

- Layout: what each file is, which are tracked vs gitignored.
- Secrets bundle shape: how vars are listed in =secrets.env=, the symmetric-encryption pattern (=gpg -c --cipher-algo AES256=), the base64-bundled OAuth artifacts (=GCP_OAUTH_KEYS_JSON_B64=, =GOOGLE_DOCS_PERSONAL_TOKEN_B64=, =GOOGLE_DOCS_WORK_TOKEN_B64=).
- Install flow: =make install-mcp= → =install.py= decrypts, writes the keys file and Google Docs token caches at mode 600, expands =${VAR}= in =servers.json=, calls =claude mcp add --scope user= for unregistered servers. Idempotent.
- Token rotation: when a refresh token gets revoked, the recovery flow (re-auth on one machine, re-bundle, recommit).
- Adding a new server: edit =servers.json=, add any new =${VAR}= placeholders to the bundle, re-encrypt.
- The OAuth dance for HTTP-transport servers (linear, notion) versus stdio (google-docs-*) — different paths, different gotchas.
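
For the README's install-flow section, the =${VAR}= expansion step could be illustrated with a sketch like the one below. It assumes =servers.json= has been parsed with =json.load= and the decrypted secrets are in a plain dict. The function name =expand= and the fail-on-missing behavior are assumptions, not a description of what =install.py= actually does.

#+begin_src python
import re

# Matches ${NAME} where NAME is a shell-style variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(value, secrets: dict):
    """Recursively replace ${VAR} in every string of a parsed JSON value."""
    if isinstance(value, str):
        # A missing variable raises KeyError instead of leaving a literal
        # ${VAR} in the registered server definition.
        return _PLACEHOLDER.sub(lambda m: secrets[m.group(1)], value)
    if isinstance(value, list):
        return [expand(item, secrets) for item in value]
    if isinstance(value, dict):
        return {key: expand(item, secrets) for key, item in value.items()}
    return value  # numbers, booleans, and null pass through unchanged
#+end_src

Failing loudly on an unknown variable is one way to surface a placeholder that was added to =servers.json= but never added to the bundle.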

** TODO [#C] Add =make uninstall-mcp= + =mcp/install.py --check= for symmetry

Currently the MCP install pipeline flows in only one direction. There's no way to remove rulesets-managed MCP servers in one command, and no way to ask "what's the drift between =servers.json= and =claude mcp list=" without eyeballing.

*** =make uninstall-mcp=

Iterate over =servers.json=, run =claude mcp remove <name> -s user= for each. Ignore "not registered" errors. Idempotent.

*** =mcp/install.py --check=

Dry-run mode. Decrypt secrets, but instead of registering, print the drift report:

- Servers in =servers.json= not in =claude mcp list= → =MISSING=
- Servers in =claude mcp list= not in =servers.json= → =EXTRA=
- Servers in both → =ok=

Useful for diagnosing connection failures and for the eventual =make doctor= integration.
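
The drift report reduces to a set comparison. A minimal sketch, assuming the server names from =servers.json= and from =claude mcp list= have already been collected into two sets (parsing either source is out of scope here, and =drift_report= is a hypothetical name):

#+begin_src python
def drift_report(declared: set[str], registered: set[str]) -> list[tuple[str, str]]:
    """Return (server, status) rows, sorted by server name."""
    rows = []
    for name in sorted(declared | registered):
        if name not in registered:
            status = "MISSING"  # in servers.json, not registered
        elif name not in declared:
            status = "EXTRA"    # registered, not in servers.json
        else:
            status = "ok"
        rows.append((name, status))
    return rows
#+end_src

For example, declaring =linear= and =notion= while =notion= and =slack= are registered would report =linear= as =MISSING=, =notion= as =ok=, and =slack= as =EXTRA=.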

** TODO [#C] Update =README.org= with MCP install pipeline section

=README.org= covers global install, per-project language bundles, and design principles, but doesn't mention =make install-mcp= or the =mcp/= directory. Add a short section after "Per-project language bundles" describing the user-scope MCP install pattern (decrypt → expand → register) and pointing at the eventual =mcp/README.org=.

** TODO [#C] Token-rotation helper for =@a-bonus/google-docs-mcp= OAuth refresh

When a Google refresh token gets revoked (re-granted scopes, a removed Connected App, an account password reset), recovery is currently manual: run =npx -y @a-bonus/google-docs-mcp= with the right env, follow the URL in a browser, kill the process, base64-encode the new =token.json=, decrypt =secrets.env.gpg=, replace the var, re-encrypt. A small =mcp/refresh-google-docs-token.sh <profile>= would chain that into one command.

*** Sketch

#+begin_src bash
# usage: mcp/refresh-google-docs-token.sh personal
set -euo pipefail
profile="$1"
var="GOOGLE_DOCS_${profile^^}_TOKEN_B64"
token="$HOME/.config/google-docs-mcp/$profile/token.json"
tmp="$(mktemp)"             # plaintext secrets; mktemp creates the file at mode 600
trap 'rm -f "$tmp"' EXIT    # remove the plaintext even if a step fails

# Decrypt and drop the stale token line (grep exits 1 on empty output).
gpg -d ... | { grep -v "$var" || true; } > "$tmp"

# Re-run the OAuth flow for this profile.
GOOGLE_MCP_PROFILE="$profile" npx -y @a-bonus/google-docs-mcp &
mcp_pid=$!
xdg-open <captured-url>
# wait for $token to land
kill "$mcp_pid"

# Append the fresh token, re-encrypt, and swap the new bundle into place.
echo "${var}=$(base64 -w0 "$token")" >> "$tmp"
gpg -c --cipher-algo AES256 -o mcp/secrets.env.gpg.new "$tmp"
mv mcp/secrets.env.gpg.new mcp/secrets.env.gpg
#+end_src

The flow tonight worked but took a handful of manual steps. One script collapses it.

** TODO [#C] Decide on category-3 rule copies in the deepsat tree

While symlinking personal-project =.claude/rules/= mirrors to the rulesets canonical on 2026-05-07, two locations didn't fit the "personal mirror → symlink" pattern and were left untouched pending judgment:

- =~/projects/work/deepsat/code/coding-rulesets/claude-rules/{testing,verification}.md= — looks like a vendored team-shared copy.
- =~/projects/work/deepsat/code/orchestration_dashboard_mvp/.claude/rules/{testing,verification}.md= — could be project-specific overrides.

For each: read the file, diff against the rulesets canonical, decide whether it's an intentional divergence (leave alone), stale (sync content), or a candidate to canonicalize (replace with a symlink and accept the cross-repo dependency). The orchestration_dashboard_mvp pair lives in the project where Vrezh's PR review surfaced this whole thread, so any decision there has team-visibility implications.

** TODO [#C] Audit language-specific rule files for cross-project duplication

The four canonical rules (=commits=, =testing=, =verification=, =subagents=) are now symlinked across the five personal-project mirrors as of 2026-05-07. But several language-specific rule files exist in multiple project mirrors and may be duplicated or drifted:

- =python-testing.md= in =~/projects/work/.claude/rules/=
- =typescript-testing.md= in =~/projects/work/deepsat/code/.claude/rules/=
- =elisp-testing.md= and =elisp.md= in =~/.emacs.d/=, =~/code/gloss/=, =~/code/chime/=

The Elisp pair is the most suspicious — three repos using essentially the same rules. Audit: diff these across the projects, check for drift, then decide whether to canonicalize them under =~/code/rulesets/claude-rules/languages/<lang>/= and symlink, or leave them as project-local.
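
One way to run the drift check is to group the copies of a rule file by content hash. This is a sketch under the assumption that a byte-for-byte comparison is good enough for a first pass; =group_by_content= is a hypothetical helper name.

#+begin_src python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_content(paths):
    """Group files with identical bytes; largest group first."""
    groups = defaultdict(list)
    for path in paths:
        path = Path(path)
        groups[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
    return sorted(groups.values(), key=len, reverse=True)
#+end_src

A single group means every copy is identical and safe to canonicalize. More than one group means at least one copy has drifted and needs a manual diff before any symlinking.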

** TODO [#B] Fold =claude-templates= into rulesets

Two repos, one source of truth. =~/projects/claude-templates/= is the canonical =.ai/= template that gets rsync'd into every project at session start. Keeping it standalone means a second =git pull= in startup Phase A.0, a second remote to push to at wrap-up, and a split history any time a change touches both. Folding it into =rulesets/claude-templates/= gives one repo to clone on a fresh machine and one place to edit templates.

*** Open design choices

- *History.* =git subtree add --prefix=claude-templates ~/projects/claude-templates main= preserves the 84-commit history under the new prefix. Plain content copy (=cp -a= + =git add=) is simpler but loses history. Either is fine since the standalone repo stays archived on =cjennings.net=.
- *Layout.* =rulesets/claude-templates/= mirrors the old repo name and sits next to =claude-rules/= cleanly. Alternative: absorb =.ai/= directly under a different name (=rulesets/.ai-template/= or similar). First option is clearer.
- *bin/ai.* The standalone Makefile symlinks =$HOME/.local/bin/ai → bin/ai=. After the move, fold that into rulesets' Makefile as another install target.

*** Mechanical steps

1. Subtree-merge or copy =~/projects/claude-templates/= into =rulesets/claude-templates/=.
2. Update 3 references in rulesets:
   - =.ai/protocols.org= line 163 — pointer in the "Let's run/do the X workflow" section.
   - =.ai/workflows/cross-agent-comms.org= line 8 — promotion-target path.
   - =.ai/workflows/startup.org= lines 22, 96-98 — Phase A.0 pull + Phase A rsync sources.
3. Update Phase A.0 of =startup.org= to pull rulesets instead of claude-templates. Inside rulesets sessions, the existing project-repo pull already covers it. Outside rulesets (every other project's session), Phase A.0 needs an explicit =git pull= on =~/code/rulesets/= before the rsync — otherwise the templates will be stale.
4. Replace =~/projects/claude-templates/= with a symlink to =~/code/rulesets/claude-templates/= for transition continuity.
5. After every active project has had one session start (and rsync'd the new =startup.org=), drop the symlink and archive =cjennings.net:git/claude-templates.git=.

*** Bootstrap gap

Every project on the machine has a =.ai/workflows/startup.org= that rsyncs from =~/projects/claude-templates/=. Until each project's startup.org gets refreshed (which happens via the rsync itself), the old path needs to keep resolving. The symlink at step 4 is the bridge: old paths resolve into the new location, the rsync delivers the updated startup.org, next session uses the new path directly.

** TODO [#B] Add =make audit= — drift detector across all =.ai/=-using projects

Companion to =make doctor= (single-machine scope, checks =~/.claude/=). =audit= is cross-project scope: walks every directory on the machine that has a =.ai/=, diffs the synced template files against the canonical source, and reports drift. Catches stale projects without forcing a session start in each one.

*** Open design choices

- *Scope.* Template-sync drift is the useful flavor: for each project, diff =.ai/protocols.org=, =.ai/workflows/=, =.ai/scripts/= against the canonical source and report =ok= / =behind= / =diverged=. Other interpretations (per-project health check, alias for =doctor=) add less value.
- *Source path.* Today: =~/projects/claude-templates/.ai/=. After the "Fold claude-templates into rulesets" task lands: =~/code/rulesets/claude-templates/.ai/=. Build =audit= against whichever path is canonical when the work happens.
- *Project discovery.* Walk =~/code/=, =~/projects/=, =~/.emacs.d/= up to depth 3 for any directory containing =.ai/=. Skip the canonical source itself.
- *Output and exit code.* Per-project line: =ok=, =behind <N files>= (canonical newer, rsync would update), =diverged= (project has local edits an rsync would overwrite). Exit 0 on all-=ok=, 1 on any =diverged=.
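
The project-discovery choice above can be sketched in Python. This assumes "depth 3" counts directory levels below each root, with the root itself at depth 0; that reading, and the name =discover_projects=, are assumptions made for illustration.

#+begin_src python
import os
from pathlib import Path


def discover_projects(roots, canonical, max_depth=3):
    """Return directories that contain .ai/, at most max_depth levels below a root."""
    canonical = Path(canonical).resolve()
    found = []
    for root in roots:
        root = Path(root).resolve()
        if not root.is_dir():
            continue
        for dirpath, dirnames, _ in os.walk(root):
            here = Path(dirpath)
            depth = len(here.relative_to(root).parts)
            if ".ai" in dirnames and here != canonical:
                found.append(here)
            # Prune in place: stop at the depth limit, never descend into .ai.
            if depth >= max_depth:
                dirnames[:] = []
            else:
                dirnames[:] = [d for d in dirnames if d != ".ai"]
    return sorted(found)
#+end_src

Pruning =dirnames= in place is what keeps =os.walk= from scanning deep trees such as =node_modules= below the depth limit.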

*** Why not extend =make doctor= instead

=doctor= currently has a clean meaning: "is this machine's =~/.claude/= consistent with rulesets?" Mixing in cross-project =.ai/= drift muddies the exit code. Keep them separate; a future =make all-checks= can wrap both.

** TODO [#C] Refactor =daily-prep.org= to delegate to =triage-intake.org= for the triage section

=daily-prep.org= still does its own inline triage (Gmail × 3 accounts, Slack, Linear, GHE PRs, calendars) as part of the full prep flow. Now that =triage-intake.org= exists as a standalone scan over the same source set, daily-prep could call it and consume its synthesis instead of duplicating the source-scan logic — DRYs up a 57k-line workflow and keeps both flows in sync when sources change.

Scope:
- Identify the sections in =daily-prep.org= that do the inline triage (the email / Slack / Linear / PR / calendar fan-out, plus the "Sources checked: ..." line in each generated prep doc).
- Replace those sections with "run =triage-intake.org=" and adapt the downstream sections (Heads-up, Day's Priorities, Carry-forwards) to read triage-intake's synthesis output rather than the inline scan results.
- Verify the generated prep doc still has the same shape (Heads-up + Day's Priorities + Carry-forwards + Sources checked).

Origin: came up while authoring =triage-intake.org= on 2026-05-11.

* Rulesets Resolved
** DONE [#A] Add =make doctor= — verify ~/.claude/ matches repo + settings.json :feature:

A drift detector that scans =~/.claude/= and reports anything inconsistent with what the repo expects. Single-command answer to "is my machine consistent with rulesets?"

*** Why this matters

A 2026-05-06 sweep found =~/.claude/hooks/= didn't exist on this machine even though =settings.json= referenced =~/.claude/hooks/precompact-priorities.sh= as a PreCompact hook. Compaction would have silently failed to invoke the hook. The fix was =make install-hooks=, but the breakage was invisible until I happened to grep for it. =make doctor= run regularly (or even as part of session start) would catch this kind of drift in seconds instead of after the fact.

*** Checks

- Every entry in the ="hooks"= block of =settings.json= points at a file that exists.
- Every entry in =enabledPlugins= has a matching install under =~/.claude/plugins/data/=.
- Every skill in =$(SKILLS)= has a working symlink at =~/.claude/skills/<name>=.
- Every rule in =$(RULES)= has a working symlink at =~/.claude/rules/<name>=.
- Every default hook has a symlink at =~/.claude/hooks/<name>= (warn-only — opt-out is legitimate).
- =settings.json= and =.mcp.json= symlinks resolve to the rulesets versions.
- =mcp/install.py= state matches =claude mcp list= (every server in =servers.json= is registered).
- No dangling symlinks anywhere under =~/.claude/=.
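
The dangling-symlink check in the list above is simple enough to sketch directly. A minimal Python version, assuming a POSIX filesystem; =dangling_symlinks= is a hypothetical name and not necessarily how the shipped =make doctor= does it.

#+begin_src python
import os
from pathlib import Path


def dangling_symlinks(top):
    """List symlinks under top whose targets do not exist."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # exists() follows the link, so it is False for a broken one.
            if path.is_symlink() and not path.exists():
                broken.append(path)
    return sorted(broken)
#+end_src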

*** Output

One line per check: =ok= / =WARN= / =FAIL=. Final summary: =N ok, M warnings, K failures=. Exit non-zero on any failure so it can ride a pre-flight check.
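
The summary line and exit code follow directly from the per-check statuses. A small sketch, assuming each check yields a =(name, status)= pair with status =ok=, =WARN=, or =FAIL=; the name =summarize= is hypothetical.

#+begin_src python
from collections import Counter


def summarize(results):
    """Return the final summary line and the process exit code."""
    counts = Counter(status for _, status in results)
    line = f"{counts['ok']} ok, {counts['WARN']} warnings, {counts['FAIL']} failures"
    # Warnings alone do not fail the run; any FAIL does.
    return line, (1 if counts["FAIL"] else 0)
#+end_src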

** DONE [#A] Build =voice= skill — combine =humanizer= with universal + personal style passes :feature:

Combine =humanizer= with universal good-writing passes (Strunk & White, Orwell, Plain English) and the personal-style passes from =commits.md=. Two modes — =general= for arbitrary writing, =personal= for commits/PRs/comments — share a foundation and diverge on register.

Built and shipped 2026-05-07: =voice/SKILL.md= with 39 numbered patterns walked sequentially. Patterns 1-25 carried over from humanizer, 26-31 are universal good-writing additions, 32-39 are personal-only. Migrated three callers (=commits.md=, =respond-to-cj-comments.md=, =start-work.md=). Removed the standalone =humanizer= skill since voice supersedes it.

*** Why this matters

Three transformations want to run together for personal-mode artifacts (commits, PR titles + bodies, PR comments) but lived in three places: =humanizer= as a skill, S&W-style universal rules nowhere (applied ad-hoc), and the personal-style passes as prose steps in =commits.md= that got re-applied by hand each time. Costs: (1) the "I forgot pass (e)" failure mode — skipping a pass without flagging is a defect but happens in practice. (2) No single-call invocation of the full transform. (3) General-mode writing (research notes, philosophy, history) got only humanizer with no universal-prose pass at all. Combining brings them under one skill with one invocation.

*** Design

Two modes:

- *general* (default) — for arbitrary writing not bound for commit/PR/comment publishing (research notes, philosophy/history essays, emails, README prose). Runs:
  - humanizer (current behavior — strip AI-generated-writing fingerprints)
  - tier-1 universal passes (canonical good-writing rules)
  - the 2 personal-style passes that have no register conflict (jargon-fragment rewrite, noun-ified verbs)

- *personal* — for commits, PR titles + bodies, PR comments. Runs general PLUS:
  - 8 personal-only passes (first-person rewrite, semicolons, contractions, sentence-split, felt-experience, sentence fragments, terse cut, public-artifact scope check)

The 8 personal-only passes are explicitly *not* in general mode. They conflict with academic / literary / philosophical register. Forcing first-person on a Foucault essay or stripping felt-experience from a journal entry would damage the writing.

*** Tier 1 universals (v1)

From Strunk & White, Orwell's "Politics and the English Language", Plain English Campaign, and Garner's Modern English Usage. Each is a detection-pattern + rewrite-rule pair, mechanical enough to apply consistently across runs.

- *Omit needless words* — curated phrase list (=the fact that= → =that=/=because=, =in order to= → =to=, =at this point in time= → =now=, =due to the fact that= → =because=, =for the purpose of= → =to=, =in spite of= → =despite=, etc.)
- *Long word → short word* — Plain English wordlist (~150 entries: =utilize=→=use=, =commence=→=start=, =terminate=→=end=, =facilitate=→=help=, =demonstrate=→=show=, =sufficient=→=enough=, =prior to=→=before=, =subsequent to=→=after=, =in the event that=→=if=, =a great deal of=→=much=)
- *Active over passive voice* — detect "to be + past-participle" patterns. Suggestion-only in v1 (auto-rewrite is risky in technical contexts where passive is appropriate); graduate to auto-rewrite for unambiguous cases in v2.
- *Comma splices* — detect independent clauses joined only by comma; rewrite to period or semicolon-then-period.
- *Cliché flag* — small curated list (=at the end of the day=, =moving forward=, =going forward=, =at this juncture=, =circle back=, =low-hanging fruit=, =deep dive=, =leverage= as verb).
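
As an illustration of a detection-pattern + rewrite-rule pair, here is a sketch of the omit-needless-words pass using a small subset of the phrase list above. The =NEEDLESS= table and the capitalization handling are assumptions for the example; the shipped pass in =voice/SKILL.md= is prose guidance, not this code.

#+begin_src python
import re

# Subset of the curated list; each entry is (detection phrase, rewrite).
NEEDLESS = [
    ("due to the fact that", "because"),
    ("at this point in time", "now"),
    ("in order to", "to"),
    ("in spite of", "despite"),
]


def _match_case(match, short):
    """Capitalize the rewrite when the matched phrase started a sentence."""
    return short.capitalize() if match.group(0)[0].isupper() else short


def omit_needless_words(text):
    """Apply every phrase rewrite, case-insensitively, on word boundaries."""
    for phrase, short in NEEDLESS:
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        text = pattern.sub(lambda m, s=short: _match_case(m, s), text)
    return text
#+end_src

Entries with more than one valid rewrite, such as =the fact that=, are left out here because picking between the alternatives needs judgment rather than a table lookup.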

*** Tier 2 universals (v2)

- *Positive over negative form* (S&W) — =not unlike= → =like=, =do not fail to= → =remember to=, =did not pay any attention= → =ignored=
- *Garner-style word-pair corrections* — comprise/compose, less/fewer, that/which (restrictive vs nonrestrictive), affect/effect, principal/principle
- *Parallelism in lists* — detect mismatched grammar in bullet items
- *Tense consistency* — flag mid-paragraph tense shifts
- *Acronym definition on first use* — detect uppercase tokens used before being expanded

*** Tier 3 (v3, may not land)

- *Concrete-over-abstract* preference
- *Emphatic word at sentence end* (S&W rule 18)
- *Vary sentence length / rhythm*
- *Reading-grade-level scoring* (Hemingway-style)

*** Personal-style pass placement

| # | Pass | Mode | Why |
|---+------+------+-----|
| 1 | First-person voice rewrite | personal only | Forces "I" voice; wrong for academic prose where third-person and "we" are conventional |
| 2 | Jargon-fragment → complete sentence | both | Universal clarity, no genre conflict |
| 3 | Semicolon → period/comma | personal only | Semicolons are conventional in long-form / academic prose |
| 4 | Contractions ("it's", "don't") | personal only | Academic and formal writing typically avoids contractions |
| 5 | Sentence split on conjunctions | personal only | Foucault, Hegel, Adorno deliberately use long compound sentences |
| 6 | Felt-experience narration ("I'll feel this every time") | personal only | Personal essays *use* felt-experience as content |
| 7 | Noun-ified verbs ("the ask", "a learn", "the spend") | both | Targets corporate-speak with curated wordlist; doesn't catch philosophical nominalizations like "the becoming" |
| 8 | Sentence fragments → complete (in prose) | personal only | Fragments are valid stylistic devices in literary prose |
| 9 | Terse cut (rhetorical padding: "worth noting", "it's important to understand") | personal only | Tier 1 omit-needless-words covers the worst offenders universally; aggressive cut conflicts with academic register |
| 10 | Public-artifact scope check (local paths, private repos, personal tooling) | personal only — *flag-only*, no auto-rewrite | Operational/safety check, not stylistic; auto-masking risks silently editing meaningful text |
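
The table above amounts to a small registry keyed by mode. A sketch of that data structure follows, assuming a =Pass= record and a =passes_for= selector; both names are hypothetical, and the pass numbers are the table's row numbers rather than the global pattern numbers.

#+begin_src python
from dataclasses import dataclass

BOTH = frozenset({"general", "personal"})
PERSONAL = frozenset({"personal"})


@dataclass(frozen=True)
class Pass:
    number: int
    name: str
    modes: frozenset
    flag_only: bool = False  # True means detect and report, never rewrite


PERSONAL_STYLE_PASSES = [
    Pass(1, "first-person voice rewrite", PERSONAL),
    Pass(2, "jargon-fragment to complete sentence", BOTH),
    Pass(3, "semicolon to period/comma", PERSONAL),
    Pass(4, "contractions", PERSONAL),
    Pass(5, "sentence split on conjunctions", PERSONAL),
    Pass(6, "felt-experience narration", PERSONAL),
    Pass(7, "noun-ified verbs", BOTH),
    Pass(8, "sentence fragments to complete", PERSONAL),
    Pass(9, "terse cut", PERSONAL),
    Pass(10, "public-artifact scope check", PERSONAL, flag_only=True),
]


def passes_for(mode):
    """Return the personal-style passes that run in the given mode, in table order."""
    if mode not in BOTH:
        raise ValueError(f"unknown mode: {mode!r}")
    return [p for p in PERSONAL_STYLE_PASSES if mode in p.modes]
#+end_src

With this layout, =passes_for("general")= yields the two "both" rows and =passes_for("personal")= yields all ten.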

*** Inclusive-language pass — explicitly excluded

Considered and rejected. Conflicts with planned writing on philosophy/history topics (Foucault on sexuality and gender, history of slavery in New Orleans). Wordlist substitutions would override deliberate vocabulary choices in those genres.

*** V1 scope

- [ ] Skill at =~/code/rulesets/voice/= with =SKILL.md=
- [ ] Frontmatter with positive triggers (commit, PR, comment, "humanize", "voice pass") and negative triggers (code, structured data, plain bullet lists)
- [X] Mode invocation: default = =general= when invoked bare; =personal= invoked explicitly by publish-context callers
- [X] humanizer content migrated from =humanizer/= → =voice/=
- [X] Tier 1 universal passes implemented (4 patterns: #26-29; the universal personal additions #30 and #31 are counted in the next item)
- [X] 2 personal passes that run in both modes (#30 jargon-fragment, #31 noun-ified verbs)
- [X] 8 personal passes that run in personal mode only (#32 first-person, #33 semicolons, #34 contractions, #35 sentence-split, #36 felt-experience, #37 fragments, #38 terse cut, #39 scope check)
- [X] Each pass = detection-pattern + rewrite-rule pair (#39 is detection + flag-only)
- [X] Total v1 pattern count: 31 in general mode (humanizer's 25 + 4 tier-1 + 2 universal personal); +8 personal-only = 39 in personal mode
- [X] Update =commits.md= to invoke =/voice personal= instead of "run =humanizer= and apply five passes manually"
- [X] Remove the existing =humanizer/= skill (no callers outside this repo, all migrated)
- [X] =make doctor= still passes
- [X] =make lint= clean

*** v2 (deferred)

- [ ] Tier 2 universals (positive form, word-pair corrections, parallelism, tense consistency, acronym definition)
- [ ] Per-pass severity flags for Tier 1 active-voice (suggestion-only when actor is implicit; auto-rewrite when actor is named)
- [ ] Reporting mode: list which passes fired and which were no-ops

*** v3 (aspirational, may not land)

- [ ] Tier 3 (concrete-over-abstract, emphatic-word position, sentence-length variation, reading-grade scoring)
- [ ] Progressive disclosure split: =voice/SKILL.md= orchestrator + =voice/passes/<pass-name>.md= per pass with worked examples

*** Migration (resolved)

Decision: deleted =humanizer/= entirely. Three callers (=commits.md=, =respond-to-cj-comments.md=, =start-work.md=) all updated to invoke =/voice= directly. No alias needed since nothing outside the repo invoked humanizer.

*** Naming alternatives considered

- =voice= — chosen. Captures both modes; broad enough.
- =polish= — descriptive of multi-pass nature; less prescriptive about whose voice.
- =house-style= — signals "this is the house style"; appropriate for personal repo.
- =commit-voice= — too narrow (passes apply to research notes, emails, etc. in general mode).
- =humanize= (extending current) — undersells the universal + personal additions.

*** Open questions before implementation

Resolved during implementation:
- Default mode when =/voice= is invoked bare: =general=. Personal-context callers (=commits.md= publish flow, =respond-to-cj-comments.md=) invoke =/voice personal= explicitly. Avoids accidentally first-person-ifying research notes.
- Reporting: skill prints "Summary of changes" listing which patterns fired (audit value).
- Public-artifact scope check (#39): flag-only, user resolves manually. Blocking would be frustrating when a path mention is legitimate.
- Tier 1 active-voice detection: suggestion-only in v1. Auto-rewrite for unambiguous cases deferred to v2.
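
To make the flag-only decision for the scope check concrete, here is a rough sketch that reports local paths without touching the text. The path pattern (home-directory prefixes only) and the name =scope_flags= are assumptions; the actual check also covers private repos and personal tooling, which this sketch does not attempt.

#+begin_src python
import re

# Assumed pattern: paths rooted at ~, /home/<user>, or /Users/<user>.
_LOCAL_PATH = re.compile(r"(?<![\w/])(?:~|/home/\w+|/Users/\w+)/[\w./-]+")


def scope_flags(text):
    """Return (line_number, match) pairs for local paths; the text is left unchanged."""
    flags = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for m in _LOCAL_PATH.finditer(line):
            flags.append((line_number, m.group(0)))
    return flags
#+end_src

Returning line numbers rather than an edited string keeps the decision with the user, which is the point of flag-only.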

** DONE [#B] Add =--archive-done= mode to =.ai/scripts/todo-cleanup.el= :feature:

Opt-in mode that moves every level-2 subtree whose TODO state is DONE or CANCELLED out of the "Open Work" section and into the "Resolved" section of the same org file, subtree intact.

- *Section matching.* Key on a top-level heading containing "Open Work" and one containing "Resolved"; that pairing is the only naming consistent across projects (=<Project> Open Work= / =<Project> Resolved= in some, bare =Open Work= / =Resolved= in others). Require exactly one match for each; otherwise skip with a clear message, no crash.
- *Modes.* =--check= previews and writes nothing, same as the existing hygiene pass. Idempotent. Not run by default in the wrap-up flow — archiving is consequential, so it stays opt-in: =emacs --batch -q -l todo-cleanup.el --archive-done FILE=.
- *Edge cases.* Source or target section missing; subtree at EOF; nested DONE subtree under an open parent stays put (only level-2 entries move); nothing to move → clean no-op.
- *Tests.* TDD with ERT — the project's first elisp tests. Fixtures (synthetic) under =.ai/scripts/tests/=; run via =make test= (rulesets) or =make test-scripts= (claude-templates), which run pytest + every =tests/test-*.el= ERT suite. Cases: one DONE level-2 moves; multiple; CANCELLED also moves; structural (no-state) headings don't move; nested DONE under an open parent stays; level-2 DONE with open level-3 children moves intact; subtree at EOF; missing source/target section; ambiguous "Resolved"; lowercase headings; nothing-to-do; idempotency; =--check= preview + its idempotency; realistic-sample integration.
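
The behaviour described in these bullets can be modelled in a few lines. The shipped tool is elisp; the Python below is only an illustrative model of the move logic, with hypothetical names (=archive_done=, =_section=), and it does not model =--check= or the skip message.

#+begin_src python
import re

_DONE_L2 = re.compile(r"^\*\* (?:DONE|CANCELLED)\b")


def _level(line):
    """Heading depth of an org line, or 0 for non-heading lines."""
    m = re.match(r"^(\*+) ", line)
    return len(m.group(1)) if m else 0


def _section(lines, keyword):
    """Return (start, end) of the single level-1 section matching keyword, else None."""
    starts = [i for i, l in enumerate(lines) if _level(l) == 1 and keyword in l.lower()]
    if len(starts) != 1:
        return None  # missing or ambiguous: caller skips
    start = starts[0]
    end = next((i for i in range(start + 1, len(lines)) if _level(lines[i]) == 1), len(lines))
    return start, end


def archive_done(text):
    """Move level-2 DONE/CANCELLED subtrees from Open Work to the end of Resolved."""
    lines = text.splitlines()
    source, target = _section(lines, "open work"), _section(lines, "resolved")
    if source is None or target is None or source == target:
        return text
    start, end = source
    kept, moved = [], []
    i = start + 1
    while i < end:
        if _level(lines[i]) == 2:
            j = i + 1
            while j < end and _level(lines[j]) != 2:
                j += 1  # deeper headings and body text belong to this subtree
            (moved if _DONE_L2.match(lines[i]) else kept).extend(lines[i:j])
            i = j
        else:
            kept.append(lines[i])
            i += 1
    if not moved:
        return text  # clean no-op
    t_end = target[1]
    if t_end <= start:  # Resolved sits above Open Work
        out = lines[:t_end] + moved + lines[t_end:start + 1] + kept + lines[end:]
    else:
        out = lines[:start + 1] + kept + lines[end:t_end] + moved + lines[t_end:]
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
#+end_src

Because only level-2 headings are inspected, a DONE entry nested under an open parent stays where it is, and a second run finds nothing left to move.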

Origin: came up while scrubbing a project's todo.org on 2026-05-11 — moving a big completed PROJECT subtree (plus a few smaller ones) into the Resolved section by hand was the cue to build a reusable tool.

Built and shipped 2026-05-11: =--archive-done= added to =.ai/scripts/todo-cleanup.el= test-first; 13-test ERT suite (=tests/test-todo-cleanup.el=) + realistic synthetic fixture (=tests/fixtures/todo-sample.org=), wired into =make test= / =make test-scripts= alongside pytest. The CLI dispatch moved into =tc-main= behind a guard so the suite can =require= the file without firing it. Section matching is case-insensitive and tolerates the =<Project> Open Work= / =<Project> Resolved= naming variants. Opt-in only — not wired into the wrap-up flow. Source of truth is =~/projects/claude-templates/=; rsync'd into this repo.