#+TITLE: Rulesets — Open Work
#+AUTHOR: Craig Jennings
#+DATE: 2026-04-19

Tracking TODOs for the rulesets repo that span more than one commit. Project-scoped (not the global =~/sync/org/roam/inbox.org= list).

* TODO [#A] Build =/update-skills= skill for keeping forks in sync with upstream

The rulesets repo has a growing set of forks (=arch-decide= from wshobson/agents, =playwright-js= from lackeyjb/playwright-skill, =playwright-py= from anthropics/skills/webapp-testing). Over time, upstream releases fixes, new templates, or scope expansions that we'd want to pull in without losing our local modifications. A skill should handle this deliberately rather than by manual re-cloning.

** Design decisions (agreed)

- *Upstream tracking:* per-fork manifest =.skill-upstream= (YAML or JSON):
  - =url= (GitHub URL)
  - =ref= (branch or tag)
  - =subpath= (path inside the upstream repo when it's a monorepo)
  - =last_synced_commit= (updated on successful sync)
- *Local modifications:* 3-way merge. Requires a pristine baseline snapshot of the upstream-at-time-of-fork. Store under =.skill-upstream/baseline/= or similar; committed to the rulesets repo so the merge base is reproducible.
- *Apply changes:* skill edits files directly with per-file confirmation.
- *Conflict policy:* per-hunk prompt inside the skill. When a 3-way merge produces a conflict, the skill walks each conflicting hunk and asks Craig: keep-local / take-upstream / both / skip. Editor-independent; works on machines where Emacs isn't available. Fallback when the baseline is missing or corrupt (can't run a 3-way merge): write =.local=, =.upstream=, and =.baseline= files side by side and surface them for manual review.
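A minimal sketch of the classification and merge steps this design implies, assuming a JSON =.skill-upstream= manifest and three on-disk trees (current skill, stored baseline, freshly cloned upstream). All function names and the directory layout here are hypothetical, not settled design:

#+begin_src python
import json
import subprocess
from pathlib import Path

def load_manifest(skill_dir: Path) -> dict:
    """Read the per-fork manifest (url / ref / subpath / last_synced_commit)."""
    return json.loads((skill_dir / ".skill-upstream").read_text())

def classify(skill_dir: Path, baseline_dir: Path, upstream_dir: Path) -> dict:
    """Map each file name to unchanged / upstream-only / local-only / both-changed."""
    def read(d: Path, name: str):
        p = d / name
        return p.read_bytes() if p.is_file() else None

    names = {p.name for d in (skill_dir, baseline_dir, upstream_dir)
             for p in d.iterdir() if p.is_file()}
    states = {}
    for name in sorted(names):
        local, base, up = (read(d, name) for d in (skill_dir, baseline_dir, upstream_dir))
        if local == base == up:
            states[name] = "unchanged"
        elif local == base:
            states[name] = "upstream-only"   # only upstream moved: take theirs
        elif up == base:
            states[name] = "local-only"      # only we moved: keep ours
        else:
            states[name] = "both-changed"    # hand off to the 3-way merge
    return states

def three_way_merge(local: Path, base: Path, upstream: Path):
    """git merge-file exits 0 on a clean merge; stdout carries the merged text,
    with conflict markers when the exit code is non-zero."""
    r = subprocess.run(["git", "merge-file", "--stdout",
                        str(local), str(base), str(upstream)],
                       capture_output=True, text=True)
    return r.returncode == 0, r.stdout
#+end_src

Missing files read as =None=, so an upstream deletion classifies as =upstream-only= rather than crashing the scan; whether to honor that deletion is the rename open question below.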
** V1 Scope

- [ ] Skill at =~/code/rulesets/update-skills/=
- [ ] Discovery: scan sibling skill dirs for =.skill-upstream= manifests
- [ ] Helper script (bash or python) to:
  - Clone each upstream at =ref= shallowly into =/tmp/=
  - Compare current skill state vs latest upstream vs stored baseline
  - Classify each file: =unchanged= / =upstream-only= / =local-only= / =both-changed=
  - For =both-changed=: run =git merge-file --stdout=; if clean, write the result directly; if it conflicts, parse the conflict-marker output and feed each hunk into the per-hunk prompt loop
- [ ] Per-hunk prompt loop:
  - Show base / local / upstream side by side for each conflicting hunk
  - Ask: keep-local / take-upstream / both (concatenate) / skip (leave marker)
  - Assemble resolved hunks into the final file content
- [ ] Per-fork summary output with a file-level classification table
- [ ] Per-file confirmation flow (yes / no / show-diff) BEFORE the per-hunk loop
- [ ] On successful sync: update =last_synced_commit= in the manifest
- [ ] =--dry-run= to preview without writing

** V2+ (deferred)

- [ ] Track upstream *releases* (tags), not just branches, so the skill can propose "upgrade from v1.2 to v1.3" with release notes pulled in
- [ ] Generate patch files as an alternative apply method (for users who prefer =git apply= / =patch= over in-place edits)
- [ ] Non-interactive mode (=--non-interactive= / CI): skip conflict resolution, emit side-by-side files for later manual review
- [ ] Auto-run on a schedule via a Claude Code background agent
- [ ] Summary of aggregate upstream activity across all forks (which forks have upstream changes waiting, which don't)
- [ ] Optional editor integration: on machines with Emacs, offer =M-x smerge-ediff= as an alternate path for users who prefer ediff over per-hunk prompts

** Initial forks to enumerate (for manifest bootstrap)

- [ ] =arch-decide= → =wshobson/agents= :: =plugins/documentation-generation/skills/architecture-decision-records= :: MIT
- [ ] =playwright-js= → =lackeyjb/playwright-skill= :: =skills/playwright-skill= :: MIT
- [ ] =playwright-py= → =anthropics/skills= :: =skills/webapp-testing= :: Apache-2.0

** Open questions

- [ ] What happens when upstream *renames* a file we fork? The skill would see "file gone from upstream, still present locally" — drop, keep, or prompt?
- [ ] What happens when upstream splits into multiple forks (e.g., a plugin reshuffles its structure)? Probably out of scope for v1; manual migration.
- [ ] Rate-limit / offline mode: if GitHub is unreachable, should the skill fail or degrade gracefully? Likely degrade; print a warning per fork.

* TODO [#B] Build =/research-writer= — clean-room synthesis for research-backed long-form
SCHEDULED: <2026-05-15 Fri>

Gap in current rulesets: between =brainstorm= (idea refinement → design doc) and =arch-document= (arc42 technical docs), there's no skill for research-backed long-form prose — blog posts, essays, white papers, proposals with data backing, article-length content with citations. Craig writes documents across many contexts (defense-contractor work, personal, technical, proposals). The gap is real.

*Evaluated 2026-04-19:* ComposioHQ/awesome-claude-skills has a =content-research-writer= skill (540 lines, 14 KB) that attempts this.
*Not adopting:*
- Parent repo has no LICENSE file — reuse is legally ambiguous
- Bloated: 540 lines of prose scaffolding with no tooling
- No citation-style enforcement (APA/Chicago/IEEE/MLA)
- No source-quality heuristics (primary vs secondary, peer review, recency)
- Fictional example citations in the skill itself (models the very hallucination failure mode a citation-focused skill should prevent)
- No citation-verification step
- Overlaps with =humanizer= at the polish stage, with no composition guidance

*Patterns worth lifting clean-room (from their better parts):*
- Folder convention =~/writing//= with =outline.md=, =research.md=, versioned drafts, =sources/=
- Section-by-section feedback loop (outline validated → per-section research validated → per-section draft validated)
- Hook-alternatives pattern (generate three hook variants with rationale)

*Additions for the clean-room version (v1):*
- Citation-style selection (APA / Chicago / MLA / IEEE / custom) with style-specific examples and a pick-one step up front
- Source-quality heuristics: primary > secondary; peer-reviewed; recency thresholds by domain; publisher reputation; funding transparency
- Citation-verification discipline: fetch real sources, never fabricate, mark unverifiable claims with =[citation needed]= rather than inventing
- Composition hand-off to =/humanizer= at the polish stage
- Classification awareness: if the working directory or context signals defense / regulated territory, flag any sentence that might touch CUI or classified material before emission

*Target:* ~150-200 lines, clean-room per blanket policy.

*When to build:* wait for a real research-writing task to validate the design against actual document patterns. Building preemptively risks tuning for my guess at Craig's workflow rather than his real one.
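A sketch of how the source-quality heuristics above might compose into a single score. The thresholds, weights, and field names are placeholders for illustration (publisher reputation and funding transparency are omitted for brevity), not a settled design:

#+begin_src python
from dataclasses import dataclass

# Recency thresholds (years) by domain — hypothetical values for illustration.
RECENCY_THRESHOLDS = {"ml": 2, "medicine": 5, "history": 50}

@dataclass
class Source:
    primary: bool        # primary source (data, spec, firsthand account)?
    peer_reviewed: bool
    age_years: float
    domain: str

def quality_score(s: Source) -> float:
    """Compose the heuristics into a 0..1 score. Weights are placeholders."""
    score = 0.0
    score += 0.4 if s.primary else 0.2          # primary > secondary
    score += 0.3 if s.peer_reviewed else 0.0
    limit = RECENCY_THRESHOLDS.get(s.domain, 10)
    score += 0.3 if s.age_years <= limit else 0.0
    return score
#+end_src

Even if the skill never runs code, writing the heuristics this concretely forces the pick-a-threshold decisions the prose list leaves open.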
Triggers that would prompt "let's build it now":
- Starting a white paper / proposal that needs citation discipline
- Writing a technical blog post with external references
- A pattern of hitting the same research-writing friction 3+ times

Upstream reference (do not vendor): ComposioHQ/awesome-claude-skills =content-research-writer/SKILL.md=.

* TODO [#C] Try Skill Seekers on a real DeepSat docs-briefing need
SCHEDULED: <2026-05-15 Fri>

=Skill Seekers= ([[https://github.com/yusufkaraaslan/Skill_Seekers]]) is a Python CLI + MCP server that ingests 18 source types (docs sites, PDFs, GitHub repos, YouTube videos, Confluence, Notion, OpenAPI specs, etc.) and exports to 20+ AI targets, including Claude skills. MIT licensed, 12.9k stars, active as of 2026-04-12.

*Evaluated 2026-04-19 — not adopted for rulesets.* It generates *reference-style* skills (encyclopedic dumps of scraped source material), not *operational* skills (opinionated how-we-do-things content). Doesn't fit the rulesets curation pattern.

*Next-trigger experiment (this TODO):* the next time a DeepSat task needs Claude briefed deeply on a specific library, API, or docs site — try:

#+begin_src bash
pip install skill-seekers
skill-seekers create --target claude
#+end_src

Measure output quality vs a hand-curated briefing. If usable, consider installing as a persistent tool. If output is bloated / under-structured, discard and stick with hand briefing.

*Candidate first experiments (pick one from an actual need, don't invent):*
- A Django ORM reference skill scoped to the version DeepSat pins
- An OpenAPI-to-skill conversion for a partner-vendor API
- A React hooks reference skill for the frontend team's current patterns
- A specific AWS service's docs (e.g. GovCloud-flavored)

*Patterns worth borrowing into rulesets even without adopting the tool:*
- Enhancement-via-agent pipeline (scrape raw → LLM pass → structured SKILL.md). Applicable if we ever build internal-docs-to-skill tooling.
- Multi-target export abstraction (one knowledge extraction → many output formats). Clean design for any future multi-AI-tool workflow.

*Concerns to verify on actual use:*
- =LICENSE= has an unfilled =[Your Name/Username]= placeholder (MIT is unambiguous, but sloppy for a 12k-star project)
- Default branch is =development=, not =main= — pin with care
- Heavy commercialization signals (website at skillseekersweb.com, Trendshift promo, branded badges) — the license might shift later; watch
- Companion =skill-seekers-configs= community repo has only 8 stars despite the main repo's 12.9k — the ecosystem is thinner than headline adoption suggests

* TODO [#C] Revisit =c4-*= rename if a second notation skill ships

Current naming keeps =c4-analyze= and =c4-diagram= as-is (the framework prefix encodes the notation; "C4" is a discoverable brand). Suite membership is surfaced via the description footer, not the name.

If a second notation-specific skill ever lands (=uml-*=, =erd-*=, =arc42-*=), the compound pattern =arch-analyze-= / =arch-diagram-= starts paying off: alphabetical clustering under 'a' amortizes across three+ skills, and the hierarchy becomes regular. At that point, rename all notation skills together in one pass. Trigger: adding skill #2 in the notation family. Don't pre-rename.

Candidate future notation skills (not yet in scope — noted for when a real need arrives, not pre-emptively):

- *UML* (Unified Modeling Language): OO design notation; 14 diagram types, in practice dominated by class / sequence / state / component. Common in DoD / safety-critical / enterprise-architecture contexts. Tooling: PlantUML (text-to-diagram), Mermaid UML, draw.io. Would likely split into =uml-class=, =uml-sequence=, =uml-state= rather than one monolith — different audiences, different inputs.
- *ERD* (Entity-Relationship Diagram): database schema modeling — entities, attributes, cardinality. Crow's Foot notation dominates practice; Chen is academic; IDEF1X is DoD-standard.
  Tooling: dbdiagram.io, Mermaid ERD, PlantUML, ERAlchemy (code-to-ERD for SQL). Natural fit as =erd-analyze= (extract from schema/migrations) and =erd-diagram= (generate from prose/model definitions).
- *arc42*: already partially covered by =arch-document= (which emits arc42-structured docs). A standalone =arc42-*= skill would be redundant unless the arc42-specific visualizations need separation.

Each answers a different question:
- C4 → "What systems exist and how do they talk, at what zoom?"
- UML class/sequence → "What does the code look like / what happens when X runs?"
- ERD → "What's the database shape?"
- arc42 → "What's the full architecture document?"

Deferred pending an actual need that's blocked on not having one of these.

** DoD-specific notations (DeepSat context)

Defense-contractor work uses a narrower, different notation set than commercial software. Document the trigger conditions and starting point here so a future decision to build doesn't have to re-derive the landscape.

*** SysML (Systems Modeling Language)

A UML 2 profile, dominant in DoD systems engineering. Six diagrams account for nearly all practical use:

- *Block Definition Diagram (BDD)* — structural; like UML class but for system blocks (components, subsystems, hardware).
- *Internal Block Diagram (IBD)* — parts within a block and how they connect (flow ports, interfaces).
- *Requirement diagram* — unique to SysML; traces requirements to the blocks that satisfy them. Essential in regulated environments.
- *Activity diagram* — behavioral flow.
- *State machine* — same shape as UML.
- *Sequence diagram* — same shape as UML.

SysML v1.x is in the field; v2 is emerging but not yet adopted at scale (as of 2026-04). Tooling is dominated by Cameo Systems Modeler / MagicDraw and Enterprise Architect. Text-based option: PlantUML + =plantuml-sysml= (git-friendly, a growing niche).

*Candidate skills*: =sysml-bdd=, =sysml-ibd=, =sysml-requirement=, =sysml-sequence=.
Three or more in this cluster triggers the =arch-*-= rename discussion from the parent entry.

*** DoDAF / UAF (architecture frameworks)

Not notations themselves — frameworks that specify *which* viewpoints a program must deliver. Viewpoints are rendered using UML/SysML diagrams.

- *DoDAF (DoD Architecture Framework)* — legacy but still contract-required on many programs.
- *UAF (Unified Architecture Framework)* — the DoDAF/MODAF successor, SysML-based. Gaining adoption on newer contracts.

Common required viewpoints (formal CDRL deliverables or PDR/CDR review packages):

- *OV-1* — High-Level Operational Concept Graphic. The "cartoon" showing the system in operational context with icons, arrows, and surrounding actors/environment. *Universally asked for — informally or formally.* The starting point for any DoD diagram skill.
- *OV-2* — Operational resource flows (nodes and flows).
- *OV-5a/b* — Operational activities.
- *SV-1* — Systems interfaces. Maps closely to C4 Container.
- *SV-2* — Systems resource flows.
- *SV-4* — Systems functionality.
- *SV-10b* — Systems state transitions.

*Informal ask ("send me an architecture diagram") → OV-1 + SV-1 satisfies it 90% of the time.* Formal CDRL asks specify the viewpoint set contractually.

*C4 gap*: C4 is rare in DoD. C4 System Context ≈ OV-1 in intent but not in visual convention. C4 Container ≈ SV-1. Expect a mapping step or reviewer pushback if delivering C4-shaped artifacts to a DoD audience.

*Candidate skills*: =dodaf-ov1=, =dodaf-sv1= first (highest value); =uaf-viewpoint= if newer contracts require UAF.

*** IDEF1X (data modeling)

FIPS 184 — the federal standard for data modeling. Used in classified DoD data systems, intelligence databases, and anywhere the government specifies the data model. Same shape language as Crow's Foot but with different adornments and notation conventions.

*Rule of thumb*: classified DoD data work → IDEF1X; unclassified contractor work → Crow's Foot unless the contract specifies otherwise.
*Candidate skills*: =idef1x-diagram= / =idef1x-analyze= (parallel to a future =erd-diagram= / =erd-analyze= pair).

*** Tooling baseline

- *Cameo Systems Modeler / MagicDraw* (Dassault) — commercial SysML, dominant in DoD programs.
- *Enterprise Architect (Sparx)* — widely used for UML + SysML + DoDAF.
- *Rhapsody (IBM)* — SysML with code generation; strong in avionics / embedded (FACE, ARINC).
- *Papyrus (Eclipse)* — open-source SysML; free but clunkier.
- *PlantUML + plantuml-sysml* — text-based, version-controllable. Fits a git-centric workflow better than any GUI tool.

*** Highest-value starting point

If DeepSat contracts regularly require architecture deliverables, the highest-ROI first skill is =dodaf-ov1= (or whatever naming convention the rename discussion lands on). OV-1 is the universal currency in briefings, proposals, and reviews; it's the one artifact that shows up in every program regardless of contract specifics.

Trigger for building: an actual DoD deliverable that's blocked on not having a skill to generate or check OV-1-shaped artifacts. Don't build speculatively — defense-specific notations are narrow enough that each skill should be driven by a concrete contract need, not aspiration.