#+TITLE: Rulesets — Open Work
#+AUTHOR: Craig Jennings
#+DATE: 2026-04-19

Tracking TODOs for the rulesets repo that span more than one commit. Project-scoped (not the global =~/sync/org/roam/inbox.org= list).

* TODO [#A] Build =/update-skills= skill for keeping forks in sync with upstream

The rulesets repo has a growing set of forks (=arch-decide= from wshobson/agents, =playwright-js= from lackeyjb/playwright-skill, =playwright-py= from anthropics/skills/webapp-testing). Over time, upstream releases fixes, new templates, or scope expansions that we'd want to pull in without losing our local modifications. A skill should handle this deliberately rather than by manual re-cloning.

** Design decisions (agreed)

- *Upstream tracking:* per-fork manifest =.skill-upstream= (YAML or JSON):
  - =url= (GitHub URL)
  - =ref= (branch or tag)
  - =subpath= (path inside the upstream repo when it's a monorepo)
  - =last_synced_commit= (updated on successful sync)
- *Local modifications:* 3-way merge. Requires a pristine baseline snapshot of the upstream-at-time-of-fork. Store under =.skill-upstream/baseline/= or similar; committed to the rulesets repo so the merge base is reproducible.
- *Apply changes:* skill edits files directly with per-file confirmation.
- *Conflict policy:* per-hunk prompt inside the skill. When a 3-way merge produces a conflict, the skill walks each conflicting hunk and asks Craig: keep-local / take-upstream / both / skip. Editor-independent; works on machines where Emacs isn't available. Fallback when the baseline is missing or corrupt (can't run a 3-way merge): write =.local=, =.upstream=, and =.baseline= files side by side and surface them for manual review.
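A minimal sketch of the classification and merge steps this design implies, assuming a JSON =.skill-upstream= manifest and three on-disk trees (current skill, stored baseline, freshly cloned upstream). All function names and the directory layout here are hypothetical, not settled design:

#+begin_src python
import json
import subprocess
from pathlib import Path

def load_manifest(skill_dir: Path) -> dict:
    """Read the per-fork manifest (url / ref / subpath / last_synced_commit)."""
    return json.loads((skill_dir / ".skill-upstream").read_text())

def classify(skill_dir: Path, baseline_dir: Path, upstream_dir: Path) -> dict:
    """Map each file name to unchanged / upstream-only / local-only / both-changed."""
    def read(d: Path, name: str):
        p = d / name
        return p.read_bytes() if p.is_file() else None

    names = {p.name for d in (skill_dir, baseline_dir, upstream_dir)
             for p in d.iterdir() if p.is_file()}
    states = {}
    for name in sorted(names):
        local, base, up = (read(d, name) for d in (skill_dir, baseline_dir, upstream_dir))
        if local == base == up:
            states[name] = "unchanged"
        elif local == base:
            states[name] = "upstream-only"   # only upstream moved: take theirs
        elif up == base:
            states[name] = "local-only"      # only we moved: keep ours
        else:
            states[name] = "both-changed"    # hand off to the 3-way merge
    return states

def three_way_merge(local: Path, base: Path, upstream: Path):
    """git merge-file exits 0 on a clean merge; stdout carries the merged text,
    with conflict markers when the exit code is non-zero."""
    r = subprocess.run(["git", "merge-file", "--stdout",
                        str(local), str(base), str(upstream)],
                       capture_output=True, text=True)
    return r.returncode == 0, r.stdout
#+end_src

Missing files read as =None=, so an upstream deletion classifies as =upstream-only= rather than crashing the scan; whether to honor that deletion is the rename open question below.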
** V1 Scope

- [ ] Skill at =~/code/rulesets/update-skills/=
- [ ] Discovery: scan sibling skill dirs for =.skill-upstream= manifests
- [ ] Helper script (bash or python) to:
  - Clone each upstream at =ref= shallowly into =/tmp/=
  - Compare current skill state vs latest upstream vs stored baseline
  - Classify each file: =unchanged= / =upstream-only= / =local-only= / =both-changed=
  - For =both-changed=: run =git merge-file --stdout=; if clean, write the result directly; if it conflicts, parse the conflict-marker output and feed each hunk into the per-hunk prompt loop
- [ ] Per-hunk prompt loop:
  - Show base / local / upstream side by side for each conflicting hunk
  - Ask: keep-local / take-upstream / both (concatenate) / skip (leave marker)
  - Assemble resolved hunks into the final file content
- [ ] Per-fork summary output with a file-level classification table
- [ ] Per-file confirmation flow (yes / no / show-diff) BEFORE the per-hunk loop
- [ ] On successful sync: update =last_synced_commit= in the manifest
- [ ] =--dry-run= to preview without writing

** V2+ (deferred)

- [ ] Track upstream *releases* (tags), not just branches, so the skill can propose "upgrade from v1.2 to v1.3" with release notes pulled in
- [ ] Generate patch files as an alternative apply method (for users who prefer =git apply= / =patch= over in-place edits)
- [ ] Non-interactive mode (=--non-interactive= / CI): skip conflict resolution, emit side-by-side files for later manual review
- [ ] Auto-run on a schedule via a Claude Code background agent
- [ ] Summary of aggregate upstream activity across all forks (which forks have upstream changes waiting, which don't)
- [ ] Optional editor integration: on machines with Emacs, offer =M-x smerge-ediff= as an alternate path for users who prefer ediff over per-hunk prompts

** Initial forks to enumerate (for manifest bootstrap)

- [ ] =arch-decide= → =wshobson/agents= :: =plugins/documentation-generation/skills/architecture-decision-records= :: MIT
- [ ] =playwright-js= → =lackeyjb/playwright-skill= :: =skills/playwright-skill= :: MIT
- [ ] =playwright-py= → =anthropics/skills= :: =skills/webapp-testing= :: Apache-2.0

** Open questions

- [ ] What happens when upstream *renames* a file we fork? The skill would see "file gone from upstream, still present locally" — drop, keep, or prompt?
- [ ] What happens when upstream splits into multiple forks (e.g., a plugin reshuffles its structure)? Probably out of scope for v1; manual migration.
- [ ] Rate-limit / offline mode: if GitHub is unreachable, should the skill fail or degrade gracefully? Likely degrade; print a warning per fork.

* TODO [#B] Build =/research-writer= — clean-room synthesis for research-backed long-form
SCHEDULED: <2026-05-15 Fri>

Gap in current rulesets: between =brainstorm= (idea refinement → design doc) and =arch-document= (arc42 technical docs), there's no skill for research-backed long-form prose — blog posts, essays, white papers, proposals with data backing, article-length content with citations. Craig writes documents across many contexts (defense-contractor work, personal, technical, proposals). The gap is real.

*Evaluated 2026-04-19:* ComposioHQ/awesome-claude-skills has a =content-research-writer= skill (540 lines, 14 KB) that attempts this.
*Not adopting:*
- Parent repo has no LICENSE file — reuse is legally ambiguous
- Bloated: 540 lines of prose scaffolding with no tooling
- No citation-style enforcement (APA/Chicago/IEEE/MLA)
- No source-quality heuristics (primary vs secondary, peer review, recency)
- Fictional example citations in the skill itself (models the very hallucination failure mode a citation-focused skill should prevent)
- No citation-verification step
- Overlaps with =humanizer= at the polish stage, with no composition guidance

*Patterns worth lifting clean-room (from their better parts):*
- Folder convention =~/writing//= with =outline.md=, =research.md=, versioned drafts, =sources/=
- Section-by-section feedback loop (outline validated → per-section research validated → per-section draft validated)
- Hook-alternatives pattern (generate three hook variants with rationale)

*Additions for the clean-room version (v1):*
- Citation-style selection (APA / Chicago / MLA / IEEE / custom) with style-specific examples and a pick-one step up front
- Source-quality heuristics: primary > secondary; peer-reviewed; recency thresholds by domain; publisher reputation; funding transparency
- Citation-verification discipline: fetch real sources, never fabricate, mark unverifiable claims with =[citation needed]= rather than inventing
- Composition hand-off to =/humanizer= at the polish stage
- Classification awareness: if the working directory or context signals defense / regulated territory, flag any sentence that might touch CUI or classified material before emission

*Target:* ~150-200 lines, clean-room per blanket policy.

*When to build:* wait for a real research-writing task to validate the design against actual document patterns. Building preemptively risks tuning for my guess at Craig's workflow rather than his real one.
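A sketch of how the source-quality heuristics above might compose into a single score. The thresholds, weights, and field names are placeholders for illustration (publisher reputation and funding transparency are omitted for brevity), not a settled design:

#+begin_src python
from dataclasses import dataclass

# Recency thresholds (years) by domain — hypothetical values for illustration.
RECENCY_THRESHOLDS = {"ml": 2, "medicine": 5, "history": 50}

@dataclass
class Source:
    primary: bool        # primary source (data, spec, firsthand account)?
    peer_reviewed: bool
    age_years: float
    domain: str

def quality_score(s: Source) -> float:
    """Compose the heuristics into a 0..1 score. Weights are placeholders."""
    score = 0.0
    score += 0.4 if s.primary else 0.2          # primary > secondary
    score += 0.3 if s.peer_reviewed else 0.0
    limit = RECENCY_THRESHOLDS.get(s.domain, 10)
    score += 0.3 if s.age_years <= limit else 0.0
    return score
#+end_src

Even if the skill never runs code, writing the heuristics this concretely forces the pick-a-threshold decisions the prose list leaves open.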
Triggers that would prompt "let's build it now":
- Starting a white paper / proposal that needs citation discipline
- Writing a technical blog post with external references
- A pattern of hitting the same research-writing friction 3+ times

Upstream reference (do not vendor): ComposioHQ/awesome-claude-skills =content-research-writer/SKILL.md=.

* TODO [#C] Try Skill Seekers on a real DeepSat docs-briefing need
SCHEDULED: <2026-05-15 Fri>

=Skill Seekers= ([[https://github.com/yusufkaraaslan/Skill_Seekers]]) is a Python CLI + MCP server that ingests 18 source types (docs sites, PDFs, GitHub repos, YouTube videos, Confluence, Notion, OpenAPI specs, etc.) and exports to 20+ AI targets, including Claude skills. MIT licensed, 12.9k stars, active as of 2026-04-12.

*Evaluated 2026-04-19 — not adopted for rulesets.* It generates *reference-style* skills (encyclopedic dumps of scraped source material), not *operational* skills (opinionated how-we-do-things content). Doesn't fit the rulesets curation pattern.

*Next-trigger experiment (this TODO):* the next time a DeepSat task needs Claude briefed deeply on a specific library, API, or docs site — try:

#+begin_src bash
pip install skill-seekers
skill-seekers create --target claude
#+end_src

Measure output quality vs a hand-curated briefing. If usable, consider installing as a persistent tool. If output is bloated / under-structured, discard and stick with hand briefing.

*Candidate first experiments (pick one from an actual need, don't invent):*
- A Django ORM reference skill scoped to the version DeepSat pins
- An OpenAPI-to-skill conversion for a partner-vendor API
- A React hooks reference skill for the frontend team's current patterns
- A specific AWS service's docs (e.g. GovCloud-flavored)

*Patterns worth borrowing into rulesets even without adopting the tool:*
- Enhancement-via-agent pipeline (scrape raw → LLM pass → structured SKILL.md). Applicable if we ever build internal-docs-to-skill tooling.
- Multi-target export abstraction (one knowledge extraction → many output formats). Clean design for any future multi-AI-tool workflow.

*Concerns to verify on actual use:*
- =LICENSE= has an unfilled =[Your Name/Username]= placeholder (MIT is unambiguous, but sloppy for a 12k-star project)
- Default branch is =development=, not =main= — pin with care
- Heavy commercialization signals (website at skillseekersweb.com, Trendshift promo, branded badges) — the license might shift later; watch
- Companion =skill-seekers-configs= community repo has only 8 stars despite the main repo's 12.9k — the ecosystem is thinner than headline adoption suggests

* TODO [#C] Revisit =c4-*= rename if a second notation skill ships

Current naming keeps =c4-analyze= and =c4-diagram= as-is (the framework prefix encodes the notation; "C4" is a discoverable brand). Suite membership is surfaced via the description footer, not the name.

If a second notation-specific skill ever lands (=uml-*=, =erd-*=, =arc42-*=), the compound pattern =arch-analyze-= / =arch-diagram-= starts paying off: alphabetical clustering under 'a' amortizes across three+ skills, and the hierarchy becomes regular. At that point, rename all notation skills together in one pass. Trigger: adding skill #2 in the notation family. Don't pre-rename.

Candidate future notation skills (not yet in scope — noted for when a real need arrives, not pre-emptively):

- *UML* (Unified Modeling Language): OO design notation; 14 diagram types, in practice dominated by class / sequence / state / component. Common in DoD / safety-critical / enterprise-architecture contexts. Tooling: PlantUML (text-to-diagram), Mermaid UML, draw.io. Would likely split into =uml-class=, =uml-sequence=, =uml-state= rather than one monolith — different audiences, different inputs.
- *ERD* (Entity-Relationship Diagram): database schema modeling — entities, attributes, cardinality. Crow's Foot notation dominates practice; Chen is academic; IDEF1X is DoD-standard.
  Tooling: dbdiagram.io, Mermaid ERD, PlantUML, ERAlchemy (code-to-ERD for SQL). Natural fit as =erd-analyze= (extract from schema/migrations) and =erd-diagram= (generate from prose/model definitions).
- *arc42*: already partially covered by =arch-document= (which emits arc42-structured docs). A standalone =arc42-*= skill would be redundant unless the arc42-specific visualizations need separation.

Each answers a different question:
- C4 → "What systems exist and how do they talk, at what zoom?"
- UML class/sequence → "What does the code look like / what happens when X runs?"
- ERD → "What's the database shape?"
- arc42 → "What's the full architecture document?"

Deferred pending an actual need that's blocked on not having one of these.

** DoD-specific notations (DeepSat context)

Defense-contractor work uses a narrower, different notation set than commercial software. Document the trigger conditions and starting point here so a future decision to build doesn't have to re-derive the landscape.

*** SysML (Systems Modeling Language)

A UML 2 profile, dominant in DoD systems engineering. Six diagrams account for nearly all practical use:

- *Block Definition Diagram (BDD)* — structural; like UML class but for system blocks (components, subsystems, hardware).
- *Internal Block Diagram (IBD)* — parts within a block and how they connect (flow ports, interfaces).
- *Requirement diagram* — unique to SysML; traces requirements to the blocks that satisfy them. Essential in regulated environments.
- *Activity diagram* — behavioral flow.
- *State machine* — same shape as UML.
- *Sequence diagram* — same shape as UML.

SysML v1.x is in the field; v2 is emerging but not yet adopted at scale (as of 2026-04). Tooling is dominated by Cameo Systems Modeler / MagicDraw and Enterprise Architect. Text-based option: PlantUML + =plantuml-sysml= (git-friendly, a growing niche).

*Candidate skills*: =sysml-bdd=, =sysml-ibd=, =sysml-requirement=, =sysml-sequence=.
Three or more in this cluster triggers the =arch-*-= rename discussion from the parent entry.

*** DoDAF / UAF (architecture frameworks)

Not notations themselves — frameworks that specify *which* viewpoints a program must deliver. Viewpoints are rendered using UML/SysML diagrams.

- *DoDAF (DoD Architecture Framework)* — legacy but still contract-required on many programs.
- *UAF (Unified Architecture Framework)* — the DoDAF/MODAF successor, SysML-based. Gaining adoption on newer contracts.

Common required viewpoints (formal CDRL deliverables or PDR/CDR review packages):

- *OV-1* — High-Level Operational Concept Graphic. The "cartoon" showing the system in operational context with icons, arrows, and surrounding actors/environment. *Universally asked for — informally or formally.* The starting point for any DoD diagram skill.
- *OV-2* — Operational resource flows (nodes and flows).
- *OV-5a/b* — Operational activities.
- *SV-1* — Systems interfaces. Maps closely to C4 Container.
- *SV-2* — Systems resource flows.
- *SV-4* — Systems functionality.
- *SV-10b* — Systems state transitions.

*Informal ask ("send me an architecture diagram") → OV-1 + SV-1 satisfies it 90% of the time.* Formal CDRL asks specify the viewpoint set contractually.

*C4 gap*: C4 is rare in DoD. C4 System Context ≈ OV-1 in intent but not in visual convention. C4 Container ≈ SV-1. Expect a mapping step or reviewer pushback if delivering C4-shaped artifacts to a DoD audience.

*Candidate skills*: =dodaf-ov1=, =dodaf-sv1= first (highest value); =uaf-viewpoint= if newer contracts require UAF.

*** IDEF1X (data modeling)

FIPS 184 — the federal standard for data modeling. Used in classified DoD data systems, intelligence databases, and anywhere the government specifies the data model. Same shape language as Crow's Foot but with different adornments and notation conventions.

*Rule of thumb*: classified DoD data work → IDEF1X; unclassified contractor work → Crow's Foot unless the contract specifies otherwise.
*Candidate skills*: =idef1x-diagram= / =idef1x-analyze= (parallel to a future =erd-diagram= / =erd-analyze= pair).

*** Tooling baseline

- *Cameo Systems Modeler / MagicDraw* (Dassault) — commercial SysML, dominant in DoD programs.
- *Enterprise Architect (Sparx)* — widely used for UML + SysML + DoDAF.
- *Rhapsody (IBM)* — SysML with code generation; strong in avionics / embedded (FACE, ARINC).
- *Papyrus (Eclipse)* — open-source SysML; free but clunkier.
- *PlantUML + plantuml-sysml* — text-based, version-controllable. Fits a git-centric workflow better than any GUI tool.

*** Highest-value starting point

If DeepSat contracts regularly require architecture deliverables, the highest-ROI first skill is =dodaf-ov1= (or whatever naming convention the rename discussion lands on). OV-1 is the universal currency in briefings, proposals, and reviews; it's the one artifact that shows up in every program regardless of contract specifics.

Trigger for building: an actual DoD deliverable that's blocked on not having a skill to generate or check OV-1-shaped artifacts. Don't build speculatively — defense-specific notations are narrow enough that each skill should be driven by a concrete contract need, not aspiration.