#+TITLE: Suggested rulesets enhancements #+DATE: 2026-05-28 #+SOURCE: Codex review of /home/cjennings/code/rulesets * Purpose This file captures improvement ideas from a broad review of the =rulesets= project. The goals are: - make agents more efficient; - increase agent knowledge and effectiveness; - reduce token usage; - improve user experience; - make the system easier to operate across machines, projects, and runtimes. * 2026-05-28 Triage Dispositions Each item below was triaged interactively with Craig on 2026-05-28. Disposition recorded here so the decision survives in git. ** Accept (filed as TODOs) - *Item #2 Per-agent live session files* — filed [#B] :feature: as "Codex Phase 1 — AI_AGENT_ID + session-context.d/.org" earlier on 2026-05-28. Multi-agent is plausible; the singleton session-context.org races for real. Strong yes. - *Item #8 Tighten generated/cache exclusions (=.aiignore=)* — small surface area, real payoff for filesystem scans. Filed [#C] :chore:. - *Item #10 Workflow test harness* — startup drift check catches index mismatches but not deeper integrity (broken script references, orphan plugins). A handful of bats/pytest tests catches this at CI time. Filed [#C] :feature:. ** Pilot or scope-limit (filed as TODOs) - *Item #5 Token-tiered workflow and rule files* — pilot the Summary / Execution / Reference / History split on the largest workflows (=startup.org=, =triage-intake.org=). Don't templatize universally; shorter workflows don't need tiering. Filed [#C] :feature:. - *Item #7 Deduplicate =.ai/= template source* — reject the wholesale dedupe. The dual source (=claude-templates/= canonical + =.ai/= mirror) is a *feature*: canonical lives where other projects can rsync from it, mirror lives so rulesets-as-a-project has a working copy. Real pain is the sync-discipline overhead. Filed [#C] :feature:quick:solo: for "canonical/mirror drift detection via pre-commit hook or =make sync-check=". - *Item #12 User-facing command simplification* — accept =make status= only (composes audit + doctor + open-task count cleanly). Reject =make sync= / =make health= / =make bootstrap-project= — they either duplicate existing flows or wrap for the sake of wrapping. Filed [#C] :feature:quick:solo:. ** Convention, not a tracked task - *Item #11 Normalize script interfaces* (=--help=, =--check=, =--json=, stable exit codes) — adopt opportunistically as scripts are touched. The pattern is partially in place already. Don't sweep; let convention spread organically. No TODO; document the convention if it firms up further. - *Item #13 Durable decision log (ADR layer)* — watch-and-wait. If rulesets keeps growing, formalize ADRs under =docs/decisions/=. If scope stabilizes, the existing session-archive + docs/design pattern is sufficient. No TODO until the growth signal is clear. ** Reject - *Item #1 Broader runtime-neutral arc* — adapter manifests + runtime profiles is heavy infrastructure for a marginal payoff in the actual use case (Claude Code primary, Codex co-reviewer). The broader spec stays parked at #16 [#C]; only Phase 1 (item #2 above) gets prioritized. The full arc reopens when there's a real local-model workflow. - *Item #3 Workflow routing index compression (catalog.json/edn for INDEX.org)* — =INDEX.org= is already one-line-per-entry and token-efficient. A parallel machine-readable layer adds drift risk and a generator step for savings that are small at current size. Keep =INDEX.org= terse instead. - *Item #4 Universal =catalog.json= for skills/commands/rules/hooks/workflows* — looks valuable on paper, rots fast in practice. Every artifact change requires the catalog updated; prose docstrings already serve the same purpose; Claude Code's harness handles skill discovery natively. Maintenance burden far exceeds the "first-pass orientation" savings. - *Item #6 Generated install and audit manifest (=install-manifest.json=)* — the current Makefile + =install-*.sh= pattern is shell-based but refined over many sessions and works correctly. A data layer adds indirection without changing what executes. Better docs in the existing scripts solve the "globs hide intent" complaint at lower cost. - *Item #9 Generated project facts snapshot (=.ai/project-facts.org=)* — duplicates =notes.org='s *Project-Specific Context* role, which is already loaded at every session start. Invites drift between two parallel sources. Keep =notes.org= terse and high-signal. - *Item #14 Local/offline model profile support* — premature. No actual local-model workflow yet. Building profile infrastructure ahead of a real use case is speculative. Reopens with item #1 when there's a concrete trigger. * Enhancement Backlog ** TODO Runtime-neutral core Complete the migration from Claude-Code-specific structure to a generic agent runtime contract. Current state: - The reusable core is already mostly runtime-neutral: =.ai/protocols.org=, =.ai/workflows/=, =.ai/scripts/=, =inbox/=, cross-agent comms, session archives, and task workflows are conceptually usable by any capable agent. - The install paths, language bundles, launcher, hook wiring, and naming are still Claude-specific: =~/.claude/=, =.claude/=, =CLAUDE.md=, Claude hook payloads, and the =claude= binary. Suggested shape: - Define a small runtime contract: where rules live, where workflows live, how live session state is named, how startup instructions are found, and which capabilities a runtime adapter may expose. - Treat Claude Code as one adapter, not the project identity. - Add adapter directories or manifests for Claude, Codex/OpenAI-compatible agents, and local OpenAI-compatible agents. Why it helps: - Makes the system usable offline with local models after setup. - Reduces vendor lock-in while preserving the high-value =.ai/= workflow layer. - Lets multiple agents share project conventions without rewriting the project memory and workflows for each tool. ** TODO Per-agent live session files Replace the singleton live session file =.ai/session-context.org= with agent-scoped session context files. Problem: - The current live file is a single shared path. - If two agents operate in one project at the same time, they can overwrite or confuse each other's active session state. - This matters more as the project moves toward generic runtime support. Suggested shape: - Use a directory such as =.ai/live-sessions/=. - Name live files with runtime, host, pid/session id, and timestamp, e.g. =codex-strix-2026-05-28-0830.org=. - At wrap-up, move the live file into =.ai/sessions/= using the existing archive naming convention. - Keep a lightweight pointer file only if needed, e.g. =.ai/current-session= for single-agent clients. Why it helps: - Enables concurrent agents safely. - Makes crash recovery more precise. - Prevents one runtime's live context from polluting another runtime's session. ** TODO Workflow routing index compression Convert =.ai/workflows/INDEX.org= into a compact machine-readable routing manifest, while keeping prose workflow documentation separate. Problem: - The index is useful, but long. - Agents need only a small routing table most of the time: trigger phrase, workflow file, purpose, and plugin ownership. - Reading prose-heavy routing text costs tokens before the agent knows which workflow matters. Suggested shape: - Add =.ai/workflows/catalog.json= or =catalog.edn= with entries: - =id= - =file= - =purpose= - =triggers= - =source_plugins= - =loads= - =token_tier= - Generate the human =INDEX.org= from the manifest, or generate the manifest from a canonical structured block in each workflow. Why it helps: - Faster workflow routing. - Lower context load at startup. - Easier drift detection: missing files, stale triggers, orphan plugins, and duplicate triggers can be checked mechanically. ** TODO Skill, command, rule, hook, and workflow catalog Generate a top-level machine-readable catalog of all agent-facing artifacts. Problem: - Agents currently infer system shape from directory scans and long files. - Important capabilities are spread across top-level skill directories, =.claude/commands/=, =claude-rules/=, =hooks/=, =scripts/=, =languages/=, =teams/=, =mcp/=, and =.ai/workflows/=. - This makes first-pass orientation expensive. Suggested shape: - Generate =catalog.json= at repo root. - Include each artifact's: - kind: skill, command, rule, hook, script, workflow, language bundle, team overlay, MCP server; - name; - summary; - trigger or invocation; - source path; - install target; - dependencies; - whether it is user-invoked, model-triggered, startup-triggered, or internal. Why it helps: - Agents can load one compact file before deciding what else to read. - Improves routing accuracy. - Makes docs, install checks, and audit output consistent. - Reduces repeated expensive exploration in every session. ** TODO Token-tiered workflow and rule files Split long workflow and rule documents into explicit token tiers. Problem: - Some files serve multiple audiences: routing, quick execution, deep reference, and historical rationale. - Agents often need the execution steps but not the whole rationale. - Long documents are good for maintainability but expensive in active context. Suggested shape: - Standardize top-level sections: - =Summary= or =Quick Contract=: one-screen purpose and outputs. - =Execution=: steps an agent must follow. - =Reference=: examples, edge cases, rationale, old decisions. - =History= or =Design Notes=: durable context not needed every run. - Teach startup/routing to read only =Summary= first, then =Execution= only for the selected workflow. - Keep long rationale available but opt-in. Why it helps: - Cuts token usage without deleting useful knowledge. - Makes behavior more predictable because every workflow exposes its contract in the same place. - Helps smaller/local models follow the system with less context pressure. ** TODO Generated install and audit manifest Make installable artifacts explicit in a manifest instead of inferred primarily from directory globs and Makefile variables. Problem: - The Makefile and scripts infer many behaviors from filesystem shape. - Globs are convenient but hide intent: default hooks vs opt-in hooks, user commands vs skills, project-owned files vs rulesets-owned files. - Drift checks duplicate some of this logic. Suggested shape: - Add =install-manifest.json= or fold install targets into the top-level =catalog.json=. - For each installed artifact, specify: - source; - target; - install mode: symlink, copy, seed-only, generated, ignored; - overwrite policy; - ownership: rulesets-owned, project-owned, machine-owned; - check policy. Why it helps: - Reduces Makefile complexity. - Makes =doctor= and =audit= simpler and more reliable. - Documents ownership rules in data instead of prose plus shell logic. ** TODO Deduplicate =.ai/= template source Clarify the canonical source for =.ai/= template files and reduce duplication between repo-local =.ai/= and =claude-templates/.ai/=. Problem: - The repo contains both live project =.ai/= files and template =claude-templates/.ai/= files. - Much of the content appears duplicated. - Agents and humans have to know which one is canonical for a given edit. Suggested shape: - Choose one canonical template source. - Treat project-local =.ai/= as this repo's own working copy, not the template source, or vice versa. - Add a manifest/doctor check that says which paths are generated, copied, or project-owned. - Consider excluding live session state from template sync completely. Why it helps: - Prevents edits landing in the wrong copy. - Reduces files agents must inspect. - Makes audit output easier to trust. ** TODO Tighten generated/cache exclusions Make default inventory, audit, and review commands ignore generated/vendor/cache content more aggressively. Problem: - Directory scans can include =node_modules=, =__pycache__=, =.pytest_cache=, package locks, generated OAuth artifacts, and test caches. - Even when ignored by git, these files are visible to naïve filesystem reads. Suggested shape: - Add a shared ignore file for agent inventory, e.g. =.aiignore= or =rulesets-ignore.json=. - Teach scripts and instructions to use it when summarizing or reviewing. - Keep intentional lockfiles policy explicit: ignored if local skill dependency cache, tracked if reproducibility matters. Why it helps: - Reduces token waste during exploration. - Prevents vendor files from distorting project summaries. - Makes agent review output focus on authored source. ** TODO Generated project facts snapshot Maintain a compact =project-facts= file for high-value context. Problem: - Important facts exist, but are scattered across README, notes, sessions, design docs, Makefile, and task files. - Each new agent session repeats discovery work. Suggested shape: - Generate or maintain =.ai/project-facts.org= or =.ai/project-facts.json=. - Include: - project purpose; - install modes; - active language bundles; - canonical template source; - active migrations; - recent durable decisions; - key commands; - known hazards; - remote location; - current open design direction. Why it helps: - Gives agents high-signal context immediately. - Reduces repeated README/Makefile/design-doc scanning. - Helps small/local models perform better with limited context. ** TODO Workflow test harness Add lightweight tests for workflow documentation integrity. Problem: - Many workflows are prose specs. - Prose can drift: triggers can disappear from the index, referenced scripts can be renamed, expected output files can change, plugin ownership can be unclear. Suggested shape: - Add tests that verify: - every workflow file is indexed or classified as a plugin; - every indexed workflow exists; - every referenced script path exists; - every source plugin maps to a parent workflow; - required sections exist; - workflow names and triggers are unique enough to route. Why it helps: - Catches documentation drift before runtime. - Makes workflow changes safer. - Increases agent trust in the routing layer. ** TODO Normalize script interfaces Standardize helper script command-line conventions. Problem: - Scripts are useful but vary by interface. - Agents compose scripts more safely when flags and exit codes are predictable. Suggested shape: - For every script where it makes sense, support: - =--help=; - =--check= or dry-run mode; - =--json= for structured agent consumption; - stable exit codes; - no-op success when there is nothing to do; - clear stderr for human-readable failures. - Document the shared convention once. Why it helps: - Improves automation reliability. - Reduces brittle text parsing. - Lets agents gather facts with lower token usage by reading JSON summaries. ** TODO User-facing command simplification Add a few high-level commands for common operator intent. Problem: - The Makefile is capable but broad. - A user or agent may need to remember whether to run =install=, =audit=, =doctor=, =install-ai=, =install-lang=, or =catchup-machine=. Suggested shape: - Add or polish high-level targets such as: - =make status=: summarize install state, dirty state, audit status, and open inbox/task counts; - =make sync=: safe machine/project sync path; - =make health=: doctor + lint + relevant tests; - =make bootstrap-project PROJECT=...=: install =.ai/= plus optional language bundle in one guided path. Why it helps: - Reduces user friction. - Makes common workflows memorable. - Gives agents safer entry points than assembling many lower-level commands. ** TODO Durable decision log for rulesets itself Promote durable project decisions into a concise decision log. Problem: - Some important decisions live in sessions, inbox files, or design notes. - Session archives are good history but not the best place for current truth. Suggested shape: - Add =docs/decisions/= or =docs/adr/= for rulesets itself. - Keep entries short: - context; - decision; - consequences; - supersedes/superseded-by links. - Link design docs and session summaries as supporting material. Why it helps: - Makes the current architecture easier to understand. - Reduces need to reread long historical sessions. - Helps agents avoid reopening settled questions. ** TODO Local/offline model profile support Encode model profiles and capability assumptions for hosted and local runtimes. Problem: - The generic runtime spec identifies local/offline use as a goal. - Different agents have different context windows, tool support, speed, and reliability. - Workflows do not yet adapt to those differences. Suggested shape: - Add runtime/model profiles such as: - hosted high-context coding agent; - hosted low-cost/fast agent; - local 30B coding model; - local 8B fallback. - For each profile, specify: - context budget; - preferred summary tiers; - tool assumptions; - maximum recommended workflow depth; - when to ask for human confirmation; - when to avoid huge file reads. Why it helps: - Makes offline operation practical, not just possible. - Helps smaller models avoid context overload. - Lets workflows degrade gracefully by capability. * Likely Highest-Impact First Steps 1. Generate a compact catalog for workflows, skills, commands, rules, hooks, and scripts. 2. Add token-tiered summaries to the highest-traffic workflow/rule files. 3. Replace singleton live session state with per-agent live session files. 4. Clarify and enforce the canonical =.ai/= template source. 5. Add a generated project facts snapshot for fast agent orientation.