#+TITLE: Suggested rulesets enhancements
#+DATE: 2026-05-28
#+SOURCE: Codex review of /home/cjennings/code/rulesets

* Purpose

This file captures improvement ideas from a broad review of the =rulesets=
project. The goals are:

- make agents more efficient;
- increase agent knowledge and effectiveness;
- reduce token usage;
- improve user experience;
- make the system easier to operate across machines, projects, and runtimes.

* 2026-05-28 Triage Dispositions

Each item below was triaged interactively with Craig on 2026-05-28. Disposition recorded here so the decision survives in git.

** Accept (filed as TODOs)

- *Item #2 Per-agent live session files* — filed [#B] :feature: as "Codex Phase 1 — AI_AGENT_ID + session-context.d/<id>.org" earlier on 2026-05-28. Multi-agent is plausible; the singleton session-context.org races for real. Strong yes.
- *Item #8 Tighten generated/cache exclusions (=.aiignore=)* — small surface area, real payoff for filesystem scans. Filed [#C] :chore:.
- *Item #10 Workflow test harness* — startup drift check catches index mismatches but not deeper integrity (broken script references, orphan plugins). A handful of bats/pytest tests catches this at CI time. Filed [#C] :feature:.

** Pilot or scope-limit (filed as TODOs)

- *Item #5 Token-tiered workflow and rule files* — pilot the Summary / Execution / Reference / History split on the largest workflows (=startup.org=, =triage-intake.org=). Don't templatize universally; shorter workflows don't need tiering. Filed [#C] :feature:.
- *Item #7 Deduplicate =.ai/= template source* — reject the wholesale dedupe. The dual source (=claude-templates/= canonical + =.ai/= mirror) is a *feature*: canonical lives where other projects can rsync from it, mirror lives so rulesets-as-a-project has a working copy. Real pain is the sync-discipline overhead. Filed [#C] :feature:quick:solo: for "canonical/mirror drift detection via pre-commit hook or =make sync-check=".
- *Item #12 User-facing command simplification* — accept =make status= only (composes audit + doctor + open-task count cleanly). Reject =make sync= / =make health= / =make bootstrap-project= — they either duplicate existing flows or wrap for the sake of wrapping. Filed [#C] :feature:quick:solo:.

** Convention, not a tracked task

- *Item #11 Normalize script interfaces* (=--help=, =--check=, =--json=, stable exit codes) — adopt opportunistically as scripts are touched. The pattern is partially in place already. Don't sweep; let convention spread organically. No TODO; document the convention if it firms up further.
- *Item #13 Durable decision log (ADR layer)* — watch-and-wait. If rulesets keeps growing, formalize ADRs under =docs/decisions/=. If scope stabilizes, the existing session-archive + docs/design pattern is sufficient. No TODO until the growth signal is clear.

** Reject

- *Item #1 Broader runtime-neutral arc* — adapter manifests + runtime profiles is heavy infrastructure for a marginal payoff in the actual use case (Claude Code primary, Codex co-reviewer). The broader spec stays parked at #16 [#C]; only Phase 1 (item #2 above) gets prioritized. The full arc reopens when there's a real local-model workflow.
- *Item #3 Workflow routing index compression (catalog.json/edn for INDEX.org)* — =INDEX.org= is already one-line-per-entry and token-efficient. A parallel machine-readable layer adds drift risk and a generator step for savings that are small at current size. Keep =INDEX.org= terse instead.
- *Item #4 Universal =catalog.json= for skills/commands/rules/hooks/workflows* — looks valuable on paper, rots fast in practice. Every artifact change requires the catalog updated; prose docstrings already serve the same purpose; Claude Code's harness handles skill discovery natively. Maintenance burden far exceeds the "first-pass orientation" savings.
- *Item #6 Generated install and audit manifest (=install-manifest.json=)* — the current Makefile + =install-*.sh= pattern is shell-based but refined over many sessions and works correctly. A data layer adds indirection without changing what executes. Better docs in the existing scripts solve the "globs hide intent" complaint at lower cost.
- *Item #9 Generated project facts snapshot (=.ai/project-facts.org=)* — duplicates =notes.org='s *Project-Specific Context* role, which is already loaded at every session start. Invites drift between two parallel sources. Keep =notes.org= terse and high-signal.
- *Item #14 Local/offline model profile support* — premature. No actual local-model workflow yet. Building profile infrastructure ahead of a real use case is speculative. Reopens with item #1 when there's a concrete trigger.

* Enhancement Backlog

** TODO Runtime-neutral core

Complete the migration from Claude-Code-specific structure to a generic agent
runtime contract.

Current state:

- The reusable core is already mostly runtime-neutral: =.ai/protocols.org=,
  =.ai/workflows/=, =.ai/scripts/=, =inbox/=, cross-agent comms, session
  archives, and task workflows are conceptually usable by any capable agent.
- The install paths, language bundles, launcher, hook wiring, and naming are
  still Claude-specific: =~/.claude/=, =.claude/=, =CLAUDE.md=, Claude hook
  payloads, and the =claude= binary.

Suggested shape:

- Define a small runtime contract: where rules live, where workflows live, how
  live session state is named, how startup instructions are found, and which
  capabilities a runtime adapter may expose.
- Treat Claude Code as one adapter, not the project identity.
- Add adapter directories or manifests for Claude, Codex/OpenAI-compatible
  agents, and local OpenAI-compatible agents.

Why it helps:

- Makes the system usable offline with local models after setup.
- Reduces vendor lock-in while preserving the high-value =.ai/= workflow layer.
- Lets multiple agents share project conventions without rewriting the project
  memory and workflows for each tool.

** TODO Per-agent live session files

Replace the singleton live session file =.ai/session-context.org= with
agent-scoped session context files.

Problem:

- The current live file is a single shared path.
- If two agents operate in one project at the same time, they can overwrite or
  confuse each other's active session state.
- This matters more as the project moves toward generic runtime support.

Suggested shape:

- Use a directory such as =.ai/live-sessions/=.
- Name live files with runtime, host, pid/session id, and timestamp, e.g.
  =codex-strix-2026-05-28-0830.org=.
- At wrap-up, move the live file into =.ai/sessions/= using the existing archive
  naming convention.
- Keep a lightweight pointer file only if needed, e.g. =.ai/current-session= for
  single-agent clients.

Why it helps:

- Enables concurrent agents safely.
- Makes crash recovery more precise.
- Prevents one runtime's live context from polluting another runtime's session.

** TODO Workflow routing index compression

Convert =.ai/workflows/INDEX.org= into a compact machine-readable routing
manifest, while keeping prose workflow documentation separate.

Problem:

- The index is useful, but long.
- Agents need only a small routing table most of the time: trigger phrase,
  workflow file, purpose, and plugin ownership.
- Reading prose-heavy routing text costs tokens before the agent knows which
  workflow matters.

Suggested shape:

- Add =.ai/workflows/catalog.json= or =catalog.edn= with entries:
  - =id=
  - =file=
  - =purpose=
  - =triggers=
  - =source_plugins=
  - =loads=
  - =token_tier=
- Generate the human =INDEX.org= from the manifest, or generate the manifest
  from a canonical structured block in each workflow.

Why it helps:

- Faster workflow routing.
- Lower context load at startup.
- Easier drift detection: missing files, stale triggers, orphan plugins, and
  duplicate triggers can be checked mechanically.

** TODO Skill, command, rule, hook, and workflow catalog

Generate a top-level machine-readable catalog of all agent-facing artifacts.

Problem:

- Agents currently infer system shape from directory scans and long files.
- Important capabilities are spread across top-level skill directories,
  =.claude/commands/=, =claude-rules/=, =hooks/=, =scripts/=, =languages/=,
  =teams/=, =mcp/=, and =.ai/workflows/=.
- This makes first-pass orientation expensive.

Suggested shape:

- Generate =catalog.json= at repo root.
- Include each artifact's:
  - kind: skill, command, rule, hook, script, workflow, language bundle, team
    overlay, MCP server;
  - name;
  - summary;
  - trigger or invocation;
  - source path;
  - install target;
  - dependencies;
  - whether it is user-invoked, model-triggered, startup-triggered, or internal.

Why it helps:

- Agents can load one compact file before deciding what else to read.
- Improves routing accuracy.
- Makes docs, install checks, and audit output consistent.
- Reduces repeated expensive exploration in every session.

** TODO Token-tiered workflow and rule files

Split long workflow and rule documents into explicit token tiers.

Problem:

- Some files serve multiple audiences: routing, quick execution, deep reference,
  and historical rationale.
- Agents often need the execution steps but not the whole rationale.
- Long documents are good for maintainability but expensive in active context.

Suggested shape:

- Standardize top-level sections:
  - =Summary= or =Quick Contract=: one-screen purpose and outputs.
  - =Execution=: steps an agent must follow.
  - =Reference=: examples, edge cases, rationale, old decisions.
  - =History= or =Design Notes=: durable context not needed every run.
- Teach startup/routing to read only =Summary= first, then =Execution= only for
  the selected workflow.
- Keep long rationale available but opt-in.

Why it helps:

- Cuts token usage without deleting useful knowledge.
- Makes behavior more predictable because every workflow exposes its contract
  in the same place.
- Helps smaller/local models follow the system with less context pressure.

** TODO Generated install and audit manifest

Make installable artifacts explicit in a manifest instead of inferred primarily
from directory globs and Makefile variables.

Problem:

- The Makefile and scripts infer many behaviors from filesystem shape.
- Globs are convenient but hide intent: default hooks vs opt-in hooks, user
  commands vs skills, project-owned files vs rulesets-owned files.
- Drift checks duplicate some of this logic.

Suggested shape:

- Add =install-manifest.json= or fold install targets into the top-level
  =catalog.json=.
- For each installed artifact, specify:
  - source;
  - target;
  - install mode: symlink, copy, seed-only, generated, ignored;
  - overwrite policy;
  - ownership: rulesets-owned, project-owned, machine-owned;
  - check policy.

Why it helps:

- Reduces Makefile complexity.
- Makes =doctor= and =audit= simpler and more reliable.
- Documents ownership rules in data instead of prose plus shell logic.

** TODO Deduplicate =.ai/= template source

Clarify the canonical source for =.ai/= template files and reduce duplication
between repo-local =.ai/= and =claude-templates/.ai/=.

Problem:

- The repo contains both live project =.ai/= files and template
  =claude-templates/.ai/= files.
- Much of the content appears duplicated.
- Agents and humans have to know which one is canonical for a given edit.

Suggested shape:

- Choose one canonical template source.
- Treat project-local =.ai/= as this repo's own working copy, not the template
  source, or vice versa.
- Add a manifest/doctor check that says which paths are generated, copied, or
  project-owned.
- Consider excluding live session state from template sync completely.

Why it helps:

- Prevents edits landing in the wrong copy.
- Reduces files agents must inspect.
- Makes audit output easier to trust.

** TODO Tighten generated/cache exclusions

Make default inventory, audit, and review commands ignore generated/vendor/cache
content more aggressively.

Problem:

- Directory scans can include =node_modules=, =__pycache__=, =.pytest_cache=,
  package locks, generated OAuth artifacts, and test caches.
- Even when ignored by git, these files are visible to naïve filesystem reads.

Suggested shape:

- Add a shared ignore file for agent inventory, e.g. =.aiignore= or
  =rulesets-ignore.json=.
- Teach scripts and instructions to use it when summarizing or reviewing.
- Keep intentional lockfiles policy explicit: ignored if local skill dependency
  cache, tracked if reproducibility matters.

Why it helps:

- Reduces token waste during exploration.
- Prevents vendor files from distorting project summaries.
- Makes agent review output focus on authored source.

** TODO Generated project facts snapshot

Maintain a compact =project-facts= file for high-value context.

Problem:

- Important facts exist, but are scattered across README, notes, sessions,
  design docs, Makefile, and task files.
- Each new agent session repeats discovery work.

Suggested shape:

- Generate or maintain =.ai/project-facts.org= or =.ai/project-facts.json=.
- Include:
  - project purpose;
  - install modes;
  - active language bundles;
  - canonical template source;
  - active migrations;
  - recent durable decisions;
  - key commands;
  - known hazards;
  - remote location;
  - current open design direction.

Why it helps:

- Gives agents high-signal context immediately.
- Reduces repeated README/Makefile/design-doc scanning.
- Helps small/local models perform better with limited context.

** TODO Workflow test harness

Add lightweight tests for workflow documentation integrity.

Problem:

- Many workflows are prose specs.
- Prose can drift: triggers can disappear from the index, referenced scripts can
  be renamed, expected output files can change, plugin ownership can be unclear.

Suggested shape:

- Add tests that verify:
  - every workflow file is indexed or classified as a plugin;
  - every indexed workflow exists;
  - every referenced script path exists;
  - every source plugin maps to a parent workflow;
  - required sections exist;
  - workflow names and triggers are unique enough to route.

Why it helps:

- Catches documentation drift before runtime.
- Makes workflow changes safer.
- Increases agent trust in the routing layer.

** TODO Normalize script interfaces

Standardize helper script command-line conventions.

Problem:

- Scripts are useful but vary by interface.
- Agents compose scripts more safely when flags and exit codes are predictable.

Suggested shape:

- For every script where it makes sense, support:
  - =--help=;
  - =--check= or dry-run mode;
  - =--json= for structured agent consumption;
  - stable exit codes;
  - no-op success when there is nothing to do;
  - clear stderr for human-readable failures.
- Document the shared convention once.

Why it helps:

- Improves automation reliability.
- Reduces brittle text parsing.
- Lets agents gather facts with lower token usage by reading JSON summaries.

** TODO User-facing command simplification

Add a few high-level commands for common operator intent.

Problem:

- The Makefile is capable but broad.
- A user or agent may need to remember whether to run =install=, =audit=,
  =doctor=, =install-ai=, =install-lang=, or =catchup-machine=.

Suggested shape:

- Add or polish high-level targets such as:
  - =make status=: summarize install state, dirty state, audit status, and open
    inbox/task counts;
  - =make sync=: safe machine/project sync path;
  - =make health=: doctor + lint + relevant tests;
  - =make bootstrap-project PROJECT=...=: install =.ai/= plus optional language
    bundle in one guided path.

Why it helps:

- Reduces user friction.
- Makes common workflows memorable.
- Gives agents safer entry points than assembling many lower-level commands.

** TODO Durable decision log for rulesets itself

Promote durable project decisions into a concise decision log.

Problem:

- Some important decisions live in sessions, inbox files, or design notes.
- Session archives are good history but not the best place for current truth.

Suggested shape:

- Add =docs/decisions/= or =docs/adr/= for rulesets itself.
- Keep entries short:
  - context;
  - decision;
  - consequences;
  - supersedes/superseded-by links.
- Link design docs and session summaries as supporting material.

Why it helps:

- Makes the current architecture easier to understand.
- Reduces need to reread long historical sessions.
- Helps agents avoid reopening settled questions.

** TODO Local/offline model profile support

Encode model profiles and capability assumptions for hosted and local runtimes.

Problem:

- The generic runtime spec identifies local/offline use as a goal.
- Different agents have different context windows, tool support, speed, and
  reliability.
- Workflows do not yet adapt to those differences.

Suggested shape:

- Add runtime/model profiles such as:
  - hosted high-context coding agent;
  - hosted low-cost/fast agent;
  - local 30B coding model;
  - local 8B fallback.
- For each profile, specify:
  - context budget;
  - preferred summary tiers;
  - tool assumptions;
  - maximum recommended workflow depth;
  - when to ask for human confirmation;
  - when to avoid huge file reads.

Why it helps:

- Makes offline operation practical, not just possible.
- Helps smaller models avoid context overload.
- Lets workflows degrade gracefully by capability.

* Likely Highest-Impact First Steps

1. Generate a compact catalog for workflows, skills, commands, rules, hooks, and
   scripts.
2. Add token-tiered summaries to the highest-traffic workflow/rule files.
3. Replace singleton live session state with per-agent live session files.
4. Clarify and enforce the canonical =.ai/= template source.
5. Add a generated project facts snapshot for fast agent orientation.