docs/design/2026-05-28-rulesets-enhancement-backlog.org


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456

#+TITLE: Suggested rulesets enhancements
#+DATE: 2026-05-28
#+SOURCE: Codex review of /home/cjennings/code/rulesets

* Purpose

This file captures improvement ideas from a broad review of the =rulesets=
project. The goals are:

- make agents more efficient;
- increase agent knowledge and effectiveness;
- reduce token usage;
- improve user experience;
- make the system easier to operate across machines, projects, and runtimes.

* 2026-05-28 Triage Dispositions

Each item below was triaged interactively with Craig on 2026-05-28. Disposition recorded here so the decision survives in git.

** Accept (filed as TODOs)

- *Item #2 Per-agent live session files* — filed [#B] :feature: as "Codex Phase 1 — AI_AGENT_ID + session-context.d/<id>.org" earlier on 2026-05-28. Multi-agent is plausible; the singleton session-context.org races for real. Strong yes.
- *Item #8 Tighten generated/cache exclusions (=.aiignore=)* — small surface area, real payoff for filesystem scans. Filed [#C] :chore:.
- *Item #10 Workflow test harness* — startup drift check catches index mismatches but not deeper integrity (broken script references, orphan plugins). A handful of bats/pytest tests catches this at CI time. Filed [#C] :feature:.

** Pilot or scope-limit (filed as TODOs)

- *Item #5 Token-tiered workflow and rule files* — pilot the Summary / Execution / Reference / History split on the largest workflows (=startup.org=, =triage-intake.org=). Don't templatize universally; shorter workflows don't need tiering. Filed [#C] :feature:.
- *Item #7 Deduplicate =.ai/= template source* — reject the wholesale dedupe. The dual source (=claude-templates/= canonical + =.ai/= mirror) is a *feature*: canonical lives where other projects can rsync from it, mirror lives so rulesets-as-a-project has a working copy. Real pain is the sync-discipline overhead. Filed [#C] :feature:quick:solo: for "canonical/mirror drift detection via pre-commit hook or =make sync-check=".
- *Item #12 User-facing command simplification* — accept =make status= only (composes audit + doctor + open-task count cleanly). Reject =make sync= / =make health= / =make bootstrap-project= — they either duplicate existing flows or wrap for the sake of wrapping. Filed [#C] :feature:quick:solo:.

** Convention, not a tracked task

- *Item #11 Normalize script interfaces* (=--help=, =--check=, =--json=, stable exit codes) — adopt opportunistically as scripts are touched. The pattern is partially in place already. Don't sweep; let convention spread organically. No TODO; document the convention if it firms up further.
- *Item #13 Durable decision log (ADR layer)* — watch-and-wait. If rulesets keeps growing, formalize ADRs under =docs/decisions/=. If scope stabilizes, the existing session-archive + docs/design pattern is sufficient. No TODO until the growth signal is clear.

** Reject

- *Item #1 Broader runtime-neutral arc* — adapter manifests + runtime profiles is heavy infrastructure for a marginal payoff in the actual use case (Claude Code primary, Codex co-reviewer). The broader spec stays parked at #16 [#C]; only Phase 1 (item #2 above) gets prioritized. The full arc reopens when there's a real local-model workflow.
- *Item #3 Workflow routing index compression (catalog.json/edn for INDEX.org)* — =INDEX.org= is already one-line-per-entry and token-efficient. A parallel machine-readable layer adds drift risk and a generator step for savings that are small at current size. Keep =INDEX.org= terse instead.
- *Item #4 Universal =catalog.json= for skills/commands/rules/hooks/workflows* — looks valuable on paper, rots fast in practice. Every artifact change requires the catalog updated; prose docstrings already serve the same purpose; Claude Code's harness handles skill discovery natively. Maintenance burden far exceeds the "first-pass orientation" savings.
- *Item #6 Generated install and audit manifest (=install-manifest.json=)* — the current Makefile + =install-*.sh= pattern is shell-based but refined over many sessions and works correctly. A data layer adds indirection without changing what executes. Better docs in the existing scripts solve the "globs hide intent" complaint at lower cost.
- *Item #9 Generated project facts snapshot (=.ai/project-facts.org=)* — duplicates =notes.org='s *Project-Specific Context* role, which is already loaded at every session start. Invites drift between two parallel sources. Keep =notes.org= terse and high-signal.
- *Item #14 Local/offline model profile support* — premature. No actual local-model workflow yet. Building profile infrastructure ahead of a real use case is speculative. Reopens with item #1 when there's a concrete trigger.

* Enhancement Backlog

** TODO Runtime-neutral core

Complete the migration from Claude-Code-specific structure to a generic agent
runtime contract.

Current state:

- The reusable core is already mostly runtime-neutral: =.ai/protocols.org=,
  =.ai/workflows/=, =.ai/scripts/=, =inbox/=, cross-agent comms, session
  archives, and task workflows are conceptually usable by any capable agent.
- The install paths, language bundles, launcher, hook wiring, and naming are
  still Claude-specific: =~/.claude/=, =.claude/=, =CLAUDE.md=, Claude hook
  payloads, and the =claude= binary.

Suggested shape:

- Define a small runtime contract: where rules live, where workflows live, how
  live session state is named, how startup instructions are found, and which
  capabilities a runtime adapter may expose.
- Treat Claude Code as one adapter, not the project identity.
- Add adapter directories or manifests for Claude, Codex/OpenAI-compatible
  agents, and local OpenAI-compatible agents.

Why it helps:

- Makes the system usable offline with local models after setup.
- Reduces vendor lock-in while preserving the high-value =.ai/= workflow layer.
- Lets multiple agents share project conventions without rewriting the project
  memory and workflows for each tool.

** TODO Per-agent live session files

Replace the singleton live session file =.ai/session-context.org= with
agent-scoped session context files.

Problem:

- The current live file is a single shared path.
- If two agents operate in one project at the same time, they can overwrite or
  confuse each other's active session state.
- This matters more as the project moves toward generic runtime support.

Suggested shape:

- Use a directory such as =.ai/live-sessions/=.
- Name live files with runtime, host, pid/session id, and timestamp, e.g.
  =codex-strix-2026-05-28-0830.org=.
- At wrap-up, move the live file into =.ai/sessions/= using the existing archive
  naming convention.
- Keep a lightweight pointer file only if needed, e.g. =.ai/current-session= for
  single-agent clients.

Why it helps:

- Enables concurrent agents safely.
- Makes crash recovery more precise.
- Prevents one runtime's live context from polluting another runtime's session.

** TODO Workflow routing index compression

Convert =.ai/workflows/INDEX.org= into a compact machine-readable routing
manifest, while keeping prose workflow documentation separate.

Problem:

- The index is useful, but long.
- Agents need only a small routing table most of the time: trigger phrase,
  workflow file, purpose, and plugin ownership.
- Reading prose-heavy routing text costs tokens before the agent knows which
  workflow matters.

Suggested shape:

- Add =.ai/workflows/catalog.json= or =catalog.edn= with entries:
  - =id=
  - =file=
  - =purpose=
  - =triggers=
  - =source_plugins=
  - =loads=
  - =token_tier=
- Generate the human =INDEX.org= from the manifest, or generate the manifest
  from a canonical structured block in each workflow.

Why it helps:

- Faster workflow routing.
- Lower context load at startup.
- Easier drift detection: missing files, stale triggers, orphan plugins, and
  duplicate triggers can be checked mechanically.

** TODO Skill, command, rule, hook, and workflow catalog

Generate a top-level machine-readable catalog of all agent-facing artifacts.

Problem:

- Agents currently infer system shape from directory scans and long files.
- Important capabilities are spread across top-level skill directories,
  =.claude/commands/=, =claude-rules/=, =hooks/=, =scripts/=, =languages/=,
  =teams/=, =mcp/=, and =.ai/workflows/=.
- This makes first-pass orientation expensive.

Suggested shape:

- Generate =catalog.json= at repo root.
- Include each artifact's:
  - kind: skill, command, rule, hook, script, workflow, language bundle, team
    overlay, MCP server;
  - name;
  - summary;
  - trigger or invocation;
  - source path;
  - install target;
  - dependencies;
  - whether it is user-invoked, model-triggered, startup-triggered, or internal.

Why it helps:

- Agents can load one compact file before deciding what else to read.
- Improves routing accuracy.
- Makes docs, install checks, and audit output consistent.
- Reduces repeated expensive exploration in every session.

** TODO Token-tiered workflow and rule files

Split long workflow and rule documents into explicit token tiers.

Problem:

- Some files serve multiple audiences: routing, quick execution, deep reference,
  and historical rationale.
- Agents often need the execution steps but not the whole rationale.
- Long documents are good for maintainability but expensive in active context.

Suggested shape:

- Standardize top-level sections:
  - =Summary= or =Quick Contract=: one-screen purpose and outputs.
  - =Execution=: steps an agent must follow.
  - =Reference=: examples, edge cases, rationale, old decisions.
  - =History= or =Design Notes=: durable context not needed every run.
- Teach startup/routing to read only =Summary= first, then =Execution= only for
  the selected workflow.
- Keep long rationale available but opt-in.

Why it helps:

- Cuts token usage without deleting useful knowledge.
- Makes behavior more predictable because every workflow exposes its contract
  in the same place.
- Helps smaller/local models follow the system with less context pressure.

** TODO Generated install and audit manifest

Make installable artifacts explicit in a manifest instead of inferred primarily
from directory globs and Makefile variables.

Problem:

- The Makefile and scripts infer many behaviors from filesystem shape.
- Globs are convenient but hide intent: default hooks vs opt-in hooks, user
  commands vs skills, project-owned files vs rulesets-owned files.
- Drift checks duplicate some of this logic.

Suggested shape:

- Add =install-manifest.json= or fold install targets into the top-level
  =catalog.json=.
- For each installed artifact, specify:
  - source;
  - target;
  - install mode: symlink, copy, seed-only, generated, ignored;
  - overwrite policy;
  - ownership: rulesets-owned, project-owned, machine-owned;
  - check policy.

Why it helps:

- Reduces Makefile complexity.
- Makes =doctor= and =audit= simpler and more reliable.
- Documents ownership rules in data instead of prose plus shell logic.

** TODO Deduplicate =.ai/= template source

Clarify the canonical source for =.ai/= template files and reduce duplication
between repo-local =.ai/= and =claude-templates/.ai/=.

Problem:

- The repo contains both live project =.ai/= files and template
  =claude-templates/.ai/= files.
- Much of the content appears duplicated.
- Agents and humans have to know which one is canonical for a given edit.

Suggested shape:

- Choose one canonical template source.
- Treat project-local =.ai/= as this repo's own working copy, not the template
  source, or vice versa.
- Add a manifest/doctor check that says which paths are generated, copied, or
  project-owned.
- Consider excluding live session state from template sync completely.

Why it helps:

- Prevents edits landing in the wrong copy.
- Reduces files agents must inspect.
- Makes audit output easier to trust.

** TODO Tighten generated/cache exclusions

Make default inventory, audit, and review commands ignore generated/vendor/cache
content more aggressively.

Problem:

- Directory scans can include =node_modules=, =__pycache__=, =.pytest_cache=,
  package locks, generated OAuth artifacts, and test caches.
- Even when ignored by git, these files are visible to naïve filesystem reads.

Suggested shape:

- Add a shared ignore file for agent inventory, e.g. =.aiignore= or
  =rulesets-ignore.json=.
- Teach scripts and instructions to use it when summarizing or reviewing.
- Keep intentional lockfiles policy explicit: ignored if local skill dependency
  cache, tracked if reproducibility matters.

Why it helps:

- Reduces token waste during exploration.
- Prevents vendor files from distorting project summaries.
- Makes agent review output focus on authored source.

** TODO Generated project facts snapshot

Maintain a compact =project-facts= file for high-value context.

Problem:

- Important facts exist, but are scattered across README, notes, sessions,
  design docs, Makefile, and task files.
- Each new agent session repeats discovery work.

Suggested shape:

- Generate or maintain =.ai/project-facts.org= or =.ai/project-facts.json=.
- Include:
  - project purpose;
  - install modes;
  - active language bundles;
  - canonical template source;
  - active migrations;
  - recent durable decisions;
  - key commands;
  - known hazards;
  - remote location;
  - current open design direction.

Why it helps:

- Gives agents high-signal context immediately.
- Reduces repeated README/Makefile/design-doc scanning.
- Helps small/local models perform better with limited context.

** TODO Workflow test harness

Add lightweight tests for workflow documentation integrity.

Problem:

- Many workflows are prose specs.
- Prose can drift: triggers can disappear from the index, referenced scripts can
  be renamed, expected output files can change, plugin ownership can be unclear.

Suggested shape:

- Add tests that verify:
  - every workflow file is indexed or classified as a plugin;
  - every indexed workflow exists;
  - every referenced script path exists;
  - every source plugin maps to a parent workflow;
  - required sections exist;
  - workflow names and triggers are unique enough to route.

Why it helps:

- Catches documentation drift before runtime.
- Makes workflow changes safer.
- Increases agent trust in the routing layer.

** TODO Normalize script interfaces

Standardize helper script command-line conventions.

Problem:

- Scripts are useful but vary by interface.
- Agents compose scripts more safely when flags and exit codes are predictable.

Suggested shape:

- For every script where it makes sense, support:
  - =--help=;
  - =--check= or dry-run mode;
  - =--json= for structured agent consumption;
  - stable exit codes;
  - no-op success when there is nothing to do;
  - clear stderr for human-readable failures.
- Document the shared convention once.

Why it helps:

- Improves automation reliability.
- Reduces brittle text parsing.
- Lets agents gather facts with lower token usage by reading JSON summaries.

** TODO User-facing command simplification

Add a few high-level commands for common operator intent.

Problem:

- The Makefile is capable but broad.
- A user or agent may need to remember whether to run =install=, =audit=,
  =doctor=, =install-ai=, =install-lang=, or =catchup-machine=.

Suggested shape:

- Add or polish high-level targets such as:
  - =make status=: summarize install state, dirty state, audit status, and open
    inbox/task counts;
  - =make sync=: safe machine/project sync path;
  - =make health=: doctor + lint + relevant tests;
  - =make bootstrap-project PROJECT=...=: install =.ai/= plus optional language
    bundle in one guided path.

Why it helps:

- Reduces user friction.
- Makes common workflows memorable.
- Gives agents safer entry points than assembling many lower-level commands.

** TODO Durable decision log for rulesets itself

Promote durable project decisions into a concise decision log.

Problem:

- Some important decisions live in sessions, inbox files, or design notes.
- Session archives are good history but not the best place for current truth.

Suggested shape:

- Add =docs/decisions/= or =docs/adr/= for rulesets itself.
- Keep entries short:
  - context;
  - decision;
  - consequences;
  - supersedes/superseded-by links.
- Link design docs and session summaries as supporting material.

Why it helps:

- Makes the current architecture easier to understand.
- Reduces need to reread long historical sessions.
- Helps agents avoid reopening settled questions.

** TODO Local/offline model profile support

Encode model profiles and capability assumptions for hosted and local runtimes.

Problem:

- The generic runtime spec identifies local/offline use as a goal.
- Different agents have different context windows, tool support, speed, and
  reliability.
- Workflows do not yet adapt to those differences.

Suggested shape:

- Add runtime/model profiles such as:
  - hosted high-context coding agent;
  - hosted low-cost/fast agent;
  - local 30B coding model;
  - local 8B fallback.
- For each profile, specify:
  - context budget;
  - preferred summary tiers;
  - tool assumptions;
  - maximum recommended workflow depth;
  - when to ask for human confirmation;
  - when to avoid huge file reads.

Why it helps:

- Makes offline operation practical, not just possible.
- Helps smaller models avoid context overload.
- Lets workflows degrade gracefully by capability.

* Likely Highest-Impact First Steps

1. Generate a compact catalog for workflows, skills, commands, rules, hooks, and
   scripts.
2. Add token-tiered summaries to the highest-traffic workflow/rule files.
3. Replace singleton live session state with per-agent live session files.
4. Clarify and enforce the canonical =.ai/= template source.
5. Add a generated project facts snapshot for fast agent orientation.