docs/design/2026-05-28-generic-agent-runtime-spec.org


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471

#+TITLE: Spec: Generic Agent Runtime Support for rulesets
#+AUTHOR: Codex
#+DATE: 2026-05-28
#+STARTUP: showall

* Introductory note

Craig asked for a design pass on making =rulesets= generic rather than
Claude-Code-specific. The motivating case is offline operation: if he is on a
laptop without network, a local LLM should still be able to use the same project
structure, workflows, memory, and cross-agent conventions. The design also needs
to support two different LLMs running in the same project at the same time,
without trampling each other's live session state.

I read the current =rulesets= checkout and found that the reusable core is
already there: =.ai/= workflows, scripts, cross-agent comms, inboxes, and
project startup structure are not inherently Claude-specific. The Claude
assumptions live mostly in naming, install destinations, launcher behavior,
per-language bundle layout, hook APIs, and a single active
=.ai/session-context.org= file.

Hardware notes:

- This machine is the high-end local-LLM target: AMD Ryzen AI Max+ 395, 128 GiB
  RAM, Radeon 8060S / Strix Halo unified memory. For offline agentic coding, I
  recommend installing =Qwen3-Coder-30B-A3B-Instruct-GGUF= as the default local
  coding model, preferably =Q6_K= on this machine and =Q4_K_M= as the compatibility
  quant. It is code-specialized, Apache-2.0, and its GGUF files fit comfortably.
  For a stronger general fallback on this machine, also install
  =Qwen3-Next-80B-A3B-Instruct-GGUF= =Q4_K_M=; it is not as code-specialized but
  gives a much larger model with long context and still fits the 128 GiB system.
- =velox= hardware from =ssh velox inxi -C -G -m -S --filter=: Intel i7-1370P,
  64 GiB DDR4, Intel Iris Xe integrated graphics. For that machine, the strongest
  model I would recommend as normal offline coding stock is
  =Qwen3-Coder-30B-A3B-Instruct-GGUF= =Q4_K_M=. It should fit in RAM with room for
  context, but expect CPU-class latency. Also install an 8B fallback for quick
  edits and low-latency triage.

Suggested archsetup handoff: ask =archsetup= to install the runtime stack
(=llama.cpp= with Vulkan/CPU support, optionally =ollama= as a simple manager),
create a shared model cache, and prefetch the model set above during normal
machine setup when network is available.

Sources checked:

- [[https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct-GGUF][Qwen3-Next-80B-A3B-Instruct-GGUF model card]]: Q4_K_M is 48.4 GB, native
  context length is 262,144 tokens, Apache-2.0.
- [[https://huggingface.co/tensorblock/Qwen_Qwen3-Coder-30B-A3B-Instruct-GGUF][Qwen3-Coder-30B-A3B-Instruct GGUF quant listing]]: Q4_K_M is 18.557 GB,
  Q5_K_M is 21.726 GB, Q6_K is 25.093 GB.
- [[https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct-GGUF][Qwen3-Next model overview]]: 80B total parameters, 3B active, GGUF support via
  =llama.cpp= / =llama-cpp-python=.
- [[https://en.wikipedia.org/wiki/Llama.cpp][llama.cpp overview]]: supports Vulkan, HIP/ROCm, OpenCL, CPU, and other
  backends. For this hardware class, keep the implementation backend-swappable.

* Status

Draft v0. This is not an implementation plan yet; it is a product/architecture
spec for the next =rulesets= refactor.

* Problem

=rulesets= is named and wired as a Claude Code rules distribution:

- Global install targets =~/.claude/skills=, =~/.claude/rules=,
  =~/.claude/hooks=, and =~/.claude/settings.json=.
- Per-project language bundles copy into =.claude/= and seed =CLAUDE.md=.
- The launcher =claude-templates/bin/ai= hard-codes =CLAUDE_CMD=claude= and
  requires the =claude= binary.
- Template documentation says "Claude" throughout =protocols.org=,
  =startup.org=, and the README.
- Hook scripts and settings assume Claude Code's hook protocol and
  =$CLAUDE_PROJECT_DIR=.
- The active session file is a singleton =.ai/session-context.org=, which is
  unsafe when two agents operate in the same project simultaneously.

The result: the good project structure is portable in principle but not in
practice. A local offline model can read files, but there is no generic runtime
contract that tells it where to load rules from, where to record live state, how
to avoid another agent's context file, or how to use the same launcher and
project discovery flow.

* Goals

- Preserve =.ai/= as the project-neutral workflow, memory, scripts, inbox, and
  cross-agent layer.
- Support multiple runtimes:
  - Claude Code as the existing adapter.
  - Codex/OpenAI-compatible hosted agents.
  - Local OpenAI-compatible agents backed by =llama.cpp= / =ollama= / LM Studio.
- Allow two or more agents to work in the same project concurrently without
  sharing a live session-context file.
- Keep current Claude workflows working during migration.
- Make language bundles and team overlays installable for more than one runtime.
- Make offline use a first-class path: rules, workflows, launcher, model cache,
  and local endpoint all work with no network after setup.

* Non-goals for v1

- No attempt to make every Claude hook feature work identically in every runtime.
  Runtimes expose different hook/event APIs.
- No automatic prompt translation that rewrites every rule into every vendor's
  preferred style. V1 should install common rules plus small runtime adapters.
- No local model benchmarking harness. Pick sensible defaults and make the model
  inventory configurable.
- No forced rename of existing =.claude/= installations in existing projects.
  Compatibility matters.

* Current-state findings

** Project-neutral pieces

These can remain conceptually unchanged:

- =.ai/protocols.org= as the behavioral entry point.
- =.ai/workflows/= and =.ai/scripts/= as synced canonical project tooling.
- =.ai/project-workflows/= and =.ai/project-scripts/= as project-owned extension
  points.
- =inbox/= and =inbox/from-agents/= as human and agent inboxes.
- Cross-agent message protocol and scripts. They say "agent" already and are
  mostly model-neutral.

** Claude-specific pieces

Observed files and assumptions:

- =README.org= describes "Claude Code skills, rules, and per-language project
  bundles."
- =Makefile= uses =SKILLS_DIR=$(HOME)/.claude/skills=,
  =RULES_DIR=$(HOME)/.claude/rules=, =HOOKS_DIR=$(HOME)/.claude/hooks=, and
  installs =.claude= config.
- =Makefile deps= installs =@anthropic-ai/claude-code= and checks =claude=.
- =scripts/install-lang.sh= copies common rules into =PROJECT/.claude/rules=,
  copies language-specific =claude/= directories, and seeds =CLAUDE.md=.
- =scripts/sync-language-bundle.sh= fingerprints bundles by
  =PROJECT/.claude/rules= files.
- =scripts/install-team.sh= installs team overlays into =PROJECT/.claude/rules=.
- =scripts/audit.sh= calls the canonical source =claude-templates/.ai=.
- =claude-templates/bin/ai= requires =claude= and launches
  =claude "<project instructions>"= in tmux.
- =languages/elisp/CLAUDE.md= is the project instruction template.
- =languages/elisp/claude/settings.json= uses Claude Code hooks and
  =$CLAUDE_PROJECT_DIR=.

* Proposed model

** Vocabulary

- *Core* — runtime-neutral rules, workflows, scripts, and project conventions.
- *Runtime* — an agent implementation: =claude=, =codex=, =local-openai=,
  =aider-local=, etc.
- *Runtime adapter* — install paths, hook wiring, command template, instruction
  filename, and limitations for one runtime.
- *Agent instance* — one live process/session in one project, identified by
  runtime + host + project + unique suffix.

** Directory model

Keep =.ai/= as the stable project-local core.

Change active session state from a singleton:

#+begin_example
.ai/session-context.org
#+end_example

to an active-session directory:

#+begin_example
.ai/session-context.d/
  <agent-id>.org
.ai/sessions/
  YYYY-MM-DD-HH-MM-<agent-id>-<description>.org
#+end_example

Recommended =agent-id= shape:

#+begin_example
<host>.<project>.<runtime>.<short-id>
#+end_example

Examples:

#+begin_example
pearl.org-drill.claude.a83f
pearl.org-drill.local-qwen30b.19ca
velox.archsetup.local-qwen30b.7712
#+end_example

Compatibility rule: if exactly one active context exists, tools may expose a
temporary =.ai/session-context.org= symlink or legacy copy for old workflows.
New workflows should read/write by =AI_AGENT_ID=.

** Runtime manifest

Add a repository-level runtime manifest:

#+begin_example
runtimes/
  claude.toml
  codex.toml
  local-openai.toml
#+end_example

Each runtime defines:

#+begin_src toml
id = "local-openai"
display_name = "Local OpenAI-compatible agent"
command = "aider"
args = ["--model", "openai/qwen-local", "--openai-api-base", "http://127.0.0.1:11434/v1"]
requires_network = false
project_instruction_files = ["AGENTS.md", ".ai/protocols.org"]
global_install_root = "~/.config/rulesets/runtimes/local-openai"
project_install_dir = ".agents/local-openai"
supports_hooks = "wrapper"
supports_mcp = false
supports_subagents = false
#+end_src

The manifest lets the launcher and install scripts reason about a runtime
without hard-coding Claude paths.

** Source layout

Refactor source directories toward:

#+begin_example
agent-rules/                 # former claude-rules; runtime-neutral where possible
skills/                      # skills with runtime support metadata
ai-templates/.ai/            # former claude-templates/.ai
runtimes/claude/             # Claude adapter
runtimes/codex/              # Codex adapter
runtimes/local-openai/       # local model adapter
languages/elisp/common/      # common language bundle material
languages/elisp/runtimes/claude/
languages/elisp/runtimes/local-openai/
teams/deepsat/common/
teams/deepsat/runtimes/claude/
#+end_example

Do not require a big-bang rename. V1 can support aliases:

- =claude-rules/= remains as a compatibility symlink or wrapper around
  =agent-rules/=.
- =claude-templates/= remains as an alias for =ai-templates/= until all startup
  workflows are updated.
- =languages/<lang>/claude/= remains supported by the Claude adapter.

** Install behavior

Replace "install Claude tooling" with "install runtime adapter":

#+begin_example
make install-runtime RUNTIME=claude
make install-runtime RUNTIME=local-openai
make install-lang LANG=elisp PROJECT=~/code/foo RUNTIME=claude
make install-lang LANG=elisp PROJECT=~/code/foo RUNTIME=local-openai
#+end_example

Claude adapter:

- Global: =~/.claude/skills=, =~/.claude/rules=, =~/.claude/hooks=.
- Project: =.claude/= and =CLAUDE.md=.
- Hook API: Claude Code =settings.json=.

Local OpenAI adapter:

- Global: =~/.config/rulesets/local-openai/= and model server config.
- Project: =.agents/local-openai/= plus =AGENTS.md= or
  =.ai/runtime/local-openai/instructions.md=.
- Hook API: wrapper-level checks only. If the local CLI has no hook protocol,
  hooks become documented commands or wrapper pre/post actions.

Codex adapter:

- Project instruction file should be =AGENTS.md= where supported.
- Runtime-specific config lives under =.agents/codex/= or the tool's native
  config path.

** Launcher behavior

Refactor =claude-templates/bin/ai= into a generic launcher, still named =ai=:

#+begin_example
ai                         # choose project and default runtime
ai --runtime claude .
ai --runtime local-openai .
ai --runtime local-qwen30b ~/code/org-drill
ai --attach
ai --list-runtimes
#+end_example

Launcher responsibilities:

- Discover projects by =.ai/protocols.org=, not by "Claude-template project."
- Select runtime from:
  - explicit =--runtime=,
  - project default in =.ai/runtime.toml=,
  - host default in =~/.config/rulesets/runtime.toml=.
- Create =AI_AGENT_ID= before launch.
- Export:
  - =AI_AGENT_ID=
  - =AI_RUNTIME=
  - =AI_PROJECT_DIR=
  - =AI_SESSION_CONTEXT=.ai/session-context.d/$AI_AGENT_ID.org=
- Use tmux window names that include runtime when needed:
  - =org-drill= if only one agent for the project.
  - =org-drill:claude= and =org-drill:local-qwen30b= if multiple agents exist.
- Pass a runtime-appropriate opening instruction:
  - Claude: current command-line prompt.
  - Local agent: prompt file or initial message that says to read
    =.ai/protocols.org= and use =AI_SESSION_CONTEXT=.

** Session-context contract

Every runtime must obey:

- Never write the legacy singleton when =AI_SESSION_CONTEXT= is set.
- Create the context file lazily on the first state-mutating turn.
- Archive to =.ai/sessions/= with the =agent-id= in the filename.
- Include runtime and model metadata in frontmatter:

#+begin_example
#+TITLE: Session context
#+AGENT_ID: pearl.org-drill.local-qwen30b.19ca
#+RUNTIME: local-openai
#+MODEL: Qwen3-Coder-30B-A3B-Instruct-Q6_K
#+HOST: pearl
#+STARTED: 2026-05-28T...
#+end_example

Startup workflow changes:

- Check =.ai/session-context.d/*.org=, not only =.ai/session-context.org=.
- If the current =AI_AGENT_ID= has a live file, recover it.
- If other active files exist, surface them as "other active agents" but do not
  read them wholesale unless needed. This prevents context contamination.

** Cross-agent updates

The existing cross-agent protocol can stay, but add optional fields:

#+begin_example
#+SENDER_AGENT_ID: pearl.org-drill.claude.a83f
#+SENDER_RUNTIME: claude
#+TARGET_AGENT_ID: pearl.org-drill.local-qwen30b.19ca
#+TARGET_RUNTIME: local-openai
#+MODEL: Qwen3-Coder-30B-A3B-Instruct-Q6_K
#+end_example

Destination syntax can remain =machine.project= for project-level delivery.
Add =machine.project.agent-id= as an optional targeted form when two agents in
the same project are both active.

Receivers should ignore messages targeted at another =TARGET_AGENT_ID= unless
the user explicitly asks them to take over.

** Hook and validation strategy

V1 should not pretend all runtimes have Claude's hooks.

Define hook levels:

| Level | Meaning |
|-------+---------|
| =native= | Runtime has an event/hook API; install native config. |
| =wrapper= | =ai= launcher or helper scripts run checks around common actions. |
| =manual= | Rules document the verification commands; no enforcement. |

Language bundles should declare which hooks are required and which are advisory.
For local runtimes, start with =manual= plus project-level test commands. Add
=wrapper= only where the local agent CLI can route edits through a known command.

** Local model runtime

Install a host-level local model service:

- Preferred low-level runtime: =llama.cpp= server with OpenAI-compatible API.
- Optional manager: =ollama= for simpler model lifecycle where its model catalog
  is enough.
- Model cache: =~/.local/share/llm/models= or =/srv/models/llm=.
- Ports:
  - =127.0.0.1:11434= for =ollama= if installed.
  - =127.0.0.1:8081= for =llama-server= default coding model.
  - =127.0.0.1:8082= for larger/general model when running simultaneously.

Host model recommendations:

| Host | Hardware | Default offline coding model | Larger/secondary model |
|------+----------+------------------------------+------------------------|
| current high-end machine | Ryzen AI Max+ 395, 128 GiB unified RAM, Radeon 8060S | =Qwen3-Coder-30B-A3B-Instruct-GGUF Q6_K= | =Qwen3-Next-80B-A3B-Instruct-GGUF Q4_K_M= |
| velox | i7-1370P, 64 GiB RAM, Intel Iris Xe | =Qwen3-Coder-30B-A3B-Instruct-GGUF Q4_K_M= | 8B fallback for speed |

Rationale:

- The Qwen3-Coder 30B GGUF sizes leave enough headroom for context and a second
  agent on both machines.
- The high-end machine can also carry Qwen3-Next 80B Q4_K_M at 48.4 GB, useful
  for long-context planning or general reasoning offline.
- =velox= is memory-capable but GPU-limited; Qwen3-Coder 30B Q4_K_M is the
  strongest practical coding default before latency becomes the dominant pain.

* Migration plan

** Phase 1: Add runtime identity without renaming everything

- Teach =ai= launcher to set =AI_AGENT_ID=, =AI_RUNTIME=, =AI_PROJECT_DIR=, and
  =AI_SESSION_CONTEXT=.
- Update startup/wrap-up workflows to prefer =AI_SESSION_CONTEXT=.
- Keep legacy =.ai/session-context.org= fallback.
- Add tests for two simultaneous session-context files.

** Phase 2: Introduce runtime manifests and generic install commands

- Add =runtimes/claude.toml= and make current install behavior data-driven.
- Add =runtimes/local-openai.toml= with command templates.
- Add =make install-runtime= and keep =make install= as Claude-compatible alias.

** Phase 3: Split common language bundles from runtime adapters

- Move runtime-neutral language rules into =languages/<lang>/common=.
- Keep Claude-specific settings/hooks under =languages/<lang>/runtimes/claude=.
- Add local-openai adapter docs/instructions for at least elisp.

** Phase 4: Rename user-facing docs

- Rename =claude-templates= to =ai-templates= after compatibility aliases exist.
- Rename =claude-rules= to =agent-rules= after scripts no longer hard-code it.
- Update docs from "Claude should" to "the active agent should" where the rule is
  runtime-neutral.
- Keep a short Claude adapter README for Claude-only behavior.

** Phase 5: Local model install handoff

- Send archsetup an inbox note requesting local model runtime support.
- After archsetup lands it, teach =rulesets doctor= to verify:
  - =llama-server= or =ollama= installed.
  - configured model files exist.
  - configured OpenAI-compatible endpoint can answer a smoke prompt.

* Test strategy

- Unit-test launcher runtime selection and =AI_AGENT_ID= generation.
- Unit-test session-context path generation and archival names.
- Integration-test two fake runtimes launching the same project into distinct
  context files.
- Test =sync-language-bundle.sh= compatibility for legacy Claude bundles.
- Test install-lang for:
  - =RUNTIME=claude= writes =.claude/= and =CLAUDE.md=.
  - =RUNTIME=local-openai= writes =.agents/local-openai/= and does not touch
    =.claude/=.
- Test startup workflow examples or scripts so they look for
  =session-context.d= without breaking old projects.
- Test cross-agent targeted messages with =TARGET_AGENT_ID=.

* Open decisions

- What should the generic project instruction file be: =AGENTS.md=,
  =AI.md=, or runtime-specific only?
- Should =.ai/session-context.org= become a symlink to the current agent's file,
  or should it disappear after migration?
- Should =rulesets= standardize on =llama.cpp= only, or support =ollama= as the
  default beginner-friendly local runtime?
- Which local agent CLI should be the first supported offline editor:
  =aider=, =opencode=, a simple custom wrapper, or something else?

* Recommended next step

Start with Phase 1 only. The singleton session-context file is the immediate
correctness issue for simultaneous agents, and it can be fixed without renaming
the whole repository or disrupting current Claude installs.