path: root/docs/design/2026-05-28-generic-agent-runtime-spec.org
author: Craig Jennings <c@cjennings.net>, 2026-06-12 02:24:01 -0500
committer: Craig Jennings <c@cjennings.net>, 2026-06-12 02:24:01 -0500
commit: 22e19c21e6aabe0319d4b09a862f4a3705c92509
tree: 5210f7ac7bba64b7b1b0b23f56d14f4dafefaf9a /docs/design/2026-05-28-generic-agent-runtime-spec.org
parent: c6fd73441ef0b683abb859863dcd0d48377a4838
docs(spec): fold the Codex review into the agent-runtime spec
The review's top finding was that one Not-ready label hid an implementable slice. Status now splits by arc: Phase 1.5 helper instances are READY WITH CAVEATS (the three-ring gate and the manual drills are binding, and the ai-term.el work is a coordinated .emacs.d handoff with an exact artifact), while phases 2-5 stay NOT READY behind a decisions-required section and a Phase 5 reverification prerequisite that demotes the model table to a recommendation. The remaining findings hardened the slice: per-ring rollback actions including the half-propagated-sync case, the review's test inventory adopted as normative, a message contract for stale helper files, and explicit roster-unavailable behavior on unsupported platforms. All recommendations accepted except the document split, modified to a dual rubric in one document. The review file and dispositions table ride along.
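The roster behavior described in this commit (an explicit "roster unavailable" result on unsupported platforms, never a silent "alone") can be sketched as a three-state result plus the startup routing it feeds. This is a minimal illustration, not the actual =agent-roster= script: the names `Roster`, `classify`, and `startup_path` are hypothetical, and the peer count is passed in rather than read from =/proc=.

```python
from enum import Enum
from typing import Optional


class Roster(Enum):
    """Hypothetical three-state roster result (names are assumptions)."""
    ALONE = "alone"
    NOT_ALONE = "not alone"
    UNAVAILABLE = "roster unavailable"


def classify(platform: str, peer_count: int) -> Roster:
    """Map a platform string and a count of other agent processes to a result.

    The scan is Linux-/proc-specific, so any other platform yields an
    explicit UNAVAILABLE rather than pretending the session is alone.
    """
    if not platform.startswith("linux"):
        return Roster.UNAVAILABLE
    return Roster.NOT_ALONE if peer_count > 0 else Roster.ALONE


def startup_path(result: Optional[Roster]) -> str:
    """Pick the startup branch; None models the roster script being absent.

    Only a positive NOT_ALONE result routes to helper mode. Absent, alone,
    and unavailable all take the no-op path.
    """
    if result is Roster.NOT_ALONE:
        return "helper-mode"
    return "no-op"
```

Keeping "unavailable" distinct from "alone" in the result type, while collapsing both onto the no-op path at the routing step, preserves the evidence for the operator without changing startup behavior.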
Diffstat (limited to 'docs/design/2026-05-28-generic-agent-runtime-spec.org')
-rw-r--r-- docs/design/2026-05-28-generic-agent-runtime-spec.org | 156
1 files changed, 125 insertions, 31 deletions
diff --git a/docs/design/2026-05-28-generic-agent-runtime-spec.org b/docs/design/2026-05-28-generic-agent-runtime-spec.org
index 40a97b4..0b37814 100644
--- a/docs/design/2026-05-28-generic-agent-runtime-spec.org
+++ b/docs/design/2026-05-28-generic-agent-runtime-spec.org
@@ -65,30 +65,38 @@ under-specified — spawning a second Claude in the same project to look things
up or update tasks safely — and a new Phase 1.5 sequences that slice ahead of
the runtime-neutral phases 2-6, which remain pending a go/no-go.
-*NOT IMPLEMENTATION-READY* (Craig, 2026-06-11, after the fourth design
-revision). The helper-instance design iterated four times in one evening;
-holding it open until the known gaps close. Readiness checklist — all of
-these before any build starts:
-
-- [X] Emacs launch surface designed (see the open-issue subsection in the
- helper section): every place a session can be born routes through, or is
- caught by, the deterministic path. /Closed 2026-06-12: mechanics verified
+*Readiness is split by arc* (per the 2026-06-12 Codex review's top finding —
+the spec contains two different projects and one label misled):
+
+- *Phase 1.5 — helper instances: READY WITH CAVEATS* (2026-06-12). The
+ caveats are binding, not advisory: the three-ring pre-live gate governs
+ every merge into synced template paths; the manual drills are gates, not
+ suggestions; and the =ai-term.el= work lands as a coordinated
+ cross-project handoff to =~/.emacs.d= (the exact artifact is named in
+ Phase 1.5), so the rulesets side isn't "done" while the F9 path is
+ still unsafe.
+- *Phases 2-5 — runtime-neutral refactor: NOT READY.* Blocked on the
+ /Decisions required before phases 2-5/ section under Open decisions, and
+ on Phase 5's reverification prerequisite (the local-model table is a
+ recommendation, not an implementation constant). Parked pending Craig's
+ go/no-go on the arc.
+
+The original readiness checklist, resolved:
+
+- [X] Emacs launch surface designed. /Closed 2026-06-12: mechanics verified
in ai-term.el's code, integration design written, the three open calls
confirmed by Craig (roster-only sharing, singleton primary,
helper-mode.org as canonical home)./
-- [ ] Pre-live test strategy agreed (see Test strategy): sandbox drills
- pass, and the rollout is gated so nothing reaches live projects via
- template sync until validated — startup.org edits propagate to every
- project on their next session, so "accidentally live everywhere" is the
- default failure mode, not an edge case. /The three-ring gating is
- written; "agreed" lands with the independent review below./
-- [X] A re-read of the whole helper section after the dust settles, since
- four same-day revisions usually leave a seam somewhere. /Done 2026-06-12:
- the coherence pass unified the churned subsections and verified the
- ai-term.el claims against code./
-- [ ] Independent spec review (the =spec-review= cycle, as the KB and
- consolidation specs got) comes back Ready or Ready-with-caveats, and its
- dispositions are folded in via =spec-response=.
+- [X] Pre-live test strategy agreed. /The review accepted the three-ring
+ gate as the release-safety mechanism and asked for per-ring rollback
+ actions — added to the gating section./
+- [X] A re-read of the whole helper section after the dust settles. /Done
+ 2026-06-12: the coherence pass unified the churned subsections and
+ verified the ai-term.el claims against code./
+- [X] Independent spec review. /Codex, 2026-06-12: Not-ready for the
+ combined spec, Phase 1.5 implementable as a scoped slice — which the
+ split rubric above now states directly. Dispositions folded in the same
+ day; see Review dispositions./
* Problem
@@ -423,7 +431,11 @@ Known limits, accepted for v1: an agent session not running as a local
process on this machine (a cloud session against the same checkout) is
invisible to the scan; and the match is on process cwd, so an agent started
from outside the project tree wouldn't be seen. Both are edge shapes the
-operator created deliberately and can manage manually.
+operator created deliberately and can manage manually. The scan is also
+Linux-=/proc=-specific: on an unsupported platform the script reports
+"roster unavailable" explicitly (never a silent "alone"), and startup
+treats that result as the no-op path from the pre-live gate — same behavior
+as the script being absent.
*** Spawn paths: deterministic launcher, startup safety net
@@ -534,7 +546,10 @@ every personal task and corruption has maximal blast radius.
stale file would block hygiene forever, so staleness is surfaced as a
judgment call — the file's own content and timestamps show whether the
helper is really gone — never silently skipped past and never silently
- honored indefinitely.
+ honored indefinitely. The surfaced message is contractual (review
+ finding): it names the file path, its timestamps, and the suggested
+ actions (treat as stale and proceed / wait / abort), so the judgment is
+ made on evidence rather than a bare "helper detected" warning.
2. /A new primary starting while a helper runs./ The previous primary may
wrap and exit while a helper keeps working; the next =ai= launch becomes
primary and runs full startup. The existing guards already do the right
@@ -625,7 +640,12 @@ What remains to design — the integration, not a new surface:
- The =emacs.md= live-reload discipline applies to the ai-term.el changes,
and the change lands in the =~/.emacs.d= project (its own repo and
session scope — a cross-project handoff from rulesets, not a rulesets
- edit).
+ edit). The handoff artifact is exact (review finding, 2026-06-12):
+ implementation step one sends an =inbox-send .emacs.d= handoff carrying
+ this subsection's integration contract plus the recommendations below,
+ and the rulesets task does not close until =.emacs.d= confirms its task
+ is filed or landed — otherwise the shell path ships safe while F9 stays
+ unsafe and nothing tracks the gap.
Recommendations for ai-term.el beyond the helper feature (Craig asked for
these 2026-06-12; they ride the same handoff):
@@ -799,6 +819,12 @@ Independent of the phases 2-6 go/no-go; same-runtime only.
** Phase 5: Local model install handoff
+- Prerequisite (review finding, 2026-06-12): a reverification task runs
+ first — record /current/ model URLs, file sizes, licenses, backend
+ support, a smoke command, memory fit, and fallback behavior against live
+ sources. The model table in the Introductory note is a recommendation
+ frozen at 2026-05-28, not an implementation constant; nothing in doctor
+ checks or the archsetup handoff bakes it in unverified.
- Send archsetup an inbox note requesting local model runtime support.
- After archsetup lands it, teach =rulesets doctor= to verify:
- =llama-server= or =ollama= installed.
@@ -832,24 +858,43 @@ three rings:
the self-ancestry exclusion against the test's own process chain. The
startup hook tested for the no-op guarantee: when =agent-roster= is
absent or reports alone, behavior is byte-identical to today.
+ /Rollback: revert the commit; nothing here touches synced paths yet./
2. /Sandbox ring./ A disposable project (its own git repo, never
template-synced back) runs the live drills before any real project sees
the feature: primary + helper concurrent edits on one org file; the
corruption drill (primary wrap-up pauses on a live helper); the
orphaned-helper drill (primary wraps first, helper closes the door,
tree ends clean); the raw-launch drill (helper started without the
- launcher gets caught by the startup roster); and an Emacs-surface drill
- once that design lands.
+ launcher gets caught by the startup roster); and the Emacs F9 drill
+ (helper spawned via ai-term once its handoff lands).
+ /Rollback: delete the sandbox project; no other surface was touched./
3. /Pilot ring./ The startup detection ships dormant-by-construction —
the hook is a no-op wherever =agent-roster= is missing, and the script
ships first to one pilot project only (copied into its
=.ai/project-scripts/=, which the sync never touches) before the
template-wide release puts it everywhere. Rulesets itself is the
natural pilot: it's where a broken sweep is noticed fastest.
+ /Rollback: delete =agent-roster= from the pilot's project-scripts; the
+ hook reverts to its no-op path on the next session./
+4. /Template-wide release./ The startup branch and the script land in the
+ synced template paths only after the pilot soaks.
+ /Rollback: revert the startup.org commit and remove the script from
+ =claude-templates/.ai/scripts/=; the next sync's =--delete= clears every
+ project's copy, and the no-op guarantee means a half-propagated state
+ (some projects synced, some not) is safe in both directions./
+
+Ring-1 test inventory (the review's list, normative): roster alone /
+ancestry-exclusion / not-alone-on-sleeper cases; startup no-op
+byte-identity when roster is missing or alone; startup routes to
+helper-mode and skips pulls/rsync/inbox when not alone; =ai --helper=
+assigns a sanitized id, exports both vars, uses the helper opener; primary
+and helper resolve distinct context paths; helper-originated =inbox-send=
+slugs carry the id; wrap-up pauses on live helpers before hygiene and
+commit; orphaned-helper close runs only when the roster reports alone;
+=todo-cleanup.el= takes a =/tmp= backup before any mutating mode.
Nothing merges past ring 1 into the synced template paths until ring 2's
-drills pass, and the spec's NOT-IMPLEMENTATION-READY marker clears only
-when all three rings are written into the implementation plan.
+drills pass.
* Open decisions
@@ -866,8 +911,57 @@ when all three rings are written into the implementation plan.
- Which local agent CLI should be the first supported offline editor:
=aider=, =opencode=, a simple custom wrapper, or something else?
+** Decisions required before phases 2-5 — added 2026-06-12 (review finding)
+
+These are the blocker subset of the open decisions above, plus two the
+review added. Phases 2-5 stay NOT READY until each has an accepted answer;
+deciding them inside code is the failure mode this section prevents.
+
+1. Generic instruction-file strategy (=AGENTS.md= / =AI.md= /
+ runtime-specific only).
+2. Default local runtime manager/server (=llama.cpp= only vs =ollama=
+ as the beginner default).
+3. First supported local editing CLI.
+4. Phase-2 adapter scope: Claude + one local runtime only, or Codex
+ support immediately.
+5. Compatibility behavior for existing =CLAUDE.md= / =.claude/= projects
+ during the transition.
+
* Recommended next step
-Start with Phase 1 only. The singleton session-context file is the immediate
-correctness issue for simultaneous agents, and it can be fixed without renaming
-the whole repository or disrupting current Claude installs.
+Updated 2026-06-12: implement Phase 1.5 under its READY-WITH-CAVEATS rubric
+(the helper task in todo.org carries the plan). Phases 2-5 stay parked until
+the decisions section above is answered and Craig calls the go/no-go on the
+arc. The original recommendation — start with Phase 1 only — is complete:
+Phase 1 shipped.
+
+* Review dispositions — 2026-06-12 Codex review
+
+Every recommendation from [[file:2026-05-28-generic-agent-runtime-spec-review.org][the review]], dispositioned:
+
+| Recommendation | Disposition | Where it landed |
+|----------------+-------------+-----------------|
+| Split readiness labels by arc | Accept | Status: dual rubric (1.5 READY WITH CAVEATS, 2-5 NOT READY) |
+| "Decisions required before phases 2-5" section | Accept | Open decisions, new subsection (5 items) |
+| Phase 5 reverification prerequisite | Accept | Phase 5, first bullet; model table marked recommendation-only |
+| Exact =.emacs.d= handoff artifact | Accept | Emacs subsection: =inbox-send= handoff is implementation step one; task closes on =.emacs.d= confirmation |
+| Per-ring rollback actions | Accept | Pre-live gating: rollback line per ring, incl. the half-propagated-sync case |
+| Stale-helper message contract | Accept | Data-integrity rule 1: path + timestamps + suggested actions |
+| Roster unsupported-platform behavior | Accept | Roster subsection: explicit "roster unavailable" result, no-op path |
+| Ring-1 test inventory | Accept | Pre-live gating: the review's list adopted as normative |
+| Docs-update list for 1.5 | Accept | Already in Phase 1.5 items; INDEX auto-routed note included |
+| Physically split the spec (open question) | Modify | Dual rubric in one document; phases 2-5 stay parked here pending Craig's go/no-go, matching the standing task framing |
+
+Rejections: none.
+
+* Review and iteration history
+
+** 2026-06-12 Fri @ 02:09:10 -0500 — Codex — reviewer
+- What changed or was recommended: ran the spec-review workflow and wrote a formal review. Rubric for the whole spec: =Not ready=. Phase 1 is already shipped; Phase 1.5 helper instances are implementable as a scoped slice with the existing rollout/manual-validation caveats; phases 2-5 remain blocked on product choices and time-sensitive local-runtime/model verification.
+- Why: the spec now combines a concrete same-runtime helper implementation with a broader runtime-neutral refactor whose instruction-file, local-runtime, first-CLI, and adapter-scope decisions are still open.
+- Artifacts: [[file:2026-05-28-generic-agent-runtime-spec-review.org][2026-05-28-generic-agent-runtime-spec-review.org]]; existing [[file:../../todo.org][todo.org]] entries for "Helper-instance support" and "Generic agent runtime support — Codex spec v0" updated with the review outcome.
+
+** 2026-06-12 Fri @ 02:23:04 -0500 — Claude — author (spec-response)
+- What changed: folded the review in. All recommendations accepted except the document-split open question, modified to a dual rubric in one document (see Review dispositions). Status now labels Phase 1.5 READY WITH CAVEATS and phases 2-5 NOT READY; the original readiness checklist is fully resolved.
+- Why: the review's top finding was that one Not-ready label hid an implementable slice; the rest hardened the slice's rollout (per-ring rollbacks, normative test inventory, exact .emacs.d handoff artifact, stale-helper message contract, roster platform behavior) and fenced the parked arc (decisions-required section, Phase 5 reverification prerequisite).
+- Artifacts: this spec's Status, Pre-live gating, Phase 5, Open decisions, and Review dispositions sections; the helper task in todo.org carries the same caveats.
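
The stale-helper message contract added to data-integrity rule 1 (file path, timestamps, and the three suggested actions) could look roughly like the sketch below. It is an illustration under assumptions, not the spec's implementation: the function name `stale_helper_message` is hypothetical, and the timestamp shown is the file's modification time, since the spec does not say which timestamps the message must carry.

```python
import os
from datetime import datetime, timezone

# The three actions named in the spec's message contract.
SUGGESTED_ACTIONS = ("treat as stale and proceed", "wait", "abort")


def stale_helper_message(path: str) -> str:
    """Build the surfaced message for a possibly stale helper file.

    Reports evidence (path and modification time) and the suggested
    actions, so the operator judges staleness rather than the tool.
    """
    stat = os.stat(path)
    # Assumption: mtime stands in for "its timestamps" in the spec.
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat()
    lines = [
        f"Helper file found: {path}",
        f"Last modified (UTC): {modified}",
        "Suggested actions: " + " / ".join(SUGGESTED_ACTIONS),
    ]
    return "\n".join(lines)
```

The function only reports; it never deletes or honors the file on its own, which matches the rule that staleness is neither silently skipped nor silently honored.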