docs(spec): fold the Codex review into the agent-runtime spec

The review's top finding was that a single Not-ready label hid an implementable slice, so Status now splits by arc. Phase 1.5 helper instances are READY WITH CAVEATS: the three-ring gate and the manual drills are binding, and the ai-term.el work is a coordinated .emacs.d handoff with an exact artifact. Phases 2-5 stay NOT READY behind a decisions-required section and a Phase 5 reverification prerequisite that demotes the model table to a recommendation. The remaining findings hardened the slice: per-ring rollback actions (including the half-propagated-sync case), the review's test inventory adopted as normative, a message contract for stale helper files, and explicit roster-unavailable behavior on unsupported platforms. All recommendations were accepted except the document split, which was modified into a dual rubric within one document. The review file and dispositions table ride along.
author: Craig Jennings <c@cjennings.net> 2026-06-12 02:24:01 -0500
committer: Craig Jennings <c@cjennings.net> 2026-06-12 02:24:01 -0500
commit: 22e19c21e6aabe0319d4b09a862f4a3705c92509
tree: 5210f7ac7bba64b7b1b0b23f56d14f4dafefaf9a /docs/design/2026-05-28-generic-agent-runtime-spec.org
parent: c6fd73441ef0b683abb859863dcd0d48377a4838
1 file changed, 125 insertions, 31 deletions
diff --git a/docs/design/2026-05-28-generic-agent-runtime-spec.org b/docs/design/2026-05-28-generic-agent-runtime-spec.org
index 40a97b4..0b37814 100644
--- a/docs/design/2026-05-28-generic-agent-runtime-spec.org
+++ b/docs/design/2026-05-28-generic-agent-runtime-spec.org
@@ -65,30 +65,38 @@ under-specified — spawning a second Claude in the same project to look things
 up or update tasks safely — and a new Phase 1.5 sequences that slice ahead of
 the runtime-neutral phases 2-6, which remain pending a go/no-go.
 
-*NOT IMPLEMENTATION-READY* (Craig, 2026-06-11, after the fourth design
-revision). The helper-instance design iterated four times in one evening;
-holding it open until the known gaps close. Readiness checklist — all of
-these before any build starts:
-
-- [X] Emacs launch surface designed (see the open-issue subsection in the
-  helper section): every place a session can be born routes through, or is
-  caught by, the deterministic path. /Closed 2026-06-12: mechanics verified
+*Readiness is split by arc* (per the 2026-06-12 Codex review's top finding
+that the spec holds two different projects and one label misrepresented both):
+
+- *Phase 1.5 — helper instances: READY WITH CAVEATS* (2026-06-12). The
+  caveats are binding, not advisory: the three-ring pre-live gate governs
+  every merge into synced template paths; the manual drills are gates, not
+  suggestions; and the =ai-term.el= work lands as a coordinated
+  cross-project handoff to =~/.emacs.d= (the exact artifact is named in
+  Phase 1.5), so the rulesets side isn't "done" while the F9 path is
+  still unsafe.
+- *Phases 2-5 — runtime-neutral refactor: NOT READY.* Blocked on the
+  /Decisions required before phases 2-5/ section under Open decisions, and
+  on Phase 5's reverification prerequisite (the local-model table is a
+  recommendation, not an implementation constant). Parked pending Craig's
+  go/no-go on the arc.
+
+The original readiness checklist, resolved:
+
+- [X] Emacs launch surface designed. /Closed 2026-06-12: mechanics verified
   in ai-term.el's code, integration design written, the three open calls
   confirmed by Craig (roster-only sharing, singleton primary,
   helper-mode.org as canonical home)./
-- [ ] Pre-live test strategy agreed (see Test strategy): sandbox drills
-  pass, and the rollout is gated so nothing reaches live projects via
-  template sync until validated — startup.org edits propagate to every
-  project on their next session, so "accidentally live everywhere" is the
-  default failure mode, not an edge case. /The three-ring gating is
-  written; "agreed" lands with the independent review below./
-- [X] A re-read of the whole helper section after the dust settles, since
-  four same-day revisions usually leave a seam somewhere. /Done 2026-06-12:
-  the coherence pass unified the churned subsections and verified the
-  ai-term.el claims against code./
-- [ ] Independent spec review (the =spec-review= cycle, as the KB and
-  consolidation specs got) comes back Ready or Ready-with-caveats, and its
-  dispositions are folded in via =spec-response=.
+- [X] Pre-live test strategy agreed. /The review accepted the three-ring
+  gate as the release-safety mechanism and asked for per-ring rollback
+  actions — added to the gating section./
+- [X] A re-read of the whole helper section after the dust settles. /Done
+  2026-06-12: the coherence pass unified the churned subsections and
+  verified the ai-term.el claims against code./
+- [X] Independent spec review. /Codex, 2026-06-12: Not-ready for the
+  combined spec, Phase 1.5 implementable as a scoped slice — which the
+  split rubric above now states directly. Dispositions folded in the same
+  day; see Review dispositions./
 
 * Problem
 
@@ -423,7 +431,11 @@ Known limits, accepted for v1: an agent session not running as a local
 process on this machine (a cloud session against the same checkout) is
 invisible to the scan; and the match is on process cwd, so an agent started
 from outside the project tree wouldn't be seen. Both are edge shapes the
-operator created deliberately and can manage manually.
+operator created deliberately and can manage manually. The scan also relies
+on Linux's =/proc=: on an unsupported platform the script reports
+"roster unavailable" explicitly (never a silent "alone"), and startup
+treats that result as the pre-live gate's no-op path, the same behavior
+as when the script is absent.
 
 *** Spawn paths: deterministic launcher, startup safety net
 
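The roster subsection describes the scan only in prose: match agent processes on cwd, exclude the scanner's own ancestry, and report "roster unavailable" rather than "alone" off Linux. A minimal Python sketch of that shape follows. It reads the real Linux files =/proc/<pid>/stat=, =/proc/<pid>/comm=, and =/proc/<pid>/cwd=, but it assumes a process counts as an agent when its =comm= name is in a hypothetical `agent_names` set (default `{"claude"}`), and `scan_roster`, `ancestry`, and their parameters are illustrative names, not the real =agent-roster= interface.

```python
import os
import sys
from pathlib import Path

UNAVAILABLE = "roster unavailable"


def parent_pid(pid, proc_root):
    """Return the parent pid from <proc_root>/<pid>/stat, or None if unreadable."""
    try:
        stat = (proc_root / str(pid) / "stat").read_text()
    except OSError:
        return None
    # stat reads "pid (comm) state ppid ..."; comm may contain spaces or ')',
    # so split after the LAST ')' and take the second field (ppid).
    return int(stat.rsplit(")", 1)[1].split()[1])


def ancestry(pid, proc_root):
    """The pid itself plus every ancestor, stopping at pid 0 or an unreadable entry."""
    chain = set()
    while pid and pid not in chain:
        chain.add(pid)
        pid = parent_pid(pid, proc_root)
    return chain


def scan_roster(project, proc_root=Path("/proc"), self_pid=None,
                platform=sys.platform, agent_names=frozenset({"claude"})):
    """Return ("alone" | "not alone" | UNAVAILABLE, sorted pids of other agents)."""
    if not platform.startswith("linux") or not proc_root.is_dir():
        return UNAVAILABLE, []  # explicit result, never a silent "alone"
    # Self-ancestry exclusion: the agent that launched this scan is not "another" agent.
    mine = ancestry(os.getpid() if self_pid is None else self_pid, proc_root)
    project = Path(project).resolve()
    others = []
    for entry in proc_root.iterdir():
        if not entry.name.isdigit() or int(entry.name) in mine:
            continue
        try:
            # ASSUMPTION: agents are identified by their comm name.
            name = (entry / "comm").read_text().strip()
            cwd = Path(os.readlink(entry / "cwd")).resolve()
        except OSError:
            continue  # process exited mid-scan, or we may not inspect it
        if name in agent_names and (cwd == project or project in cwd.parents):
            others.append(int(entry.name))
    return ("not alone", sorted(others)) if others else ("alone", [])
```

Passing a fake `proc_root` and `platform` makes the ancestry-exclusion and unsupported-platform cases from the ring-1 inventory testable without spawning real processes.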
@@ -534,7 +546,10 @@ every personal task and corruption has maximal blast radius.
    stale file would block hygiene forever, so staleness is surfaced as a
    judgment call — the file's own content and timestamps show whether the
    helper is really gone — never silently skipped past and never silently
-   honored indefinitely.
+   honored indefinitely. The surfaced message is contractual (review
+   finding): it names the file path, its timestamps, and the suggested
+   actions (treat as stale and proceed / wait / abort), so the judgment is
+   made on evidence rather than a bare "helper detected" warning.
 2. /A new primary starting while a helper runs./ The previous primary may
    wrap and exit while a helper keeps working; the next =ai= launch becomes
    primary and runs full startup. The existing guards already do the right
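Data-integrity rule 1 makes the stale-helper message contractual: path, timestamps, and suggested actions. One way to honor that contract is sketched below in Python, assuming the helper leaves a file at a path passed in by the caller (the spec's actual location is not reproduced) and reading "timestamps" as the file's modification and metadata-change times. `stale_helper_message` and `ACTIONS` are hypothetical names.

```python
import os
import time
from datetime import datetime

# The three suggested actions named by the contract.
ACTIONS = ("treat as stale and proceed", "wait", "abort")


def stale_helper_message(marker_path, now=None):
    """Return the warning surfaced when a possibly stale helper file is found.

    ASSUMPTION: marker_path is wherever the helper's file lives.
    """
    st = os.stat(marker_path)
    now = time.time() if now is None else now

    def stamp(t):
        return datetime.fromtimestamp(t).isoformat(sep=" ", timespec="seconds")

    age_min = int((now - st.st_mtime) // 60)
    return "\n".join([
        f"Helper file present: {marker_path}",
        f"  modified: {stamp(st.st_mtime)} ({age_min} min ago)",
        f"  metadata changed: {stamp(st.st_ctime)}",
        "  Check its content and timestamps to judge whether the helper is gone.",
        "  Suggested actions: " + " / ".join(ACTIONS),
    ])
```

The message never decides on its own; it only puts the evidence in front of the operator, which keeps the judgment call the rule requires.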
@@ -625,7 +640,12 @@ What remains to design — the integration, not a new surface:
 - The =emacs.md= live-reload discipline applies to the ai-term.el changes,
   and the change lands in the =~/.emacs.d= project (its own repo and
   session scope — a cross-project handoff from rulesets, not a rulesets
-  edit).
+  edit). The handoff artifact is pinned down (review finding, 2026-06-12):
+  implementation step one sends an =inbox-send .emacs.d= handoff carrying
+  this subsection's integration contract plus the recommendations below,
+  and the rulesets task does not close until =.emacs.d= confirms its task
+  is filed or landed — otherwise the shell path ships safe while F9 stays
+  unsafe and nothing tracks the gap.
 
 Recommendations for ai-term.el beyond the helper feature (Craig asked for
 these 2026-06-12; they ride the same handoff):
@@ -799,6 +819,12 @@ Independent of the phases 2-6 go/no-go; same-runtime only.
 
 ** Phase 5: Local model install handoff
 
+- Prerequisite (review finding, 2026-06-12): a reverification task runs
+  first — record /current/ model URLs, file sizes, licenses, backend
+  support, a smoke command, memory fit, and fallback behavior against live
+  sources. The model table in the Introductory note is a recommendation
+  frozen at 2026-05-28, not an implementation constant; nothing in doctor
+  checks or the archsetup handoff bakes it in unverified.
 - Send archsetup an inbox note requesting local model runtime support.
 - After archsetup lands it, teach =rulesets doctor= to verify:
   - =llama-server= or =ollama= installed.
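Phase 5's prerequisite lists what each reverified model entry must record. A hypothetical Python record makes the rule concrete: an entry may feed doctor checks or the archsetup handoff only when every field is filled and it was verified after the 2026-05-28 freeze. The class, field names, and the `usable` rule are illustrative assumptions, not part of the spec.

```python
from dataclasses import dataclass, fields
from datetime import date

TABLE_FROZEN = date(2026, 5, 28)  # when the recommendation-only table was written


@dataclass
class ModelVerification:
    """One reverified row of the local-model table (field names are hypothetical)."""
    model: str
    url: str = ""
    file_size_bytes: int = 0
    license: str = ""
    backends: tuple = ()      # e.g. ("llama.cpp", "ollama")
    smoke_command: str = ""
    memory_fit: str = ""
    fallback: str = ""
    verified_on: date = TABLE_FROZEN

    def missing(self):
        """Names of fields still empty (blank string, zero, or empty tuple)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def usable(self):
        """Only a complete record verified after the freeze may be baked in."""
        return not self.missing() and self.verified_on > TABLE_FROZEN
```

Defaulting `verified_on` to the freeze date means an entry copied straight from the old table is never usable until someone reverifies it.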
@@ -832,24 +858,43 @@ three rings:
    the self-ancestry exclusion against the test's own process chain. The
    startup hook tested for the no-op guarantee: when =agent-roster= is
    absent or reports alone, behavior is byte-identical to today.
+   /Rollback: revert the commit; nothing here touches synced paths yet./
 2. /Sandbox ring./ A disposable project (its own git repo, never
    template-synced back) runs the live drills before any real project sees
    the feature: primary + helper concurrent edits on one org file; the
    corruption drill (primary wrap-up pauses on a live helper); the
    orphaned-helper drill (primary wraps first, helper closes the door,
    tree ends clean); the raw-launch drill (helper started without the
-   launcher gets caught by the startup roster); and an Emacs-surface drill
-   once that design lands.
+   launcher gets caught by the startup roster); and the Emacs F9 drill
+   (helper spawned via ai-term once its handoff lands).
+   /Rollback: delete the sandbox project; no other surface was touched./
 3. /Pilot ring./ The startup detection ships dormant-by-construction —
    the hook is a no-op wherever =agent-roster= is missing, and the script
    ships first to one pilot project only (copied into its
    =.ai/project-scripts/=, which the sync never touches) before the
    template-wide release puts it everywhere. Rulesets itself is the
    natural pilot: it's where a broken sweep is noticed fastest.
+   /Rollback: delete =agent-roster= from the pilot's project-scripts; the
+   hook reverts to its no-op path on the next session./
+4. /Template-wide release./ The startup branch and the script land in the
+   synced template paths only after the pilot soaks.
+   /Rollback: revert the startup.org commit and remove the script from
+   =claude-templates/.ai/scripts/=; the next sync's =--delete= clears every
+   project's copy, and the no-op guarantee means a half-propagated state
+   (some projects synced, some not) is safe in both directions./
+
+Ring-1 test inventory (the review's list, normative): roster alone /
+ancestry-exclusion / not-alone-on-sleeper cases; startup no-op
+byte-identity when roster is missing or alone; startup routes to
+helper-mode and skips pulls/rsync/inbox when not alone; =ai --helper=
+assigns a sanitized id, exports both vars, uses the helper opener; primary
+and helper resolve distinct context paths; helper-originated =inbox-send=
+slugs carry the id; wrap-up pauses on live helpers before hygiene and
+commit; orphaned-helper close runs only when the roster reports alone;
+=todo-cleanup.el= takes a =/tmp= backup before any mutating mode.
 
 Nothing merges past ring 1 into the synced template paths until ring 2's
-drills pass, and the spec's NOT-IMPLEMENTATION-READY marker clears only
-when all three rings are written into the implementation plan.
+drills pass.
 
 * Open decisions
 
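The ring-1 no-op guarantee and the template-wide rollback's half-propagated case both rest on one property of the startup branch: only an affirmative "not alone" changes behavior. A Python sketch of that routing decision follows; the real branch lives in startup.org, so `route_startup`, `Startup`, and the status strings are hypothetical stand-ins.

```python
from enum import Enum


class Startup(Enum):
    FULL = "full startup"   # today's path: pulls, rsync, inbox, unchanged
    HELPER = "helper mode"  # skips pulls, rsync, and inbox processing


def route_startup(roster_status):
    """Choose the startup path from the roster result.

    roster_status is None when agent-roster is absent from the project,
    else the status string the script reported.
    """
    # Only an explicit "not alone" changes behavior. Absent, "alone",
    # "roster unavailable", and any unrecognized output all fall through
    # to the unchanged path, keeping the hook dormant by construction.
    if roster_status == "not alone":
        return Startup.HELPER
    return Startup.FULL
```

Because a project without the script routes exactly like one whose script reports "alone", a sync that reached only some projects leaves every project on a safe path.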
@@ -866,8 +911,57 @@ when all three rings are written into the implementation plan.
 - Which local agent CLI should be the first supported offline editor:
   =aider=, =opencode=, a simple custom wrapper, or something else?
 
+** Decisions required before phases 2-5 — added 2026-06-12 (review finding)
+
+These are the blocker subset of the open decisions above, plus two the
+review added. Phases 2-5 stay NOT READY until each has an accepted answer;
+deciding them inside code is the failure mode this section prevents.
+
+1. Generic instruction-file strategy (=AGENTS.md= / =AI.md= /
+   runtime-specific only).
+2. Default local runtime manager/server (=llama.cpp= only vs =ollama=
+   as the beginner default).
+3. First supported local editing CLI.
+4. Phase-2 adapter scope: Claude + one local runtime only, or Codex
+   support immediately.
+5. Compatibility behavior for existing =CLAUDE.md= / =.claude/= projects
+   during the transition.
+
 * Recommended next step
 
-Start with Phase 1 only. The singleton session-context file is the immediate
-correctness issue for simultaneous agents, and it can be fixed without renaming
-the whole repository or disrupting current Claude installs.
+Updated 2026-06-12: implement Phase 1.5 under its READY-WITH-CAVEATS rubric
+(the helper task in todo.org carries the plan). Phases 2-5 stay parked until
+the decisions section above is answered and Craig calls the go/no-go on the
+arc. The original recommendation (start with Phase 1 only) has been carried
+out: Phase 1 shipped.
+
+* Review dispositions — 2026-06-12 Codex review
+
+Every recommendation from [[file:2026-05-28-generic-agent-runtime-spec-review.org][the review]], dispositioned:
+
+| Recommendation | Disposition | Where it landed |
+|----------------+-------------+-----------------|
+| Split readiness labels by arc | Accept | Status: dual rubric (1.5 READY WITH CAVEATS, 2-5 NOT READY) |
+| "Decisions required before phases 2-5" section | Accept | Open decisions, new subsection (5 items) |
+| Phase 5 reverification prerequisite | Accept | Phase 5, first bullet; model table marked recommendation-only |
+| Exact =.emacs.d= handoff artifact | Accept | Emacs subsection: =inbox-send= handoff is implementation step one; task closes on =.emacs.d= confirmation |
+| Per-ring rollback actions | Accept | Pre-live gating: rollback line per ring, incl. the half-propagated-sync case |
+| Stale-helper message contract | Accept | Data-integrity rule 1: path + timestamps + suggested actions |
+| Roster unsupported-platform behavior | Accept | Roster subsection: explicit "roster unavailable" result, no-op path |
+| Ring-1 test inventory | Accept | Pre-live gating: the review's list adopted as normative |
+| Docs-update list for 1.5 | Accept | Already in Phase 1.5 items; INDEX auto-routed note included |
+| Physically split the spec (open question) | Modify | Dual rubric in one document; phases 2-5 stay parked here pending Craig's go/no-go, matching the standing task framing |
+
+Rejections: none.
+
+* Review and iteration history
+
+** 2026-06-12 Fri @ 02:09:10 -0500 — Codex — reviewer
+- What changed or was recommended: ran the spec-review workflow and wrote a formal review. Rubric for the whole spec: =Not ready=. Phase 1 is already shipped; Phase 1.5 helper instances are implementable as a scoped slice with the existing rollout/manual-validation caveats; phases 2-5 remain blocked on product choices and time-sensitive local-runtime/model verification.
+- Why: the spec now combines a concrete same-runtime helper implementation with a broader runtime-neutral refactor whose instruction-file, local-runtime, first-CLI, and adapter-scope decisions are still open.
+- Artifacts: [[file:2026-05-28-generic-agent-runtime-spec-review.org][2026-05-28-generic-agent-runtime-spec-review.org]]; existing [[file:../../todo.org][todo.org]] entries for "Helper-instance support" and "Generic agent runtime support — Codex spec v0" updated with the review outcome.
+
+** 2026-06-12 Fri @ 02:23:04 -0500 — Claude — author (spec-response)
+- What changed: folded the review in. All recommendations accepted except the document-split open question, modified to a dual rubric in one document (see Review dispositions). Status now labels Phase 1.5 READY WITH CAVEATS and phases 2-5 NOT READY; the original readiness checklist is fully resolved.
+- Why: the review's top finding was that one Not-ready label hid an implementable slice; the rest hardened the slice's rollout (per-ring rollbacks, normative test inventory, exact .emacs.d handoff artifact, stale-helper message contract, roster platform behavior) and fenced the parked arc (decisions-required section, Phase 5 reverification prerequisite).
+- Artifacts: this spec's Status, Pre-live gating, Phase 5, Open decisions, and Review dispositions sections; the helper task in todo.org carries the same caveats.