| author | Craig Jennings <c@cjennings.net> | 2026-06-12 02:24:01 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-06-12 02:24:01 -0500 |
| commit | 22e19c21e6aabe0319d4b09a862f4a3705c92509 | |
| tree | 5210f7ac7bba64b7b1b0b23f56d14f4dafefaf9a /docs/design/2026-05-28-generic-agent-runtime-spec.org | |
| parent | c6fd73441ef0b683abb859863dcd0d48377a4838 | |
docs(spec): fold the Codex review into the agent-runtime spec
The review's top finding was that one Not-ready label hid an implementable slice. Status now splits by arc: Phase 1.5 helper instances are READY WITH CAVEATS (the three-ring gate and the manual drills are binding, and the ai-term.el work is a coordinated .emacs.d handoff with an exact artifact), while phases 2-5 stay NOT READY behind a decisions-required section and a Phase 5 reverification prerequisite that demotes the model table to a recommendation.
The remaining findings hardened the slice: per-ring rollback actions including the half-propagated-sync case, the review's test inventory adopted as normative, a message contract for stale helper files, and explicit roster-unavailable behavior on unsupported platforms. All recommendations accepted except the document split, modified to a dual rubric in one document. The review file and dispositions table ride along.
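The message contract for stale helper files is concrete enough to sketch: the surfaced text must name the file path, its timestamps, and the suggested actions (treat as stale and proceed / wait / abort). Below is a minimal Python sketch. It assumes the file's modification time is the timestamp that matters. `stale_helper_message` and its output layout are hypothetical; only the three actions come from the spec.

```python
from __future__ import annotations

import os
import time
from datetime import datetime, timezone

# The three choices the spec's message contract requires the message to offer.
SUGGESTED_ACTIONS = ("treat as stale and proceed", "wait", "abort")


def stale_helper_message(path: str, now: float | None = None) -> str:
    """Surface a possibly stale helper file with the evidence needed to judge it.

    Hypothetical helper: assumes mtime is the relevant timestamp.
    """
    mtime = os.stat(path).st_mtime
    now = time.time() if now is None else now
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(timespec="seconds")
    age_min = int((now - mtime) // 60)
    lines = [
        f"Helper file still present: {path}",
        f"  last modified: {stamp} ({age_min} min ago)",
        "  suggested actions:",
    ]
    lines += [f"    {i}. {action}" for i, action in enumerate(SUGGESTED_ACTIONS, 1)]
    return "\n".join(lines)
```

Putting the age next to the absolute timestamp lets the operator judge "is this helper really gone?" at a glance. A bare "helper detected" warning would not give them that.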
Diffstat (limited to 'docs/design/2026-05-28-generic-agent-runtime-spec.org')
| -rw-r--r-- | docs/design/2026-05-28-generic-agent-runtime-spec.org | 156 |
1 files changed, 125 insertions, 31 deletions
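The commit message calls the three-ring gate binding. In the diff below, the pre-live gating section lists the rings plus a template-wide release step, each with its own rollback line. Only that last step writes to synced template paths. Below is a minimal Python sketch of the ordering. It assumes the gate reduces to a pass/fail result per stage, with each stage opening only after every earlier one has passed. The `Stage` type, the stage names, and both functions are hypothetical; the rollback strings paraphrase the spec.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    touches_synced_paths: bool  # does this stage write to template-synced paths?
    rollback: str               # paraphrase of the spec's per-ring rollback line


# Hypothetical names; order matters: a stage opens only after all earlier ones pass.
STAGES = (
    Stage("ring-1", False, "revert the commit"),
    Stage("sandbox", False, "delete the sandbox project"),
    Stage("pilot", False, "delete agent-roster from the pilot's project-scripts"),
    Stage("template-wide", True,
          "revert the startup.org commit and remove the script; the next sync's --delete clears copies"),
)


def next_open_stage(passed: set[str]) -> Stage | None:
    """The first stage not yet passed; None once every stage has passed."""
    for stage in STAGES:
        if stage.name not in passed:
            return stage
    return None


def may_touch_synced_paths(passed: set[str]) -> bool:
    """Synced template paths open only when every stage before them has passed."""
    stage = next_open_stage(passed)
    return stage is None or stage.touches_synced_paths
```

Keeping each rollback action in the same record as its stage means nothing can advance a stage without also having its undo step on hand.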
diff --git a/docs/design/2026-05-28-generic-agent-runtime-spec.org b/docs/design/2026-05-28-generic-agent-runtime-spec.org
index 40a97b4..0b37814 100644
--- a/docs/design/2026-05-28-generic-agent-runtime-spec.org
+++ b/docs/design/2026-05-28-generic-agent-runtime-spec.org
@@ -65,30 +65,38 @@ under-specified — spawning a second Claude in the same project to look things up or update tasks safely — and a new Phase 1.5 sequences that slice ahead of the runtime-neutral phases 2-6, which remain pending a go/no-go. -*NOT IMPLEMENTATION-READY* (Craig, 2026-06-11, after the fourth design -revision). The helper-instance design iterated four times in one evening; -holding it open until the known gaps close. Readiness checklist — all of -these before any build starts: - -- [X] Emacs launch surface designed (see the open-issue subsection in the - helper section): every place a session can be born routes through, or is - caught by, the deterministic path. /Closed 2026-06-12: mechanics verified +*Readiness is split by arc* (per the 2026-06-12 Codex review's top finding — +the spec contains two different projects and one label misled): + +- *Phase 1.5 — helper instances: READY WITH CAVEATS* (2026-06-12). The + caveats are binding, not advisory: the three-ring pre-live gate governs + every merge into synced template paths; the manual drills are gates, not + suggestions; and the =ai-term.el= work lands as a coordinated + cross-project handoff to =~/.emacs.d= (the exact artifact is named in + Phase 1.5), so the rulesets side isn't "done" while the F9 path is + still unsafe. +- *Phases 2-5 — runtime-neutral refactor: NOT READY.* Blocked on the + /Decisions required before phases 2-5/ section under Open decisions, and + on Phase 5's reverification prerequisite (the local-model table is a + recommendation, not an implementation constant). Parked pending Craig's + go/no-go on the arc. + +The original readiness checklist, resolved: + +- [X] Emacs launch surface designed. 
/Closed 2026-06-12: mechanics verified in ai-term.el's code, integration design written, the three open calls confirmed by Craig (roster-only sharing, singleton primary, helper-mode.org as canonical home)./ -- [ ] Pre-live test strategy agreed (see Test strategy): sandbox drills - pass, and the rollout is gated so nothing reaches live projects via - template sync until validated — startup.org edits propagate to every - project on their next session, so "accidentally live everywhere" is the - default failure mode, not an edge case. /The three-ring gating is - written; "agreed" lands with the independent review below./ -- [X] A re-read of the whole helper section after the dust settles, since - four same-day revisions usually leave a seam somewhere. /Done 2026-06-12: - the coherence pass unified the churned subsections and verified the - ai-term.el claims against code./ -- [ ] Independent spec review (the =spec-review= cycle, as the KB and - consolidation specs got) comes back Ready or Ready-with-caveats, and its - dispositions are folded in via =spec-response=. +- [X] Pre-live test strategy agreed. /The review accepted the three-ring + gate as the release-safety mechanism and asked for per-ring rollback + actions — added to the gating section./ +- [X] A re-read of the whole helper section after the dust settles. /Done + 2026-06-12: the coherence pass unified the churned subsections and + verified the ai-term.el claims against code./ +- [X] Independent spec review. /Codex, 2026-06-12: Not-ready for the + combined spec, Phase 1.5 implementable as a scoped slice — which the + split rubric above now states directly. 
Dispositions folded in the same + day; see Review dispositions./ * Problem @@ -423,7 +431,11 @@ Known limits, accepted for v1: an agent session not running as a local process on this machine (a cloud session against the same checkout) is invisible to the scan; and the match is on process cwd, so an agent started from outside the project tree wouldn't be seen. Both are edge shapes the -operator created deliberately and can manage manually. +operator created deliberately and can manage manually. The scan is also +Linux-=/proc=-specific: on an unsupported platform the script reports +"roster unavailable" explicitly (never a silent "alone"), and startup +treats that result as the no-op path from the pre-live gate — same behavior +as the script being absent. *** Spawn paths: deterministic launcher, startup safety net @@ -534,7 +546,10 @@ every personal task and corruption has maximal blast radius. stale file would block hygiene forever, so staleness is surfaced as a judgment call — the file's own content and timestamps show whether the helper is really gone — never silently skipped past and never silently - honored indefinitely. + honored indefinitely. The surfaced message is contractual (review + finding): it names the file path, its timestamps, and the suggested + actions (treat as stale and proceed / wait / abort), so the judgment is + made on evidence rather than a bare "helper detected" warning. 2. /A new primary starting while a helper runs./ The previous primary may wrap and exit while a helper keeps working; the next =ai= launch becomes primary and runs full startup. The existing guards already do the right @@ -625,7 +640,12 @@ What remains to design — the integration, not a new surface: - The =emacs.md= live-reload discipline applies to the ai-term.el changes, and the change lands in the =~/.emacs.d= project (its own repo and session scope — a cross-project handoff from rulesets, not a rulesets - edit). + edit). 
The handoff artifact is exact (review finding, 2026-06-12): + implementation step one sends an =inbox-send .emacs.d= handoff carrying + this subsection's integration contract plus the recommendations below, + and the rulesets task does not close until =.emacs.d= confirms its task + is filed or landed — otherwise the shell path ships safe while F9 stays + unsafe and nothing tracks the gap. Recommendations for ai-term.el beyond the helper feature (Craig asked for these 2026-06-12; they ride the same handoff): @@ -799,6 +819,12 @@ Independent of the phases 2-6 go/no-go; same-runtime only. ** Phase 5: Local model install handoff +- Prerequisite (review finding, 2026-06-12): a reverification task runs + first — record /current/ model URLs, file sizes, licenses, backend + support, a smoke command, memory fit, and fallback behavior against live + sources. The model table in the Introductory note is a recommendation + frozen at 2026-05-28, not an implementation constant; nothing in doctor + checks or the archsetup handoff bakes it in unverified. - Send archsetup an inbox note requesting local model runtime support. - After archsetup lands it, teach =rulesets doctor= to verify: - =llama-server= or =ollama= installed. @@ -832,24 +858,43 @@ three rings: the self-ancestry exclusion against the test's own process chain. The startup hook tested for the no-op guarantee: when =agent-roster= is absent or reports alone, behavior is byte-identical to today. + /Rollback: revert the commit; nothing here touches synced paths yet./ 2. 
/Sandbox ring./ A disposable project (its own git repo, never template-synced back) runs the live drills before any real project sees the feature: primary + helper concurrent edits on one org file; the corruption drill (primary wrap-up pauses on a live helper); the orphaned-helper drill (primary wraps first, helper closes the door, tree ends clean); the raw-launch drill (helper started without the - launcher gets caught by the startup roster); and an Emacs-surface drill - once that design lands. + launcher gets caught by the startup roster); and the Emacs F9 drill + (helper spawned via ai-term once its handoff lands). + /Rollback: delete the sandbox project; no other surface was touched./ 3. /Pilot ring./ The startup detection ships dormant-by-construction — the hook is a no-op wherever =agent-roster= is missing, and the script ships first to one pilot project only (copied into its =.ai/project-scripts/=, which the sync never touches) before the template-wide release puts it everywhere. Rulesets itself is the natural pilot: it's where a broken sweep is noticed fastest. + /Rollback: delete =agent-roster= from the pilot's project-scripts; the + hook reverts to its no-op path on the next session./ +4. /Template-wide release./ The startup branch and the script land in the + synced template paths only after the pilot soaks. 
+ /Rollback: revert the startup.org commit and remove the script from + =claude-templates/.ai/scripts/=; the next sync's =--delete= clears every + project's copy, and the no-op guarantee means a half-propagated state + (some projects synced, some not) is safe in both directions./ + +Ring-1 test inventory (the review's list, normative): roster alone / +ancestry-exclusion / not-alone-on-sleeper cases; startup no-op +byte-identity when roster is missing or alone; startup routes to +helper-mode and skips pulls/rsync/inbox when not alone; =ai --helper= +assigns a sanitized id, exports both vars, uses the helper opener; primary +and helper resolve distinct context paths; helper-originated =inbox-send= +slugs carry the id; wrap-up pauses on live helpers before hygiene and +commit; orphaned-helper close runs only when the roster reports alone; +=todo-cleanup.el= takes a =/tmp= backup before any mutating mode. Nothing merges past ring 1 into the synced template paths until ring 2's -drills pass, and the spec's NOT-IMPLEMENTATION-READY marker clears only -when all three rings are written into the implementation plan. +drills pass. * Open decisions @@ -866,8 +911,57 @@ when all three rings are written into the implementation plan. - Which local agent CLI should be the first supported offline editor: =aider=, =opencode=, a simple custom wrapper, or something else? +** Decisions required before phases 2-5 — added 2026-06-12 (review finding) + +These are the blocker subset of the open decisions above, plus two the +review added. Phases 2-5 stay NOT READY until each has an accepted answer; +deciding them inside code is the failure mode this section prevents. + +1. Generic instruction-file strategy (=AGENTS.md= / =AI.md= / + runtime-specific only). +2. Default local runtime manager/server (=llama.cpp= only vs =ollama= + as the beginner default). +3. First supported local editing CLI. +4. Phase-2 adapter scope: Claude + one local runtime only, or Codex + support immediately. +5. 
Compatibility behavior for existing =CLAUDE.md= / =.claude/= projects + during the transition. + * Recommended next step -Start with Phase 1 only. The singleton session-context file is the immediate -correctness issue for simultaneous agents, and it can be fixed without renaming -the whole repository or disrupting current Claude installs. +Updated 2026-06-12: implement Phase 1.5 under its READY-WITH-CAVEATS rubric +(the helper task in todo.org carries the plan). Phases 2-5 stay parked until +the decisions section above is answered and Craig calls the go/no-go on the +arc. The original recommendation — start with Phase 1 only — is complete: +Phase 1 shipped. + +* Review dispositions — 2026-06-12 Codex review + +Every recommendation from [[file:2026-05-28-generic-agent-runtime-spec-review.org][the review]], dispositioned: + +| Recommendation | Disposition | Where it landed | +|----------------+-------------+-----------------| +| Split readiness labels by arc | Accept | Status: dual rubric (1.5 READY WITH CAVEATS, 2-5 NOT READY) | +| "Decisions required before phases 2-5" section | Accept | Open decisions, new subsection (5 items) | +| Phase 5 reverification prerequisite | Accept | Phase 5, first bullet; model table marked recommendation-only | +| Exact =.emacs.d= handoff artifact | Accept | Emacs subsection: =inbox-send= handoff is implementation step one; task closes on =.emacs.d= confirmation | +| Per-ring rollback actions | Accept | Pre-live gating: rollback line per ring, incl. 
the half-propagated-sync case | +| Stale-helper message contract | Accept | Data-integrity rule 1: path + timestamps + suggested actions | +| Roster unsupported-platform behavior | Accept | Roster subsection: explicit "roster unavailable" result, no-op path | +| Ring-1 test inventory | Accept | Pre-live gating: the review's list adopted as normative | +| Docs-update list for 1.5 | Accept | Already in Phase 1.5 items; INDEX auto-routed note included | +| Physically split the spec (open question) | Modify | Dual rubric in one document; phases 2-5 stay parked here pending Craig's go/no-go, matching the standing task framing | + +Rejections: none. + +* Review and iteration history + +** 2026-06-12 Fri @ 02:09:10 -0500 — Codex — reviewer +- What changed or was recommended: ran the spec-review workflow and wrote a formal review. Rubric for the whole spec: =Not ready=. Phase 1 is already shipped; Phase 1.5 helper instances are implementable as a scoped slice with the existing rollout/manual-validation caveats; phases 2-5 remain blocked on product choices and time-sensitive local-runtime/model verification. +- Why: the spec now combines a concrete same-runtime helper implementation with a broader runtime-neutral refactor whose instruction-file, local-runtime, first-CLI, and adapter-scope decisions are still open. +- Artifacts: [[file:2026-05-28-generic-agent-runtime-spec-review.org][2026-05-28-generic-agent-runtime-spec-review.org]]; existing [[file:../../todo.org][todo.org]] entries for "Helper-instance support" and "Generic agent runtime support — Codex spec v0" updated with the review outcome. + +** 2026-06-12 Fri @ 02:23:04 -0500 — Claude — author (spec-response) +- What changed: folded the review in. All recommendations accepted except the document-split open question, modified to a dual rubric in one document (see Review dispositions). Status now labels Phase 1.5 READY WITH CAVEATS and phases 2-5 NOT READY; the original readiness checklist is fully resolved. 
+- Why: the review's top finding was that one Not-ready label hid an implementable slice; the rest hardened the slice's rollout (per-ring rollbacks, normative test inventory, exact .emacs.d handoff artifact, stale-helper message contract, roster platform behavior) and fenced the parked arc (decisions-required section, Phase 5 reverification prerequisite). +- Artifacts: this spec's Status, Pre-live gating, Phase 5, Open decisions, and Review dispositions sections; the helper task in todo.org carries the same caveats. |
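The roster subsection in the diff above fixes three outcomes for the Linux `/proc` scan:

- alone
- not alone
- an explicit "roster unavailable" on unsupported platforms, which startup treats exactly like a missing script

Below is a minimal Python sketch. It assumes the scan reads `/proc/<pid>/cwd` and excludes the caller's own ancestor chain, as the ring-1 tests describe. The function names, result strings, and the optional `names` filter are hypothetical. The diff does not show how agent processes are distinguished from other processes, so the filter stands in for that.

```python
from __future__ import annotations

import os
import sys

ALONE, NOT_ALONE, UNAVAILABLE = "alone", "not-alone", "roster unavailable"


def ancestry(pid: int, proc: str = "/proc") -> set[int]:
    """PIDs of `pid` and all its ancestors, following ppid in /proc/<pid>/stat."""
    chain: set[int] = set()
    while pid > 0 and pid not in chain:
        chain.add(pid)
        try:
            with open(f"{proc}/{pid}/stat") as fh:
                # Format is "pid (comm) state ppid ..."; split after the last ")"
                # because comm itself may contain spaces or parentheses.
                pid = int(fh.read().rsplit(")", 1)[1].split()[1])
        except (OSError, IndexError, ValueError):
            break
    return chain


def roster(project_root: str, proc: str = "/proc",
           names: frozenset[str] | None = None) -> str:
    """Report whether another process has its cwd inside project_root."""
    if not sys.platform.startswith("linux") or not os.path.isdir(proc):
        return UNAVAILABLE  # never a silent "alone" on an unsupported platform
    root = os.path.realpath(project_root)
    mine = ancestry(os.getpid(), proc)  # the caller's own chain never counts
    for entry in os.listdir(proc):
        if not entry.isdigit() or int(entry) in mine:
            continue
        try:
            cwd = os.readlink(f"{proc}/{entry}/cwd")
            if names is not None:  # hypothetical agent-process filter
                with open(f"{proc}/{entry}/comm") as fh:
                    if fh.read().strip() not in names:
                        continue
        except OSError:
            continue  # process exited, or belongs to another user
        if cwd == root or cwd.startswith(root + os.sep):
            return NOT_ALONE
    return ALONE


def startup_route(result: str | None) -> str:
    """Map a roster result to startup behavior; None means the script is absent."""
    if result == NOT_ALONE:
        return "helper-mode"
    return "full-startup"  # absent, alone, or unavailable: today's behavior
```

Startup behaves the same whether the result is "unavailable" or "alone". Keeping the two values distinct still lets logs and tests tell an unsupported platform apart from a genuinely solo session.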
