From 53cfd886d3594c60aaa14b07a3077c124c756ba5 Mon Sep 17 00:00:00 2001 From: Craig Jennings Date: Thu, 11 Jun 2026 20:09:21 -0500 Subject: docs(spec): hold helper instances as not-ready behind Emacs surface and test gating Two gaps block implementation. Sessions are also born from Emacs terminal buffers, where roster detection works (the scan matches process cwd, and eat/vterm shells are children of emacs) but the deterministic spawn path doesn't exist; the open issue weighs an elisp command against shelling out to ai with a no-tmux mode, leaning to the latter so the logic lives once. Second, template sync makes "live everywhere" the default failure mode for startup.org changes, so the test strategy gains three-ring gating: bats with sleeper processes and a byte-identical no-op guarantee, a disposable sandbox project for the corruption, orphaned-helper, and raw-launch drills, then a dormant-by-construction pilot through project-scripts before the template-wide release. The Status section carries the readiness checklist and the implementation task is blocked on it. --- .../2026-05-28-generic-agent-runtime-spec.org | 73 ++++++++++++++++++++++ todo.org | 4 +- 2 files changed, 76 insertions(+), 1 deletion(-) diff --git a/docs/design/2026-05-28-generic-agent-runtime-spec.org b/docs/design/2026-05-28-generic-agent-runtime-spec.org index 2b451cc..9b6d535 100644 --- a/docs/design/2026-05-28-generic-agent-runtime-spec.org +++ b/docs/design/2026-05-28-generic-agent-runtime-spec.org @@ -65,6 +65,22 @@ under-specified — spawning a second Claude in the same project to look things up or update tasks safely — and a new Phase 1.5 sequences that slice ahead of the runtime-neutral phases 2-6, which remain pending a go/no-go. +*NOT IMPLEMENTATION-READY* (Craig, 2026-06-11, after the fourth design +revision). The helper-instance design iterated four times in one evening; +holding it open until the known gaps close. Readiness checklist — all of +these before any build starts: + +- [ ] Emacs launch surface designed (see the open-issue subsection in the + helper section): every place a session can be born routes through, or is + caught by, the deterministic path. +- [ ] Pre-live test strategy agreed (see Test strategy): sandbox drills + pass, and the rollout is gated so nothing reaches live projects via + template sync until validated — startup.org edits propagate to every + project on their next session, so "accidentally live everywhere" is the + default failure mode, not an edge case. +- [ ] A re-read of the whole helper section after the dust settles, since + four same-day revisions usually leave a seam somewhere. + * Problem =rulesets= is named and wired as a Claude Code rules distribution: @@ -524,6 +540,32 @@ timestamps (=YYYY-MM-DD-HHMM-from--=). A helper and primary sending to the same target in the same minute with the same slug would collide; helper-originated sends include the agent id in the slug. +*** Open issue: the Emacs launch surface — unresolved, blocks readiness + +Craig works primarily inside Emacs, and sessions are born from more surfaces +than the =ai= script: an Emacs terminal buffer (eat/vterm — these run their +shells as children of the =emacs= process, with no separate emulator +process), a tmux pane viewed through Emacs, or a raw shell. Verified +2026-06-11: roster /detection/ is parent-agnostic — it matches on process +cwd, not ancestry, so an agent started inside an Emacs terminal is fully +visible to the scan. What's missing is the /deterministic spawn path/ on the +Emacs surface: the tmux-window mechanics of =ai --helper= don't apply inside +an Emacs buffer. + +To design before this spec is implementation-ready: + +- An elisp launch command (working name =ai-helper=) that does the same + (a) roster check, (b) id assignment + env export, (c) launch with the + helper-mode opener — inside an eat/vterm buffer, with a buffer-naming + convention standing in for tmux window names + (e.g. =*helper:rulesets:7f3a*=). +- Whether the existing Emacs entry points Craig actually uses to start + agents should call the =ai= script underneath (one implementation, two + surfaces) or reimplement the three steps in elisp (no tmux dependency + inside Emacs). Leaning toward shelling out to =ai= with a =--no-tmux= + mode so the roster/id/instruction logic lives in exactly one place. +- The =emacs.md= live-reload discipline applies to whatever elisp ships. + *** Helper startup and wrap-up Helper startup is deliberately light: resolve the context path, read @@ -697,6 +739,37 @@ Independent of the phases 2-6 go/no-go; same-runtime only. =session-context.d= without breaking old projects. - Test cross-agent targeted messages with =TARGET_AGENT_ID=. +** Phase 1.5 pre-live gating — added 2026-06-11, blocks readiness + +The structural risk is the template sync: =startup.org= and =.ai/scripts/= +propagate to /every/ project on its next session, so a half-validated +detection hook goes live everywhere by default. The rollout is gated in +three rings: + +1. /Bats ring./ =agent-roster= tested with spawned sleeper processes + standing in for agents (real =/proc=, real cwds, no fakes needed) plus + the self-ancestry exclusion against the test's own process chain. The + startup hook tested for the no-op guarantee: when =agent-roster= is + absent or reports alone, behavior is byte-identical to today. +2. /Sandbox ring./ A disposable project (its own git repo, never + template-synced back) runs the live drills before any real project sees + the feature: primary + helper concurrent edits on one org file; the + corruption drill (primary wrap-up pauses on a live helper); the + orphaned-helper drill (primary wraps first, helper closes the door, + tree ends clean); the raw-launch drill (helper started without the + launcher gets caught by the startup roster); and an Emacs-surface drill + once that design lands. +3. /Pilot ring./ The startup detection ships dormant-by-construction — + the hook is a no-op wherever =agent-roster= is missing, and the script + ships first to one pilot project only (copied into its + =.ai/project-scripts/=, which the sync never touches) before the + template-wide release puts it everywhere. Rulesets itself is the + natural pilot: it's where a broken sweep is noticed fastest. + +Nothing merges past ring 1 into the synced template paths until ring 2's +drills pass, and the spec's NOT-IMPLEMENTATION-READY marker clears only +when all three rings are written into the implementation plan. + * Open decisions - What should the generic project instruction file be: =AGENTS.md=, diff --git a/todo.org b/todo.org index 9f7ec41..2d44d8f 100644 --- a/todo.org +++ b/todo.org @@ -40,11 +40,13 @@ CLOSED: [2026-06-11 Thu] Cancelled 2026-06-11: Craig confirmed the decision — one todo queue with a single Open Work / Resolved pair. Home reshapes its consolidated file to that form, and the existing single-pair tooling works unmodified. No code change needed. -** TODO [#B] Helper-instance support — concurrent same-project Claude :feature: +** TODO [#B] Helper-instance support — concurrent same-project Claude :feature:spec: :PROPERTIES: :CREATED: [2026-06-11 Thu] :LAST_REVIEWED: 2026-06-11 :END: +BLOCKED ON SPEC READINESS (Craig, 2026-06-11): the spec is marked NOT IMPLEMENTATION-READY after four same-day design revisions. Before any build — design the Emacs launch surface (elisp launch command or ai --no-tmux; detection already covers Emacs-born agents, spawn does not), write the three-ring test gating into the implementation plan (bats → sandbox drills → pilot project, dormant-by-construction so template sync can't put it live early), and re-read the whole helper section for seams. Then implement. + Implement Phase 1.5 of the generic-agent-runtime spec ([[file:docs/design/2026-05-28-generic-agent-runtime-spec.org][spec]], amended 2026-06-11 with the "Concurrent same-project agents" section). Craig's case: spawn a second Claude in the same project to look things up or update tasks safely while the primary works. The session-context split (AI_AGENT_ID + session-context.d/) already shipped; this builds the rest: - =agent-roster= detection script (the load-bearing piece, replaces operator action entirely): pgrep + /proc cwd match within project root + self-ancestry exclusion; verified live 2026-06-11 with 4 concurrent agents. Bats coverage. -- cgit v1.2.3