aboutsummaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
Diffstat (limited to 'docs')
-rw-r--r--docs/design/2026-05-28-generic-agent-runtime-spec.org73
1 files changed, 73 insertions, 0 deletions
diff --git a/docs/design/2026-05-28-generic-agent-runtime-spec.org b/docs/design/2026-05-28-generic-agent-runtime-spec.org
index 2b451cc..9b6d535 100644
--- a/docs/design/2026-05-28-generic-agent-runtime-spec.org
+++ b/docs/design/2026-05-28-generic-agent-runtime-spec.org
@@ -65,6 +65,22 @@ under-specified — spawning a second Claude in the same project to look things
up or update tasks safely — and a new Phase 1.5 sequences that slice ahead of
the runtime-neutral phases 2-6, which remain pending a go/no-go.
+*NOT IMPLEMENTATION-READY* (Craig, 2026-06-11, after the fourth design
+revision). The helper-instance design iterated four times in one evening;
+holding it open until the known gaps close. Readiness checklist — all of
+these before any build starts:
+
+- [ ] Emacs launch surface designed (see the open-issue subsection in the
+ helper section): every place a session can be born routes through, or is
+ caught by, the deterministic path.
+- [ ] Pre-live test strategy agreed (see Test strategy): sandbox drills
+ pass, and the rollout is gated so nothing reaches live projects via
+ template sync until validated — startup.org edits propagate to every
+ project on their next session, so "accidentally live everywhere" is the
+ default failure mode, not an edge case.
+- [ ] A re-read of the whole helper section after the dust settles, since
+ four same-day revisions usually leave a seam somewhere.
+
* Problem
=rulesets= is named and wired as a Claude Code rules distribution:
@@ -524,6 +540,32 @@ timestamps (=YYYY-MM-DD-HHMM-from-<project>-<slug>=). A helper and primary
sending to the same target in the same minute with the same slug would
collide; helper-originated sends include the agent id in the slug.
+*** Open issue: the Emacs launch surface — unresolved, blocks readiness
+
+Craig works primarily inside Emacs, and sessions are born from more surfaces
+than the =ai= script: an Emacs terminal buffer (eat/vterm — these run their
+shells as children of the =emacs= process, with no separate emulator
+process), a tmux pane viewed through Emacs, or a raw shell. Verified
+2026-06-11: roster /detection/ is parent-agnostic — it matches on process
+cwd, not ancestry, so an agent started inside an Emacs terminal is fully
+visible to the scan. What's missing is the /deterministic spawn path/ on the
+Emacs surface: the tmux-window mechanics of =ai --helper= don't apply inside
+an Emacs buffer.
+
+To design before this spec is implementation-ready:
+
+- An elisp launch command (working name =ai-helper=) that does the same
+ (a) roster check, (b) id assignment + env export, (c) launch with the
+ helper-mode opener — inside an eat/vterm buffer, with a buffer-naming
+ convention standing in for tmux window names
+ (e.g. =*helper:rulesets:7f3a*=).
+- Whether the existing Emacs entry points Craig actually uses to start
+ agents should call the =ai= script underneath (one implementation, two
+ surfaces) or reimplement the three steps in elisp (no tmux dependency
+ inside Emacs). Leaning toward shelling out to =ai= with a =--no-tmux=
+ mode so the roster/id/instruction logic lives in exactly one place.
+- The =emacs.md= live-reload discipline applies to whatever elisp ships.
+
*** Helper startup and wrap-up
Helper startup is deliberately light: resolve the context path, read
@@ -697,6 +739,37 @@ Independent of the phases 2-6 go/no-go; same-runtime only.
=session-context.d= without breaking old projects.
- Test cross-agent targeted messages with =TARGET_AGENT_ID=.
+** Phase 1.5 pre-live gating — added 2026-06-11, blocks readiness
+
+The structural risk is the template sync: =startup.org= and =.ai/scripts/=
+propagate to /every/ project on its next session, so a half-validated
+detection hook goes live everywhere by default. The rollout is gated in
+three rings:
+
+1. /Bats ring./ =agent-roster= tested with spawned sleeper processes
+ standing in for agents (real =/proc=, real cwds, no fakes needed) plus
+ the self-ancestry exclusion against the test's own process chain. The
+ startup hook tested for the no-op guarantee: when =agent-roster= is
+ absent or reports alone, behavior is byte-identical to today.
+2. /Sandbox ring./ A disposable project (its own git repo, never
+ template-synced back) runs the live drills before any real project sees
+ the feature: primary + helper concurrent edits on one org file; the
+ corruption drill (primary wrap-up pauses on a live helper); the
+ orphaned-helper drill (primary wraps first, helper closes the door,
+ tree ends clean); the raw-launch drill (helper started without the
+ launcher gets caught by the startup roster); and an Emacs-surface drill
+ once that design lands.
+3. /Pilot ring./ The startup detection ships dormant-by-construction —
+ the hook is a no-op wherever =agent-roster= is missing, and the script
+ ships first to one pilot project only (copied into its
+ =.ai/project-scripts/=, which the sync never touches) before the
+ template-wide release puts it everywhere. Rulesets itself is the
+ natural pilot: it's where a broken sweep is noticed fastest.
+
+Nothing merges past ring 1 into the synced template paths until ring 2's
+drills pass, and the spec's NOT-IMPLEMENTATION-READY marker clears only
+when all three rings are written into the implementation plan.
+
* Open decisions
- What should the generic project instruction file be: =AGENTS.md=,