From 00fc6f10d132e61adde26613372cf845a5abe776 Mon Sep 17 00:00:00 2001 From: Craig Jennings Date: Thu, 11 Jun 2026 20:02:18 -0500 Subject: docs(spec): detection-first helper routing, no operator action needed A second agent now discovers concurrency itself instead of being told: a stateless process scan (running agent processes, /proc cwd matched within the project root, own ancestry excluded) runs as the first action of every session, before any pull. Alone with no anchor is a fresh session, alone with an anchor is today's crash recovery, and not-alone skips startup and routes to helper-mode.org, the role-contract workflow. The scan also splits the previously ambiguous live-anchor signal into crashed versus concurrent primary. Verified the signal live with four concurrent agents on this machine. The ai --helper launcher flag drops from mechanism to convenience. Known v1 limits recorded: sessions not running as local processes are invisible to the scan, and the match is process-cwd based. --- .../2026-05-28-generic-agent-runtime-spec.org | 77 ++++++++++++++++++---- todo.org | 7 +- 2 files changed, 69 insertions(+), 15 deletions(-) diff --git a/docs/design/2026-05-28-generic-agent-runtime-spec.org b/docs/design/2026-05-28-generic-agent-runtime-spec.org index bd9d60b..243eac3 100644 --- a/docs/design/2026-05-28-generic-agent-runtime-spec.org +++ b/docs/design/2026-05-28-generic-agent-runtime-spec.org @@ -378,20 +378,61 @@ subagent (the Agent tool), use that — no second session exists and nothing here applies. A helper instance is for interactive, long-lived parallel work Craig drives himself in a second terminal. -*** Identity and spawn +*** Detection: an agent discovers it isn't alone (no operator action) + +Third revision (Craig, 2026-06-11): the agent detects concurrency itself, so +the operator can open a plain terminal, run the agent, and say nothing +special — no env var, no special opener, no launcher flag required. + +The signal is a stateless process scan, verified live on 2026-06-11 with four +concurrent agents: enumerate running agent processes (=pgrep -x claude=), +read each one's working directory from =/proc//cwd=, keep those whose +cwd is the project root or inside it, and exclude the scanner's own process +ancestry (walk parent pids from =/proc/self=). What remains is the set of +/other/ live agents in this project. A small script, =agent-roster=, wraps +this and prints the others (pid + cwd), exit 0 when alone. + +The check is the *first action of every session* — before any pull, rsync, +or anchor read, because everything startup does next forks on it: + +- /Alone, no anchor:/ fresh session. Normal startup. +- /Alone, anchor exists:/ the previous session crashed. Recover — exactly + today's behavior. +- /Not alone:/ a live primary (or primary + helpers) exists. Skip startup + entirely and execute =helper-mode.org= instead. + +This also resolves the standing ambiguity in the singleton check: a live +anchor used to mean only "crashed session"; with the roster it splits into +crashed (no live process) vs concurrent (live process). + +Known limits, accepted for v1: an agent session not running as a local +process on this machine (a cloud session against the same checkout) is +invisible to the scan; and the match is on process cwd, so an agent started +from outside the project tree wouldn't be seen. Both are edge shapes the +operator created deliberately and can manage manually. + +*** Identity and role files - The primary keeps the unset-id singleton (=.ai/session-context.org=), per Phase 1's compatibility rule. Zero friction for the overwhelmingly common one-agent case, and the asymmetry is harmless: the helper's writes land in =.ai/session-context.d/=, away from the singleton. -- The launcher assigns helper identity: =ai --helper [project]= detects a live - anchor (the singleton exists, or =session-context.d/= is non-empty), exports - =AI_AGENT_ID=helper-= (sanitized by =session-context-path=), and names - the tmux window =:helper-=. Spawning a helper into a project - with /no/ live anchor still works — it just means the "primary" duties below - have no owner, so the helper warns and runs as a normal full session instead. -- =--helper= also sets =AI_HELPER=1= so workflows can branch on role without - parsing the id. +- A helper self-assigns its identity when detection fires: pick + =helper-=, record it as the first line of its own context file at + =.ai/session-context.d/.org=, and use that path explicitly for the + rest of the session (exporting =AI_AGENT_ID= per Bash call where the + =session-context-path= resolver is involved). No launcher cooperation + needed. +- =helper-mode.org= (a template workflow, Craig's "helper.org") is the role + contract the detection routes to: self-assignment above, the read/write + tiers from this section, the light startup, and the helper wrap-up. It is + not operator-triggerable by phrase; startup's detection is its entry + point, plus an explicit "you are a helper" instruction as the manual + fallback. +- =ai --helper= remains as a convenience (exports =AI_AGENT_ID= and + =AI_HELPER=1=, names the tmux window =:helper-=), but it is + no longer the mechanism — detection is. The flag's value is a clearer + tmux roster and skipping the self-assignment step. *** Read/write contract for shared files @@ -546,11 +587,21 @@ Rationale: Independent of the phases 2-6 go/no-go; same-runtime only. -- =ai --helper= flag: live-anchor detection, =AI_AGENT_ID= + =AI_HELPER= - export, tmux window naming, no-live-anchor warning path. +- =agent-roster= script: process-scan detection (cwd match within project + root, self-ancestry exclusion), exit 0 alone / 1 not-alone, others listed + pid + cwd. Bats-tested with fake =/proc=-style fixtures or spawned sleeper + processes. +- Startup gains the detection as its first action, before Phase A.0's pulls: + not-alone routes to =helper-mode.org= and runs nothing else; alone keeps + today's crashed-vs-fresh anchor logic. +- =helper-mode.org= template workflow — the role contract: identity + self-assignment, the read/write tiers, light start, helper wrap-up. + INDEX entry marked auto-routed (no operator trigger phrase). +- =ai --helper= flag as the optional convenience: =AI_AGENT_ID= + =AI_HELPER= + export, tmux window naming. - The shared-file read/write contract documented where agents will obey it - (protocols.org, with startup.org and wrap-it-up.org carrying the helper - branches). + (protocols.org pointing at =helper-mode.org=, with startup.org and + wrap-it-up.org carrying the helper branches). - Bats tests: id assignment and sanitization through the launcher, helper vs primary path resolution, two simultaneous context files (extends the existing =session-context-path= suite). diff --git a/todo.org b/todo.org index caabb7e..941b837 100644 --- a/todo.org +++ b/todo.org @@ -47,8 +47,11 @@ Cancelled 2026-06-11: Craig confirmed the decision — one todo queue with a sin :END: Implement Phase 1.5 of the generic-agent-runtime spec ([[file:docs/design/2026-05-28-generic-agent-runtime-spec.org][spec]], amended 2026-06-11 with the "Concurrent same-project agents" section). Craig's case: spawn a second Claude in the same project to look things up or update tasks safely while the primary works. The session-context split (AI_AGENT_ID + session-context.d/) already shipped; this builds the rest: -- =ai --helper= flag: live-anchor detection, auto AI_AGENT_ID + AI_HELPER export, tmux window naming, warn-and-run-normal when no primary is live. -- Shared-file read/write contract into protocols.org (helper: scoped single-heading org edits only; file-wide passes, inbox processing, and all git mutation stay primary-only); helper branches in startup.org (light path, no pulls/rsync) and wrap-it-up.org (archive own file, skip hygiene + commit). +- =agent-roster= detection script (the load-bearing piece, replaces operator action entirely): pgrep + /proc cwd match within project root + self-ancestry exclusion; verified live 2026-06-11 with 4 concurrent agents. Bats coverage. +- Startup detection-first: the roster check runs before Phase A.0's pulls; not-alone routes to the new =helper-mode.org= role contract and runs nothing else; alone keeps crashed-vs-fresh anchor logic (the roster also disambiguates crashed primary from live primary). +- =helper-mode.org= template workflow: identity self-assignment (helper-, recorded in its own .d/ context file), the read/write tiers, light start, helper wrap-up. Auto-routed, no trigger phrase. +- =ai --helper= flag demoted to convenience: AI_AGENT_ID + AI_HELPER export, tmux window naming. +- Shared-file read/write contract into protocols.org pointing at helper-mode.org (helper: scoped single-heading org edits only; file-wide passes, inbox processing, and all git mutation stay primary-only); helper branches in startup.org (light path, no pulls/rsync) and wrap-it-up.org (archive own file, skip hygiene + commit). - Bats: launcher id assignment/sanitization, helper-vs-primary resolution, two simultaneous context files. - Data-integrity items (spec second pass, 2026-06-11): live-helper gate before any file-wide hygiene pass (todo-cleanup/lint-org/wrap-org-table check session-context.d/, pause + ask on live files, surface stale ones); todo-cleanup.el brought up to the backup-to-/tmp invariant (lint-org and wrap-org-table already conform — verified); log-before-write journaling for helper shared-file edits; memory writes primary-only (MEMORY.md has no heading anchors — helpers log candidates instead); agent id in helper-originated inbox-send slugs (minute-resolution filenames can collide). - Manual validation with Craig: live helper against a live primary — lookup, one scoped todo.org edit, wrap-up, primary commits the helper's edit cleanly. Then the corruption drill: primary attempts wrap-up while the helper is mid-task and the hygiene gate visibly pauses. -- cgit v1.2.3