diff options
66 files changed, 7633 insertions, 1793 deletions
diff --git a/.ai/notes.org b/.ai/notes.org index d74ef7e..698cd4b 100644 --- a/.ai/notes.org +++ b/.ai/notes.org @@ -76,9 +76,11 @@ Format: * Workflow State +:COMMIT_AUTONOMY: yes +:LAST_SPEC_SORT: 2026-07-02 Markers maintained by workflows to record when they last ran. Read by other workflows that gate their behavior on freshness. :LAST_AUDIT: 2026-06-28 -:LAST_INBOX_PROCESS: 2026-06-30 (9 handoffs from .emacs.d + 1 mid-session: 2 page-signal notes resolved [account never deregistered, dead symlink pruned via new make-install step], daily-drivers tailscale correction applied, 3 shared-asset proposals all accepted+shipped [green-baseline, todo-cleanup aging w/ gitignore self-protect, lint-org 4 checkers w/ 2+-star fix]; all replied + committed + pushed) +:LAST_INBOX_PROCESS: 2026-07-01 (15 handoffs: .emacs.d convert-subtasks bundle applied + planning-line fix [19ba7cb], task-audit C.6 [356b905], sweep anchored-/.ai/ security fix + public-reachability convention + real sweep + 14-project broadcast [909b21b, bac3fe4], archsetup UI-traps promoted into spec-review [9814b94], KB orphan report filed as [#C] task; all senders replied) Format: one =:MARKER: YYYY-MM-DD= line per workflow. Workflows overwrite their own marker on completion. diff --git a/.ai/protocols.org b/.ai/protocols.org index ed07c0e..5e18ab9 100644 --- a/.ai/protocols.org +++ b/.ai/protocols.org @@ -552,6 +552,8 @@ Claude needs to add information to =.ai/notes.org=. For large amounts of informa **The gitignore set follows that same decision.** A project that gitignores =.ai/= (the code-project case) gitignores the whole personal-tooling set: =.ai/=, =.claude/=, =CLAUDE.md=, =AGENTS.md=. =.claude/= is rulesets-owned — copies of =claude-rules/*.md= plus the language bundle's rules, hooks, and settings — and re-synced from rulesets on every startup, so git isn't how it travels between machines; ignoring it also keeps those private rule copies out of the repo, which ignoring =CLAUDE.md= alone would miss. A track-mode project (personal/doc repos, or a team repo that shares config with teammates who don't run rulesets) tracks the set instead. =install-ai.sh= writes the full set at bootstrap in gitignore mode; =scripts/sweep-gitignore-tooling.sh= backfills it idempotently across existing gitignore-mode projects when the set grows. +**Public reachability decides harder than project type.** Any repo whose remotes include a non-cjennings.net host gitignores the tooling set, whatever kind of project it is — the only exception is a team repo that deliberately shares the config, decided explicitly, never by default. And a private remote is not proof of privacy: a server-side =post-receive --mirror= hook republishes invisibly from the client (the 2026-06-30 =.emacs.d= exposure rode exactly that — a cjennings.net remote mirroring to public GitHub). The sweep recognizes both the anchored (=/.ai/=) and unanchored (=.ai/=) ignore styles — an anchored-style project used to be misread as track-mode and silently skipped — and warns when tracked tooling can reach a non-cjennings.net remote. + **Credential-leak concern: gate it on project type, not on the credential itself.** A tracked secret, token, or credentials doc is only a public-leak risk where the repo can reach a public remote — that is, *code projects pushed to public GitHub*, which is exactly why those gitignore =.ai/= and =.claude/=. For *personal / documentation projects* (the =~/projects/= set: elibrary, home, finances, health, philosophy, etc.), the git remote is a private single-user repo on =cjennings.net=, so tracked credentials inside =.ai/= files are fine — that's the design, the project history IS the project. Do NOT raise a leak warning or suggest gitignoring a secret for these. When the question "is this a leak / should we gitignore this secret?" comes up, decide it on *which kind of project and remote* this is, never on the mere presence of a credential in a tracked file. **When to break out documents:** diff --git a/.ai/scripts/lint-org.el b/.ai/scripts/lint-org.el index 5447cb3..90b1b1d 100644 --- a/.ai/scripts/lint-org.el +++ b/.ai/scripts/lint-org.el @@ -35,6 +35,7 @@ ;; empty-heading bare stars with no title ;; malformed-priority-cookie [#x]-shaped token org rejected ;; level2-done-without-closed completed level-2 task with no CLOSED +;; subtask-done-not-dated level-3+ done sub-task still a DONE keyword ;; (anything else) surfaced as judgment with checker name ;; ;; Output format on stdout: @@ -503,6 +504,32 @@ the live file on the next `task-sorted'." "level-2 DONE/CANCELLED has no CLOSED date — add CLOSED: [YYYY-MM-DD Day]; task-sorted's aging step archives an undated completed task immediately")))))))) ;;; --------------------------------------------------------------------------- +;;; level-3+ dated-header check (claude-rules/todo-format.md) +;; +;; The inverse of the level-2 check above. A completed sub-task — a heading at +;; level 3 or deeper, under a parent task — becomes a dated event-log entry, not +;; a DONE keyword, so the parent's subtree grows a chronological history instead +;; of a long tail of nested DONE lines. An interactive org close +;; (`org-log-done' → DONE + CLOSED) leaves the keyword in place, and +;; `--archive-done' only touches level 2, so these accumulate. Flag them for +;; conversion. Judgment-only and regex-based (independent of which TODO keywords +;; the batch Emacs recognizes); todo-cleanup.el --convert-subtasks does the fix. + +(defun lo--check-subtask-done-not-dated () + "Flag level-3+ headings carrying a done keyword (DONE/CANCELLED/FAILED). +Emits one judgment item per offending heading (checker +`subtask-done-not-dated')." + (save-excursion + (goto-char (point-min)) + ;; Case-sensitive: the keywords are uppercase, not the words in a title. + (let ((case-fold-search nil)) + (while (re-search-forward + "^\\*\\{3,\\} \\(DONE\\|CANCELLED\\|FAILED\\) " nil t) + (lo--emit-judgment + 'subtask-done-not-dated (line-number-at-pos) + "level-3+ done sub-task should be a dated event-log entry (todo-format.md): run todo-cleanup.el --convert-subtasks to rewrite it"))))) + +;;; --------------------------------------------------------------------------- ;;; File processing (defun lo--backup (file) @@ -543,6 +570,7 @@ left unmodified and mechanical entries are recorded with :preview t." (lo--check-empty-headings) (lo--check-malformed-priority-cookies) (lo--check-level2-done-without-closed) + (lo--check-subtask-done-not-dated) (when (and (not lo-check-only) (buffer-modified-p)) (save-buffer))) (with-current-buffer buf (set-buffer-modified-p nil)) diff --git a/.ai/scripts/route-batch b/.ai/scripts/route-batch new file mode 100755 index 0000000..8f27d19 --- /dev/null +++ b/.ai/scripts/route-batch @@ -0,0 +1,175 @@ +#!/usr/bin/env python3 +"""route-batch — the wrap-up router's mechanical go path. + +The wrap-up cross-project router (wrap-it-up.org Step 3; wrapup-routing spec +D7/D8/D9) surfaces the local tasks that inbox process mode stamped with +:ROUTE_CANDIDATE: <destination> at file time, and on "go" delivers each to its +destination project's inbox. This script does the mechanical half so the +subtree surgery is deterministic: + + route-batch --list [--todo todo.org] + One "<destination>\t<heading>" line per :ROUTE_CANDIDATE:-tagged task. + Silent with exit 0 when there are no candidates (the workflow's + empty-set-equals-zero-interaction rule). Read-only. + + route-batch --go [--todo todo.org] + For each candidate, bottom-up: extract the task's whole subtree + (children ride along), drop the :ROUTE_CANDIDATE: line (and the + property drawer if that leaves it empty), promote the subtree so its + top heading is level 1, write it to a temp file, and deliver it via + the sibling inbox-send.py to the destination's inbox/ (one file per + task, from-<source> provenance stamped by inbox-send). Only after a + successful send is the subtree removed from the local todo.org — a + failed send leaves that task in place, is reported, and the run exits + non-zero after attempting the rest. + +The candidate set is exactly the tagged tasks — never the standing backlog. +Discovery, roots, and the source-project name all come from inbox-send.py +(INBOX_SEND_ROOTS sandboxes it in tests). The reject-from-another-project +flow in inbox process mode is the mis-route recovery; that path is why +removing the local source after a successful send is safe. +""" + +import argparse +import os +import re +import subprocess +import sys +import tempfile +from pathlib import Path + +HEADING_RE = re.compile(r"^(\*+)\s+(.*)$") +MARKER_RE = re.compile(r"^\s*:ROUTE_CANDIDATE:\s+(\S+)\s*$") + + +def find_candidates(lines): + """[(heading_idx, end_idx, marker_idx, destination, heading_text)] — + end_idx is one past the subtree's last line.""" + candidates = [] + for i, line in enumerate(lines): + m = MARKER_RE.match(line) + if not m: + continue + head_idx = None + for j in range(i, -1, -1): + hm = HEADING_RE.match(lines[j]) + if hm: + head_idx = j + level = len(hm.group(1)) + heading = hm.group(2) + break + if head_idx is None: + continue + end = len(lines) + for k in range(head_idx + 1, len(lines)): + km = HEADING_RE.match(lines[k]) + if km and len(km.group(1)) <= level: + end = k + break + candidates.append((head_idx, end, i, m.group(1), heading)) + return candidates + + +def extract_handoff(lines, head_idx, end): + """The subtree as handoff text: every :ROUTE_CANDIDATE: line dropped + (a marker is meaningless at the destination), empty drawers pruned, + headings promoted so the task is level 1.""" + sub = [l for l in lines[head_idx:end] if not MARKER_RE.match(l)] + + pruned = [] + i = 0 + while i < len(sub): + if sub[i].strip() == ":PROPERTIES:" and i + 1 < len(sub) and sub[i + 1].strip() == ":END:": + i += 2 + continue + pruned.append(sub[i]) + i += 1 + + shift = len(HEADING_RE.match(pruned[0]).group(1)) - 1 + if shift > 0: + pruned = [l[shift:] if HEADING_RE.match(l) else l for l in pruned] + return "\n".join(pruned).rstrip() + "\n" + + +def send(destination, handoff_text, slug): + inbox_send = Path(__file__).with_name("inbox-send.py") + with tempfile.NamedTemporaryFile( + "w", suffix=".org", prefix=f"route-{slug}-", delete=False, encoding="utf-8" + ) as tf: + tf.write(handoff_text) + tmp = tf.name + try: + result = subprocess.run( + [sys.executable, str(inbox_send), destination, "--file", tmp], + capture_output=True, text=True, + ) + return result.returncode == 0, (result.stderr or result.stdout).strip() + finally: + os.unlink(tmp) + + +def main(): + ap = argparse.ArgumentParser(prog="route-batch") + mode = ap.add_mutually_exclusive_group(required=True) + mode.add_argument("--list", action="store_true", dest="list_mode") + mode.add_argument("--go", action="store_true") + ap.add_argument("--todo", default="todo.org") + args = ap.parse_args() + + todo_path = Path(args.todo) + if not todo_path.is_file(): + return 0 # no todo file, no candidates + lines = todo_path.read_text(encoding="utf-8").splitlines() + candidates = find_candidates(lines) + + # Two markers in one task's drawer are one candidate, not two: same span + + # same destination dedupes. Everything else that overlaps — a tagged child + # inside a tagged parent, one task tagged for two destinations — is a + # conflict: routing either span would silently take the other (or, with a + # stale end index, a bystander task) along. Conflicts are left in place + # and reported; the human untangles which project the pieces belong to. + deduped = [] + for cand in candidates: + if not any(c[0] == cand[0] and c[1] == cand[1] and c[3] == cand[3] for c in deduped): + deduped.append(cand) + conflicted = set() + for a in deduped: + for b in deduped: + if a is not b and a[0] <= b[0] and b[1] <= a[1]: + conflicted.add(a) + conflicted.add(b) + routable = [c for c in deduped if c not in conflicted] + + if not deduped: + return 0 + + if args.list_mode: + for _h, _e, _m, dest, heading in deduped: + flag = "\tCONFLICT (overlapping candidates — resolve by hand)" if (_h, _e, _m, dest, heading) in conflicted else "" + print(f"{dest}\t{heading}{flag}") + return 0 + + failures = 0 + for _h, _e, _m, dest, heading in sorted(conflicted): + failures += 1 + print(f"CONFLICT: {dest}\t{heading}\t(overlapping candidate subtrees — left in place, resolve by hand)") + + # Bottom-up so earlier indices stay valid as subtrees are removed; the + # file is rewritten after every successful send so a crash mid-run never + # leaves an already-sent task still present locally. + for head_idx, end, _marker_idx, dest, heading in sorted(routable, reverse=True): + handoff = extract_handoff(lines, head_idx, end) + slug = re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")[:40] or "task" + ok, detail = send(dest, handoff, slug) + if ok: + del lines[head_idx:end] + todo_path.write_text("\n".join(lines).rstrip("\n") + "\n", encoding="utf-8") + print(f"routed: {dest}\t{heading}") + else: + failures += 1 + print(f"FAILED: {dest}\t{heading}\t({detail})") + return 1 if failures else 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/.ai/scripts/self-inject.sh b/.ai/scripts/self-inject.sh new file mode 100755 index 0000000..e7340c1 --- /dev/null +++ b/.ai/scripts/self-inject.sh @@ -0,0 +1,68 @@ +#!/bin/sh +# self-inject.sh — type text into the tmux pane running this agent session. +# +# The building block for AUTO-FLUSH: an agent checkpoints its session-context, +# then has tmux type "/clear" and a resume prompt at its own idle prompt, so a +# session flushes with no human at the keyboard. +# +# Usage: +# self-inject.sh -t %PANE <delay> <text> [<delay2> <text2> ...] +# self-inject.sh <delay> <text> [...] # derive pane from ancestry +# self-inject.sh [-t %PANE] # no pairs: report the pane +# +# Each pair: sleep <delay> seconds, then type <text> literally and press Enter. +# +# TWO HARD-WON GOTCHAS (2026-07-02, archsetup session): +# 1. A detached child (setsid/nohup/&) of an agent tool call DIES when the +# tool call ends — the harness cleans up the process group. The arm step +# must run under the tmux SERVER instead: +# tmux run-shell -b "self-inject.sh -t %1 25 '/clear' 15 'go — resume...'" +# 2. Under tmux run-shell the process is a child of the tmux server, so +# ancestry-based pane detection CANNOT work there. Derive the pane FIRST, +# synchronously from the agent's own shell (no -t), then pass it +# explicitly with -t when arming. +# +# Collision hazard: if the user happens to be typing when the send fires, the +# injected text merges into their input line (a real /clear became "/clearto" +# mid-word). Auto-flush is for sessions running unattended; warn the user to +# keep hands off for the armed window if they're present. + +PANE="" +if [ "$1" = "-t" ]; then + PANE=$2; shift 2 +fi + +ppid_of() { + # /proc/<pid>/stat: pid (comm) state ppid ... — comm may contain spaces, + # so take the 2nd field after the LAST ')'. + stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1 + # shellcheck disable=SC2086 # word-splitting the stat tail is the point + set -- ${stat##*) } + echo "$2" +} + +find_pane() { + anc=" " + pid=$$ + while [ -n "$pid" ] && [ "$pid" -gt 1 ] 2>/dev/null; do + anc="$anc$pid " + pid=$(ppid_of "$pid") || break + done + tmux list-panes -a -F "#{pane_pid} #{pane_id}" 2>/dev/null | \ + while read -r ppid pane; do + case "$anc" in *" $ppid "*) echo "$pane"; break;; esac + done +} + +[ -n "$PANE" ] || PANE=$(find_pane) +[ -n "$PANE" ] || { echo "self-inject: no owning pane found (pass -t %PANE)" >&2; exit 1; } + +# With no delay/text pairs, just report the pane (the derive-first step). +[ $# -ge 2 ] || { echo "$PANE"; exit 0; } + +while [ $# -ge 2 ]; do + sleep "$1" + tmux send-keys -t "$PANE" -l "$2" + tmux send-keys -t "$PANE" Enter + shift 2 +done diff --git a/.ai/scripts/spec-sort b/.ai/scripts/spec-sort new file mode 100755 index 0000000..ebfef82 --- /dev/null +++ b/.ai/scripts/spec-sort @@ -0,0 +1,715 @@ +#!/usr/bin/env python3 +"""spec-sort — one-time docs-pile retrofit for the docs-lifecycle convention. + +Classifies every docs/**/*.org outside docs/specs/ by one predicate: a doc +carrying BOTH a "Decisions" heading AND an "Implementation phases" heading is +a spec candidate; everything else is a note. For each candidate it shows an +evidence panel (Status field, decision/finding cookies, the linking todo.org +task, recent dated history, cheap existence checks on phase-named artifacts) +and proposes a lifecycle keyword the evidence supports — conservative +non-terminal (DRAFT) when inconclusive. The helper proposes; a human confirms +every move. + +Dry-run report is the default. --apply executes under the fail-safe contract: + + - Clean-worktree preflight: refuses on a dirty git tree (exit 2) unless + --allow-dirty, which prints exactly what recovery loses. + - Every candidate must be addressed with --confirm REL=KEYWORD or + --skip REL; terminal keywords (IMPLEMENTED SUPERSEDED CANCELLED) also + need --reason REL=TEXT, recorded in the status-history line. + - The full move + relink plan is computed and validated first (every + destination free, every link resolvable), written to a plan file, and + only then executed from that recorded plan. + - Bare-path mentions of a moving doc inside the rewritten roots are + reported, never rewritten; they block --apply until --acknowledge-bare + explicitly waives them. + - Mid-apply failure stops the run, names what was and wasn't applied, and + prints the git-restore recovery recipe (plus deletion of newly created + destination copies, which git restore can't remove). + - After a successful apply, a residue scan across the rewritten roots must + find no link still resolving to an old path, or spec-sort exits non-zero + naming the residue. + +Per move: rename to carry the -spec.org suffix, prepend the status heading +(:ID: UUID + dated history line), rewrite the keyword header to the +two-sequence form, mirror the keyword into the Metadata Status field, and +recompute every affected file: link (inbound links to the moved doc AND the +moved doc's own outbound relative links). Rewritten roots: todo.org, +.ai/notes.org, docs/**, .ai/project-workflows/, .ai/project-scripts/. +Reported-never-rewritten: .ai/sessions/ (frozen history) and synced template +paths (.ai/workflows/, .ai/scripts/, .ai/protocols.org — the report names +the canonical claude-templates file instead). + +Finally stamps :LAST_SPEC_SORT: YYYY-MM-DD in .ai/notes.org's +* Workflow State section (created idempotently), which permanently clears +the startup nudge. A run with zero candidates still stamps. + +Exit codes: 0 done (or clean report), 1 blocked (confirm gate, validation, +bare mentions, residue, mid-apply failure), 2 usage / preflight refusal. + +Test hook: SPEC_SORT_INJECT_FAIL_AFTER=N aborts the apply after N write +operations, exercising the recovery path in the bats suite. +""" + +import argparse +import json +import os +import re +import subprocess +import sys +import tempfile +import uuid +from datetime import datetime + +LIFECYCLE = ("DRAFT", "READY", "DOING", "IMPLEMENTED", "SUPERSEDED", "CANCELLED") +TERMINAL = {"IMPLEMENTED", "SUPERSEDED", "CANCELLED"} +TODO_HEADER = [ + "#+TODO: TODO | DONE", + "#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED", +] + +# Project-owned surfaces whose file: links get rewritten. +REWRITE_ROOTS = ("todo.org", ".ai/notes.org", "docs", ".ai/project-workflows", ".ai/project-scripts") +# Frozen or synced surfaces: occurrences are reported, never rewritten. +REPORT_ROOTS = (".ai/sessions", ".ai/workflows", ".ai/scripts", ".ai/protocols.org") +# Synced template paths map to their canonical rulesets file for the report. +SYNCED_PREFIX = (".ai/workflows", ".ai/scripts", ".ai/protocols.org") + +LINK_RE = re.compile(r"\[\[file:([^\]\[]+)\](?:\[([^\]\[]*)\])?\]") +HEADING_RE = re.compile(r"^(\*+)\s+(.*)$") +COOKIE_RE = re.compile(r"\[\d+/\d+\]") +DATED_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b") + + +def read_text(path): + try: + with open(path, encoding="utf-8") as f: + return f.read() + except (UnicodeDecodeError, OSError): + return None + + +def heading_text(line): + """Heading text with the org keyword and priority cookie stripped.""" + m = HEADING_RE.match(line) + if not m: + return None + text = re.sub(r"^[A-Z]+\s+", "", m.group(2)) + text = re.sub(r"^\[#[A-Z]\]\s+", "", text) + return text.strip() + + +def has_spine(content): + """The classification predicate: Decisions AND Implementation phases.""" + dec = imp = False + for line in content.splitlines(): + t = heading_text(line) + if t is None: + continue + tl = t.lower() + if tl.startswith("decisions"): + dec = True + elif tl.startswith("implementation phases"): + imp = True + return dec and imp + + +def walk_files(root, rel_base): + """Yield project-relative paths of files under rel_base (file or dir).""" + abs_base = os.path.join(root, rel_base) + if os.path.isfile(abs_base): + yield rel_base + return + for dirpath, dirs, files in os.walk(abs_base): + dirs.sort() + for name in sorted(files): + yield os.path.relpath(os.path.join(dirpath, name), root) + + +def classify(root): + """Split docs/**/*.org outside docs/specs/ into candidates / anomalies / notes.""" + candidates, anomalies, notes = [], [], [] + docs = os.path.join(root, "docs") + if not os.path.isdir(docs): + return candidates, anomalies, notes + for rel in walk_files(root, "docs"): + if not rel.endswith(".org"): + continue + parts = rel.split(os.sep) + if len(parts) > 1 and parts[1] == "specs": + continue + content = read_text(os.path.join(root, rel)) + if content is None: + continue + if has_spine(content): + candidates.append(rel) + elif os.path.basename(rel).endswith("-spec.org"): + anomalies.append(rel) + else: + notes.append(rel) + return candidates, anomalies, notes + + +def dest_for(rel): + base = os.path.basename(rel) + if not base.endswith("-spec.org"): + base = base[: -len(".org")] + "-spec.org" + return os.path.join("docs", "specs", base) + + +# ---- Evidence panel --------------------------------------------------- + + +def todo_task_for(root, rel): + """Heading of the first todo.org task whose subtree mentions the doc.""" + content = read_text(os.path.join(root, "todo.org")) + if content is None: + return None + lines = content.splitlines() + basename = os.path.basename(rel) + for i, line in enumerate(lines): + if basename in line or rel in line: + for j in range(i, -1, -1): + if HEADING_RE.match(lines[j]): + return lines[j].lstrip("* ").strip() + return None + return None + + +def gather_evidence(root, rel, content): + ev = {} + m = re.search(r"^\|\s*Status\s*\|\s*([^|]*)\|", content, re.MULTILINE | re.IGNORECASE) + ev["status"] = m.group(1).strip() if m else None + + cookies = [] + for line in content.splitlines(): + t = heading_text(line) + if t and COOKIE_RE.search(t) and ( + t.lower().startswith("decisions") or t.lower().startswith("review findings") + ): + cookies.append(t) + ev["cookies"] = cookies + + ev["todo"] = todo_task_for(root, rel) + kw = None + if ev["todo"]: + m = re.match(r"([A-Z]+)\s", ev["todo"]) + kw = m.group(1) if m else None + ev["todo_keyword"] = kw + + dated = [ln.strip() for ln in content.splitlines() if DATED_RE.search(ln)] + ev["history"] = dated[-1][:100] if dated else None + + # Cheap artifact check: =path= tokens inside the Implementation phases section. + artifacts, exists = [], 0 + section = re.split(r"^\*+\s+.*implementation phases.*$", content, maxsplit=1, flags=re.MULTILINE | re.IGNORECASE) + if len(section) > 1: + for tok in re.findall(r"=([^=\s]+)=", section[1]): + if "/" in tok: + artifacts.append(tok) + if os.path.exists(os.path.join(root, tok)): + exists += 1 + ev["artifacts"] = (exists, artifacts) + return ev + + +def propose_keyword(ev): + s = (ev["status"] or "").lower() + words = set(re.findall(r"[a-z]+", s)) + if words & {"implemented", "shipped", "complete", "completed", "done"}: + return "IMPLEMENTED" + if words & {"superseded"}: + return "SUPERSEDED" + if words & {"cancelled", "canceled", "dead", "abandoned"}: + return "CANCELLED" + if words & {"doing", "implementing"} or "in progress" in s or "in-progress" in s: + return "DOING" + if ev["todo_keyword"] == "DOING": + return "DOING" + if words & {"ready", "approved", "accepted"}: + return "READY" + return "DRAFT" # conservative non-terminal default + + +# ---- Link scanning ---------------------------------------------------- + + +def rewrite_files(root): + """Project-relative *.org files under the rewritten roots.""" + seen = [] + for base in REWRITE_ROOTS: + if not os.path.exists(os.path.join(root, base)): + continue + for rel in walk_files(root, base): + if rel.endswith(".org") and rel not in seen: + seen.append(rel) + return seen + + +def resolve_target(root, linker_rel, raw_target, moved): + """Resolve a file: link target to a project-relative path (org semantics + first — relative to the linking file's directory — then project-root + anchoring as a fallback for root-anchored links).""" + if raw_target.startswith(("/", "~", "http:", "https:")): + return None + rel_a = os.path.normpath(os.path.join(os.path.dirname(linker_rel), raw_target)) + if rel_a in moved or os.path.exists(os.path.join(root, rel_a)): + return rel_a + rel_b = os.path.normpath(raw_target) + if rel_b in moved or os.path.exists(os.path.join(root, rel_b)): + return rel_b + return rel_a + + +def plan_link_edits(root, moved): + """Compute every link rewrite: inbound links to moved docs and moved + docs' own outbound relative links. Returns ({linker_rel: [(old, new)]}, + [ambiguity descriptions]) — a link whose file-relative and root-anchored + readings are both live and disagree about a moving doc blocks validation + rather than being rewritten against a guess.""" + edits = {} + ambiguous = [] + for linker in rewrite_files(root): + content = read_text(os.path.join(root, linker)) + if content is None: + continue + linker_post = moved.get(linker, linker) + for m in LINK_RE.finditer(content): + raw = m.group(1) + desc = m.group(2) + target_path, sep, anchor = raw.partition("::") + target = resolve_target(root, linker, target_path, moved) + if target is None: + continue + rel_a = os.path.normpath(os.path.join(os.path.dirname(linker), target_path)) + rel_b = os.path.normpath(target_path) + if rel_a != rel_b: + live_a = rel_a in moved or os.path.exists(os.path.join(root, rel_a)) + live_b = rel_b in moved or os.path.exists(os.path.join(root, rel_b)) + if live_a and live_b and (rel_a in moved or rel_b in moved): + ambiguous.append( + "%s: [[file:%s]] reads as %s (file-relative) or %s (root-anchored) " + "and a moving doc is involved — resolve the link by hand" % (linker, raw, rel_a, rel_b)) + continue + if target not in moved and linker not in moved: + continue + if target not in moved and not os.path.exists(os.path.join(root, target)): + continue # already broken before this run; not ours to guess + target_post = moved.get(target, target) + new_path = os.path.relpath(target_post, os.path.dirname(linker_post) or ".") + new_raw = new_path + (sep + anchor if sep else "") + if new_raw == raw: + continue + new_link = "[[file:%s]%s]" % (new_raw, "[%s]" % desc if desc is not None else "") + if m.group(0) != new_link: + edits.setdefault(linker, []).append((m.group(0), new_link)) + return edits, ambiguous + + +def scan_bare_mentions(root, moved): + """Bare-path mentions of moving docs in the rewritten roots — text + occurrences outside any [[...]] link. Reported, never rewritten.""" + found = [] + for base in REWRITE_ROOTS: + if not os.path.exists(os.path.join(root, base)): + continue + for rel in walk_files(root, base): + content = read_text(os.path.join(root, rel)) + if content is None: + continue + for i, line in enumerate(content.splitlines(), 1): + stripped = re.sub(r"\[\[[^\]]*\](?:\[[^\]]*\])?\]", "", line) + for src in moved: + if src in stripped: + found.append((rel, i, src)) + return found + + +def scan_report_only(root, moved): + """Occurrences of moving docs in frozen/synced surfaces.""" + reports = [] + for base in REPORT_ROOTS: + if not os.path.exists(os.path.join(root, base)): + continue + for rel in walk_files(root, base): + content = read_text(os.path.join(root, rel)) + if content is None: + continue + for src in moved: + if src in content: + if rel.startswith(SYNCED_PREFIX): + note = ("synced template, not rewritten — a local edit is reverted by the " + "next sync; edit the canonical claude-templates/%s instead" % rel) + else: + note = "frozen history; not rewritten" + reports.append((rel, src, note)) + return reports + + +# ---- Content transforms ----------------------------------------------- + + +def transform_spec(content, keyword, reason, title, doc_id, link_edits): + """Apply the retrofit rewrite to a moving spec's content: two-sequence + keyword header, prepended status heading, Status-field mirror, and the + doc's own link edits.""" + for old, new in link_edits: + content = content.replace(old, new) + lines = content.splitlines() + + todo_idx = None + kept = [] + for line in lines: + if line.startswith("#+TODO:"): + if todo_idx is None: + todo_idx = len(kept) + continue + kept.append(line) + lines = kept + if todo_idx is None: + todo_idx = 0 + while todo_idx < len(lines) and lines[todo_idx].startswith("#+"): + todo_idx += 1 + lines[todo_idx:todo_idx] = TODO_HEADER + + head_end = 0 + while head_end < len(lines) and (lines[head_end].startswith("#+") or not lines[head_end].strip()): + head_end += 1 + ts = datetime.now().astimezone().strftime("%Y-%m-%d %a @ %H:%M:%S %z") + provenance = "reason: %s" % reason if reason else "evidence-based, human-confirmed" + block = [ + "* %s %s" % (keyword, title), + ":PROPERTIES:", + ":ID: %s" % doc_id, + ":END:", + "- %s — retrofitted by spec-sort; status set to %s (%s)" % (ts, keyword, provenance), + "", + ] + lines[head_end:head_end] = block + + out = [] + mirrored = False + for line in lines: + m = re.match(r"^(\|\s*Status\s*\|)([^|]*)(\|.*)$", line, re.IGNORECASE) + if m and not mirrored: + value = " %s" % keyword.lower() + width = len(m.group(2)) + line = m.group(1) + (value.ljust(width) if len(value) <= width else value + " ") + m.group(3) + mirrored = True + out.append(line) + return "\n".join(out) + "\n" + + +def title_for(content, rel): + m = re.search(r"^#\+TITLE:\s*(.+)$", content, re.MULTILINE | re.IGNORECASE) + if m: + return m.group(1).strip() + base = os.path.basename(rel)[: -len(".org")] + return base[: -len("-spec")] if base.endswith("-spec") else base + + +# ---- Marker ------------------------------------------------------------ + + +def stamp_marker(root, date): + path = os.path.join(root, ".ai", "notes.org") + os.makedirs(os.path.dirname(path), exist_ok=True) + content = read_text(path) or "" + line = ":LAST_SPEC_SORT: %s" % date + if ":LAST_SPEC_SORT:" in content: + content = re.sub(r":LAST_SPEC_SORT:.*", line, content, count=1) + elif re.search(r"^\* Workflow State\s*$", content, re.MULTILINE): + content = re.sub(r"(^\* Workflow State\s*$)", r"\1\n" + line, content, count=1, flags=re.MULTILINE) + else: + if content and not content.endswith("\n"): + content += "\n" + content += "\n* Workflow State\n\n%s\n" % line + with open(path, "w", encoding="utf-8") as f: + f.write(content) + + +# ---- Apply ------------------------------------------------------------- + + +class ApplyFailure(Exception): + """Mid-apply failure: args are (applied_labels, remaining_ops, cause).""" + + +def apply_plan(root, plan, fail_after): + """Execute the recorded plan. Returns the applied-op labels; raises + ApplyFailure mid-way on a write error or when the test hook fires.""" + ops = [] + for mv in plan["moves"]: + ops.append(("move", mv)) + for linker, edits in plan["link_edits"].items(): + if linker in {mv["src"] for mv in plan["moves"]}: + continue # a moving doc's own edits ride along in its transform + ops.append(("relink", (linker, edits))) + + applied = [] + specs_dir = os.path.join(root, "docs", "specs") + if plan["moves"] and not os.path.isdir(specs_dir): + os.makedirs(specs_dir) + plan["created_dirs"].append(os.path.join("docs", "specs")) + + for n, (kind, payload) in enumerate(ops, 1): + if fail_after and n > fail_after: + raise ApplyFailure(applied, ops[n - 1:], "injected test failure") + try: + if kind == "move": + mv = payload + content = read_text(os.path.join(root, mv["src"])) + new = transform_spec(content, mv["keyword"], mv["reason"], mv["title"], mv["id"], + plan["link_edits"].get(mv["src"], [])) + with open(os.path.join(root, mv["dest"]), "w", encoding="utf-8") as f: + f.write(new) + os.remove(os.path.join(root, mv["src"])) + applied.append("move %s -> %s" % (mv["src"], mv["dest"])) + else: + linker, edits = payload + path = os.path.join(root, linker) + content = read_text(path) + for old, new in edits: + content = content.replace(old, new) + with open(path, "w", encoding="utf-8") as f: + f.write(content) + applied.append("relink %s (%d link%s)" % (linker, len(edits), "s" if len(edits) != 1 else "")) + except OSError as exc: + raise ApplyFailure(applied, ops[n - 1:], str(exc)) + return applied + + +def residue_check(root, plan): + """Post-apply: no link in the rewritten roots may still resolve to an + old path; bare mentions beyond the acknowledged set fail too.""" + moved = {mv["src"]: mv["dest"] for mv in plan["moves"]} + residue = [] + for linker in rewrite_files(root): + content = read_text(os.path.join(root, linker)) + if content is None: + continue + for m in LINK_RE.finditer(content): + target_path = m.group(1).partition("::")[0] + target = resolve_target(root, linker, target_path, {}) + if target in moved: + residue.append("%s: link still resolves to %s" % (linker, target)) + # Acknowledged mentions were recorded pre-apply; a mention inside a moved + # doc now lives at the doc's destination, so map the file side through the + # moves before comparing. + acknowledged = {(moved.get(f, f), src) for f, _ln, src in plan["bare"]} + for f, ln, src in scan_bare_mentions(root, moved): + if (f, src) not in acknowledged: + residue.append("%s:%d: bare mention of %s" % (f, ln, src)) + return residue + + +def print_recovery(plan, applied, not_applied): + print("FAILURE — the apply did not complete.") + print(" applied:") + for a in applied or ["(nothing)"]: + print(" %s" % a) + print(" not applied:") + for kind, payload in not_applied: + if kind == "move": + print(" move %s -> %s" % (payload["src"], payload["dest"])) + else: + print(" relink %s" % payload[0]) + print("RECOVERY — restore the pre-run state (safe: preflight required a clean tree):") + touched = [mv["src"] for mv in plan["moves"]] + [l for l in plan["link_edits"] if l not in {mv["src"] for mv in plan["moves"]}] + print(" git restore -- %s" % " ".join(touched)) + created = [mv["dest"] for mv in plan["moves"]] + print(" rm -f -- %s # git restore can't remove the created copies" % " ".join(created)) + for d in plan.get("created_dirs", []): + print(" rmdir --ignore-fail-on-non-empty -- %s" % d) + + +# ---- Main --------------------------------------------------------------- + + +def parse_kv(pairs, label): + out = {} + for item in pairs or []: + if "=" not in item: + sys.exit("spec-sort: %s expects REL=VALUE, got %r" % (label, item)) + k, v = item.split("=", 1) + out[os.path.normpath(k)] = v + return out + + +def main(): + ap = argparse.ArgumentParser(prog="spec-sort", add_help=True) + ap.add_argument("--project-root", default=".") + ap.add_argument("--apply", action="store_true") + ap.add_argument("--allow-dirty", action="store_true") + ap.add_argument("--acknowledge-bare", action="store_true") + ap.add_argument("--confirm", action="append", metavar="REL=KEYWORD") + ap.add_argument("--reason", action="append", metavar="REL=TEXT") + ap.add_argument("--skip", action="append", metavar="REL") + ap.add_argument("--plan-file") + args = ap.parse_args() + + root = os.path.abspath(args.project_root) + confirms = parse_kv(args.confirm, "--confirm") + reasons = parse_kv(args.reason, "--reason") + skips = {os.path.normpath(s) for s in (args.skip or [])} + + candidates, anomalies, notes = classify(root) + if not candidates and not anomalies and not notes and not os.path.isdir(os.path.join(root, "docs")): + return 0 # no docs pile at all — silent no-op + + for named in list(confirms) + list(skips) + list(reasons): + if named not in candidates: + print("spec-sort: %s is not a spec candidate" % named) + return 1 + for rel, kw in confirms.items(): + if kw not in LIFECYCLE: + print("spec-sort: %r is not a lifecycle keyword (%s)" % (kw, " ".join(LIFECYCLE))) + return 1 + + # ---- Build the plan (shared by report and apply) ---- + moves = [] + for rel in candidates: + if rel in skips: + continue + if args.apply and rel not in confirms: + continue # gate failure reported below + content = read_text(os.path.join(root, rel)) + moves.append({ + "src": rel, + "dest": dest_for(rel), + "keyword": confirms.get(rel, None), + "reason": reasons.get(rel), + "title": title_for(content, rel), + "id": str(uuid.uuid4()), + }) + moved_map = {mv["src"]: mv["dest"] for mv in moves} + link_edits, ambiguous = plan_link_edits(root, moved_map) + bare = scan_bare_mentions(root, moved_map) + reports = scan_report_only(root, moved_map) + + # ---- Report ---- + for rel in candidates: + content = read_text(os.path.join(root, rel)) + ev = gather_evidence(root, rel, content) + proposed = propose_keyword(ev) + print("CANDIDATE %s -> %s" % (rel, dest_for(rel))) + suffix = " (terminal — requires --reason to apply)" if proposed in TERMINAL else "" + print(" proposed keyword: %s%s" % (proposed, suffix)) + print(" evidence:") + print(" status field: %s" % (ev["status"] or "(none)")) + print(" cookies: %s" % ("; ".join(ev["cookies"]) or "(none)")) + print(" todo.org: %s" % (ev["todo"] or "(no linking task)")) + print(" history: %s" % (ev["history"] or "(none)")) + n_exist, artifacts = ev["artifacts"] + if artifacts: + print(" artifacts: %d/%d named paths exist (%s)" % (n_exist, len(artifacts), ", ".join(artifacts))) + else: + print(" artifacts: (none named)") + for rel in anomalies: + print("ANOMALY %s: named -spec.org but lacks the spec spine (Decisions + Implementation phases); surfaced, not moved" % rel) + for rel in notes: + print("NOTE %s" % rel) + for linker, edits in sorted(link_edits.items()): + for old, new in edits: + print("RELINK %s: %s -> %s" % (linker, old, new)) + for a in ambiguous: + print("AMBIGUOUS %s" % a) + for f, ln, src in bare: + print("BARE-PATH %s:%d: %s (reported for manual handling, never rewritten)" % (f, ln, src)) + for rel, src, note in reports: + print("REPORT %s: reference to %s (%s)" % (rel, src, note)) + + if not args.apply: + if candidates or anomalies or notes: + print("DRY RUN — no changes written. Pass --apply with per-candidate --confirm/--skip to execute.") + return 0 + + # ---- Apply: preflight ---- + try: + porcelain = subprocess.run( + ["git", "status", "--porcelain"], cwd=root, + capture_output=True, text=True, check=True, + ).stdout + except (subprocess.CalledProcessError, FileNotFoundError): + print("spec-sort: --apply needs a git worktree (recovery depends on git restore)") + return 2 + if porcelain.strip(): + dirty = [ln[3:] for ln in porcelain.splitlines()] + if not args.allow_dirty: + print("spec-sort: refusing --apply on a dirty worktree (%d path%s). Commit or stash first, or pass --allow-dirty." + % (len(dirty), "s" if len(dirty) != 1 else "")) + return 2 + print("WARNING --allow-dirty: recovery via git restore would also revert your pre-existing uncommitted changes:") + for p in dirty: + print(" %s" % p) + + # ---- Apply: confirm gate ---- + unaddressed = [rel for rel in candidates if rel not in confirms and rel not in skips] + if unaddressed: + print("spec-sort: unconfirmed candidate(s) — pass --confirm REL=KEYWORD or --skip REL for each:") + for rel in unaddressed: + print(" %s" % rel) + return 1 + for mv in moves: + if mv["keyword"] in TERMINAL and not mv["reason"]: + print("spec-sort: %s -> %s is a terminal state and requires an explicit --reason %s=TEXT" + % (mv["src"], mv["keyword"], mv["src"])) + return 1 + + # ---- Apply: validation ---- + problems = [] + dests = {} + for mv in moves: + if os.path.exists(os.path.join(root, mv["dest"])): + problems.append("%s: destination exists (%s)" % (mv["src"], mv["dest"])) + if mv["dest"] in dests: + problems.append("%s and %s: destination exists twice (%s)" % (mv["src"], dests[mv["dest"]], mv["dest"])) + dests[mv["dest"]] = mv["src"] + for a in ambiguous: + problems.append("ambiguous link: %s" % a) + if bare and not args.acknowledge_bare: + problems.append("bare-path mention(s) listed above need manual handling — re-run with --acknowledge-bare to proceed without rewriting them") + if problems: + print("spec-sort: validation blocked — nothing written:") + for p in problems: + print(" %s" % p) + return 1 + + # ---- Apply: record the plan, then execute from it ---- + today = datetime.now().astimezone().strftime("%Y-%m-%d") + plan = { + "root": root, "date": today, "moves": moves, + "link_edits": link_edits, "bare": bare, + "reports": [list(r) for r in reports], "created_dirs": [], + } + plan_path = args.plan_file or os.path.join( + tempfile.gettempdir(), "spec-sort-plan-%s.json" % os.path.basename(root)) + with open(plan_path, "w", encoding="utf-8") as f: + json.dump(plan, f, indent=2) + print("plan written: %s" % plan_path) + + fail_after = int(os.environ.get("SPEC_SORT_INJECT_FAIL_AFTER", "0") or 0) + try: + applied = apply_plan(root, plan, fail_after) + except ApplyFailure as exc: + print("write failed: %s" % exc.args[2]) + print_recovery(plan, exc.args[0], exc.args[1]) + return 1 + + residue = residue_check(root, plan) + if residue: + print("spec-sort: residue after apply — old paths still referenced:") + for r in residue: + print(" %s" % r) + print_recovery(plan, applied, []) + return 1 + + stamp_marker(root, today) + for a in applied: + print("applied: %s" % a) + print("spec-sort: done — %d spec(s) sorted, :LAST_SPEC_SORT: %s stamped" % (len(moves), today)) + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/.ai/scripts/tests/route-batch.bats b/.ai/scripts/tests/route-batch.bats new file mode 100644 index 0000000..84ded5f --- /dev/null +++ b/.ai/scripts/tests/route-batch.bats @@ -0,0 +1,202 @@ +#!/usr/bin/env bats +# +# Tests for claude-templates/.ai/scripts/route-batch — the wrap-up router's +# mechanical go path (wrapup-routing spec, Phase 4 / D7 / D9). +# +# Contract under test: +# route-batch --list one "<destination>\t<heading>" line per task +# carrying :ROUTE_CANDIDATE:; silent when none; +# never modifies anything +# route-batch --go per candidate: write the subtree (minus the +# :ROUTE_CANDIDATE: line) as a one-task handoff, +# deliver via inbox-send to the destination's +# inbox/, then remove the subtree from the local +# todo.org. Send failure leaves the task in +# place and exits non-zero. Empty set: no-op. +# +# Strategy: fixture roots under $TEST_DIR hold a source project and two +# destination projects; INBOX_SEND_ROOTS sandboxes inbox-send's discovery to +# them (the same hook inbox-send's own tests use). + +SCRIPT="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)/route-batch" + +setup() { + TEST_DIR="$(mktemp -d -t route-batch-bats.XXXXXX)" + ROOTS="$TEST_DIR/roots" + SRC="$ROOTS/srcproj" + mkdir -p "$SRC/.ai" "$SRC/inbox" \ + "$ROOTS/alpha/.ai" "$ROOTS/alpha/inbox" \ + "$ROOTS/beta/.ai" "$ROOTS/beta/inbox" + touch "$ROOTS/alpha/todo.org" # alpha has a todo.org; beta deliberately not + + cat > "$SRC/todo.org" <<'EOF' +* Srcproj Open Work +** TODO [#B] Alpha-bound task :feature: +:PROPERTIES: +:ROUTE_CANDIDATE: alpha +:END: +Body line about the alpha work. +*** TODO Sub-task that rides along +** TODO [#C] Purely local task +Local body stays put. +** TODO [#C] Beta-bound task :quick: +:PROPERTIES: +:CREATED: [2026-07-01 Tue] +:ROUTE_CANDIDATE: beta +:END: +Beta body. +EOF + + export INBOX_SEND_ROOTS="$ROOTS" + cd "$SRC" +} + +teardown() { + rm -rf "$TEST_DIR" +} + +# ---- --list ------------------------------------------------------------ + +@test "route-batch --list: one destination+heading line per candidate, backlog excluded" { + run "$SCRIPT" --list + [ "$status" -eq 0 ] + [[ "$output" == *"alpha"*"Alpha-bound task"* ]] + [[ "$output" == *"beta"*"Beta-bound task"* ]] + [[ "$output" != *"Purely local task"* ]] +} + +@test "route-batch --list: empty candidate set is silent (exit 0)" { + sed -i '/:ROUTE_CANDIDATE:/d' todo.org + run "$SCRIPT" --list + [ "$status" -eq 0 ] + [ -z "$output" ] +} + +@test "route-batch --list: modifies nothing (skip leaves all in place)" { + before="$(cat todo.org)" + run "$SCRIPT" --list + [ "$status" -eq 0 ] + [ "$(cat todo.org)" = "$before" ] + [ -z "$(ls "$ROOTS/alpha/inbox" "$ROOTS/beta/inbox" 2>/dev/null | grep -v ':')" ] +} + +# ---- --go -------------------------------------------------------------- + +@test "route-batch --go: delivers each candidate to its destination inbox with provenance" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + alpha_file=$(find "$ROOTS/alpha/inbox" -name '*from-srcproj*' -type f) + beta_file=$(find "$ROOTS/beta/inbox" -name '*from-srcproj*' -type f) + [ -n "$alpha_file" ] + [ -n "$beta_file" ] + grep -q 'Alpha-bound task' "$alpha_file" + grep -q 'Sub-task that rides along' "$alpha_file" # children ride along + grep -q 'Beta-bound task' "$beta_file" + ! grep -q ':ROUTE_CANDIDATE:' "$alpha_file" + ! grep -q ':ROUTE_CANDIDATE:' "$beta_file" +} + +@test "route-batch --go: removes routed subtrees from todo.org, leaves local tasks" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + ! grep -q 'Alpha-bound task' todo.org + ! grep -q 'Sub-task that rides along' todo.org + ! grep -q 'Beta-bound task' todo.org + grep -q 'Purely local task' todo.org + grep -q 'Local body stays put' todo.org +} + +@test "route-batch --go: a kept property drawer survives minus the marker" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + beta_file=$(find "$ROOTS/beta/inbox" -name '*from-srcproj*' -type f) + grep -q ':CREATED: \[2026-07-01 Tue\]' "$beta_file" +} + +@test "route-batch --go: destination with inbox/ but no todo.org still delivers" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + [ ! -f "$ROOTS/beta/todo.org" ] + [ -n "$(find "$ROOTS/beta/inbox" -name '*from-srcproj*' -type f)" ] +} + +@test "route-batch --go: empty candidate set is a silent no-op (exit 0)" { + sed -i '/:ROUTE_CANDIDATE:/d' todo.org + before="$(cat todo.org)" + run "$SCRIPT" --go + [ "$status" -eq 0 ] + [ -z "$output" ] + [ "$(cat todo.org)" = "$before" ] +} + +@test "route-batch --go: a failed send leaves that task in place, marker intact, and exits non-zero" { + sed -i 's/:ROUTE_CANDIDATE: beta/:ROUTE_CANDIDATE: ghost/' todo.org + run "$SCRIPT" --go + [ "$status" -ne 0 ] + grep -q 'Beta-bound task' todo.org # failed route stays local + grep -q ':ROUTE_CANDIDATE: ghost' todo.org # marker survives so it resurfaces next wrap + ! grep -q 'Alpha-bound task' todo.org # the good route still landed + [ -n "$(find "$ROOTS/alpha/inbox" -name '*from-srcproj*' -type f)" ] +} + +@test "route-batch --go: handoff headings are promoted to top level" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + alpha_file=$(find "$ROOTS/alpha/inbox" -name '*from-srcproj*' -type f) + grep -q '^\* TODO \[#B\] Alpha-bound task' "$alpha_file" + grep -q '^\*\* TODO Sub-task that rides along' "$alpha_file" +} + +@test "route-batch --go: a drawer emptied by the marker strip is pruned from the handoff" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + alpha_file=$(find "$ROOTS/alpha/inbox" -name '*from-srcproj*' -type f) + ! grep -q ':PROPERTIES:' "$alpha_file" +} + +# ---- Overlapping candidates (nested marker data-loss regression) -------- + +@test "route-batch --go: nested candidates conflict — both stay, bystander survives, exit non-zero" { + cat > todo.org <<'EOF' +* Srcproj Open Work +** TODO [#B] Parent bound for alpha +:PROPERTIES: +:ROUTE_CANDIDATE: alpha +:END: +Parent body. +*** TODO Child bound for beta +:PROPERTIES: +:ROUTE_CANDIDATE: beta +:END: +Child body. +** TODO [#C] Innocent bystander task +Bystander body. +EOF + run "$SCRIPT" --go + [ "$status" -ne 0 ] + [[ "$output" == *"CONFLICT"* ]] + grep -q 'Parent bound for alpha' todo.org + grep -q 'Child bound for beta' todo.org + grep -q 'Innocent bystander task' todo.org + grep -q 'Bystander body' todo.org + [ -z "$(find "$ROOTS/alpha/inbox" "$ROOTS/beta/inbox" -name '*from-srcproj*' -type f)" ] +} + +@test "route-batch: duplicate identical markers in one drawer dedupe to a single route" { + cat > todo.org <<'EOF' +* Srcproj Open Work +** TODO [#B] Double-tagged for alpha +:PROPERTIES: +:ROUTE_CANDIDATE: alpha +:ROUTE_CANDIDATE: alpha +:END: +Body. +EOF + run "$SCRIPT" --list + [ "$status" -eq 0 ] + [ "$(echo "$output" | grep -c 'Double-tagged')" -eq 1 ] + [[ "$output" != *"CONFLICT"* ]] + run "$SCRIPT" --go + [ "$status" -eq 0 ] + [ "$(find "$ROOTS/alpha/inbox" -name '*from-srcproj*' -type f | wc -l)" -eq 1 ] +} diff --git a/.ai/scripts/tests/self-inject.bats b/.ai/scripts/tests/self-inject.bats new file mode 100644 index 0000000..482f61d --- /dev/null +++ b/.ai/scripts/tests/self-inject.bats @@ -0,0 +1,78 @@ +#!/usr/bin/env bats +# Tests for self-inject.sh — tmux is the external boundary, stubbed with a +# recording fake so no real server is needed. + +setup() { + SCRIPT="$BATS_TEST_DIRNAME/../self-inject.sh" + STUB_DIR="$BATS_TEST_TMPDIR/bin" + LOG="$BATS_TEST_TMPDIR/tmux.log" + mkdir -p "$STUB_DIR" +} + +# A tmux stub that records every invocation and answers list-panes from +# $STUB_PANES (empty by default, so pane derivation fails unless a test +# provides ancestry-matching output). +make_stub() { + cat > "$STUB_DIR/tmux" <<'EOF' +#!/bin/sh +echo "$@" >> "$LOG" +case "$1" in + list-panes) printf '%s\n' "$STUB_PANES" ;; +esac +EOF + chmod +x "$STUB_DIR/tmux" +} + +@test "self-inject: -t pane with no pairs echoes the pane and exits 0" { + make_stub + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="" sh "$SCRIPT" -t %42 + [ "$status" -eq 0 ] + [ "$output" = "%42" ] + # Pane was supplied, nothing sent: tmux must not have been called. + [ ! -e "$LOG" ] +} + +@test "self-inject: no pane derivable and no -t exits 1 with an error" { + make_stub + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="" sh "$SCRIPT" 0 "hello" + [ "$status" -eq 1 ] + case "$output" in *"no owning pane"*) : ;; *) false ;; esac +} + +@test "self-inject: derives the pane from process ancestry via list-panes" { + make_stub + # The stub reports the bats test process itself as a pane's pane_pid; + # the script runs as our child, so that pid is in its ancestry. + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="$$ %7" sh "$SCRIPT" + [ "$status" -eq 0 ] + [ "$output" = "%7" ] +} + +@test "self-inject: one delay/text pair sends literal text then Enter" { + make_stub + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="" sh "$SCRIPT" -t %3 0 "/clear" + [ "$status" -eq 0 ] + run cat "$LOG" + [ "${lines[0]}" = "send-keys -t %3 -l /clear" ] + [ "${lines[1]}" = "send-keys -t %3 Enter" ] +} + +@test "self-inject: multiple pairs send in order" { + make_stub + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="" \ + sh "$SCRIPT" -t %3 0 "/clear" 0 "go — resume" + [ "$status" -eq 0 ] + run cat "$LOG" + [ "${lines[0]}" = "send-keys -t %3 -l /clear" ] + [ "${lines[1]}" = "send-keys -t %3 Enter" ] + [ "${lines[2]}" = "send-keys -t %3 -l go — resume" ] + [ "${lines[3]}" = "send-keys -t %3 Enter" ] +} + +@test "self-inject: dangling odd argument after pairs is ignored" { + make_stub + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="" sh "$SCRIPT" -t %3 0 "one" 99 + [ "$status" -eq 0 ] + run cat "$LOG" + [ "${#lines[@]}" -eq 2 ] +} diff --git a/.ai/scripts/tests/spec-sort.bats b/.ai/scripts/tests/spec-sort.bats new file mode 100644 index 0000000..583e458 --- /dev/null +++ b/.ai/scripts/tests/spec-sort.bats @@ -0,0 +1,453 @@ +#!/usr/bin/env bats +# +# Tests for claude-templates/.ai/scripts/spec-sort — the one-time docs-pile +# retrofit from the docs-lifecycle spec: classify docs/**/*.org outside +# docs/specs/ (spec candidate iff it carries BOTH a Decisions heading AND an +# Implementation phases heading), show an evidence panel, and on --apply +# move + rename confirmed candidates to docs/specs/*-spec.org, prepend the +# status heading (:ID:, dated history line), rewrite the keyword header to +# the two-sequence form, relink file: links across the rewritten roots, +# stamp :LAST_SPEC_SORT: in .ai/notes.org. +# +# Contract under test (docs/specs/2026-07-01-docs-lifecycle-spec.org, +# "The retrofit"): +# - dry-run report is the default; --apply writes +# - --apply refuses on a dirty worktree (exit 2) unless --allow-dirty +# - every candidate needs --confirm REL=KEYWORD or --skip REL (exit 1 +# otherwise); terminal keywords need --reason REL=TEXT +# - plan validated before the first write; destination collisions block +# - bare-path mentions in rewritten roots block --apply until +# --acknowledge-bare waives them (reported, never rewritten) +# - mid-apply failure names applied/not-applied + git restore recovery +# - idempotent: a sorted project yields no candidates, no changes +# +# Strategy: each test builds a throwaway git project fixture and runs the +# real script against it. Mid-apply failure is forced via the test-only +# SPEC_SORT_INJECT_FAIL_AFTER env hook. + +SCRIPT="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)/spec-sort" + +setup() { + TEST_DIR="$(mktemp -d -t spec-sort-bats.XXXXXX)" + PROJ="$TEST_DIR/proj" + mkdir -p "$PROJ" +} + +teardown() { + rm -rf "$TEST_DIR" +} + +# Standard fixture: one spec candidate, one note, a stray root spec with a +# spine, an anomaly (-spec.org name, no spine), inbound links from todo.org, +# a sibling note, a session archive (report-only surface), and .ai/notes.org +# with a Workflow State section. +make_project() { + cd "$PROJ" + git init -q + git config user.email test@test + git config user.name test + mkdir -p docs/design .ai/sessions + + cat > docs/design/widget.org <<'EOF' +#+TITLE: Widget Feature +#+DATE: 2026-05-01 +#+TODO: DRAFT REVIEW | SHIPPED + +* Metadata +| Status | draft | +| Owner | Craig | + +* Summary +The widget feature. See [[file:scratch-note.org][the note]]. + +* Decisions [1/2] +** DONE Pick the widget shape +** TODO Pick the color + +* Implementation phases +** Phase 1 — build =src/widget.py= +EOF + + cat > docs/design/scratch-note.org <<'EOF' +#+TITLE: Scratch Note + +* Metadata +| Status | n/a | + +* Thoughts +See [[file:widget.org][the widget spec]]. +EOF + + cat > docs/rooty-spec.org <<'EOF' +#+TITLE: Rooty + +* Decisions +** DONE Only decision + +* Implementation phases +** Phase 1 — nothing +EOF + + cat > docs/lonely-spec.org <<'EOF' +#+TITLE: Lonely +Just prose, no spine. +EOF + + cat > todo.org <<'EOF' +* Open Work +** DOING [#B] Widget feature +Spec: [[file:docs/design/widget.org][widget spec]]. +Summary anchor: [[file:docs/design/widget.org::*Summary][the summary]]. +EOF + + cat > .ai/notes.org <<'EOF' +* Active Reminders + +* Workflow State +:LAST_AUDIT: 2026-06-28 +EOF + + cat > .ai/sessions/2026-06-01-old.org <<'EOF' +Old log: [[file:../../docs/design/widget.org][widget]] +EOF + + git add -A + git commit -qm init +} + +# Confirm flags that satisfy the gate for the standard fixture's candidates. +CONFIRM_ALL=(--confirm docs/design/widget.org=DRAFT --confirm docs/rooty-spec.org=DRAFT) + +# ---- Classification (dry-run) ---------------------------------------- + +@test "spec-sort: dry-run classifies the spine-carrying doc as a candidate" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"CANDIDATE docs/design/widget.org -> docs/specs/widget-spec.org"* ]] +} + +@test "spec-sort: a Metadata table alone does not qualify — note stays a note" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"NOTE docs/design/scratch-note.org"* ]] + [[ "$output" != *"CANDIDATE docs/design/scratch-note.org"* ]] +} + +@test "spec-sort: stray root spec with a spine is a candidate, suffix not doubled" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"CANDIDATE docs/rooty-spec.org -> docs/specs/rooty-spec.org"* ]] + [[ "$output" != *"rooty-spec-spec.org"* ]] +} + +@test "spec-sort: -spec.org name without a spine is an anomaly, never auto-moved" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"ANOMALY docs/lonely-spec.org"* ]] + [[ "$output" != *"CANDIDATE docs/lonely-spec.org"* ]] +} + +@test "spec-sort: docs/specs/ contents are excluded from classification" { + make_project + mkdir -p docs/specs + cp docs/design/widget.org docs/specs/sorted-spec.org + git add -A && git commit -qm more + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" != *"CANDIDATE docs/specs/sorted-spec.org"* ]] +} + +@test "spec-sort: no docs/ directory is a silent no-op" { + cd "$PROJ" + git init -q + git config user.email test@test + git config user.name test + echo x > README.md + git add -A && git commit -qm init + run "$SCRIPT" + [ "$status" -eq 0 ] + [ -z "$output" ] +} + +# ---- Evidence panel --------------------------------------------------- + +@test "spec-sort: evidence panel shows status field, cookies, and todo.org task" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"status field: draft"* ]] + [[ "$output" == *"Decisions [1/2]"* ]] + [[ "$output" == *"todo.org:"*"DOING"*"Widget feature"* ]] +} + +@test "spec-sort: keyword proposal follows the evidence — DOING from the linked DOING task" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + # status field says draft, but the linking todo.org task is DOING — the + # panel proposes the state the strongest evidence supports + [[ "$output" == *"proposed keyword: DOING"* ]] +} + +@test "spec-sort: an 'incomplete' status field never proposes the terminal IMPLEMENTED" { + make_project + sed -i 's/| Status | draft |/| Status | incomplete |/' docs/design/widget.org + git add -A && git commit -qm status + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" != *"proposed keyword: IMPLEMENTED"* ]] +} + +# ---- Confirm gate ----------------------------------------------------- + +@test "spec-sort --apply: refuses when a candidate is neither confirmed nor skipped" { + make_project + run "$SCRIPT" --apply --confirm docs/design/widget.org=DRAFT + [ "$status" -eq 1 ] + [[ "$output" == *"unconfirmed"* ]] + [[ "$output" == *"docs/rooty-spec.org"* ]] + [ -f docs/design/widget.org ] # nothing moved +} + +@test "spec-sort --apply: a terminal keyword without --reason refuses" { + make_project + run "$SCRIPT" --apply --confirm docs/design/widget.org=IMPLEMENTED --skip docs/rooty-spec.org + [ "$status" -eq 1 ] + [[ "$output" == *"--reason"* ]] + [ -f docs/design/widget.org ] +} + +@test "spec-sort --apply: a terminal keyword with --reason records it in the history line" { + make_project + run "$SCRIPT" --apply --confirm docs/design/widget.org=IMPLEMENTED \ + --reason "docs/design/widget.org=shipped in v2, confirmed against src" \ + --skip docs/rooty-spec.org + [ "$status" -eq 0 ] + grep -q '^\* IMPLEMENTED Widget Feature' docs/specs/widget-spec.org + grep -q 'shipped in v2, confirmed against src' docs/specs/widget-spec.org +} + +@test "spec-sort --apply: --skip leaves the candidate in place and still stamps the marker" { + make_project + run "$SCRIPT" --apply --skip docs/design/widget.org --skip docs/rooty-spec.org + [ "$status" -eq 0 ] + [ -f docs/design/widget.org ] + grep -q ':LAST_SPEC_SORT:' .ai/notes.org +} + +# ---- Preflight -------------------------------------------------------- + +@test "spec-sort --apply: refuses on a dirty worktree (exit 2)" { + make_project + echo "drift" >> todo.org + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 2 ] + [[ "$output" == *"dirty"* ]] + [ -f docs/design/widget.org ] +} + +@test "spec-sort --apply --allow-dirty: proceeds and names what recovery loses" { + make_project + echo "drift" >> todo.org + git add todo.org && git commit -qm drift # keep the link intact; dirty a different file + echo "scratch" > untracked-note.txt + echo "local edit" >> .ai/notes.org + run "$SCRIPT" --apply --allow-dirty "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + [[ "$output" == *"pre-existing"* ]] + [[ "$output" == *".ai/notes.org"* ]] + [ -f docs/specs/widget-spec.org ] +} + +# ---- Move + rename + rewrite ------------------------------------------ + +@test "spec-sort --apply: moves, renames to -spec.org, prepends status heading with :ID: and history" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + [ -f docs/specs/widget-spec.org ] + [ ! -f docs/design/widget.org ] + grep -q '^\* DRAFT Widget Feature' docs/specs/widget-spec.org + grep -q ':ID:' docs/specs/widget-spec.org + grep -q 'retrofitted by spec-sort' docs/specs/widget-spec.org +} + +@test "spec-sort --apply: keyword header rewritten to the two-sequence form" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '^#+TODO: TODO | DONE$' docs/specs/widget-spec.org + grep -q '^#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED$' docs/specs/widget-spec.org + ! grep -q 'DRAFT REVIEW | SHIPPED' docs/specs/widget-spec.org +} + +@test "spec-sort --apply: Metadata Status field mirrors the confirmed keyword in lowercase" { + make_project + run "$SCRIPT" --apply --confirm docs/design/widget.org=READY --skip docs/rooty-spec.org + [ "$status" -eq 0 ] + grep -q '^\* READY Widget Feature' docs/specs/widget-spec.org + grep -Eq '^\| Status[[:space:]]*\|[[:space:]]*ready' docs/specs/widget-spec.org +} + +# ---- Relink ----------------------------------------------------------- + +@test "spec-sort --apply: rewrites the todo.org link, preserving the description" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '\[\[file:docs/specs/widget-spec.org\]\[widget spec\]\]' todo.org + ! grep -q 'docs/design/widget.org' todo.org +} + +@test "spec-sort --apply: preserves a ::anchor suffix through the rewrite" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '\[\[file:docs/specs/widget-spec.org::\*Summary\]\[the summary\]\]' todo.org +} + +@test "spec-sort --apply: recomputes a sibling note's relative link to the moved spec" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '\[\[file:../specs/widget-spec.org\]\[the widget spec\]\]' docs/design/scratch-note.org +} + +@test "spec-sort --apply: recomputes the moved spec's own outbound link to an unmoved note" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '\[\[file:../design/scratch-note.org\]\[the note\]\]' docs/specs/widget-spec.org +} + +@test "spec-sort: session archives are reported, never rewritten" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"REPORT .ai/sessions/2026-06-01-old.org"* ]] + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q 'docs/design/widget.org' .ai/sessions/2026-06-01-old.org +} + +@test "spec-sort: a synced template path report names the canonical rulesets file" { + make_project + mkdir -p .ai/workflows + echo 'See [[file:../../docs/design/widget.org][widget]]' > .ai/workflows/startup.org + git add -A && git commit -qm wf + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"REPORT .ai/workflows/startup.org"* ]] + [[ "$output" == *"claude-templates/.ai/workflows/startup.org"* ]] +} + +# ---- Bare-path mentions ----------------------------------------------- + +@test "spec-sort --apply: a bare-path mention in a rewritten root blocks until acknowledged" { + make_project + echo "raw mention: docs/design/widget.org needs review" >> todo.org + git add -A && git commit -qm bare + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 1 ] + [[ "$output" == *"BARE"* ]] + [ -f docs/design/widget.org ] # nothing moved + run "$SCRIPT" --apply --acknowledge-bare "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q 'raw mention: docs/design/widget.org' todo.org # reported, never rewritten +} + +@test "spec-sort --apply: a moving doc's bare mention of its own old path is acknowledgeable, not post-apply residue" { + make_project + echo "History: docs/design/widget.org was drafted in May." >> docs/design/widget.org + git add -A && git commit -qm selfmention + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 1 ] + [[ "$output" == *"BARE"* ]] + run "$SCRIPT" --apply --acknowledge-bare "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] # the acknowledged mention rides along to docs/specs/; not residue + grep -q ':LAST_SPEC_SORT:' .ai/notes.org +} + +# ---- Plan validation --------------------------------------------------- + +@test "spec-sort --apply: a destination collision blocks validation, nothing moved" { + make_project + mkdir -p docs/specs + echo "occupied" > docs/specs/widget-spec.org + git add -A && git commit -qm occupy + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 1 ] + [[ "$output" == *"destination exists"* ]] + [ -f docs/design/widget.org ] + [ "$(cat docs/specs/widget-spec.org)" = "occupied" ] +} + +@test "spec-sort --apply: writes the plan file before executing" { + make_project + run "$SCRIPT" --apply --plan-file "$TEST_DIR/plan.json" "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + [ -f "$TEST_DIR/plan.json" ] + grep -q 'widget-spec.org' "$TEST_DIR/plan.json" +} + +# ---- Mid-apply failure recovery ---------------------------------------- + +@test "spec-sort --apply: forced mid-apply failure yields named recovery, not a half-migrated shrug" { + make_project + run env SPEC_SORT_INJECT_FAIL_AFTER=1 "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 1 ] + [[ "$output" == *"RECOVERY"* ]] + [[ "$output" == *"git restore"* ]] + [[ "$output" == *"applied"* ]] + [[ "$output" == *"not applied"* ]] + ! grep -q ':LAST_SPEC_SORT:' .ai/notes.org # no stamp on a failed apply +} + +# ---- Idempotence + marker ---------------------------------------------- + +@test "spec-sort --apply: stamps :LAST_SPEC_SORT: in the Workflow State section" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q ':LAST_SPEC_SORT: ' .ai/notes.org + # lands inside the Workflow State section, alongside the existing marker + awk '/^\* Workflow State/{ws=1} ws && /:LAST_SPEC_SORT:/{found=1} END{exit !found}' .ai/notes.org +} + +@test "spec-sort --apply: creates the Workflow State section when notes.org lacks it" { + make_project + printf '* Active Reminders\n' > .ai/notes.org + git add -A && git commit -qm notes + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '^\* Workflow State' .ai/notes.org + grep -q ':LAST_SPEC_SORT: ' .ai/notes.org +} + +@test "spec-sort --apply: zero candidates still stamps the marker (clears the nudge)" { + make_project + rm docs/design/widget.org docs/rooty-spec.org docs/lonely-spec.org + git add -A && git commit -qm notes-only + run "$SCRIPT" --apply + [ "$status" -eq 0 ] + grep -q ':LAST_SPEC_SORT:' .ai/notes.org +} + +@test "spec-sort: a second run after a successful apply finds nothing to do" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + git add -A && git commit -qm sorted + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" != *"CANDIDATE"* ]] + run "$SCRIPT" --apply + [ "$status" -eq 0 ] + run git status --porcelain + # only the re-stamped marker (same date) may differ — tree stays clean + [ -z "$(git status --porcelain -- docs todo.org)" ] +} diff --git a/.ai/scripts/tests/test-lint-org.el b/.ai/scripts/tests/test-lint-org.el index 3b8a9bb..d14879f 100644 --- a/.ai/scripts/tests/test-lint-org.el +++ b/.ai/scripts/tests/test-lint-org.el @@ -685,6 +685,37 @@ missing-rules violation." (judgments (lo-test--judgments (plist-get out :issues)))) (should-not (member 'level-2-dated-header (lo-test--checkers judgments))))) +;;; subtask-done-not-dated check (the inverse: level-3+ done keyword) + +(ert-deftest lo-subtask-done-not-dated-flags-level3 () + "A level-3 DONE sub-task still carrying the keyword is flagged for conversion." + (let* ((out (lo-test--run + "* Open Work\n\n** TODO [#B] Parent\n*** DONE [#C] Sub-task done\nCLOSED: [2026-06-20 Sat 10:00]\nBody.\n")) + (judgments (lo-test--judgments (plist-get out :issues)))) + (should (= 0 (plist-get out :fixes))) ; judgment-only, never auto-fixed + (should (member 'subtask-done-not-dated (lo-test--checkers judgments))))) + +(ert-deftest lo-subtask-done-not-dated-flags-level4-cancelled () + "A level-4 CANCELLED sub-task is flagged too." + (let* ((out (lo-test--run + "* Open Work\n\n** PROJECT [#B] Parent\n*** TODO Mid\n**** CANCELLED Deep abandoned\nCLOSED: [2026-06-20 Sat]\n")) + (judgments (lo-test--judgments (plist-get out :issues)))) + (should (member 'subtask-done-not-dated (lo-test--checkers judgments))))) + +(ert-deftest lo-subtask-done-not-dated-ignores-level2 () + "A level-2 DONE task is a top-level task, not a sub-task — this checker skips it." + (let* ((out (lo-test--run + "* Open Work\n\n** DONE [#B] Top-level\nCLOSED: [2026-06-20 Sat]\nBody.\n")) + (judgments (lo-test--judgments (plist-get out :issues)))) + (should-not (member 'subtask-done-not-dated (lo-test--checkers judgments))))) + +(ert-deftest lo-subtask-done-not-dated-ignores-dated-and-lowercase () + "An already-dated level-3 entry, and the word done in a title, are not flagged." + (let* ((out (lo-test--run + "* Open Work\n\n** TODO [#B] Parent\n*** 2026-06-20 Sat @ 10:00:00 -0400 landed\n*** TODO wrap the done cleanup\n")) + (judgments (lo-test--judgments (plist-get out :issues)))) + (should-not (member 'subtask-done-not-dated (lo-test--checkers judgments))))) + ;;; --------------------------------------------------------------------------- ;;; structural heading checks (org-lint gaps) diff --git a/.ai/scripts/tests/test-todo-cleanup.el b/.ai/scripts/tests/test-todo-cleanup.el index e569d9a..ffbf2fb 100644 --- a/.ai/scripts/tests/test-todo-cleanup.el +++ b/.ai/scripts/tests/test-todo-cleanup.el @@ -768,5 +768,176 @@ in ISSUES, in document order." (should (= 2 (plist-get once :bumped))) (should (= 2 (plist-get twice :bumped))))) +;;; --------------------------------------------------------------------------- +;;; --convert-subtasks harness + tests + +(defun tc-test--reset-convert (&optional check) + (setq tc-fixes 0 tc-archived 0 tc-bumped 0 tc-converted 0 tc-archived-to-file 0 + tc-issues nil + tc-check-only (and check t) + tc-archive-done nil tc-sync-child-priority nil tc-convert-subtasks t + tc-current-file nil + tc-archive-retain-days nil tc-archive-reference-date nil tc-archive-file nil)) + +(defun tc-test--convert (content &optional runs check) + "Write CONTENT to a temp .org file, run `--convert-subtasks' RUNS times (default 1). +Return a plist: :result final file contents, :converted count from the last run, +:issues from the last run. CHECK non-nil ⇒ --check (preview, no writes)." + (let ((file (make-temp-file "tc-test-" nil ".org")) + last-converted last-issues) + (unwind-protect + (progn + (with-temp-file file (insert content)) + (dotimes (_ (or runs 1)) + (tc-test--reset-convert check) + (tc-process-file file) + (setq last-converted tc-converted last-issues tc-issues) + (tc-test--drop-buffer file)) + (list :result (with-temp-buffer (insert-file-contents file) + (buffer-string)) + :converted last-converted + :issues last-issues)) + (tc-test--drop-buffer file) + (delete-file file)))) + +;; The UTC offset in a converted header is the test machine's local offset for +;; that date, so assertions match it as `[-+]NNNN' rather than a fixed value — +;; the mode's job is to emit a well-formed offset, not to run in one timezone. + +(defconst tc-test--convert-timed + "* Project Open Work +** TODO [#B] Parent task +*** DONE [#C] F12 opens the terminal :feature:quick: +CLOSED: [2026-06-27 Sat 12:50] +Verified live: docks, toggles, colors clean. +") + +(ert-deftest tc-convert-timed-subtask-normal () + "Normal: a timed CLOSED close becomes a dated header, keyword/priority/tags/CLOSED gone." + (let* ((out (tc-test--convert tc-test--convert-timed)) + (res (plist-get out :result))) + (should (= 1 (plist-get out :converted))) + (should (string-match-p + "^\\*\\*\\* 2026-06-27 Sat @ 12:50:00 [-+][0-9]\\{4\\} F12 opens the terminal$" + res)) + (should-not (string-match-p "CLOSED:" res)) + (should-not (string-match-p "DONE" res)) + (should (string-match-p "Verified live: docks, toggles, colors clean\\." res)) + (should (string-match-p "^\\*\\* TODO \\[#B\\] Parent task$" res)))) + +(defconst tc-test--convert-dateonly + "* Project Open Work +** PROJECT [#B] Parent +**** DONE [#B] Write full spec :refactor: +CLOSED: [2026-05-04 Mon] +Body. +") + +(ert-deftest tc-convert-dateonly-boundary-midnight () + "Boundary: a date-only CLOSED (no time) yields 00:00:00, at level 4." + (let ((res (plist-get (tc-test--convert tc-test--convert-dateonly) :result))) + (should (string-match-p + "^\\*\\*\\*\\* 2026-05-04 Mon @ 00:00:00 [-+][0-9]\\{4\\} Write full spec$" + res)) + (should-not (string-match-p "CLOSED:" res)))) + +(defconst tc-test--convert-level2 + "* Project Open Work +** DONE [#B] Top-level task +CLOSED: [2026-06-01 Mon 09:00] +Body. +") + +(ert-deftest tc-convert-leaves-level-2-alone-boundary () + "Boundary: a level-2 DONE task is a top-level task, not a sub-task — untouched." + (let ((out (tc-test--convert tc-test--convert-level2))) + (should (= 0 (plist-get out :converted))) + (should (equal tc-test--convert-level2 (plist-get out :result))))) + +(ert-deftest tc-convert-idempotent-boundary () + "Boundary: a second run over an already-dated entry converts nothing new." + (let ((once (tc-test--convert tc-test--convert-timed 1)) + (twice (tc-test--convert tc-test--convert-timed 2))) + (should (equal (plist-get once :result) (plist-get twice :result))) + (should (= 0 (plist-get twice :converted))))) + +(defconst tc-test--convert-nested + "* Project Open Work +** TODO [#B] Parent +*** DONE Outer sub :feature: +CLOSED: [2026-06-10 Wed 08:15] +**** DONE Inner sub +CLOSED: [2026-06-09 Tue 07:00] +Inner body. +") + +(ert-deftest tc-convert-nested-done-subtasks-boundary () + "Boundary: a done sub-task nested under a done sub-task — both convert." + (let* ((out (tc-test--convert tc-test--convert-nested)) + (res (plist-get out :result))) + (should (= 2 (plist-get out :converted))) + (should (string-match-p + "^\\*\\*\\* 2026-06-10 Wed @ 08:15:00 [-+][0-9]\\{4\\} Outer sub$" res)) + (should (string-match-p + "^\\*\\*\\*\\* 2026-06-09 Tue @ 07:00:00 [-+][0-9]\\{4\\} Inner sub$" res)) + (should-not (string-match-p "CLOSED:" res)))) + +(defconst tc-test--convert-cancelled + "* Project Open Work +** TODO [#B] Parent +*** CANCELLED [#C] Abandoned idea :feature: +CLOSED: [2026-06-15 Mon 10:00] +") + +(ert-deftest tc-convert-cancelled-subtask-boundary () + "Boundary: a CANCELLED sub-task converts too (terminal state)." + (let ((res (plist-get (tc-test--convert tc-test--convert-cancelled) :result))) + (should (string-match-p + "^\\*\\*\\* 2026-06-15 Mon @ 10:00:00 [-+][0-9]\\{4\\} Abandoned idea$" res)) + (should-not (string-match-p "CANCELLED" res)))) + +(defconst tc-test--convert-noclosed + "* Project Open Work +** TODO [#B] Parent +*** DONE Orphan with no closed date +Body only. +") + +(ert-deftest tc-convert-skips-subtask-without-closed-error () + "Error: a done sub-task with no parseable CLOSED is flagged and left unchanged." + (let ((out (tc-test--convert tc-test--convert-noclosed))) + (should (= 0 (plist-get out :converted))) + (should (equal tc-test--convert-noclosed (plist-get out :result))) + (should (cl-some (lambda (i) (eq (plist-get i :kind) 'convert-skip)) + (plist-get out :issues))))) + +(ert-deftest tc-convert-check-mode-previews-without-writing () + "Check mode reports the conversion but writes nothing." + (let ((out (tc-test--convert tc-test--convert-timed 1 t))) + (should (= 1 (plist-get out :converted))) + (should (equal tc-test--convert-timed (plist-get out :result))) + (should (cl-some (lambda (i) (eq (plist-get i :kind) 'convert-would)) + (plist-get out :issues))))) + +(defconst tc-test--convert-closed-with-deadline + "* Project Open Work +** TODO [#B] Parent task +*** DONE [#C] Ship the panel :feature: +CLOSED: [2026-06-27 Sat 12:50] DEADLINE: <2026-06-30 Tue> +Body line. +") + +(ert-deftest tc-convert-preserves-deadline-on-shared-planning-line-boundary () + "Boundary: removing the CLOSED cookie keeps a DEADLINE sharing its planning line." + (let* ((out (tc-test--convert tc-test--convert-closed-with-deadline)) + (res (plist-get out :result))) + (should (= 1 (plist-get out :converted))) + (should (string-match-p + "^\\*\\*\\* 2026-06-27 Sat @ 12:50:00 [-+][0-9]\\{4\\} Ship the panel$" + res)) + (should-not (string-match-p "CLOSED:" res)) + (should (string-match-p "^DEADLINE: <2026-06-30 Tue>$" res)) + (should (string-match-p "^Body line\\.$" res)))) + (provide 'test-todo-cleanup) ;;; test-todo-cleanup.el ends here diff --git a/.ai/scripts/todo-cleanup.el b/.ai/scripts/todo-cleanup.el index 541d106..bd8166d 100644 --- a/.ai/scripts/todo-cleanup.el +++ b/.ai/scripts/todo-cleanup.el @@ -5,10 +5,12 @@ ;; emacs --batch -q -l todo-cleanup.el --check todo.org # hygiene report only ;; emacs --batch -q -l todo-cleanup.el --archive-done todo.org # archive completed subtrees ;; emacs --batch -q -l todo-cleanup.el --archive-done --check todo.org # preview the archive +;; emacs --batch -q -l todo-cleanup.el --convert-subtasks todo.org # dated-rewrite done level-3+ sub-tasks +;; emacs --batch -q -l todo-cleanup.el --convert-subtasks --check todo.org # preview the conversion ;; emacs --batch -q -l todo-cleanup.el --sync-child-priority todo.org # bump children whose priority drifted below the parent's ;; emacs --batch -q -l todo-cleanup.el --check-child-priority todo.org # preview the sync (same as --sync-child-priority --check) ;; -;; Three independent modes: +;; Four independent modes: ;; ;; * Default (hygiene). Designed for the wrap-it-up workflow: cheap, idempotent, ;; safe to run every session. @@ -52,6 +54,20 @@ ;; Archiving is consequential, so it's never run by default; it does *not* ;; also run the hygiene passes. ;; +;; * --convert-subtasks (opt-in). Rewrites every level-3-and-deeper heading whose +;; TODO state is DONE/CANCELLED/FAILED into a dated event-log entry +;; (`<stars> YYYY-MM-DD Day @ HH:MM:SS -ZZZZ <text>'), dropping the keyword, +;; priority cookie, and tags, and removing the now-redundant CLOSED line. The +;; date and time come from that entry's own CLOSED cookie; a date-only close +;; yields 00:00:00, and the UTC offset is computed DST-aware for that date. +;; This enforces the todo-format depth rule that interactive closes +;; (`org-log-done' → DONE + CLOSED) and `--archive-done' (level-2 only) leave +;; unapplied. The heading text is preserved verbatim — a batch tool can't +;; past-tense an imperative title reliably. Idempotent (an already-dated +;; heading has no done keyword); a done sub-task with no parseable CLOSED date +;; is flagged and left alone, never stamped with a fabricated date. Like +;; --archive-done it does not also run the hygiene passes. +;; ;; * --sync-child-priority (opt-in). Walks every heading with a priority cookie ;; ([#A]-[#D]) and, for each of its direct child headings whose own priority ;; is lower (later in the alphabet — D is lower than A), bumps the child's @@ -73,11 +89,16 @@ (require 'calendar) (setq org-todo-keywords - '((sequence "TODO" "DOING" "WAITING" "NEXT" "|" "DONE" "CANCELLED"))) + '((sequence "TODO" "DOING" "WAITING" "NEXT" "|" "DONE" "CANCELLED" "FAILED"))) (defconst tc-done-states '("DONE" "CANCELLED") "TODO keywords that mark an entry as completed for `--archive-done'.") +(defconst tc--convert-done-states '("DONE" "CANCELLED" "FAILED") + "TODO keywords whose level-3-and-deeper entries `--convert-subtasks' rewrites +to dated event-log entries. Broader than `tc-done-states' because a FAILED +sub-task is terminal too and belongs in the parent's dated history.") + (defconst tc--priority-cookie-regexp "\\[#\\([A-Z]\\)\\]" "Regexp matching an org priority cookie. Match group 1 is the letter.") @@ -89,10 +110,12 @@ every heading below it.") (defvar tc-fixes 0) (defvar tc-archived 0) (defvar tc-bumped 0) +(defvar tc-converted 0) (defvar tc-issues nil) (defvar tc-check-only nil) (defvar tc-archive-done nil) (defvar tc-sync-child-priority nil) +(defvar tc-convert-subtasks nil) (defvar tc-current-file nil) (defvar tc-current-dir nil) (defvar tc-archived-to-file 0) @@ -578,6 +601,138 @@ before their descendants — a [#A] → [#B] → [#D] chain collapses in one pas (org-map-entries #'tc-sync-child-priority-at-heading nil 'file)) ;;; --------------------------------------------------------------------------- +;;; --convert-subtasks mode +;; +;; A sub-task (a heading at level 3 or deeper, i.e. under a parent task) that is +;; marked DONE/CANCELLED/FAILED should become a dated event-log entry per the +;; todo-format depth rule: drop the keyword, priority cookie, and tags, and +;; rewrite the heading to `<stars> YYYY-MM-DD Day @ HH:MM:SS -ZZZZ <text>' so the +;; parent's subtree grows a chronological history instead of a long tail of +;; nested DONE lines. Nothing enforced this before: `org-log-done' just flips an +;; interactive close to DONE + CLOSED, and `--archive-done' only touches level 2. +;; So level-3+ closes piled up as DONE keywords. This mode converts them +;; mechanically, pulling the timestamp from each entry's own CLOSED cookie. The +;; heading text is kept verbatim (a batch tool can't reliably past-tense an +;; imperative title, and guessing prose in the task file is worse than leaving it +;; as written). Idempotent: an already-dated heading has no done keyword, so it +;; is skipped. A done sub-task with no parseable CLOSED cookie can't be dated, so +;; it is flagged and left alone rather than stamped with a fabricated date. + +(defun tc--closed-parts-in-entry () + "Return a plist (:year :month :day :dow :hour :minute) from the CLOSED cookie +of the entry at point, or nil when the entry has no parseable CLOSED line. +:hour and :minute are nil when the cookie carries only a date. The CLOSED line +sits in canonical position directly under the heading, so the first match within +the entry is the task's own close." + (save-excursion + (org-back-to-heading t) + (let ((end (save-excursion + (or (outline-next-heading) (goto-char (point-max))) + (point)))) + (when (re-search-forward + (concat "CLOSED:[ \t]*\\[\\([0-9]\\{4\\}\\)-\\([0-9]\\{2\\}\\)-\\([0-9]\\{2\\}\\)" + "[ \t]+\\([A-Za-z]+\\)" + "\\(?:[ \t]+\\([0-9]\\{2\\}\\):\\([0-9]\\{2\\}\\)\\)?\\]") + end t) + (list :year (match-string 1) :month (match-string 2) :day (match-string 3) + :dow (match-string 4) + :hour (match-string 5) :minute (match-string 6)))))) + +(defun tc--tz-offset-string (year month day hour minute) + "Return the local UTC offset (e.g. \"-0500\") for the given wall-clock instant. +DST-aware: `encode-time' with an unknown-DST field lets the system pick the +correct offset for that date, so a summer close reads -0400 and a winter one +-0500 without hardcoding either." + (format-time-string + "%z" (encode-time (list 0 minute hour day month year nil -1 nil)))) + +(defun tc--dated-header-line (level parts title) + "Build the dated event-log heading string from LEVEL, CLOSED PARTS, and TITLE. +Missing time in PARTS defaults to 00:00:00 (the close logged only a date)." + (let* ((year (plist-get parts :year)) + (month (plist-get parts :month)) + (day (plist-get parts :day)) + (dow (plist-get parts :dow)) + (hh (or (plist-get parts :hour) "00")) + (mm (or (plist-get parts :minute) "00")) + (tz (tc--tz-offset-string (string-to-number year) + (string-to-number month) + (string-to-number day) + (string-to-number hh) + (string-to-number mm)))) + (format "%s %s-%s-%s %s @ %s:%s:00 %s %s" + (make-string level ?*) year month day dow hh mm tz title))) + +(defun tc--convert-collect-targets () + "Markers at every heading at level >= 3 whose TODO state is a done state. +Collected up front so the rewrite loop can edit the buffer without disturbing an +in-progress `org-map-entries' walk; markers track their headings across edits." + (let (targets) + (org-map-entries + (lambda () + (when (and (>= (org-current-level) 3) + (member (org-get-todo-state) tc--convert-done-states)) + (push (copy-marker (point)) targets))) + nil 'file) + (nreverse targets))) + +(defun tc--convert-one-subtask (marker) + "Convert the done sub-task heading at MARKER to a dated event-log entry. +Under `tc-check-only' the conversion is reported but not performed." + (goto-char marker) + (org-back-to-heading t) + (let* ((level (org-current-level)) + (title (org-get-heading t t t t)) + (line (line-number-at-pos)) + (parts (tc--closed-parts-in-entry))) + (cond + ((null parts) + (push (list :kind 'convert-skip :file tc-current-file + :line line :heading title + :detail "no CLOSED date to derive the timestamp") + tc-issues)) + (t + (let ((new (tc--dated-header-line level parts title))) + (cl-incf tc-converted) + (if tc-check-only + (push (list :kind 'convert-would :file tc-current-file + :line line :heading title :new new) + tc-issues) + ;; Replace the heading line, then drop the now-redundant CLOSED + ;; cookie from the entry (its date now lives in the header). Only + ;; the cookie goes: a planning line can also carry DEADLINE: or + ;; SCHEDULED: beside it, and those survive on their line. A line + ;; left blank by the removal is deleted whole. + (delete-region (line-beginning-position) (line-end-position)) + (insert new) + (let ((end (save-excursion + (or (outline-next-heading) (goto-char (point-max))) + (point)))) + (save-excursion + (when (re-search-forward "CLOSED:[ \t]*\\[[^]]*\\][ \t]*" end t) + (replace-match "") + (let ((bol (line-beginning-position)) + (eol (line-end-position))) + (if (string-match-p "\\`[ \t]*\\'" + (buffer-substring bol eol)) + (delete-region bol (min (1+ eol) (point-max))) + (goto-char bol) + (when (looking-at "[ \t]+") + (replace-match ""))))))) + (push (list :kind 'convert-done :file tc-current-file + :line line :heading title :new new) + tc-issues))))))) + +(defun tc-convert-subtasks-in-file () + "Rewrite every level-3-and-deeper DONE/CANCELLED/FAILED heading to a dated +event-log entry, pulling the timestamp from its CLOSED cookie. Honors +`tc-check-only'." + (let ((targets (tc--convert-collect-targets))) + (dolist (m targets) + (tc--convert-one-subtask m) + (set-marker m nil)))) + +;;; --------------------------------------------------------------------------- ;;; Driver + reporting (defun tc-process-file (file) @@ -590,6 +745,8 @@ before their descendants — a [#A] → [#B] → [#D] chain collapses in one pas (tc-archive-done-in-file)) (tc-sync-child-priority (tc-sync-child-priority-in-file)) + (tc-convert-subtasks + (tc-convert-subtasks-in-file)) (t ;; Pass 1: auto-fix bogus state logs (or report under --check). (org-map-entries #'tc-fix-bogus-state-log-in-entry nil 'file) @@ -684,9 +841,34 @@ before their descendants — a [#A] → [#B] → [#D] chain collapses in one pas (plist-get i :child-heading) (plist-get i :parent-heading))))))) +(defun tc--emit-convert-report () + ;; Silent on a real-mode no-op (nothing to convert and nothing skipped), for + ;; the same reason as the archive report: the wrap runs cleanup passes more + ;; than once, and a vocal \"0 converted\" reads as noise. Check mode always + ;; reports (the preview is what the caller asked for), and a skip always + ;; reports (a done sub-task with no CLOSED date is a real condition to see). + (let ((has-skip (cl-some (lambda (i) (eq (plist-get i :kind) 'convert-skip)) + tc-issues))) + (when (or tc-check-only (> tc-converted 0) has-skip) + (princ (format "todo-cleanup --convert-subtasks: %d sub-task(s) %s%s\n" + tc-converted + (if tc-check-only "would convert" "converted") + (if tc-check-only " — CHECK MODE (no writes)" ""))) + (dolist (i (reverse tc-issues)) + (pcase (plist-get i :kind) + ((or 'convert-done 'convert-would) + (princ (format " %s:%d: %s\n → %s\n" + (plist-get i :file) (plist-get i :line) + (plist-get i :heading) (plist-get i :new)))) + ('convert-skip + (princ (format " skipped %s:%d: %s — %s\n" + (plist-get i :file) (plist-get i :line) + (plist-get i :heading) (plist-get i :detail))))))))) + (defun tc-emit-report () (cond (tc-archive-done (tc--emit-archive-report)) (tc-sync-child-priority (tc--emit-sync-report)) + (tc-convert-subtasks (tc--emit-convert-report)) (t (tc--emit-hygiene-report)))) (defun tc-main () @@ -701,6 +883,9 @@ before their descendants — a [#A] → [#B] → [#D] chain collapses in one pas (when (member "--sync-child-priority" command-line-args-left) (setq tc-sync-child-priority t) (setq command-line-args-left (delete "--sync-child-priority" command-line-args-left))) + (when (member "--convert-subtasks" command-line-args-left) + (setq tc-convert-subtasks t) + (setq command-line-args-left (delete "--convert-subtasks" command-line-args-left))) ;; --check-child-priority is the report-only alias for ;; `--sync-child-priority --check'. (when (member "--check-child-priority" command-line-args-left) @@ -708,7 +893,7 @@ before their descendants — a [#A] → [#B] → [#D] chain collapses in one pas (setq command-line-args-left (delete "--check-child-priority" command-line-args-left))) (if (null command-line-args-left) (progn - (princ "Usage: emacs --batch -q -l todo-cleanup.el [--check] [--archive-done | --sync-child-priority | --check-child-priority] FILE...\n") + (princ "Usage: emacs --batch -q -l todo-cleanup.el [--check] [--archive-done | --convert-subtasks | --sync-child-priority | --check-child-priority] FILE...\n") (kill-emacs 1)) (let ((files command-line-args-left)) (setq command-line-args-left nil) @@ -727,6 +912,7 @@ ert-run-tests-batch-and-exit'." (cl-every (lambda (a) (cond ((member a '("--check" "--archive-done" + "--convert-subtasks" "--sync-child-priority" "--check-child-priority")) t) diff --git a/.ai/sessions/2026-06-30-13-55-pager-mcp-ssh-alias-and-emacsd-proposals.org b/.ai/sessions/2026-06-30-13-55-pager-mcp-ssh-alias-and-emacsd-proposals.org new file mode 100644 index 0000000..3cc8501 --- /dev/null +++ b/.ai/sessions/2026-06-30-13-55-pager-mcp-ssh-alias-and-emacsd-proposals.org @@ -0,0 +1,101 @@ +#+TITLE: Session Context +#+DATE: 2026-06-30 + +* Summary + +** Active Goal +Startup, then process the .emacs.d inbox: chase how the 2026-06-29 Signal page was sent, clean up the dead page-signal path, and review + ship the three shared-asset proposals. Verify ratio's roam-sync, then wrap + tear down. + +** Decisions +- Pager: no CLI wrapper. The signal-mcp tool is one call for in-session agents (the only paging caller); a signal-cli wrapper only if a non-session caller ever needs Signal paging. Pager guidance = notify --persist at the machine, signal-mcp send_message_to_user (to Craig's UUID) when away. +- SSH alias drift: restore the bare =cjennings= alias in ~/.ssh/config (root fix, both daily drivers) rather than repointing one repo's remote. +- green-baseline: accept with 3 changes (no-suite guard, cross-ref When You Cannot Verify, placement as start-work 0.3). +- todo-cleanup aging: accept retain 7, default ON; ADD script self-protect so the archive inherits the todo file's gitignore status (Craig's rule). +- lint-org checkers: accept with indented-heading tightened to 2+ stars (1+ false-positives on valid `*` list bullets). + +** Data Collected / Findings +- The 2026-06-29 page went out via the signal-mcp tool (transcript 924ea200), not a revived account. The pager account (+15045173983) was never deregistered — the 2026-06-12 "deregistered" premise conflated the page-signal *script* removal with account loss. signal-cli registered, signal-mcp connected, live page reached Craig's phone. +- "Network down" was a misread: internet was up, DNS fine, cjennings.net resolves. Real cause = git@cjennings remote alias missing from ~/.ssh/config (only cjennings.net defined). Fixed; both repos' remotes work again. +- gitignore-mode code projects (chime, pearl, archangel, .emacs.d) all gitignore todo.org → the aging archive must follow or it leaks private task history to a tracked path on public repos. +- indented-heading 1+-star regex flagged 3 valid `*` list bullets (reproduced); 2+ stars is the correct, collision-free target. +- ratio confirmed over tailscale: roam clone + roam-sync.timer live and syncing; both daily drivers now set up. + +** Files Modified (all committed + pushed) +- a266250 — Makefile prune step for dangling bin symlinks + protocols.org pager section. +- dotfiles 3119bbb — ~/.ssh/config: restore bare =cjennings= alias (propagated to ratio over tailscale). +- d0ab047 — verification.md green-baseline section + start-work Phase 0.3. +- 324a52b — daily-drivers.md reframed around direct tailscale reach + mechanics section. +- f67e724 — todo-cleanup.el Resolved-section aging + tc--ensure-archive-gitignored self-protect + 2 tests. +- d9d8ce7 — lint-org.el four structural checkers, indented-heading at 2+ stars + negative-case test. +- a1f87b1 — inbox-process marker. 08772c5 — ratio roam-sync confirmed (todo log + cleared daily-drivers open instance). +- Three proposal source notes preserved under docs/design/. + +** Next Steps +- Memory-sync VERIFY (todo.org:225) cross-machine half done; only its manual-validation child (work/unknown-project refusal checks, needs Craig's eyes) remains before DONE. +- This wrap runs --archive-done with the new aging step: rulesets is track-mode, so ~most of its 65 Resolved entries move to a tracked archive/task-archive.org (intended). +- Pager: still-open caveat is signal-cli's 27-day receive gap (send unaffected; long gaps can desync eventually). + +KB: promoted 0 / consulted no + +* Session Log + +** 2026-06-30 ~08:50 EDT — Startup +Ran startup workflow. Network is down for git remotes (host =cjennings= unresolvable) — both the rulesets pull and the project-repo fetch failed; session continues on local state, no remote sync this session. No crash anchor (clean prior wrap). notes.org: no active reminders, no pending decisions. =make install= nothing new to link. =.ai/= synced from templates (no tracked change). + +Roam inbox: 5 items, all =emacs:= / =wttrin:= prefixed — none owned by rulesets; nothing to claim. + +Local inbox: 9 unprocessed handoffs, all from .emacs.d: +- green-baseline-proposal.org — shared-asset change (verification.md + start-work skill): add a "green baseline before starting work" gate. +- todo-cleanup.el + test-todo-cleanup.el + rulesets-note-archive-aging.org — shared-asset change to synced =.ai/scripts/todo-cleanup.el=: add Resolved-section file-aging to =--archive-done=. +- lint-org.el + test-lint-org.el + rulesets-note-lint-checkers.org — shared-asset change to synced =.ai/scripts/lint-org.el=: four structural heading checkers. +- dangling-page-signal note (2026-06-30 0112) — FYI + ask: =~/.local/bin/page-signal= is a dead symlink (canonical removed 2026-06-12); ask make install/uninstall to clean it on both daily drivers + confirm pager = =notify --persist=. +- follow-up page-signal note (0115) — FYI: GV pager account (+15045173983) may be live again as of 2026-06-29; capture the revival method, update pager guidance. + +Three of these are shared-asset proposals → skeptical review + Craig approval required (no silent apply). The two page-signal notes are one actionable cleanup + FYIs. Confirmed =~/.local/bin/page-signal= target is dead (exit 2). Working tree clean except the 9 untracked inbox files. + +** 2026-06-30 ~09:05 EDT — Signal pager investigation (Craig's option 2) +Craig asked how the 2026-06-29 Signal page was done. Traced it: the .emacs.d session (transcript 924ea200) called the signal-mcp tool =send_message_to_user= at 19:19 to Craig's UUID b1b5601e-…, after desktop notify couldn't reach his session bus. No re-registration / captcha — the pager account was never down. + +Finding: the "GV account deregistered 2026-06-12" premise in the follow-up note (and the PROCESSED-2026-06-12 note) is WRONG. Only the =page-signal= shell script was removed from canonical on 6-12; its =~/.local/bin/page-signal= symlink dangles. The account stayed registered the whole time. Verified live: signal-cli listAccounts shows +15045173983 registered; =claude mcp list= shows signal-mcp ✔ Connected. Two independent paths, opposite status: page-signal script = dead; signal-mcp tool = alive (the working pager). Caveat: signal-cli warns last *receive* 27 days ago — send unaffected, but long receive gaps can desync eventually. + +Updated the =project_signal_pager_account= memory to correct the page-signal-vs-signal-mcp distinction and kill the deregistration premise. + +** 2026-06-30 ~09:25 EDT — page-signal cleanup + pager guidance (Craig directed: proceed) +Craig confirmed the live signal-mcp page reached his phone. Directed: point pager guidance at signal-mcp for when away from laptop/desktop, clean up the dead script, commit+push when remote is reachable. + +Found page-signal already fully removed from the repo on 2026-06-12 (13256aa) — Makefile, workflow, mcp/README all clean. Only the on-disk =~/.local/bin/page-signal= symlink dangled. Root gap: the install bin loop links =claude-templates/bin/*= but never prunes orphans, so any removed script leaks a dangling symlink. Fixed durably: added a prune step to the install bin section (Makefile) that removes symlinks in =~/.local/bin= pointing into =claude-templates/bin/= whose target is gone. Ran =make install= — pruned page-signal on velox; ratio self-cleans on its next session's make install. + +Pager guidance: added a "Paging Craig — desktop vs. away" section to protocols.org (canonical), distinguishing =notify --persist= (at-machine) from signal-mcp =send_message_to_user= to the UUID (away), and explicitly retiring the page-signal script. sync-check --fix synced the mirror. + +Files modified: Makefile (prune step), claude-templates/.ai/protocols.org + .ai/protocols.org (pager section). Reviewed (/review-code --staged → Approve), /voice personal on the message. Committed a266250 (authored as Craig). + +** 2026-06-30 ~09:55 EDT — Decision: no CLI wrapper; push diagnosis (SSH alias drift) +Craig's design call: don't rebuild a page-signal-style script. The signal-mcp tool is one call for in-session agents (the only paging caller we have); a thin signal-cli wrapper gets added only if a non-session caller (hook/cron) ever needs Signal paging. protocols wording already reflects this. + +Push diagnosis: the "network down" read this session was WRONG. Internet is up, DNS works, cjennings.net resolves (IPv6). The real cause: =origin= is =git@cjennings:rulesets.git= (bare host alias =cjennings=), but =~/.ssh/config= only defines =Host cjennings.net= — no bare =cjennings= alias — so every git remote op failed to resolve the literal hostname. =git@cjennings.net:rulesets.git= works (ls-remote returned HEAD). Pushed a266250 via the explicit .net URL (clean fast-forward e4aeea6..a266250); remote main now at a266250. + +OPEN: the =cjennings= SSH alias drift is a root issue affecting every repo whose remote uses =git@cjennings:=. Durable fix is a Craig decision — restore =Host cjennings= in =~/.ssh/config= (covers all repos, dotfile-synced to both daily drivers) vs. repoint this repo's remote to =cjennings.net=. Daily-driver concern: ratio may differ. Surfaced to Craig. + +** 2026-06-30 ~10:05 EDT — SSH alias fix (Craig: option 1) + both pushes landed +Craig chose the root fix. =~/.ssh/config= is a stow symlink → canonical =~/.dotfiles/common/.ssh/config= (the ~/.dotfiles git repo). Edited the cjennings block from =Host cjennings.net / LogLevel ERROR= to =Host cjennings cjennings.net / HostName cjennings.net / LogLevel ERROR= so the bare alias maps to the real host. Verified: =ssh -G cjennings= → hostname cjennings.net, identity id_ed25519; =git ls-remote origin= (the original git@cjennings URL) returns a266250 — unbroken. + +Committed in dotfiles as 3119bbb (fix(ssh): restore the bare cjennings host alias, authored as Craig) and pushed via git@cjennings (alias works). rulesets a266250 also confirmed on remote. + +DAILY-DRIVER (ratio): ratio likely has the same alias drift. Once ratio runs =git -C ~/.dotfiles pull=, the stow symlink picks up the fix immediately (no re-stow needed if the symlink exists). Until then ratio's pushes to cjennings.net keep failing. Flagged to Craig per daily-drivers.md. + +NOTE: dotfiles repo has its own .ai/ scope + 5 pending inbox handoffs — left untouched (cross-project boundary; they belong to a dotfiles session). + +** 2026-06-30 ~10:15 EDT — ratio synced over tailscale; alias drift fully closed +Craig: do the ratio pull now, it's on tailscale (ratio = 100.71.182.1). Confirmed ratio had the same alias drift (=ssh -G cjennings= → hostname cjennings). ratio's dotfiles remote already used the .net form (=git@cjennings.net:dotfiles.git=), so a plain ff-only pull worked there — no bootstrap problem. Pulled 995f7d7..3119bbb (clean ff; ratio's untracked cross-agent-comms WIP left untouched). ratio's =~/.ssh/config= is a stow symlink, so the fix went live immediately: =ssh -G cjennings= now → cjennings.net, and =git@cjennings:rulesets.git= ls-remote returns a266250. Both daily drivers fixed. + +Page-signal arc fully closed. Replied to .emacs.d (inbox file 2026-06-30-1310-from-rulesets) and deleted both page-signal handoffs from rulesets inbox. 7 inbox handoffs remain = the 3 shared-asset proposals. + +** Reviewing shared-asset proposals in turn (Craig's direction) +1. green-baseline — DONE. Accepted with 3 changes (no-suite guard, cross-ref When You Cannot Verify, placement as start-work 0.3). Implemented in verification.md + .claude/commands/start-work.md, proposal preserved to docs/design/2026-06-29-green-baseline-proposal.org. Committed d0ab047, pushed. +4. daily-drivers tailscale correction (arrived mid-session 13:20) — DONE. Reframed daily-drivers.md from "can't reach, flag it" to "CAN reach over tailscale, sync/verify/repair directly" + a tailscale-mechanics section; my tweak: bare hostname resolves only with MagicDNS (ssh ratio worked from velox), so IP/MagicDNS is the reliable path. Note preserved to docs/design/2026-06-30-daily-drivers-tailscale-correction.org. Committed 324a52b, pushed. Replied to .emacs.d on both. +2. todo-cleanup.el Resolved-section aging — DONE. Accepted (retain 7, default ON, Craig confirmed). Applied proposed .el + tests to canonical. ADDED self-protect (Craig's gitignore rule): confirmed gitignore-mode code projects (chime/pearl/archangel/.emacs.d) all gitignore todo.org, so the archive must too or it leaks private task history to a tracked path on public repos. tc--ensure-archive-gitignored appends the archive path to .gitignore when the todo file is ignored but the archive isn't; track-mode leaves both tracked. +2 ERT tests (temp git repo, per branch). 36 todo-cleanup tests green, full make test green. Committed f67e724, pushed. Note preserved to docs/design/2026-06-29-todo-cleanup-aging-proposal.org. Replied to .emacs.d (incl. heads-up: its next --archive-done sheds backlog + auto-adds the .gitignore line). NOTE: rulesets is track-mode → its next wrap sheds ~most of 65 Resolved entries to a tracked archive/task-archive.org. +3. lint-org.el four structural heading checkers — DONE. Accepted with a fix: tightened indented-heading from one-or-more stars to TWO-or-more. The 1+ regex false-positives on valid org plain-list bullets (indented single `*` is a list bullet, not a demoted heading — reproduced: a normal `*` list flagged 3 valid bullets). `**`+ is never a bullet, so an indented one is unambiguously a demoted invisible heading. Added a negative-case test. 45 lint-org ERT green, full make test green. Committed d9d8ce7, pushed. Note preserved to docs/design/2026-06-29-lint-org-structural-checkers-proposal.org. Replied to .emacs.d. + +ALL inbox handoffs processed (inbox empty). Commits this session: a266250 (page-signal/paging), d0ab047 (green-baseline), 324a52b (daily-drivers tailscale), f67e724 (todo-cleanup aging + self-protect), d9d8ce7 (lint-org checkers); dotfiles 3119bbb (ssh alias). All pushed. + +OPPORTUNITY (now actionable): daily-drivers.md "Current open instance" wants ratio's roam clone + roam-sync timer verified — now doable directly over tailscale rather than waiting on Craig. Flagged, not yet done. diff --git a/.ai/workflows/INDEX.org b/.ai/workflows/INDEX.org index a474b29..88721ed 100644 --- a/.ai/workflows/INDEX.org +++ b/.ai/workflows/INDEX.org @@ -54,6 +54,11 @@ This index must list every =.org= file in =.ai/workflows/= except this one and e - Roam-mode triggers: "inbox zero", "empty the inbox", "process the roam inbox", "triage my roam inbox" - Auto-mode trigger: "auto inbox zero" (match before "inbox zero") +- =work-the-backlog.org= — the autonomous task-execution loop, the single home for working a batch of marked tasks unattended: takes an ordered task set (explicit list or tag query) + session mode (=file-only= default / =autonomous-commit= + paging) + a hard run cap; each candidate passes the mechanical eligibility gate (status =TODO= + =:solo:= per the project's scheme header) and the four-item defer checklist, then is implemented to the full quality bar (TDD, =/review-code=, =/voice=) as its own logical commits. Fed by the inbox auto-loop's chain step (yes-gated, file-only, cap 1) and the no-approvals speedrun preset (pre-flight Q&A → autonomous-commit + always-push + end-of-set page over an explicit ordered list). + - Speedrun triggers: "speedrun", "no approvals speedrun", "speedrun these: <task set>" — any phrase containing "speedrun" routes here (the preset), never to =no-approvals.org= + - Manual triggers: "work the backlog", "work the backlog with <task set>" (file-only defaults) + - Synthesis trigger: "synthesize backlog metrics" — read the per-project metrics logs, compute trends + the corrections signal, write one =:agent:metrics:= KB node (personal projects only) + ** Calendar - =add-calendar-event.org= — create a calendar event. @@ -117,6 +122,7 @@ This index must list every =.org= file in =.ai/workflows/= except this one and e - Triggers: "session harvest", "harvest the sessions", "let's run the session-harvest workflow", "monthly harvest", "mine the sessions" - =no-approvals.org= — drop the interaction-level approval gates for a pre-agreed batch while keeping engineering-discipline gates (=/review-code=, =/voice personal=, tests, session-log updates, subagent reviews, destructive-action consent). Mode stays on until Craig turns it off, a real question arises, the queue empties, or the conversation switches topics. - Triggers: "no-approvals mode", "no approvals", "no-approval", "no need for approval gates", "stop asking, just keep going", "I'll check back in when you're done or stuck", "do all =<selector>= with no-approval" + - Exception: any phrase containing "speedrun" routes to =work-the-backlog.org='s no-approvals speedrun preset instead * Living Document diff --git a/.ai/workflows/clean-todo.org b/.ai/workflows/clean-todo.org index dd33056..a1b2af5 100644 --- a/.ai/workflows/clean-todo.org +++ b/.ai/workflows/clean-todo.org @@ -27,7 +27,17 @@ Deletes bogus =- State "X" from "X" [date]= log lines (state didn't actually cha To preview without writing, run =--check= first: =emacs --batch -q -l .ai/scripts/todo-cleanup.el --check todo.org=. -** Step 2: Archive completed work +** Step 2: Convert done sub-tasks to dated entries + +#+begin_src bash +emacs --batch -q -l .ai/scripts/todo-cleanup.el --convert-subtasks todo.org +#+end_src + +Rewrites every heading at level 3 or deeper whose TODO state is DONE/CANCELLED/FAILED into a dated event-log entry (=<stars> YYYY-MM-DD Day @ HH:MM:SS -ZZZZ <text>=), dropping the keyword, priority cookie, and tags, and removing the =CLOSED:= line. Enforces the depth rule that a completed sub-task becomes dated history — a shape interactive org closes and =--archive-done= (level-2 only) leave unapplied. Timestamp comes from each entry's =CLOSED= cookie; heading text kept verbatim; idempotent; a done sub-task with no parseable =CLOSED= is flagged and left alone. Run before archiving so a parent's sub-tasks are already dated when it moves. Capture the output. + +To preview without writing: =emacs --batch -q -l .ai/scripts/todo-cleanup.el --convert-subtasks --check todo.org=. + +** Step 3: Archive completed work #+begin_src bash emacs --batch -q -l .ai/scripts/todo-cleanup.el --archive-done todo.org @@ -37,10 +47,11 @@ Moves every level-2 subtree whose TODO state is DONE or CANCELLED out of the "Op To preview the moves without writing: =emacs --batch -q -l .ai/scripts/todo-cleanup.el --archive-done --check todo.org=. -** Step 3: Summarize +** Step 4: Summarize -Report to Craig from the two captured outputs: +Report to Craig from the three captured outputs: - Hygiene: how many bogus state-log lines were deleted; any orphan-planning warnings (file:line + heading), or "none". +- Convert: how many done sub-tasks were rewritten to dated entries (heading + line), any flagged for no =CLOSED= date, or "nothing to convert". - Archive: how many subtrees moved and which (heading + line), or "nothing to move" / the skip reason if a section was missing or ambiguous. - If the file changed, note that =todo.org= now has an uncommitted edit — review =git diff -- todo.org= and commit it (in this repo's commit style) if it looks right. If nothing changed, say so and stop. @@ -49,7 +60,7 @@ Don't auto-commit. The summary is the review point; Craig decides whether the di * Principles - *Both passes apply, not just preview.* The workflow is invoked because cleanup is wanted. Use the =--check= variants only when Craig asks for a dry run. -- *Two passes, two invocations.* =--archive-done= is its own mode and does not run the hygiene pass; run both. +- *Separate modes, separate invocations.* =--convert-subtasks=, =--archive-done=, and the hygiene pass are each their own mode and don't run the others; run all three. - *Never auto-commit todo.org.* Surface the diff and let Craig commit it. The cleanup is a working-tree change, fully reversible until committed. - *Trust the script.* It's fast and idempotent; if there's nothing to do, it reports zero and exits clean. No pre-checks. diff --git a/.ai/workflows/inbox.org b/.ai/workflows/inbox.org index 5fc855f..b28fdaa 100644 --- a/.ai/workflows/inbox.org +++ b/.ai/workflows/inbox.org @@ -114,6 +114,14 @@ The item extends a task already filed. Update the parent TODO's body with a date ** File as TODO Substantive but waits, or needs design/triage before implementation. Add the TODO under =* <Project> Open Work= with priority + tags per the priority-scheme check (core §6). Body summarizes the proposal and links the inbox content if it's been moved to =docs/design/=. Delete the inbox file (or move it to =docs/design/= first if the content survives). +*Route-candidate marking (feeds the wrap-up router).* After filing, check whether the keeper's inferred home is a different project: + +#+begin_src bash +python3 .ai/scripts/route_recommend.py --item "<the keeper's heading + body text>" --exclude "$(basename "$PWD")" +#+end_src + +On a =<destination>\tstrong= or =<destination>\tweak= result, stamp the new TODO's property drawer with =:ROUTE_CANDIDATE: <destination>= (create the drawer if the task has none). A =none= result stamps nothing, and a local keeper stays unstamped. The marker is the wrap-up router's entire candidate set — =wrap-it-up.org= Step 3 surfaces exactly the =:ROUTE_CANDIDATE:=-tagged tasks and offers to deliver each to its destination's inbox, never scanning the standing backlog. Stamping is cheap and reversible (the router's skip leaves the task in place; a wrong marker is one property line to delete), so prefer stamping on any plausible match — the human reviews the batch at wrap time. + *Blocking-dependency handoff.* A special shape: another project sends a note that *this* project's work is blocking one of theirs ("your task X is blocked on us — we need Y"). File or link the owning task, tag it =:blocker:=, and name the requesting project in the body (see the cross-project dependency convention in =todo-format.md=). The =:blocker:= tag makes =open-tasks.org= surface that task *first*, since clearing it unblocks the other project. Dedup against an existing task rather than filing a duplicate. When the work later lands, drop =:blocker:= and notify the waiting project (=inbox-send <their-project> --text "Delivered: <what> — you're unblocked."=) so it can lift its own =:blocked:=. ** Defer @@ -453,18 +461,19 @@ Take these up when the single-destination version is in use and the multi-projec * Mode: auto inbox zero -A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop <interval>= running roam mode. It is in-session and interactive by design — each cycle reports, and a find waits for Craig's go before any work happens. +A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop <interval>= running roam mode. It is in-session and interactive by design — each cycle reports what it found and filed. ** Per cycle 1. Run roam mode's scan (Phase A local check + Phase B roam scan), read-only — no =git pull=. The capture-guard still gates any write: use =capture-guard --wait= (core §5) so a transient capture clears itself; if it's still open after the wait, *defer this cycle's roam reconcile to the next cycle* rather than surfacing — the loop cadence is the retry, and the filed items get swept next time. The rare write hands its git to =roam-sync= (roam Phase D). 2. *Nothing found* → no inbox summary. One acknowledgement line: =ran at HH:MM, nothing found=. Nothing else. The acknowledge-only-on-empty rule keeps a quiet inbox quiet. 3. *Items found* → summarize the found items, file them as tasks (roam Phase C), and *append them to a displayed queue* — the harness task list, via =TaskCreate= — so the queue accumulates across cycles. Then ask: "run this batch next?" - - *Yes* → launch into implementing the found items, each through the normal disposition ladder (core §3) + verify flow. + - *Yes* → chain into =work-the-backlog.org= as an explicit second step after routing completes: pass it the eligibility query over the queued items (status =TODO= + =:solo:= per the scheme header, priority-ordered), =file-only= mode, paging off, cap 1. The highest-priority eligible candidate runs; the rest wait for the next tick or a later yes. - *No* → they stay queued for a later go. + This mode never implements anything itself — routing ends here, and the execution loop lives in =work-the-backlog.org=, its one home. 4. *Cross-cycle dedup.* Subsequent cycles add only *newly-found* items to the same displayed queue, never re-surfacing what's already there. Dedup against the queue (the =TaskCreate= list), not against what's already been implemented — a find that was queued-but-not-yet-run must not reappear, and one already filed into =todo.org= is dropped by roam Phase C's status check. -A find is always surfaced and gated on Craig's yes; a quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because its execute step waits for a yes. +A find is always surfaced and filed; execution happens only through the =work-the-backlog.org= chain and waits for Craig's yes. A quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because its chain step waits for that yes. ** Fully-unattended pass (=/schedule=) — vNext, not v1 diff --git a/.ai/workflows/no-approvals.org b/.ai/workflows/no-approvals.org index 1efce82..9e1c894 100644 --- a/.ai/workflows/no-approvals.org +++ b/.ai/workflows/no-approvals.org @@ -22,6 +22,8 @@ Craig activates the mode with any of: - Queuing several tasks in =todo.org= followed by any phrase above - Any equivalent phrasing that signals he doesn't want to be re-asked between items +*Not this mode:* any phrase containing "speedrun" ("speedrun", "no approvals speedrun") routes to =work-the-backlog.org='s no-approvals speedrun preset — an autonomous batch over an explicit ordered task set, with a pre-flight Q&A, autonomous commits, always-push, and an end-of-set page. This mode is the general interaction-gate suspension for whatever work is already underway; the speedrun is the dedicated backlog-batch workflow. + Mode resets when: - Craig says approvals are back on diff --git a/.ai/workflows/open-tasks.org b/.ai/workflows/open-tasks.org index 4ba29dd..02a0847 100644 --- a/.ai/workflows/open-tasks.org +++ b/.ai/workflows/open-tasks.org @@ -23,15 +23,16 @@ Don't route "task review" / "review tasks" here — those trigger the hygiene ha * Phase A: Data Gathering (both modes) -** Phase A pre-step — archive any freshly-DONE tasks +** Phase A pre-step — normalize freshly-closed tasks -Before reading =todo.org=, run the cleanup script's archive-done sweep so completed level-2 subtrees move from =* $Project Open Work= to =* $Project Resolved=: +Before reading =todo.org=, run two cleanup sweeps so the read reflects current state. First convert any done sub-tasks to dated entries, then archive completed level-2 subtrees from =* $Project Open Work= to =* $Project Resolved=: #+begin_src bash +emacs --batch -q -l .ai/scripts/todo-cleanup.el --convert-subtasks todo.org emacs --batch -q -l .ai/scripts/todo-cleanup.el --archive-done todo.org #+end_src -Costs a few hundred milliseconds. Without it, a task that completed earlier in the session sits as =** DONE= under Open Work until the next =clean-todo= or wrap-up pass, and Next Mode would surface it as a "what's next" candidate. The sweep makes Phase A's read of =todo.org= reflect current state. +Costs a few hundred milliseconds. Without the archive sweep, a task that completed earlier in the session sits as =** DONE= under Open Work until the next =clean-todo= or wrap-up pass, and Next Mode would surface it as a "what's next" candidate. The convert sweep runs first so a completed parent's sub-tasks are already dated when it archives; it also keeps interactive level-3 closes from lingering as DONE keywords. Together they make Phase A's read of =todo.org= reflect current state. Skip the sweep if the workflow is invoked in an explicit read-only or dry-run context. Default is to run it. diff --git a/.ai/workflows/spec-create.org b/.ai/workflows/spec-create.org index 508b969..1249181 100644 --- a/.ai/workflows/spec-create.org +++ b/.ai/workflows/spec-create.org @@ -82,8 +82,9 @@ This is where the spec earns a "Ready" from review: an engineer must be able to ** Phase 5 — Wire it up (conventions) -- *Filename + location:* =docs/<problem-slug>-spec.org=. Org-mode. The slug names the *problem/feature*, not a date. Must end in =-spec.org=. -- *Metadata header:* a small table at the top — Status, Owner, Reviewer(s), Date, Related (link to the task/ticket). +- *Filename + location:* =docs/specs/YYYY-MM-DD-<problem-slug>-spec.org= — formal specs live in =docs/specs/=, never =docs/design/= (that's for notes, brainstorms, inventories; see =claude-rules/docs-lifecycle.md=). Org-mode. The slug names the *problem/feature*; no status suffixes ever — status lives in the file. Must end in =-spec.org=. +- *Status heading (first element after the file header):* a top-level heading carrying the lifecycle keyword, stamped =DRAFT= at authoring — spec-create owns this flip. It holds an =:ID:= UUID (generate with =uuidgen=) and dated history lines, newest first. The keyword is authoritative; the Metadata =Status= field mirrors it in lowercase. Transitions are three lines in one file (keyword + history line + mirror): spec-review flips =READY=, spec-response flips =DOING= at decomposition, the final build task flips =IMPLEMENTED=. Terminal states always record a reason. +- *Metadata header:* a small table at the top — Status (the lowercase mirror), Owner, Reviewer(s), Date, Related (link to the task/ticket). - *Review-and-iteration-history stub:* add a =Review and iteration history= section at the bottom and seed it with the author's first entry. =spec-review= and =spec-response= append provenance entries here, so the heading shape is a contract: =YYYY-MM-DD Day @ HH:MM:SS -ZZZZ — Contributor — Role=, body fields What / Why / Artifacts. - *Cross-link both ways:* the spec links its task; the task links the spec (replace the task's inline plan with a terse description + a =file:= link to the spec). @@ -103,7 +104,14 @@ Then it's ready for =spec-review.org=. Snapshot-vs-living rule: keep the spec li ,#+TITLE: <Feature> — Spec ,#+AUTHOR: <author> ,#+DATE: <YYYY-MM-DD> -,#+TODO: TODO | DONE SUPERSEDED CANCELLED +,#+TODO: TODO | DONE +,#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +,* DRAFT <spec short name> +:PROPERTIES: +:ID: <uuid — generate with uuidgen> +:END: +- <YYYY-MM-DD Day @ HH:MM:SS -ZZZZ> — drafted. ,* Metadata | Status | draft | diff --git a/.ai/workflows/spec-response.org b/.ai/workflows/spec-response.org index de5b1c8..7628e49 100644 --- a/.ai/workflows/spec-response.org +++ b/.ai/workflows/spec-response.org @@ -130,9 +130,11 @@ When related specs were reviewed together, two reviews can recommend opposite th This is the *last* step of the workflow, and it runs *only after the author confirms the spec is Ready* — never during review iterations. A Ready spec nobody can act on is unfinished; this phase turns it into tracked work. It applies to every project type (library, application, service, docs set). -1. *Decide where the tasks live.* If the work is spinning off into its own project/repo, move the parent task into that project's =todo.org= (and relocate the spec with it); otherwise use the current project's =todo.org=. One parent task owns the effort; the phase tasks hang under it. +*This phase owns the =READY= → =DOING= lifecycle flip* (docs-lifecycle convention): when the decomposition below lands, update the spec's top-level status heading keyword to =DOING=, add a dated history line, and set the Metadata =Status= mirror to =doing= — three lines, one file. -2. *Create one task per implementation phase* from the spec's =Implementation phases=, in dependency order, so the task set as a whole describes the *full* milestone (e.g. v1) with no gaps. Each task body names the deliverable, its tests, and how it is verified. Carry over deferred/vNext work and any publish/release steps as their own tasks. +1. *Decide where the tasks live.* If the work is spinning off into its own project/repo, move the parent task into that project's =todo.org= (and relocate the spec with it); otherwise use the current project's =todo.org=. One parent task owns the effort; the phase tasks hang under it. *Stamp the binding:* the parent task's =:PROPERTIES:= drawer gets a =:SPEC_ID:= line holding the spec's status-heading UUID. That property is the durable join task-audit uses to police =DOING= specs (a =DOING= spec whose bound parent is closed, archived, or missing gets flagged). + +2. *Create one task per implementation phase* from the spec's =Implementation phases=, in dependency order, so the task set as a whole describes the *full* milestone (e.g. v1) with no gaps. Each task body names the deliverable, its tests, and how it is verified. Carry over deferred/vNext work and any publish/release steps as their own tasks. *Always end the set with the flip task:* a final "flip the spec to IMPLEMENTED (+ dated history line + mirror)" task under the same parent — the tracked obligation that closes the lifecycle loop when the build finishes. Never skip it; "a human remembers" is the failure mode this exists to prevent. 3. *Turn a critical eye on completeness.* Re-read the spec — every phase, every acceptance criterion, every named deliverable, every data-safety/principle rule — and confirm each has a home in a task. The work is not done when the tasks merely exist; it is done when nothing in the spec is left untracked. This completeness pass is mandatory regardless of project type. diff --git a/.ai/workflows/spec-review.org b/.ai/workflows/spec-review.org index 833dfc9..d4998eb 100644 --- a/.ai/workflows/spec-review.org +++ b/.ai/workflows/spec-review.org @@ -50,6 +50,11 @@ Run it *early* — design review exists to catch viability problems and costly m Before Phase 1, verify the file under review ends with =-spec.org=. Every design, decision, or planning document under a project's =docs/= directory carries that suffix as its identifier. The =.org= extension alone is not enough because =docs/= holds non-spec org files too (tutorials, frozen inventories, reference material). +*Location expectation (docs-lifecycle convention).* Formal specs live in =docs/specs/=. Whether that's enforced depends on whether the project has run its one-time =spec-sort= retrofit: + +- =:LAST_SPEC_SORT:= present in =.ai/notes.org= Workflow State → the project has sorted; a =-spec.org= file outside =docs/specs/= fails this precondition. Surface it: "this spec sits outside docs/specs/ — move it (and update inbound links) before review." +- Marker absent → legacy locations (=docs/= root, =docs/design/=) stay reviewable; add one nudge line to the review output ("this project's docs pile has never been spec-sorted — say 'run spec-sort' to sort it") and proceed. No legacy spec is ever unreviewable during the transition. + If the file does not end with =-spec.org=, stop immediately and surface the mismatch: #+begin_example @@ -128,6 +133,7 @@ Work the spec against these. Each is a source of concrete findings, not a box to - *Performance & scale.* Expected counts (issues/comments/labels/teams/projects/views)? Server-side filtering where possible? Bounded, visible pagination? Cached name→ID lookups? Sync calls in the command path acceptable? Could a save hook or whole-file scan make N network calls? Rendering linear? Full-file rewrites avoided? Long-running operations async/cancellable/observable? Is concurrency/queueing/backpressure defined? Are high-output process filters throttled and cheap? Is progress/ETA exposed only when defensible, and are hung/stalled operations detectable and killable? Identify UI freezes, repeated network calls, unbounded pagination — without premature optimization. - *Security & privacy.* API keys safe? Debug logs leaking secrets or private issue text? Confirmations before mutating shared workspace objects? Personal vs shared distinguished? Local files holding sensitive descriptions/comments? Anything to redact from messages/logs? Any work-tracker integration may handle private company data. - *UX & accessibility.* Discoverable commands? Recoverable mistakes? Prompts ordered to the task? Safe, useful defaults? Informative-not-noisy status messages? Does the UI avoid implying unsupported actions are supported? Match the upstream product's permissions/concepts? Are customizations named in user language, with clear defaults and docstrings? For Emacs packages, command names, completion candidates, buffer layout, defcustom names, and message wording *are* the UX. +- *Operational-panel UI traps.* Applies when the spec covers a user-facing panel, dialog, or control surface; skip otherwise. Lists that mix saved, current, and generated items must name each item's source. Refresh or scan actions must not gate data that could be shown immediately. Add-forms must not ask the user to retype values the system already discovered. Destructive confirmations read in future tense before the action and verified-result tense after it. Diagnostics, performance, logging, and repair affordances are reviewed as one coherent flow before extra pages or buttons are added. A popup launched from a bar, tray, or tool surface should visually belong to that launcher. (Promoted from archsetup's Waybar network-panel review, 2026-06-30.) - *Test strategy and coverage.* Characterization tests before behavior changes? Pure functions to unit-test? API responses needing fixtures? Command flows needing stubs? Regression tests for prior bugs? Boundary/error cases? What's covered elsewhere and shouldn't be re-tested? Which existing tests must change? How is coverage generated, summarized, and used to find untested/refactor-worthy code? Prefer tests that lock contracts: representation shape, query compilation, sync no-op, conflict refusal, pagination, dirty-buffer protection, log redaction, and long-running/slow-operation behavior via fakes rather than flaky live dependencies. - *Observability & operations.* How does a user see what the package is doing? Progress messages for long ops? Useful, safe debug logging? Are logs structured enough to isolate issues from a bug report? Are commands provided to inspect/clear caches, test connectivity, diagnose backends/tools, copy redacted debug info, or reproduce command invocations? How are terminal states discovered: completion, failure, partial success, stalled/hung, cancelled, cleanup-unverified, and "needs user action"? Does the product notify only when useful, avoid noisy success spam, and keep non-success states visible until acknowledged? For generated org files, headers should often carry source, filter/view name, refresh time, count, truncation state. - *Comparable-product sentiment.* When there are obvious adjacent products, research what users love and hate about them from official docs plus current community reports. Do not cargo-cult their feature set; translate findings into the spec's scope. For each loved behavior, say whether the spec provides it, intentionally omits it, or defers it. For each hated behavior, say whether the spec avoids, resolves, inherits, or accepts it. @@ -166,6 +172,8 @@ Assign one label consistently: The most useful reviews move a spec from =Not ready= to =Ready with caveats= or =Ready= once decisions are captured. +*The =Ready= verdict flips the spec's lifecycle status.* spec-review owns the =DRAFT= → =READY= transition (docs-lifecycle convention): on assigning =Ready= (or =Ready with caveats= the author accepts), update the spec's top-level status heading keyword to =READY=, add a dated history line under it naming the review that passed, and set the Metadata =Status= mirror to =ready= — three lines, one file. Any other rubric label leaves the keyword where it stands (a re-review that finds new blockers on a =READY= spec demotes it back to =DRAFT= the same three-line way, with the reason in the history line). + Finding severity maps to blocking power: *high-priority findings block =Ready=* — they hold the rubric at =Not ready= (or =Ready with caveats= if the author accepts and tracks them) until dispositioned; *medium-priority findings are the author's discretion* and don't block. State the blocking status on each finding so the author running spec-response knows which ones gate the rubric. Then update the spec's review history. Specs should carry a bottom section named =Review and iteration history= (or the nearest existing equivalent) that tracks each material author/reviewer pass. Add a concise entry for this review even when the spec is ready and no findings are recorded. diff --git a/.ai/workflows/startup.org b/.ai/workflows/startup.org index 5e8f61e..9488dd0 100644 --- a/.ai/workflows/startup.org +++ b/.ai/workflows/startup.org @@ -151,7 +151,7 @@ These calls have no dependencies on each other. Issue them all together in one m 8. =[ -f todo.org ] && .ai/scripts/task-review-staleness.sh todo.org 7 || true= — count top-level tasks overdue for review (the daily task-review habit's startup nudge). The =[ -f todo.org ]= guard skips projects without a root todo.org; =|| true= keeps Phase A from failing if the script isn't synced yet. Threshold 7 days is one review cycle of slack — softer than the wrap-up health check's 30-day alarm. 9. =bash ~/code/rulesets/scripts/sync-language-bundle.sh "$PWD" 2>/dev/null || true= — language-bundle freshness for the current project. Fingerprint-detects which bundle (if any) the project has, auto-fixes drifted rulesets-owned files (=.claude/rules/*.md=, =.claude/hooks/*=, =githooks/*=), and surfaces drift in =settings.json= without writing it (a project may have customized it). =CLAUDE.md= is deliberately left untracked — it's seed-only in =install-lang= and project-owned afterward, mirroring how =diff-lang= skips it. Quiet when there's no bundle or everything's clean. Hardcodes the rulesets path because =languages/= is the canonical source and lives only there — the same absolute-path dependency the rsyncs already carry. =|| true= keeps Phase A from failing on older checkouts where the script isn't present yet. The =.ai/= rsyncs and this call write to disjoint paths (=.ai/= vs =.claude/=/=githooks/=), so the batch stays parallel-safe. 10. =[ -f "$HOME/org/roam/inbox.org" ] && grep -cE '^\*\* ' "$HOME/org/roam/inbox.org" || true= — count items in the roam global inbox (=~/org/roam/inbox.org=), the roam-mode startup nudge. Silent if the roam clone isn't on this machine. Phase C reads the file when the count is non-zero, splits total vs items related to this project, and surfaces the offer (see =inbox.org= roam mode). Read-only; never files at startup. -11. KB surface prep (the read + contribute startup nudges; see =docs/design/2026-06-16-encourage-kb-contribution-spec.org=). Gated on the agent KB clone. Counts =:agent:= nodes, lists up to 5 whose content matches the current project basename (titles only; a few most-recent nodes as a fallback when nothing matches), and resolves the best-practices node path. Read-only; silent when the clone is absent. Phase C surfaces the relevant titles (consult) and the best-practices link (contribute). +11. KB surface prep (the read + contribute startup nudges; see =docs/specs/2026-06-16-encourage-kb-contribution-spec.org=). Gated on the agent KB clone. Counts =:agent:= nodes, lists up to 5 whose content matches the current project basename (titles only; a few most-recent nodes as a fallback when nothing matches), and resolves the best-practices node path. Read-only; silent when the clone is absent. Phase C surfaces the relevant titles (consult) and the best-practices link (contribute). #+begin_src bash ra="$HOME/org/roam/agents" @@ -166,6 +166,16 @@ These calls have no dependencies on each other. Issue them all together in one m fi #+end_src +12. Spec-sort probe (the docs-lifecycle retrofit nudge; see the docs-lifecycle spec in =docs/specs/=). Read-only; prints one line when the project has an unsorted docs pile — a =docs/design/= directory or stray =docs/*-spec.org= root files — and no =:LAST_SPEC_SORT:= marker in =.ai/notes.org=. Silent for projects with nothing to sort or an already-stamped marker (the marker permanently clears it). + + #+begin_src bash + { [ -d docs/design ] || [ -n "$(find docs -maxdepth 1 -name '*-spec.org' -print -quit 2>/dev/null)" ]; } \ + && ! grep -qs ':LAST_SPEC_SORT:' .ai/notes.org \ + && echo "spec-sort: unsorted docs present" || true + #+end_src + + The stray-root check uses =find= rather than a glob so the probe behaves identically under bash and zsh (=compgen= is bash-only, and zsh aborts on an unmatched glob). + Notes on the rsync commands: - Trailing slashes on both source and destination matter — they tell rsync to sync /contents/ rather than nest a directory inside. - =--delete= on the directory syncs lets retired template files actually disappear from each project on next startup. @@ -199,6 +209,7 @@ This phase touches the user and runs sequentially: - *Roam inbox nudge.* If the Phase A roam-inbox count is greater than zero, read =~/org/roam/inbox.org=, split total vs items related to this project (claimed by the =<project>:= prefix, plus any unprefixed item whose topic plainly concerns this project), and surface one line: "Roam inbox: =<N>= total, =<M>= appear related to this project — say 'inbox zero' to file them." Offer it as a priority option; never auto-file. If the count is zero or the file is absent, say nothing. See =inbox.org= roam mode. - *KB consult nudge (read side).* If the Phase A KB-surface prep returned any =kb-relevant-titles=, surface one line listing them (capped 5): "KB lessons that may be relevant: =<title>=; =<title>=… — open the node before related work." The titles are declarative, so the list alone tells you whether to open one. Gated on the roam clone; silent when the clone is absent or nothing relevant surfaced. See the best-practices node and =knowledge-base.md=. - *KB contribute nudge (write side).* Once per session, surface one line pointing at the best-practices node (the =kb-bestpractices= path from Phase A): "Learned something durable? See =<path>= for how to write a KB node — contributing cross-project facts is welcome (personal projects only; work/unknown projects never write per =knowledge-base.md=)." Light encouragement, never a gate. Gated on the roam clone; silent when absent. + - *Spec-sort nudge.* If the Phase A spec-sort probe printed =spec-sort: unsorted docs present=, surface one line: "this project's docs pile has never been spec-sorted — say 'run spec-sort' to sort it." If the probe was silent, say nothing. A project with nothing to sort never sees the line; a stamped =:LAST_SPEC_SORT:= marker permanently clears it. See the docs-lifecycle rule and the spec in =docs/specs/=. - *Language-bundle sync.* If the Phase A step-12 call (=sync-language-bundle.sh=) printed anything, surface it. =fixed= lines are informational — the drift was already repaired (note that =.claude/= is now dirty if the project commits it). A =drift= line on =settings.json= is surface-only and needs the printed =make install-<lang> PROJECT=.= to reconcile; flag it so the user can decide. If the call was silent, say nothing. - *Newly-installed symlinks.* If the Phase A.0 =make install= step printed any =link= / =relink= / =WARN= line, surface it. A =link= line means a skill, rule, hook, or script added to rulesets is now linked into =~/.claude= for the first time on this machine. For a newly-linked *skill*, check the agent's available-skills list: if the harness already registered it mid-session, note it's available and move on; if it's absent, stop and tell Craig to restart the agent so it loads (whether a mid-session reload works is harness-version-dependent). For a newly-linked *hook*, note that the harness reads hooks at session start — it fires from the next session (or after Craig opens =/hooks= once); its settings.json wiring travels with the tracked file, so the link is usually the only missing piece. A =WARN ... not a symlink= line is a real collision at the target path — surface it; it needs a human. If the step printed only "nothing new to link", say nothing. - *Template-sync churn (safety net).* Check whether Phase A's rsync left uncommitted churn in the synced =.ai/= paths — accumulated from a prior session that crashed before wrap-up, or freshly added this session when rulesets advanced. Without surfacing, it builds up silently until it blocks Phase A.0's auto-ff (git won't ff a dirty tree). Skip in the rulesets repo itself (there =.ai/= is a committed mirror, kept honest by the pre-commit hook). The check is sequential here, after the rsync has finished — not a Phase A step, to keep that batch race-free. diff --git a/.ai/workflows/task-audit.org b/.ai/workflows/task-audit.org index 94b99da..7d2b758 100644 --- a/.ai/workflows/task-audit.org +++ b/.ai/workflows/task-audit.org @@ -61,6 +61,8 @@ For each open task, read its body and cross-check its claims against the actual - *Calendar* — did a scheduled event happen; is a SCHEDULED/DEADLINE date now past. - *Meeting recordings* — if a task hinges on "did this conversation happen / what was said," check the recording queue (e.g. =~/sync/recordings/=) and transcribe via =process-meeting-transcript.org= if the answer lives in an un-transcribed recording. (This is exactly how a "did the interview happen?" task gets resolved instead of guessed.) +*Spec lifecycle reconcile (docs-lifecycle convention).* If the project has a =docs/specs/=, run the =:SPEC_ID:= query as part of this phase: for each spec whose top-level status heading reads =DOING=, find the =todo.org= task whose =:SPEC_ID:= property matches the spec's =:ID:=. Flag the spec NEEDS-USER when that bound parent is =DONE=/=CANCELLED=, archived, or missing — the build finished (or evaporated) without the =IMPLEMENTED= flip, exactly the drift this check exists to catch. Check the parent's own keyword, not its children (completed children become dated entries and the final flip task is a child, so child-counting misleads). + Assign each task a bucket (CURRENT / STALE / NEEDS-USER) and, for STALE, the specific factual update. *Scale tactic.* For a large open-task set, dispatch read-only investigation sub-agents over batches of tasks (parallel-safe per =subagents.md= — independent read-only domains). Each returns a per-task bucket + suggested update. *Never* let sub-agents write to =todo.org= concurrently — apply all edits serially in the main thread (concurrent writes to one file race and lose work). @@ -79,7 +81,7 @@ For every STALE task, edit it in the main thread: - *Ensure priority is set per the project scheme.* The top of the project's =todo.org= should carry the priority legend (=[#A]= through =[#D]=). Every task should carry an explicit priority cookie. If a cookie is missing, or no longer matches the reconciled facts, assign the right level per the legend. If the level is unambiguous from the body, do it autonomously; if it's a judgment call (especially the [#A] / [#B] line for important-but-not-urgent work), flag NEEDS-USER. Also enforce the [#A]-discipline rule from the legend — an [#A] task without a =SCHEDULED:= or =DEADLINE:= line is mis-graded and is either down-graded to [#B] (when reconciled facts say "important but not urgent") or surfaced as NEEDS-USER for the user to date. - *Ensure a type tag is set.* Every task carries one type tag from the project's tag legend (typically =:feature:= / =:chore:= / =:spec:= / =:bug:=). If missing or wrong, assign or correct it from the body when the type is unambiguous. If two tags fit (a refactor that also fixes a bug; a spec that's also a chore), flag NEEDS-USER rather than picking one silently. - *Enforce the project's declared tag vocabulary.* If the project's tag legend declares an *exhaustive* set of allowed tags, strip from each task any tag outside that set — the heading and parent section already carry topic/scope context, so ad-hoc tags only fragment the vocabulary and defeat tag-based filtering. Normalize near-duplicate spellings to the canonical tag (a plural to its singular, say). Where the legend does not declare the set closed, leave existing tags alone; this step applies only where the allowed set is exhaustive by design. -- *Re-assess the =:quick:= and =:solo:= tags* — reconciliation can change a task's effort or autonomy: a resolved dependency may make a stuck task =:solo:=, a scope cut may make it =:quick:=, and new complexity surfaced by the sources can invalidate either. Add or remove the tags per the definitions in the project's tag legend (and [[file:task-review.org][task-review.org]]) when the reconciled facts make the call clear. When they don't — an effort estimate you can't pin down, a =:solo:= gate you can't confirm — it's a NEEDS-USER flag, not a guess. +- *Re-assess the =:quick:= and =:solo:= tags (mandatory — an audit that skips this is incomplete).* Reconciliation can change a task's effort or autonomy: a resolved dependency may make a stuck task =:solo:=, a scope cut may make it =:quick:=, and new complexity surfaced by the sources can invalidate either. Add or remove the tags per the hard definitions in [[file:../../claude-rules/todo-format.md][todo-format.md]] ("Hard definitions: :solo: and :quick:"; task-review carries the same three-gate walk). Autonomous execution reads =:solo:= as its eligibility gate and trusts the tag, so a stale one is a run-time hazard, not cosmetic drift. When the call isn't clear — an effort estimate you can't pin down, a =:solo:= gate you can't confirm — it's a NEEDS-USER flag, not a guess. - Bump =:LAST_REVIEWED:= on each edited task. Follow =todo-format.md= for completion mechanics (depth-based DONE vs dated-rewrite) and the working-files / link-hygiene rules when moving artifacts. @@ -99,6 +101,21 @@ Never merge or re-parent autonomously — which tasks belong together, and wheth When no clear cluster exists, say so in one line and move on — most audits won't find one, and forcing a merge fragments worse than it consolidates. +** Phase C.6 — Retire completed parents and promote stragglers (interactive) + +Phase C.5 consolidates related *open* tasks. This step retires parent tasks whose work is *finished*, so completed containers don't linger in Open Work as scaffolding. + +Run =todo-cleanup.el --convert-subtasks= first (it's part of the =clean-todo= / wrap-up cleanup, and =open-tasks.org= runs it too) so every completed sub-task is a dated event-log entry rather than a lingering =DONE= keyword. The closure logic below reads "open child" as a child heading still carrying a task keyword (=TODO=/=DOING=/=WAITING=/=VERIFY=/=NEXT=/=PROJECT=/=STALLED=/=DELEGATED=); a dated entry is correctly not open. + +Two shapes, both proposed to Craig (inline numbered options per =interaction.md=, no popup) before applying: + +- *Zero open children → close the parent.* A parent whose child *tasks* are all resolved (now dated) and that carries no open child task is finished: close it per =todo-format.md= (=**= parent → =DONE=/=CANCELLED= + =CLOSED:=), and it moves to Resolved on the next =--archive-done=. If the work resurfaces later, a fresh task is created then; a completed container shouldn't sit open as a placeholder. +- *One or two open children → promote, then close.* When a parent has only one or two open children, pull them out and rewrite them as standalone =**= level-2 tasks — give each a priority per the project scheme, and make the heading stand alone without the parent's context — then close the now-childless parent and let it move to Resolved. The former children become first-class Open Work tasks; the retired parent stops being scaffolding for one or two stragglers. + +*The leaf-with-notes carve-out (important).* "Zero open children" is not the same as "done." A =**= leaf task whose only descendants are dated *notes* — a captured "Ideas", "Goals", or "Current State" entry, not a real completed sub-task — is unstarted work with a note attached, not a finished container. Do not close it. Tell the two apart by intent: a container reads as a grouping (a =PROJECT= keyword, an explicit "parent grouping ..." line, or several dated entries that were genuinely separate sub-tasks that shipped); a leaf-with-notes is a single feature/bug task whose title names unstarted work and whose lone dated child is a design note. When the call is ambiguous, flag it NEEDS-USER rather than closing. + +Never close or promote autonomously past the ambiguous line — surface the candidates with a recommendation and let Craig ratify, the same interactive stance as Phase C.5. Clear container completions (a =PROJECT= whose every child is dated) can be proposed as a batch; leaf-with-notes ambiguities are flagged individually. Verify open-vs-done counts against the actual headings (a real scan of the subtree), not a fragile regex that a shell's =\b= support can silently break — a miscount here closes live work. + ** Phase D — Flag the judgment calls (interactive) Present the NEEDS-USER bucket as a short, scannable list — one line per task, naming the decision or the fact required. Adjudicate with the user one item at a time (inline numbered options per =interaction.md=, no popup). Apply the user's calls as they come (which may itself produce more autonomous updates, or new tasks). @@ -147,3 +164,7 @@ Two Phase C behaviors added, both surfaced by an Emacs-config =todo.org= audit: - *Tag-vocabulary enforcement.* That project declares a closed tag set (=bug=, =feature=, =refactor=, =test=, =quick=, =solo=); the audit had to strip ~44 ad-hoc tags that had accumulated across the file. The prior workflow only checked that a type tag was *present* — it had no concept of an exhaustive allowed set. The new bullet enforces a declared closed vocabulary and leaves open-vocabulary projects untouched. - *Code-complete-but-unverified closing.* Many tasks had shipped (tests green, live in the daemon) but stayed open awaiting a manual or visual verification, so they accumulated as half-open. Leaving them open is noise; auto-closing them would violate "never claim a fix verified before the user confirms." The fix routes the pending human check into the project's =Manual testing and validation= parent (dedup-checked) per =verification.md='s manual-verification hand-off, then closes the implementation task. The work is done and the check is tracked; a failed check promotes to a bug. + +** 2026-07-01 — Retire completed parents (Phase C.6) + +Added Phase C.6: retire a parent task once its child *tasks* are all done. Zero open children → close the parent; one or two open children → promote them to standalone level-2 tasks, then close. Surfaced by an Emacs-config =todo.org= audit where several PROJECT containers had all children complete. Depends on =todo-cleanup.el --convert-subtasks= running first so completed sub-tasks are dated (not lingering =DONE= keywords) and the open-child count is accurate. Carries a leaf-with-notes carve-out: a =**= leaf task whose only descendant is a dated design note ("Ideas"/"Goals") is unstarted work, not a finished container, and must not be closed — the ambiguous case is flagged NEEDS-USER. The step also warns against counting open-vs-done with a fragile regex (a =\b= that a given shell/awk silently drops miscounts and closes live work). diff --git a/.ai/workflows/task-review.org b/.ai/workflows/task-review.org index 69e172d..ba1571a 100644 --- a/.ai/workflows/task-review.org +++ b/.ai/workflows/task-review.org @@ -57,7 +57,9 @@ Keep is the common case — most tasks are still right and just need re-stamping *** Tagging =:quick:= — small tasks -While reviewing each task, estimate its effort. If you judge it *30 minutes or less* and it doesn't already carry =:quick:=, add the tag to the heading line. If the heading and body don't tell you how long it'll take, *ask Craig* — don't guess. A wrong =:quick:= is worse than none: the tag exists so Craig can grab a genuinely small task in a spare moment, and a mislabeled one wastes that moment. +The =:quick:= and =:solo:= assessments (this section and the next) are *mandatory* for every reviewed task except a Kill — a review that skips them is incomplete. The hard definitions live in [[file:../../claude-rules/todo-format.md][todo-format.md]] ("Hard definitions: :solo: and :quick:"); autonomous execution (work-the-backlog / the no-approvals speedrun) reads =:solo:= as its eligibility gate and trusts the author's tag, so the run-time gate is only as trustworthy as this pass. + +While reviewing each task, estimate its effort. If you judge it *30 minutes or less* and it doesn't already carry =:quick:=, add the tag to the heading line. If the heading and body don't tell you how long it'll take, *ask Craig* — don't guess. A wrong =:quick:= is worse than none: the tag exists so Craig can grab a genuinely small task in a spare moment, and a mislabeled one wastes that moment. =:quick:= is an effort hint only, never an eligibility gate — size does not decide what runs autonomously. This is orthogonal to the action chosen — a task can be kept (or re-graded, or marked DOING) *and* tagged =:quick:= in the same pass. Skip the assessment on a Kill, since it's leaving the pool. Tags go on the heading line per [[file:../../claude-rules/todo-format.md][todo-format.md]], sharing one =:tag1:tag2:= cluster. @@ -67,7 +69,7 @@ While reviewing each task, judge whether Claude could build *and* verify it with 1. *Buildable* — Claude has the capability and access to do the work. 2. *Verifiable by Claude* — an objective or local check exists that Claude can run itself. Craig's routine spot-checking does not count against this, and neither does handing off a residual human-in-the-loop confirmation as a structured manual-testing reminder (the =verification.md= "Handing Off Manual Verification" pattern). The disqualifier is having no verification path of Claude's own at all — when the success criterion is only judgeable by Craig's eyes or subjective taste. -3. *No upfront decision* — no design or preference call Craig must make before Claude can begin. +3. *No deliberation* — no open design question and no "weigh these approaches" with real tradeoffs. At most one or two *quick, upfront-answerable* factual decisions are allowed — the speedrun preset batches those into its pre-flight Q&A, so they don't break the hands-off run. A genuine design or preference call disqualifies. If any gate is shaky, leave the tag off. Like =:quick:=, a wrong =:solo:= is worse than none — it tells Craig he can hand the task off and walk away, so a mislabeled one wastes that trust. When the heading and body don't make all three gates clear, ask Craig instead of guessing. @@ -96,6 +98,8 @@ The exact date string matters: =task-review-staleness.sh= and the wrap-up health Follow the completion rules in [[file:../../claude-rules/todo-format.md][todo-format.md]]. A killed top-level =**= task stays task-shaped: change the keyword to =CANCELLED=, add a =CLOSED: [YYYY-MM-DD Day]= line under the heading (generate with =date "+%Y-%m-%d %a"=), and leave the priority and tags intact. It's then a candidate for =--archive-done= at the next cleanup. Don't stamp =:LAST_REVIEWED:= on a kill — it's leaving the review pool anyway. +A killed *sub-task* (=***= or deeper, under a parent task) instead becomes a dated event-log entry per the depth rule — but you don't have to hand-format it here. =todo-cleanup.el --convert-subtasks= (run in the =clean-todo= and wrap-up cleanup passes) rewrites any level-3+ DONE/CANCELLED/FAILED heading into its dated form mechanically from the =CLOSED= cookie, so a keyword-plus-=CLOSED= close at depth gets normalized on the next cleanup rather than lingering. =lint-org.el= flags any that slip through (checker =subtask-done-not-dated=). + * Phase D: Close out When the batch is done (or Craig calls it early): diff --git a/.ai/workflows/work-the-backlog.org b/.ai/workflows/work-the-backlog.org new file mode 100644 index 0000000..642162d --- /dev/null +++ b/.ai/workflows/work-the-backlog.org @@ -0,0 +1,263 @@ +#+TITLE: Work the Backlog +#+AUTHOR: Craig Jennings & Claude +#+DATE: 2026-07-02 + +* Overview + +The single home for the autonomous task-execution loop: take a set of marked, solo-doable tasks from the project's =todo.org= and work them unattended, each held to the full quality bar, under a fixed safety contract. Spec: =rulesets/docs/specs/2026-06-16-autonomous-batch-execution-spec.org=. + +Two callers feed it, differing only in how they build the task set and which session mode they pass: + +- The *inbox auto-loop* (=inbox.org= auto mode) chains here after its routing completes, with a tag/priority query, file-only mode, cap 1. +- The *no-approvals speedrun* preset feeds an explicit ordered list with autonomous-commit + always-push + paging-on, after a pre-flight Q&A that front-loads every decision. + +This workflow owns the execution logic — eligibility gate, defer checklist, quality bar, run cap. Callers own input assembly and mode selection. Capture-routing (inbox surfaces) stays entirely in =inbox.org=; this file never reads an inbox. + +* When to Use This Workflow + +Invoked by its two callers, or directly by phrase: + +- *Speedrun triggers:* "speedrun", "no approvals speedrun", "speedrun these: <task set>" — run the no-approvals speedrun preset (below). The word "speedrun" always routes here, even when the phrase also says "no approvals": plain =no-approvals.org= is the general session mode; the speedrun is this workflow's preset over an explicit task set. +- *Loop caller:* =inbox.org= auto mode chains here after its routing (below). Not phrase-triggered. + +Manual fallback: "work the backlog" / "work the backlog with <task set>" — gather the three inputs below (ask for whichever are missing, defaulting to file-only mode; default cap is the list length for an explicit set, 1 for a query) and run the loop. + +* Inputs — the caller contract + +A caller hands this workflow three things: + +1. *A task set* — an ordered list of candidate task headings from the project's =todo.org=. Either an explicit ordered list (speedrun) or the result of a tag/priority query (the loop). The loop does not care how the set was assembled; it receives an ordered list of candidates. +2. *A session mode* — two orthogonal flags: + - *Commit autonomy:* =file-only= (default) or =autonomous-commit=. See "Commit autonomy" below. + - *Paging:* on or off. End-of-set only. +3. *A run cap* — the hard maximum number of tasks to complete this run. + +It returns a per-task outcome and a run summary. + +* Outcomes — the per-task vocabulary + +Every task in the set ends in exactly one of: + +- =implemented-committed= — implemented, committed (and pushed per the project's flow) under =autonomous-commit=. +- =implemented-diff-surfaced= — implemented, diff surfaced, *not* committed (=file-only=). +- =deferred-VERIFY= — a defer-checklist hit; a =VERIFY= filed naming what's missing or risky. +- =dropped-by-craig= — removed from the run at the speedrun pre-flight Q&A ("skip this"). +- =skipped-ineligible= — failed the mechanical eligibility gate. +- =failed= — implementation was attempted and abandoned: the tree is left working (never commit a broken state), the failure is surfaced in the run summary, and the run continues to the next task. + +The run summary lists each task with its outcome, plus the remaining set when the cap stopped the run. + +* The loop + +For the task set, in order, until the run cap is hit: + +1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task. +2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist. +3. *Defer checklist* (below). Any hit → defer: file the =VERIFY= naming the gap and record =deferred-VERIFY= (or, under the speedrun preset, route a quick-question gap to the pre-flight Q&A), next task. +4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped. If implementation fails partway, leave the tree working, record =failed=, surface it, and continue to the next task. +5. *Commit autonomy branch:* + - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=. + - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=. +6. *Record metrics* for the task (the JSONL append — see Metrics below). +7. Decrement the cap. At zero, stop. + +After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary either way. + +* Eligibility gate — mechanical, no judgment + +A task is autonomous-safe when *both* hold. This layer is a lookup, not a judgment; all the judgment lives in the defer checklist. + +1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= marks "awaiting Craig's input"; auto-implementing one defeats the check it represents. The do-not-implement set is safe-by-omission: anything not plainly =TODO= (plus any project-declared "hold" marker) is out. +2. *Tagged =:solo:=* — the autonomy tag, resolved against the project's priority/tag scheme header in =todo.org= (never hardcoded). =:solo:= carries the hard definition in =todo-format.md=: completable and verifiable without Craig beyond at most one or two quick decisions answerable up front, no design deliberation. A project whose scheme declares a different autonomous-safe tag set overrides the default. + +Priority and =:next:= drive *ordering* within the eligible set, not eligibility ([#A] before [#B] before [#C], then the author's ordering). =:quick:= is an effort hint for batching and duration estimates — never a gate. + +Task *size* is deliberately absent from this gate. A large but well-specified, decision-free task is in scope and gets decomposed into per-logical-commit chunks during implementation. Size never sends a task away; only *deliberation* or *risk* does (the checklist below). + +*No scheme header → don't run.* The gate reads =:solo:= semantics from the project's scheme header; a =todo.org= without one leaves the tag undefined (=todo-format.md= makes the header mandatory). Surface that the header is missing and stop rather than guessing eligibility. + +* The defer checklist — act vs file + +After the scope read, run each eligible candidate through the checklist. Each item is a concrete, answerable question, not an adjective. *Any* hit — or any "unsure" — defers the task. Only a task that clears every item is implemented. + +1. *Test-writability (the keystone).* Can I write the failing test from the task text — plus any decisions gathered up front — without inventing a requirement? *No / unsure* → underspecified. Under the speedrun preset, if the gap is one or two quick answerable questions, route it to the pre-flight Q&A; otherwise file a =VERIFY= noting what's missing. Under the unattended loop, file the =VERIFY= (no one to ask). +2. *Data-loss / irreversible / external operation.* Does implementing it require any of: =rm= of non-scratch data, =git reset --hard= / force-push, =DROP= / =DELETE= / =TRUNCATE=, file truncate/overwrite of persisted content, a schema or data migration, any external or shared-state mutation, any credential touch? *Yes* → do NOT implement; file a =VERIFY= naming the risk. This is the hard safety gate; an upfront answer never overrides it without an explicit checkpoint. +3. *Already-satisfied.* Does the scope read show the desired end-state already holds? *Yes* → file a =VERIFY= noting it and move on. Don't make a no-op change. +4. *Design deliberation.* Does the task carry an unresolved design question, a "weigh these approaches" with real tradeoffs, or a TBD that isn't a quick factual answer? *Yes* → under the speedrun preset, if it collapses to one or two quick questions, route to the pre-flight Q&A; otherwise file and surface as a =/start-work= candidate. Under the loop, file. The discriminator is *quick-answerable question* vs *deliberation* — never task size. + +When genuinely unsure which side a task falls on, defer — a wrong auto-implement costs a revert *and* the next-session correction. + +** Filing the deferral =VERIFY= + +Every checklist hit files a =VERIFY= in the project's =todo.org=, per =todo-format.md='s VERIFY rules: + +- *Dedup first.* If a =VERIFY= sibling for this deferral already exists (a prior run filed it), don't file another — record the outcome as =deferred-VERIFY= with a "previously filed" note and move on. The deferred task keeps its =TODO= status and tags, so without this check every subsequent run would re-defer and re-file. +- *Placement:* sibling of the deferred task (the deferred task is the trigger) — a =**= task gets its =VERIFY= at =**=, a =***= sub-task gets it at =***= under the same parent, never deeper. +- *Heading:* carries the question or risk on its own ("VERIFY <topic> — migration touches persisted rows"). +- *Body:* which checklist item hit, what's missing or risky, and what answer or action would make the task runnable. For an already-satisfied hit, the evidence that the end-state already holds. + +** Routing a quick-question gap (speedrun only) + +Under the speedrun preset, a checklist-1 or checklist-4 hit that collapses to one or two quick answerable questions routes to the pre-flight Q&A instead of deferring (see the preset section below). The discriminator: a *quick question* is a factual or preference pick answerable in one line without weighing tradeoffs ("cap at 5 or 8?", "which config key name?"); *deliberation* is anything that needs tradeoffs weighed, options explored, or code read by Craig. A task needing three or more questions isn't quick-question-gapped — it's underspecified; file the =VERIFY=. Checklist item 2 (data-loss / irreversible) never routes to the Q&A: an upfront answer doesn't override the hard safety gate. + +The unattended loop has no one to ask — every hit defers there. + +* Per-task quality bar + +Autonomy changes who approves, not what quality means. Per task, non-negotiable: + +- *TDD* per =testing.md=: red first, green, refactor. The keystone checklist item already proved the failing test is writable. +- *Verification* per =verification.md=: fresh evidence, full suite green before any commit. +- *=/review-code --staged=* before every commit; Critical and Important findings block until fixed. +- *=/voice personal=* on every commit message on the =autonomous-commit= path (or the patterns walked inline if the skill is unavailable), message printed inline so the log shows what landed. +- *Task closure* per =todo-format.md=: depth-based completion (keyword + =CLOSED:= at level 2, dated rewrite at level 3+). +- *One logical change per commit.* A large task becomes several commits, not one omnibus. + +* Commit autonomy + +=file-only= is the default: surface the diff, never commit. =autonomous-commit= is honored only when the project carries the commit-autonomy waiver, read fresh each run — never from memory of past runs or "this project usually allows it." + +The waiver lives in the project's =.ai/notes.org= *Workflow State* section as marker lines, the same shape as the workflow markers already there: + +#+begin_example +:COMMIT_AUTONOMY: yes +:LOOP_MAY_COMMIT: yes +#+end_example + +- =:COMMIT_AUTONOMY: yes= — the project has the waiver. An =autonomous-commit= request (the speedrun preset, or a manual run asking for it) is honored. +- =:LOOP_MAY_COMMIT: yes= — the *unattended loop caller* may also commit. It requires =:COMMIT_AUTONOMY:= alongside it; the split exists because "Craig-initiated speedrun may commit" and "the recurring loop may commit unattended" are different levels of trust. Without this flag the loop stays =file-only= even when the project holds the waiver. + +An absent marker means no. Anything other than a plain =yes= value also means no. The read is one grep of the Workflow State section — a lookup, not a judgment. + +*The degrade contract.* When a caller requests =autonomous-commit= and the required marker is missing, degrade to =file-only= and surface it in both the run intro and the run summary: "autonomous-commit requested, no :COMMIT_AUTONOMY: waiver in notes.org — running file-only." Never honor the request without the marker, and never drop to file-only silently — the first commits into a project that didn't opt in, the second hides why nothing got committed. + +* Bounding the run + +The cap is a hard per-run task ceiling passed by the caller — the kill switch a runaway can't exceed: + +- *Loop caller default: 1.* Implement the highest-priority eligible candidate, record, stop; the next tick continues. +- *Speedrun: the length of the explicit list*, capped at a ceiling — the human bounded the set by naming it. + +Even the speedrun stops at the cap and surfaces (and, with paging on, pages) the remainder. The cap bounds task *count*, not cost; a token budget is logged as vNext. + +* Context hygiene — auto-flush between tasks + +Task boundaries are clean boundaries by construction: the previous task is closed and committed (or filed), nothing is half-edited. When the context window grows heavy mid-run, run the flush skill's *auto mode* between tasks: checkpoint the session anchor with the remaining task set, session mode, and cap in Next Steps (so the resumed context continues the run blind), arm the self-injection (=.ai/scripts/self-inject.sh= via =tmux run-shell -b=), and end the turn. The fresh context resumes from the anchor and works on. Unattended runs only — the keystroke-collision hazard and the full mechanism live in the flush skill. + +* End-of-set page + +With paging on, fire one page when the set is done or the cap is hit — end-of-set only, never per-task: + +#+begin_src sh +notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>" --persist +#+end_src + +=--persist= keeps it on screen until dismissed (the page-me convention). The page fires when the set completes *or* the cap stops the run — either way exactly once. The message carries the project name, the completed count, and the remaining count (with skipped tasks noted in the run summary) so Craig can confirm ready and name the next project in one reply. There is no separate page-signal call — =notify= is the paging surface. + +* Metrics + +Each task outcome appends one JSON line to the project's =.ai/metrics/work-the-backlog.jsonl= — git-tracked, append-only, =jq=-queryable. Create the directory and file on the first append. Logging is a side effect only: a failed append surfaces a warning in the run summary but never blocks, reorders, or aborts execution. + +One record per task, written at the moment its outcome is decided: + +| Field | Meaning | +|--------------------+-------------------------------------------------------------------------------------------------| +| =ts= | ISO-8601 timestamp of the task outcome | +|--------------------+-------------------------------------------------------------------------------------------------| +| =run_id= | UUID shared by every record in one run (=uuidgen= at run start) | +|--------------------+-------------------------------------------------------------------------------------------------| +| =project= | project basename | +|--------------------+-------------------------------------------------------------------------------------------------| +| =caller= | =loop= / =speedrun= / =manual= | +|--------------------+-------------------------------------------------------------------------------------------------| +| =task= | the task heading (slug) | +|--------------------+-------------------------------------------------------------------------------------------------| +| =outcome= | =implemented-committed= / =implemented-diff= / =deferred-verify= / =skipped-ineligible= / | +| | =dropped-by-craig= / =failed= | +|--------------------+-------------------------------------------------------------------------------------------------| +| =defer_reason= | =underspecified= / =data-loss= / =already-satisfied= / =needs-deliberation= — set on | +| | =deferred-verify= records only | +|--------------------+-------------------------------------------------------------------------------------------------| +| =upfront_decision= | =true= when a pre-flight answer was recorded and used for this task | +|--------------------+-------------------------------------------------------------------------------------------------| +| =wall_clock_s= | seconds from task start to outcome | +|--------------------+-------------------------------------------------------------------------------------------------| +| =commit_sha= | committed tasks: the commit SHA (comma-separated when the task decomposed into several); empty | +| | otherwise | +|--------------------+-------------------------------------------------------------------------------------------------| +| =review_findings= | count of =/review-code= Critical + Important findings on this task | +|--------------------+-------------------------------------------------------------------------------------------------| + +The =outcome= slugs map one-to-one onto the outcome vocabulary above (=implemented-diff= is =implemented-diff-surfaced=; =deferred-verify= is =deferred-VERIFY=). Per-run rollups (attempted / completed / deferred / dropped, wall-clock total, findings per commit) are computed at synthesis, not stored per record. The =commit_sha= field is what the synthesis step's corrections signal keys on — whether a later commit reverted or hand-fixed an autonomous one — so never omit it on a committed task. + +* Caller: the inbox auto-loop + +=inbox.org= auto mode chains here as an explicit second step *after* its routing completes — never as a phase inside inbox processing. When a cycle files new items and Craig answers "run this batch next?" with yes, auto mode invokes this workflow with: + +- *Task set:* the eligibility query over the queued/filed items — status =TODO= + =:solo:= per the scheme header, priority-ordered. +- *Session mode:* =file-only=, paging off. (A project carrying both =:COMMIT_AUTONOMY:= and =:LOOP_MAY_COMMIT:= markers opts the loop into commits — see Commit autonomy above.) +- *Cap: 1.* The highest-priority eligible candidate runs, gets recorded, and the loop's next tick (or the next yes) continues from there. + +The loop has no human at kickoff of each task, so a needs-quick-decisions task defers with a =VERIFY= — the pre-flight Q&A is a speedrun capability, not a loop one. Startup and wrap-up never invoke this workflow. + +* Preset: the no-approvals speedrun + +The named preset is a label for one flag combination, not a second code path: *explicit ordered list + =autonomous-commit= + always-push + paging-on*, with every approval front-loaded into a single pre-flight step. "No approvals" means all input first, then hands-off — not no input ever. =autonomous-commit= still requires the =:COMMIT_AUTONOMY:= waiver (Commit autonomy above); without it the preset degrades to =file-only= and says so in the pre-flight intro. + +When Craig names a task set and says "speedrun": + +1. *Gather* the named task set. +2. *Scope-read and classify* each task against the eligibility gate + defer checklist: *ready* (clears everything), *needs-quick-decisions* (one or two upfront-answerable questions — checklist item 1 or 4), or *drop* (data-loss/irreversible, or deliberation that isn't a quick question). +3. *Order* the list — priority, then the author's ordering / =:next:=. +4. *Intro the work* — present the ordered plan: what will run, what was dropped and why, and the batched questions for the needs-quick-decisions tasks. +5. *Craig answers each question or says "skip this"* — a skip removes the task (recorded =dropped-by-craig=; the task itself stays =TODO=); an answer is recorded so implementation works from the decision, not a guess. +6. *Run the finalized list autonomously* — no further approvals until done. Cap = the list length (the human bounded the set by naming it), still one commit per logical change, always-push per the project's flow, auto-flushing between tasks when the context grows heavy (see Context hygiene above). +7. *End-of-set page* with completed + remaining + skipped. + +The batch-ask (step 4-5) is one message: each question names its task, puts the recommended answer at item 1 when there is one (per =interaction.md= — inline numbered, no popup), and offers "skip this" as the last option. Before the run starts, write each answer into its task's body in =todo.org= as a dated line — the implementation works from the recorded decision, and the record survives the session. The Q&A fires only under this preset; the loop caller never asks (its decision-needing tasks defer). + +*** Per-item disposition rule + +For every item the run picks up (this holds for any executing caller, including an auto-inbox-zero run given a standing yes): + +- *Feature-level task* → write a spec first (=spec-create=), don't implement directly. The spec is the run's deliverable for that item. +- *Needs decisions you can't confidently guess* → file it as a =VERIFY= carrying the question (under this preset, one or two quick questions route to the pre-flight Q&A instead). +- *Well-defined* → implement it, taking the time it needs. + +This extends the defer checklist: the checklist decides *act vs file*; this rule decides the *shape* of the act. + +* Synthesis: metrics → org-roam KB + +Trigger: "synthesize backlog metrics" (optionally a weekly scheduled run). This is the read side of the metrics log — Craig's ask was "gather data and create org-roam articles we can look at later," and this step is the second half. It is read-only over the logs plus exactly one KB write. + +1. *Gather the JSONL union.* Discover =.ai/metrics/work-the-backlog.jsonl= across the project roots (dirs carrying =.ai/protocols.org= under =~/code=, =~/projects=, =~/.emacs.d=). Classify each project per =knowledge-base.md= (work-root denylist, never inference) before reading it into the union. +2. *Enforce personal-only.* A work-classified or unknown project's metrics never enter the KB write — they stay in that project's own log. Report the exclusion per the KB refusal contract: the classification, a one-line redacted summary, and where the data stayed. +3. *Compute the rollups and trends.* Per run: attempted / completed / deferred (by reason) / dropped / failed, wall-clock total, commits landed, review findings per commit. Trends across runs: completion rate over time, defer-reason distribution, findings-per-commit trend. +4. *Compute the corrections signal* — the key metric. For each =commit_sha= in the window, check that project's history for a later commit (within ~14 days) that reverts it or carries a fix touching the same files. A clean run is one whose autonomous commits survive untouched; a flagged run is what Craig reviews by hand. This is a cheap proxy, not proof — it flags candidates, it doesn't convict. +5. *Write one KB node* at =~/org/roam/agents/YYYYMMDDHHMMSS-backlog-metrics-<window>.org= per =knowledge-base.md=: =:agent:metrics:= filetags, a concise title, the rollup table, the trend narrative, and =[[id:...]]= links to prior synthesis nodes so the series is traceable. Pull before writing, commit and push after — the normal KB session discipline. + +The KB node is the artifact Craig reads later: "are the runs completing more and getting corrected less?" should read off the trend table without touching raw logs. Synthesis never mutates the JSONL, todo.org, or any project tree. + +* Common Mistakes + +1. *Implementing a =VERIFY= or =DOING= task.* The gate is status =TODO= only — a =VERIFY= exists precisely because Craig's input is pending. +2. *Treating =:quick:= as eligibility.* It's an effort hint. =:solo:= is the gate. +3. *Deferring on size.* A large, well-specified, decision-free task runs — decomposed into logical commits. Size is not a checklist item. +4. *Guessing past the keystone.* If the failing test isn't writable from the task text, the task isn't ready. Inventing the requirement is the failure the checklist exists to stop. +5. *Rationalizing through the data-loss list.* "The migration is small" doesn't clear checklist item 2. Enumerated operations defer, full stop. +6. *Committing in =file-only= mode.* The diff is the deliverable; the commit is Craig's. +7. *One omnibus commit for the whole run.* Every logical change is its own reviewed commit. +8. *Skipping =/review-code= or =/voice= because nobody's watching.* Autonomy removes interaction gates, never engineering-discipline gates (same contract as =no-approvals.org=). +9. *Running past the cap.* The cap is the kill switch; hitting it means stop and surface, even mid-set. +10. *Paging per-task.* One page, end of set. +11. *Honoring =autonomous-commit= from memory.* The waiver is the marker line in =notes.org=, read fresh each run. "This project usually allows it" isn't a read. +12. *Re-filing the same deferral =VERIFY= every run.* The deferred task stays =TODO=, so a run that skips the existing-sibling check spams =todo.org= with duplicates. +13. *Routing a data-loss hit to the pre-flight Q&A.* Checklist item 2 is the hard gate — an upfront answer never clears it without an explicit checkpoint. + +* Living Document + +Refine as the dogfooding signal arrives — the metrics log and the corrections-in-next-session signal are the feedback loop. Fold recurring adjustments in rather than accumulating caller-side workarounds. + +* History + +Created 2026-07-02 as Phase 1 of the autonomous-batch execution spec, reconciling the inbox-zero "Phase E" proposal and the =.emacs.d= speedrun proposal into one execution loop. The auto-inbox-zero execute step in =inbox.org= reverted to routing-only in the same change so this file is the loop's only home. Phases 2-6 (same day) wired both callers, pinned the commit-autonomy waiver markers, fleshed the defer/Q&A/page mechanics, and added the metrics record + KB synthesis step. diff --git a/.ai/workflows/wrap-it-up.org b/.ai/workflows/wrap-it-up.org index 5d2cdd2..d0c4e75 100644 --- a/.ai/workflows/wrap-it-up.org +++ b/.ai/workflows/wrap-it-up.org @@ -137,6 +137,22 @@ Run the report-only variant first if you want to see what would change without w emacs --batch -q -l .ai/scripts/todo-cleanup.el --check todo.org #+end_src +*** Convert done sub-tasks to dated entries + +#+begin_src bash +[ -f todo.org ] && emacs --batch -q -l .ai/scripts/todo-cleanup.el --convert-subtasks todo.org +#+end_src + +=--convert-subtasks= rewrites every heading at level 3 or deeper whose TODO state is DONE/CANCELLED/FAILED into a dated event-log entry (=<stars> YYYY-MM-DD Day @ HH:MM:SS -ZZZZ <text>=), dropping the keyword, priority cookie, and tags, and removing the now-redundant =CLOSED:= line. This enforces the =todo-format.md= depth rule that a completed *sub-task* (a heading under a parent task) becomes dated history, not a lingering DONE keyword — a shape an interactive org close (=org-log-done= → DONE + CLOSED) never applies and =--archive-done= (level-2 only) never reaches. The timestamp comes from each entry's own =CLOSED= cookie; a date-only close yields =00:00:00=. Heading text is kept verbatim. Idempotent (an already-dated heading has no keyword to match), and a done sub-task with no parseable =CLOSED= is flagged and left alone rather than stamped with a fabricated date. + +Run this *before* =--archive-done= so that when a completed level-2 parent is archived, its sub-tasks already carry their dated form. Any rewrites show up in the wrap-up commit's diff for review before push. + +Preview without writing: + +#+begin_src bash +emacs --batch -q -l .ai/scripts/todo-cleanup.el --convert-subtasks --check todo.org +#+end_src + *** Archive completed work #+begin_src bash @@ -244,6 +260,32 @@ The check exempts =lint-followups.org= explicitly because lint-org runs earlier This integrates with =inbox.org= process mode, which stamps =:LAST_INBOX_PROCESS:= in =notes.org='s *Workflow State* section on completion. Wrap-up doesn't double-stamp. It only ensures the inbox carries nothing but the expected pipeline artifacts at session end. +*** Cross-project router (optional — route filed keepers to their home projects) + +Runs directly after the inbox sanity check. The split between the two: the sanity check *gates* the wrap (a dirty inbox blocks until resolved); the router is *optional* (skipping it never blocks anything — the candidates just stay local until a future wrap). Spec: =docs/specs/wrapup-routing-spec.org= (D7/D8/D9). + +The candidate set is exactly the local tasks carrying a =:ROUTE_CANDIDATE:= property — keepers that inbox process mode filed this session whose inferred home is another project. Never scan the standing backlog. + +#+begin_src bash +.ai/scripts/route-batch --list +#+end_src + +*Empty set = zero interaction.* =--list= prints nothing when there are no candidates; continue the wrap silently — no prompt, no "0 items" line. + +When candidates exist, surface the batch as one line per task — the task heading, the destination project, the delivery mode (=inbox-send= file handoff), and the engine's confidence — then offer exactly two options: *go* (route the whole batch) or *skip* (leave everything local). Derive each confidence label by running the engine on the task's heading + body (=python3 .ai/scripts/route_recommend.py --item "..." --exclude "$(basename "$PWD")"=); label weak matches visibly ("weak — verify the destination") so a low-confidence route gets a human glance before the keystroke. + +On *go*: + +#+begin_src bash +.ai/scripts/route-batch --go +#+end_src + +Per candidate, the helper writes the task's subtree (children ride along; =:ROUTE_CANDIDATE:= stripped, headings promoted to top level) to a one-task handoff, delivers it via =inbox-send <destination> --file= (so the =from-<this-project>= provenance is stamped and the destination's inbox process mode dispositions it as a single item), and only after a successful send removes the subtree from the local =todo.org= — a single-file local edit the wrap is already committing. A failed send leaves that task in place and exits non-zero; report it and continue the wrap. Never write the destination's =todo.org= directly; its own inbox processing files the task per its conventions. + +On *skip*, leave every candidate in place, marker included — they resurface next wrap. + +Mis-routes are recoverable: the receiving project rejects via inbox process mode's reject-from-another-project flow, which returns the item to this project's inbox with the rationale. That reject path is why removing the local source on send is safe. + *** Review-habit health check (surface a slipped daily task-review) The daily task-review habit walks the open top-level tasks on a rotating cycle, stamping =:LAST_REVIEWED:= as it goes (see =task-review.org=). This check is the watchdog for that habit. When tasks have gone too long unreviewed, the habit has slipped, and the wrap-up says so in one line — it does not re-list the tasks. @@ -536,7 +578,7 @@ Before considering wrap-up complete: - [ ] The Summary ends with the =KB: promoted N / consulted yes-no= line (promotion check ran) - [ ] File renamed to =.ai/sessions/YYYY-MM-DD-HH-MM-description.org= - [ ] =.ai/session-context.org= no longer exists -- [ ] =todo-cleanup.el= ran — hygiene pass + =--archive-done= + =--sync-child-priority= (if =todo.org= exists at project root) +- [ ] =todo-cleanup.el= ran — hygiene pass + =--convert-subtasks= + =--archive-done= + =--sync-child-priority= (if =todo.org= exists at project root) - [ ] =lint-org.el= ran on =todo.org= — mechanical fixes applied, judgments appended to follow-ups file (if =todo.org= exists) - [ ] Any orphan-planning-line warnings reviewed (fix or accept) - [ ] Inbox carries nothing but expected pipeline artifacts (=.gitkeep=, =lint-followups.org=, =PROCESSED-*= prefixes), OR each remaining handoff has an explicit deferral logged in the valediction diff --git a/.claude/settings.json b/.claude/settings.json index 33ed7e6..6bd53fd 100644 --- a/.claude/settings.json +++ b/.claude/settings.json @@ -7,6 +7,7 @@ "permissions": { "defaultMode": "bypassPermissions" }, + "model": "fable", "hooks": { "PreToolUse": [ { diff --git a/archive/task-archive.org b/archive/task-archive.org new file mode 100644 index 0000000..9d2bfc6 --- /dev/null +++ b/archive/task-archive.org @@ -0,0 +1,1666 @@ +#+TITLE: Task Archive +#+FILETAGS: :archive: + +* Resolved (archived) +** DONE [#C] Fix =cj-scan= false positives on cj fences nested inside other =#+begin_*= blocks :bug: +CLOSED: [2026-05-15 Fri] + +=cj-scan.py= was matching =#+begin_src cj:= / =#+end_src= line-by-line +without awareness of enclosing block scopes. A cj fence embedded inside a +=#+begin_example= block (typically when documenting what the =<cj= yasnippet +emits) or inside =#+begin_src snippet= (the yasnippet definition itself) was +misclassified as a live cj annotation. Surfaced from a /respond-to-cj-comments +run against the dotemacs =todo.org= that reported two false positives in the +=<cj= yasnippet documentation. + +Fix: track an active =wrapper_type= state. When the scanner sees =#+begin_<type>= +(for any =<type>= other than =cj:= via the more-specific cj-open regex, which +is checked first), it enters a wrapper state where every line is treated as +content until the matching =#+end_<type>= closer fires. Inside a wrapper, cj +fence patterns and legacy inline =cj:= lines are both suppressed. + +Tests: added =TestCjScanNestedFencesIgnored= (6 tests) to +=claude-templates/.ai/scripts/tests/test_cj_scan.py= covering nesting inside +=#+begin_example=, =#+begin_src <other-lang>=, and =#+begin_quote=, plus +regression guards that a wrapper closes cleanly (a subsequent real cj fence +is still detected) and that an unclosed wrapper doesn't silently swallow +later content into false-positive cj blocks. + +Full =make test-scripts= equivalent (=python3 -m pytest=): 302 passed, 1 +skipped, 0 failures. +** DONE [#A] Add =make doctor= — verify ~/.claude/ matches repo + settings.json :feature: + +A drift detector that scans =~/.claude/= and reports anything inconsistent with what the repo expects. Single-command answer to "is my machine consistent with rulesets?" + +*** Why this matters + +A 2026-05-06 sweep found =~/.claude/hooks/= didn't exist on this machine even though =settings.json= referenced =~/.claude/hooks/precompact-priorities.sh= as a PreCompact hook. Compaction would have silently failed to invoke the hook. The fix was =make install-hooks=, but the breakage was invisible until I happened to grep for it. =make doctor= run regularly (or even as part of session start) would catch this kind of drift in seconds instead of after the fact. + +*** Checks + +- Every entry in =settings.json= ="hooks"= block points at a file that exists. +- Every entry in =enabledPlugins= has a matching install under =~/.claude/plugins/data/=. +- Every skill in =$(SKILLS)= has a working symlink at =~/.claude/skills/<name>=. +- Every rule in =$(RULES)= has a working symlink at =~/.claude/rules/<name>=. +- Every default hook has a symlink at =~/.claude/hooks/<name>= (warn-only — opt-out is legitimate). +- =settings.json= and =.mcp.json= symlinks resolve to the rulesets versions. +- =mcp/install.py= state matches =claude mcp list= (every server in =servers.json= is registered). +- No dangling symlinks anywhere under =~/.claude/=. + +*** Output + +One line per check: =ok= / =WARN= / =FAIL=. Final summary: =N ok, M warnings, K failures=. Exit non-zero on any failure so it can ride a pre-flight check. +** DONE [#A] Build =voice= skill — combine =humanizer= with universal + personal style passes :feature: + +Combine =humanizer= with universal good-writing passes (Strunk & White, Orwell, Plain English) and the personal-style passes from =commits.md=. Two modes — =general= for arbitrary writing, =personal= for commits/PRs/comments — share a foundation and diverge on register. + +Built and shipped 2026-05-07: =voice/SKILL.md= with 39 numbered patterns walked sequentially. Patterns 1-25 carried over from humanizer, 26-31 are universal good-writing additions, 32-39 are personal-only. Migrated three callers (=commits.md=, =respond-to-cj-comments.md=, =start-work.md=). Removed the standalone =humanizer= skill since voice supersedes it. + +*** Why this matters + +Three transformations want to run together for personal-mode artifacts (commits, PR titles + bodies, PR comments) but lived in three places: =humanizer= as a skill, S&W-style universal rules nowhere (applied ad-hoc), and the personal-style passes as prose steps in =commits.md= that got re-applied by hand each time. Costs: (1) the "I forgot pass (e)" failure mode — skipping a pass without flagging is a defect but happens in practice. (2) No single-call invocation of the full transform. (3) General-mode writing (research notes, philosophy, history) got only humanizer with no universal-prose pass at all. Combining brings them under one skill with one invocation. + +*** Design + +Two modes: + +- *general* (default) — for arbitrary writing not bound for commit/PR/comment publishing (research notes, philosophy/history essays, emails, README prose). Runs: + - humanizer (current behavior — strip AI-generated-writing fingerprints) + - tier-1 universal passes (canonical good-writing rules) + - the 2 personal-style passes that have no register conflict (jargon-fragment rewrite, noun-ified verbs) + +- *personal* — for commits, PR titles + bodies, PR comments. Runs general PLUS: + - 8 personal-only passes (first-person rewrite, semicolons, contractions, sentence-split, felt-experience, sentence fragments, terse cut, public-artifact scope check) + +The 8 personal-only passes are explicitly *not* in general mode. They conflict with academic / literary / philosophical register. Forcing first-person on a Foucault essay or stripping felt-experience from a journal entry would damage the writing. + +*** Tier 1 universals (v1) + +From Strunk & White, Orwell's "Politics and the English Language", Plain English Campaign, and Garner's Modern English Usage. Each is a detection-pattern + rewrite-rule pair, mechanical enough to apply consistently across runs. + +- *Omit needless words* — curated phrase list (=the fact that= → =that=/=because=, =in order to= → =to=, =at this point in time= → =now=, =due to the fact that= → =because=, =for the purpose of= → =to=, =in spite of= → =despite=, etc.) +- *Long word → short word* — Plain English wordlist (~150 entries: =utilize=→=use=, =commence=→=start=, =terminate=→=end=, =facilitate=→=help=, =demonstrate=→=show=, =sufficient=→=enough=, =prior to=→=before=, =subsequent to=→=after=, =in the event that=→=if=, =a great deal of=→=much=) +- *Active over passive voice* — detect "to be + past-participle" patterns. Suggestion-only in v1 (auto-rewrite is risky in technical contexts where passive is appropriate); graduate to auto-rewrite for unambiguous cases in v2. +- *Comma splices* — detect independent clauses joined only by comma; rewrite to period or semicolon-then-period. +- *Cliché flag* — small curated list (=at the end of the day=, =moving forward=, =going forward=, =at this juncture=, =circle back=, =low-hanging fruit=, =deep dive=, =leverage= as verb). + +*** Tier 2 universals (v2) + +- *Positive over negative form* (S&W) — =not unlike= → =like=, =do not fail to= → =remember to=, =did not pay any attention= → =ignored= +- *Garner-style word-pair corrections* — comprise/compose, less/fewer, that/which (restrictive vs nonrestrictive), affect/effect, principal/principle +- *Parallelism in lists* — detect mismatched grammar in bullet items +- *Tense consistency* — flag mid-paragraph tense shifts +- *Acronym definition on first use* — detect uppercase tokens used before being expanded + +*** Tier 3 (v3, may not land) + +- *Concrete-over-abstract* preference +- *Emphatic word at sentence end* (S&W rule 18) +- *Vary sentence length / rhythm* +- *Reading-grade-level scoring* (Hemingway-style) + +*** Personal-style pass placement + +| # | Pass | Mode | Why | +|----+-------------------------------------+-------------------------------------+-------------------------------------| +| 1 | First-person voice rewrite | personal only | Forces "I" voice; wrong for | +| | | | academic prose where third-person | +| | | | and "we" are conventional | +|----+-------------------------------------+-------------------------------------+-------------------------------------| +| 2 | Jargon-fragment → complete sentence | both | Universal clarity, no genre | +| | | | conflict | +|----+-------------------------------------+-------------------------------------+-------------------------------------| +| 3 | Semicolon → period/comma | personal only | Semicolons are conventional in | +| | | | long-form / academic prose | +|----+-------------------------------------+-------------------------------------+-------------------------------------| +| 4 | Contractions ("it's", "don't") | personal only | Academic and formal writing | +| | | | typically avoids contractions | +|----+-------------------------------------+-------------------------------------+-------------------------------------| +| 5 | Sentence split on conjunctions | personal only | Foucault, Hegel, Adorno | +| | | | deliberately use long compound | +| | | | sentences | +|----+-------------------------------------+-------------------------------------+-------------------------------------| +| 6 | Felt-experience narration ("I'll | personal only | Personal essays *use* | +| | feel this every time") | | felt-experience as content | +|----+-------------------------------------+-------------------------------------+-------------------------------------| +| 7 | Noun-ified verbs ("the ask", "a | both | Targets corporate-speak with | +| | learn", "the spend") | | curated wordlist; doesn't catch | +| | | | philosophical nominalizations like | +| | | | "the becoming" | +|----+-------------------------------------+-------------------------------------+-------------------------------------| +| 8 | Sentence fragments → complete (in | personal only | Fragments are valid stylistic | +| | prose) | | devices in literary prose | +|----+-------------------------------------+-------------------------------------+-------------------------------------| +| 9 | Terse cut (rhetorical padding: | personal only | Tier 1 omit-needless-words covers | +| | "worth noting", "it's important to | | the worst offenders universally; | +| | understand") | | aggressive cut conflicts with | +| | | | academic register | +|----+-------------------------------------+-------------------------------------+-------------------------------------| +| 10 | Public-artifact scope check (local | personal only — *flag-only*, no | Operational/safety check, not | +| | paths, private repos, personal | auto-rewrite | stylistic; auto-masking risks | +| | tooling) | | silently editing meaningful text | +|----+-------------------------------------+-------------------------------------+-------------------------------------| + +*** Inclusive-language pass — explicitly excluded + +Considered and rejected. Conflicts with planned writing on philosophy/history topics (Foucault on sexuality and gender, history of slavery in New Orleans). Wordlist substitutions would override deliberate vocabulary choices in those genres. + +*** V1 scope + +- [ ] Skill at =~/code/rulesets/voice/= with =SKILL.md= +- [ ] Frontmatter with positive triggers (commit, PR, comment, "humanize", "voice pass") and negative triggers (code, structured data, plain bullet lists) +- [X] Mode invocation: default = =general= when invoked bare; =personal= invoked explicitly by publish-context callers +- [X] humanizer content migrated from =humanizer/= → =voice/= +- [X] Tier 1 universal passes implemented (5 patterns: #26-30, plus #31 noun-ified verbs as a universal personal addition) +- [X] 2 personal passes that run in both modes (#30 jargon-fragment, #31 noun-ified verbs) +- [X] 8 personal passes that run in personal mode only (#32 first-person, #33 semicolons, #34 contractions, #35 sentence-split, #36 felt-experience, #37 fragments, #38 terse cut, #39 scope check) +- [X] Each pass = detection-pattern + rewrite-rule pair (#39 is detection + flag-only) +- [X] Total v1 pattern count: 31 in general mode (humanizer's 25 + 4 tier-1 + 2 universal personal); +8 personal-only = 39 in personal mode +- [X] Update =commits.md= to invoke =/voice personal= instead of "run =humanizer= and apply five passes manually" +- [X] Remove the existing =humanizer/= skill (no callers outside this repo, all migrated) +- [X] =make doctor= still passes +- [X] =make lint= clean + +*** v2 (deferred) + +- [ ] Tier 2 universals (positive form, word-pair corrections, parallelism, tense consistency, acronym definition) +- [ ] Per-pass severity flags for Tier 1 active-voice (suggestion-only when actor is implicit; auto-rewrite when actor is named) +- [ ] Reporting mode: list which passes fired and which were no-ops + +*** v3 (aspirational, may not land) + +- [ ] Tier 3 (concrete-over-abstract, emphatic-word position, sentence-length variation, reading-grade scoring) +- [ ] Progressive disclosure split: =voice/SKILL.md= orchestrator + =voice/passes/<pass-name>.md= per pass with worked examples + +*** Migration (resolved) + +Decision: deleted =humanizer/= entirely. Three callers (=commits.md=, =respond-to-cj-comments.md=, =start-work.md=) all updated to invoke =/voice= directly. No alias needed since nothing outside the repo invoked humanizer. + +*** Naming alternatives considered + +- =voice= — chosen. Captures both modes; broad enough. +- =polish= — descriptive of multi-pass nature; less prescriptive about whose voice. +- =house-style= — signals "this is the house style"; appropriate for personal repo. +- =commit-voice= — too narrow (passes apply to research notes, emails, etc. in general mode). +- =humanize= (extending current) — undersells the universal + personal additions. + +*** Open questions before implementation + +Resolved during implementation: +- Default mode when =/voice= is invoked bare: =general=. Personal-context callers (=commits.md= publish flow, =respond-to-cj-comments.md=) invoke =/voice personal= explicitly. Avoids accidentally first-person-ifying research notes. +- Reporting: skill prints "Summary of changes" listing which patterns fired (audit value). +- Public-artifact scope check (#39): flag-only, user resolves manually. Blocking would frustrate on legitimate path mentions. +- Tier 1 active-voice detection: suggestion-only in v1. Auto-rewrite for unambiguous cases deferred to v2. +** DONE [#B] Add =--archive-done= mode to =.ai/scripts/todo-cleanup.el= :feature: + +Opt-in mode that moves every level-2 subtree whose TODO state is DONE or CANCELLED out of the "Open Work" section and into the "Resolved" section of the same org file, subtree intact. + +- *Section matching.* Key on a top-level heading containing "Open Work" and one containing "Resolved" — that pairing is the only naming consistent across projects (=Work Open Work= / =Work Resolved= here; bare =Open Work= / =Resolved= elsewhere). Require exactly one match for each; otherwise skip with a clear message, no crash. +- *Modes.* =--check= previews and writes nothing, same as the existing hygiene pass. Idempotent. Not run by default in the wrap-up flow — archiving is consequential, so it stays opt-in: =emacs --batch -q -l todo-cleanup.el --archive-done FILE=. +- *Edge cases.* Source or target section missing; subtree at EOF; nested DONE subtree under an open parent stays put (only level-2 entries move); nothing to move → clean no-op. +- *Tests.* TDD with ERT — the project's first elisp tests. Fixtures (synthetic) under =.ai/scripts/tests/=; run via =make test= (rulesets) or =make test-scripts= (claude-templates), which run pytest + every =tests/test-*.el= ERT suite. Cases: one DONE level-2 moves; multiple; CANCELLED also moves; structural (no-state) headings don't move; nested DONE under an open parent stays; level-2 DONE with open level-3 children moves intact; subtree at EOF; missing source/target section; ambiguous "Resolved"; lowercase headings; nothing-to-do; idempotency; =--check= preview + its idempotency; realistic-sample integration. + +Origin: came up while scrubbing a project's todo.org on 2026-05-11 — moving a big completed PROJECT subtree (plus a few smaller ones) into the Resolved section by hand was the cue to build a reusable tool. + +Built and shipped 2026-05-11: =--archive-done= added to =.ai/scripts/todo-cleanup.el= test-first; 13-test ERT suite (=tests/test-todo-cleanup.el=) + realistic synthetic fixture (=tests/fixtures/todo-sample.org=), wired into =make test= / =make test-scripts= alongside pytest. The CLI dispatch moved into =tc-main= behind a guard so the suite can =require= the file without firing it. Section matching is case-insensitive and tolerates the =<Project> Open Work= / =<Project> Resolved= naming variants. Opt-in only — not wired into the wrap-up flow. Source of truth is =~/projects/claude-templates/=; rsync'd into this repo. +** DONE [#B] Encode follow-up filing rules into =/start-work= +CLOSED: [2026-05-15 Fri] + +Phase 4 step 5 of =/start-work= ("refactor audit") says any candidate that isn't fix-now must land in one of three buckets: fold-into-related-commit, separate =refactor:= commit, or "file a ticket or todo.org entry." The third disposition doesn't say *where* — which leaves the orchestrator picking a location ad-hoc. Result: follow-ups buried under children of an epic parent get orphaned when the parent closes, or follow-ups for standalone tasks scatter across the file with no convention. + +Proposed placement rule (already memorized for this project as =feedback_followups_as_siblings.md=, generalizing): + +- *Epic-style parent task* (level-2 with multiple level-3 children) → follow-ups file as level-2 *siblings* of the parent. Stays visible after parent closure. +- *Standalone task* (level-2 with no children, or a level-3 inside another structure) → follow-up files as a new level-2 top-level entry in the same =* Open Work= section. Don't nest under the originating task. + +Both cases: include a "Triggered by: <date> <task or commit>" line so a future reader sees what surfaced it. + +Update =.claude/commands/start-work.md= Phase 4 step 5's "Disposition for each candidate" section to spell this out. Update any cross-references in =commits.md= or other files that touch the discipline. + +Triggered by: 2026-05-15 fold-epic session — Craig flagged the gap mid-flight after I'd surfaced a follow-up but hadn't filed it. +** DONE [#A] Consolidate =.ai/= template infrastructure (fold + audit + install-ai + ratio) :feature: +CLOSED: [2026-05-15 Fri] + +End-state: one repo (=rulesets=) is the single source of truth for =.ai/= template content. =make audit= verifies and applies drift across every =.ai/=-using project on the machine. =make install-ai= bootstraps new projects. Same setup propagated to ratio so both machines run the same way. + +Today (2026-05-15) the canonical-source rule got violated again: rulesets commit =372fb76= added a wrap-up subsection to =rulesets= without going through =claude-templates= first, and the next session's startup rsync was about to silently undo it. Two-repo coordination is the root cause; fold solves it. + +Build order: fold first (others depend on the new canonical path), then audit + install-ai in parallel, then test, then propagate to ratio. + +*** DONE [#A] Fold =claude-templates= into rulesets +CLOSED: [2026-05-15 Fri] + +Two repos, one source of truth. =~/projects/claude-templates/= is the canonical =.ai/= template that gets rsync'd into every project at session start. Keeping it standalone means a second =git pull= in startup Phase A.0, a second remote to push to at wrap-up, and a split history any time a change touches both. Folding it into =rulesets/claude-templates/= gives one repo to clone on a fresh machine and one place to edit templates. + +**** Open design choices + +- *History.* =git subtree add --prefix=claude-templates ~/projects/claude-templates main= preserves the 84-commit history under the new prefix. Plain content copy (=cp -a= + =git add=) is simpler but loses history. Either is fine since the standalone repo stays archived on =cjennings.net=. +- *Layout.* =rulesets/claude-templates/= mirrors the old repo name and sits next to =claude-rules/= cleanly. Alternative: absorb =.ai/= directly under a different name (=rulesets/.ai-template/= or similar). First option is clearer. +- *bin/ai.* The standalone Makefile symlinks =$HOME/.local/bin/ai → bin/ai=. After the move, fold that into rulesets' Makefile as another install target. + +**** Mechanical steps + +1. Subtree-merge or copy =~/projects/claude-templates/= into =rulesets/claude-templates/=. +2. Update 3 references in rulesets: + - =.ai/protocols.org= line 163 — pointer in the "Let's run/do the X workflow" section. + - =.ai/workflows/cross-agent-comms.org= line 8 — promotion-target path. + - =.ai/workflows/startup.org= lines 22, 96-98 — Phase A.0 pull + Phase A rsync sources. +3. Update Phase A.0 of =startup.org= to pull rulesets instead of claude-templates. Inside rulesets sessions, the existing project-repo pull already covers it. Outside rulesets (every other project's session), Phase A.0 needs an explicit =git pull= on =~/code/rulesets/= before the rsync — otherwise the templates will be stale. +4. Replace =~/projects/claude-templates/= with a symlink to =~/code/rulesets/claude-templates/= for transition continuity. +5. After every active project has had one session start (and rsync'd the new =startup.org=), drop the symlink and archive =cjennings.net:git/claude-templates.git=. + +**** Bootstrap gap + +Every project on the machine has a =.ai/workflows/startup.org= that rsyncs from =~/projects/claude-templates/=. Until each project's startup.org gets refreshed (which happens via the rsync itself), the old path needs to keep resolving. The symlink at step 4 is the bridge: old paths resolve into the new location, the rsync delivers the updated startup.org, next session uses the new path directly. + +*** DONE [#A] Add =make audit= — drift detector across all =.ai/=-using projects +CLOSED: [2026-05-15 Fri] + +Companion to =make doctor= (single-machine scope, checks =~/.claude/=). =audit= is cross-project scope: walks every directory on the machine that has a =.ai/=, diffs the synced template files against the canonical source, and reports drift. =--apply= flag rsyncs the drift into the project's working tree (no auto-commit). Catches stale projects without forcing a session start in each one. + +**** Open design choices + +- *Scope.* Template-sync drift is the useful flavor: for each project, diff =.ai/protocols.org=, =.ai/workflows/=, =.ai/scripts/= against the canonical source. +- *Source path.* Post-fold: =~/code/rulesets/claude-templates/.ai/=. Build =audit= against the new path from day one. +- *Project discovery.* Walk =~/code/=, =~/projects/=, =~/.emacs.d/= up to depth 3 for any directory containing =.ai/=. Skip the canonical source itself. +- *Default mode is report-only.* =--apply= triggers rsync; =--force= overrides the dirty-skip safety. + +**** Per-project flow (designed 2026-05-15) + +For each discovered project, in order: + +1. Verify =.ai/= exists (path probe). If missing → =FAIL=, skip, continue loop. +2. Detect git tracking via =git check-ignore .ai/= → =tracked= or =gitignored=. +3. Verify no uncommitted =.ai/= changes (=git status --porcelain .ai/=). Dirty → =WARN=, skip rsync unless =--force=. +4. Verify content matches canonical via three =rsync -a --dry-run --itemize-changes= calls (=protocols.org=, =workflows/=, =scripts/=). Zero items = clean. +5. Action (=--apply= only, drift detected): three =rsync -a [--delete]= calls. +6. Verify rsync converged (re-run the dry-runs; zero now). +7. Verify working-tree state after rsync (tracked projects). Report deltas. Do not auto-commit. +8. Verify no unpushed =.ai/= commits (=git log @{u}..HEAD -- .ai/=). Informational only. + +**** Output format (mirrors =doctor=) + +#+begin_example +Claude-templates source: + ok rulesets/claude-templates is current (origin/main) + +Per-project .ai/ drift: + ok ~/projects/work + applied ~/projects/homelab 3 files changed + skipped ~/code/winvm uncommitted .ai/ (use --force) + ok ~/projects/clipper + +Summary: 18 ok, 3 applied, 1 skipped, 0 failed +#+end_example + +Exit code: =0= if all clean, no skips, no failures. =1= otherwise. + +**** Why not extend =make doctor= instead + +=doctor= has a clean meaning today: "is this machine's =~/.claude/= consistent with rulesets?" Mixing in cross-project =.ai/= drift muddies the exit code. Keep them separate. =audit= can optionally invoke =doctor= as its last check since both ask "did the symlinks keep up with the source?". A future =make all-checks= can wrap both. + +*** DONE [#A] Add =make install-ai PROJECT=<path>= — bootstrap =.ai/= in a fresh project +CLOSED: [2026-05-15 Fri] + +Separate target from =audit= because operating on projects that lack =.ai/= is a distinct action. The absence might be intentional, so =audit= skips them. Bootstrap is explicit opt-in. + +**** Flow + +1. Refuse if =.ai/= already exists in =PROJECT=. Message: "already installed; use =make audit --apply= to update." +2. Verify =PROJECT= is a git checkout (warn if not — works without git, loses some lifecycle benefits). +3. Create =PROJECT/.ai/= directory. +4. Rsync canonical content: =protocols.org=, =workflows/=, =scripts/= (same three rsyncs as =audit=). +5. Seed =PROJECT/.ai/notes.org= from a canonical template with project-name placeholder. +6. Create empty =PROJECT/.ai/sessions/= (with =.gitkeep= for tracked projects). +7. Track or gitignore =.ai/=? Default: ask. Flag: =--track= / =--gitignore=. +8. Print next-steps banner: =make install-lang LANG=<lang> PROJECT=<path>=; open Claude Code in the project. + +**** Symmetry with existing install targets + +#+begin_example +make install-lang LANG=python PROJECT=/path # language bundle (existing) +make install-ai PROJECT=/path # .ai/ template (new) +make install-lang # no args → fzf-pick +make install-ai # no args → fzf-pick from + # ~/projects/* + ~/code/* dirs + # without an existing .ai/ +#+end_example + +*** DONE [#A] Test plan for audit + install-ai before propagating to ratio +CLOSED: [2026-05-15 Fri] + +Test against the current state of this machine before pushing changes to ratio. + +**** =make audit= tests + +1. Dry-run report only (no =--apply=). Should show: claude-templates current; per-project drift; correct =ok=/=drift= classifications; summary line and exit code match. +2. After the fold lands, every project should be reported as drift (their =startup.org= still points at the old path). Run =--apply= → rsync converges. Re-run audit → all =ok=. +3. Manually edit one =.ai/workflows/foo.org= in a tracked project. Re-run audit → should report =skipped: uncommitted .ai/=. Run =--apply --force= → rsync clobbers the edit. Verify the edit is gone. +4. Manually delete one =.ai/= dir. Re-run audit → =FAIL: .ai/ missing=. Loop continues. +5. Idempotency: =--apply= twice in a row converges to all =ok= on the second pass. + +**** =make install-ai= tests + +1. Create =/tmp/test-fresh-project= as a git repo. Run =make install-ai PROJECT=/tmp/test-fresh-project=. Verify =.ai/= structure matches canonical, =notes.org= has placeholder, =sessions/= exists. +2. Run =make install-ai PROJECT=/tmp/test-fresh-project= again → should refuse (=.ai/= already exists). +3. Open Claude Code in the new project. Startup workflow runs cleanly (Phase A.0 + Phase A rsync should be a no-op since the install just ran). +4. fzf form: =make install-ai= with no args. Lists candidate dirs (=~/projects/*=, =~/code/*= without =.ai/=). + +**** Pass criteria + +- =audit= behavior matches the per-project flow spec for every classification path. +- =install-ai= produces a project indistinguishable from one that's been running sessions for a while. +- =make doctor= still passes 36/0/0 after all the work. +- =make test= (pytest + ERT) passes. + +*** DONE [#A] Migrate projects on ratio (second machine) +CLOSED: [2026-05-15 Fri] + +After local fold + audit + install-ai are working, propagate to ratio. + +**** Steps + +1. On ratio: =git -C ~/code/rulesets pull= — picks up the folded =claude-templates/= subdir and updated =Makefile= targets. +2. On ratio: archive or =mv= the standalone =~/projects/claude-templates/= aside, replace with symlink to =~/code/rulesets/claude-templates/= (same bridge mechanic as local). +3. On ratio: =make audit= → see drift across ratio's projects. +4. On ratio: =make audit --apply= → rsync into each tracked/gitignored project. Surface projects with uncommitted =.ai/= drift for manual handling. +5. On ratio: =make doctor= → catch any =~/.claude/= install drift (likely some, since ratio hasn't seen recent rulesets updates). +6. Verify by opening Claude Code in a few ratio projects. Startup should be a no-op or near-zero rsync. + +**** Known unknowns + +- Ratio may have its own project list overlapping with this machine's but not identical. =audit= discovers projects via the walk, so this is automatic. +- Ratio might have uncommitted =.ai/= work in some projects that this machine doesn't. =audit= surfaces them; handle case-by-case. +- If anything goes wrong, ratio's archived =~/projects/claude-templates/= is the safety net — restore the symlink target and re-run audit. + +**** Adjacent: cross-machine memory sync + +The =[#A] DOING= memory-sync investigation (todo.org:10) is adjacent. Both involve "make my Claude setup portable across machines." Coordinate so the memory-sync stow approach (if approved) doesn't conflict with this fold's symlink mechanics. +** DONE [#B] Document startup pull-ordering rule in protocols.org +CLOSED: [2026-05-15 Fri] + +Phase A.0 of =startup.org= now pulls rulesets ff-only before the project repo +(shipped 2026-05-15 as part of the claude-templates fold — after the subtree +merge, there's no separate claude-templates pull, just rulesets-then-project). +The protocols.org paragraph stating the ordering and "resolve any issues +before proceeding" rule shipped 2026-05-15 in the =** Startup Pull Ordering= +subsection under =IMPORTANT - MUST DO=. +** DONE [#A] Build =/lint-org= skill + wrap-up integration +CLOSED: [2026-05-14 Thu] + +Spec: [[file:.ai/specs/lint-org-skill-spec.md]] + +A two-mode skill (=interactive=, =mechanical-only=) that runs =org-lint=, +auto-fixes safe categories (item-number, missing-language-in-src-block, +misplaced-planning-info, markdown-bold → single-asterisk), and walks judgment +items (broken local-file links, invalid fuzzy links, verbatim-asterisk false +positives, suspicious-language blocks) inline. + +Wrap-up integration: =wrap-it-up.org= invokes +=/lint-org todo.org --mode=mechanical-only= after the existing +=todo-cleanup.el --archive-done= pass. Judgment items defer to a +carry-forward file that the next morning's daily-prep merges in, so +wrap-up never blocks on a judgment call. + +Baseline that motivated this: the 2026-05-14 manual pass took =todo.org= +from 55 → 1 lint warnings across two commits (=0d10458= signal, +=9ad5b30= cosmetic). A nightly mechanical sweep keeps the count near +zero forever — each day's drift is small. +** DONE [#C] Test harness for =make audit= + =make install-ai= edge cases :test: +CLOSED: [2026-05-15 Fri] + +Three edge cases from the fold-epic test plan were not exercised because they're destructive on real projects: + +- =audit --force= clobbers uncommitted =.ai/= work — needs a project with intentionally dirty =.ai/= to verify the override path. +- =audit= reports =FAIL= when =.ai/= is missing — needs a project where the directory was deleted to verify the loop continues past the failure. +- =install-ai= fzf-pick form (no =PROJECT= arg) — needs interactive testing. + +Build a self-contained test harness under =.ai/scripts/tests/= that spins up =/tmp/audit-test-projects/= with a known matrix of project states (clean, dirty, missing =.ai/=, pristine, etc.), runs the audit + install-ai targets against it, and asserts expected outputs. The harness should clean up after itself. + +Pattern reference: bats or shell-based assertions (similar to the elisp ERT suites for =todo-cleanup= and =lint-org=, but for shell scripts). + +Triggered by: 2026-05-15 fold-epic, child 4 test plan; commits =94782ee= (audit) + =d364cf2= (install-ai). +** DONE [#A] wrap it up mentions github, which isn't the remote for many projects. :chore: +CLOSED: [2026-05-16 Sat] +For many of them, git.cjennings.net mirrors to github.com, and github.com isn't the remote. +For many others, git.cjennings.net is the remote with no mirror. +Remove or replace the reference to github.com +** DONE [#B] Phase A startup blind to =claude-templates/inbox/= post-fold :bug:fold: +CLOSED: [2026-05-19 Tue] + +Resolved on inspection: the bug is moot in current state. =inbox-send.py='s discovery scans =~/code/*= and =~/projects/*= single-level only, so =claude-templates/= (two levels under =~/code/=) is never a routable target; the 2026-05-15 incident was a one-time manual workaround because =rulesets/inbox/= didn't exist yet, and that root inbox was added in =470085f=. =claude-templates/inbox/= was removed 2026-05-15 and is no longer on disk. + +Phase A's inbox check at =startup.org:107= runs =\ls -la inbox/= against the project root. Post-fold, the canonical's inbox sits inside the subtree at =claude-templates/inbox/= and never gets scanned. A 2026-05-15 cross-project handoff from a dotemacs session dropped a record there; the next rulesets session (this one) missed it at startup entirely. Picked up only when the working-tree drift surfaced during the publish flow. + +Fix: extend Phase A's discovery to also scan =claude-templates/inbox/= when the canonical lives in-repo (i.e., when =claude-templates/.ai/= exists alongside =./.ai/=). The Phase B/C inbox-processing flow already handles per-file routing once a file is surfaced; the gap is only in discovery. + +Adjacent question worth answering at the same time: should cross-project handoffs file into =./inbox/= at the project root (matching what Phase A already scans), or stay in =claude-templates/inbox/= and rely on the discovery fix? The =inbox-send= script's target-project logic is the place to settle that. + +Triggered by: 2026-05-15 evening session, surfaced when committing the test-harness work. +** DONE [#A] Implement task-review daily-habit per spec +CLOSED: [2026-05-20 Wed] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-20 +:END: +Spec: [[file:docs/design/task-review.org]] + +Retires =wrap-it-up.org='s date-coverage scan and replaces it with a daily list-hygiene review (N=7 oldest-unreviewed top-level =[#A]= / =[#B]= / =[#C]= tasks per session, ~12-day rotation). Built as a pure Claude workflow — Shape B, no elisp; see the spec's Revision section for why the elisp approach was dropped. + +Status: +1. [X] =task-review-staleness.sh= + bats (count + =--list= modes). +2. [X] =wrap-it-up.org= health check (threshold 30). +3. [-] =task-review.el= — dropped (Shape B is a pure workflow, not an Emacs mode). +4. [X] New =task-review.org= workflow + INDEX entry (the existing listing workflow was renamed to =open-tasks.org= to free the name). +5. [X] Startup nudge in template =startup.org= (threshold 7), not the project-only startup-extras layer. +6. [X] Smoke test against live =todo.org= — first cycle run 2026-05-20 (7 tasks reviewed: 3 re-grades, 1 cancellation, 1 bump-and-tag). + +Triggered by: 2026-05-16 brainstorm on retiring the date-coverage scan. +** CANCELLED [#B] Build =ov-1= skill for DoDAF OV-1 (High-Level Operational Concept Graphic) +CLOSED: [2026-05-20 Wed] + +Cancelled during the 2026-05-20 task review. + +Triggered by SOFWeek (May 2026, Tampa) — DeepSat attending; DoD attendees +may ask for architecture diagrams. OV-1 is the universal informal +currency in DoD briefings ("show me the architecture" → OV-1 by default). + +Priority upgrades to =[#A]= if Craig confirms scenario 2 below (personal +load-bearing need at the event); stays =[#B]= or drops to =[#C]= if +scenario 1 (team already covers it, future asset only). + +*** Prior art (searched 2026-04-19) + +No existing Claude Code skill exists for DoDAF / OV-1 / SV-1 / SysML. + +- =anthropics/skills= — 17 skills, zero DoDAF/SysML/defense coverage. +- =awesome-claude-code= list — zero hits for DoDAF/OV-1/SysML/UAF. +- =mfsgr/sysml2dodaf= — empty repo (0 stars, no code). Vapor. +- =HowardKao-1130/mini-NEXEN= — broad SE methodology skill that + name-drops DoDAF as a trigger keyword; no artifact generation. 0 stars. +- =gaphor/gaphor= (Apache-2.0, 2.2k stars) — mature UML/SysML GUI + modeler. Not a skill; not a pipeline. Useful reference only. + +Nearest prior art to lean on when building: +- DoDAF 2.02 Viewpoints & Models reference (dodcio.defense.gov) — + canonical OV-1 exemplars. Embed 3-5 layouts as skill =references/=. +- Pattern from existing =c4-diagram= skill — same shape (prose → diagram + spec), swap the viewpoint vocabulary to DoDAF. +- PlantUML for SV-1 (when that skill comes later); Mermaid or draw.io + XML for OV-1 lightweight visuals. + +*** Build scope (when triggered) + +*In scope:* +- Input: prose description of a system + its operational context. +- Output: structured OV-1 *spec* — performers, external actors (other + systems, forces, adversaries), relationships (data/control flows), + narrative captions, classification marking, legend requirements. +- DoDAF 2.02 completeness checklist as a quality gate — verify the + produced spec contains every element a correct OV-1 requires. +- Optional lightweight visual: draw.io XML or Mermaid approximation for + quick review; NOT a finished rendering. + +*Out of scope:* +- Icon libraries, pictorial assets, finished PowerPoint export. OV-1 + final art belongs to a designer or Craig in Visio/PowerPoint; the + skill's job is the spec and the check, not the slide. +- SV-1, SV-2, UAF, IDEF1X, other viewpoints. Build only when a + concrete need triggers each. + +Estimate: 4-6 hours. + +*** Craig's investigation before kickoff + +1. Does DeepSat's systems-engineering or marketing team already have an + OV-1 (or the equivalent briefing artifact) for SOFWeek? +2. If yes (scenario 1) — skill is a future asset, not event-load-bearing. + Ship after SOFWeek. Priority drops to =[#C]=. +3. If no, or if the scenario is "Craig may need to produce/iterate an + OV-1 on the fly during the event" (scenario 2) — skill is load-bearing + for the event. Priority upgrades to =[#A]=; build before SOFWeek. +4. Confirm the classification level the skill needs to handle + (unclassified-only? or FOUO markings? affects the classification + block in the spec). +5. Confirm the target rendering format DeepSat uses for OV-1 + deliverables (PowerPoint slide? Cameo? Visio? affects whether the + skill emits draw.io XML vs Mermaid vs pure structured spec). + +*** Related + +See also the DoD-specific notations section under the later TODO +(=c4-*= rename revisit) — OV-1 is flagged there as the highest-value +starting point across the DoD notation landscape (SysML, DoDAF/UAF, +IDEF1X). This entry is the execution plan for that starting point. +** DONE [#A] Split team-specific publishing rules out of commits.md :commits: +CLOSED: [2026-05-22 Fri] +Shipped 3cb467e. Moved the DeepSat publishing steps (Linear ticket-state, the Slack notification protocol + channel ID, the GHE host, the team merge norm, the Linear ticket-body structure) out of the global =claude-rules/commits.md= into =teams/deepsat/claude/rules/publishing.md=. The global file keeps the universal skeleton and uses seams ("run the project's publishing overlay here if present") like startup-extras. Added =install-team= (targeted per-project copy, keyed on PROJECT, never globally symlinked) and generalized =sync-language-bundle.sh= to keep team overlays fresh at startup (3 new bats; make test green). + +Remaining deploy step (cross-project, surfaced to Craig): install the overlay into the DeepSat work project — =make install-team TEAM=deepsat PROJECT=<deepsat-path>= — so it actually loads there. +** DONE [#A] Define a /voice-unavailable fallback in the commits.md publish flow :commits: +CLOSED: [2026-05-22 Fri] +Added an "If =/voice= is unavailable" paragraph to the Single-skill gate in =commits.md=: walk the same patterns inline (the flow already names which matter), state the skill was unavailable and the pass was applied by hand ("/voice unavailable — patterns walked inline"), and flag the missing skill for install. The gate is the pattern walk, not the tooling. The original "=humanizer= unavailable" framing was moot (humanizer → /voice). +** DONE [#A] wrap-it-up Step 3.5 assumes GitHub-family remote :chore:quick: +CLOSED: [2026-05-22 Fri] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-20 +:END: +Documented the assumption inline at =wrap-it-up.org= Step 3.5 (chose the lightweight path over a provider-agnostic rewrite): the =gh= lookup expects a GitHub-family host, holds today via DeepSat on GHE, flagged for update if a future Linear project lands on GitLab/Gitea/Bitbucket. +Triggered by: 2026-05-16 wrap-it-up github.com cleanup (audit of the same file). + +Step 3.5 (Linear ticket-state hygiene) at =wrap-it-up.org:207= says "the project's GitHub remote — use =gh pr list ...=". Currently fine in practice: the step is Linear-gated, and the only Linear-using project is DeepSat (on =deepsat.ghe.com=, a GitHub-family host where =gh= works). Would break if a future Linear-using project lived on a non-GitHub host (gitlab, gitea, bitbucket). Either drop the GitHub-family assumption (provider-agnostic lookup, harder) or document the assumption explicitly so future projects know the step needs an update if they don't fit. +** DONE [#C] Review pass: tighten skills and rulesets after 2026-05-04 audit +CLOSED: [2026-05-22 Fri] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-20 +:END: +All 55 grouped-index items dispositioned (2026-05-22): ~49 edited across skills, commands, rule files, hooks, and the two playwright skills; several came out moot post-audit (humanizer→voice, skills→commands, typescript ruleset added); the two commits.md items shipped as the team-overlay split + /voice fallback. Freshness-checked each item against current reality before editing. + +Source notes used in this pass: +- C4 official docs: C4 is notation-independent; System Context and Container + diagrams are enough for most teams; every diagram needs title, key/legend, + explicit element types, and audience-appropriate abstraction. + [[https://c4model.com/diagrams][C4 diagrams]], + [[https://c4model.com/diagrams/notation][C4 notation]], + [[https://c4model.com/abstractions/component][C4 component]] +- arc42 docs: quality requirements need measurable scenarios; section 10 + should reference top quality goals and capture lesser quality requirements + with specific measures. [[https://docs.arc42.org/section-10/][arc42 section 10]], + [[https://quality.arc42.org/articles/specify-quality-requirements][specifying quality requirements]] +- ADR references: ADRs capture one justified architecturally significant + decision and its rationale; Nygard's original guidance emphasizes short, + numbered, repository-stored records and superseding rather than rewriting old + decisions. [[https://adr.github.io/][adr.github.io]], + [[https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions][Nygard ADR article]] +- Playwright docs: prefer user-visible locators and web assertions; locators + auto-wait and retry; =networkidle= is discouraged for testing readiness. + [[https://playwright.dev/docs/best-practices][Playwright best practices]], + [[https://playwright.dev/docs/locators][Playwright locators]], + [[https://playwright.dev/docs/next/api/class-page][Playwright page API]] +- OWASP references: Top 10 2021 includes Broken Access Control, + Cryptographic Failures, Injection, Insecure Design, Security + Misconfiguration, Vulnerable and Outdated Components, Identification and + Authentication Failures, Software and Data Integrity Failures, Security + Logging and Monitoring Failures, and SSRF; WSTG adds a broader testing map + across configuration, identity, authn/z, sessions, input validation, error + handling, cryptography, business logic, client-side, and API testing. + [[https://owasp.org/Top10/2021/][OWASP Top 10 2021]], + [[https://owasp.org/www-project-web-security-testing-guide/latest/4-Web_Application_Security_Testing/][OWASP WSTG]] +- V2MOM references: Salesforce calls the last M "Measures" and emphasizes a + simple alignment document with prioritized Methods, explicit Obstacles, and + measurable outcomes. [[https://trailhead.salesforce.com/content/learn/modules/selfmotivation/get-focused-with-your-personal-v2mom][Salesforce Trailhead personal V2MOM]], + [[https://www.salesforce.com/blog/?p=12][Salesforce V2MOM alignment]] +- Prompt research: the cited Meincke paper is titled "Call Me A Jerk: + Persuading AI to Comply with Objectionable Requests"; its scope is + persuasion increasing compliance with objectionable requests, not a general + proof that persuasion framing improves prompt quality. + [[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5357179][SSRN paper]] +- Combinatorial testing references: NIST supports t-way combinatorial testing + and notes pairwise is one covering strength, with higher-strength arrays + useful for failures requiring more interacting factors. + [[https://www.nist.gov/publications/practical-combinatorial-testing-beyond-pairwise][NIST beyond pairwise]], + [[https://www.nist.gov/publications/combinatorial-software-testing][NIST combinatorial testing]] + +*** Grouped index (for batching by area) + +Each item below is a one-line summary of a sub-TODO further down. Tick the box when the matching sub-TODO is moved to =DONE=. Items are grouped by area so they can be batched (e.g., "do all Playwright items in one session"). + +**** Browser testing +- [X] [#A] =playwright-js=: locator/assertion-first guidance (replace raw CSS, =networkidle=) +- [X] [#B] =playwright-js= + =playwright-py=: reconcile headless/visible defaults +- [X] [#B] =playwright-js= + =playwright-py=: remove emoji console markers from examples + +**** Frontend / UI +- [X] [#B] =frontend-design=: WCAG 2.2 alignment, accessibility non-optional +- [X] [#B] =frontend-design=: harmonize aesthetic guidance with anti-pattern rules + +**** Security +- [X] [#A] =security-check=: OWASP 2021 + WSTG coverage +- [X] [#B] =security-check=: tooling and offline/network caveats + +**** Combinatorial testing +- [X] [#B] =pairwise-tests=: t-way escalation guidance beyond pairwise +- [X] [#B] =pairwise-tests=: clarify negative value syntax + generator availability + +**** V2MOM +- [X] [#A] =create-v2mom=: rename Metrics → Measures (Salesforce alignment) +- [X] [#B] =create-v2mom=: prevent task migration from turning V2MOM into a backlog +- [X] [#B] =create-v2mom=: mitigation/owner fields for Obstacles + +**** Prompt engineering +- [X] [#A] =prompt-engineering=: correct/narrow Meincke citation +- [X] [#B] =prompt-engineering=: eval-harness requirement for production prompts + +**** Codify +- [X] [#B] =codify=: stale-entry review + privacy checks before writing project =CLAUDE.md= + +**** Code review +- [X] [#A] =review-code=: resolve local-verification vs CI boundary +- [X] [#B] =review-code=: =CLAUDE.md= citation scope for public artifacts +- [X] [#B] =review-code=: relax three-strengths rule for tiny/failing diffs + +**** PR / review responses +- [X] [#A] =respond-to-review=: remove review-process language from commit messages +- [X] [#B] =respond-to-review=: use unresolved threads + resolution state +- [X] [#B] =respond-to-cj-comments=: drop personal absolute paths from public-writing (moot — already clean) +- [X] [#B] =respond-to-cj-comments=: fallback when =humanizer= or =emacsclient= unavailable (moot — superseded by /voice + VERIFY pattern) + +**** Branch workflow +- [X] [#A] =finish-branch=: fix base-branch detection +- [X] [#B] =finish-branch=: worktree-aware pull/merge safety +- [X] [#B] =start-work=: tool-availability + ceremony-scaling rules +- [X] [#B] =start-work=: claim-before-justify rollback risk + +**** Tests / TDD +- [X] [#B] =add-tests=: fix missing =typescript-testing.md= reference or add ruleset (moot — ruleset now exists) +- [X] [#B] =add-tests=: explicit exceptions to "all three categories per function" + +**** Debugging / RCA +- [X] [#B] =debug=: capture environment + recent-change context before hypotheses +- [X] [#B] =root-cause-trace=: constrain defense-in-depth to trust boundaries +- [X] [#B] =five-whys=: require evidence + counterfactual validation per why + +**** Brainstorming +- [X] [#B] =brainstorm=: timebox + research/source rules for high-stakes designs + +**** Architecture +- [X] [#B] =arch-decide=: timeless examples, drop unverifiable claims +- [X] [#B] =arch-decide=: standardize statuses + immutability language +- [X] [#B] =arch-design=: threat modeling + privacy/compliance as first-class inputs +- [X] [#B] =arch-design=: separate paradigms from tactical patterns +- [X] [#B] =arch-document=: arc42/Q42 quality scenarios +- [X] [#B] =arch-document=: staleness + ownership metadata for generated docs +- [X] [#B] =arch-evaluate=: confidence levels for framework-agnostic findings +- [X] [#B] =arch-evaluate=: report skipped tool checks explicitly + +**** C4 modeling +- [X] [#A] =c4-analyze= + =c4-diagram=: notation/output fallback (not draw.io-only) +- [X] [#B] =c4-analyze= + =c4-diagram=: clarify abstraction boundaries + +**** Global rules +- [X] [#B] =commits.md=: split DeepSat/Linear/Slack-specific from global rules → promoted to a top-level task (deferred for Craig) +- [X] [#A] =commits.md= + publish flows: =humanizer=-unavailable fallback → promoted to a top-level task (deferred; humanizer premise moot) +- [X] [#B] =verification.md=: explicit "unable to verify" reporting standard +- [X] [#B] =testing.md=: property-based + mutation testing as escalation paths +- [X] [#B] =testing.md=: soften absolute TDD with explicit spike protocol +- [X] [#B] =subagents.md=: capability/availability + cost checks + +**** Languages +- [X] [#A] =python-testing.md=: revisit in-memory SQLite guidance +- [X] [#B] =python-testing.md=: separate "never mock ORM" from unit-test boundaries +- [X] [#B] =elisp.md=: drop tool-specific advice +- [X] [#B] =elisp-testing.md=: batch-mode + native-comp caveats + +**** Hooks +- [X] [#A] =hooks/README.md=: include =destructive-bash-confirm.py= in install/settings snippets +- [X] [#A] =hooks/git-commit-confirm.py= + =gh-pr-create-confirm.py=: inspect message/body files referenced by =-F= / =--body-file= +- [X] [#B] =hooks/destructive-bash-confirm.py=: shell-aware command parsing (not regex) + +*** 2026-05-22 Fri @ 15:47:10 -0500 Made playwright guidance locator/assertion-first, dropped networkidle-as-readiness + +Rewrote the readiness guidance in both =playwright-js/SKILL.md= and =playwright-py/SKILL.md=: reconnaissance now waits for a visible app landmark via a web assertion or locator (=expect(...).toBeVisible()= / =get_by_role(...).wait_for()=), not =networkidle= (which Playwright discourages). Updated the login/form examples to =getByLabel=/=getByRole= + web assertions, the API_REFERENCE.md waiting section, and =lib/helpers.js= defaults (=waitForPageReady= now defaults to =load= and prefers a caller-supplied landmark; =authenticate= races the success indicator over a =load= navigation). node --check passes. + +*** 2026-05-22 Fri @ 14:23:02 -0500 Added headed/headless decision tables to both playwright skills + +Added matching purpose-based decision tables to =playwright-js/SKILL.md= (was "always visible") and =playwright-py/SKILL.md= Best Practices (was "always headless"). Each names its own default and points at the other skill, so the difference is deliberate, not a habit-flip: headed for interactive debugging, headless for CI/pytest. Also softened the absolutist "Always launch... headless" comment in the py example. + +*** 2026-05-22 Fri @ 15:47:10 -0500 Removed emoji console markers from the playwright skills + +Replaced every emoji status marker with a plain ASCII prefix across =playwright-js/= (run.js, lib/helpers.js, SKILL.md) and =playwright-py/= (SKILL.md, examples/*.py): 📦/⚡/📄/📥/🎭/🚀/📋/✅/❌/🔍/📸/✓/✗ → =[setup]=/=[run]=/=[ok]=/=[error]=/=[fail]= etc. Post-change emoji grep is clean (excluding node_modules); node --check and py_compile pass. + +*** 2026-05-22 Fri @ 14:35:16 -0500 Made accessibility a non-optional WCAG 2.2 gate in frontend-design + +Added an "Accessibility Gate (required before handoff)" section to =frontend-design/SKILL.md= covering keyboard operation, focus visibility, focus-not-obscured (2.2), target size (2.2), contrast, reduced motion, labels, and semantic structure — a baseline for all frontend work, not just interactive components. Rewrote the Build/Review phases to build accessibly as you go and clear the gate before handoff, and bumped =references/accessibility.md= from WCAG 2.1 to 2.2 with backing detail for the new criteria. + +*** 2026-05-22 Fri @ 14:35:16 -0500 Added a "creative but bounded" section to frontend-design + +Added a subsection under Frontend Aesthetics framing the bold/maximalist directions as tools, not obligations: domain fit, readability first, responsive stability, and no decorative effect that degrades the workflow. Reconciles rather than contradicts the maximalist encouragement (maximalism stays on the table as deliberate usable density), and ties the readability bullet to the new accessibility gate. + +*** 2026-05-22 Fri @ 14:35:16 -0500 Updated security-check to OWASP Top 10 2021 + WSTG mapping + +Replaced the older six-category list in =.claude/commands/security-check.md= with the full Top 10 2021 set, each finding mapped to a 2021 category or WSTG area. Added the four missing categories (Insecure Design, Software and Data Integrity Failures, Security Logging and Monitoring Failures, SSRF) plus explicit checks for object/function-level authorization, SSRF on URL-fetch paths, update/plugin/dependency integrity, and logging/monitoring gaps. + +*** 2026-05-22 Fri @ 14:35:16 -0500 Added scanner tooling + network caveats to security-check + +Added an optional configured-scanners step (=gitleaks=/=trufflehog= secrets, =semgrep= source patterns, OSV scanner, lockfile-diff review) that supplements the manual scans, plus a network caveat: dependency audits that can't run (offline, tool absent, DB unreachable) must report "not run" naming the tool and reason, never read as a pass. Carried that into the no-issues summary. + +*** 2026-05-22 Fri @ 14:35:16 -0500 Added t-way escalation guidance to pairwise-tests + +Added an "Escalating Beyond Pairwise (t-way)" subsection: start with pairwise across the whole space, then escalate specific high-risk clusters to 3-way+ when history, safety, security, or domain coupling says a fault needs more than two interacting factors. Lists escalation triggers and shows the sub-model order syntax (={ A, B, C } @ 3=) vs a blanket =/o:3= bump, stressing targeted not uniform escalation. Cites NIST combinatorial-testing work. + +*** 2026-05-22 Fri @ 14:35:16 -0500 Clarified PICT ~ syntax + honest generator-availability path in pairwise-tests + +Added a "~ prefix" explanation (PICT marker tagging a value as negative/invalid, not an arithmetic operator; PICT pairs negatives with valid values once and strips the marker before the SUT) and a stop-at-the-model rule: if neither the =pict= binary nor =pypict= is present, produce the model and stop rather than hand-writing a table and passing it off as PICT output. + +*** 2026-05-22 Fri @ 14:43:17 -0500 Renamed Metrics → Measures throughout create-v2mom + +Full rename across =.claude/commands/create-v2mom.md= (acronym expansions, Phase 7 heading, the "Measures must be measurable" principle, exit criteria, review questions, red flags, examples) to match Salesforce's official term. Kept the "vanity metrics" idiom intact — it's the anti-pattern term, not a section reference. + +*** 2026-05-22 Fri @ 14:43:17 -0500 Split strategy from execution in create-v2mom task migration + +Rewrote Phase 8 (and tightened Phase 5.5): tasks stay in the backlog grouped by method, and each method gains a one-line link to where its tasks live, instead of transplanting the task tree into the V2MOM. Strategy (V2MOM) and execution (backlog) are now explicitly separate sources of truth, keeping the V2MOM concise. + +*** 2026-05-22 Fri @ 14:43:17 -0500 Made create-v2mom obstacles operational (mitigation/owner/cadence) + +Phase 6 now captures, per obstacle: name, manifestation, stakes, mitigation, owner, and review cadence — with a worked example per domain (health/finance/software), a "good obstacle" characteristic, a Phase 9 review question, and a red flag for candid-but-not-operational obstacles. An obstacle without a countermove is now flagged as an observation, not a plan. + +*** 2026-05-22 Fri @ 14:43:17 -0500 Corrected and narrowed the Meincke citation in prompt-engineering + +Fixed the title to "Call Me A Jerk: Persuading AI to Comply with Objectionable Requests" (SSRN abstract_id=5357179) in all three spots (frontmatter, Seven Principles intro, References). Reframed the ~33%→72% result as what it is — a prompt-safety caution that persuasion raises compliance with objectionable requests — explicitly not evidence that persuasion framing improves engineering prompt quality. Kept the seven principles as a tone vocabulary. + +*** 2026-05-22 Fri @ 14:43:17 -0500 Added an eval-harness requirement to prompt-engineering critique mode + +Added critique step 7 + a checklist line: for fragile or reusable/production prompts, write 3-5 adversarial/edge inputs, run both the old and new prompt against each, and record the behavioral delta. A throwaway prompt can ship on the rewrite alone; a discipline/reused/production one can't. Without examples, "the rewrite is better" is an assertion, not a result. + +*** 2026-05-22 Fri @ 14:43:17 -0500 Added mandatory stale-entry + privacy pre-write checks to codify + +Added a "Mandatory pre-write checks" block at the top of Phase 3 (Write) in =.claude/commands/codify.md=: a stale-entry scan (update/remove no-longer-true entries in place, don't append contradictions around them) and a privacy/leak check carrying both questions verbatim — "safe if the project were public?" and "belongs in private memory instead?" — routing private content to auto-memory. Gates, not background guidance. + +*** 2026-05-22 Fri @ 14:06:41 -0500 Scoped review-code's CI-trust rule to reviewing, not shipping + +Expanded the False-Positive Filter bullet in =review-code/SKILL.md=: "trust CI, don't run builds" applies to reading a diff, not producing one. A pre-commit/pre-push flow still owes the local verification =verification.md= requires (run the suite or state "not run because..."). Closes the apparent contradiction with =verification.md= / =finish-branch=. + +*** 2026-05-22 Fri @ 14:06:41 -0500 Added private-vs-public CLAUDE.md citation modes to review-code + +Expanded the Content scope section in =review-code/SKILL.md= with two modes: a private/internal review cites =CLAUDE.md= directly; a public/team review translates the rule into the engineering reason it encodes and doesn't name the rules file (a teammate can act on the reason, not on a file they can't reach). Same principle =commits.md= states for personal tooling in public artifacts. + +*** 2026-05-22 Fri @ 13:48:14 -0500 Relaxed review-code "three strengths" to up-to-three-or-none + +Changed all three "three minimum" spots in =review-code/SKILL.md= (Strengths section, Critical Rules DO list, Anti-Patterns) to "up to three specific; say none found on a tiny or weak diff." Reframed the old "No Strengths section" anti-pattern as "Skipping strengths out of laziness" so a substantive diff still demands them while a weak one can honestly report nothing notable. Landed alongside Craig's adjacent edit telling reviewers not to explain why a strength is good (sycophantic padding). + +*** 2026-05-22 Fri @ 14:12:24 -0500 Removed review-process language from respond-to-review commit guidance + +Replaced the =fix: Address review — [description]= example (and the matching description-line phrasing) in =.claude/commands/respond-to-review.md= with "name the actual fix (=fix: validate export filename=), not the review that prompted it." Killed the non-ASCII dash and the process-in-commit pattern that conflicted with =commits.md=. + +*** 2026-05-22 Fri @ 14:12:24 -0500 Made respond-to-review fetch unresolved threads + resolve after verification + +Rewrote section 1 (Gather) in =.claude/commands/respond-to-review.md= to pull =reviewThreads= via =gh api graphql= with =isResolved=, skipping already-resolved threads so settled feedback isn't re-processed; top-level conversation comments still come from REST. Added a section-4 step: reply and resolve a thread only after the fix is verified, never before. + +*** 2026-05-22 Fri @ 14:12:24 -0500 Verified respond-to-cj-comments no longer embeds an absolute path (moot) + +Already resolved by a prior migration: =grep= for =/home/= and =/Users/= in =.claude/commands/respond-to-cj-comments.md= returns nothing. The public-writing section refers to the rules by name, not by local path. No edit needed. + +*** 2026-05-22 Fri @ 14:12:24 -0500 Closed respond-to-cj-comments humanizer/emacsclient fallback (largely moot) + +Overtaken by two later changes: =/humanizer= was replaced by =/voice personal= (no =/humanizer= invocation remains), and the mandatory =emacsclient= summary-open was replaced by the in-place VERIFY-task pattern (workflow line ~262, Craig's 2026-05-12 standing instruction). Only a stale descriptive phrase remained — tidied "humanizer's signs of AI writing" to "the signs of AI writing." The original fresh-environment-fallback concern no longer applies as written. + +*** 2026-05-22 Fri @ 14:51:37 -0500 Fixed finish-branch base-branch detection + +Rewrote Phase 2: resolve the base *branch name* in priority order (open PR's =baseRefName=, then =git symbolic-ref --short refs/remotes/origin/HEAD= stripped, then ask), and compute the merge-base *SHA* separately only where a commit range is needed. Made the branch-name-vs-merge-base distinction explicit, since the old command returned a SHA where a branch name was needed. + +*** 2026-05-22 Fri @ 14:51:37 -0500 Made finish-branch merge safer + worktree-aware + +Added pre-flight checks to Option 1 (Merge Locally): dirty-tree refusal with no auto-stash, protected-branch awareness, upstream-gated =git pull --ff-only=, and merge-commit-vs-rebase as a team-policy choice instead of a hardcoded =--no-ff=. Replaced the fragile =git worktree list | grep <branch>= detection with a =git rev-parse --git-dir= vs =--git-common-dir= comparison plus =git worktree list --porcelain= for the path. + +*** 2026-05-22 Fri @ 14:51:37 -0500 Added tool-availability + ceremony-scale paths to start-work + +Added a "Tool availability" section (graceful degradation when Linear MCP / =gh= / =/voice= / Playwright are missing — do what's available, surface what isn't, don't block) and a "Ceremony scale" section (trivial / small / standard tiers so a two-line fix skips ticket+branch+gates unless asked). The =humanizer= reference in the original item is moot — the file already uses =/voice= throughout. + +*** 2026-05-22 Fri @ 14:51:37 -0500 Resolved start-work claim-before-justify rollback risk + +Split the claim by tracker type: personal todo.org claims defer to after the Justify gate (a killed task needs no rollback), while team trackers (Linear/GitHub) still claim first to signal intent but record prior state (status, assignee, label) so the Phase 2 rollback restores exactly it. Updated the per-tracker rollback steps and the matching anti-pattern. + +*** 2026-05-22 Fri @ 14:28:41 -0500 Verified add-tests typescript-testing.md reference resolves (moot) + +Resolved since the audit: =languages/typescript/claude/rules/typescript-testing.md= now exists, and =add-tests/SKILL.md:68= references it by bare filename, the same way it references =python-testing.md= (both get copied into a project's =.claude/rules/=). The "missing file" premise no longer holds. No edit needed. + +*** 2026-05-22 Fri @ 14:28:41 -0500 Added a category-exception protocol to add-tests + +Added an exception note to step 7 (proposal) in =add-tests/SKILL.md=: pure adapters, generated code, tiny pass-through wrappers, and framework glue may skip a category that would only re-test the framework, but the skip must be stated and justified in the plan and the behavior covered at integration/E2E level — never a silent omission. Step 12 (write) now points back to "honor documented category exceptions." + +*** 2026-05-22 Fri @ 14:25:37 -0500 Added environment + recent-change capture to debug Phase 1 + +Added a fourth Phase-1 step in =debug/SKILL.md=: record versions, feature-flag/config state, dataset/fixture, seed/clock, concurrency, and recent commits/config-infra changes. Noted that intermittent bugs usually live in environment/state transitions (and "what changed recently" is often the fastest route), while a deterministic local bug only needs a one-liner. Updated the phase's closing recap to include the context. + +*** 2026-05-22 Fri @ 14:25:37 -0500 Constrained root-cause-trace defense-in-depth to boundaries + +Rewrote step b in =root-cause-trace/SKILL.md=: instead of "add a check at each layer that could have caught it," add one only at a layer that owns a boundary or invariant — ingress/trust, persistence, invariant-owning service, final render. Added the explicit rule that a pass-through function owning neither shouldn't get a duplicate null check (validation spam). Recast the three example layers as the boundary types. + +*** 2026-05-22 Fri @ 14:25:37 -0500 Required evidence + counterfactual per why in five-whys + +Expanded step 2 in =five-whys/SKILL.md=: each link now owes an evidence field (a log/commit/metric/config you can point to) and a counterfactual check (remove this cause — does the symptom above plausibly not happen?). Framed the counterfactual as the main guard against monocausal storytelling, and updated the worked example to show both fields. + +*** 2026-05-22 Fri @ 15:51:59 -0500 Added timebox + fresh-sources rules to brainstorm + +Phase 1 gained a "Timebox the dialogue" rule (aim for the one-sentence restatement in ~5-8 questions, then move on and park the rest as open questions). Phase 2 gained "Ground high-stakes claims in fresh sources" (check load-bearing claims about markets/regulations/tools/vendors/APIs against a current source; mark unverified ones as assumptions). The design-doc skeleton gained an "## Assumptions" section that distinguishes researched facts (with source) from assumptions (to confirm before building). + +*** 2026-05-22 Fri @ 14:59:32 -0500 Made arch-decide examples timeless + required citations + +Dated the MongoDB multi-document-transaction example (scoped to 2024-01) with a backing reference, and added a "Cite, don't assert" Do: every concrete technical claim about a tool/version/platform carries a link, doc, version, or "checked YYYY-MM" date, or gets a domain-neutral placeholder — so unsourced "X can't do Y" doesn't rot into stale fact. + +*** 2026-05-22 Fri @ 14:59:32 -0500 Standardized arch-decide ADR statuses + immutability rule + +Declared a canonical five-status set (Proposed, Accepted, Rejected, Deprecated, Superseded) with an explicit "no synonyms" line, and spelled out the immutability rule in the Don'ts: an accepted ADR's body is frozen, only status/link metadata changes, a changed decision gets a new superseding ADR and the old one stays as the historical record. + +*** 2026-05-22 Fri @ 14:59:32 -0500 Added Trust/Data/Compliance phase to arch-design + +Added a new Phase 4 (Trust, Data, and Compliance) before the paradigm shortlist: trust boundaries, data classification, abuse/misuse cases, privacy constraints, compliance evidence, and operational ownership — surfaced early so the architecture is drawn around them, not retrofitted by a downstream =security-check=. Threaded into the workflow list, brief template (new §6), review checklist, and anti-patterns. + +*** 2026-05-22 Fri @ 14:59:32 -0500 Split paradigms from tactical patterns in arch-design + +Split Phase 5's single mixed table into Step 1 (pick one paradigm: monolith/microservices/layered/event-driven/serverless/pipeline/space-based) and Step 2 (compose tactical patterns: DDD, hexagonal, CQRS, event sourcing — several or none, often per-module), with composition examples and an anti-pattern against treating DDD/CQRS as alternatives to a paradigm. Recommendation + brief now name a paradigm plus composed patterns. + +*** 2026-05-22 Fri @ 14:59:32 -0500 Expanded arch-document quality scenarios to the Q42 six-part template + +Replaced §10's thin "Under [condition]..." template with the arc42/Q42 six-part structure (source, stimulus, environment, artifact, response, response measure), each glossed, with the cart-checkout example rewritten across all six parts. A one-line prose form stays acceptable once all six parts are recoverable. + +*** 2026-05-22 Fri @ 14:59:32 -0500 Added staleness/ownership metadata to arch-document output + +Added a per-section metadata block (owner, generated-against SHA + date, review cadence, "stale-when" conditions) as an HTML-comment header plus a visible Doc-status note, with field-fill guidance, and a whole-document Doc Status table replacing the README's "Last Updated" stub. Wired into the review checklist and an "Undated docs" anti-pattern. + +*** 2026-05-22 Fri @ 14:59:32 -0500 Added confidence levels to arch-evaluate findings + +Added a "Confidence and Provenance" subsection: every framework-agnostic finding carries High/Medium/Low + how it was determined, with a required "Not fully checked because..." note when scale, runtime imports, reflection, or dynamic dispatch cap certainty. Updated the example findings and review checklist; a finding with no note now asserts a full read. + +*** 2026-05-22 Fri @ 14:59:32 -0500 Made arch-evaluate report skipped tool checks explicitly + +Replaced "skip silently" with explicit reporting: for each detected language whose tool isn't configured or can't run, emit an Info "tool not configured / not run" finding (with an example) so the audit shows what was and wasn't verified. A check that didn't run no longer reads as a pass. Updated workflow step 4 and the review checklist. + +*** 2026-05-22 Fri @ 14:51:37 -0500 Added notation/output fallback to c4-analyze + c4-diagram + +Both commands now treat C4 as notation-independent: a "Choosing a notation" section (draw.io XML, Structurizr DSL, Mermaid with native C4 types, PlantUML/C4-PlantUML) and a headless fallback that emits a text notation (Mermaid or Structurizr DSL) and skips PNG-export/desktop-open when =drawio= or a GUI is absent, rather than failing. draw.io is now one option, not the only one. + +*** 2026-05-22 Fri @ 14:51:37 -0500 Clarified C4 abstraction boundaries in c4-analyze + c4-diagram + +Added an "Abstraction boundaries" section to both: a Container is a separately deployable/runnable unit (not synonymous with a Docker container — a SPA or managed DB counts), a Component lives inside one Container and isn't separately deployable. Added a 4e "Verify single abstraction level" check that walks every element and relationship to confirm it stays at the diagram's level, notation-independent. + +*** 2026-05-22 Fri @ 15:10:35 -0500 Added "When You Cannot Verify" standard to verification.md + +Added a section requiring, when a verification command can't run, a four-part report: command attempted, why it couldn't run, risk left unverified, and the smallest next command for the user. States the principle that a check that didn't run is never reported as a pass — "unable to verify" is a required honest outcome, not silence. Placed after Red Flags. + +*** 2026-05-22 Fri @ 15:10:35 -0500 Added property-based + mutation testing escalation to testing.md + +Added an "Escalation Beyond Category and Pairwise" section: property-based testing for invariants over a broad input domain (round-trips, idempotence, ordering — Hypothesis/fast-check/proptest) and mutation testing for when high line coverage hides thin assertions (mutmut/cosmic-ray/Stryker). Both framed as escalation paths to reach for on a gap, not gates on every unit. + +*** 2026-05-22 Fri @ 15:10:35 -0500 Added a disciplined spike protocol to testing.md + +Formalized the existing "I need to spike first" excuse-table row into a "Spike Exception (Disciplined)" subsection under TDD Discipline: TDD stays the default, but a spike is sanctioned when all three hold — timeboxed, spike code not committed, and the first failing test written before productionizing the discovered approach. Built on the existing row rather than contradicting it. + +*** 2026-05-22 Fri @ 15:10:35 -0500 Added pre-dispatch availability + cost checks to subagents.md + +Added a "Pre-Dispatch Checks" section with two gates: Availability (no Agent capability → do the work in the main thread under the same scope/constraints/output discipline the contract would enforce) and Cost (when writing the full contract costs more than the task, do it inline). Cross-references the existing "Don't Subagent At All" section and "Subagenting trivial work" anti-pattern rather than duplicating. + +*** 2026-05-22 Fri @ 15:06:04 -0500 Revised python-testing SQLite guidance toward production-like DBs + +Replaced "prefer in-memory SQLite for speed" with: run ORM/query tests against a production-like DB (same engine as prod, often containerized), since SQLite diverges from Postgres/MySQL on query semantics, constraints, transactions, JSON, time zones, and indexes (a test can pass on SQLite and fail in prod). SQLite stays only for pure unit tests with no DB-semantics dependency. + +*** 2026-05-22 Fri @ 15:06:04 -0500 Clarified python-testing ORM-mocking boundary + +Changed the "never mock" bullet from "ORM queries" to "ORM internals (querysets, sessions, model internals)" and added a paragraph: domain services use real model methods/validation, but a thin orchestration unit can inject a fake at a deliberate data-access port (a repository/interface the code owns). That's still mocking at a boundary, not at ORM internals. + +*** 2026-05-22 Fri @ 15:06:04 -0500 Made elisp.md editing advice tool-agnostic + +Rephrased the "prefer Write over repeated Edits" bullet around intent: land nontrivial Elisp as one cohesive change rather than dribbling it in over tiny partial edits (which accumulate paren mismatches), and run paren-balance + byte-compile checks immediately after, whatever editing mechanism the environment uses. + +*** 2026-05-22 Fri @ 15:06:04 -0500 Added batch-mode + native-comp caveats to elisp-testing.md + +Added three sections: Batch-Mode Reproducibility (=emacs --batch= as source of truth, no interactive-session state, no blocking prompts, deterministic), Isolating Emacs State (temp =user-emacs-directory=, explicit load-path, declared deps only, with an unwind-protect sandbox example), and Byte-Compile/Native-Comp Warnings (=byte-compile-error-on-warn=, native-comp gated on =native-comp-available-p= and kept opt-in/version-aware). + +*** 2026-05-22 Fri @ 15:16:22 -0500 Synced hooks/README install snippets with the destructive hook (opt-in) + +Brought the README's manual-install and settings-JSON snippets in line with the canonical =hooks/settings-snippet.json= (which already wires all three) and the Makefile's opt-in design: added the destructive-bash-confirm.py symlink as an opt-in step, added its settings entry, and reworded the note to say all three are no-op-safe but the destructive gate is opt-in (=make install-hooks= excludes it by default — link manually before relying on the snippet entry). + +*** 2026-05-22 Fri @ 15:35:06 -0500 Hooks now scan file-backed commit/PR messages + +Added =read_referenced_file()= to =_common.py= (safe local read: missing/oversize/non-UTF-8 → None) and wired it in: =git-commit-confirm.py= =extract_commit_message= now handles =-F=/=--file=/=--file===<path>= (reads + scans the file, falls through to UNPARSEABLE → asks if unreadable), and =gh-pr-create-confirm.py= reads =--body-file= content instead of a placeholder. Attribution scanning now sees the real committed/posted text. Built a pytest harness (=hooks/tests/=, importlib-by-path loader for the hyphen-named hooks) and wired =hooks/tests= into =make test=. 54 hook tests pass; full suite green. + +*** 2026-05-22 Fri @ 15:35:06 -0500 Rewrote destructive-bash rm parsing on shlex + +=detect_rm_rf= now tokenizes with =shlex.split= instead of a whitespace split, so quoted/spaced paths and combined/separate/reordered flags (=-rf=, =-r -f=, =-fr=, =--recursive=/=--force=) all parse. Fails toward asking — returns a sentinel that still fires the modal — on unbalanced quotes or when a forced recursive rm coexists with a compound/pipeline/substitution/redirect construct. Documented the supported/unsupported shell constructs in the docstrings, and extended the dangerous-path banner to =$HOME=-prefixed and wildcard targets. Covered by 25 new tests. (Pre-existing, out-of-scope: path-prefixed =rm= like =/bin/rm= still isn't matched.) +** DONE [#B] Add =make remove= for interactive ruleset removal via fzf +CLOSED: [2026-05-22 Fri] +Shipped: =scripts/remove.sh= (three modes — =--list=, =--remove-selected= reading stdin, and the default fzf-multi interactive flow) + =make remove= target + =scripts/tests/remove.bats= (5 cases). Lists only symlinks resolving into the repo (foreign links left alone); rm's picked links while leaving repo sources untouched; reports-and-continues on a missing target; quiet no-op on empty selection. shellcheck clean, make test green. Dropped the stale =bridge= entry per the note below. + +Add a Makefile target that lists every currently-installed ruleset entry +and lets me pick one or more to remove via fzf. Granular alternative to +=make uninstall= (removes everything) and =make uninstall-hooks= (removes +only hooks). + +*** Why this matters + +Tearing down a single skill, rule, hook, or config file currently means +either running =make uninstall= and re-installing what I want to keep, +or =rm=ing the symlink directly and remembering the exact path. Both are +friction. An interactive picker lets me filter, multi-select with Tab, +and confirm with Enter — the typical fzf flow. Costs about 3-5 seconds +per teardown instead of 15+ seconds of "what's the exact name?". + +*** Design + +The recipe builds a tab-separated list of every currently-installed item, +categorized by type, and pipes it to =fzf --multi=. The user filters, +marks with Tab, and confirms with Enter. The recipe parses the selections +and =rm=s the matching symlinks. + +#+begin_example + skill debug + rule commits.md + hook destructive-bash-confirm.py + config settings.json + commands commands + bridge claude-rules +#+end_example + +Each line is =<kind>\t<name>=. The recipe maps =<kind>= to the right path: + +- =skill= → =$(SKILLS_DIR)/<name>= +- =rule= → =$(RULES_DIR)/<name>= +- =hook= → =$(HOOKS_DIR)/<name>= +- =config= → =$(CLAUDE_DIR)/<name>= +- =commands= → =$(CLAUDE_DIR)/commands= +- =bridge= → =$(SKILLS_DIR)/claude-rules= + +Source files in =rulesets/= stay untouched. =make install= re-creates the +removed links if needed (the install loop is idempotent). + +*** Edge cases + +- Esc instead of Enter → empty selection → clean exit, no removal. +- Filter to nothing then Enter → same as Esc. +- Selected item already gone → =rm= fails visibly, processing continues + on the rest. +- =fzf= not installed → fail fast with a clear error (matches the pattern + used by =install-lang=). + +*** Possible extensions + +- Parallel =make pick-install= target that lists not-yet-installed items + and installs the chosen ones. Symmetric UX, same fzf flow. +- Confirmation prompt when more than N items selected (defense against + accidental select-all). +- =--source= flag that also runs =git rm= against the rulesets source for + the selected item. Probably bad idea — too easy to lose work. +- The =bridge → $(SKILLS_DIR)/claude-rules= entry above is stale — the + bridge symlink got removed in a later commit. Drop that bullet when the + recipe lands. +** DONE [#B] Document the =mcp/= install pipeline in =mcp/README.org= +CLOSED: [2026-05-22 Fri] +Wrote =mcp/README.org= covering everything in the "what to cover" list: the file layout (tracked vs gitignored), the secrets-bundle shape (plain =${VAR}= secrets + base64-bundled OAuth artifacts, AES256 symmetric =gpg -c=), the install flow (decrypt → materialize keys/token caches at mode 600 → expand → register unregistered, idempotent), the http/sse-vs-stdio transport split, token rotation when a Google refresh token is revoked, and adding a new server. Grounded in a read of the actual =install.py= + =servers.json=. + +=mcp/= has =install.py=, =servers.json=, =secrets.env.gpg=, =gcp-oauth.keys.json= (gitignored, regenerated at install). No README. Coming back to this in three months I'll re-discover how the bundle is structured, what =install.py= does, and how to rotate tokens. Saving that re-discovery is the whole point. + +*** What to cover + +- Layout: what each file is, which are tracked vs gitignored. +- Secrets bundle shape: how vars are listed in =secrets.env=, the symmetric-encryption pattern (=gpg -c --cipher-algo AES256=), the base64-bundled OAuth artifacts (=GCP_OAUTH_KEYS_JSON_B64=, =GOOGLE_DOCS_PERSONAL_TOKEN_B64=, =GOOGLE_DOCS_WORK_TOKEN_B64=). +- Install flow: =make install-mcp= → =install.py= decrypts, writes the keys file and Google Docs token caches at mode 600, expands =${VAR}= in =servers.json=, calls =claude mcp add --scope user= for unregistered servers. Idempotent. +- Token rotation: when a refresh token gets revoked, the recovery flow (re-auth on one machine, re-bundle, recommit). +- Adding a new server: edit =servers.json=, add any new =${VAR}= placeholders to the bundle, re-encrypt. +- The OAuth dance for HTTP-transport servers (linear, notion) versus stdio (google-docs-*) — different paths, different gotchas. +** DONE [#C] Add =make uninstall-mcp= + =mcp/install.py --check= for symmetry :feature:solo:quick: +CLOSED: [2026-05-28 Thu] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-28 +:END: + +Currently the MCP install pipeline only flows one direction. No way to remove rulesets-managed MCP servers in one command. No way to ask "what's the drift between =servers.json= and =claude mcp list=" without eyeballing. + +*** =make uninstall-mcp= + +Iterate over =servers.json=, run =claude mcp remove <name> -s user= for each. Ignore "not registered" errors. Idempotent. + +*** =mcp/install.py --check= + +Dry-run mode. Decrypt secrets, but instead of registering, print the drift report: + +- Servers in =servers.json= not in =claude mcp list= → =MISSING= +- Servers in =claude mcp list= not in =servers.json= → =EXTRA= +- Servers in both → =ok= + +Useful for diagnosing connection failures and for the eventual =make doctor= integration. +** DONE [#C] Update =README.org= with MCP install pipeline section :chore:solo:quick: +CLOSED: [2026-05-28 Thu] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-28 +:END: + +=README.org= covers global install, per-project language bundles, and design principles, but doesn't mention =make install-mcp= or the =mcp/= directory. Add a short section after "Per-project language bundles" describing the user-scope MCP install pattern (decrypt → expand → register) and pointing at the eventual =mcp/README.org=. +** DONE [#C] Consolidate =claude-templates/Makefile= after fold :chore:quick:solo: +CLOSED: [2026-05-28 Thu] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-28 +:END: + +Sibling follow-up from the fold child (2026-05-15). After the subtree merge, =rulesets/claude-templates/Makefile= still has its standalone =install= / =uninstall= / =list= / =test-scripts= targets. The =install= target's =bin/ai= logic is now duplicated in =rulesets/Makefile=. Both work; the redundancy is harmless but worth cleaning up. + +Options: +- *Delete* =claude-templates/Makefile= entirely — forces all install through rulesets root. Cleaner. +- *Strip down* to just =test-scripts= — the one piece not redundant with =rulesets/Makefile=. +- *Leave it* — slight redundancy, no functional harm. + +Triggered by: 2026-05-15 fold session's refactor audit (commit =2d645fc=). +** DONE [#C] Run =--archive-done= sweep at start of =open-tasks.org= Phase A :chore:quick:solo: +CLOSED: [2026-05-28 Thu] +:PROPERTIES: +:CREATED: [2026-05-28 Thu] +:LAST_REVIEWED: 2026-05-28 +:END: + +From pearl handoff 2026-05-28. =open-tasks.org= Next Mode reads =* Project Open Work= and skips =* Project Resolved= correctly, but a level-2 task that completed during a session sits as =** DONE= under Open Work until something archives it. Between cleanups, a freshly-DONE task can surface as a "what's next" candidate. + +Proposed fix: as the first step of =open-tasks.org= Phase A, run =emacs --batch -q -l .ai/scripts/todo-cleanup.el --archive-done todo.org=, then read =todo.org=. The cleanup tool already exists; this is wiring it into the workflow. + +Cost: a few hundred ms at the start of every "what's next" invocation. Win: recommendations never include DONE work. + +Optional refinement: gate behind a check for read-only / dry-run mode if that's ever introduced. The default invocation archives. +** DONE [#C] Triage Codex enhancement backlog :spec: +CLOSED: [2026-05-28 Thu] +:PROPERTIES: +:CREATED: [2026-05-28 Thu] +:LAST_REVIEWED: 2026-05-28 +:END: + +Triaged interactively 2026-05-28. Disposition table for all 14 items lives at [[file:docs/design/2026-05-28-rulesets-enhancement-backlog.org][2026-05-28-rulesets-enhancement-backlog.org]] under "Triage Dispositions": 3 accepted (filed below as TODOs), 3 pilot/scope-limited (filed below), 2 marked as conventions rather than tracked tasks, 6 rejected with rationale. Items #1 and #2 already had homes (#16 and the Phase-1 codex TODO). +** DONE [#C] Canonical/mirror drift detection via pre-commit hook or =make sync-check= :feature:quick:solo: +CLOSED: [2026-05-28 Thu] +:PROPERTIES: +:CREATED: [2026-05-28 Thu] +:LAST_REVIEWED: 2026-05-28 +:END: + +From the codex enhancement backlog (item #7), reframed: don't dedupe the dual source — the canonical-in-=claude-templates/= + mirror-in-=.ai/= pattern is a feature (other projects rsync from the canonical; the mirror lets rulesets-as-a-project have a working copy). The real pain is sync-discipline overhead — every workflow edit needs both copies updated, and forgetting one leaves the next startup's rsync to surface the drift. + +Scope: write a small =scripts/sync-check.sh= (or fold into the existing Makefile) that diffs =claude-templates/.ai/workflows/= against =.ai/workflows/=, exits non-zero on drift. Wire as a pre-commit hook (=githooks/pre-commit= or equivalent) so the discipline is enforced before publish, not at the next startup. =make sync-check= as a manual entry point. + +Verification: introduce a deliberate diff, commit, hook should block. Restore parity, hook should pass. +** DONE [#C] Add =make status= — compose audit + doctor + open-task count :feature:quick:solo: +CLOSED: [2026-05-28 Thu] +:PROPERTIES: +:CREATED: [2026-05-28 Thu] +:LAST_REVIEWED: 2026-05-28 +:END: + +From the codex enhancement backlog (item #12), scope-limited: =make status= only. Reject the rest of #12 (=make sync= duplicates the existing sync flow; =make health= wraps existing checks without adding signal; =make bootstrap-project= duplicates =install-ai= + =install-lang=). + +Scope: one Makefile target that prints a compact summary of: + +- Install audit state (clean / drift, calling =make audit=). +- Machine-global doctor state (calling =make doctor=). +- Open-task count (top-level entries in =todo.org= under =* Rulesets Open Work=). +- Inbox count (files in =inbox/= excluding =.gitkeep= and =PROCESSED-= prefixes). +- Git working-tree status (clean / dirty, ahead/behind upstream). + +Output should be roughly 10 lines, scannable in one glance. Composes the existing checks; no new logic except the summary formatting. +** DONE [#C] Iteration-history backfill for spec-review and spec-response :docs:followup: +CLOSED: [2026-05-28 Thu] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-28 +:END: +Source: org-drill inbox 2026-05-28. + +Once the in-flight WIP lands (the requirement that specs carry a bottom =Review and iteration history= section, with iteration / date / contributor / role / what / why / artifacts), backfill the two workflow files themselves using rulesets' session history as evidence. + +Files to update: +- =claude-templates/.ai/workflows/spec-review.org= +- =claude-templates/.ai/workflows/spec-response.org= + +Investigation: search =.ai/sessions/=, =.ai/notes.org=, inbox archive, and git log for mentions of these workflow docs. Identify review/response/design iterations, dates, and contributors (including agents where known: Claude Code, Codex, local models). Distinguish high-confidence history (commits, dated session entries) from inferred (chat-only context). Recommend whether enough evidence exists to populate the section, and draft the entries if so. + +Dependency: spec-review.org and spec-response.org have uncommitted edits in flight. Wait for those to land before writing to the files. The read-only research portion (search sessions, identify iterations, draft entries to a scratch file) can run in parallel without conflict. +** DONE [#B] Startup Phase A rsync propagates dirty rulesets WIP into downstream projects :feature: +CLOSED: [2026-05-30 Sat] +:PROPERTIES: +:CREATED: [2026-05-29 Fri] +:LAST_REVIEWED: 2026-05-29 +:END: +Fixed via option 1 (skip-when-dirty), scoped to the synced source paths: startup.org Phase A now guards the protocols/workflows/scripts rsyncs behind a =git status --porcelain= check on =claude-templates/.ai/{protocols.org,workflows/,scripts/}=, skipping the sync when any are dirty. The propagation anomaly (cross-project-broadcast.org / page-signal.org not reaching jr-estate) was a timeline artifact: both files were added in 664bf01 on 2026-05-29, after jr-estate's Phase A rsync had already run — correct behavior, not a bug. + +From jr-estate handoff 2026-05-29. When rulesets has uncommitted WIP at the moment a downstream project starts a session, Phase A.0 reports "dirty, skipping pull" and proceeds. Phase A's =rsync -a --delete= then runs against the dirty rulesets working tree and copies the WIP state into the downstream project's =.ai/workflows/= and =.ai/scripts/=. The downstream project's =git status= then shows drift the user did not author. Two bad recovery paths: commit the drift as "chore: sync .ai tooling from templates" (creates fake commit history about template state) or leave it dirty (noisy wrap-ups, pressure to commit anyway). + +Three options proposed in the handoff: +1. *Skip-when-dirty.* Make Phase A's workflows/ and scripts/ rsync no-op when Phase A.0 reports rulesets dirty. Simplest defense. +2. *Clean-files-only.* Restrict the rsync to files git considers unmodified in rulesets. Untracked files in rulesets do not propagate. Most precise. +3. *Clean-ref-based.* Cache the last-known-clean state as a git tag or ref and rsync from that ref rather than the working tree. Most decoupled, also the most infrastructure. + +Recommendation (mine): option 1. The downstream impact of skipping a sync once is small (the next session with rulesets clean catches up), and the implementation is one =if [ "$dirty" -eq 0 ]= guard around the existing rsync block. Option 2 adds shellout complexity per file; option 3 requires tagging discipline that has no other reason to exist. + +The original handoff also noted a related anomaly: even with =--delete=, two files that DO exist in rulesets canonical (=cross-project-broadcast.org=, =page-signal.org=) did NOT propagate to jr-estate. Worth confirming whether that was a transient rsync issue or evidence of a deeper Phase A bug. Could be ordering: those files were added to rulesets AFTER the jr-estate Phase A rsync ran, in which case the behavior is correct and the report is misreading the timeline. + +Source: =inbox/2026-05-29-0832-from-jr-estate-investigate-startup-rsync-carried-dirty.org= (processed and deleted). +** DONE [#B] Codex Phase 1 — AI_AGENT_ID + session-context.d/<id>.org :feature: +CLOSED: [2026-05-30 Sat] +:PROPERTIES: +:CREATED: [2026-05-28 Thu] +:LAST_REVIEWED: 2026-05-28 +:END: +Shipped backward-compatibly. New =.ai/scripts/session-context-path= helper resolves the active path from =AI_AGENT_ID=: unset → the legacy =.ai/session-context.org= singleton (one-agent default unchanged, per the spec's compatibility rule), set → =.ai/session-context.d/<sanitized-id>.org=. startup.org's existence check and wrap-it-up.org's rename now resolve through the helper (with a singleton fallback for older checkouts); wrap folds the agent id into the archive name. protocols.org documents the rule. Verified: 5 bats cases + a two-agent simulation showing distinct paths per id. Larger runtime-neutral arc (runtimes/ manifests, launcher refactor) stays parked under the parent spec. + +Lifted from the broader codex runtime spec ([[file:docs/design/2026-05-28-generic-agent-runtime-spec.org]]) as the immediate-correctness slice independent of the larger arc. The singleton =.ai/session-context.org= is unsafe under simultaneous agents — two LLMs running in the same project at the same time would overwrite each other's session state. + +Scope: introduce an =AI_AGENT_ID= environment variable and split the single =session-context.org= into a per-agent =session-context.d/<id>.org= directory. No other phases of the runtime refactor are in this task — keep the surface small, fix the race, ship. + +Touches: =.ai/protocols.org= (rename rule + recovery anchor), =.ai/workflows/startup.org= (Phase A check), wrap-up workflow (rename target), per-project session record discoverability. + +Verification: simulate two agents sharing a project (separate AI_AGENT_ID values) and confirm session-context writes land in distinct files without interleaving. + +Parent: see [[Generic agent runtime support — Codex spec v0]] above for the larger arc this is sliced from. +** DONE [#C] Decide on category-3 rule copies in the deepsat tree :chore:quick:solo: +CLOSED: [2026-05-31 Sun] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-28 +:END: +Diffed 2026-05-31. Both copies (coding-rulesets vendored + orchestration_dashboard_mvp) are byte-identical to each other and stale against canonical: =testing.md= 221 lines behind with 5 lines unique to the copies (older wording or a small team tweak), =verification.md= 40 behind with nothing unique. Same older vendored version in both spots. Left untouched per the A1 decision — team-owned, and canonicalizing would create a cross-repo dependency on the private rulesets (the orchestration_dashboard_mvp pair is team-visible from Vrezh's PR thread). No files modified. + +While symlinking personal-project =.claude/rules/= mirrors to the rulesets canonical on 2026-05-07, two locations didn't fit the "personal mirror → symlink" pattern and were left untouched pending judgment: + +- =~/projects/work/deepsat/code/coding-rulesets/claude-rules/{testing,verification}.md= — looks like a vendored team-shared copy. +- =~/projects/work/deepsat/code/orchestration_dashboard_mvp/.claude/rules/{testing,verification}.md= — could be project-specific overrides. + +For each: read the file, diff against the rulesets canonical, decide whether it's an intentional diverge (leave alone), stale (sync content), or should canonicalize (replace with symlink and accept the cross-repo dependency). The orchestration_dashboard_mvp pair is the project where Vrezh's PR review surfaced this whole thread, so any decision there has team-visibility implications. + +Decision (Craig, 2026-05-31): *leave team-tree copies alone.* Personal rulesets does not reach into team repos — canonicalizing would create a cross-repo dependency on the private rulesets, and the orchestration_dashboard_mvp copy is team-visible. This makes the task solo: diff each copy against canonical, record whether it's identical / drifted / overridden in the disposition, and close as "left alone (team-owned)" without modifying the team-tree files. +** DONE [#C] Audit language-specific rule files for cross-project duplication :chore:solo: +CLOSED: [2026-05-31 Sun] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-28 +:END: +Audited 2026-05-31. Findings: in sync with canonical (=languages/<lang>/claude/rules/=) — work =python-testing.md=, deepsat =typescript-testing.md=, =.emacs.d= =elisp-testing.md= + =elisp.md=. Drifted — =gloss= and =chime= (byte-identical to each other): =elisp-testing.md= 44 lines behind (canonical added Batch-Mode Reproducibility + Isolating Emacs State; zero lines unique to the copies), =elisp.md= one line behind (canonical expanded the edit-cohesively guidance). No project-specific additions anywhere — every copy is either current or purely stale. + +Disposition: *leave them project-local* (the task's own option). The language-rule copies in code projects are the bundle's deliberate copy-and-sync model, not the symlink pattern the generic rules (commits/testing/verification/subagents) use in personal doc-projects. =sync-language-bundle.sh= auto-fixes drifted bundle rules on each startup, so gloss/chime self-heal the moment those projects next boot — no canonicalize/symlink needed, and symlinking would fight the bundle model. Did not reach into work/deepsat/gloss/chime/.emacs.d from here (cross-project boundary; team copies left alone per the 2026-05-31 category-3 decision). + +The four canonical rules (=commits=, =testing=, =verification=, =subagents=) are now symlinked across the five personal-project mirrors as of 2026-05-07. But several language-specific rule files exist in multiple project mirrors and may be duplicated or drifted: + +- =python-testing.md= in =~/projects/work/.claude/rules/= +- =typescript-testing.md= in =~/projects/work/deepsat/code/.claude/rules/= +- =elisp-testing.md= and =elisp.md= in =~/.emacs.d/=, =~/code/gloss/=, =~/code/chime/= + +The Elisp pair is the most suspicious — three repos using essentially the same rules. Audit: diff these across the projects, check for drift, then decide whether to canonicalize them under =~/code/rulesets/claude-rules/languages/<lang>/= and symlink, or leave them as project-local. +** DONE [#C] Refactor =daily-prep.org= to delegate to =triage-intake.org= for the triage section :chore:solo: +CLOSED: [2026-05-31 Sun] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-28 +:END: +Collapsed Phase 3's inline source scans (sub-steps 3b email / 3c mark-read / 3d Slack / 3e Linear / 3f PRs / 3g dedup, ~280 lines) into four: 3b runs the triage-intake engine, 3c surfaces today's reactive items as Day's Priorities thin links, 3d re-sorts by urgency, 3e writes the audit footer from the engine's coverage. Source coverage carries via the engine's Phase 0 two-dir glob (general + .ai/project-workflows/ plugins), so the work account's Gmail/Slack/Linear/GHE plugins still get scanned. Adapted the downstream refs (Prep Doc Structure rule, Heads-up FYI source, Recommended Approach Pattern reframed as engine-applied), removed the orphaned Linear-digest note, added a Living Document entry. Verified: workflow-integrity clean (no dangling script refs), sync-check clean, full suite green. daily-prep.org went 825 → 576 lines. + +=daily-prep.org= still does its own inline triage (Gmail × 3 accounts, Slack, Linear, GHE PRs, calendars) as part of the full prep flow. =triage-intake.org= is now a source-agnostic engine that loads =triage-intake.<source>.org= plugins (refactored 2026-05-26), so daily-prep could call the engine and consume its synthesis instead of duplicating the source-scan logic. That DRYs up a large workflow and keeps both flows in sync when sources change — a source change now lives in one plugin that both flows pick up. + +Scope: +- Identify the sections in =daily-prep.org= that do the inline triage (the email / Slack / Linear / PR / calendar fan-out, plus the "Sources checked: ..." footer at the top of each generated prep doc). +- Replace those sections with "run the =triage-intake.org= engine" and adapt the downstream sections (Heads-up, Day's Priorities, Carry-forwards) to read the engine's synthesis output rather than the inline scan results. +- Verify the generated prep doc still has the same shape (Heads-up + Day's Priorities + Carry-forwards + Sources checked). +- Reconcile source coverage: daily-prep's inline triage scans work accounts (3 Gmail, Slack, Linear, GHE PRs) that are project-specific plugins under =.ai/project-workflows/=, not general plugins. The delegation must ensure the engine loads those project plugins (Phase 0 globs both dirs) so nothing daily-prep currently scans drops out. + +Origin: came up while authoring =triage-intake.org= on 2026-05-11; body refreshed after the engine/plugin refactor on 2026-05-26. +** DONE [#C] Templatize =make coverage-summary= into the language bundles (Elisp pilot) :feature:solo: +CLOSED: [2026-05-31 Sun] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-28 +:END: + +Done 2026-05-31 (Elisp pilot, the scoped milestone): ported the kernel into the elisp bundle as a self-contained =languages/elisp/claude/scripts/coverage-summary.el= (no coverage-core dependency), proven end-to-end against the real dotemacs SimpleCov report (93 tracked, 27 untested modules surfaced, project number 66.4%). The missing-file-as-0% + unit-weighted number is the kernel. Delivery: the script ships under =.claude/scripts/= (gitignored, auto-fixed on drift by =sync-language-bundle.sh=); =languages/elisp/coverage-makefile.txt= holds the project-owned Makefile fragment, seeded at project root by =install-lang.sh= and dropped into =.ai/inbox/= by sync when that convention exists. Tests: 12 ERT (=languages/elisp/tests/=, wired into =make test=), 5 new sync bats, 2 new install-lang bats. The fan-out to Python/Go/TS is the follow-up below. + +Borrow dotemacs's =make coverage-summary= into the language bundles. After =make coverage= writes a coverage file, =coverage-summary= prints per-unit covered/total with percentages, a unit-weighted project number, and a list of source files present on disk but missing from the coverage report. + +*The kernel — the only part worth building.* Weight the project number by file/module rather than by line, and count a source file absent from the report as 0% instead of omitting it. A module no test imports just doesn't appear in coverage.py or nyc output, so it silently fails to drag the number down. That missing-file detection is the value; everything else (per-file table, total) the built-in reporters already print, so don't reimplement those. + +*Scope Elisp-first.* Port the proven dotemacs version into the elisp bundle, prove the pattern end-to-end, then fan out. Don't open all four bundles at once. + +*Delivery (settled 2026-05-25).* Two rulesets-owned pieces per language: +- The summary *script* ships in the bundle under =.claude/= (inside the now-gitignored tooling footprint), copied in on install and auto-fixed on drift by =sync-language-bundle.sh=, never committed by the project. +- One *text file per language* holding the Makefile fragment (the =coverage-summary= target plus its =coverage= prerequisite) and a block recommending how to set up coverage for that language. The bundle never edits the project's own Makefile. + - *New project:* install copies that file in for the project to own. + - *Existing project:* sync drops the fragment into the project's =inbox/= rather than touching its Makefile — the project adopts it deliberately. + +*Prerequisite caveat.* The summary presumes a coverage harness exists (undercover, coverage.py, nyc, =go cover=). Several bundles may have no =make coverage= yet, so for those this task implies adding the harness first — or the per-language file documents it as a prereq. + +Per-language parser (the script is ~40 lines over each tool's output): +- Elisp: undercover SimpleCov JSON (=.coverage/simplecov.json=) — dotemacs/auto-dim scripts already parse this. +- Go: =go test -coverprofile=cover.out=; parse =cover.out= (simple text), or lean on =go tool cover -func=. +- Python: =coverage json= per-file JSON, or lean on =coverage report=. +- TypeScript/JS: nyc/Istanbul =coverage-final.json= / json-summary. + +Reference (dotemacs): =scripts/coverage-summary.el=, =modules/coverage-core.el=, and the =coverage= / =coverage-summary= Makefile targets. + +Origin: handoff from the .emacs.d session, 2026-05-25. +** DONE [#C] Fan out coverage-summary across all language bundles :feature: +CLOSED: [2026-05-31 Sun] +:PROPERTIES: +:CREATED: [2026-05-31 Sun] +:END: + +Done 2026-05-31: coverage-summary now ships in all four bundles. Elisp pilot, then Python, Go, and TypeScript. Each parses its tool's report (SimpleCov / coverage.py JSON / Go cover.out / Istanbul json-summary), counts on-disk source files absent from the report as 0%, and file-weights the project number. The plumbing proved generic: =install-lang.sh= seeds the project-owned =coverage-makefile.txt= and ships the script into the gitignored =.claude/scripts/=; =make test= discovers ERT (=test-*.el=), pytest (=test_*.py=), =go test= (=*_test.go=), and =node --test= (=*.test.js=) under =languages/*/tests/=, each guarded on its toolchain. TypeScript and Go scripts were dogfooded (Go against a live profile, TS against the CLI); Python and TS weren't run against a live coverage tool (coverage.py / nyc not installed) — proven against faithful fixtures matching each tool's stable schema. + +Remaining follow-ups (not blockers): +- Go is a coverage-only slice — =languages/go/= has no rule file, so =sync-language-bundle.sh= can't fingerprint it and won't sync-maintain the script. Build out the real Go bundle (=go.md= / =go-testing.md= + =CLAUDE.md=) to close that. +- First real adopters of the Python and TS scripts should sanity-check against a live =coverage json= / nyc =coverage-summary.json= run. + +Original notes retained below for the next person. + +The Elisp pilot proved the pattern; Python and Go followed. The plumbing is generic: =install-lang.sh= seeds the fragment, and =make test= now discovers ERT (=test-*.el=), pytest (=test_*.py=), and =go test= (=*_test.go=) under =languages/*/tests/=. TypeScript is the last one. + +- TypeScript/JS: nyc/Istanbul =coverage-final.json= / =coverage-summary.json=. Same kernel: file-weighted project number, on-disk =*.ts=/=*.js= absent from the report counted as 0%. nyc prints its own table, so the script focuses on the missing-file list and the number. Needs a vitest/jest (or =node --test=) discovery path in =make test=, mirroring the go-test block. + +Notes for the next person, from the Python + Go runs: +- Python: parses coverage.py's =files[path].summary.{covered_lines,num_statements}= (stable since coverage 5.x), resolves report paths against the report's parent dir. Proven against a synthetic report, not a live =coverage json= run (coverage.py wasn't installed). Sanity-check against a real one. +- Go: =languages/go/= is a coverage-only slice with no rule file, so =sync-language-bundle.sh= can't fingerprint it (detection keys on a bundle's own =.claude/rules/*.md=). The script is delivered by =make install-lang LANG=go= but is not sync-maintained until the Go bundle gets a real rule file + =CLAUDE.md=. Building out that bundle is the natural companion task. Also: modern =go test ./...= already lists every module package in the profile at 0%, so the missing-file list is usually empty for in-module code; it earns its keep on build-tagged files and dirs outside =./...=. +** DONE [#C] Enumerate implementation tasks in =spec-review.org= Phase 6 :feature:solo: +CLOSED: [2026-05-31 Sun] +:PROPERTIES: +:CREATED: [2026-05-28 Thu] +:LAST_REVIEWED: 2026-05-28 +:END: +Added a Phase 6 step that lifts the spec's =Implementation phases= into a drop-in =todo.org= block (one =[#B]= per phase + a test-surface entry mirroring =Acceptance criteria=); a spec lacking phase decomposition raises that as a finding instead. Added Exit Criterion 6 and a review-history entry. Pure workflow-doc change. + +From pearl handoff 2026-05-28. =spec-review.org= Phase 6 currently says "log deferred work to =todo.org=: v1 implementation = [#B] ... vNext/someday = [#D]." That covers deferred and v1 in passing but doesn't lift the spec's =Implementation phases= section into a drop-in =todo.org= block. + +Proposed addition to Phase 6: a structured step that reads the spec's =Implementation phases= section and produces a =[#B] TODO= entry per phase (subject line, tags, one-line body, pointer back to spec), plus a final entry for the test surface (unit / integration / e2e / manual-verify mirroring the spec's =Acceptance criteria= when present). Emit under a new section "Implementation tasks (drop-in for todo.org)" in the review file. Format follows =todo-format.md= (terse heading, body holds context, tags on heading). + +Three wins: handoff is one paste not a re-read; forces specs to be implementable in pieces (a spec without a phase decomposition fails this step, surfacing the shape problem); closes the loop on =Acceptance criteria= as manual-verify entries. + +If the spec lacks an =Implementation phases= section, the step is the prompt to ask the author to add one before =Ready=. +** DONE [#C] Add =.aiignore= for agent inventory exclusions :chore:solo: +CLOSED: [2026-05-31 Sun] +:PROPERTIES: +:CREATED: [2026-05-28 Thu] +:LAST_REVIEWED: 2026-05-28 +:END: +Shipped a gitignore-syntax =.aiignore= at the rulesets root (deps, build output, language caches, editor cruft, token artifacts, lockfiles-as-agent-read-skip) and documented the convention + defaults + lockfile policy in protocols.org ("Recursive Reads"). Per Craig's scope call (2026-05-31): did NOT wire audit.sh / diff-lang.sh / sync-language-bundle.sh — they do targeted finds over .ai/.claude/bundle dirs, never naive whole-tree walks, so honoring .aiignore there would be dead code. Script-side honoring belongs in a future catalog/inventory tool if one ships; the real consumer today is agent recursive reads (the protocols guidance). + +From the codex enhancement backlog (item #8). Filesystem scans by agents and helper scripts pick up =node_modules=, =__pycache__=, =.pytest_cache=, lockfiles, generated OAuth artifacts, and test caches, even when those are gitignored. Token waste during exploration and skewed project summaries. + +Scope: add a shared =.aiignore= file (or =rulesets-ignore.json= if a more structured format helps) listing default exclusions. Teach the scripts that walk the project (=audit.sh=, =diff-lang.sh=, =sync-language-bundle.sh=, future =catalog= work if any) to honor it. Document in =protocols.org= so agents know to consult it before naive recursive reads. + +Keep the lockfile policy explicit: ignored when a local skill dependency cache, tracked when reproducibility matters. +** DONE [#C] Workflow test harness — drift + integrity tests :feature:solo: +CLOSED: [2026-05-31 Sun] +:PROPERTIES: +:CREATED: [2026-05-28 Thu] +:LAST_REVIEWED: 2026-05-28 +:END: + +From the codex enhancement backlog (item #10). Startup's drift check catches index-vs-directory mismatches but not deeper integrity: a workflow that references a script that's been renamed, a plugin whose parent engine has been deleted, a required section missing from a newly-added workflow. + +Scope: add =scripts/tests/workflow-integrity.bats= (or pytest equivalent) verifying: + +- Every =.org= file in =.ai/workflows/= is either indexed in =INDEX.org= or classifiable as a source plugin under an indexed engine. +- Every indexed workflow file actually exists. +- Every =file:= or shell-command reference inside a workflow to a script under =.ai/scripts/= or =scripts/= resolves to an existing file. +- Every source plugin maps to a parent workflow that exists and is indexed. +- Required sections (Overview, When to Use, the workflow's main phases) are present in each workflow. +- Workflow trigger phrases are unique enough to route — no two workflows claim the same exact trigger. + +Wire into =make test=. Run on the canonical =claude-templates/.ai/workflows/= as the source of truth. +** DONE [#C] Token-tier pilot on largest workflows :feature:solo: +CLOSED: [2026-05-31 Sun] +:PROPERTIES: +:CREATED: [2026-05-28 Thu] +:LAST_REVIEWED: 2026-05-28 +:END: + +Done 2026-05-31: restructured both =startup.org= and =triage-intake.org= into the four-lane structure (Summary / Execution / Reference / History), preserving every existing instruction. triage-intake's reorder ran through a content-preservation guard (the multiset of content lines is unchanged; only heading depth and lane grouping moved). workflow-integrity, sync-check, and the full test suite pass. + +From the codex enhancement backlog (item #5), scope-limited to a pilot rather than a universal template change. + +Apply a standardized section structure to the largest workflow files first — =startup.org= and =triage-intake.org= are the prime candidates. Sections: + +- *Summary* / *Quick Contract* — one-screen purpose and outputs. +- *Execution* — the steps an agent must follow. +- *Reference* — examples, edge cases, rationale, old decisions. +- *History* / *Design Notes* — durable context not needed every run. + +Decision (Craig, 2026-05-31): *approved the four-lane structure (Summary/Execution/Reference/History) and the scope — restructure both =startup.org= and =triage-intake.org= now.* Makes the task solo: apply the lanes to both, preserving every existing instruction (reorganize, don't rewrite), verify the workflows still read coherently and the drift/integrity checks pass. + +Teach startup/routing to read =Summary= only at routing time, then =Execution= only for the selected workflow. Other sections become opt-in. + +After the pilot, evaluate: did the savings show up in real session token use? Did the structure constrain the workflow expressiveness too much? If yes to savings and no to constraint, expand to the next-largest workflows. If not, document why and stop. Don't templatize universally — shorter workflows don't need tiering. +** DONE [#B] Add Signal MCP server (rymurr/signal-mcp) :feature: +CLOSED: [2026-06-02 Tue] +:PROPERTIES: +:CREATED: [2026-05-29 Fri] +:LAST_REVIEWED: 2026-05-29 +:END: +Done 2026-06-02. Registered signal-cli to the Google Voice pager account, added the signal-mcp entry to servers.json, installed via make install-mcp (claude mcp list shows it connected), and documented the signal-cli + GV dependency in mcp/README.org. The GV-registration dependency this task flagged is resolved. Shipped in cfaff12 (page-signal routing) and this commit (README). + +Install [[https://github.com/rymurr/signal-mcp][rymurr/signal-mcp]] so Claude can call =send_message_to_user=, =send_message_to_group=, and =receive_message= natively rather than shelling out to the =page-signal= wrapper. Python, MCP framework, depends on =signal-cli= being configured locally. + +Two-way capability is the differentiator over the CLI: =receive_message= lets the agent listen for replies on the phone, enabling page-as-confirm flows, "should I proceed?" loops over Signal, and structured Q&A across devices. + +*** Dependency + +This depends on the Google Voice account being registered with =signal-cli= first. Sending from Craig's primary number to itself doesn't notify (Signal treats it as one account on linked devices). The MCP server takes =--user-id= at startup, one account per instance, so it has to point at the GV account, with the primary as the per-send recipient. + +If GV registration is still pending when this task runs, block here and surface that. + +*** Implementation + +- =mcp/servers.json= — add =signal-mcp= entry under stdio transport (=command=, =args=, optional =env= for the user-id pointer). +- =mcp/README.org= — document the signal-cli + GV-registration dependency and the user-id pattern. +- =mcp/secrets.env.gpg= — only if the MCP server's user-id needs to be encrypted (probably not; the GV number isn't a secret beyond being personal). +- Verify: =make install-mcp= followed by =make check-mcp= shows =signal-mcp ok=; smoke-test via a Claude tool call sending a message + waiting on =receive_message=. + +*** Why this matters + +=page-signal= is the fast path (a hook, a script, a make recipe can call it without an MCP round-trip). The MCP server is the smart path. When Claude wants to send and then *react to the reply*, the CLI can't do that — only the MCP server can. The two complement each other; this task adds the second half. +** DONE [#C] task-review pass at end of task-audit :chore:solo: +CLOSED: [2026-06-02 Tue] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-02 +:END: +Have the =task-audit= workflow chain a =task-review= pass as its final phase, so a freshly-audited list also gets the lighter staleness/honesty sweep without a second invocation. The legend already notes the division of labor — task-audit assigns and refreshes tags, task-review keeps them honest in passing — so running task-review at the tail of task-audit closes the loop in one pass. Edit =claude-templates/.ai/workflows/task-audit.org= (and the synced mirror) to add the final phase; check whether =open-tasks.org= already invokes task-review so the chaining stays consistent. +** DONE [#C] lint-followups drift — reconcile-on-write + audit dead-link reaping :feature:solo: +CLOSED: [2026-06-02 Tue] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-02 +:END: +From an .emacs.d handoff (2026-06-02): running task-audit against a large todo.org proved several =.ai/lint-followups.org= entries stale (four dead-link flags pointed at docs that now exist; three near-duplicate dated lint runs had piled up). Two fixes, scoped separately. + +1. =lint-org= workflow/script (the real fix): reconcile-on-write. Before appending a run, drop entries whose finding no longer reproduces (dead link now resolves, flagged block/timestamp now clean) and dedupe against the prior run instead of re-logging. Key entries by content/finding rather than line number, so they survive edits to the target file (line numbers go stale immediately). +2. =task-audit.org= (small, narrow): in the Phase C link-hygiene step, when fixing/verifying a =file:= link, also reap any matching dead-link entry in the project's lint-followups file so the two artifacts don't drift. Scope explicitly to dead-link entries — do NOT pull general lint cleanup into the audit; that mixes two concerns and slows the audit. +** DONE [#C] start-work Justify gate: explicit "reasons not to do this" item :feature:quick:solo: +CLOSED: [2026-06-02 Tue] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-02 +:END: +From a work handoff (2026-06-02, surfaced running /start-work on a clean low-risk refactor). The Phase 2 Justify gate has "Downsides" and "Alternatives considered" but no forced devil's-advocate verdict on "should we even do this?" Add a "top reasons not to do this" item: surface the top three objections if any exist; when none rise to a real objection, state one line instead of manufacturing three (e.g. "Nothing material argues against this; no reason to defer or drop it"). Building the case against the work before committing is cheapest exactly at this gate, which is its purpose. Edit the start-work skill's Justify-gate phase. +** DONE [#C] start-work Approach gate: spec-needed check :feature:quick:solo: +CLOSED: [2026-06-02 Tue] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-02 +:END: +From Craig (2026-06-02). The Approach phase should consider whether the work needs a spec when one doesn't already exist. For a big task, this isn't a silent skip — the pre-confirmation summary must explicitly report why a spec isn't needed, so the decision is visible and challengeable at the gate rather than assumed. Small tasks can pass without comment. Edit the start-work skill's Approach-gate phase to add the spec-needed consideration and the big-task report-why-not requirement. +** DONE [#B] Cross-project pattern catalog :spec:thinking: +CLOSED: [2026-06-05 Fri] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-02 +:END: + +From pearl handoffs [[file:docs/design/2026-05-27-pattern-catalog-pearl-notes.org][2026-05-27]] + [[file:docs/design/2026-05-28-pattern-catalog-no-empty-input.org][2026-05-28 follow-up]]. + +Meta-question: how do good patterns travel from project A to project B? Pearl shipped three worked examples worth capturing — one-prompt picker with typed prefix (pearl-pick-source), magit-transient state buttons, and "no empty input as meaningful" (none-sentinel as first candidate). Each is a small principle with wide surface area; without a catalog, every project re-derives them from scratch. + +Open design questions before any implementation: +- Catalog format — structured (one pattern per file with frontmatter) vs free-form doc +- Surfacing mechanism — agent-driven (model spots opportunity) vs human-driven (Craig grep-searches) +- Anti-patterns included or only what worked +- Intake cadence — every time one lands, or batch review +- Home — rulesets repo (agent visibility) vs Linear doc vs per-project cross-links + +Pearl recommends a one-page spec (problem + design + open questions + acceptance) before implementation. Pearl available to come back for spec-review iterations. + +*** 2026-05-28 Thu @ 08:12:55 -0500 Pearl shipped patterns 4-6, filed alongside the prior two +Three more pearl handoffs landed and were filed during this audit. Filed: [[file:docs/design/2026-05-28-pattern-catalog-prompt-labels-and-defaults.org][prompt-labels-and-defaults]] (patterns 4-5: label-matches-behavior, default-most-common with friction-proportional-to-consequence) and [[file:docs/design/2026-05-28-pattern-catalog-prompt-collapse.org][prompt-collapse]] (pattern 6: collapse N orthogonal prompts into one enriched prompt). The catalog's evidence base is now four pearl notes in =docs/design/= covering six patterns plus the synthesizing principle Pearl articulated — "choices on screen, accurately labeled, ordered by what the user most often wants, friction sized to the cost of being wrong." + +*** 2026-06-05 Fri @ 00:47:59 -0500 Spec approved as written — all 5 decisions + 3 open questions accepted +Craig approved the spec ([[file:docs/design/2026-06-02-pattern-catalog-spec.org][2026-06-02-pattern-catalog-spec.org]]) as written. Confirmed: one file per pattern with frontmatter; home =patterns/= in rulesets; thin =claude-rules/patterns.md= pointer, agent-driven; anti-patterns as a per-pattern field; capture-on-landing/promote-on-review intake. Open questions resolved to the spec's leans: directory name =patterns/=; concrete-now, generalize-on-second-use; manual promote flow first, no =/pattern= skill yet. Built as =.org= files with =#+KEYWORD= frontmatter (Craig's call over the initial =.md= draft); the =claude-rules/patterns.md= pointer stays =.md= since the rules layer and the Makefile glob require it. + +*** 2026-06-05 Fri @ 00:47:59 -0500 Built the catalog — 6 seed patterns + pointer + README +Created =patterns/= with the six seed patterns (one-prompt-picker-typed-prefix, transient-state-buttons, no-empty-input-as-meaningful, label-matches-behavior, default-most-common-friction-proportional, collapse-orthogonal-prompts), each carrying the frontmatter contract (name/principle/problem/tags/source/examples) plus Problem/Do/Anti-pattern/Applicability/Related sections. =patterns/README.org= states the root principle, the frontmatter contract, and the intake cadence. =claude-rules/patterns.md= is the agent-facing pointer, auto-installed via the Makefile RULES glob. Sourced from the four pearl notes in =docs/design/=. +** CANCELLED [#C] Try Skill Seekers on a real DeepSat docs-briefing need :chore: +CLOSED: [2026-06-10 Wed] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-28 +:END: + +=Skill Seekers= ([[https://github.com/yusufkaraaslan/Skill_Seekers]]) is a Python +CLI + MCP server that ingests 18 source types (docs sites, PDFs, GitHub +repos, YouTube videos, Confluence, Notion, OpenAPI specs, etc.) and +exports to 20+ AI targets including Claude skills. MIT licensed, 12.9k +stars, active as of 2026-04-12. + +*Evaluated: 2026-04-19 — not adopted for rulesets.* Generates +*reference-style* skills (encyclopedic dumps of scraped source material), +not *operational* skills (opinionated how-we-do-things content). Doesn't +fit the rulesets curation pattern. + +*Next-trigger experiment (this TODO):* the next time a DeepSat task needs +Claude briefed deeply on a specific library, API, or docs site — try: +#+begin_src bash +pip install skill-seekers +skill-seekers create <url> --target claude +#+end_src +Measure output quality vs hand-curated briefing. If usable, consider +installing as a persistent tool. If output is bloated / under-structured, +discard and stick with hand briefing. + +*Candidate first experiments (pick one from an actual need, don't invent):* +- A Django ORM reference skill scoped to the version DeepSat pins +- An OpenAPI-to-skill conversion for a partner-vendor API +- A React hooks reference skill for the frontend team's current patterns +- A specific AWS service's docs (e.g. GovCloud-flavored) + +*Patterns worth borrowing into rulesets even without adopting the tool:* +- Enhancement-via-agent pipeline (scrape raw → LLM pass → structured + SKILL.md). Applicable if we ever build internal-docs-to-skill tooling. +- Multi-target export abstraction (one knowledge extraction → many output + formats). Clean design for any future multi-AI-tool workflow. + +*Concerns to verify on actual use:* +- =LICENSE= has an unfilled =[Your Name/Username]= placeholder (MIT is + unambiguous, but sloppy for a 12k-star project) +- Default branch is =development=, not =main= — pin with care +- Heavy commercialization signals (website at skillseekersweb.com, + Trendshift promo, branded badges) — license might shift later; watch +- Companion =skill-seekers-configs= community repo has only 8 stars + despite main's 12.9k — ecosystem thinner than headline adoption +** DONE [#C] Promote meeting-prep to a template workflow :feature:solo: +CLOSED: [2026-06-10 Wed] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-10 +:END: +meeting-prep lives in the work project's =project-workflows/= and is general-purpose — it builds a per-meeting prep doc — but its body carries project-specific references: =deepsat/assets/= transcript paths, Linear as the tracker, =knowledge.org=. Promoting to =claude-templates= means generalizing those to project-neutral terms (the project's transcript home, the project's tracker), adding it plus its =meeting-prep.pre-wire.org= supporting doc to the =.ai/= mirror and INDEX.org, and a workflow-integrity pass. Once promoted, the daily-prep 5-Day Look-Ahead's conditional "where the project has one" reference can become a direct link. + +Out of the 2026-06-10 daily-prep handoff from the work project. +** DONE [#C] Build Craig's writing voice profile from real corpora :spec: +CLOSED: [2026-06-10 Wed] +:PROPERTIES: +:CREATED: [2026-05-29 Fri] +:LAST_REVIEWED: 2026-05-29 +:END: +Shipped across 2026-05-29 → 2026-06-10. =voice/references/voice-profile.org= is the canonical paired file: Phases 1-2 corpora measured (commit bodies 128k words + email/PR/review registers), all 45 patterns carry entries with basis and history, and every reconciliation delta landed in =voice/SKILL.md= (#13/#33 self-discipline reframing, #7 soft flag, new corpus-derived #43-#45). Extension corpora (Slack, long-form, syntactic fragment detection) deliberately not pursued. + +Build a grounded profile of Craig's actual writing voice by mining the corpora he's produced over time. The =voice/SKILL.md= patterns today are observation-derived (em-dash zero-tolerance, semicolon → period, contractions kept, sentence-fragment rewrite, felt-experience cut, etc.). Some are spot-on; others are intuition. A real corpus pass would tell us which patterns are genuinely Craig's voice and which were guesses, plus surface idioms, sentence structures, and vocabulary the current ruleset misses. + +*** Sources to mine + +- *Email* — sent folders across all three accounts (=gmail=, =dmail/DeepSat=, =cmail/Proton=). Filter to Craig-authored (not forwards or replies-just-quoting). Separate work voice (=dmail=) from personal voice (=gmail=, =cmail=) since they're likely distinct registers. +- *Commit messages* — =git log --author= across his repos. Captures terse-imperative voice. +- *PR descriptions and review comments* — same corpora. More deliberate prose than commits. +- *Org files he authored* — =notes.org=, todo bodies he typed, design docs in =docs/design/=, journal entries. Heavier on first-person voice than emails. +- *Slack/messages* — DeepSat work slack, family group, friends. Casual register. +- *Long-form artifacts* — résumé, proposals, white papers, blog posts (if any). + +Skip session-context files, which are Claude-co-written and would muddy the signal. + +*** Output + +- =voice/references/voice-profile.org= (or =.md=) — the canonical reference doc: + - Vocabulary tendencies (preferred verbs, avoided cliché classes, technical-vs-plain word choice). + - Sentence structures (typical length, conjunction patterns, parenthetical use). + - Punctuation patterns (em-dash actual frequency, semicolon vs period split, contraction rate). + - Register markers (signs of formal vs casual mode, work vs personal). + - Idioms and recurring phrasings. + - "Anti-patterns" — phrasings Craig consistently avoids that show up in AI-generated prose. +- Updated =voice/SKILL.md= patterns grounded in evidence rather than intuition. Patterns that the corpus confirms get strengthened; patterns the corpus contradicts get rewritten or removed. + +Each finding should cite at least two evidence samples from the corpora so the basis for a rule is reviewable. + +*** Approach + +Phase 1 (corpus assembly) — pull the relevant slices: sent-mail dumps, =git log --author --no-merges --pretty=format:'%B'=, =gh pr list --author= bodies, org-file extracts. Strip headers, replies-quoted blocks, signatures. Land in =voice/corpus/= (gitignored if the project's =.ai/= is gitignored, tracked if private repo with private remote). + +Phase 2 (analysis) — pass over the corpus with focused queries: distribution of em-dashes per 1000 words, semicolon count, contraction frequency by register, sentence-length histogram, top-N adjectives/adverbs, etc. Subagent dispatch fits here. + +Phase 3 (draft profile) — write =voice-profile.org= with findings + evidence. Surface contradictions with the current ruleset. + +Phase 4 (reconcile with voice/SKILL.md) — present the deltas to Craig. Each delta is one of: confirm existing rule with evidence, strengthen rule, weaken rule, add new pattern, remove unsupported pattern. Apply approved deltas. + +*** Privacy + +Email and Slack content is private. The corpus must NOT enter any commit unless rulesets stays on the private cjennings.net remote (which it does today). If a future move to a public remote is on the table, the corpus and any direct quotes have to go before that happens. The profile doc itself can stay (it's analysis, not raw content), but cite by pattern not by verbatim quote. + +*** Why this matters + +The voice skill earns its place when Craig sees the rewrite and recognizes it as his own voice rather than a "clean" AI voice that approximates him. Today the skill catches common AI tells (em-dashes, semicolons, the felt-experience tic), which is useful. Corpus-grounding would make it catch the absence of *Craig-specific positive traits* — the phrasings he actually reaches for — not just the AI traits he doesn't. + +Likely improves =/voice personal= output quality on PR bodies, commit messages, and email drafts. Compound interest over the long run. +** DONE [#C] Wide org-table handling — helper/lint/standard :spec: +CLOSED: [2026-06-11 Thu] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-11 +:END: +The org-table standard keeps project-doc tables <=120 cols with multi-line wrapped cells and a rule between rows, but nothing enforces it and hand-wrapping a wide cell into multi-row form is tedious and error-prone. Decide among: (a) a helper that auto-wraps a wide table into multi-row cells at a target width, (b) a lint check that flags tables over the width budget, (c) tighten the written standard with a worked before/after example. Likely some combination. A worked before/after example exists in a work-project prep doc (a 6-col table reformatted by hand to a 4-col multi-row-cell version), to be reproduced generically when this lands. + +Out of a work-project handoff 2026-06-09. + +Resolution 2026-06-11: all three shipped. (c) The standard, generalized from the work project's notes.org local copy, is now claude-rules/org-tables.md (globally loaded; render-width semantics — links measure at their visible label, never split a link) with the worked wrapped-table example. (a) .ai/scripts/wrap-org-table.el reflows tables mechanically: render-width measurement, link-atomic tokenizing, column shrink-to-floor allocation, continuation rows, rules between logical rows; idempotent (rule-delimited continuation groups merge back before re-wrapping); 23 ERT tests. (b) lint-org.el gained an org-table-standard judgment check (width overruns, missing rules; conformant wrapped tables not false-flagged); 5 new ERT tests, 32 total. Verified end-to-end on a demo file: 150-col table reflowed to budget, idempotent second pass, lint clean on the result. +** DONE [#C] SessionStart-on-clear hook for auto-resume :feature: +CLOSED: [2026-06-11 Thu] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-11 +:END: +Add a SessionStart hook (matcher: clear) in settings.json that auto-injects "read .ai/session-context.org and resume if present, else run startup.org". Today /flush prompts the user to /clear and the next session relies on the model re-reading session-context; the hook makes resume automatic on /clear. Keep full startup.org for genuine fresh starts (new day, other machine, been away). Likely lands as claude-templates workflow notes plus the hook in settings.json. + +The checkpoint+resume halves already shipped as /flush. This is the remaining automation piece. Out of a work-project handoff 2026-06-09 (process tooling, belongs in rulesets not the work project). + +Resolution 2026-06-11: the hook itself had already shipped 2026-06-02 (hooks/session-clear-resume.sh + the SessionStart clear entry in the tracked settings.json — this task duplicated it). What was actually broken: make install didn't cover hooks, so the symlink never reached machines that hadn't run make install-hooks by hand, and the hook errored silently on every /clear. Fixed by folding default-hook linking into make install (startup's Phase A.0 now propagates hooks machine-wide), with bats coverage in scripts/tests/install-hooks-link.bats. Both hook branches verified on ratio; the live /clear fire is a one-keystroke manual test. +*** TODO Manual testing and validation :test: +**** /clear mid-session resumes from the anchor +What we're verifying: the SessionStart(clear) hook fires and the fresh context resumes instead of cold-starting. +- In any project session with a live .ai/session-context.org (this rulesets session qualifies), type /clear +- Send any short message (the injected context loads but the model waits for your next keystroke) +Expected: the reply starts with "flushed." on its own line, restates the Active Goal and immediate Next Step, and does NOT run the startup workflow. +** DONE New personal projects are home regroupings — no mechanism needed +CLOSED: [2026-06-12 Fri] +Craig's call (2026-06-12): new personal projects will live in home, and there's no project-creation mechanism to build — he'll be working in home and simply decide to group some things differently. Nothing to do. + +Concurrence, verified: no template doc directs new personal work into ~/projects (first-session.org, install-ai.sh, and the README carry no such guidance; the only ~/projects references are discovery-root scans, which home and work still need). The situation as it stands: a new personal "project" is an area dir plus tasks inside home's existing =.ai/= machinery, no bootstrap step; =first-session.org= remains the bootstrap for standalone code projects in ~/code, unchanged and correct; "launch finances"-style trigger phrases for folded names degrade politely to the no-match candidate list, worth work only if real friction shows up. +** DONE [#C] Build =/update-skills= skill for keeping forks in sync with upstream :feature: +CLOSED: [2026-06-11 Thu] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-10 +:END: + +The rulesets repo has a growing set of forks (=arch-decide= from +wshobson/agents, =playwright-js= from lackeyjb/playwright-skill, =playwright-py= +from anthropics/skills/webapp-testing). Over time, upstream releases fixes, +new templates, or scope expansions that we'd want to pull in without losing +our local modifications. A skill should handle this deliberately rather than +by manual re-cloning. + +Shipped 2026-06-11: [[file:.claude/commands/update-skills.md][/update-skills command]] + [[file:scripts/update-skills.py][helper script]] (17 bats tests) + three bootstrapped manifests under [[file:upstreams/][upstreams/]]. The first real upstream drift will exercise the interactive per-file/per-hunk flow end to end; the merge mechanics are covered by the test suite. + +*** 2026-06-11 Thu @ 17:05:28 -0500 Specification written as the shipped artifacts +The command doc ([[file:.claude/commands/update-skills.md][update-skills.md]]) carries the user-facing spec: discovery, classification statuses, the per-file confirmation and per-hunk conflict flow, mark-synced semantics, and the missing-baseline fallback. The script's module docstring specifies the manifest schema. Two deviations from the 2026-05-16 design, with reasons: manifests live centrally at =upstreams/<name>/= instead of per-skill =.skill-upstream= dotfile dirs (arch-decide became two flat files in =commands/= and can't carry one — a =files= rename map covers it); baselines were seeded from the 2026-06-11 upstream HEADs since the true fork-point commits are unrecoverable, so pre-existing local modifications classify as =local-only= going forward. + +*** 2026-05-16 Sat @ 01:14:20 -0500 original goals and decisions +**** Design decisions (agreed) + +- *Upstream tracking:* per-fork manifest =.skill-upstream= (YAML or JSON): + - =url= (GitHub URL) + - =ref= (branch or tag) + - =subpath= (path inside the upstream repo when it's a monorepo) + - =last_synced_commit= (updated on successful sync) +- *Local modifications:* 3-way merge. Requires a pristine baseline snapshot of + the upstream-at-time-of-fork. Store under =.skill-upstream/baseline/= or + similar; committed to the rulesets repo so the merge base is reproducible. +- *Apply changes:* skill edits files directly with per-file confirmation. +- *Conflict policy:* per-hunk prompt inside the skill. When a 3-way merge + produces a conflict, the skill walks each conflicting hunk and asks Craig: + keep-local / take-upstream / both / skip. Editor-independent; works on + machines where Emacs isn't available. Fallback when baseline is missing + or corrupt (can't run 3-way merge): write =.local=, =.upstream=, + =.baseline= files side-by-side and surface as manual review. + +**** V1 Scope + +- [ ] Skill at =~/code/rulesets/update-skills/= +- [ ] Discovery: scan sibling skill dirs for =.skill-upstream= manifests +- [ ] Helper script (bash or python) to: + - Clone each upstream at =ref= shallowly into =/tmp/= + - Compare current skill state vs latest upstream vs stored baseline + - Classify each file: =unchanged= / =upstream-only= / =local-only= / =both-changed= + - For =both-changed=: run =git merge-file --stdout <local> <baseline> <upstream>=; + if clean, write result directly; if conflicts, parse the conflict-marker + output and feed each hunk into the per-hunk prompt loop +- [ ] Per-hunk prompt loop: + - Show base / local / upstream side-by-side for each conflicting hunk + - Ask: keep-local / take-upstream / both (concatenate) / skip (leave marker) + - Assemble resolved hunks into the final file content +- [ ] Per-fork summary output with file-level classification table +- [ ] Per-file confirmation flow (yes / no / show-diff) BEFORE per-hunk loop +- [ ] On successful sync: update =last_synced_commit= in the manifest +- [ ] =--dry-run= to preview without writing + +**** V2+ (deferred) + +- [ ] Track upstream *releases* (tags) not just branches, so skill can propose + "upgrade from v1.2 to v1.3" with release notes pulled in +- [ ] Generate patch files as an alternative apply method (for users who prefer + =git apply= / =patch= over in-place edits) +- [ ] Non-interactive mode (=--non-interactive= / CI): skip conflict resolution, + emit side-by-side files for later manual review +- [ ] Auto-run on a schedule via Claude Code background agent +- [ ] Summary of aggregate upstream activity across all forks (which forks have + upstream changes waiting, which don't) +- [ ] Optional editor integration: on machines with Emacs, offer + =M-x smerge-ediff= as an alternate path for users who prefer ediff over + per-hunk prompts + +**** Initial forks to enumerate (for manifest bootstrap) + +- [ ] =arch-decide= → =wshobson/agents= :: =plugins/documentation-generation/skills/architecture-decision-records= :: MIT +- [ ] =playwright-js= → =lackeyjb/playwright-skill= :: =skills/playwright-skill= :: MIT +- [ ] =playwright-py= → =anthropics/skills= :: =skills/webapp-testing= :: Apache-2.0 + +**** Open questions + +- [ ] What happens when upstream *renames* a file we fork? Skill would see + "file gone from upstream, still present locally" — drop, keep, or prompt? +- [ ] What happens when upstream splits into multiple forks (e.g., a plugin + reshuffles its structure)? Probably out of scope for v1; manual migration. +- [ ] Rate-limit / offline mode: if GitHub is unreachable, should skill fail + or degrade gracefully? Likely degrade; print warning per fork. +** DONE [#C] Monthly session-harvest workflow :feature: +CLOSED: [2026-06-11 Thu] +:PROPERTIES: +:CREATED: [2026-06-11 Thu] +:LAST_REVIEWED: 2026-06-11 +:END: +A monthly pass over recent =.ai/sessions/= summaries across projects proposing promotion candidates: patterns for the catalog, durable facts for the KB, rule refinements, workflow learnings. Sibling cadence to the roam-hygiene timer; a workflow run on schedule, not a standing agent. From the 2026-06-11 insights report's "Canonical-Aware Knowledge & Workflow Curator" — the capture/promote machinery exists (pattern catalog, /codify, KB); this adds the mining cadence. + +Shipped 2026-06-11 as [[file:.ai/workflows/session-harvest.org][session-harvest.org]] (template + INDEX entry): five phases, four promotion lanes, /codify-grade gates + work-confidentiality scrub, =:LAST_HARVEST:= marker in notes.org, and the KB receipt-line metrics readout for the ~2026-07-10 checkpoint. Window filter reads session-filename date prefixes (mtime proved unreliable in a live test). First run due ~2026-07-11. +** CANCELLED [#B] todo-cleanup.el per-area Open Work / Resolved pairs :feature: +CLOSED: [2026-06-11 Thu] +=--archive-done= assumes exactly one level-1 "Open Work" and one "Resolved" heading per todo.org. Home's consolidated file briefly carried per-area pairs and the pass skipped. Filed from home's 2026-06-11 addendum, then held the same evening when Craig flagged that he expected a single pair. + +Cancelled 2026-06-11: Craig confirmed the decision — one todo queue with a single Open Work / Resolved pair. Home reshapes its consolidated file to that form, and the existing single-pair tooling works unmodified. No code change needed. +** CANCELLED [#D] todo-cleanup =--archive-done= reports 0 moves while moving subtrees :bug: +CLOSED: [2026-06-12 Fri] +:PROPERTIES: +:CREATED: [2026-06-12 Fri] +:END: +Observed at the 2026-06-12 wrap: the pass relocated closed subtrees from Open Work to Resolved while printing "todo-cleanup --archive-done: 0 subtree(s) moved". + +CANCELLED 2026-06-12 — cannot reproduce. =todo-cleanup.el= is unchanged since the wrap that logged this, and =tc-archived= is incremented inline with each move and read straight in the report, so no move can go uncounted. Running the exact pre-archive state (=b6d286f:todo.org=) through the tool reports the right count (3 moved, all listed). The "0 moved" was a correct second-run report: =open-tasks.org= Phase A runs =--archive-done= after wrap-it-up already archived, so the second pass finds nothing to move and prints 0 next to the first pass's git diff. Not a code defect. +** DONE [#C] Session title hostname-project, no space :feature:quick: +CLOSED: [2026-06-13 Sat] +:PROPERTIES: +:CREATED: [2026-06-13 Sat] +:LAST_REVIEWED: 2026-06-13 +:END: +Routed from the roam global inbox via inbox-zero 2026-06-13. The SessionStart hook (=hooks/session-title.sh=) emitted =<host> <project>= with a space; Craig wanted =<host>-<project>= with a hyphen and no space. Changed the =sessionTitle= join to ="$host-$project"= plus the header comments, and updated the three =session-title-hook.bats= expectations (test-first; 6/6 green). +** DONE [#B] ~/.dotfiles discovery added to ai launcher; bootstrapped on velox +CLOSED: [2026-06-20 Sat] +Craig reported =~/.dotfiles= missing from the launcher picker. Two root causes, both fixed: (1) applied the parked one-liner =maybe_add_candidate "$HOME/.dotfiles"= in =build_candidates()= (=claude-templates/bin/ai=, after the =~/.emacs.d= line); (2) =~/.dotfiles/.ai/= was absent on velox — the 2026-06-16 bootstrap was on another machine and =.ai/= is gitignored, so it never traveled — re-bootstrapped via =install-ai.sh --gitignore ~/.dotfiles=. + +Verified end-to-end: =build_candidates()= now lists =~/.dotfiles= (protocols.org guard passes). sync-check clean (bin/ai is single-canonical, no mirror). By-name launch =ai ~/.dotfiles= already worked via single_mode's marker-only check. working/ai-dotfiles-discovery/ staging dir removed. +** DONE Phase E spec'd — folded into the autonomous-batch spec +CLOSED: [2026-06-16 Tue] +:PROPERTIES: +:CREATED: [2026-06-16 Tue] +:END: +Craig's answer (2026-06-16): spec it. Phase E reconciles with the "fix speedrun" proposal into one feature — see [[file:docs/design/2026-06-16-autonomous-batch-execution-spec.org][the autonomous-batch execution spec]]: a dedicated =work-the-backlog.org= holds the execution loop, inbox-zero keeps its A-D routing, and "fix speedrun" is a thin preset over the same loop. The prepared Phase E change stays under [[file:working/inbox-zero-phase-e/]] as a source. Tracked from here under the "fix speedrun" / autonomous-batch task below, where the spec-review VERIFY lives. +** DONE [#C] Encourage org-roam KB contribution across workflows :feature: +CLOSED: [2026-06-20 Sat] +:PROPERTIES: +:CREATED: [2026-06-16 Tue] +:END: +From the roam global inbox (Craig, 2026-06-16). Encourage agents to keep durable, strategic knowledge in the org-roam KB so it compounds into a cross-project asset: +- Curate a best-practices node (good note-taking + org-roam practices, drawing on established advice) and link it from =startup.org= with encouragement to contribute through the session. +- Add a reminder at the end of =triage-intake.org= and =inbox-zero.org= to store strategic / durable / useful info in the KB. +- Add an early =wrap-it-up.org= prompt asking the agent what it learned worth remembering, then to write it to the KB before proceeding. +Touches four synced template workflows and needs a curation pass on the best-practices content, so it's a design task — not a loop auto-implement. Filed from a =:next:=-tagged roam item; the eligibility tag was dropped on filing because the work needs a design decision (see the loop guardrail). Pairs with [[file:claude-rules/knowledge-base.md]] and the agent-knowledge-base spec. + +*** 2026-06-16 Tue @ 00:53:36 -0500 Spec written for review +Drafted [[file:docs/design/2026-06-16-encourage-kb-contribution-spec.org][the KB-contribution spec]]: four light workflow prompts (startup nudge, triage-intake + inbox-zero end-of-flow reminders, an early wrap-up reflection feeding the existing KB receipt) plus one Craig-authored best-practices node curated from Ahrens / Matuschak / org-roam guidance. Five open sub-decisions filed as decisions-as-TODO in the spec. +*** 2026-06-20 Sat @ 23:29:10 -0400 Spec ratified + built +Craig ratified all five decisions (2026-06-20) and added D6 — a read-side startup consult-nudge surfacing project-relevant KB node titles, the counterpart the original write-only design lacked. Built all of it: the best-practices node (=~/org/roam/agents/20260620232112-agent-kb-best-practices.org=), startup's two Phase C nudges (consult + contribute, gated on the roam clone), the conditional capture reminders in triage-intake + inbox-zero, and the early wrap-up reflection feeding the existing receipt. Commits 76e5559 (workflows + spec) and the related lint checker f6dde4e. Trigger for the build: receipt data showed "promoted 0 / consulted no" across recent sessions. diff --git a/claude-rules/cross-project.md b/claude-rules/cross-project.md index caceec9..73c0e1b 100644 --- a/claude-rules/cross-project.md +++ b/claude-rules/cross-project.md @@ -35,6 +35,8 @@ Two acceptable outcomes: ``` Output filenames follow `YYYY-MM-DD-HHMM-from-<this-project>-<slug>.<ext>` automatically, so the target's next session sees the source + timestamp at a glance without you having to construct the name. Fall back to `Write`/`Edit` only when the script isn't available (e.g. a freshly-cloned project before the first startup-rsync). + + The wrap-up cross-project router rides this same sanctioned path: at wrap time, `wrap-it-up.org`'s router step delivers `:ROUTE_CANDIDATE:`-tagged keeper tasks to their home projects' inboxes via `route-batch` → `inbox-send` (never a direct foreign `todo.org` write), and the destination's own inbox processing files each task per its conventions. 2. **"Switch projects"** — stop. Let the user reopen the agent session in the right cwd. Don't assume which one was meant. Either guess is wrong half the time and the cost of asking once is one short turn. diff --git a/claude-rules/docs-lifecycle.md b/claude-rules/docs-lifecycle.md new file mode 100644 index 0000000..3906d86 --- /dev/null +++ b/claude-rules/docs-lifecycle.md @@ -0,0 +1,75 @@ +# Docs Lifecycle + +Applies to: `**/*` (any project carrying a `docs/` tree) + +How formal documents are separated from working notes, and how a document's +lifecycle state stays visible without opening the file. Specs are the first +instance of the shape; the pattern is reusable for any growing collection of +processed artifacts. Full design: the docs-lifecycle spec in rulesets +`docs/specs/`. + +## The shape (reusable) + +1. **Separate formal artifacts from working notes by location.** A formal + artifact proposes a buildable change and carries the full spine; everything + else is a note. +2. **Lifecycle state lives in the artifact**, on a scannable, greppable + carrier — an org TODO keyword on a top-level status heading — with a dated + history of every transition. +3. **Links use rename-safe identifiers** so a move or rename never orphans + inbound references. +4. A collection **earns this treatment when "which of these are live?" starts + requiring a file-by-file read.** + +## The spec instance + +- `docs/specs/` holds formal specs only — a doc with both a `Decisions` + section and an `Implementation phases` section (the spec-create spine). + `docs/design/` holds everything else: brainstorms, proposals, inventories, + research notes, frozen source material. Spec filenames end `-spec.org` + (spec-review's precondition keys on it); no status suffixes ever. +- Every spec opens with a top-level status heading directly after the file + header, carrying the lifecycle keyword, an `:ID:` UUID, and dated history + lines (newest first). The keyword header is two sequences, and both lines + are required: + + #+TODO: TODO | DONE + #+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + + The first drives `Decisions` / `Review findings` tasks and their `[/]` + cookies; the second is the lifecycle. They share no keyword — never merge + them into one line, and never drop the first (that silently breaks the + cookies that gate readiness). +- **The heading keyword is authoritative.** The Metadata table's `Status` + field mirrors it in lowercase; on disagreement the heading wins. A + transition is three lines in one file — keyword, history line, mirror — and + never a rename or a link edit. +- **Every flip has a named owner:** spec-create stamps `DRAFT`; spec-review + flips `DRAFT` → `READY` on a passing gate; spec-response flips `READY` → + `DOING` when it decomposes phases into build tasks (stamping the spec's + UUID as a `:SPEC_ID:` property on the build parent, and always emitting a + final "flip the spec to IMPLEMENTED" task); task-audit flags any `DOING` + spec whose `:SPEC_ID:`-bound parent is closed, archived, or missing. + Terminal states (`IMPLEMENTED` / `SUPERSEDED` / `CANCELLED`) always carry a + stated reason in the history line. +- **The status board is one grep:** + + rg -H '^\* (DRAFT|READY|DOING|IMPLEMENTED|SUPERSEDED|CANCELLED) ' docs/specs/ + +- **Legacy compatibility:** projects that haven't run the one-time `spec-sort` + retrofit (no `:LAST_SPEC_SORT:` marker in `.ai/notes.org` Workflow State) + keep their legacy spec locations reviewable; the `docs/specs/` requirement + hardens only after the sort runs. +- **Cross-doc links to specs are `file:` links for now.** Specs carry `:ID:` + UUIDs, but conversion to `[[id:...]]` is a gated follow-up (the Emacs id + index has to know about project docs first) — don't convert links ad hoc. + +## Watch for + +- Editing the `#+TODO:` header down to one sequence — the `[/]` cookies stop + computing and readiness gates go vacuous. +- Bare `[N/N]` tokens in prose or list items — org's cookie updater rewrites + them; spell counts out in words outside real cookie positions. +- A "done" spec whose keyword still says `DOING` — that's the failure this + convention exists to prevent; flip it with a history line rather than + leaving it for the audit to catch. diff --git a/claude-rules/todo-format.md b/claude-rules/todo-format.md index 55530de..5e9ca32 100644 --- a/claude-rules/todo-format.md +++ b/claude-rules/todo-format.md @@ -33,6 +33,37 @@ When a project's `todo.org` lacks the section, add it before filing or grading further tasks — propose the priority semantics and tag set from the project's existing usage, and confirm with Craig. +### Hard definitions: `:solo:` and `:quick:` (fixed across projects) + +A project's scheme may add or rename its other tags, but these two carry +fixed definitions everywhere, because autonomous execution +(work-the-backlog / the no-approvals speedrun) reads `:solo:` as its +eligibility gate and trusts the author's tag rather than re-deriving +autonomy at run time. + +- **`:solo:` — autonomy.** The task can be completed *and verified* without + Craig's involvement beyond at most one or two quick decisions that can be + stated and answered before work starts. No open design question, no + "weigh these approaches," no waiting on Craig mid-task. Three gates, all + must hold: *buildable* (the agent has the capability and access), + *verifiable by the agent* (an objective or local check it can run itself — + handing off a residual human-in-the-loop confirmation as a structured + manual-testing reminder does not disqualify), and *no deliberation* (a + quick, upfront-answerable factual question is allowed — it gets batched + into the speedrun's pre-flight Q&A; a genuine design or preference call + is not). A wrong `:solo:` is worse than none: it tells Craig he can hand + the task off and walk away when he can't. +- **`:quick:` — effort hint only.** Likely 30 minutes or less from start + through verification. Informational, for batching and estimating a run's + duration; never an eligibility gate. `:quick:` and `:solo:` are + orthogonal — a bounded refactor can be `:solo:` but slow; a five-minute + change hinging on a preference call is `:quick:` but not `:solo:`. + +Both tags are applied at task creation and **re-checked as a mandatory +step** in the task-review and task-audit workflows, so the run-time gate +can trust the tag. A review or audit that skips the `:solo:`/`:quick:` +assessment is incomplete. + ### Bug priority from severity × frequency (mandatory where a codebase exists) Some projects carry a codebase — source the project maintains under version @@ -172,6 +203,8 @@ becomes *** 2026-05-15 Fri @ 12:58:08 -0500 Wired yasnippet for universal availability +**Enforcement.** This is applied at close time by whoever closes the task, but an interactive org close (`org-log-done` flips the keyword to `DONE` and stamps `CLOSED:`) never applies the dated rewrite, so level-3+ closes accumulate as `DONE` keywords. `todo-cleanup.el --convert-subtasks` (run in the `clean-todo` and wrap-up cleanup passes) normalizes them mechanically: it rewrites any level-3+ `DONE`/`CANCELLED`/`FAILED` heading into the dated form above, pulling the timestamp from the `CLOSED` cookie and keeping the heading text verbatim (a batch tool can't reliably past-tense a title — polish wording by hand where it matters). `lint-org.el` flags any that slip through (checker `subtask-done-not-dated`). So the depth rule holds even when tasks are closed interactively rather than by an agent applying this section. + ### Why depth-based The agenda view (`org-agenda`) shows entries at the section + top-task level. Letting `**` tasks stay task-shaped preserves their visibility as "things that recently shipped." Letting `***+` sub-tasks flip to dated entries keeps the agenda from being clogged with a long list of completed sub-tasks at every depth — those become history within their parent instead. diff --git a/claude-templates/.ai/protocols.org b/claude-templates/.ai/protocols.org index ed07c0e..5e18ab9 100644 --- a/claude-templates/.ai/protocols.org +++ b/claude-templates/.ai/protocols.org @@ -552,6 +552,8 @@ Claude needs to add information to =.ai/notes.org=. For large amounts of informa **The gitignore set follows that same decision.** A project that gitignores =.ai/= (the code-project case) gitignores the whole personal-tooling set: =.ai/=, =.claude/=, =CLAUDE.md=, =AGENTS.md=. =.claude/= is rulesets-owned — copies of =claude-rules/*.md= plus the language bundle's rules, hooks, and settings — and re-synced from rulesets on every startup, so git isn't how it travels between machines; ignoring it also keeps those private rule copies out of the repo, which ignoring =CLAUDE.md= alone would miss. A track-mode project (personal/doc repos, or a team repo that shares config with teammates who don't run rulesets) tracks the set instead. =install-ai.sh= writes the full set at bootstrap in gitignore mode; =scripts/sweep-gitignore-tooling.sh= backfills it idempotently across existing gitignore-mode projects when the set grows. +**Public reachability decides harder than project type.** Any repo whose remotes include a non-cjennings.net host gitignores the tooling set, whatever kind of project it is — the only exception is a team repo that deliberately shares the config, decided explicitly, never by default. And a private remote is not proof of privacy: a server-side =post-receive --mirror= hook republishes invisibly from the client (the 2026-06-30 =.emacs.d= exposure rode exactly that — a cjennings.net remote mirroring to public GitHub). The sweep recognizes both the anchored (=/.ai/=) and unanchored (=.ai/=) ignore styles — an anchored-style project used to be misread as track-mode and silently skipped — and warns when tracked tooling can reach a non-cjennings.net remote. + **Credential-leak concern: gate it on project type, not on the credential itself.** A tracked secret, token, or credentials doc is only a public-leak risk where the repo can reach a public remote — that is, *code projects pushed to public GitHub*, which is exactly why those gitignore =.ai/= and =.claude/=. For *personal / documentation projects* (the =~/projects/= set: elibrary, home, finances, health, philosophy, etc.), the git remote is a private single-user repo on =cjennings.net=, so tracked credentials inside =.ai/= files are fine — that's the design, the project history IS the project. Do NOT raise a leak warning or suggest gitignoring a secret for these. When the question "is this a leak / should we gitignore this secret?" comes up, decide it on *which kind of project and remote* this is, never on the mere presence of a credential in a tracked file. **When to break out documents:** diff --git a/claude-templates/.ai/scripts/lint-org.el b/claude-templates/.ai/scripts/lint-org.el index 5447cb3..90b1b1d 100644 --- a/claude-templates/.ai/scripts/lint-org.el +++ b/claude-templates/.ai/scripts/lint-org.el @@ -35,6 +35,7 @@ ;; empty-heading bare stars with no title ;; malformed-priority-cookie [#x]-shaped token org rejected ;; level2-done-without-closed completed level-2 task with no CLOSED +;; subtask-done-not-dated level-3+ done sub-task still a DONE keyword ;; (anything else) surfaced as judgment with checker name ;; ;; Output format on stdout: @@ -503,6 +504,32 @@ the live file on the next `task-sorted'." "level-2 DONE/CANCELLED has no CLOSED date — add CLOSED: [YYYY-MM-DD Day]; task-sorted's aging step archives an undated completed task immediately")))))))) ;;; --------------------------------------------------------------------------- +;;; level-3+ dated-header check (claude-rules/todo-format.md) +;; +;; The inverse of the level-2 check above. A completed sub-task — a heading at +;; level 3 or deeper, under a parent task — becomes a dated event-log entry, not +;; a DONE keyword, so the parent's subtree grows a chronological history instead +;; of a long tail of nested DONE lines. An interactive org close +;; (`org-log-done' → DONE + CLOSED) leaves the keyword in place, and +;; `--archive-done' only touches level 2, so these accumulate. Flag them for +;; conversion. Judgment-only and regex-based (independent of which TODO keywords +;; the batch Emacs recognizes); todo-cleanup.el --convert-subtasks does the fix. + +(defun lo--check-subtask-done-not-dated () + "Flag level-3+ headings carrying a done keyword (DONE/CANCELLED/FAILED). +Emits one judgment item per offending heading (checker +`subtask-done-not-dated')." + (save-excursion + (goto-char (point-min)) + ;; Case-sensitive: the keywords are uppercase, not the words in a title. + (let ((case-fold-search nil)) + (while (re-search-forward + "^\\*\\{3,\\} \\(DONE\\|CANCELLED\\|FAILED\\) " nil t) + (lo--emit-judgment + 'subtask-done-not-dated (line-number-at-pos) + "level-3+ done sub-task should be a dated event-log entry (todo-format.md): run todo-cleanup.el --convert-subtasks to rewrite it"))))) + +;;; --------------------------------------------------------------------------- ;;; File processing (defun lo--backup (file) @@ -543,6 +570,7 @@ left unmodified and mechanical entries are recorded with :preview t." (lo--check-empty-headings) (lo--check-malformed-priority-cookies) (lo--check-level2-done-without-closed) + (lo--check-subtask-done-not-dated) (when (and (not lo-check-only) (buffer-modified-p)) (save-buffer))) (with-current-buffer buf (set-buffer-modified-p nil)) diff --git a/claude-templates/.ai/scripts/route-batch b/claude-templates/.ai/scripts/route-batch new file mode 100755 index 0000000..8f27d19 --- /dev/null +++ b/claude-templates/.ai/scripts/route-batch @@ -0,0 +1,175 @@ +#!/usr/bin/env python3 +"""route-batch — the wrap-up router's mechanical go path. + +The wrap-up cross-project router (wrap-it-up.org Step 3; wrapup-routing spec +D7/D8/D9) surfaces the local tasks that inbox process mode stamped with +:ROUTE_CANDIDATE: <destination> at file time, and on "go" delivers each to its +destination project's inbox. This script does the mechanical half so the +subtree surgery is deterministic: + + route-batch --list [--todo todo.org] + One "<destination>\t<heading>" line per :ROUTE_CANDIDATE:-tagged task. + Silent with exit 0 when there are no candidates (the workflow's + empty-set-equals-zero-interaction rule). Read-only. + + route-batch --go [--todo todo.org] + For each candidate, bottom-up: extract the task's whole subtree + (children ride along), drop the :ROUTE_CANDIDATE: line (and the + property drawer if that leaves it empty), promote the subtree so its + top heading is level 1, write it to a temp file, and deliver it via + the sibling inbox-send.py to the destination's inbox/ (one file per + task, from-<source> provenance stamped by inbox-send). Only after a + successful send is the subtree removed from the local todo.org — a + failed send leaves that task in place, is reported, and the run exits + non-zero after attempting the rest. + +The candidate set is exactly the tagged tasks — never the standing backlog. +Discovery, roots, and the source-project name all come from inbox-send.py +(INBOX_SEND_ROOTS sandboxes it in tests). The reject-from-another-project +flow in inbox process mode is the mis-route recovery; that path is why +removing the local source after a successful send is safe. +""" + +import argparse +import os +import re +import subprocess +import sys +import tempfile +from pathlib import Path + +HEADING_RE = re.compile(r"^(\*+)\s+(.*)$") +MARKER_RE = re.compile(r"^\s*:ROUTE_CANDIDATE:\s+(\S+)\s*$") + + +def find_candidates(lines): + """[(heading_idx, end_idx, marker_idx, destination, heading_text)] — + end_idx is one past the subtree's last line.""" + candidates = [] + for i, line in enumerate(lines): + m = MARKER_RE.match(line) + if not m: + continue + head_idx = None + for j in range(i, -1, -1): + hm = HEADING_RE.match(lines[j]) + if hm: + head_idx = j + level = len(hm.group(1)) + heading = hm.group(2) + break + if head_idx is None: + continue + end = len(lines) + for k in range(head_idx + 1, len(lines)): + km = HEADING_RE.match(lines[k]) + if km and len(km.group(1)) <= level: + end = k + break + candidates.append((head_idx, end, i, m.group(1), heading)) + return candidates + + +def extract_handoff(lines, head_idx, end): + """The subtree as handoff text: every :ROUTE_CANDIDATE: line dropped + (a marker is meaningless at the destination), empty drawers pruned, + headings promoted so the task is level 1.""" + sub = [l for l in lines[head_idx:end] if not MARKER_RE.match(l)] + + pruned = [] + i = 0 + while i < len(sub): + if sub[i].strip() == ":PROPERTIES:" and i + 1 < len(sub) and sub[i + 1].strip() == ":END:": + i += 2 + continue + pruned.append(sub[i]) + i += 1 + + shift = len(HEADING_RE.match(pruned[0]).group(1)) - 1 + if shift > 0: + pruned = [l[shift:] if HEADING_RE.match(l) else l for l in pruned] + return "\n".join(pruned).rstrip() + "\n" + + +def send(destination, handoff_text, slug): + inbox_send = Path(__file__).with_name("inbox-send.py") + with tempfile.NamedTemporaryFile( + "w", suffix=".org", prefix=f"route-{slug}-", delete=False, encoding="utf-8" + ) as tf: + tf.write(handoff_text) + tmp = tf.name + try: + result = subprocess.run( + [sys.executable, str(inbox_send), destination, "--file", tmp], + capture_output=True, text=True, + ) + return result.returncode == 0, (result.stderr or result.stdout).strip() + finally: + os.unlink(tmp) + + +def main(): + ap = argparse.ArgumentParser(prog="route-batch") + mode = ap.add_mutually_exclusive_group(required=True) + mode.add_argument("--list", action="store_true", dest="list_mode") + mode.add_argument("--go", action="store_true") + ap.add_argument("--todo", default="todo.org") + args = ap.parse_args() + + todo_path = Path(args.todo) + if not todo_path.is_file(): + return 0 # no todo file, no candidates + lines = todo_path.read_text(encoding="utf-8").splitlines() + candidates = find_candidates(lines) + + # Two markers in one task's drawer are one candidate, not two: same span + + # same destination dedupes. Everything else that overlaps — a tagged child + # inside a tagged parent, one task tagged for two destinations — is a + # conflict: routing either span would silently take the other (or, with a + # stale end index, a bystander task) along. Conflicts are left in place + # and reported; the human untangles which project the pieces belong to. + deduped = [] + for cand in candidates: + if not any(c[0] == cand[0] and c[1] == cand[1] and c[3] == cand[3] for c in deduped): + deduped.append(cand) + conflicted = set() + for a in deduped: + for b in deduped: + if a is not b and a[0] <= b[0] and b[1] <= a[1]: + conflicted.add(a) + conflicted.add(b) + routable = [c for c in deduped if c not in conflicted] + + if not deduped: + return 0 + + if args.list_mode: + for _h, _e, _m, dest, heading in deduped: + flag = "\tCONFLICT (overlapping candidates — resolve by hand)" if (_h, _e, _m, dest, heading) in conflicted else "" + print(f"{dest}\t{heading}{flag}") + return 0 + + failures = 0 + for _h, _e, _m, dest, heading in sorted(conflicted): + failures += 1 + print(f"CONFLICT: {dest}\t{heading}\t(overlapping candidate subtrees — left in place, resolve by hand)") + + # Bottom-up so earlier indices stay valid as subtrees are removed; the + # file is rewritten after every successful send so a crash mid-run never + # leaves an already-sent task still present locally. + for head_idx, end, _marker_idx, dest, heading in sorted(routable, reverse=True): + handoff = extract_handoff(lines, head_idx, end) + slug = re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")[:40] or "task" + ok, detail = send(dest, handoff, slug) + if ok: + del lines[head_idx:end] + todo_path.write_text("\n".join(lines).rstrip("\n") + "\n", encoding="utf-8") + print(f"routed: {dest}\t{heading}") + else: + failures += 1 + print(f"FAILED: {dest}\t{heading}\t({detail})") + return 1 if failures else 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/claude-templates/.ai/scripts/self-inject.sh b/claude-templates/.ai/scripts/self-inject.sh new file mode 100755 index 0000000..e7340c1 --- /dev/null +++ b/claude-templates/.ai/scripts/self-inject.sh @@ -0,0 +1,68 @@ +#!/bin/sh +# self-inject.sh — type text into the tmux pane running this agent session. +# +# The building block for AUTO-FLUSH: an agent checkpoints its session-context, +# then has tmux type "/clear" and a resume prompt at its own idle prompt, so a +# session flushes with no human at the keyboard. +# +# Usage: +# self-inject.sh -t %PANE <delay> <text> [<delay2> <text2> ...] +# self-inject.sh <delay> <text> [...] # derive pane from ancestry +# self-inject.sh [-t %PANE] # no pairs: report the pane +# +# Each pair: sleep <delay> seconds, then type <text> literally and press Enter. +# +# TWO HARD-WON GOTCHAS (2026-07-02, archsetup session): +# 1. A detached child (setsid/nohup/&) of an agent tool call DIES when the +# tool call ends — the harness cleans up the process group. The arm step +# must run under the tmux SERVER instead: +# tmux run-shell -b "self-inject.sh -t %1 25 '/clear' 15 'go — resume...'" +# 2. Under tmux run-shell the process is a child of the tmux server, so +# ancestry-based pane detection CANNOT work there. Derive the pane FIRST, +# synchronously from the agent's own shell (no -t), then pass it +# explicitly with -t when arming. +# +# Collision hazard: if the user happens to be typing when the send fires, the +# injected text merges into their input line (a real /clear became "/clearto" +# mid-word). Auto-flush is for sessions running unattended; warn the user to +# keep hands off for the armed window if they're present. + +PANE="" +if [ "$1" = "-t" ]; then + PANE=$2; shift 2 +fi + +ppid_of() { + # /proc/<pid>/stat: pid (comm) state ppid ... — comm may contain spaces, + # so take the 2nd field after the LAST ')'. + stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1 + # shellcheck disable=SC2086 # word-splitting the stat tail is the point + set -- ${stat##*) } + echo "$2" +} + +find_pane() { + anc=" " + pid=$$ + while [ -n "$pid" ] && [ "$pid" -gt 1 ] 2>/dev/null; do + anc="$anc$pid " + pid=$(ppid_of "$pid") || break + done + tmux list-panes -a -F "#{pane_pid} #{pane_id}" 2>/dev/null | \ + while read -r ppid pane; do + case "$anc" in *" $ppid "*) echo "$pane"; break;; esac + done +} + +[ -n "$PANE" ] || PANE=$(find_pane) +[ -n "$PANE" ] || { echo "self-inject: no owning pane found (pass -t %PANE)" >&2; exit 1; } + +# With no delay/text pairs, just report the pane (the derive-first step). +[ $# -ge 2 ] || { echo "$PANE"; exit 0; } + +while [ $# -ge 2 ]; do + sleep "$1" + tmux send-keys -t "$PANE" -l "$2" + tmux send-keys -t "$PANE" Enter + shift 2 +done diff --git a/claude-templates/.ai/scripts/spec-sort b/claude-templates/.ai/scripts/spec-sort new file mode 100755 index 0000000..ebfef82 --- /dev/null +++ b/claude-templates/.ai/scripts/spec-sort @@ -0,0 +1,715 @@ +#!/usr/bin/env python3 +"""spec-sort — one-time docs-pile retrofit for the docs-lifecycle convention. + +Classifies every docs/**/*.org outside docs/specs/ by one predicate: a doc +carrying BOTH a "Decisions" heading AND an "Implementation phases" heading is +a spec candidate; everything else is a note. For each candidate it shows an +evidence panel (Status field, decision/finding cookies, the linking todo.org +task, recent dated history, cheap existence checks on phase-named artifacts) +and proposes a lifecycle keyword the evidence supports — conservative +non-terminal (DRAFT) when inconclusive. The helper proposes; a human confirms +every move. + +Dry-run report is the default. --apply executes under the fail-safe contract: + + - Clean-worktree preflight: refuses on a dirty git tree (exit 2) unless + --allow-dirty, which prints exactly what recovery loses. + - Every candidate must be addressed with --confirm REL=KEYWORD or + --skip REL; terminal keywords (IMPLEMENTED SUPERSEDED CANCELLED) also + need --reason REL=TEXT, recorded in the status-history line. + - The full move + relink plan is computed and validated first (every + destination free, every link resolvable), written to a plan file, and + only then executed from that recorded plan. + - Bare-path mentions of a moving doc inside the rewritten roots are + reported, never rewritten; they block --apply until --acknowledge-bare + explicitly waives them. + - Mid-apply failure stops the run, names what was and wasn't applied, and + prints the git-restore recovery recipe (plus deletion of newly created + destination copies, which git restore can't remove). + - After a successful apply, a residue scan across the rewritten roots must + find no link still resolving to an old path, or spec-sort exits non-zero + naming the residue. + +Per move: rename to carry the -spec.org suffix, prepend the status heading +(:ID: UUID + dated history line), rewrite the keyword header to the +two-sequence form, mirror the keyword into the Metadata Status field, and +recompute every affected file: link (inbound links to the moved doc AND the +moved doc's own outbound relative links). Rewritten roots: todo.org, +.ai/notes.org, docs/**, .ai/project-workflows/, .ai/project-scripts/. +Reported-never-rewritten: .ai/sessions/ (frozen history) and synced template +paths (.ai/workflows/, .ai/scripts/, .ai/protocols.org — the report names +the canonical claude-templates file instead). + +Finally stamps :LAST_SPEC_SORT: YYYY-MM-DD in .ai/notes.org's +* Workflow State section (created idempotently), which permanently clears +the startup nudge. A run with zero candidates still stamps. + +Exit codes: 0 done (or clean report), 1 blocked (confirm gate, validation, +bare mentions, residue, mid-apply failure), 2 usage / preflight refusal. + +Test hook: SPEC_SORT_INJECT_FAIL_AFTER=N aborts the apply after N write +operations, exercising the recovery path in the bats suite. +""" + +import argparse +import json +import os +import re +import subprocess +import sys +import tempfile +import uuid +from datetime import datetime + +LIFECYCLE = ("DRAFT", "READY", "DOING", "IMPLEMENTED", "SUPERSEDED", "CANCELLED") +TERMINAL = {"IMPLEMENTED", "SUPERSEDED", "CANCELLED"} +TODO_HEADER = [ + "#+TODO: TODO | DONE", + "#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED", +] + +# Project-owned surfaces whose file: links get rewritten. +REWRITE_ROOTS = ("todo.org", ".ai/notes.org", "docs", ".ai/project-workflows", ".ai/project-scripts") +# Frozen or synced surfaces: occurrences are reported, never rewritten. +REPORT_ROOTS = (".ai/sessions", ".ai/workflows", ".ai/scripts", ".ai/protocols.org") +# Synced template paths map to their canonical rulesets file for the report. +SYNCED_PREFIX = (".ai/workflows", ".ai/scripts", ".ai/protocols.org") + +LINK_RE = re.compile(r"\[\[file:([^\]\[]+)\](?:\[([^\]\[]*)\])?\]") +HEADING_RE = re.compile(r"^(\*+)\s+(.*)$") +COOKIE_RE = re.compile(r"\[\d+/\d+\]") +DATED_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b") + + +def read_text(path): + try: + with open(path, encoding="utf-8") as f: + return f.read() + except (UnicodeDecodeError, OSError): + return None + + +def heading_text(line): + """Heading text with the org keyword and priority cookie stripped.""" + m = HEADING_RE.match(line) + if not m: + return None + text = re.sub(r"^[A-Z]+\s+", "", m.group(2)) + text = re.sub(r"^\[#[A-Z]\]\s+", "", text) + return text.strip() + + +def has_spine(content): + """The classification predicate: Decisions AND Implementation phases.""" + dec = imp = False + for line in content.splitlines(): + t = heading_text(line) + if t is None: + continue + tl = t.lower() + if tl.startswith("decisions"): + dec = True + elif tl.startswith("implementation phases"): + imp = True + return dec and imp + + +def walk_files(root, rel_base): + """Yield project-relative paths of files under rel_base (file or dir).""" + abs_base = os.path.join(root, rel_base) + if os.path.isfile(abs_base): + yield rel_base + return + for dirpath, dirs, files in os.walk(abs_base): + dirs.sort() + for name in sorted(files): + yield os.path.relpath(os.path.join(dirpath, name), root) + + +def classify(root): + """Split docs/**/*.org outside docs/specs/ into candidates / anomalies / notes.""" + candidates, anomalies, notes = [], [], [] + docs = os.path.join(root, "docs") + if not os.path.isdir(docs): + return candidates, anomalies, notes + for rel in walk_files(root, "docs"): + if not rel.endswith(".org"): + continue + parts = rel.split(os.sep) + if len(parts) > 1 and parts[1] == "specs": + continue + content = read_text(os.path.join(root, rel)) + if content is None: + continue + if has_spine(content): + candidates.append(rel) + elif os.path.basename(rel).endswith("-spec.org"): + anomalies.append(rel) + else: + notes.append(rel) + return candidates, anomalies, notes + + +def dest_for(rel): + base = os.path.basename(rel) + if not base.endswith("-spec.org"): + base = base[: -len(".org")] + "-spec.org" + return os.path.join("docs", "specs", base) + + +# ---- Evidence panel --------------------------------------------------- + + +def todo_task_for(root, rel): + """Heading of the first todo.org task whose subtree mentions the doc.""" + content = read_text(os.path.join(root, "todo.org")) + if content is None: + return None + lines = content.splitlines() + basename = os.path.basename(rel) + for i, line in enumerate(lines): + if basename in line or rel in line: + for j in range(i, -1, -1): + if HEADING_RE.match(lines[j]): + return lines[j].lstrip("* ").strip() + return None + return None + + +def gather_evidence(root, rel, content): + ev = {} + m = re.search(r"^\|\s*Status\s*\|\s*([^|]*)\|", content, re.MULTILINE | re.IGNORECASE) + ev["status"] = m.group(1).strip() if m else None + + cookies = [] + for line in content.splitlines(): + t = heading_text(line) + if t and COOKIE_RE.search(t) and ( + t.lower().startswith("decisions") or t.lower().startswith("review findings") + ): + cookies.append(t) + ev["cookies"] = cookies + + ev["todo"] = todo_task_for(root, rel) + kw = None + if ev["todo"]: + m = re.match(r"([A-Z]+)\s", ev["todo"]) + kw = m.group(1) if m else None + ev["todo_keyword"] = kw + + dated = [ln.strip() for ln in content.splitlines() if DATED_RE.search(ln)] + ev["history"] = dated[-1][:100] if dated else None + + # Cheap artifact check: =path= tokens inside the Implementation phases section. + artifacts, exists = [], 0 + section = re.split(r"^\*+\s+.*implementation phases.*$", content, maxsplit=1, flags=re.MULTILINE | re.IGNORECASE) + if len(section) > 1: + for tok in re.findall(r"=([^=\s]+)=", section[1]): + if "/" in tok: + artifacts.append(tok) + if os.path.exists(os.path.join(root, tok)): + exists += 1 + ev["artifacts"] = (exists, artifacts) + return ev + + +def propose_keyword(ev): + s = (ev["status"] or "").lower() + words = set(re.findall(r"[a-z]+", s)) + if words & {"implemented", "shipped", "complete", "completed", "done"}: + return "IMPLEMENTED" + if words & {"superseded"}: + return "SUPERSEDED" + if words & {"cancelled", "canceled", "dead", "abandoned"}: + return "CANCELLED" + if words & {"doing", "implementing"} or "in progress" in s or "in-progress" in s: + return "DOING" + if ev["todo_keyword"] == "DOING": + return "DOING" + if words & {"ready", "approved", "accepted"}: + return "READY" + return "DRAFT" # conservative non-terminal default + + +# ---- Link scanning ---------------------------------------------------- + + +def rewrite_files(root): + """Project-relative *.org files under the rewritten roots.""" + seen = [] + for base in REWRITE_ROOTS: + if not os.path.exists(os.path.join(root, base)): + continue + for rel in walk_files(root, base): + if rel.endswith(".org") and rel not in seen: + seen.append(rel) + return seen + + +def resolve_target(root, linker_rel, raw_target, moved): + """Resolve a file: link target to a project-relative path (org semantics + first — relative to the linking file's directory — then project-root + anchoring as a fallback for root-anchored links).""" + if raw_target.startswith(("/", "~", "http:", "https:")): + return None + rel_a = os.path.normpath(os.path.join(os.path.dirname(linker_rel), raw_target)) + if rel_a in moved or os.path.exists(os.path.join(root, rel_a)): + return rel_a + rel_b = os.path.normpath(raw_target) + if rel_b in moved or os.path.exists(os.path.join(root, rel_b)): + return rel_b + return rel_a + + +def plan_link_edits(root, moved): + """Compute every link rewrite: inbound links to moved docs and moved + docs' own outbound relative links. Returns ({linker_rel: [(old, new)]}, + [ambiguity descriptions]) — a link whose file-relative and root-anchored + readings are both live and disagree about a moving doc blocks validation + rather than being rewritten against a guess.""" + edits = {} + ambiguous = [] + for linker in rewrite_files(root): + content = read_text(os.path.join(root, linker)) + if content is None: + continue + linker_post = moved.get(linker, linker) + for m in LINK_RE.finditer(content): + raw = m.group(1) + desc = m.group(2) + target_path, sep, anchor = raw.partition("::") + target = resolve_target(root, linker, target_path, moved) + if target is None: + continue + rel_a = os.path.normpath(os.path.join(os.path.dirname(linker), target_path)) + rel_b = os.path.normpath(target_path) + if rel_a != rel_b: + live_a = rel_a in moved or os.path.exists(os.path.join(root, rel_a)) + live_b = rel_b in moved or os.path.exists(os.path.join(root, rel_b)) + if live_a and live_b and (rel_a in moved or rel_b in moved): + ambiguous.append( + "%s: [[file:%s]] reads as %s (file-relative) or %s (root-anchored) " + "and a moving doc is involved — resolve the link by hand" % (linker, raw, rel_a, rel_b)) + continue + if target not in moved and linker not in moved: + continue + if target not in moved and not os.path.exists(os.path.join(root, target)): + continue # already broken before this run; not ours to guess + target_post = moved.get(target, target) + new_path = os.path.relpath(target_post, os.path.dirname(linker_post) or ".") + new_raw = new_path + (sep + anchor if sep else "") + if new_raw == raw: + continue + new_link = "[[file:%s]%s]" % (new_raw, "[%s]" % desc if desc is not None else "") + if m.group(0) != new_link: + edits.setdefault(linker, []).append((m.group(0), new_link)) + return edits, ambiguous + + +def scan_bare_mentions(root, moved): + """Bare-path mentions of moving docs in the rewritten roots — text + occurrences outside any [[...]] link. Reported, never rewritten.""" + found = [] + for base in REWRITE_ROOTS: + if not os.path.exists(os.path.join(root, base)): + continue + for rel in walk_files(root, base): + content = read_text(os.path.join(root, rel)) + if content is None: + continue + for i, line in enumerate(content.splitlines(), 1): + stripped = re.sub(r"\[\[[^\]]*\](?:\[[^\]]*\])?\]", "", line) + for src in moved: + if src in stripped: + found.append((rel, i, src)) + return found + + +def scan_report_only(root, moved): + """Occurrences of moving docs in frozen/synced surfaces.""" + reports = [] + for base in REPORT_ROOTS: + if not os.path.exists(os.path.join(root, base)): + continue + for rel in walk_files(root, base): + content = read_text(os.path.join(root, rel)) + if content is None: + continue + for src in moved: + if src in content: + if rel.startswith(SYNCED_PREFIX): + note = ("synced template, not rewritten — a local edit is reverted by the " + "next sync; edit the canonical claude-templates/%s instead" % rel) + else: + note = "frozen history; not rewritten" + reports.append((rel, src, note)) + return reports + + +# ---- Content transforms ----------------------------------------------- + + +def transform_spec(content, keyword, reason, title, doc_id, link_edits): + """Apply the retrofit rewrite to a moving spec's content: two-sequence + keyword header, prepended status heading, Status-field mirror, and the + doc's own link edits.""" + for old, new in link_edits: + content = content.replace(old, new) + lines = content.splitlines() + + todo_idx = None + kept = [] + for line in lines: + if line.startswith("#+TODO:"): + if todo_idx is None: + todo_idx = len(kept) + continue + kept.append(line) + lines = kept + if todo_idx is None: + todo_idx = 0 + while todo_idx < len(lines) and lines[todo_idx].startswith("#+"): + todo_idx += 1 + lines[todo_idx:todo_idx] = TODO_HEADER + + head_end = 0 + while head_end < len(lines) and (lines[head_end].startswith("#+") or not lines[head_end].strip()): + head_end += 1 + ts = datetime.now().astimezone().strftime("%Y-%m-%d %a @ %H:%M:%S %z") + provenance = "reason: %s" % reason if reason else "evidence-based, human-confirmed" + block = [ + "* %s %s" % (keyword, title), + ":PROPERTIES:", + ":ID: %s" % doc_id, + ":END:", + "- %s — retrofitted by spec-sort; status set to %s (%s)" % (ts, keyword, provenance), + "", + ] + lines[head_end:head_end] = block + + out = [] + mirrored = False + for line in lines: + m = re.match(r"^(\|\s*Status\s*\|)([^|]*)(\|.*)$", line, re.IGNORECASE) + if m and not mirrored: + value = " %s" % keyword.lower() + width = len(m.group(2)) + line = m.group(1) + (value.ljust(width) if len(value) <= width else value + " ") + m.group(3) + mirrored = True + out.append(line) + return "\n".join(out) + "\n" + + +def title_for(content, rel): + m = re.search(r"^#\+TITLE:\s*(.+)$", content, re.MULTILINE | re.IGNORECASE) + if m: + return m.group(1).strip() + base = os.path.basename(rel)[: -len(".org")] + return base[: -len("-spec")] if base.endswith("-spec") else base + + +# ---- Marker ------------------------------------------------------------ + + +def stamp_marker(root, date): + path = os.path.join(root, ".ai", "notes.org") + os.makedirs(os.path.dirname(path), exist_ok=True) + content = read_text(path) or "" + line = ":LAST_SPEC_SORT: %s" % date + if ":LAST_SPEC_SORT:" in content: + content = re.sub(r":LAST_SPEC_SORT:.*", line, content, count=1) + elif re.search(r"^\* Workflow State\s*$", content, re.MULTILINE): + content = re.sub(r"(^\* Workflow State\s*$)", r"\1\n" + line, content, count=1, flags=re.MULTILINE) + else: + if content and not content.endswith("\n"): + content += "\n" + content += "\n* Workflow State\n\n%s\n" % line + with open(path, "w", encoding="utf-8") as f: + f.write(content) + + +# ---- Apply ------------------------------------------------------------- + + +class ApplyFailure(Exception): + """Mid-apply failure: args are (applied_labels, remaining_ops, cause).""" + + +def apply_plan(root, plan, fail_after): + """Execute the recorded plan. Returns the applied-op labels; raises + ApplyFailure mid-way on a write error or when the test hook fires.""" + ops = [] + for mv in plan["moves"]: + ops.append(("move", mv)) + for linker, edits in plan["link_edits"].items(): + if linker in {mv["src"] for mv in plan["moves"]}: + continue # a moving doc's own edits ride along in its transform + ops.append(("relink", (linker, edits))) + + applied = [] + specs_dir = os.path.join(root, "docs", "specs") + if plan["moves"] and not os.path.isdir(specs_dir): + os.makedirs(specs_dir) + plan["created_dirs"].append(os.path.join("docs", "specs")) + + for n, (kind, payload) in enumerate(ops, 1): + if fail_after and n > fail_after: + raise ApplyFailure(applied, ops[n - 1:], "injected test failure") + try: + if kind == "move": + mv = payload + content = read_text(os.path.join(root, mv["src"])) + new = transform_spec(content, mv["keyword"], mv["reason"], mv["title"], mv["id"], + plan["link_edits"].get(mv["src"], [])) + with open(os.path.join(root, mv["dest"]), "w", encoding="utf-8") as f: + f.write(new) + os.remove(os.path.join(root, mv["src"])) + applied.append("move %s -> %s" % (mv["src"], mv["dest"])) + else: + linker, edits = payload + path = os.path.join(root, linker) + content = read_text(path) + for old, new in edits: + content = content.replace(old, new) + with open(path, "w", encoding="utf-8") as f: + f.write(content) + applied.append("relink %s (%d link%s)" % (linker, len(edits), "s" if len(edits) != 1 else "")) + except OSError as exc: + raise ApplyFailure(applied, ops[n - 1:], str(exc)) + return applied + + +def residue_check(root, plan): + """Post-apply: no link in the rewritten roots may still resolve to an + old path; bare mentions beyond the acknowledged set fail too.""" + moved = {mv["src"]: mv["dest"] for mv in plan["moves"]} + residue = [] + for linker in rewrite_files(root): + content = read_text(os.path.join(root, linker)) + if content is None: + continue + for m in LINK_RE.finditer(content): + target_path = m.group(1).partition("::")[0] + target = resolve_target(root, linker, target_path, {}) + if target in moved: + residue.append("%s: link still resolves to %s" % (linker, target)) + # Acknowledged mentions were recorded pre-apply; a mention inside a moved + # doc now lives at the doc's destination, so map the file side through the + # moves before comparing. + acknowledged = {(moved.get(f, f), src) for f, _ln, src in plan["bare"]} + for f, ln, src in scan_bare_mentions(root, moved): + if (f, src) not in acknowledged: + residue.append("%s:%d: bare mention of %s" % (f, ln, src)) + return residue + + +def print_recovery(plan, applied, not_applied): + print("FAILURE — the apply did not complete.") + print(" applied:") + for a in applied or ["(nothing)"]: + print(" %s" % a) + print(" not applied:") + for kind, payload in not_applied: + if kind == "move": + print(" move %s -> %s" % (payload["src"], payload["dest"])) + else: + print(" relink %s" % payload[0]) + print("RECOVERY — restore the pre-run state (safe: preflight required a clean tree):") + touched = [mv["src"] for mv in plan["moves"]] + [l for l in plan["link_edits"] if l not in {mv["src"] for mv in plan["moves"]}] + print(" git restore -- %s" % " ".join(touched)) + created = [mv["dest"] for mv in plan["moves"]] + print(" rm -f -- %s # git restore can't remove the created copies" % " ".join(created)) + for d in plan.get("created_dirs", []): + print(" rmdir --ignore-fail-on-non-empty -- %s" % d) + + +# ---- Main --------------------------------------------------------------- + + +def parse_kv(pairs, label): + out = {} + for item in pairs or []: + if "=" not in item: + sys.exit("spec-sort: %s expects REL=VALUE, got %r" % (label, item)) + k, v = item.split("=", 1) + out[os.path.normpath(k)] = v + return out + + +def main(): + ap = argparse.ArgumentParser(prog="spec-sort", add_help=True) + ap.add_argument("--project-root", default=".") + ap.add_argument("--apply", action="store_true") + ap.add_argument("--allow-dirty", action="store_true") + ap.add_argument("--acknowledge-bare", action="store_true") + ap.add_argument("--confirm", action="append", metavar="REL=KEYWORD") + ap.add_argument("--reason", action="append", metavar="REL=TEXT") + ap.add_argument("--skip", action="append", metavar="REL") + ap.add_argument("--plan-file") + args = ap.parse_args() + + root = os.path.abspath(args.project_root) + confirms = parse_kv(args.confirm, "--confirm") + reasons = parse_kv(args.reason, "--reason") + skips = {os.path.normpath(s) for s in (args.skip or [])} + + candidates, anomalies, notes = classify(root) + if not candidates and not anomalies and not notes and not os.path.isdir(os.path.join(root, "docs")): + return 0 # no docs pile at all — silent no-op + + for named in list(confirms) + list(skips) + list(reasons): + if named not in candidates: + print("spec-sort: %s is not a spec candidate" % named) + return 1 + for rel, kw in confirms.items(): + if kw not in LIFECYCLE: + print("spec-sort: %r is not a lifecycle keyword (%s)" % (kw, " ".join(LIFECYCLE))) + return 1 + + # ---- Build the plan (shared by report and apply) ---- + moves = [] + for rel in candidates: + if rel in skips: + continue + if args.apply and rel not in confirms: + continue # gate failure reported below + content = read_text(os.path.join(root, rel)) + moves.append({ + "src": rel, + "dest": dest_for(rel), + "keyword": confirms.get(rel, None), + "reason": reasons.get(rel), + "title": title_for(content, rel), + "id": str(uuid.uuid4()), + }) + moved_map = {mv["src"]: mv["dest"] for mv in moves} + link_edits, ambiguous = plan_link_edits(root, moved_map) + bare = scan_bare_mentions(root, moved_map) + reports = scan_report_only(root, moved_map) + + # ---- Report ---- + for rel in candidates: + content = read_text(os.path.join(root, rel)) + ev = gather_evidence(root, rel, content) + proposed = propose_keyword(ev) + print("CANDIDATE %s -> %s" % (rel, dest_for(rel))) + suffix = " (terminal — requires --reason to apply)" if proposed in TERMINAL else "" + print(" proposed keyword: %s%s" % (proposed, suffix)) + print(" evidence:") + print(" status field: %s" % (ev["status"] or "(none)")) + print(" cookies: %s" % ("; ".join(ev["cookies"]) or "(none)")) + print(" todo.org: %s" % (ev["todo"] or "(no linking task)")) + print(" history: %s" % (ev["history"] or "(none)")) + n_exist, artifacts = ev["artifacts"] + if artifacts: + print(" artifacts: %d/%d named paths exist (%s)" % (n_exist, len(artifacts), ", ".join(artifacts))) + else: + print(" artifacts: (none named)") + for rel in anomalies: + print("ANOMALY %s: named -spec.org but lacks the spec spine (Decisions + Implementation phases); surfaced, not moved" % rel) + for rel in notes: + print("NOTE %s" % rel) + for linker, edits in sorted(link_edits.items()): + for old, new in edits: + print("RELINK %s: %s -> %s" % (linker, old, new)) + for a in ambiguous: + print("AMBIGUOUS %s" % a) + for f, ln, src in bare: + print("BARE-PATH %s:%d: %s (reported for manual handling, never rewritten)" % (f, ln, src)) + for rel, src, note in reports: + print("REPORT %s: reference to %s (%s)" % (rel, src, note)) + + if not args.apply: + if candidates or anomalies or notes: + print("DRY RUN — no changes written. Pass --apply with per-candidate --confirm/--skip to execute.") + return 0 + + # ---- Apply: preflight ---- + try: + porcelain = subprocess.run( + ["git", "status", "--porcelain"], cwd=root, + capture_output=True, text=True, check=True, + ).stdout + except (subprocess.CalledProcessError, FileNotFoundError): + print("spec-sort: --apply needs a git worktree (recovery depends on git restore)") + return 2 + if porcelain.strip(): + dirty = [ln[3:] for ln in porcelain.splitlines()] + if not args.allow_dirty: + print("spec-sort: refusing --apply on a dirty worktree (%d path%s). Commit or stash first, or pass --allow-dirty." + % (len(dirty), "s" if len(dirty) != 1 else "")) + return 2 + print("WARNING --allow-dirty: recovery via git restore would also revert your pre-existing uncommitted changes:") + for p in dirty: + print(" %s" % p) + + # ---- Apply: confirm gate ---- + unaddressed = [rel for rel in candidates if rel not in confirms and rel not in skips] + if unaddressed: + print("spec-sort: unconfirmed candidate(s) — pass --confirm REL=KEYWORD or --skip REL for each:") + for rel in unaddressed: + print(" %s" % rel) + return 1 + for mv in moves: + if mv["keyword"] in TERMINAL and not mv["reason"]: + print("spec-sort: %s -> %s is a terminal state and requires an explicit --reason %s=TEXT" + % (mv["src"], mv["keyword"], mv["src"])) + return 1 + + # ---- Apply: validation ---- + problems = [] + dests = {} + for mv in moves: + if os.path.exists(os.path.join(root, mv["dest"])): + problems.append("%s: destination exists (%s)" % (mv["src"], mv["dest"])) + if mv["dest"] in dests: + problems.append("%s and %s: destination exists twice (%s)" % (mv["src"], dests[mv["dest"]], mv["dest"])) + dests[mv["dest"]] = mv["src"] + for a in ambiguous: + problems.append("ambiguous link: %s" % a) + if bare and not args.acknowledge_bare: + problems.append("bare-path mention(s) listed above need manual handling — re-run with --acknowledge-bare to proceed without rewriting them") + if problems: + print("spec-sort: validation blocked — nothing written:") + for p in problems: + print(" %s" % p) + return 1 + + # ---- Apply: record the plan, then execute from it ---- + today = datetime.now().astimezone().strftime("%Y-%m-%d") + plan = { + "root": root, "date": today, "moves": moves, + "link_edits": link_edits, "bare": bare, + "reports": [list(r) for r in reports], "created_dirs": [], + } + plan_path = args.plan_file or os.path.join( + tempfile.gettempdir(), "spec-sort-plan-%s.json" % os.path.basename(root)) + with open(plan_path, "w", encoding="utf-8") as f: + json.dump(plan, f, indent=2) + print("plan written: %s" % plan_path) + + fail_after = int(os.environ.get("SPEC_SORT_INJECT_FAIL_AFTER", "0") or 0) + try: + applied = apply_plan(root, plan, fail_after) + except ApplyFailure as exc: + print("write failed: %s" % exc.args[2]) + print_recovery(plan, exc.args[0], exc.args[1]) + return 1 + + residue = residue_check(root, plan) + if residue: + print("spec-sort: residue after apply — old paths still referenced:") + for r in residue: + print(" %s" % r) + print_recovery(plan, applied, []) + return 1 + + stamp_marker(root, today) + for a in applied: + print("applied: %s" % a) + print("spec-sort: done — %d spec(s) sorted, :LAST_SPEC_SORT: %s stamped" % (len(moves), today)) + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/claude-templates/.ai/scripts/tests/route-batch.bats b/claude-templates/.ai/scripts/tests/route-batch.bats new file mode 100644 index 0000000..84ded5f --- /dev/null +++ b/claude-templates/.ai/scripts/tests/route-batch.bats @@ -0,0 +1,202 @@ +#!/usr/bin/env bats +# +# Tests for claude-templates/.ai/scripts/route-batch — the wrap-up router's +# mechanical go path (wrapup-routing spec, Phase 4 / D7 / D9). +# +# Contract under test: +# route-batch --list one "<destination>\t<heading>" line per task +# carrying :ROUTE_CANDIDATE:; silent when none; +# never modifies anything +# route-batch --go per candidate: write the subtree (minus the +# :ROUTE_CANDIDATE: line) as a one-task handoff, +# deliver via inbox-send to the destination's +# inbox/, then remove the subtree from the local +# todo.org. Send failure leaves the task in +# place and exits non-zero. Empty set: no-op. +# +# Strategy: fixture roots under $TEST_DIR hold a source project and two +# destination projects; INBOX_SEND_ROOTS sandboxes inbox-send's discovery to +# them (the same hook inbox-send's own tests use). + +SCRIPT="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)/route-batch" + +setup() { + TEST_DIR="$(mktemp -d -t route-batch-bats.XXXXXX)" + ROOTS="$TEST_DIR/roots" + SRC="$ROOTS/srcproj" + mkdir -p "$SRC/.ai" "$SRC/inbox" \ + "$ROOTS/alpha/.ai" "$ROOTS/alpha/inbox" \ + "$ROOTS/beta/.ai" "$ROOTS/beta/inbox" + touch "$ROOTS/alpha/todo.org" # alpha has a todo.org; beta deliberately not + + cat > "$SRC/todo.org" <<'EOF' +* Srcproj Open Work +** TODO [#B] Alpha-bound task :feature: +:PROPERTIES: +:ROUTE_CANDIDATE: alpha +:END: +Body line about the alpha work. +*** TODO Sub-task that rides along +** TODO [#C] Purely local task +Local body stays put. +** TODO [#C] Beta-bound task :quick: +:PROPERTIES: +:CREATED: [2026-07-01 Tue] +:ROUTE_CANDIDATE: beta +:END: +Beta body. +EOF + + export INBOX_SEND_ROOTS="$ROOTS" + cd "$SRC" +} + +teardown() { + rm -rf "$TEST_DIR" +} + +# ---- --list ------------------------------------------------------------ + +@test "route-batch --list: one destination+heading line per candidate, backlog excluded" { + run "$SCRIPT" --list + [ "$status" -eq 0 ] + [[ "$output" == *"alpha"*"Alpha-bound task"* ]] + [[ "$output" == *"beta"*"Beta-bound task"* ]] + [[ "$output" != *"Purely local task"* ]] +} + +@test "route-batch --list: empty candidate set is silent (exit 0)" { + sed -i '/:ROUTE_CANDIDATE:/d' todo.org + run "$SCRIPT" --list + [ "$status" -eq 0 ] + [ -z "$output" ] +} + +@test "route-batch --list: modifies nothing (skip leaves all in place)" { + before="$(cat todo.org)" + run "$SCRIPT" --list + [ "$status" -eq 0 ] + [ "$(cat todo.org)" = "$before" ] + [ -z "$(ls "$ROOTS/alpha/inbox" "$ROOTS/beta/inbox" 2>/dev/null | grep -v ':')" ] +} + +# ---- --go -------------------------------------------------------------- + +@test "route-batch --go: delivers each candidate to its destination inbox with provenance" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + alpha_file=$(find "$ROOTS/alpha/inbox" -name '*from-srcproj*' -type f) + beta_file=$(find "$ROOTS/beta/inbox" -name '*from-srcproj*' -type f) + [ -n "$alpha_file" ] + [ -n "$beta_file" ] + grep -q 'Alpha-bound task' "$alpha_file" + grep -q 'Sub-task that rides along' "$alpha_file" # children ride along + grep -q 'Beta-bound task' "$beta_file" + ! grep -q ':ROUTE_CANDIDATE:' "$alpha_file" + ! grep -q ':ROUTE_CANDIDATE:' "$beta_file" +} + +@test "route-batch --go: removes routed subtrees from todo.org, leaves local tasks" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + ! grep -q 'Alpha-bound task' todo.org + ! grep -q 'Sub-task that rides along' todo.org + ! grep -q 'Beta-bound task' todo.org + grep -q 'Purely local task' todo.org + grep -q 'Local body stays put' todo.org +} + +@test "route-batch --go: a kept property drawer survives minus the marker" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + beta_file=$(find "$ROOTS/beta/inbox" -name '*from-srcproj*' -type f) + grep -q ':CREATED: \[2026-07-01 Tue\]' "$beta_file" +} + +@test "route-batch --go: destination with inbox/ but no todo.org still delivers" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + [ ! -f "$ROOTS/beta/todo.org" ] + [ -n "$(find "$ROOTS/beta/inbox" -name '*from-srcproj*' -type f)" ] +} + +@test "route-batch --go: empty candidate set is a silent no-op (exit 0)" { + sed -i '/:ROUTE_CANDIDATE:/d' todo.org + before="$(cat todo.org)" + run "$SCRIPT" --go + [ "$status" -eq 0 ] + [ -z "$output" ] + [ "$(cat todo.org)" = "$before" ] +} + +@test "route-batch --go: a failed send leaves that task in place, marker intact, and exits non-zero" { + sed -i 's/:ROUTE_CANDIDATE: beta/:ROUTE_CANDIDATE: ghost/' todo.org + run "$SCRIPT" --go + [ "$status" -ne 0 ] + grep -q 'Beta-bound task' todo.org # failed route stays local + grep -q ':ROUTE_CANDIDATE: ghost' todo.org # marker survives so it resurfaces next wrap + ! grep -q 'Alpha-bound task' todo.org # the good route still landed + [ -n "$(find "$ROOTS/alpha/inbox" -name '*from-srcproj*' -type f)" ] +} + +@test "route-batch --go: handoff headings are promoted to top level" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + alpha_file=$(find "$ROOTS/alpha/inbox" -name '*from-srcproj*' -type f) + grep -q '^\* TODO \[#B\] Alpha-bound task' "$alpha_file" + grep -q '^\*\* TODO Sub-task that rides along' "$alpha_file" +} + +@test "route-batch --go: a drawer emptied by the marker strip is pruned from the handoff" { + run "$SCRIPT" --go + [ "$status" -eq 0 ] + alpha_file=$(find "$ROOTS/alpha/inbox" -name '*from-srcproj*' -type f) + ! grep -q ':PROPERTIES:' "$alpha_file" +} + +# ---- Overlapping candidates (nested marker data-loss regression) -------- + +@test "route-batch --go: nested candidates conflict — both stay, bystander survives, exit non-zero" { + cat > todo.org <<'EOF' +* Srcproj Open Work +** TODO [#B] Parent bound for alpha +:PROPERTIES: +:ROUTE_CANDIDATE: alpha +:END: +Parent body. +*** TODO Child bound for beta +:PROPERTIES: +:ROUTE_CANDIDATE: beta +:END: +Child body. +** TODO [#C] Innocent bystander task +Bystander body. +EOF + run "$SCRIPT" --go + [ "$status" -ne 0 ] + [[ "$output" == *"CONFLICT"* ]] + grep -q 'Parent bound for alpha' todo.org + grep -q 'Child bound for beta' todo.org + grep -q 'Innocent bystander task' todo.org + grep -q 'Bystander body' todo.org + [ -z "$(find "$ROOTS/alpha/inbox" "$ROOTS/beta/inbox" -name '*from-srcproj*' -type f)" ] +} + +@test "route-batch: duplicate identical markers in one drawer dedupe to a single route" { + cat > todo.org <<'EOF' +* Srcproj Open Work +** TODO [#B] Double-tagged for alpha +:PROPERTIES: +:ROUTE_CANDIDATE: alpha +:ROUTE_CANDIDATE: alpha +:END: +Body. +EOF + run "$SCRIPT" --list + [ "$status" -eq 0 ] + [ "$(echo "$output" | grep -c 'Double-tagged')" -eq 1 ] + [[ "$output" != *"CONFLICT"* ]] + run "$SCRIPT" --go + [ "$status" -eq 0 ] + [ "$(find "$ROOTS/alpha/inbox" -name '*from-srcproj*' -type f | wc -l)" -eq 1 ] +} diff --git a/claude-templates/.ai/scripts/tests/self-inject.bats b/claude-templates/.ai/scripts/tests/self-inject.bats new file mode 100644 index 0000000..482f61d --- /dev/null +++ b/claude-templates/.ai/scripts/tests/self-inject.bats @@ -0,0 +1,78 @@ +#!/usr/bin/env bats +# Tests for self-inject.sh — tmux is the external boundary, stubbed with a +# recording fake so no real server is needed. + +setup() { + SCRIPT="$BATS_TEST_DIRNAME/../self-inject.sh" + STUB_DIR="$BATS_TEST_TMPDIR/bin" + LOG="$BATS_TEST_TMPDIR/tmux.log" + mkdir -p "$STUB_DIR" +} + +# A tmux stub that records every invocation and answers list-panes from +# $STUB_PANES (empty by default, so pane derivation fails unless a test +# provides ancestry-matching output). +make_stub() { + cat > "$STUB_DIR/tmux" <<'EOF' +#!/bin/sh +echo "$@" >> "$LOG" +case "$1" in + list-panes) printf '%s\n' "$STUB_PANES" ;; +esac +EOF + chmod +x "$STUB_DIR/tmux" +} + +@test "self-inject: -t pane with no pairs echoes the pane and exits 0" { + make_stub + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="" sh "$SCRIPT" -t %42 + [ "$status" -eq 0 ] + [ "$output" = "%42" ] + # Pane was supplied, nothing sent: tmux must not have been called. + [ ! -e "$LOG" ] +} + +@test "self-inject: no pane derivable and no -t exits 1 with an error" { + make_stub + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="" sh "$SCRIPT" 0 "hello" + [ "$status" -eq 1 ] + case "$output" in *"no owning pane"*) : ;; *) false ;; esac +} + +@test "self-inject: derives the pane from process ancestry via list-panes" { + make_stub + # The stub reports the bats test process itself as a pane's pane_pid; + # the script runs as our child, so that pid is in its ancestry. + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="$$ %7" sh "$SCRIPT" + [ "$status" -eq 0 ] + [ "$output" = "%7" ] +} + +@test "self-inject: one delay/text pair sends literal text then Enter" { + make_stub + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="" sh "$SCRIPT" -t %3 0 "/clear" + [ "$status" -eq 0 ] + run cat "$LOG" + [ "${lines[0]}" = "send-keys -t %3 -l /clear" ] + [ "${lines[1]}" = "send-keys -t %3 Enter" ] +} + +@test "self-inject: multiple pairs send in order" { + make_stub + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="" \ + sh "$SCRIPT" -t %3 0 "/clear" 0 "go — resume" + [ "$status" -eq 0 ] + run cat "$LOG" + [ "${lines[0]}" = "send-keys -t %3 -l /clear" ] + [ "${lines[1]}" = "send-keys -t %3 Enter" ] + [ "${lines[2]}" = "send-keys -t %3 -l go — resume" ] + [ "${lines[3]}" = "send-keys -t %3 Enter" ] +} + +@test "self-inject: dangling odd argument after pairs is ignored" { + make_stub + run env PATH="$STUB_DIR:$PATH" LOG="$LOG" STUB_PANES="" sh "$SCRIPT" -t %3 0 "one" 99 + [ "$status" -eq 0 ] + run cat "$LOG" + [ "${#lines[@]}" -eq 2 ] +} diff --git a/claude-templates/.ai/scripts/tests/spec-sort.bats b/claude-templates/.ai/scripts/tests/spec-sort.bats new file mode 100644 index 0000000..583e458 --- /dev/null +++ b/claude-templates/.ai/scripts/tests/spec-sort.bats @@ -0,0 +1,453 @@ +#!/usr/bin/env bats +# +# Tests for claude-templates/.ai/scripts/spec-sort — the one-time docs-pile +# retrofit from the docs-lifecycle spec: classify docs/**/*.org outside +# docs/specs/ (spec candidate iff it carries BOTH a Decisions heading AND an +# Implementation phases heading), show an evidence panel, and on --apply +# move + rename confirmed candidates to docs/specs/*-spec.org, prepend the +# status heading (:ID:, dated history line), rewrite the keyword header to +# the two-sequence form, relink file: links across the rewritten roots, +# stamp :LAST_SPEC_SORT: in .ai/notes.org. +# +# Contract under test (docs/specs/2026-07-01-docs-lifecycle-spec.org, +# "The retrofit"): +# - dry-run report is the default; --apply writes +# - --apply refuses on a dirty worktree (exit 2) unless --allow-dirty +# - every candidate needs --confirm REL=KEYWORD or --skip REL (exit 1 +# otherwise); terminal keywords need --reason REL=TEXT +# - plan validated before the first write; destination collisions block +# - bare-path mentions in rewritten roots block --apply until +# --acknowledge-bare waives them (reported, never rewritten) +# - mid-apply failure names applied/not-applied + git restore recovery +# - idempotent: a sorted project yields no candidates, no changes +# +# Strategy: each test builds a throwaway git project fixture and runs the +# real script against it. Mid-apply failure is forced via the test-only +# SPEC_SORT_INJECT_FAIL_AFTER env hook. + +SCRIPT="$(cd "$(dirname "$BATS_TEST_FILENAME")/.." && pwd)/spec-sort" + +setup() { + TEST_DIR="$(mktemp -d -t spec-sort-bats.XXXXXX)" + PROJ="$TEST_DIR/proj" + mkdir -p "$PROJ" +} + +teardown() { + rm -rf "$TEST_DIR" +} + +# Standard fixture: one spec candidate, one note, a stray root spec with a +# spine, an anomaly (-spec.org name, no spine), inbound links from todo.org, +# a sibling note, a session archive (report-only surface), and .ai/notes.org +# with a Workflow State section. +make_project() { + cd "$PROJ" + git init -q + git config user.email test@test + git config user.name test + mkdir -p docs/design .ai/sessions + + cat > docs/design/widget.org <<'EOF' +#+TITLE: Widget Feature +#+DATE: 2026-05-01 +#+TODO: DRAFT REVIEW | SHIPPED + +* Metadata +| Status | draft | +| Owner | Craig | + +* Summary +The widget feature. See [[file:scratch-note.org][the note]]. + +* Decisions [1/2] +** DONE Pick the widget shape +** TODO Pick the color + +* Implementation phases +** Phase 1 — build =src/widget.py= +EOF + + cat > docs/design/scratch-note.org <<'EOF' +#+TITLE: Scratch Note + +* Metadata +| Status | n/a | + +* Thoughts +See [[file:widget.org][the widget spec]]. +EOF + + cat > docs/rooty-spec.org <<'EOF' +#+TITLE: Rooty + +* Decisions +** DONE Only decision + +* Implementation phases +** Phase 1 — nothing +EOF + + cat > docs/lonely-spec.org <<'EOF' +#+TITLE: Lonely +Just prose, no spine. +EOF + + cat > todo.org <<'EOF' +* Open Work +** DOING [#B] Widget feature +Spec: [[file:docs/design/widget.org][widget spec]]. +Summary anchor: [[file:docs/design/widget.org::*Summary][the summary]]. +EOF + + cat > .ai/notes.org <<'EOF' +* Active Reminders + +* Workflow State +:LAST_AUDIT: 2026-06-28 +EOF + + cat > .ai/sessions/2026-06-01-old.org <<'EOF' +Old log: [[file:../../docs/design/widget.org][widget]] +EOF + + git add -A + git commit -qm init +} + +# Confirm flags that satisfy the gate for the standard fixture's candidates. +CONFIRM_ALL=(--confirm docs/design/widget.org=DRAFT --confirm docs/rooty-spec.org=DRAFT) + +# ---- Classification (dry-run) ---------------------------------------- + +@test "spec-sort: dry-run classifies the spine-carrying doc as a candidate" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"CANDIDATE docs/design/widget.org -> docs/specs/widget-spec.org"* ]] +} + +@test "spec-sort: a Metadata table alone does not qualify — note stays a note" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"NOTE docs/design/scratch-note.org"* ]] + [[ "$output" != *"CANDIDATE docs/design/scratch-note.org"* ]] +} + +@test "spec-sort: stray root spec with a spine is a candidate, suffix not doubled" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"CANDIDATE docs/rooty-spec.org -> docs/specs/rooty-spec.org"* ]] + [[ "$output" != *"rooty-spec-spec.org"* ]] +} + +@test "spec-sort: -spec.org name without a spine is an anomaly, never auto-moved" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"ANOMALY docs/lonely-spec.org"* ]] + [[ "$output" != *"CANDIDATE docs/lonely-spec.org"* ]] +} + +@test "spec-sort: docs/specs/ contents are excluded from classification" { + make_project + mkdir -p docs/specs + cp docs/design/widget.org docs/specs/sorted-spec.org + git add -A && git commit -qm more + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" != *"CANDIDATE docs/specs/sorted-spec.org"* ]] +} + +@test "spec-sort: no docs/ directory is a silent no-op" { + cd "$PROJ" + git init -q + git config user.email test@test + git config user.name test + echo x > README.md + git add -A && git commit -qm init + run "$SCRIPT" + [ "$status" -eq 0 ] + [ -z "$output" ] +} + +# ---- Evidence panel --------------------------------------------------- + +@test "spec-sort: evidence panel shows status field, cookies, and todo.org task" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"status field: draft"* ]] + [[ "$output" == *"Decisions [1/2]"* ]] + [[ "$output" == *"todo.org:"*"DOING"*"Widget feature"* ]] +} + +@test "spec-sort: keyword proposal follows the evidence — DOING from the linked DOING task" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + # status field says draft, but the linking todo.org task is DOING — the + # panel proposes the state the strongest evidence supports + [[ "$output" == *"proposed keyword: DOING"* ]] +} + +@test "spec-sort: an 'incomplete' status field never proposes the terminal IMPLEMENTED" { + make_project + sed -i 's/| Status | draft |/| Status | incomplete |/' docs/design/widget.org + git add -A && git commit -qm status + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" != *"proposed keyword: IMPLEMENTED"* ]] +} + +# ---- Confirm gate ----------------------------------------------------- + +@test "spec-sort --apply: refuses when a candidate is neither confirmed nor skipped" { + make_project + run "$SCRIPT" --apply --confirm docs/design/widget.org=DRAFT + [ "$status" -eq 1 ] + [[ "$output" == *"unconfirmed"* ]] + [[ "$output" == *"docs/rooty-spec.org"* ]] + [ -f docs/design/widget.org ] # nothing moved +} + +@test "spec-sort --apply: a terminal keyword without --reason refuses" { + make_project + run "$SCRIPT" --apply --confirm docs/design/widget.org=IMPLEMENTED --skip docs/rooty-spec.org + [ "$status" -eq 1 ] + [[ "$output" == *"--reason"* ]] + [ -f docs/design/widget.org ] +} + +@test "spec-sort --apply: a terminal keyword with --reason records it in the history line" { + make_project + run "$SCRIPT" --apply --confirm docs/design/widget.org=IMPLEMENTED \ + --reason "docs/design/widget.org=shipped in v2, confirmed against src" \ + --skip docs/rooty-spec.org + [ "$status" -eq 0 ] + grep -q '^\* IMPLEMENTED Widget Feature' docs/specs/widget-spec.org + grep -q 'shipped in v2, confirmed against src' docs/specs/widget-spec.org +} + +@test "spec-sort --apply: --skip leaves the candidate in place and still stamps the marker" { + make_project + run "$SCRIPT" --apply --skip docs/design/widget.org --skip docs/rooty-spec.org + [ "$status" -eq 0 ] + [ -f docs/design/widget.org ] + grep -q ':LAST_SPEC_SORT:' .ai/notes.org +} + +# ---- Preflight -------------------------------------------------------- + +@test "spec-sort --apply: refuses on a dirty worktree (exit 2)" { + make_project + echo "drift" >> todo.org + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 2 ] + [[ "$output" == *"dirty"* ]] + [ -f docs/design/widget.org ] +} + +@test "spec-sort --apply --allow-dirty: proceeds and names what recovery loses" { + make_project + echo "drift" >> todo.org + git add todo.org && git commit -qm drift # keep the link intact; dirty a different file + echo "scratch" > untracked-note.txt + echo "local edit" >> .ai/notes.org + run "$SCRIPT" --apply --allow-dirty "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + [[ "$output" == *"pre-existing"* ]] + [[ "$output" == *".ai/notes.org"* ]] + [ -f docs/specs/widget-spec.org ] +} + +# ---- Move + rename + rewrite ------------------------------------------ + +@test "spec-sort --apply: moves, renames to -spec.org, prepends status heading with :ID: and history" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + [ -f docs/specs/widget-spec.org ] + [ ! -f docs/design/widget.org ] + grep -q '^\* DRAFT Widget Feature' docs/specs/widget-spec.org + grep -q ':ID:' docs/specs/widget-spec.org + grep -q 'retrofitted by spec-sort' docs/specs/widget-spec.org +} + +@test "spec-sort --apply: keyword header rewritten to the two-sequence form" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '^#+TODO: TODO | DONE$' docs/specs/widget-spec.org + grep -q '^#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED$' docs/specs/widget-spec.org + ! grep -q 'DRAFT REVIEW | SHIPPED' docs/specs/widget-spec.org +} + +@test "spec-sort --apply: Metadata Status field mirrors the confirmed keyword in lowercase" { + make_project + run "$SCRIPT" --apply --confirm docs/design/widget.org=READY --skip docs/rooty-spec.org + [ "$status" -eq 0 ] + grep -q '^\* READY Widget Feature' docs/specs/widget-spec.org + grep -Eq '^\| Status[[:space:]]*\|[[:space:]]*ready' docs/specs/widget-spec.org +} + +# ---- Relink ----------------------------------------------------------- + +@test "spec-sort --apply: rewrites the todo.org link, preserving the description" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '\[\[file:docs/specs/widget-spec.org\]\[widget spec\]\]' todo.org + ! grep -q 'docs/design/widget.org' todo.org +} + +@test "spec-sort --apply: preserves a ::anchor suffix through the rewrite" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '\[\[file:docs/specs/widget-spec.org::\*Summary\]\[the summary\]\]' todo.org +} + +@test "spec-sort --apply: recomputes a sibling note's relative link to the moved spec" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '\[\[file:../specs/widget-spec.org\]\[the widget spec\]\]' docs/design/scratch-note.org +} + +@test "spec-sort --apply: recomputes the moved spec's own outbound link to an unmoved note" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '\[\[file:../design/scratch-note.org\]\[the note\]\]' docs/specs/widget-spec.org +} + +@test "spec-sort: session archives are reported, never rewritten" { + make_project + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"REPORT .ai/sessions/2026-06-01-old.org"* ]] + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q 'docs/design/widget.org' .ai/sessions/2026-06-01-old.org +} + +@test "spec-sort: a synced template path report names the canonical rulesets file" { + make_project + mkdir -p .ai/workflows + echo 'See [[file:../../docs/design/widget.org][widget]]' > .ai/workflows/startup.org + git add -A && git commit -qm wf + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" == *"REPORT .ai/workflows/startup.org"* ]] + [[ "$output" == *"claude-templates/.ai/workflows/startup.org"* ]] +} + +# ---- Bare-path mentions ----------------------------------------------- + +@test "spec-sort --apply: a bare-path mention in a rewritten root blocks until acknowledged" { + make_project + echo "raw mention: docs/design/widget.org needs review" >> todo.org + git add -A && git commit -qm bare + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 1 ] + [[ "$output" == *"BARE"* ]] + [ -f docs/design/widget.org ] # nothing moved + run "$SCRIPT" --apply --acknowledge-bare "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q 'raw mention: docs/design/widget.org' todo.org # reported, never rewritten +} + +@test "spec-sort --apply: a moving doc's bare mention of its own old path is acknowledgeable, not post-apply residue" { + make_project + echo "History: docs/design/widget.org was drafted in May." >> docs/design/widget.org + git add -A && git commit -qm selfmention + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 1 ] + [[ "$output" == *"BARE"* ]] + run "$SCRIPT" --apply --acknowledge-bare "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] # the acknowledged mention rides along to docs/specs/; not residue + grep -q ':LAST_SPEC_SORT:' .ai/notes.org +} + +# ---- Plan validation --------------------------------------------------- + +@test "spec-sort --apply: a destination collision blocks validation, nothing moved" { + make_project + mkdir -p docs/specs + echo "occupied" > docs/specs/widget-spec.org + git add -A && git commit -qm occupy + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 1 ] + [[ "$output" == *"destination exists"* ]] + [ -f docs/design/widget.org ] + [ "$(cat docs/specs/widget-spec.org)" = "occupied" ] +} + +@test "spec-sort --apply: writes the plan file before executing" { + make_project + run "$SCRIPT" --apply --plan-file "$TEST_DIR/plan.json" "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + [ -f "$TEST_DIR/plan.json" ] + grep -q 'widget-spec.org' "$TEST_DIR/plan.json" +} + +# ---- Mid-apply failure recovery ---------------------------------------- + +@test "spec-sort --apply: forced mid-apply failure yields named recovery, not a half-migrated shrug" { + make_project + run env SPEC_SORT_INJECT_FAIL_AFTER=1 "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 1 ] + [[ "$output" == *"RECOVERY"* ]] + [[ "$output" == *"git restore"* ]] + [[ "$output" == *"applied"* ]] + [[ "$output" == *"not applied"* ]] + ! grep -q ':LAST_SPEC_SORT:' .ai/notes.org # no stamp on a failed apply +} + +# ---- Idempotence + marker ---------------------------------------------- + +@test "spec-sort --apply: stamps :LAST_SPEC_SORT: in the Workflow State section" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q ':LAST_SPEC_SORT: ' .ai/notes.org + # lands inside the Workflow State section, alongside the existing marker + awk '/^\* Workflow State/{ws=1} ws && /:LAST_SPEC_SORT:/{found=1} END{exit !found}' .ai/notes.org +} + +@test "spec-sort --apply: creates the Workflow State section when notes.org lacks it" { + make_project + printf '* Active Reminders\n' > .ai/notes.org + git add -A && git commit -qm notes + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + grep -q '^\* Workflow State' .ai/notes.org + grep -q ':LAST_SPEC_SORT: ' .ai/notes.org +} + +@test "spec-sort --apply: zero candidates still stamps the marker (clears the nudge)" { + make_project + rm docs/design/widget.org docs/rooty-spec.org docs/lonely-spec.org + git add -A && git commit -qm notes-only + run "$SCRIPT" --apply + [ "$status" -eq 0 ] + grep -q ':LAST_SPEC_SORT:' .ai/notes.org +} + +@test "spec-sort: a second run after a successful apply finds nothing to do" { + make_project + run "$SCRIPT" --apply "${CONFIRM_ALL[@]}" + [ "$status" -eq 0 ] + git add -A && git commit -qm sorted + run "$SCRIPT" + [ "$status" -eq 0 ] + [[ "$output" != *"CANDIDATE"* ]] + run "$SCRIPT" --apply + [ "$status" -eq 0 ] + run git status --porcelain + # only the re-stamped marker (same date) may differ — tree stays clean + [ -z "$(git status --porcelain -- docs todo.org)" ] +} diff --git a/claude-templates/.ai/scripts/tests/test-lint-org.el b/claude-templates/.ai/scripts/tests/test-lint-org.el index 3b8a9bb..d14879f 100644 --- a/claude-templates/.ai/scripts/tests/test-lint-org.el +++ b/claude-templates/.ai/scripts/tests/test-lint-org.el @@ -685,6 +685,37 @@ missing-rules violation." (judgments (lo-test--judgments (plist-get out :issues)))) (should-not (member 'level-2-dated-header (lo-test--checkers judgments))))) +;;; subtask-done-not-dated check (the inverse: level-3+ done keyword) + +(ert-deftest lo-subtask-done-not-dated-flags-level3 () + "A level-3 DONE sub-task still carrying the keyword is flagged for conversion." + (let* ((out (lo-test--run + "* Open Work\n\n** TODO [#B] Parent\n*** DONE [#C] Sub-task done\nCLOSED: [2026-06-20 Sat 10:00]\nBody.\n")) + (judgments (lo-test--judgments (plist-get out :issues)))) + (should (= 0 (plist-get out :fixes))) ; judgment-only, never auto-fixed + (should (member 'subtask-done-not-dated (lo-test--checkers judgments))))) + +(ert-deftest lo-subtask-done-not-dated-flags-level4-cancelled () + "A level-4 CANCELLED sub-task is flagged too." + (let* ((out (lo-test--run + "* Open Work\n\n** PROJECT [#B] Parent\n*** TODO Mid\n**** CANCELLED Deep abandoned\nCLOSED: [2026-06-20 Sat]\n")) + (judgments (lo-test--judgments (plist-get out :issues)))) + (should (member 'subtask-done-not-dated (lo-test--checkers judgments))))) + +(ert-deftest lo-subtask-done-not-dated-ignores-level2 () + "A level-2 DONE task is a top-level task, not a sub-task — this checker skips it." + (let* ((out (lo-test--run + "* Open Work\n\n** DONE [#B] Top-level\nCLOSED: [2026-06-20 Sat]\nBody.\n")) + (judgments (lo-test--judgments (plist-get out :issues)))) + (should-not (member 'subtask-done-not-dated (lo-test--checkers judgments))))) + +(ert-deftest lo-subtask-done-not-dated-ignores-dated-and-lowercase () + "An already-dated level-3 entry, and the word done in a title, are not flagged." + (let* ((out (lo-test--run + "* Open Work\n\n** TODO [#B] Parent\n*** 2026-06-20 Sat @ 10:00:00 -0400 landed\n*** TODO wrap the done cleanup\n")) + (judgments (lo-test--judgments (plist-get out :issues)))) + (should-not (member 'subtask-done-not-dated (lo-test--checkers judgments))))) + ;;; --------------------------------------------------------------------------- ;;; structural heading checks (org-lint gaps) diff --git a/claude-templates/.ai/scripts/tests/test-todo-cleanup.el b/claude-templates/.ai/scripts/tests/test-todo-cleanup.el index e569d9a..ffbf2fb 100644 --- a/claude-templates/.ai/scripts/tests/test-todo-cleanup.el +++ b/claude-templates/.ai/scripts/tests/test-todo-cleanup.el @@ -768,5 +768,176 @@ in ISSUES, in document order." (should (= 2 (plist-get once :bumped))) (should (= 2 (plist-get twice :bumped))))) +;;; --------------------------------------------------------------------------- +;;; --convert-subtasks harness + tests + +(defun tc-test--reset-convert (&optional check) + (setq tc-fixes 0 tc-archived 0 tc-bumped 0 tc-converted 0 tc-archived-to-file 0 + tc-issues nil + tc-check-only (and check t) + tc-archive-done nil tc-sync-child-priority nil tc-convert-subtasks t + tc-current-file nil + tc-archive-retain-days nil tc-archive-reference-date nil tc-archive-file nil)) + +(defun tc-test--convert (content &optional runs check) + "Write CONTENT to a temp .org file, run `--convert-subtasks' RUNS times (default 1). +Return a plist: :result final file contents, :converted count from the last run, +:issues from the last run. CHECK non-nil ⇒ --check (preview, no writes)." + (let ((file (make-temp-file "tc-test-" nil ".org")) + last-converted last-issues) + (unwind-protect + (progn + (with-temp-file file (insert content)) + (dotimes (_ (or runs 1)) + (tc-test--reset-convert check) + (tc-process-file file) + (setq last-converted tc-converted last-issues tc-issues) + (tc-test--drop-buffer file)) + (list :result (with-temp-buffer (insert-file-contents file) + (buffer-string)) + :converted last-converted + :issues last-issues)) + (tc-test--drop-buffer file) + (delete-file file)))) + +;; The UTC offset in a converted header is the test machine's local offset for +;; that date, so assertions match it as `[-+]NNNN' rather than a fixed value — +;; the mode's job is to emit a well-formed offset, not to run in one timezone. + +(defconst tc-test--convert-timed + "* Project Open Work +** TODO [#B] Parent task +*** DONE [#C] F12 opens the terminal :feature:quick: +CLOSED: [2026-06-27 Sat 12:50] +Verified live: docks, toggles, colors clean. +") + +(ert-deftest tc-convert-timed-subtask-normal () + "Normal: a timed CLOSED close becomes a dated header, keyword/priority/tags/CLOSED gone." + (let* ((out (tc-test--convert tc-test--convert-timed)) + (res (plist-get out :result))) + (should (= 1 (plist-get out :converted))) + (should (string-match-p + "^\\*\\*\\* 2026-06-27 Sat @ 12:50:00 [-+][0-9]\\{4\\} F12 opens the terminal$" + res)) + (should-not (string-match-p "CLOSED:" res)) + (should-not (string-match-p "DONE" res)) + (should (string-match-p "Verified live: docks, toggles, colors clean\\." res)) + (should (string-match-p "^\\*\\* TODO \\[#B\\] Parent task$" res)))) + +(defconst tc-test--convert-dateonly + "* Project Open Work +** PROJECT [#B] Parent +**** DONE [#B] Write full spec :refactor: +CLOSED: [2026-05-04 Mon] +Body. +") + +(ert-deftest tc-convert-dateonly-boundary-midnight () + "Boundary: a date-only CLOSED (no time) yields 00:00:00, at level 4." + (let ((res (plist-get (tc-test--convert tc-test--convert-dateonly) :result))) + (should (string-match-p + "^\\*\\*\\*\\* 2026-05-04 Mon @ 00:00:00 [-+][0-9]\\{4\\} Write full spec$" + res)) + (should-not (string-match-p "CLOSED:" res)))) + +(defconst tc-test--convert-level2 + "* Project Open Work +** DONE [#B] Top-level task +CLOSED: [2026-06-01 Mon 09:00] +Body. +") + +(ert-deftest tc-convert-leaves-level-2-alone-boundary () + "Boundary: a level-2 DONE task is a top-level task, not a sub-task — untouched." + (let ((out (tc-test--convert tc-test--convert-level2))) + (should (= 0 (plist-get out :converted))) + (should (equal tc-test--convert-level2 (plist-get out :result))))) + +(ert-deftest tc-convert-idempotent-boundary () + "Boundary: a second run over an already-dated entry converts nothing new." + (let ((once (tc-test--convert tc-test--convert-timed 1)) + (twice (tc-test--convert tc-test--convert-timed 2))) + (should (equal (plist-get once :result) (plist-get twice :result))) + (should (= 0 (plist-get twice :converted))))) + +(defconst tc-test--convert-nested + "* Project Open Work +** TODO [#B] Parent +*** DONE Outer sub :feature: +CLOSED: [2026-06-10 Wed 08:15] +**** DONE Inner sub +CLOSED: [2026-06-09 Tue 07:00] +Inner body. +") + +(ert-deftest tc-convert-nested-done-subtasks-boundary () + "Boundary: a done sub-task nested under a done sub-task — both convert." + (let* ((out (tc-test--convert tc-test--convert-nested)) + (res (plist-get out :result))) + (should (= 2 (plist-get out :converted))) + (should (string-match-p + "^\\*\\*\\* 2026-06-10 Wed @ 08:15:00 [-+][0-9]\\{4\\} Outer sub$" res)) + (should (string-match-p + "^\\*\\*\\*\\* 2026-06-09 Tue @ 07:00:00 [-+][0-9]\\{4\\} Inner sub$" res)) + (should-not (string-match-p "CLOSED:" res)))) + +(defconst tc-test--convert-cancelled + "* Project Open Work +** TODO [#B] Parent +*** CANCELLED [#C] Abandoned idea :feature: +CLOSED: [2026-06-15 Mon 10:00] +") + +(ert-deftest tc-convert-cancelled-subtask-boundary () + "Boundary: a CANCELLED sub-task converts too (terminal state)." + (let ((res (plist-get (tc-test--convert tc-test--convert-cancelled) :result))) + (should (string-match-p + "^\\*\\*\\* 2026-06-15 Mon @ 10:00:00 [-+][0-9]\\{4\\} Abandoned idea$" res)) + (should-not (string-match-p "CANCELLED" res)))) + +(defconst tc-test--convert-noclosed + "* Project Open Work +** TODO [#B] Parent +*** DONE Orphan with no closed date +Body only. +") + +(ert-deftest tc-convert-skips-subtask-without-closed-error () + "Error: a done sub-task with no parseable CLOSED is flagged and left unchanged." + (let ((out (tc-test--convert tc-test--convert-noclosed))) + (should (= 0 (plist-get out :converted))) + (should (equal tc-test--convert-noclosed (plist-get out :result))) + (should (cl-some (lambda (i) (eq (plist-get i :kind) 'convert-skip)) + (plist-get out :issues))))) + +(ert-deftest tc-convert-check-mode-previews-without-writing () + "Check mode reports the conversion but writes nothing." + (let ((out (tc-test--convert tc-test--convert-timed 1 t))) + (should (= 1 (plist-get out :converted))) + (should (equal tc-test--convert-timed (plist-get out :result))) + (should (cl-some (lambda (i) (eq (plist-get i :kind) 'convert-would)) + (plist-get out :issues))))) + +(defconst tc-test--convert-closed-with-deadline + "* Project Open Work +** TODO [#B] Parent task +*** DONE [#C] Ship the panel :feature: +CLOSED: [2026-06-27 Sat 12:50] DEADLINE: <2026-06-30 Tue> +Body line. +") + +(ert-deftest tc-convert-preserves-deadline-on-shared-planning-line-boundary () + "Boundary: removing the CLOSED cookie keeps a DEADLINE sharing its planning line." + (let* ((out (tc-test--convert tc-test--convert-closed-with-deadline)) + (res (plist-get out :result))) + (should (= 1 (plist-get out :converted))) + (should (string-match-p + "^\\*\\*\\* 2026-06-27 Sat @ 12:50:00 [-+][0-9]\\{4\\} Ship the panel$" + res)) + (should-not (string-match-p "CLOSED:" res)) + (should (string-match-p "^DEADLINE: <2026-06-30 Tue>$" res)) + (should (string-match-p "^Body line\\.$" res)))) + (provide 'test-todo-cleanup) ;;; test-todo-cleanup.el ends here diff --git a/claude-templates/.ai/scripts/todo-cleanup.el b/claude-templates/.ai/scripts/todo-cleanup.el index 541d106..bd8166d 100644 --- a/claude-templates/.ai/scripts/todo-cleanup.el +++ b/claude-templates/.ai/scripts/todo-cleanup.el @@ -5,10 +5,12 @@ ;; emacs --batch -q -l todo-cleanup.el --check todo.org # hygiene report only ;; emacs --batch -q -l todo-cleanup.el --archive-done todo.org # archive completed subtrees ;; emacs --batch -q -l todo-cleanup.el --archive-done --check todo.org # preview the archive +;; emacs --batch -q -l todo-cleanup.el --convert-subtasks todo.org # dated-rewrite done level-3+ sub-tasks +;; emacs --batch -q -l todo-cleanup.el --convert-subtasks --check todo.org # preview the conversion ;; emacs --batch -q -l todo-cleanup.el --sync-child-priority todo.org # bump children whose priority drifted below the parent's ;; emacs --batch -q -l todo-cleanup.el --check-child-priority todo.org # preview the sync (same as --sync-child-priority --check) ;; -;; Three independent modes: +;; Four independent modes: ;; ;; * Default (hygiene). Designed for the wrap-it-up workflow: cheap, idempotent, ;; safe to run every session. @@ -52,6 +54,20 @@ ;; Archiving is consequential, so it's never run by default; it does *not* ;; also run the hygiene passes. ;; +;; * --convert-subtasks (opt-in). Rewrites every level-3-and-deeper heading whose +;; TODO state is DONE/CANCELLED/FAILED into a dated event-log entry +;; (`<stars> YYYY-MM-DD Day @ HH:MM:SS -ZZZZ <text>'), dropping the keyword, +;; priority cookie, and tags, and removing the now-redundant CLOSED line. The +;; date and time come from that entry's own CLOSED cookie; a date-only close +;; yields 00:00:00, and the UTC offset is computed DST-aware for that date. +;; This enforces the todo-format depth rule that interactive closes +;; (`org-log-done' → DONE + CLOSED) and `--archive-done' (level-2 only) leave +;; unapplied. The heading text is preserved verbatim — a batch tool can't +;; past-tense an imperative title reliably. Idempotent (an already-dated +;; heading has no done keyword); a done sub-task with no parseable CLOSED date +;; is flagged and left alone, never stamped with a fabricated date. Like +;; --archive-done it does not also run the hygiene passes. +;; ;; * --sync-child-priority (opt-in). Walks every heading with a priority cookie ;; ([#A]-[#D]) and, for each of its direct child headings whose own priority ;; is lower (later in the alphabet — D is lower than A), bumps the child's @@ -73,11 +89,16 @@ (require 'calendar) (setq org-todo-keywords - '((sequence "TODO" "DOING" "WAITING" "NEXT" "|" "DONE" "CANCELLED"))) + '((sequence "TODO" "DOING" "WAITING" "NEXT" "|" "DONE" "CANCELLED" "FAILED"))) (defconst tc-done-states '("DONE" "CANCELLED") "TODO keywords that mark an entry as completed for `--archive-done'.") +(defconst tc--convert-done-states '("DONE" "CANCELLED" "FAILED") + "TODO keywords whose level-3-and-deeper entries `--convert-subtasks' rewrites +to dated event-log entries. Broader than `tc-done-states' because a FAILED +sub-task is terminal too and belongs in the parent's dated history.") + (defconst tc--priority-cookie-regexp "\\[#\\([A-Z]\\)\\]" "Regexp matching an org priority cookie. Match group 1 is the letter.") @@ -89,10 +110,12 @@ every heading below it.") (defvar tc-fixes 0) (defvar tc-archived 0) (defvar tc-bumped 0) +(defvar tc-converted 0) (defvar tc-issues nil) (defvar tc-check-only nil) (defvar tc-archive-done nil) (defvar tc-sync-child-priority nil) +(defvar tc-convert-subtasks nil) (defvar tc-current-file nil) (defvar tc-current-dir nil) (defvar tc-archived-to-file 0) @@ -578,6 +601,138 @@ before their descendants — a [#A] → [#B] → [#D] chain collapses in one pas (org-map-entries #'tc-sync-child-priority-at-heading nil 'file)) ;;; --------------------------------------------------------------------------- +;;; --convert-subtasks mode +;; +;; A sub-task (a heading at level 3 or deeper, i.e. under a parent task) that is +;; marked DONE/CANCELLED/FAILED should become a dated event-log entry per the +;; todo-format depth rule: drop the keyword, priority cookie, and tags, and +;; rewrite the heading to `<stars> YYYY-MM-DD Day @ HH:MM:SS -ZZZZ <text>' so the +;; parent's subtree grows a chronological history instead of a long tail of +;; nested DONE lines. Nothing enforced this before: `org-log-done' just flips an +;; interactive close to DONE + CLOSED, and `--archive-done' only touches level 2. +;; So level-3+ closes piled up as DONE keywords. This mode converts them +;; mechanically, pulling the timestamp from each entry's own CLOSED cookie. The +;; heading text is kept verbatim (a batch tool can't reliably past-tense an +;; imperative title, and guessing prose in the task file is worse than leaving it +;; as written). Idempotent: an already-dated heading has no done keyword, so it +;; is skipped. A done sub-task with no parseable CLOSED cookie can't be dated, so +;; it is flagged and left alone rather than stamped with a fabricated date. + +(defun tc--closed-parts-in-entry () + "Return a plist (:year :month :day :dow :hour :minute) from the CLOSED cookie +of the entry at point, or nil when the entry has no parseable CLOSED line. +:hour and :minute are nil when the cookie carries only a date. The CLOSED line +sits in canonical position directly under the heading, so the first match within +the entry is the task's own close." + (save-excursion + (org-back-to-heading t) + (let ((end (save-excursion + (or (outline-next-heading) (goto-char (point-max))) + (point)))) + (when (re-search-forward + (concat "CLOSED:[ \t]*\\[\\([0-9]\\{4\\}\\)-\\([0-9]\\{2\\}\\)-\\([0-9]\\{2\\}\\)" + "[ \t]+\\([A-Za-z]+\\)" + "\\(?:[ \t]+\\([0-9]\\{2\\}\\):\\([0-9]\\{2\\}\\)\\)?\\]") + end t) + (list :year (match-string 1) :month (match-string 2) :day (match-string 3) + :dow (match-string 4) + :hour (match-string 5) :minute (match-string 6)))))) + +(defun tc--tz-offset-string (year month day hour minute) + "Return the local UTC offset (e.g. \"-0500\") for the given wall-clock instant. +DST-aware: `encode-time' with an unknown-DST field lets the system pick the +correct offset for that date, so a summer close reads -0400 and a winter one +-0500 without hardcoding either." + (format-time-string + "%z" (encode-time (list 0 minute hour day month year nil -1 nil)))) + +(defun tc--dated-header-line (level parts title) + "Build the dated event-log heading string from LEVEL, CLOSED PARTS, and TITLE. +Missing time in PARTS defaults to 00:00:00 (the close logged only a date)." + (let* ((year (plist-get parts :year)) + (month (plist-get parts :month)) + (day (plist-get parts :day)) + (dow (plist-get parts :dow)) + (hh (or (plist-get parts :hour) "00")) + (mm (or (plist-get parts :minute) "00")) + (tz (tc--tz-offset-string (string-to-number year) + (string-to-number month) + (string-to-number day) + (string-to-number hh) + (string-to-number mm)))) + (format "%s %s-%s-%s %s @ %s:%s:00 %s %s" + (make-string level ?*) year month day dow hh mm tz title))) + +(defun tc--convert-collect-targets () + "Markers at every heading at level >= 3 whose TODO state is a done state. +Collected up front so the rewrite loop can edit the buffer without disturbing an +in-progress `org-map-entries' walk; markers track their headings across edits." + (let (targets) + (org-map-entries + (lambda () + (when (and (>= (org-current-level) 3) + (member (org-get-todo-state) tc--convert-done-states)) + (push (copy-marker (point)) targets))) + nil 'file) + (nreverse targets))) + +(defun tc--convert-one-subtask (marker) + "Convert the done sub-task heading at MARKER to a dated event-log entry. +Under `tc-check-only' the conversion is reported but not performed." + (goto-char marker) + (org-back-to-heading t) + (let* ((level (org-current-level)) + (title (org-get-heading t t t t)) + (line (line-number-at-pos)) + (parts (tc--closed-parts-in-entry))) + (cond + ((null parts) + (push (list :kind 'convert-skip :file tc-current-file + :line line :heading title + :detail "no CLOSED date to derive the timestamp") + tc-issues)) + (t + (let ((new (tc--dated-header-line level parts title))) + (cl-incf tc-converted) + (if tc-check-only + (push (list :kind 'convert-would :file tc-current-file + :line line :heading title :new new) + tc-issues) + ;; Replace the heading line, then drop the now-redundant CLOSED + ;; cookie from the entry (its date now lives in the header). Only + ;; the cookie goes: a planning line can also carry DEADLINE: or + ;; SCHEDULED: beside it, and those survive on their line. A line + ;; left blank by the removal is deleted whole. + (delete-region (line-beginning-position) (line-end-position)) + (insert new) + (let ((end (save-excursion + (or (outline-next-heading) (goto-char (point-max))) + (point)))) + (save-excursion + (when (re-search-forward "CLOSED:[ \t]*\\[[^]]*\\][ \t]*" end t) + (replace-match "") + (let ((bol (line-beginning-position)) + (eol (line-end-position))) + (if (string-match-p "\\`[ \t]*\\'" + (buffer-substring bol eol)) + (delete-region bol (min (1+ eol) (point-max))) + (goto-char bol) + (when (looking-at "[ \t]+") + (replace-match ""))))))) + (push (list :kind 'convert-done :file tc-current-file + :line line :heading title :new new) + tc-issues))))))) + +(defun tc-convert-subtasks-in-file () + "Rewrite every level-3-and-deeper DONE/CANCELLED/FAILED heading to a dated +event-log entry, pulling the timestamp from its CLOSED cookie. Honors +`tc-check-only'." + (let ((targets (tc--convert-collect-targets))) + (dolist (m targets) + (tc--convert-one-subtask m) + (set-marker m nil)))) + +;;; --------------------------------------------------------------------------- ;;; Driver + reporting (defun tc-process-file (file) @@ -590,6 +745,8 @@ before their descendants — a [#A] → [#B] → [#D] chain collapses in one pas (tc-archive-done-in-file)) (tc-sync-child-priority (tc-sync-child-priority-in-file)) + (tc-convert-subtasks + (tc-convert-subtasks-in-file)) (t ;; Pass 1: auto-fix bogus state logs (or report under --check). (org-map-entries #'tc-fix-bogus-state-log-in-entry nil 'file) @@ -684,9 +841,34 @@ before their descendants — a [#A] → [#B] → [#D] chain collapses in one pas (plist-get i :child-heading) (plist-get i :parent-heading))))))) +(defun tc--emit-convert-report () + ;; Silent on a real-mode no-op (nothing to convert and nothing skipped), for + ;; the same reason as the archive report: the wrap runs cleanup passes more + ;; than once, and a vocal \"0 converted\" reads as noise. Check mode always + ;; reports (the preview is what the caller asked for), and a skip always + ;; reports (a done sub-task with no CLOSED date is a real condition to see). + (let ((has-skip (cl-some (lambda (i) (eq (plist-get i :kind) 'convert-skip)) + tc-issues))) + (when (or tc-check-only (> tc-converted 0) has-skip) + (princ (format "todo-cleanup --convert-subtasks: %d sub-task(s) %s%s\n" + tc-converted + (if tc-check-only "would convert" "converted") + (if tc-check-only " — CHECK MODE (no writes)" ""))) + (dolist (i (reverse tc-issues)) + (pcase (plist-get i :kind) + ((or 'convert-done 'convert-would) + (princ (format " %s:%d: %s\n → %s\n" + (plist-get i :file) (plist-get i :line) + (plist-get i :heading) (plist-get i :new)))) + ('convert-skip + (princ (format " skipped %s:%d: %s — %s\n" + (plist-get i :file) (plist-get i :line) + (plist-get i :heading) (plist-get i :detail))))))))) + (defun tc-emit-report () (cond (tc-archive-done (tc--emit-archive-report)) (tc-sync-child-priority (tc--emit-sync-report)) + (tc-convert-subtasks (tc--emit-convert-report)) (t (tc--emit-hygiene-report)))) (defun tc-main () @@ -701,6 +883,9 @@ before their descendants — a [#A] → [#B] → [#D] chain collapses in one pas (when (member "--sync-child-priority" command-line-args-left) (setq tc-sync-child-priority t) (setq command-line-args-left (delete "--sync-child-priority" command-line-args-left))) + (when (member "--convert-subtasks" command-line-args-left) + (setq tc-convert-subtasks t) + (setq command-line-args-left (delete "--convert-subtasks" command-line-args-left))) ;; --check-child-priority is the report-only alias for ;; `--sync-child-priority --check'. (when (member "--check-child-priority" command-line-args-left) @@ -708,7 +893,7 @@ before their descendants — a [#A] → [#B] → [#D] chain collapses in one pas (setq command-line-args-left (delete "--check-child-priority" command-line-args-left))) (if (null command-line-args-left) (progn - (princ "Usage: emacs --batch -q -l todo-cleanup.el [--check] [--archive-done | --sync-child-priority | --check-child-priority] FILE...\n") + (princ "Usage: emacs --batch -q -l todo-cleanup.el [--check] [--archive-done | --convert-subtasks | --sync-child-priority | --check-child-priority] FILE...\n") (kill-emacs 1)) (let ((files command-line-args-left)) (setq command-line-args-left nil) @@ -727,6 +912,7 @@ ert-run-tests-batch-and-exit'." (cl-every (lambda (a) (cond ((member a '("--check" "--archive-done" + "--convert-subtasks" "--sync-child-priority" "--check-child-priority")) t) diff --git a/claude-templates/.ai/workflows/INDEX.org b/claude-templates/.ai/workflows/INDEX.org index a474b29..88721ed 100644 --- a/claude-templates/.ai/workflows/INDEX.org +++ b/claude-templates/.ai/workflows/INDEX.org @@ -54,6 +54,11 @@ This index must list every =.org= file in =.ai/workflows/= except this one and e - Roam-mode triggers: "inbox zero", "empty the inbox", "process the roam inbox", "triage my roam inbox" - Auto-mode trigger: "auto inbox zero" (match before "inbox zero") +- =work-the-backlog.org= — the autonomous task-execution loop, the single home for working a batch of marked tasks unattended: takes an ordered task set (explicit list or tag query) + session mode (=file-only= default / =autonomous-commit= + paging) + a hard run cap; each candidate passes the mechanical eligibility gate (status =TODO= + =:solo:= per the project's scheme header) and the four-item defer checklist, then is implemented to the full quality bar (TDD, =/review-code=, =/voice=) as its own logical commits. Fed by the inbox auto-loop's chain step (yes-gated, file-only, cap 1) and the no-approvals speedrun preset (pre-flight Q&A → autonomous-commit + always-push + end-of-set page over an explicit ordered list). + - Speedrun triggers: "speedrun", "no approvals speedrun", "speedrun these: <task set>" — any phrase containing "speedrun" routes here (the preset), never to =no-approvals.org= + - Manual triggers: "work the backlog", "work the backlog with <task set>" (file-only defaults) + - Synthesis trigger: "synthesize backlog metrics" — read the per-project metrics logs, compute trends + the corrections signal, write one =:agent:metrics:= KB node (personal projects only) + ** Calendar - =add-calendar-event.org= — create a calendar event. @@ -117,6 +122,7 @@ This index must list every =.org= file in =.ai/workflows/= except this one and e - Triggers: "session harvest", "harvest the sessions", "let's run the session-harvest workflow", "monthly harvest", "mine the sessions" - =no-approvals.org= — drop the interaction-level approval gates for a pre-agreed batch while keeping engineering-discipline gates (=/review-code=, =/voice personal=, tests, session-log updates, subagent reviews, destructive-action consent). Mode stays on until Craig turns it off, a real question arises, the queue empties, or the conversation switches topics. - Triggers: "no-approvals mode", "no approvals", "no-approval", "no need for approval gates", "stop asking, just keep going", "I'll check back in when you're done or stuck", "do all =<selector>= with no-approval" + - Exception: any phrase containing "speedrun" routes to =work-the-backlog.org='s no-approvals speedrun preset instead * Living Document diff --git a/claude-templates/.ai/workflows/clean-todo.org b/claude-templates/.ai/workflows/clean-todo.org index dd33056..a1b2af5 100644 --- a/claude-templates/.ai/workflows/clean-todo.org +++ b/claude-templates/.ai/workflows/clean-todo.org @@ -27,7 +27,17 @@ Deletes bogus =- State "X" from "X" [date]= log lines (state didn't actually cha To preview without writing, run =--check= first: =emacs --batch -q -l .ai/scripts/todo-cleanup.el --check todo.org=. -** Step 2: Archive completed work +** Step 2: Convert done sub-tasks to dated entries + +#+begin_src bash +emacs --batch -q -l .ai/scripts/todo-cleanup.el --convert-subtasks todo.org +#+end_src + +Rewrites every heading at level 3 or deeper whose TODO state is DONE/CANCELLED/FAILED into a dated event-log entry (=<stars> YYYY-MM-DD Day @ HH:MM:SS -ZZZZ <text>=), dropping the keyword, priority cookie, and tags, and removing the =CLOSED:= line. Enforces the depth rule that a completed sub-task becomes dated history — a shape interactive org closes and =--archive-done= (level-2 only) leave unapplied. Timestamp comes from each entry's =CLOSED= cookie; heading text kept verbatim; idempotent; a done sub-task with no parseable =CLOSED= is flagged and left alone. Run before archiving so a parent's sub-tasks are already dated when it moves. Capture the output. + +To preview without writing: =emacs --batch -q -l .ai/scripts/todo-cleanup.el --convert-subtasks --check todo.org=. + +** Step 3: Archive completed work #+begin_src bash emacs --batch -q -l .ai/scripts/todo-cleanup.el --archive-done todo.org @@ -37,10 +47,11 @@ Moves every level-2 subtree whose TODO state is DONE or CANCELLED out of the "Op To preview the moves without writing: =emacs --batch -q -l .ai/scripts/todo-cleanup.el --archive-done --check todo.org=. -** Step 3: Summarize +** Step 4: Summarize -Report to Craig from the two captured outputs: +Report to Craig from the three captured outputs: - Hygiene: how many bogus state-log lines were deleted; any orphan-planning warnings (file:line + heading), or "none". +- Convert: how many done sub-tasks were rewritten to dated entries (heading + line), any flagged for no =CLOSED= date, or "nothing to convert". - Archive: how many subtrees moved and which (heading + line), or "nothing to move" / the skip reason if a section was missing or ambiguous. - If the file changed, note that =todo.org= now has an uncommitted edit — review =git diff -- todo.org= and commit it (in this repo's commit style) if it looks right. If nothing changed, say so and stop. @@ -49,7 +60,7 @@ Don't auto-commit. The summary is the review point; Craig decides whether the di * Principles - *Both passes apply, not just preview.* The workflow is invoked because cleanup is wanted. Use the =--check= variants only when Craig asks for a dry run. -- *Two passes, two invocations.* =--archive-done= is its own mode and does not run the hygiene pass; run both. +- *Separate modes, separate invocations.* =--convert-subtasks=, =--archive-done=, and the hygiene pass are each their own mode and don't run the others; run all three. - *Never auto-commit todo.org.* Surface the diff and let Craig commit it. The cleanup is a working-tree change, fully reversible until committed. - *Trust the script.* It's fast and idempotent; if there's nothing to do, it reports zero and exits clean. No pre-checks. diff --git a/claude-templates/.ai/workflows/inbox.org b/claude-templates/.ai/workflows/inbox.org index 5fc855f..b28fdaa 100644 --- a/claude-templates/.ai/workflows/inbox.org +++ b/claude-templates/.ai/workflows/inbox.org @@ -114,6 +114,14 @@ The item extends a task already filed. Update the parent TODO's body with a date ** File as TODO Substantive but waits, or needs design/triage before implementation. Add the TODO under =* <Project> Open Work= with priority + tags per the priority-scheme check (core §6). Body summarizes the proposal and links the inbox content if it's been moved to =docs/design/=. Delete the inbox file (or move it to =docs/design/= first if the content survives). +*Route-candidate marking (feeds the wrap-up router).* After filing, check whether the keeper's inferred home is a different project: + +#+begin_src bash +python3 .ai/scripts/route_recommend.py --item "<the keeper's heading + body text>" --exclude "$(basename "$PWD")" +#+end_src + +On a =<destination>\tstrong= or =<destination>\tweak= result, stamp the new TODO's property drawer with =:ROUTE_CANDIDATE: <destination>= (create the drawer if the task has none). A =none= result stamps nothing, and a local keeper stays unstamped. The marker is the wrap-up router's entire candidate set — =wrap-it-up.org= Step 3 surfaces exactly the =:ROUTE_CANDIDATE:=-tagged tasks and offers to deliver each to its destination's inbox, never scanning the standing backlog. Stamping is cheap and reversible (the router's skip leaves the task in place; a wrong marker is one property line to delete), so prefer stamping on any plausible match — the human reviews the batch at wrap time. + *Blocking-dependency handoff.* A special shape: another project sends a note that *this* project's work is blocking one of theirs ("your task X is blocked on us — we need Y"). File or link the owning task, tag it =:blocker:=, and name the requesting project in the body (see the cross-project dependency convention in =todo-format.md=). The =:blocker:= tag makes =open-tasks.org= surface that task *first*, since clearing it unblocks the other project. Dedup against an existing task rather than filing a duplicate. When the work later lands, drop =:blocker:= and notify the waiting project (=inbox-send <their-project> --text "Delivered: <what> — you're unblocked."=) so it can lift its own =:blocked:=. ** Defer @@ -453,18 +461,19 @@ Take these up when the single-destination version is in use and the multi-projec * Mode: auto inbox zero -A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop <interval>= running roam mode. It is in-session and interactive by design — each cycle reports, and a find waits for Craig's go before any work happens. +A recurring, *interactive* roam check. Trigger phrase: "auto inbox zero" (match before "inbox zero" — the longer phrase wins). On invocation, *ask Craig for the interval* (e.g. 30 min, 2 hours), then drive the loop with =/loop <interval>= running roam mode. It is in-session and interactive by design — each cycle reports what it found and filed. ** Per cycle 1. Run roam mode's scan (Phase A local check + Phase B roam scan), read-only — no =git pull=. The capture-guard still gates any write: use =capture-guard --wait= (core §5) so a transient capture clears itself; if it's still open after the wait, *defer this cycle's roam reconcile to the next cycle* rather than surfacing — the loop cadence is the retry, and the filed items get swept next time. The rare write hands its git to =roam-sync= (roam Phase D). 2. *Nothing found* → no inbox summary. One acknowledgement line: =ran at HH:MM, nothing found=. Nothing else. The acknowledge-only-on-empty rule keeps a quiet inbox quiet. 3. *Items found* → summarize the found items, file them as tasks (roam Phase C), and *append them to a displayed queue* — the harness task list, via =TaskCreate= — so the queue accumulates across cycles. Then ask: "run this batch next?" - - *Yes* → launch into implementing the found items, each through the normal disposition ladder (core §3) + verify flow. + - *Yes* → chain into =work-the-backlog.org= as an explicit second step after routing completes: pass it the eligibility query over the queued items (status =TODO= + =:solo:= per the scheme header, priority-ordered), =file-only= mode, paging off, cap 1. The highest-priority eligible candidate runs; the rest wait for the next tick or a later yes. - *No* → they stay queued for a later go. + This mode never implements anything itself — routing ends here, and the execution loop lives in =work-the-backlog.org=, its one home. 4. *Cross-cycle dedup.* Subsequent cycles add only *newly-found* items to the same displayed queue, never re-surfacing what's already there. Dedup against the queue (the =TaskCreate= list), not against what's already been implemented — a find that was queued-but-not-yet-run must not reappear, and one already filed into =todo.org= is dropped by roam Phase C's status check. -A find is always surfaced and gated on Craig's yes; a quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because its execute step waits for a yes. +A find is always surfaced and filed; execution happens only through the =work-the-backlog.org= chain and waits for Craig's yes. A quiet inbox produces only the timestamped acknowledgement. =auto inbox zero= is inherently in-session because its chain step waits for that yes. ** Fully-unattended pass (=/schedule=) — vNext, not v1 diff --git a/claude-templates/.ai/workflows/no-approvals.org b/claude-templates/.ai/workflows/no-approvals.org index 1efce82..9e1c894 100644 --- a/claude-templates/.ai/workflows/no-approvals.org +++ b/claude-templates/.ai/workflows/no-approvals.org @@ -22,6 +22,8 @@ Craig activates the mode with any of: - Queuing several tasks in =todo.org= followed by any phrase above - Any equivalent phrasing that signals he doesn't want to be re-asked between items +*Not this mode:* any phrase containing "speedrun" ("speedrun", "no approvals speedrun") routes to =work-the-backlog.org='s no-approvals speedrun preset — an autonomous batch over an explicit ordered task set, with a pre-flight Q&A, autonomous commits, always-push, and an end-of-set page. This mode is the general interaction-gate suspension for whatever work is already underway; the speedrun is the dedicated backlog-batch workflow. + Mode resets when: - Craig says approvals are back on diff --git a/claude-templates/.ai/workflows/open-tasks.org b/claude-templates/.ai/workflows/open-tasks.org index 4ba29dd..02a0847 100644 --- a/claude-templates/.ai/workflows/open-tasks.org +++ b/claude-templates/.ai/workflows/open-tasks.org @@ -23,15 +23,16 @@ Don't route "task review" / "review tasks" here — those trigger the hygiene ha * Phase A: Data Gathering (both modes) -** Phase A pre-step — archive any freshly-DONE tasks +** Phase A pre-step — normalize freshly-closed tasks -Before reading =todo.org=, run the cleanup script's archive-done sweep so completed level-2 subtrees move from =* $Project Open Work= to =* $Project Resolved=: +Before reading =todo.org=, run two cleanup sweeps so the read reflects current state. First convert any done sub-tasks to dated entries, then archive completed level-2 subtrees from =* $Project Open Work= to =* $Project Resolved=: #+begin_src bash +emacs --batch -q -l .ai/scripts/todo-cleanup.el --convert-subtasks todo.org emacs --batch -q -l .ai/scripts/todo-cleanup.el --archive-done todo.org #+end_src -Costs a few hundred milliseconds. Without it, a task that completed earlier in the session sits as =** DONE= under Open Work until the next =clean-todo= or wrap-up pass, and Next Mode would surface it as a "what's next" candidate. The sweep makes Phase A's read of =todo.org= reflect current state. +Costs a few hundred milliseconds. Without the archive sweep, a task that completed earlier in the session sits as =** DONE= under Open Work until the next =clean-todo= or wrap-up pass, and Next Mode would surface it as a "what's next" candidate. The convert sweep runs first so a completed parent's sub-tasks are already dated when it archives; it also keeps interactive level-3 closes from lingering as DONE keywords. Together they make Phase A's read of =todo.org= reflect current state. Skip the sweep if the workflow is invoked in an explicit read-only or dry-run context. Default is to run it. diff --git a/claude-templates/.ai/workflows/spec-create.org b/claude-templates/.ai/workflows/spec-create.org index 508b969..1249181 100644 --- a/claude-templates/.ai/workflows/spec-create.org +++ b/claude-templates/.ai/workflows/spec-create.org @@ -82,8 +82,9 @@ This is where the spec earns a "Ready" from review: an engineer must be able to ** Phase 5 — Wire it up (conventions) -- *Filename + location:* =docs/<problem-slug>-spec.org=. Org-mode. The slug names the *problem/feature*, not a date. Must end in =-spec.org=. -- *Metadata header:* a small table at the top — Status, Owner, Reviewer(s), Date, Related (link to the task/ticket). +- *Filename + location:* =docs/specs/YYYY-MM-DD-<problem-slug>-spec.org= — formal specs live in =docs/specs/=, never =docs/design/= (that's for notes, brainstorms, inventories; see =claude-rules/docs-lifecycle.md=). Org-mode. The slug names the *problem/feature*; no status suffixes ever — status lives in the file. Must end in =-spec.org=. +- *Status heading (first element after the file header):* a top-level heading carrying the lifecycle keyword, stamped =DRAFT= at authoring — spec-create owns this flip. It holds an =:ID:= UUID (generate with =uuidgen=) and dated history lines, newest first. The keyword is authoritative; the Metadata =Status= field mirrors it in lowercase. Transitions are three lines in one file (keyword + history line + mirror): spec-review flips =READY=, spec-response flips =DOING= at decomposition, the final build task flips =IMPLEMENTED=. Terminal states always record a reason. +- *Metadata header:* a small table at the top — Status (the lowercase mirror), Owner, Reviewer(s), Date, Related (link to the task/ticket). - *Review-and-iteration-history stub:* add a =Review and iteration history= section at the bottom and seed it with the author's first entry. =spec-review= and =spec-response= append provenance entries here, so the heading shape is a contract: =YYYY-MM-DD Day @ HH:MM:SS -ZZZZ — Contributor — Role=, body fields What / Why / Artifacts. - *Cross-link both ways:* the spec links its task; the task links the spec (replace the task's inline plan with a terse description + a =file:= link to the spec). @@ -103,7 +104,14 @@ Then it's ready for =spec-review.org=. Snapshot-vs-living rule: keep the spec li ,#+TITLE: <Feature> — Spec ,#+AUTHOR: <author> ,#+DATE: <YYYY-MM-DD> -,#+TODO: TODO | DONE SUPERSEDED CANCELLED +,#+TODO: TODO | DONE +,#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +,* DRAFT <spec short name> +:PROPERTIES: +:ID: <uuid — generate with uuidgen> +:END: +- <YYYY-MM-DD Day @ HH:MM:SS -ZZZZ> — drafted. ,* Metadata | Status | draft | diff --git a/claude-templates/.ai/workflows/spec-response.org b/claude-templates/.ai/workflows/spec-response.org index de5b1c8..7628e49 100644 --- a/claude-templates/.ai/workflows/spec-response.org +++ b/claude-templates/.ai/workflows/spec-response.org @@ -130,9 +130,11 @@ When related specs were reviewed together, two reviews can recommend opposite th This is the *last* step of the workflow, and it runs *only after the author confirms the spec is Ready* — never during review iterations. A Ready spec nobody can act on is unfinished; this phase turns it into tracked work. It applies to every project type (library, application, service, docs set). -1. *Decide where the tasks live.* If the work is spinning off into its own project/repo, move the parent task into that project's =todo.org= (and relocate the spec with it); otherwise use the current project's =todo.org=. One parent task owns the effort; the phase tasks hang under it. +*This phase owns the =READY= → =DOING= lifecycle flip* (docs-lifecycle convention): when the decomposition below lands, update the spec's top-level status heading keyword to =DOING=, add a dated history line, and set the Metadata =Status= mirror to =doing= — three lines, one file. -2. *Create one task per implementation phase* from the spec's =Implementation phases=, in dependency order, so the task set as a whole describes the *full* milestone (e.g. v1) with no gaps. Each task body names the deliverable, its tests, and how it is verified. Carry over deferred/vNext work and any publish/release steps as their own tasks. +1. *Decide where the tasks live.* If the work is spinning off into its own project/repo, move the parent task into that project's =todo.org= (and relocate the spec with it); otherwise use the current project's =todo.org=. One parent task owns the effort; the phase tasks hang under it. *Stamp the binding:* the parent task's =:PROPERTIES:= drawer gets a =:SPEC_ID:= line holding the spec's status-heading UUID. That property is the durable join task-audit uses to police =DOING= specs (a =DOING= spec whose bound parent is closed, archived, or missing gets flagged). + +2. *Create one task per implementation phase* from the spec's =Implementation phases=, in dependency order, so the task set as a whole describes the *full* milestone (e.g. v1) with no gaps. Each task body names the deliverable, its tests, and how it is verified. Carry over deferred/vNext work and any publish/release steps as their own tasks. *Always end the set with the flip task:* a final "flip the spec to IMPLEMENTED (+ dated history line + mirror)" task under the same parent — the tracked obligation that closes the lifecycle loop when the build finishes. Never skip it; "a human remembers" is the failure mode this exists to prevent. 3. *Turn a critical eye on completeness.* Re-read the spec — every phase, every acceptance criterion, every named deliverable, every data-safety/principle rule — and confirm each has a home in a task. The work is not done when the tasks merely exist; it is done when nothing in the spec is left untracked. This completeness pass is mandatory regardless of project type. diff --git a/claude-templates/.ai/workflows/spec-review.org b/claude-templates/.ai/workflows/spec-review.org index 833dfc9..d4998eb 100644 --- a/claude-templates/.ai/workflows/spec-review.org +++ b/claude-templates/.ai/workflows/spec-review.org @@ -50,6 +50,11 @@ Run it *early* — design review exists to catch viability problems and costly m Before Phase 1, verify the file under review ends with =-spec.org=. Every design, decision, or planning document under a project's =docs/= directory carries that suffix as its identifier. The =.org= extension alone is not enough because =docs/= holds non-spec org files too (tutorials, frozen inventories, reference material). +*Location expectation (docs-lifecycle convention).* Formal specs live in =docs/specs/=. Whether that's enforced depends on whether the project has run its one-time =spec-sort= retrofit: + +- =:LAST_SPEC_SORT:= present in =.ai/notes.org= Workflow State → the project has sorted; a =-spec.org= file outside =docs/specs/= fails this precondition. Surface it: "this spec sits outside docs/specs/ — move it (and update inbound links) before review." +- Marker absent → legacy locations (=docs/= root, =docs/design/=) stay reviewable; add one nudge line to the review output ("this project's docs pile has never been spec-sorted — say 'run spec-sort' to sort it") and proceed. No legacy spec is ever unreviewable during the transition. + If the file does not end with =-spec.org=, stop immediately and surface the mismatch: #+begin_example @@ -128,6 +133,7 @@ Work the spec against these. Each is a source of concrete findings, not a box to - *Performance & scale.* Expected counts (issues/comments/labels/teams/projects/views)? Server-side filtering where possible? Bounded, visible pagination? Cached name→ID lookups? Sync calls in the command path acceptable? Could a save hook or whole-file scan make N network calls? Rendering linear? Full-file rewrites avoided? Long-running operations async/cancellable/observable? Is concurrency/queueing/backpressure defined? Are high-output process filters throttled and cheap? Is progress/ETA exposed only when defensible, and are hung/stalled operations detectable and killable? Identify UI freezes, repeated network calls, unbounded pagination — without premature optimization. - *Security & privacy.* API keys safe? Debug logs leaking secrets or private issue text? Confirmations before mutating shared workspace objects? Personal vs shared distinguished? Local files holding sensitive descriptions/comments? Anything to redact from messages/logs? Any work-tracker integration may handle private company data. - *UX & accessibility.* Discoverable commands? Recoverable mistakes? Prompts ordered to the task? Safe, useful defaults? Informative-not-noisy status messages? Does the UI avoid implying unsupported actions are supported? Match the upstream product's permissions/concepts? Are customizations named in user language, with clear defaults and docstrings? For Emacs packages, command names, completion candidates, buffer layout, defcustom names, and message wording *are* the UX. +- *Operational-panel UI traps.* Applies when the spec covers a user-facing panel, dialog, or control surface; skip otherwise. Lists that mix saved, current, and generated items must name each item's source. Refresh or scan actions must not gate data that could be shown immediately. Add-forms must not ask the user to retype values the system already discovered. Destructive confirmations read in future tense before the action and verified-result tense after it. Diagnostics, performance, logging, and repair affordances are reviewed as one coherent flow before extra pages or buttons are added. A popup launched from a bar, tray, or tool surface should visually belong to that launcher. (Promoted from archsetup's Waybar network-panel review, 2026-06-30.) - *Test strategy and coverage.* Characterization tests before behavior changes? Pure functions to unit-test? API responses needing fixtures? Command flows needing stubs? Regression tests for prior bugs? Boundary/error cases? What's covered elsewhere and shouldn't be re-tested? Which existing tests must change? How is coverage generated, summarized, and used to find untested/refactor-worthy code? Prefer tests that lock contracts: representation shape, query compilation, sync no-op, conflict refusal, pagination, dirty-buffer protection, log redaction, and long-running/slow-operation behavior via fakes rather than flaky live dependencies. - *Observability & operations.* How does a user see what the package is doing? Progress messages for long ops? Useful, safe debug logging? Are logs structured enough to isolate issues from a bug report? Are commands provided to inspect/clear caches, test connectivity, diagnose backends/tools, copy redacted debug info, or reproduce command invocations? How are terminal states discovered: completion, failure, partial success, stalled/hung, cancelled, cleanup-unverified, and "needs user action"? Does the product notify only when useful, avoid noisy success spam, and keep non-success states visible until acknowledged? For generated org files, headers should often carry source, filter/view name, refresh time, count, truncation state. - *Comparable-product sentiment.* When there are obvious adjacent products, research what users love and hate about them from official docs plus current community reports. Do not cargo-cult their feature set; translate findings into the spec's scope. For each loved behavior, say whether the spec provides it, intentionally omits it, or defers it. For each hated behavior, say whether the spec avoids, resolves, inherits, or accepts it. @@ -166,6 +172,8 @@ Assign one label consistently: The most useful reviews move a spec from =Not ready= to =Ready with caveats= or =Ready= once decisions are captured. +*The =Ready= verdict flips the spec's lifecycle status.* spec-review owns the =DRAFT= → =READY= transition (docs-lifecycle convention): on assigning =Ready= (or =Ready with caveats= the author accepts), update the spec's top-level status heading keyword to =READY=, add a dated history line under it naming the review that passed, and set the Metadata =Status= mirror to =ready= — three lines, one file. Any other rubric label leaves the keyword where it stands (a re-review that finds new blockers on a =READY= spec demotes it back to =DRAFT= the same three-line way, with the reason in the history line). + Finding severity maps to blocking power: *high-priority findings block =Ready=* — they hold the rubric at =Not ready= (or =Ready with caveats= if the author accepts and tracks them) until dispositioned; *medium-priority findings are the author's discretion* and don't block. State the blocking status on each finding so the author running spec-response knows which ones gate the rubric. Then update the spec's review history. Specs should carry a bottom section named =Review and iteration history= (or the nearest existing equivalent) that tracks each material author/reviewer pass. Add a concise entry for this review even when the spec is ready and no findings are recorded. diff --git a/claude-templates/.ai/workflows/startup.org b/claude-templates/.ai/workflows/startup.org index 5e8f61e..9488dd0 100644 --- a/claude-templates/.ai/workflows/startup.org +++ b/claude-templates/.ai/workflows/startup.org @@ -151,7 +151,7 @@ These calls have no dependencies on each other. Issue them all together in one m 8. =[ -f todo.org ] && .ai/scripts/task-review-staleness.sh todo.org 7 || true= — count top-level tasks overdue for review (the daily task-review habit's startup nudge). The =[ -f todo.org ]= guard skips projects without a root todo.org; =|| true= keeps Phase A from failing if the script isn't synced yet. Threshold 7 days is one review cycle of slack — softer than the wrap-up health check's 30-day alarm. 9. =bash ~/code/rulesets/scripts/sync-language-bundle.sh "$PWD" 2>/dev/null || true= — language-bundle freshness for the current project. Fingerprint-detects which bundle (if any) the project has, auto-fixes drifted rulesets-owned files (=.claude/rules/*.md=, =.claude/hooks/*=, =githooks/*=), and surfaces drift in =settings.json= without writing it (a project may have customized it). =CLAUDE.md= is deliberately left untracked — it's seed-only in =install-lang= and project-owned afterward, mirroring how =diff-lang= skips it. Quiet when there's no bundle or everything's clean. Hardcodes the rulesets path because =languages/= is the canonical source and lives only there — the same absolute-path dependency the rsyncs already carry. =|| true= keeps Phase A from failing on older checkouts where the script isn't present yet. The =.ai/= rsyncs and this call write to disjoint paths (=.ai/= vs =.claude/=/=githooks/=), so the batch stays parallel-safe. 10. =[ -f "$HOME/org/roam/inbox.org" ] && grep -cE '^\*\* ' "$HOME/org/roam/inbox.org" || true= — count items in the roam global inbox (=~/org/roam/inbox.org=), the roam-mode startup nudge. Silent if the roam clone isn't on this machine. Phase C reads the file when the count is non-zero, splits total vs items related to this project, and surfaces the offer (see =inbox.org= roam mode). Read-only; never files at startup. -11. KB surface prep (the read + contribute startup nudges; see =docs/design/2026-06-16-encourage-kb-contribution-spec.org=). Gated on the agent KB clone. Counts =:agent:= nodes, lists up to 5 whose content matches the current project basename (titles only; a few most-recent nodes as a fallback when nothing matches), and resolves the best-practices node path. Read-only; silent when the clone is absent. Phase C surfaces the relevant titles (consult) and the best-practices link (contribute). +11. KB surface prep (the read + contribute startup nudges; see =docs/specs/2026-06-16-encourage-kb-contribution-spec.org=). Gated on the agent KB clone. Counts =:agent:= nodes, lists up to 5 whose content matches the current project basename (titles only; a few most-recent nodes as a fallback when nothing matches), and resolves the best-practices node path. Read-only; silent when the clone is absent. Phase C surfaces the relevant titles (consult) and the best-practices link (contribute). #+begin_src bash ra="$HOME/org/roam/agents" @@ -166,6 +166,16 @@ These calls have no dependencies on each other. Issue them all together in one m fi #+end_src +12. Spec-sort probe (the docs-lifecycle retrofit nudge; see the docs-lifecycle spec in =docs/specs/=). Read-only; prints one line when the project has an unsorted docs pile — a =docs/design/= directory or stray =docs/*-spec.org= root files — and no =:LAST_SPEC_SORT:= marker in =.ai/notes.org=. Silent for projects with nothing to sort or an already-stamped marker (the marker permanently clears it). + + #+begin_src bash + { [ -d docs/design ] || [ -n "$(find docs -maxdepth 1 -name '*-spec.org' -print -quit 2>/dev/null)" ]; } \ + && ! grep -qs ':LAST_SPEC_SORT:' .ai/notes.org \ + && echo "spec-sort: unsorted docs present" || true + #+end_src + + The stray-root check uses =find= rather than a glob so the probe behaves identically under bash and zsh (=compgen= is bash-only, and zsh aborts on an unmatched glob). + Notes on the rsync commands: - Trailing slashes on both source and destination matter — they tell rsync to sync /contents/ rather than nest a directory inside. - =--delete= on the directory syncs lets retired template files actually disappear from each project on next startup. @@ -199,6 +209,7 @@ This phase touches the user and runs sequentially: - *Roam inbox nudge.* If the Phase A roam-inbox count is greater than zero, read =~/org/roam/inbox.org=, split total vs items related to this project (claimed by the =<project>:= prefix, plus any unprefixed item whose topic plainly concerns this project), and surface one line: "Roam inbox: =<N>= total, =<M>= appear related to this project — say 'inbox zero' to file them." Offer it as a priority option; never auto-file. If the count is zero or the file is absent, say nothing. See =inbox.org= roam mode. - *KB consult nudge (read side).* If the Phase A KB-surface prep returned any =kb-relevant-titles=, surface one line listing them (capped 5): "KB lessons that may be relevant: =<title>=; =<title>=… — open the node before related work." The titles are declarative, so the list alone tells you whether to open one. Gated on the roam clone; silent when the clone is absent or nothing relevant surfaced. See the best-practices node and =knowledge-base.md=. - *KB contribute nudge (write side).* Once per session, surface one line pointing at the best-practices node (the =kb-bestpractices= path from Phase A): "Learned something durable? See =<path>= for how to write a KB node — contributing cross-project facts is welcome (personal projects only; work/unknown projects never write per =knowledge-base.md=)." Light encouragement, never a gate. Gated on the roam clone; silent when absent. + - *Spec-sort nudge.* If the Phase A spec-sort probe printed =spec-sort: unsorted docs present=, surface one line: "this project's docs pile has never been spec-sorted — say 'run spec-sort' to sort it." If the probe was silent, say nothing. A project with nothing to sort never sees the line; a stamped =:LAST_SPEC_SORT:= marker permanently clears it. See the docs-lifecycle rule and the spec in =docs/specs/=. - *Language-bundle sync.* If the Phase A step-12 call (=sync-language-bundle.sh=) printed anything, surface it. =fixed= lines are informational — the drift was already repaired (note that =.claude/= is now dirty if the project commits it). A =drift= line on =settings.json= is surface-only and needs the printed =make install-<lang> PROJECT=.= to reconcile; flag it so the user can decide. If the call was silent, say nothing. - *Newly-installed symlinks.* If the Phase A.0 =make install= step printed any =link= / =relink= / =WARN= line, surface it. A =link= line means a skill, rule, hook, or script added to rulesets is now linked into =~/.claude= for the first time on this machine. For a newly-linked *skill*, check the agent's available-skills list: if the harness already registered it mid-session, note it's available and move on; if it's absent, stop and tell Craig to restart the agent so it loads (whether a mid-session reload works is harness-version-dependent). For a newly-linked *hook*, note that the harness reads hooks at session start — it fires from the next session (or after Craig opens =/hooks= once); its settings.json wiring travels with the tracked file, so the link is usually the only missing piece. A =WARN ... not a symlink= line is a real collision at the target path — surface it; it needs a human. If the step printed only "nothing new to link", say nothing. - *Template-sync churn (safety net).* Check whether Phase A's rsync left uncommitted churn in the synced =.ai/= paths — accumulated from a prior session that crashed before wrap-up, or freshly added this session when rulesets advanced. Without surfacing, it builds up silently until it blocks Phase A.0's auto-ff (git won't ff a dirty tree). Skip in the rulesets repo itself (there =.ai/= is a committed mirror, kept honest by the pre-commit hook). The check is sequential here, after the rsync has finished — not a Phase A step, to keep that batch race-free. diff --git a/claude-templates/.ai/workflows/task-audit.org b/claude-templates/.ai/workflows/task-audit.org index 94b99da..7d2b758 100644 --- a/claude-templates/.ai/workflows/task-audit.org +++ b/claude-templates/.ai/workflows/task-audit.org @@ -61,6 +61,8 @@ For each open task, read its body and cross-check its claims against the actual - *Calendar* — did a scheduled event happen; is a SCHEDULED/DEADLINE date now past. - *Meeting recordings* — if a task hinges on "did this conversation happen / what was said," check the recording queue (e.g. =~/sync/recordings/=) and transcribe via =process-meeting-transcript.org= if the answer lives in an un-transcribed recording. (This is exactly how a "did the interview happen?" task gets resolved instead of guessed.) +*Spec lifecycle reconcile (docs-lifecycle convention).* If the project has a =docs/specs/=, run the =:SPEC_ID:= query as part of this phase: for each spec whose top-level status heading reads =DOING=, find the =todo.org= task whose =:SPEC_ID:= property matches the spec's =:ID:=. Flag the spec NEEDS-USER when that bound parent is =DONE=/=CANCELLED=, archived, or missing — the build finished (or evaporated) without the =IMPLEMENTED= flip, exactly the drift this check exists to catch. Check the parent's own keyword, not its children (completed children become dated entries and the final flip task is a child, so child-counting misleads). + Assign each task a bucket (CURRENT / STALE / NEEDS-USER) and, for STALE, the specific factual update. *Scale tactic.* For a large open-task set, dispatch read-only investigation sub-agents over batches of tasks (parallel-safe per =subagents.md= — independent read-only domains). Each returns a per-task bucket + suggested update. *Never* let sub-agents write to =todo.org= concurrently — apply all edits serially in the main thread (concurrent writes to one file race and lose work). @@ -79,7 +81,7 @@ For every STALE task, edit it in the main thread: - *Ensure priority is set per the project scheme.* The top of the project's =todo.org= should carry the priority legend (=[#A]= through =[#D]=). Every task should carry an explicit priority cookie. If a cookie is missing, or no longer matches the reconciled facts, assign the right level per the legend. If the level is unambiguous from the body, do it autonomously; if it's a judgment call (especially the [#A] / [#B] line for important-but-not-urgent work), flag NEEDS-USER. Also enforce the [#A]-discipline rule from the legend — an [#A] task without a =SCHEDULED:= or =DEADLINE:= line is mis-graded and is either down-graded to [#B] (when reconciled facts say "important but not urgent") or surfaced as NEEDS-USER for the user to date. - *Ensure a type tag is set.* Every task carries one type tag from the project's tag legend (typically =:feature:= / =:chore:= / =:spec:= / =:bug:=). If missing or wrong, assign or correct it from the body when the type is unambiguous. If two tags fit (a refactor that also fixes a bug; a spec that's also a chore), flag NEEDS-USER rather than picking one silently. - *Enforce the project's declared tag vocabulary.* If the project's tag legend declares an *exhaustive* set of allowed tags, strip from each task any tag outside that set — the heading and parent section already carry topic/scope context, so ad-hoc tags only fragment the vocabulary and defeat tag-based filtering. Normalize near-duplicate spellings to the canonical tag (a plural to its singular, say). Where the legend does not declare the set closed, leave existing tags alone; this step applies only where the allowed set is exhaustive by design. -- *Re-assess the =:quick:= and =:solo:= tags* — reconciliation can change a task's effort or autonomy: a resolved dependency may make a stuck task =:solo:=, a scope cut may make it =:quick:=, and new complexity surfaced by the sources can invalidate either. Add or remove the tags per the definitions in the project's tag legend (and [[file:task-review.org][task-review.org]]) when the reconciled facts make the call clear. When they don't — an effort estimate you can't pin down, a =:solo:= gate you can't confirm — it's a NEEDS-USER flag, not a guess. +- *Re-assess the =:quick:= and =:solo:= tags (mandatory — an audit that skips this is incomplete).* Reconciliation can change a task's effort or autonomy: a resolved dependency may make a stuck task =:solo:=, a scope cut may make it =:quick:=, and new complexity surfaced by the sources can invalidate either. Add or remove the tags per the hard definitions in [[file:../../claude-rules/todo-format.md][todo-format.md]] ("Hard definitions: :solo: and :quick:"; task-review carries the same three-gate walk). Autonomous execution reads =:solo:= as its eligibility gate and trusts the tag, so a stale one is a run-time hazard, not cosmetic drift. When the call isn't clear — an effort estimate you can't pin down, a =:solo:= gate you can't confirm — it's a NEEDS-USER flag, not a guess. - Bump =:LAST_REVIEWED:= on each edited task. Follow =todo-format.md= for completion mechanics (depth-based DONE vs dated-rewrite) and the working-files / link-hygiene rules when moving artifacts. @@ -99,6 +101,21 @@ Never merge or re-parent autonomously — which tasks belong together, and wheth When no clear cluster exists, say so in one line and move on — most audits won't find one, and forcing a merge fragments worse than it consolidates. +** Phase C.6 — Retire completed parents and promote stragglers (interactive) + +Phase C.5 consolidates related *open* tasks. This step retires parent tasks whose work is *finished*, so completed containers don't linger in Open Work as scaffolding. + +Run =todo-cleanup.el --convert-subtasks= first (it's part of the =clean-todo= / wrap-up cleanup, and =open-tasks.org= runs it too) so every completed sub-task is a dated event-log entry rather than a lingering =DONE= keyword. The closure logic below reads "open child" as a child heading still carrying a task keyword (=TODO=/=DOING=/=WAITING=/=VERIFY=/=NEXT=/=PROJECT=/=STALLED=/=DELEGATED=); a dated entry is correctly not open. + +Two shapes, both proposed to Craig (inline numbered options per =interaction.md=, no popup) before applying: + +- *Zero open children → close the parent.* A parent whose child *tasks* are all resolved (now dated) and that carries no open child task is finished: close it per =todo-format.md= (=**= parent → =DONE=/=CANCELLED= + =CLOSED:=), and it moves to Resolved on the next =--archive-done=. If the work resurfaces later, a fresh task is created then; a completed container shouldn't sit open as a placeholder. +- *One or two open children → promote, then close.* When a parent has only one or two open children, pull them out and rewrite them as standalone =**= level-2 tasks — give each a priority per the project scheme, and make the heading stand alone without the parent's context — then close the now-childless parent and let it move to Resolved. The former children become first-class Open Work tasks; the retired parent stops being scaffolding for one or two stragglers. + +*The leaf-with-notes carve-out (important).* "Zero open children" is not the same as "done." A =**= leaf task whose only descendants are dated *notes* — a captured "Ideas", "Goals", or "Current State" entry, not a real completed sub-task — is unstarted work with a note attached, not a finished container. Do not close it. Tell the two apart by intent: a container reads as a grouping (a =PROJECT= keyword, an explicit "parent grouping ..." line, or several dated entries that were genuinely separate sub-tasks that shipped); a leaf-with-notes is a single feature/bug task whose title names unstarted work and whose lone dated child is a design note. When the call is ambiguous, flag it NEEDS-USER rather than closing. + +Never close or promote autonomously past the ambiguous line — surface the candidates with a recommendation and let Craig ratify, the same interactive stance as Phase C.5. Clear container completions (a =PROJECT= whose every child is dated) can be proposed as a batch; leaf-with-notes ambiguities are flagged individually. Verify open-vs-done counts against the actual headings (a real scan of the subtree), not a fragile regex that a shell's =\b= support can silently break — a miscount here closes live work. + ** Phase D — Flag the judgment calls (interactive) Present the NEEDS-USER bucket as a short, scannable list — one line per task, naming the decision or the fact required. Adjudicate with the user one item at a time (inline numbered options per =interaction.md=, no popup). Apply the user's calls as they come (which may itself produce more autonomous updates, or new tasks). @@ -147,3 +164,7 @@ Two Phase C behaviors added, both surfaced by an Emacs-config =todo.org= audit: - *Tag-vocabulary enforcement.* That project declares a closed tag set (=bug=, =feature=, =refactor=, =test=, =quick=, =solo=); the audit had to strip ~44 ad-hoc tags that had accumulated across the file. The prior workflow only checked that a type tag was *present* — it had no concept of an exhaustive allowed set. The new bullet enforces a declared closed vocabulary and leaves open-vocabulary projects untouched. - *Code-complete-but-unverified closing.* Many tasks had shipped (tests green, live in the daemon) but stayed open awaiting a manual or visual verification, so they accumulated as half-open. Leaving them open is noise; auto-closing them would violate "never claim a fix verified before the user confirms." The fix routes the pending human check into the project's =Manual testing and validation= parent (dedup-checked) per =verification.md='s manual-verification hand-off, then closes the implementation task. The work is done and the check is tracked; a failed check promotes to a bug. + +** 2026-07-01 — Retire completed parents (Phase C.6) + +Added Phase C.6: retire a parent task once its child *tasks* are all done. Zero open children → close the parent; one or two open children → promote them to standalone level-2 tasks, then close. Surfaced by an Emacs-config =todo.org= audit where several PROJECT containers had all children complete. Depends on =todo-cleanup.el --convert-subtasks= running first so completed sub-tasks are dated (not lingering =DONE= keywords) and the open-child count is accurate. Carries a leaf-with-notes carve-out: a =**= leaf task whose only descendant is a dated design note ("Ideas"/"Goals") is unstarted work, not a finished container, and must not be closed — the ambiguous case is flagged NEEDS-USER. The step also warns against counting open-vs-done with a fragile regex (a =\b= that a given shell/awk silently drops miscounts and closes live work). diff --git a/claude-templates/.ai/workflows/task-review.org b/claude-templates/.ai/workflows/task-review.org index 69e172d..ba1571a 100644 --- a/claude-templates/.ai/workflows/task-review.org +++ b/claude-templates/.ai/workflows/task-review.org @@ -57,7 +57,9 @@ Keep is the common case — most tasks are still right and just need re-stamping *** Tagging =:quick:= — small tasks -While reviewing each task, estimate its effort. If you judge it *30 minutes or less* and it doesn't already carry =:quick:=, add the tag to the heading line. If the heading and body don't tell you how long it'll take, *ask Craig* — don't guess. A wrong =:quick:= is worse than none: the tag exists so Craig can grab a genuinely small task in a spare moment, and a mislabeled one wastes that moment. +The =:quick:= and =:solo:= assessments (this section and the next) are *mandatory* for every reviewed task except a Kill — a review that skips them is incomplete. The hard definitions live in [[file:../../claude-rules/todo-format.md][todo-format.md]] ("Hard definitions: :solo: and :quick:"); autonomous execution (work-the-backlog / the no-approvals speedrun) reads =:solo:= as its eligibility gate and trusts the author's tag, so the run-time gate is only as trustworthy as this pass. + +While reviewing each task, estimate its effort. If you judge it *30 minutes or less* and it doesn't already carry =:quick:=, add the tag to the heading line. If the heading and body don't tell you how long it'll take, *ask Craig* — don't guess. A wrong =:quick:= is worse than none: the tag exists so Craig can grab a genuinely small task in a spare moment, and a mislabeled one wastes that moment. =:quick:= is an effort hint only, never an eligibility gate — size does not decide what runs autonomously. This is orthogonal to the action chosen — a task can be kept (or re-graded, or marked DOING) *and* tagged =:quick:= in the same pass. Skip the assessment on a Kill, since it's leaving the pool. Tags go on the heading line per [[file:../../claude-rules/todo-format.md][todo-format.md]], sharing one =:tag1:tag2:= cluster. @@ -67,7 +69,7 @@ While reviewing each task, judge whether Claude could build *and* verify it with 1. *Buildable* — Claude has the capability and access to do the work. 2. *Verifiable by Claude* — an objective or local check exists that Claude can run itself. Craig's routine spot-checking does not count against this, and neither does handing off a residual human-in-the-loop confirmation as a structured manual-testing reminder (the =verification.md= "Handing Off Manual Verification" pattern). The disqualifier is having no verification path of Claude's own at all — when the success criterion is only judgeable by Craig's eyes or subjective taste. -3. *No upfront decision* — no design or preference call Craig must make before Claude can begin. +3. *No deliberation* — no open design question and no "weigh these approaches" with real tradeoffs. At most one or two *quick, upfront-answerable* factual decisions are allowed — the speedrun preset batches those into its pre-flight Q&A, so they don't break the hands-off run. A genuine design or preference call disqualifies. If any gate is shaky, leave the tag off. Like =:quick:=, a wrong =:solo:= is worse than none — it tells Craig he can hand the task off and walk away, so a mislabeled one wastes that trust. When the heading and body don't make all three gates clear, ask Craig instead of guessing. @@ -96,6 +98,8 @@ The exact date string matters: =task-review-staleness.sh= and the wrap-up health Follow the completion rules in [[file:../../claude-rules/todo-format.md][todo-format.md]]. A killed top-level =**= task stays task-shaped: change the keyword to =CANCELLED=, add a =CLOSED: [YYYY-MM-DD Day]= line under the heading (generate with =date "+%Y-%m-%d %a"=), and leave the priority and tags intact. It's then a candidate for =--archive-done= at the next cleanup. Don't stamp =:LAST_REVIEWED:= on a kill — it's leaving the review pool anyway. +A killed *sub-task* (=***= or deeper, under a parent task) instead becomes a dated event-log entry per the depth rule — but you don't have to hand-format it here. =todo-cleanup.el --convert-subtasks= (run in the =clean-todo= and wrap-up cleanup passes) rewrites any level-3+ DONE/CANCELLED/FAILED heading into its dated form mechanically from the =CLOSED= cookie, so a keyword-plus-=CLOSED= close at depth gets normalized on the next cleanup rather than lingering. =lint-org.el= flags any that slip through (checker =subtask-done-not-dated=). + * Phase D: Close out When the batch is done (or Craig calls it early): diff --git a/claude-templates/.ai/workflows/work-the-backlog.org b/claude-templates/.ai/workflows/work-the-backlog.org new file mode 100644 index 0000000..642162d --- /dev/null +++ b/claude-templates/.ai/workflows/work-the-backlog.org @@ -0,0 +1,263 @@ +#+TITLE: Work the Backlog +#+AUTHOR: Craig Jennings & Claude +#+DATE: 2026-07-02 + +* Overview + +The single home for the autonomous task-execution loop: take a set of marked, solo-doable tasks from the project's =todo.org= and work them unattended, each held to the full quality bar, under a fixed safety contract. Spec: =rulesets/docs/specs/2026-06-16-autonomous-batch-execution-spec.org=. + +Two callers feed it, differing only in how they build the task set and which session mode they pass: + +- The *inbox auto-loop* (=inbox.org= auto mode) chains here after its routing completes, with a tag/priority query, file-only mode, cap 1. +- The *no-approvals speedrun* preset feeds an explicit ordered list with autonomous-commit + always-push + paging-on, after a pre-flight Q&A that front-loads every decision. + +This workflow owns the execution logic — eligibility gate, defer checklist, quality bar, run cap. Callers own input assembly and mode selection. Capture-routing (inbox surfaces) stays entirely in =inbox.org=; this file never reads an inbox. + +* When to Use This Workflow + +Invoked by its two callers, or directly by phrase: + +- *Speedrun triggers:* "speedrun", "no approvals speedrun", "speedrun these: <task set>" — run the no-approvals speedrun preset (below). The word "speedrun" always routes here, even when the phrase also says "no approvals": plain =no-approvals.org= is the general session mode; the speedrun is this workflow's preset over an explicit task set. +- *Loop caller:* =inbox.org= auto mode chains here after its routing (below). Not phrase-triggered. + +Manual fallback: "work the backlog" / "work the backlog with <task set>" — gather the three inputs below (ask for whichever are missing, defaulting to file-only mode; default cap is the list length for an explicit set, 1 for a query) and run the loop. + +* Inputs — the caller contract + +A caller hands this workflow three things: + +1. *A task set* — an ordered list of candidate task headings from the project's =todo.org=. Either an explicit ordered list (speedrun) or the result of a tag/priority query (the loop). The loop does not care how the set was assembled; it receives an ordered list of candidates. +2. *A session mode* — two orthogonal flags: + - *Commit autonomy:* =file-only= (default) or =autonomous-commit=. See "Commit autonomy" below. + - *Paging:* on or off. End-of-set only. +3. *A run cap* — the hard maximum number of tasks to complete this run. + +It returns a per-task outcome and a run summary. + +* Outcomes — the per-task vocabulary + +Every task in the set ends in exactly one of: + +- =implemented-committed= — implemented, committed (and pushed per the project's flow) under =autonomous-commit=. +- =implemented-diff-surfaced= — implemented, diff surfaced, *not* committed (=file-only=). +- =deferred-VERIFY= — a defer-checklist hit; a =VERIFY= filed naming what's missing or risky. +- =dropped-by-craig= — removed from the run at the speedrun pre-flight Q&A ("skip this"). +- =skipped-ineligible= — failed the mechanical eligibility gate. +- =failed= — implementation was attempted and abandoned: the tree is left working (never commit a broken state), the failure is surfaced in the run summary, and the run continues to the next task. + +The run summary lists each task with its outcome, plus the remaining set when the cap stopped the run. + +* The loop + +For the task set, in order, until the run cap is hit: + +1. *Eligibility gate* (below). Ineligible → record =skipped-ineligible=, next task. +2. *Scope read* of the relevant code. Cheap; just enough to run the defer checklist. +3. *Defer checklist* (below). Any hit → defer: file the =VERIFY= naming the gap and record =deferred-VERIFY= (or, under the speedrun preset, route a quick-question gap to the pre-flight Q&A), next task. +4. *Implement* under the project's commit discipline: TDD red→green→refactor, then =/review-code --staged=, fix all Critical/Important findings, then close the task per =todo-format.md='s completion rules. Decompose into as many logical commits as the change needs — size is not capped. If implementation fails partway, leave the tree working, record =failed=, surface it, and continue to the next task. +5. *Commit autonomy branch:* + - =file-only= → surface the diff, do *not* commit. Record =implemented-diff-surfaced=. + - =autonomous-commit= → =/voice personal= on the message, commit individually, push per the project's flow. Record =implemented-committed=. +6. *Record metrics* for the task (the JSONL append — see Metrics below). +7. Decrement the cap. At zero, stop. + +After the set: if the paging flag is set, fire the end-of-set page (below). Surface the run summary either way. + +* Eligibility gate — mechanical, no judgment + +A task is autonomous-safe when *both* hold. This layer is a lookup, not a judgment; all the judgment lives in the defer checklist. + +1. *Status is =TODO=* — never =VERIFY=, =DOING=, =DONE=, or =CANCELLED=. =VERIFY= marks "awaiting Craig's input"; auto-implementing one defeats the check it represents. The do-not-implement set is safe-by-omission: anything not plainly =TODO= (plus any project-declared "hold" marker) is out. +2. *Tagged =:solo:=* — the autonomy tag, resolved against the project's priority/tag scheme header in =todo.org= (never hardcoded). =:solo:= carries the hard definition in =todo-format.md=: completable and verifiable without Craig beyond at most one or two quick decisions answerable up front, no design deliberation. A project whose scheme declares a different autonomous-safe tag set overrides the default. + +Priority and =:next:= drive *ordering* within the eligible set, not eligibility ([#A] before [#B] before [#C], then the author's ordering). =:quick:= is an effort hint for batching and duration estimates — never a gate. + +Task *size* is deliberately absent from this gate. A large but well-specified, decision-free task is in scope and gets decomposed into per-logical-commit chunks during implementation. Size never sends a task away; only *deliberation* or *risk* does (the checklist below). + +*No scheme header → don't run.* The gate reads =:solo:= semantics from the project's scheme header; a =todo.org= without one leaves the tag undefined (=todo-format.md= makes the header mandatory). Surface that the header is missing and stop rather than guessing eligibility. + +* The defer checklist — act vs file + +After the scope read, run each eligible candidate through the checklist. Each item is a concrete, answerable question, not an adjective. *Any* hit — or any "unsure" — defers the task. Only a task that clears every item is implemented. + +1. *Test-writability (the keystone).* Can I write the failing test from the task text — plus any decisions gathered up front — without inventing a requirement? *No / unsure* → underspecified. Under the speedrun preset, if the gap is one or two quick answerable questions, route it to the pre-flight Q&A; otherwise file a =VERIFY= noting what's missing. Under the unattended loop, file the =VERIFY= (no one to ask). +2. *Data-loss / irreversible / external operation.* Does implementing it require any of: =rm= of non-scratch data, =git reset --hard= / force-push, =DROP= / =DELETE= / =TRUNCATE=, file truncate/overwrite of persisted content, a schema or data migration, any external or shared-state mutation, any credential touch? *Yes* → do NOT implement; file a =VERIFY= naming the risk. This is the hard safety gate; an upfront answer never overrides it without an explicit checkpoint. +3. *Already-satisfied.* Does the scope read show the desired end-state already holds? *Yes* → file a =VERIFY= noting it and move on. Don't make a no-op change. +4. *Design deliberation.* Does the task carry an unresolved design question, a "weigh these approaches" with real tradeoffs, or a TBD that isn't a quick factual answer? *Yes* → under the speedrun preset, if it collapses to one or two quick questions, route to the pre-flight Q&A; otherwise file and surface as a =/start-work= candidate. Under the loop, file. The discriminator is *quick-answerable question* vs *deliberation* — never task size. + +When genuinely unsure which side a task falls on, defer — a wrong auto-implement costs a revert *and* the next-session correction. + +** Filing the deferral =VERIFY= + +Every checklist hit files a =VERIFY= in the project's =todo.org=, per =todo-format.md='s VERIFY rules: + +- *Dedup first.* If a =VERIFY= sibling for this deferral already exists (a prior run filed it), don't file another — record the outcome as =deferred-VERIFY= with a "previously filed" note and move on. The deferred task keeps its =TODO= status and tags, so without this check every subsequent run would re-defer and re-file. +- *Placement:* sibling of the deferred task (the deferred task is the trigger) — a =**= task gets its =VERIFY= at =**=, a =***= sub-task gets it at =***= under the same parent, never deeper. +- *Heading:* carries the question or risk on its own ("VERIFY <topic> — migration touches persisted rows"). +- *Body:* which checklist item hit, what's missing or risky, and what answer or action would make the task runnable. For an already-satisfied hit, the evidence that the end-state already holds. + +** Routing a quick-question gap (speedrun only) + +Under the speedrun preset, a checklist-1 or checklist-4 hit that collapses to one or two quick answerable questions routes to the pre-flight Q&A instead of deferring (see the preset section below). The discriminator: a *quick question* is a factual or preference pick answerable in one line without weighing tradeoffs ("cap at 5 or 8?", "which config key name?"); *deliberation* is anything that needs tradeoffs weighed, options explored, or code read by Craig. A task needing three or more questions isn't quick-question-gapped — it's underspecified; file the =VERIFY=. Checklist item 2 (data-loss / irreversible) never routes to the Q&A: an upfront answer doesn't override the hard safety gate. + +The unattended loop has no one to ask — every hit defers there. + +* Per-task quality bar + +Autonomy changes who approves, not what quality means. Per task, non-negotiable: + +- *TDD* per =testing.md=: red first, green, refactor. The keystone checklist item already proved the failing test is writable. +- *Verification* per =verification.md=: fresh evidence, full suite green before any commit. +- *=/review-code --staged=* before every commit; Critical and Important findings block until fixed. +- *=/voice personal=* on every commit message on the =autonomous-commit= path (or the patterns walked inline if the skill is unavailable), message printed inline so the log shows what landed. +- *Task closure* per =todo-format.md=: depth-based completion (keyword + =CLOSED:= at level 2, dated rewrite at level 3+). +- *One logical change per commit.* A large task becomes several commits, not one omnibus. + +* Commit autonomy + +=file-only= is the default: surface the diff, never commit. =autonomous-commit= is honored only when the project carries the commit-autonomy waiver, read fresh each run — never from memory of past runs or "this project usually allows it." + +The waiver lives in the project's =.ai/notes.org= *Workflow State* section as marker lines, the same shape as the workflow markers already there: + +#+begin_example +:COMMIT_AUTONOMY: yes +:LOOP_MAY_COMMIT: yes +#+end_example + +- =:COMMIT_AUTONOMY: yes= — the project has the waiver. An =autonomous-commit= request (the speedrun preset, or a manual run asking for it) is honored. +- =:LOOP_MAY_COMMIT: yes= — the *unattended loop caller* may also commit. It requires =:COMMIT_AUTONOMY:= alongside it; the split exists because "Craig-initiated speedrun may commit" and "the recurring loop may commit unattended" are different levels of trust. Without this flag the loop stays =file-only= even when the project holds the waiver. + +An absent marker means no. Anything other than a plain =yes= value also means no. The read is one grep of the Workflow State section — a lookup, not a judgment. + +*The degrade contract.* When a caller requests =autonomous-commit= and the required marker is missing, degrade to =file-only= and surface it in both the run intro and the run summary: "autonomous-commit requested, no :COMMIT_AUTONOMY: waiver in notes.org — running file-only." Never honor the request without the marker, and never drop to file-only silently — the first commits into a project that didn't opt in, the second hides why nothing got committed. + +* Bounding the run + +The cap is a hard per-run task ceiling passed by the caller — the kill switch a runaway can't exceed: + +- *Loop caller default: 1.* Implement the highest-priority eligible candidate, record, stop; the next tick continues. +- *Speedrun: the length of the explicit list*, capped at a ceiling — the human bounded the set by naming it. + +Even the speedrun stops at the cap and surfaces (and, with paging on, pages) the remainder. The cap bounds task *count*, not cost; a token budget is logged as vNext. + +* Context hygiene — auto-flush between tasks + +Task boundaries are clean boundaries by construction: the previous task is closed and committed (or filed), nothing is half-edited. When the context window grows heavy mid-run, run the flush skill's *auto mode* between tasks: checkpoint the session anchor with the remaining task set, session mode, and cap in Next Steps (so the resumed context continues the run blind), arm the self-injection (=.ai/scripts/self-inject.sh= via =tmux run-shell -b=), and end the turn. The fresh context resumes from the anchor and works on. Unattended runs only — the keystroke-collision hazard and the full mechanism live in the flush skill. + +* End-of-set page + +With paging on, fire one page when the set is done or the cap is hit — end-of-set only, never per-task: + +#+begin_src sh +notify alarm "Page" "<project>: <N> done, <M> remaining — <one-line summary>" --persist +#+end_src + +=--persist= keeps it on screen until dismissed (the page-me convention). The page fires when the set completes *or* the cap stops the run — either way exactly once. The message carries the project name, the completed count, and the remaining count (with skipped tasks noted in the run summary) so Craig can confirm ready and name the next project in one reply. There is no separate page-signal call — =notify= is the paging surface. + +* Metrics + +Each task outcome appends one JSON line to the project's =.ai/metrics/work-the-backlog.jsonl= — git-tracked, append-only, =jq=-queryable. Create the directory and file on the first append. Logging is a side effect only: a failed append surfaces a warning in the run summary but never blocks, reorders, or aborts execution. + +One record per task, written at the moment its outcome is decided: + +| Field | Meaning | +|--------------------+-------------------------------------------------------------------------------------------------| +| =ts= | ISO-8601 timestamp of the task outcome | +|--------------------+-------------------------------------------------------------------------------------------------| +| =run_id= | UUID shared by every record in one run (=uuidgen= at run start) | +|--------------------+-------------------------------------------------------------------------------------------------| +| =project= | project basename | +|--------------------+-------------------------------------------------------------------------------------------------| +| =caller= | =loop= / =speedrun= / =manual= | +|--------------------+-------------------------------------------------------------------------------------------------| +| =task= | the task heading (slug) | +|--------------------+-------------------------------------------------------------------------------------------------| +| =outcome= | =implemented-committed= / =implemented-diff= / =deferred-verify= / =skipped-ineligible= / | +| | =dropped-by-craig= / =failed= | +|--------------------+-------------------------------------------------------------------------------------------------| +| =defer_reason= | =underspecified= / =data-loss= / =already-satisfied= / =needs-deliberation= — set on | +| | =deferred-verify= records only | +|--------------------+-------------------------------------------------------------------------------------------------| +| =upfront_decision= | =true= when a pre-flight answer was recorded and used for this task | +|--------------------+-------------------------------------------------------------------------------------------------| +| =wall_clock_s= | seconds from task start to outcome | +|--------------------+-------------------------------------------------------------------------------------------------| +| =commit_sha= | committed tasks: the commit SHA (comma-separated when the task decomposed into several); empty | +| | otherwise | +|--------------------+-------------------------------------------------------------------------------------------------| +| =review_findings= | count of =/review-code= Critical + Important findings on this task | +|--------------------+-------------------------------------------------------------------------------------------------| + +The =outcome= slugs map one-to-one onto the outcome vocabulary above (=implemented-diff= is =implemented-diff-surfaced=; =deferred-verify= is =deferred-VERIFY=). Per-run rollups (attempted / completed / deferred / dropped, wall-clock total, findings per commit) are computed at synthesis, not stored per record. The =commit_sha= field is what the synthesis step's corrections signal keys on — whether a later commit reverted or hand-fixed an autonomous one — so never omit it on a committed task. + +* Caller: the inbox auto-loop + +=inbox.org= auto mode chains here as an explicit second step *after* its routing completes — never as a phase inside inbox processing. When a cycle files new items and Craig answers "run this batch next?" with yes, auto mode invokes this workflow with: + +- *Task set:* the eligibility query over the queued/filed items — status =TODO= + =:solo:= per the scheme header, priority-ordered. +- *Session mode:* =file-only=, paging off. (A project carrying both =:COMMIT_AUTONOMY:= and =:LOOP_MAY_COMMIT:= markers opts the loop into commits — see Commit autonomy above.) +- *Cap: 1.* The highest-priority eligible candidate runs, gets recorded, and the loop's next tick (or the next yes) continues from there. + +The loop has no human at kickoff of each task, so a needs-quick-decisions task defers with a =VERIFY= — the pre-flight Q&A is a speedrun capability, not a loop one. Startup and wrap-up never invoke this workflow. + +* Preset: the no-approvals speedrun + +The named preset is a label for one flag combination, not a second code path: *explicit ordered list + =autonomous-commit= + always-push + paging-on*, with every approval front-loaded into a single pre-flight step. "No approvals" means all input first, then hands-off — not no input ever. =autonomous-commit= still requires the =:COMMIT_AUTONOMY:= waiver (Commit autonomy above); without it the preset degrades to =file-only= and says so in the pre-flight intro. + +When Craig names a task set and says "speedrun": + +1. *Gather* the named task set. +2. *Scope-read and classify* each task against the eligibility gate + defer checklist: *ready* (clears everything), *needs-quick-decisions* (one or two upfront-answerable questions — checklist item 1 or 4), or *drop* (data-loss/irreversible, or deliberation that isn't a quick question). +3. *Order* the list — priority, then the author's ordering / =:next:=. +4. *Intro the work* — present the ordered plan: what will run, what was dropped and why, and the batched questions for the needs-quick-decisions tasks. +5. *Craig answers each question or says "skip this"* — a skip removes the task (recorded =dropped-by-craig=; the task itself stays =TODO=); an answer is recorded so implementation works from the decision, not a guess. +6. *Run the finalized list autonomously* — no further approvals until done. Cap = the list length (the human bounded the set by naming it), still one commit per logical change, always-push per the project's flow, auto-flushing between tasks when the context grows heavy (see Context hygiene above). +7. *End-of-set page* with completed + remaining + skipped. + +The batch-ask (step 4-5) is one message: each question names its task, puts the recommended answer at item 1 when there is one (per =interaction.md= — inline numbered, no popup), and offers "skip this" as the last option. Before the run starts, write each answer into its task's body in =todo.org= as a dated line — the implementation works from the recorded decision, and the record survives the session. The Q&A fires only under this preset; the loop caller never asks (its decision-needing tasks defer). + +*** Per-item disposition rule + +For every item the run picks up (this holds for any executing caller, including an auto-inbox-zero run given a standing yes): + +- *Feature-level task* → write a spec first (=spec-create=), don't implement directly. The spec is the run's deliverable for that item. +- *Needs decisions you can't confidently guess* → file it as a =VERIFY= carrying the question (under this preset, one or two quick questions route to the pre-flight Q&A instead). +- *Well-defined* → implement it, taking the time it needs. + +This extends the defer checklist: the checklist decides *act vs file*; this rule decides the *shape* of the act. + +* Synthesis: metrics → org-roam KB + +Trigger: "synthesize backlog metrics" (optionally a weekly scheduled run). This is the read side of the metrics log — Craig's ask was "gather data and create org-roam articles we can look at later," and this step is the second half. It is read-only over the logs plus exactly one KB write. + +1. *Gather the JSONL union.* Discover =.ai/metrics/work-the-backlog.jsonl= across the project roots (dirs carrying =.ai/protocols.org= under =~/code=, =~/projects=, =~/.emacs.d=). Classify each project per =knowledge-base.md= (work-root denylist, never inference) before reading it into the union. +2. *Enforce personal-only.* A work-classified or unknown project's metrics never enter the KB write — they stay in that project's own log. Report the exclusion per the KB refusal contract: the classification, a one-line redacted summary, and where the data stayed. +3. *Compute the rollups and trends.* Per run: attempted / completed / deferred (by reason) / dropped / failed, wall-clock total, commits landed, review findings per commit. Trends across runs: completion rate over time, defer-reason distribution, findings-per-commit trend. +4. *Compute the corrections signal* — the key metric. For each =commit_sha= in the window, check that project's history for a later commit (within ~14 days) that reverts it or carries a fix touching the same files. A clean run is one whose autonomous commits survive untouched; a flagged run is what Craig reviews by hand. This is a cheap proxy, not proof — it flags candidates, it doesn't convict. +5. *Write one KB node* at =~/org/roam/agents/YYYYMMDDHHMMSS-backlog-metrics-<window>.org= per =knowledge-base.md=: =:agent:metrics:= filetags, a concise title, the rollup table, the trend narrative, and =[[id:...]]= links to prior synthesis nodes so the series is traceable. Pull before writing, commit and push after — the normal KB session discipline. + +The KB node is the artifact Craig reads later: "are the runs completing more and getting corrected less?" should read off the trend table without touching raw logs. Synthesis never mutates the JSONL, todo.org, or any project tree. + +* Common Mistakes + +1. *Implementing a =VERIFY= or =DOING= task.* The gate is status =TODO= only — a =VERIFY= exists precisely because Craig's input is pending. +2. *Treating =:quick:= as eligibility.* It's an effort hint. =:solo:= is the gate. +3. *Deferring on size.* A large, well-specified, decision-free task runs — decomposed into logical commits. Size is not a checklist item. +4. *Guessing past the keystone.* If the failing test isn't writable from the task text, the task isn't ready. Inventing the requirement is the failure the checklist exists to stop. +5. *Rationalizing through the data-loss list.* "The migration is small" doesn't clear checklist item 2. Enumerated operations defer, full stop. +6. *Committing in =file-only= mode.* The diff is the deliverable; the commit is Craig's. +7. *One omnibus commit for the whole run.* Every logical change is its own reviewed commit. +8. *Skipping =/review-code= or =/voice= because nobody's watching.* Autonomy removes interaction gates, never engineering-discipline gates (same contract as =no-approvals.org=). +9. *Running past the cap.* The cap is the kill switch; hitting it means stop and surface, even mid-set. +10. *Paging per-task.* One page, end of set. +11. *Honoring =autonomous-commit= from memory.* The waiver is the marker line in =notes.org=, read fresh each run. "This project usually allows it" isn't a read. +12. *Re-filing the same deferral =VERIFY= every run.* The deferred task stays =TODO=, so a run that skips the existing-sibling check spams =todo.org= with duplicates. +13. *Routing a data-loss hit to the pre-flight Q&A.* Checklist item 2 is the hard gate — an upfront answer never clears it without an explicit checkpoint. + +* Living Document + +Refine as the dogfooding signal arrives — the metrics log and the corrections-in-next-session signal are the feedback loop. Fold recurring adjustments in rather than accumulating caller-side workarounds. + +* History + +Created 2026-07-02 as Phase 1 of the autonomous-batch execution spec, reconciling the inbox-zero "Phase E" proposal and the =.emacs.d= speedrun proposal into one execution loop. The auto-inbox-zero execute step in =inbox.org= reverted to routing-only in the same change so this file is the loop's only home. Phases 2-6 (same day) wired both callers, pinned the commit-autonomy waiver markers, fleshed the defer/Q&A/page mechanics, and added the metrics record + KB synthesis step. diff --git a/claude-templates/.ai/workflows/wrap-it-up.org b/claude-templates/.ai/workflows/wrap-it-up.org index 5d2cdd2..d0c4e75 100644 --- a/claude-templates/.ai/workflows/wrap-it-up.org +++ b/claude-templates/.ai/workflows/wrap-it-up.org @@ -137,6 +137,22 @@ Run the report-only variant first if you want to see what would change without w emacs --batch -q -l .ai/scripts/todo-cleanup.el --check todo.org #+end_src +*** Convert done sub-tasks to dated entries + +#+begin_src bash +[ -f todo.org ] && emacs --batch -q -l .ai/scripts/todo-cleanup.el --convert-subtasks todo.org +#+end_src + +=--convert-subtasks= rewrites every heading at level 3 or deeper whose TODO state is DONE/CANCELLED/FAILED into a dated event-log entry (=<stars> YYYY-MM-DD Day @ HH:MM:SS -ZZZZ <text>=), dropping the keyword, priority cookie, and tags, and removing the now-redundant =CLOSED:= line. This enforces the =todo-format.md= depth rule that a completed *sub-task* (a heading under a parent task) becomes dated history, not a lingering DONE keyword — a shape an interactive org close (=org-log-done= → DONE + CLOSED) never applies and =--archive-done= (level-2 only) never reaches. The timestamp comes from each entry's own =CLOSED= cookie; a date-only close yields =00:00:00=. Heading text is kept verbatim. Idempotent (an already-dated heading has no keyword to match), and a done sub-task with no parseable =CLOSED= is flagged and left alone rather than stamped with a fabricated date. + +Run this *before* =--archive-done= so that when a completed level-2 parent is archived, its sub-tasks already carry their dated form. Any rewrites show up in the wrap-up commit's diff for review before push. + +Preview without writing: + +#+begin_src bash +emacs --batch -q -l .ai/scripts/todo-cleanup.el --convert-subtasks --check todo.org +#+end_src + *** Archive completed work #+begin_src bash @@ -244,6 +260,32 @@ The check exempts =lint-followups.org= explicitly because lint-org runs earlier This integrates with =inbox.org= process mode, which stamps =:LAST_INBOX_PROCESS:= in =notes.org='s *Workflow State* section on completion. Wrap-up doesn't double-stamp. It only ensures the inbox carries nothing but the expected pipeline artifacts at session end. +*** Cross-project router (optional — route filed keepers to their home projects) + +Runs directly after the inbox sanity check. The split between the two: the sanity check *gates* the wrap (a dirty inbox blocks until resolved); the router is *optional* (skipping it never blocks anything — the candidates just stay local until a future wrap). Spec: =docs/specs/wrapup-routing-spec.org= (D7/D8/D9). + +The candidate set is exactly the local tasks carrying a =:ROUTE_CANDIDATE:= property — keepers that inbox process mode filed this session whose inferred home is another project. Never scan the standing backlog. + +#+begin_src bash +.ai/scripts/route-batch --list +#+end_src + +*Empty set = zero interaction.* =--list= prints nothing when there are no candidates; continue the wrap silently — no prompt, no "0 items" line. + +When candidates exist, surface the batch as one line per task — the task heading, the destination project, the delivery mode (=inbox-send= file handoff), and the engine's confidence — then offer exactly two options: *go* (route the whole batch) or *skip* (leave everything local). Derive each confidence label by running the engine on the task's heading + body (=python3 .ai/scripts/route_recommend.py --item "..." --exclude "$(basename "$PWD")"=); label weak matches visibly ("weak — verify the destination") so a low-confidence route gets a human glance before the keystroke. + +On *go*: + +#+begin_src bash +.ai/scripts/route-batch --go +#+end_src + +Per candidate, the helper writes the task's subtree (children ride along; =:ROUTE_CANDIDATE:= stripped, headings promoted to top level) to a one-task handoff, delivers it via =inbox-send <destination> --file= (so the =from-<this-project>= provenance is stamped and the destination's inbox process mode dispositions it as a single item), and only after a successful send removes the subtree from the local =todo.org= — a single-file local edit the wrap is already committing. A failed send leaves that task in place and exits non-zero; report it and continue the wrap. Never write the destination's =todo.org= directly; its own inbox processing files the task per its conventions. + +On *skip*, leave every candidate in place, marker included — they resurface next wrap. + +Mis-routes are recoverable: the receiving project rejects via inbox process mode's reject-from-another-project flow, which returns the item to this project's inbox with the rationale. That reject path is why removing the local source on send is safe. + *** Review-habit health check (surface a slipped daily task-review) The daily task-review habit walks the open top-level tasks on a rotating cycle, stamping =:LAST_REVIEWED:= as it goes (see =task-review.org=). This check is the watchdog for that habit. When tasks have gone too long unreviewed, the habit has slipped, and the wrap-up says so in one line — it does not re-list the tasks. @@ -536,7 +578,7 @@ Before considering wrap-up complete: - [ ] The Summary ends with the =KB: promoted N / consulted yes-no= line (promotion check ran) - [ ] File renamed to =.ai/sessions/YYYY-MM-DD-HH-MM-description.org= - [ ] =.ai/session-context.org= no longer exists -- [ ] =todo-cleanup.el= ran — hygiene pass + =--archive-done= + =--sync-child-priority= (if =todo.org= exists at project root) +- [ ] =todo-cleanup.el= ran — hygiene pass + =--convert-subtasks= + =--archive-done= + =--sync-child-priority= (if =todo.org= exists at project root) - [ ] =lint-org.el= ran on =todo.org= — mechanical fixes applied, judgments appended to follow-ups file (if =todo.org= exists) - [ ] Any orphan-planning-line warnings reviewed (fix or accept) - [ ] Inbox carries nothing but expected pipeline artifacts (=.gitkeep=, =lint-followups.org=, =PROCESSED-*= prefixes), OR each remaining handoff has an explicit deferral logged in the valediction diff --git a/docs/design/2026-07-02-auto-flush-mechanism-note.org b/docs/design/2026-07-02-auto-flush-mechanism-note.org new file mode 100644 index 0000000..fbe06ae --- /dev/null +++ b/docs/design/2026-07-02-auto-flush-mechanism-note.org @@ -0,0 +1,20 @@ +#+TITLE: AUTO-FLUSH capability — proven live in the archsetup session +#+SOURCE: from archsetup +#+DATE: 2026-07-02 01:26:20 -0400 + +AUTO-FLUSH capability — proven live in the archsetup session 2026-07-02, Craig asks that it be promoted to all projects and recommended as part of the no-approvals speedrun to keep sessions sharp. + +Problem: /clear is a user-only keystroke, so long autonomous sessions either bloat or hit arbitrary auto-compaction. Craig can't always be around to type it. + +Mechanism (companion script: self-inject.sh, sent separately to this inbox): +1. At a clean task boundary, the agent refreshes .ai/session-context.org exactly as the flush skill does (checkpoint with Active Goal / Decisions / Next Steps). +2. It derives its own tmux pane: match pane_pid from 'tmux list-panes -a' against its process ancestry (the ai launcher runs every agent session inside tmux, so this holds everywhere). +3. It arms the injection VIA THE TMUX SERVER — tmux run-shell -b "sleep 25; tmux send-keys -t %N -l '/clear'; tmux send-keys -t %N Enter; sleep 15; tmux send-keys -t %N -l 'go — auto-flush resume: read .ai/session-context.org and continue per Next Steps'; tmux send-keys -t %N Enter" — and immediately ends its turn so the prompt is idle when the keys land. +4. /clear fires the SessionStart hook (which already points a fresh context at notes.org + session-context.org), and the injected resume line starts the next turn. Zero human keystrokes. + +Gotchas learned the hard way: +- A detached child (setsid/nohup/&) of a tool call DIES when the tool call ends; only tmux run-shell -b (server-owned) survives the turn boundary. +- Under run-shell the process is a child of the tmux server, so ancestry-based pane detection can't run there — derive the pane first from the agent's shell, pass it explicitly. +- Collision: if the user is typing when the keys fire, the injection merges into their input (a real /clear became '/clearto' mid-word). Fine for unattended sessions; warn the user to keep hands off the armed window if present. + +Suggested integration: an 'auto' mode on the flush skill (checkpoint, then self-inject instead of prompting the user), plus a line in the no-approvals speedrun workflow to auto-flush at clean boundaries when context grows heavy. The script could live in claude-templates' .ai/scripts/ so every project gets it on sync. diff --git a/docs/design/2026-06-16-autonomous-batch-execution-spec.org b/docs/specs/2026-06-16-autonomous-batch-execution-spec.org index 84cefe3..42348dc 100644 --- a/docs/design/2026-06-16-autonomous-batch-execution-spec.org +++ b/docs/specs/2026-06-16-autonomous-batch-execution-spec.org @@ -1,10 +1,18 @@ #+TITLE: Autonomous-Batch Task Execution — Spec #+AUTHOR: Craig Jennings & Claude #+DATE: 2026-06-16 -#+TODO: TODO | DONE SUPERSEDED CANCELLED +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* DOING Autonomous-Batch Task Execution — Spec +:PROPERTIES: +:ID: 90f623cd-fdbe-4f5c-b63d-b2f84d9151cf +:END: +- 2026-07-02 Thu @ 00:44:59 -0400 — READY → DOING: spec-response decomposition ran — the speedrun build parent in todo.org carries the :SPEC_ID: binding, one task per phase (1-6) plus the live-trial validation and the flip-to-IMPLEMENTED task. Phase 0 had already landed 2026-07-01. +- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to READY (evidence-based, human-confirmed) * Metadata -| Status | ready | +| Status | doing | |----------+--------------------------------------------------------------------| | Owner | Craig Jennings | |----------+--------------------------------------------------------------------| @@ -12,7 +20,7 @@ |----------+--------------------------------------------------------------------| | Date | 2026-06-16 | |----------+--------------------------------------------------------------------| -| Related | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] | +| Related | [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal]]; [[file:../design/2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] | |----------+--------------------------------------------------------------------| * Summary @@ -367,7 +375,7 @@ Verification is by invocation against a project's real =todo.org=: run the loop * References / Appendix - [[file:../../working/inbox-zero-phase-e/proposed-inbox-zero.org][Phase E proposal (inbox-zero stopgap)]] and [[file:../../working/inbox-zero-phase-e/sender-note.org][its sender note with the 5 open questions]]. -- [[file:2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] (file retains its original on-disk name pending a rename pass). +- [[file:../design/2026-06-15-fix-speedrun-workflow-proposal.org][speedrun proposal]] (file retains its original on-disk name pending a rename pass). - [[file:../../.ai/workflows/inbox-zero.org][inbox-zero.org (canonical, A-D)]] — the routing workflow this feature decouples from. - =~/code/rulesets/claude-rules/knowledge-base.md= — the org-roam write contract the synthesis step follows. diff --git a/docs/design/2026-06-16-encourage-kb-contribution-spec.org b/docs/specs/2026-06-16-encourage-kb-contribution-spec.org index cf8111b..cfbfe79 100644 --- a/docs/design/2026-06-16-encourage-kb-contribution-spec.org +++ b/docs/specs/2026-06-16-encourage-kb-contribution-spec.org @@ -1,10 +1,17 @@ #+TITLE: Encourage Org-Roam KB Contribution Across Workflows — Spec #+AUTHOR: Craig Jennings & Claude #+DATE: 2026-06-16 -#+TODO: TODO | DONE SUPERSEDED CANCELLED +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* READY Encourage Org-Roam KB Contribution Across Workflows — Spec +:PROPERTIES: +:ID: f67f5f45-5aa1-4a5a-8704-d636e4e16f75 +:END: +- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to READY (evidence-based, human-confirmed) * Metadata -| Status | approved (decisions ratified 2026-06-20) | +| Status | ready | |----------+------------------------------------------------| | Owner | Craig Jennings | |----------+------------------------------------------------| diff --git a/docs/specs/2026-07-01-docs-lifecycle-spec.org b/docs/specs/2026-07-01-docs-lifecycle-spec.org new file mode 100644 index 0000000..16e1132 --- /dev/null +++ b/docs/specs/2026-07-01-docs-lifecycle-spec.org @@ -0,0 +1,357 @@ +#+TITLE: Docs Lifecycle — Spec +#+AUTHOR: Craig Jennings & Claude +#+DATE: 2026-07-01 +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* DOING Docs lifecycle +:PROPERTIES: +:ID: 80b0787b-4a60-4c82-8a16-b383d3e3c8f2 +:END: +- 2026-07-01 Wed @ 23:34:15 -0400 — READY → DOING: spec-response decomposition ran — build parent in todo.org carries the :SPEC_ID: binding, one task per phase plus the flip-to-IMPLEMENTED task and the manual-testing child. First live exercise of the transition-ownership table. +- 2026-07-01 Wed @ 23:22:50 -0400 — DRAFT → READY: Codex re-review found all fourteen review findings closed and no remaining blocking implementation-readiness gaps. +- 2026-07-01 Wed @ 22:54:41 -0400 — verify pass on the second responder round: all five fixes held, findings 1-9 unregressed, verdict ready; three minor nits folded in (scoped id-link criterion, untracked-copy cleanup in the recovery recipe, two stale prose spots). Stays DRAFT pending the reviewers' flip. +- 2026-07-01 Wed @ 22:46:52 -0400 — second responder pass: all five re-review findings fixed (fourteen of fourteen closed); stays DRAFT — the READY flip belongs to the reviewers this round. +- 2026-07-01 Wed @ 22:41:33 -0400 — READY → DRAFT: Codex re-review found five new blocking implementation-readiness gaps after the response pass. +- 2026-07-01 Wed @ 22:41:21 -0400 — DRAFT → READY: dual independent review (Codex + fresh-context Claude agent, both initially Not ready), all nine findings fixed, verify pass by the original reviewer returned ready; flip authorized by Craig. +- 2026-07-01 Wed @ 22:13:00 -0400 — drafted from the five decisions settled 2026-06-28 (todo.org "Spec storage location + lifecycle-status convention"). + +* Metadata +| Status | doing | +|----------+------------------------------------------------------------------| +| Owner | Craig Jennings | +|----------+------------------------------------------------------------------| +| Reviewer | Craig Jennings | +|----------+------------------------------------------------------------------| +| Date | 2026-07-01 | +|----------+------------------------------------------------------------------| +| Related | [[file:../design/2026-06-15-spec-storage-lifecycle-proposal.org][source proposal]]; todo.org "Spec storage location + | +| | lifecycle-status convention" | +|----------+------------------------------------------------------------------| + +* Summary + +Formal specs and working notes currently share one directory per project, and a spec's lifecycle state (drafted, in progress, shipped, dead) is invisible without opening the file. This spec adopts two coupled conventions — a location split (=docs/specs/= for formal specs, =docs/design/= for notes) and an authoritative in-file status carried by an org TODO keyword on a top-level status heading — plus =org-id= links for rename-safety, a general =docs-lifecycle= rule capturing the shape, and a one-time confirmed retrofit that sorts every project's existing pile. + +* Problem / Context + +.emacs.d triaged ~28 design docs and had to run a four-agent sweep reading every spec against the code to reconstruct which had shipped (6 implemented, 8 in progress, 12 not started, 1 superseded). Nothing in the filename, location, or file records the state, so the answer to "what's open?" degrades into "open every file and infer." rulesets has the same shape: 41 files in =docs/design/= of which only 3 carry a formal spec spine, plus two =-spec.org= files misfiled at the =docs/= root. The cost compounds with every doc added, and every project inherits the problem through the shared spec-create workflow. + +Two forces beyond triage cost: + +- *Links are load-bearing.* =todo.org= tasks, session archives, and sibling docs link specs by =file:= path. Any convention that renames or moves files on every status change (the filename-suffix approach) breaks those links repeatedly across a cross-linked, template-synced doc set. +- *The convention is worthless if legacy docs stay misfiled* (Craig, 2026-06-28). Template sync distributes rules and workflows but cannot perform a one-time per-project migration, so the design must include a reach mechanism that gets each project's existing pile sorted once. + +* Goals and Non-Goals + +** Goals +- A directory listing answers "which docs are specs, and what state is each in" without opening files. +- Status transitions cost one small in-file edit (keyword + history line + Metadata mirror) — no rename, no link surgery. +- Cross-doc spec links survive moves and renames. +- The shape is captured once as a general rule (=docs-lifecycle=) so future artifact collections (brainstorm piles, recording queues) can reuse it. +- Every existing project's =docs/design/= pile gets sorted exactly once, with human confirmation on each classification. + +** Non-Goals +- No automation of status flips — the keyword is edited by whoever changes the state (spec-create, spec-review, spec-response, or a human), not by a watcher. +- No retroactive rewriting of session archives or git history that reference old paths; only live inbound links (=todo.org=, =notes.org=, docs) are updated by the retrofit. +- No new tracking database or index file — the files are the index. + +** Scope tiers +- v1: the location split, the status-heading convention, the org-id link standard, the =docs-lifecycle= rule, spec-create/spec-review/spec-response updates, the retrofit helper + startup nudge, and the rulesets pilot. +- Out of scope: applying the lifecycle shape to non-doc collections (the rule documents the pattern; adopting it elsewhere is per-collection work). +- vNext: an org-agenda custom view over =docs/specs/*.org= keyed on the status keywords (nice-to-have once the keywords exist; log to todo.org). + +* Design + +** The location split + +- =docs/specs/= — formal specs only. A *spec* is a doc proposing a buildable change that carries a =Decisions= section and =Implementation phases= (the spec-create spine). Filenames keep the existing =YYYY-MM-DD-<topic>-spec.org= shape — the =-spec.org= suffix stays because spec-review's Phase 0 precondition keys on it; only the *status* suffixes from the original proposal are dropped. +- =docs/design/= — everything else: brainstorms, inventories, proposals, research notes, frozen source material. Review findings live inside the spec they review (current spec-review behavior), so standalone review files are legacy notes and stay in =docs/design/=. + +** The status heading (the authoritative record) + +Each spec's first element after the file header is a single top-level *status heading* carrying the org TODO keyword: + +#+begin_example +,#+TODO: TODO | DONE +,#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +,* DOING <spec short name> +:PROPERTIES: +:ID: <uuid> +:END: +- <dated one-line history entries, newest first> +#+end_example + +- *The keyword is authoritative.* The Metadata table's =Status= field mirrors it in lowercase for readers already in the table, and a status transition updates keyword + history line + mirror in the same edit; on disagreement the heading wins. +- *Two keyword sequences, no collisions.* The lifecycle sequence *joins* — never replaces — the =TODO | DONE= sequence that the =* Decisions= and =* Review findings= task machinery depends on. The two sequences share no keyword (the old header's =SUPERSEDED CANCELLED= done-states migrate to the lifecycle sequence; a legacy =CANCELLED= decision heading still parses as a done-state there, so =[/]= cookies stay mechanically correct). The retrofit rewrites each legacy header to carry both lines. +- *Vocabulary:* =DRAFT= (being written) → =READY= (review passed, buildable) → =DOING= (implementation in progress) → =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= (terminal). +- *Transition ownership — every flip has a named owner:* + - =DRAFT= — spec-create stamps it at authoring time. + - =DRAFT= → =READY= — spec-review, on a passing gate (keyword + history line + mirror in the review pass). + - =READY= → =DOING= — spec-response, when it decomposes the phases into build tasks. *The decomposition writes the spec-to-task binding:* the =todo.org= parent task it creates (or updates) carries a =:SPEC_ID:= property holding the spec's status-heading UUID. That property is the durable join between the spec and its build work. + - =DOING= → =IMPLEMENTED= — the session that completes the final implementation phase. To make that a tracked obligation rather than a memory, spec-response's phase-to-task breakdown *always emits a final task*: "flip the spec to IMPLEMENTED + history line," as a child of the bound parent. Safety net: task-audit's reconcile pass runs one query — for each =docs/specs/*.org= whose keyword is =DOING=, find the =todo.org= task with the matching =:SPEC_ID:=; flag the spec when that parent is =DONE=/=CANCELLED=, archived, or missing. Checking the *parent's* keyword (not "are all child tasks closed") sidesteps both the flip-task chicken-and-egg (the parent only closes after the flip task ran) and =--convert-subtasks= rewriting completed children into dated entries (dated children never affect the parent's keyword). This is the mechanism whose absence produced the .emacs.d six-shipped-specs-with-no-record failure; "a human remembers" is explicitly not the design. + - =SUPERSEDED= / =CANCELLED= — whoever makes the call, with the reason in the history line. +- *Glanceability without opening files:* one grep gives the full board — + + #+begin_src sh + rg -H '^\* (DRAFT|READY|DOING|IMPLEMENTED|SUPERSEDED|CANCELLED) ' docs/specs/ + #+end_src + + and because the keyword sits on a real org heading, an org-agenda view over =docs/specs/= works for free (the vNext item). +- *The heading body is the dated status history* — one line per transition (=YYYY-MM-DD Day @ HH:MM:SS -ZZZZ — <what changed, by whom>=), the record a filename could never carry. +- Why a dedicated status heading rather than restructuring each spec under one top-level heading: demoting every section in every existing spec is a large, link-hostile rewrite; a prepended heading is additive, retrofittable by script, and leaves the familiar flat section layout untouched. + +** Rename-safe links + +The status heading carries an =:ID:= UUID, assigned at authoring time (and by the retrofit for legacy specs). The target state is that cross-doc references to a spec use =[[id:<uuid>]]= rather than =file:= paths, so any future move can't orphan them. =file:= links remain fine for intra-doc anchors and for notes that never move. The KB's existing id-resolution recipe applies: =rg ':ID:[[:space:]]+<uuid>' docs/=. + +*Staged conversion — ids assigned now, links converted only when clickable.* =org-id-locations= only indexes agenda files and files org has visited, so a fresh =:ID:= in =docs/specs/= won't resolve on click in a live Emacs until the id index learns about project docs. =org-id-extra-files= is not a glob mechanism — it's a literal file list, only consulted under =org-id-track-globally= — so "point it at the globs" is not executable as written. The sequencing is therefore: + +1. *Pilot and retrofit rewrite =file:= links only* (path recomputation per the relink contract). Every link stays clickable throughout; no conversion window exists. +2. *:ID: properties are still assigned* during the sort — harmless, and they make the later conversion mechanical. +3. *Link conversion to =id:= is a separate follow-up pass*, gated on .emacs.d landing an executable id-index mechanism: enumerate each project's =docs/specs/*.org= into =org-id-extra-files= as real file names (a small function globbing at startup, with =org-id-track-globally= t), or a periodic =org-id-update-id-locations= over that enumeration — verified by clicking a known id link. The Phase 4 note to .emacs.d carries this ask; the =rg= recipe is the fallback for non-Emacs consumers either way. + +** The =docs-lifecycle= rule (the generalization) + +A new =claude-rules/docs-lifecycle.md= captures the reusable shape, with spec-create as the first instance: + +1. Separate formal artifacts from working notes by location. +2. Lifecycle state lives *in* the artifact, on a scannable, greppable carrier (an org keyword heading), with a dated history. +3. Links use rename-safe identifiers. +4. A growing collection earns this treatment when "which of these are live?" starts requiring a file-by-file read. + +** The retrofit (reach mechanism for existing piles) + +A synced helper, =spec-sort=, run once per project. *Canonical placement:* like every synced asset, the helper and all workflow edits land in rulesets' canonical tree first — =claude-templates/.ai/scripts/spec-sort= with its bats tests in =claude-templates/.ai/scripts/tests/= (the glob-discovered suite), workflow changes in =claude-templates/.ai/workflows/= — then =scripts/sync-check.sh --fix= propagates the committed =.ai/= mirror and both sides commit together. A mirror-only edit is reverted by the next sync; nothing in this feature is exempt from that contract. Downstream projects receive everything through the normal startup rsync. The run itself, per project: + +1. *Classify* each =docs/**/*.org= outside =docs/specs/= by one predicate: a doc carrying *both* a =Decisions= heading *and* an =Implementation phases= heading is a spec candidate; everything else is a note. (A =Metadata= table alone does not qualify — real counter-case: =docs/design/task-review.org= has a Metadata table and no spine, and is a note.) The heuristic *proposes*; a human confirms every move (classification is a judgment call — Craig, 2026-06-28). +2. *Move* confirmed specs to =docs/specs/=, *renaming to carry the =-spec.org= suffix* when the file lacks it (spec-review's Phase 0 precondition requires it — a retrofitted spec must be reviewable in its new home). Prepend the status heading, assign an =:ID:=, and rewrite the keyword header to the two-sequence form above. *The proposed keyword is evidence-based, not laundered:* the doc's own Status field is one signal among several, because stale Status fields are exactly what caused the original .emacs.d sweep. For each candidate the helper shows an evidence panel — the current Status value, the decision/finding cookie states, the state and heading of any =todo.org= task that links or binds to the doc, the most recent history/review entries, and (where cheap) whether artifacts the phases name actually exist — and proposes the keyword the evidence supports. When the evidence is inconclusive, the default is the most conservative *non-terminal* state it supports (never a terminal one). =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= are never applied without an explicit human-stated reason, recorded in the status-history line. +3. *Relink* under an explicit contract: + - *Rewritten roots (project-owned):* =todo.org=, =.ai/notes.org=, =docs/**=, =.ai/project-workflows/=, =.ai/project-scripts/=. The rewrite recomputes each link's relative path from the linking file's directory to the new location. *All rewrites stay =file:= links* — conversion to =[[id:...]]= is the separate follow-up pass gated on the Emacs id-index mechanism (see Rename-safe links), never part of a sort run. + - *Reported, never rewritten:* =.ai/sessions/= archives (frozen history), git history, and synced template paths (=.ai/workflows/=, =.ai/scripts/=, =.ai/protocols.org=) — a downstream edit there is reverted by the next template sync, so the report names the canonical rulesets file that needs the edit instead. + - *Supported link shapes:* org =[[file:...]]= links, relative or project-root-anchored, with or without a description. Bare-path mentions in prose or scripts are *reported for manual handling*, never rewritten. + - *Safety:* dry-run report is the default; =--apply= writes, under a fail-safe contract sized to the fact that one run mutates filenames, links, headers, and =.ai/notes.org= together: + - *Clean-worktree preflight.* =--apply= refuses on a dirty git tree (=git status --porcelain= non-empty) unless =--allow-dirty= is passed, which prints exactly what recovery loses. A clean tree is what makes recovery trivially safe. + - *Validate, then write.* The full move + relink plan — every source, destination, and link edit — is computed and validated first (every link parses, every target is unambiguous, every destination path is free), written to a plan file for inspection, and only then executed from that recorded plan. Ambiguous cases (two candidates sharing a basename, an unparseable link) block validation: listed, untouched, non-zero exit until each is resolved or explicitly waived. + - *Failure mid-apply is not a shrug.* Any write failure or a failed post-apply residue grep stops the run, names what was and wasn't applied (from the plan), and prints the recovery recipe — =git restore= over the plan's touched paths *plus* deletion of the plan's newly-created destination paths (=git restore= reverts tracked edits but doesn't remove untracked copies the move created). Safe by construction because preflight required a clean tree; the project is never silently left half-migrated. + - After a successful apply, the residue grep for each old path across the rewritten roots must return zero or =spec-sort= exits non-zero naming the residue. +4. *Stamp* =:LAST_SPEC_SORT: YYYY-MM-DD= in =.ai/notes.org='s =* Workflow State= section — the same surface as =:LAST_AUDIT:= and =:LAST_INBOX_PROCESS:=, created idempotently (append the section if the file lacks it) exactly as task-audit already does. + +*The startup nudge — concrete contract.* Phase A's parallel batch gains one read-only probe: + +#+begin_src bash +{ [ -d docs/design ] || [ -n "$(find docs -maxdepth 1 -name '*-spec.org' -print -quit 2>/dev/null)" ]; } \ + && ! grep -qs ':LAST_SPEC_SORT:' .ai/notes.org \ + && echo "spec-sort: unsorted docs present" || true +#+end_src + +(Phase 4 refined the stray-root check from =compgen= to =find=: =compgen= is bash-only and zsh aborts on an unmatched glob, so the original snippet false-negatived on stray root specs under zsh.) + +(The probe also fires on stray =docs/*-spec.org= root files, so a project whose only misfiled specs sit at the =docs/= root still gets nudged.) + +Phase C surfaces one line when the probe printed ("this project's docs pile has never been spec-sorted — say 'run spec-sort' to sort it") and stays silent otherwise. Projects with nothing to sort — no =docs/design/= and no stray root specs — never see it; a stamped marker permanently clears it. + +* Alternatives Considered + +** Filename status suffix (=-spec-doing.org=, =-spec-implemented.org=) +- Good, because the state is visible in a bare =ls= with no tooling. +- Bad, because every transition renames a file in a cross-linked, template-synced doc set — each rename is link surgery or a broken link, and the churn lands in git history and inbound =todo.org= links. +- Neutral, because the ls-visibility it buys is matched by the one-line =rg= over status headings. +- Rejected 2026-06-28 (Craig chose org-keyword over his earlier filename-suffix lean). + +** Status field in the Metadata table only (no keyword) +- Good, because the field already exists and needs no new structure. +- Bad, because a table cell is neither org-agenda-scannable nor reliably greppable across format drift, and it carries no dated history. +- Neutral, because the field stays anyway — as the in-table mirror. + +** Relink-helper instead of org-id (keep =file:= links, fix them on every move) +- Good, because readers see plain paths. +- Bad, because it makes every future move a tooling event, and one missed run silently breaks links — the failure mode is invisible until someone clicks. +- Neutral, because the retrofit needs relink logic once regardless; org-id just makes it a one-time need. + +* Decisions [5/5] + +All five were settled with Craig on 2026-06-28 (recorded in todo.org; migrated here per that note). + +** DONE Location split — adopt +- Context: specs and notes share one directory; telling them apart requires opening files. +- Decision: =docs/specs/= for formal specs (Decisions + phases spine); =docs/design/= for notes. Documented in spec-create and the docs-lifecycle rule. +- Consequences: easier — a listing answers "what's formal"; harder — one-time migration and link updates (the retrofit). + +** DONE Status mechanism — org keyword authoritative, no filename suffix +- Context: filename suffix vs org keyword; suffix wins =ls= visibility, keyword wins link stability and zero-rename transitions. +- Decision: the org TODO keyword on the spec's top status heading is authoritative, mirrored by the Metadata =Status= field. No status suffixes in filenames. +- Consequences: easier — a transition is one keyword edit and links never break; harder — glanceability needs the one-line =rg= (or the vNext agenda view) instead of bare =ls=. +- (Refined in review, 2026-07-01: "one keyword edit" became "three lines in one file" — keyword + history line + Metadata mirror. The ratified decision stands; see Review findings.) + +** DONE Link safety — org-id for cross-doc spec links +- Context: both the migration move and any future rename break =file:= links. +- Decision: specs carry =:ID:= UUIDs on the status heading; cross-doc references use =[[id:...]]=. +- Consequences: easier — moves are free; harder — following a link outside org needs the =rg ':ID:'= lookup. +- (Refined in review, 2026-07-01: the decision stands; the *sequencing* is staged — IDs are assigned at sort time, but link conversion to =id:= waits for the executable Emacs id-index mechanism, so no window exists where converted links don't click. See Review findings.) + +** DONE Generalize as a =docs-lifecycle= rule +- Context: the shape (in-artifact lifecycle state, formal-vs-notes split, rename-safe links) recurs for any processed-document collection. +- Decision: capture it in =claude-rules/docs-lifecycle.md= with spec-create as the first instance. +- Consequences: easier — the next collection reuses a decided pattern; harder — the rule must stay honest as the spec instance evolves. + +** DONE Retrofit existing files across ALL projects +- Context: template sync distributes conventions but cannot perform a per-project one-time migration; legacy piles would stay misfiled forever. +- Decision: ship a confirmed classify-move-relink helper (=spec-sort=) plus a startup nudge gated on =:LAST_SPEC_SORT:=; the helper proposes, a human confirms. Pilot on rulesets first. +- Consequences: easier — every project converges without manual archaeology; harder — the helper needs real relink logic and tests, and classification stays a judgment call. + +* Review findings [14/14] + +Two independent reviews (Codex, 2026-07-01 22:22; a fresh-context Claude agent, 2026-07-01 22:25) converged on =Not ready= with the same worst finding. All nine findings were dispositioned accept and fixed in the responder pass below; each carries its response. + +** DONE Org TODO vocabulary drops decision and finding task states :blocking: +(Codex; the Claude reviewer found the same, adding that keywords must be unique across sequences so a naive two-line fix collides on =SUPERSEDED=/=CANCELLED=.) The spec's example header replaced the file-level keyword vocabulary, so =TODO=/=DONE= stopped being task states and the =[/]= cookies that gate readiness went vacuous — this file itself was the first casualty. +Response: the scheme is now two collision-free sequences — =TODO | DONE= for decisions/findings, =DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED= for lifecycle (the old header's =SUPERSEDED CANCELLED= done-states migrate to the lifecycle sequence, and a legacy =CANCELLED= decision still parses as a done-state, so cookies stay correct). This file's own header now carries both lines; the Design section documents the two-sequence rule and the retrofit rewrites legacy headers to it. New acceptance criterion: cookies must compute by org, not hand counting. + +** DONE Relink behavior is too vague for a safe migration :blocking: +(Codex; the Claude reviewer independently flagged the synced-.ai/ slice — a downstream rewrite there is reverted by the next template sync, e.g. =startup.org:154='s reference to a spec candidate.) The retrofit named no scan scope, link-shape list, rewrite rule, residue policy, or dry-run format — the implementer would have had to invent the migration's data-safety contract. +Response: the retrofit section now carries the explicit contract: rewritten roots (=todo.org=, =.ai/notes.org=, =docs/**=, project-owned =.ai/= dirs), reported-never-rewritten surfaces (=.ai/sessions/=, git history, synced template paths — with the canonical rulesets file named in the report), supported link shapes (org =file:= links; bare paths report-only), relative-path recomputation, dry-run default with =--apply=, post-apply residue grep gating exit status, and refuse-loudly on ambiguity. + +** DONE Sort marker and startup nudge do not name the actual state surface :blocking: +(Codex; the Claude reviewer rated the same gap minor — Codex's version was sharper: startup reads =.ai/notes.org=, not a root =notes.org=, and Workflow State may not exist.) +Response: the marker is pinned to =.ai/notes.org='s =* Workflow State= (the =:LAST_AUDIT:= / =:LAST_INBOX_PROCESS:= surface), created idempotently as task-audit already does; the Design section now spells the Phase A probe command, its exact fire condition, and the Phase C one-liner. + +** DONE Phase order can strand legacy specs behind the new review precondition :blocking: +(Codex; the Claude reviewer found the same at medium severity.) Hardening spec-review's path precondition in Phase 1 while piles stay unsorted until Phases 3-4 would make every legacy spec unreviewable in the gap. +Response: Phase 1 now carries the compatibility rule — legacy =-spec.org= locations stay reviewable (with a "run spec-sort" nudge) until the project stamps =:LAST_SPEC_SORT:=; the precondition hardens only after. Acceptance criterion 5 updated to match. + +** DONE No owner for the DOING → IMPLEMENTED flip :blocking: +(Claude reviewer.) spec-create owns =DRAFT= and spec-review owns =DRAFT= → =READY=, but implementation finishes outside the spec trio, and "a human edits it" is the exact mechanism whose failure produced this spec (.emacs.d's six shipped-but-unmarked specs). +Response: the Design section now has a transition-ownership table naming an owner for every flip. =READY= → =DOING= belongs to spec-response; =DOING= → =IMPLEMENTED= is a tracked obligation — spec-response's phase-to-task breakdown always emits a final "flip the spec" task — with task-audit's reconcile pass as the safety net (flag any =DOING= spec whose implementation tasks are all closed). Phase 1 includes both workflow edits. + +** DONE Classification heuristic is precedence-ambiguous +(Claude reviewer.) "Decisions plus phases or Metadata table" reads two ways, and =docs/design/task-review.org= (Metadata table, no spine) classifies differently under each. +Response: one predicate now — spec candidate iff the doc carries *both* a =Decisions= heading *and* an =Implementation phases= heading; a Metadata table alone does not qualify. The task-review.org counter-case is cited in the retrofit step. + +** DONE spec-sort never renames moved files to the -spec.org suffix +(Claude reviewer.) spec-review's Phase 0 hard-requires the suffix, so a retrofitted legacy spec without it would be unreviewable in its new home. +Response: retrofit step 2 now renames moved files to carry =-spec.org= when they lack it; the relink pass covers the rename like any move. Acceptance criterion 3 checks the suffix on the re-homed root specs. + +** DONE Clicked id: links won't resolve in Craig's Emacs +(Claude reviewer.) =org-id-locations= indexes only agenda and visited files, so fresh =:ID:=s in =docs/specs/= are invisible-until-clicked broken — the convention would trade visible link breakage for invisible breakage. +Response: named as an explicit .emacs.d-side prerequisite in the Rename-safe-links section (=org-id-extra-files= over =docs/specs/= globs, or periodic =org-id-update-id-locations=), carried in the Phase 4 note to .emacs.d, with the =rg= recipe as the interim fallback. + +** DONE Acceptance criterion 2 contradicts the Metadata Status mirror +(Claude reviewer.) "Exactly one keyword edit" was irreconcilable with the mandated mirror update. +Response: a transition is now defined everywhere as three lines in one file — keyword, history line, mirror — still no rename and no link edits. Goals, Design, and criterion 2 all say the same thing. + +** DONE Synced helper placement ignores the canonical/mirror split :blocking: +The spec says to build =.ai/scripts/spec-sort= and update =.ai/workflows/= behavior, but rulesets' current contract is that =claude-templates/.ai/= is canonical and the repo-root =.ai/= tree is only the committed mirror kept honest by =scripts/sync-check.sh=. =CLAUDE.md= explicitly warns that mirror-only edits get silently reverted by the next sync, and =make test= runs the mirror-side tests only after the canonical copy has been synced. V1 should say every shared workflow/script edit lands in =claude-templates/.ai/{workflows,scripts}/= first, then =scripts/sync-check.sh --fix= updates the mirror; =spec-sort= tests should be placed in the synced script-test tree and the acceptance criteria should include =sync-check= / workflow-integrity where relevant. (blocking) +Response: the retrofit section now opens with the canonical-placement contract (helper + tests in =claude-templates/.ai/scripts{,/tests}/=, workflow edits canonical-side, =sync-check --fix= propagates, both sides commit together); Phases 1 and 2 name it per artifact; new acceptance criterion requires =sync-check= to exit clean after the build commits. + +** DONE Task-audit safety net has no spec-to-task binding :blocking: +The spec says task-audit flags a =DOING= spec whose implementation tasks are all closed, but current =task-audit.org= audits open =todo.org= tasks and has no model for scanning =docs/specs/=, finding a spec's implementation tasks, or deciding "all closed" after =todo-cleanup.el --convert-subtasks= rewrites completed child tasks into dated entries. The added final "flip to IMPLEMENTED" task also means there may always be one open task, so a naive "all tasks closed" check never fires. V1 should define the binding spec-response writes into =todo.org= (for example a parent task property or stable link to the spec ID), the exact audit query, how converted dated entries count, and whether the final flip task is excluded from or satisfies the reconciliation rule. (blocking) +Response: spec-response's decomposition now stamps a =:SPEC_ID:= property (the spec's status-heading UUID) on the build parent task — the durable binding. The audit query is defined: for each =DOING= spec, find the task with matching =:SPEC_ID:=; flag when that parent is closed, archived, or missing. Checking the parent's keyword (not "all children closed") dissolves both the flip-task chicken-and-egg and the dated-entry conversion concern. New acceptance criterion exercises the flag. + +** DONE spec-sort apply path can leave a half-migrated tree :blocking: +The retrofit contract has dry-run by default and a post-apply residue grep, but it does not say what happens when =--apply= has moved files and then a relink, parse, or residue check fails. Because the operation mutates filenames, links, headers, IDs, and =.ai/notes.org= together, a partial failure can strand the project in the exact mixed state the tool is meant to prevent. V1 should require a clean-worktree preflight (or an explicit dirty-tree refusal/override), validate the full move/relink plan before the first write, write from a single recorded plan, and define recovery behavior for every failed apply: no files moved, automatic rollback, or a printed =git restore= / =git revert= recovery recipe that is safe for uncommitted local edits. (blocking) +Response: the relink contract's safety block now specifies the fail-safe apply: clean-worktree preflight (refuse on dirty, explicit =--allow-dirty= override that prints what recovery loses), full plan computed + validated + written to a plan file before the first write, execution from the recorded plan, and mid-apply failure stopping with a named applied/not-applied breakdown plus the =git restore= recovery recipe — safe by construction because preflight required a clean tree. Bats covers the preflight and the forced-failure recovery output (Phase 2, plus a new acceptance criterion). + +** DONE org-id Emacs prerequisite is not executable as written :blocking: +The spec says the .emacs.d-side fix can be =org-id-extra-files= over =docs/specs/= globs, but Emacs' own docstring says =org-id-extra-files= is a list of additional files and is only relevant when =org-id-track-globally= is set; it does not establish that project glob strings will be expanded or that every project root will be discovered. The rollout also converts links during the rulesets pilot before the Phase 4 note asks .emacs.d to make clicked =id:= links resolvable. V1 should either keep =file:= links until the Emacs support has landed, or specify the executable Emacs-side implementation precisely: how project =docs/specs/*.org= files are enumerated into =org-id-extra-files= or fed to =org-id-update-id-locations=, when it runs, how it is tested, and how rollout avoids a window where converted links do not click through. (blocking) +Response: took the fork that removes the window entirely — the pilot and every sort run rewrite =file:= links only; =:ID:= properties are still assigned (harmless, enables later mechanics); conversion to =id:= is a separate follow-up pass gated on .emacs.d landing an executable id-index mechanism, now specified concretely (enumerate =docs/specs/*.org= into =org-id-extra-files= as real file names under =org-id-track-globally=, or feed the enumeration to =org-id-update-id-locations=; verified by clicking a known link). Decision 3 carries a sequencing-refinement note; a new acceptance criterion asserts zero =id:= links exist after the pilot. + +** DONE Status confirmation can still encode stale reality :blocking: +The retrofit proposes lifecycle status from a doc's current =Status= field or review history, then asks a human to confirm. Those are the same stale/incomplete signals that caused the original .emacs.d sweep: shipped specs and dead specs were only knowable by reading code/tasks against the spec. If =spec-sort= only confirms a guessed keyword, the pilot can produce a clean-looking board whose state is still wrong. V1 should define status-confirmation evidence: for each spec candidate, what sources the helper shows (current Status, decision/finding cookies, linked =todo.org= parent state, recent history, matching implementation files/tests), what default is allowed when evidence is inconclusive, and that =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= require an explicit reason in the status history line. (blocking) +Response: retrofit step 2 now defines the evidence panel the helper shows per candidate (Status value, cookie states, bound/linking =todo.org= task state, recent history entries, cheap existence checks on phase-named artifacts) with the keyword proposed from the evidence, not the Status field alone. Inconclusive evidence defaults to the most conservative non-terminal state; =IMPLEMENTED= / =SUPERSEDED= / =CANCELLED= always require an explicit human-stated reason recorded in the history line. + +* Implementation phases + +** Phase 1 — Rule + template updates +Write =claude-rules/docs-lifecycle.md=. Update spec-create (emit into =docs/specs/=, the two-sequence keyword header, status heading with =:ID:= in the template, transition mechanics), spec-review (path expectation with the compatibility rule below; flipping =DRAFT= → =READY= on a passing review updates keyword + history + mirror), spec-response (owns =READY= → =DOING=; its decomposition stamps the =:SPEC_ID:= binding on the build parent and always emits the final "flip to IMPLEMENTED" task), and task-audit (one reconcile bullet running the =:SPEC_ID:= query: a =DOING= spec whose bound parent is closed, archived, or missing gets flagged). All four are synced assets: edits land in =claude-templates/.ai/= (and =claude-rules/=), the mirror follows via =sync-check --fix=, both commit together. *Compatibility rule:* spec-review keeps accepting legacy =-spec.org= locations (=docs/= root, =docs/design/=) until the project's =:LAST_SPEC_SORT:= is stamped, nudging "run spec-sort" when it meets one; only after the stamp does the =docs/specs/= precondition harden. No legacy spec is ever unreviewable during the transition. Tree stays working: new specs land in the new shape; old specs remain reviewable until their project sorts. + +** Phase 2 — The =spec-sort= helper +Build =claude-templates/.ai/scripts/spec-sort= (classify → evidence-based confirm → plan + validate → move + rename + prepend status heading + assign =:ID:= → relink =file:= references → stamp =:LAST_SPEC_SORT:=), with bats coverage in =claude-templates/.ai/scripts/tests/= (glob-discovered by =make test=) for classification, the evidence/confirm gate, plan validation, moving + renaming, relinking, the clean-worktree preflight, mid-apply failure recovery output, idempotence, and the marker stamp. Mirror synced via =sync-check --fix= in the same commit. Tree stays working: the script is callable but nothing invokes it yet. + +** Phase 3 — Pilot on rulesets +Run =spec-sort= against rulesets' own =docs/= (41 design files, 3 spec-spine candidates, 2 stray root specs). Fix what the pilot surfaces before any other project runs it. Tree stays working: moves are confirmed one by one, links updated in the same pass. + +** Phase 4 — Startup nudge + broadcast +Add the Phase A probe + Phase C nudge line (the concrete contract in the Design retrofit section). Send .emacs.d a note that the convention is live, its ~28-doc pile is ready to sort, and the id-index mechanism is its side of the staged link conversion: enumerate each project's =docs/specs/*.org= into =org-id-extra-files= as real file names (with =org-id-track-globally= t) or feed that enumeration to a periodic =org-id-update-id-locations=, verified by clicking a known id link. The =id:= link-conversion pass across projects runs only after that lands — it is follow-up work, not part of v1's sort runs. Tree stays working: the nudge is one read-only line per session until acted on; every link is a working =file:= link until conversion day. + +* Acceptance criteria +- [ ] =rg '^\* (DRAFT|READY|DOING|IMPLEMENTED|SUPERSEDED|CANCELLED) ' docs/specs/= lists every rulesets spec with its state, and the answer matches reality. +- [ ] A status transition on a spec changes exactly three lines in one file — the keyword, a history line, and the Metadata mirror — with no rename and no link edits. +- [ ] Every doc remaining in rulesets =docs/design/= is a note (lacks the Decisions + Implementation-phases spine); both stray =docs/= root specs are re-homed and carry the =-spec.org= suffix. +- [ ] All inbound links in the rewritten roots resolve after the pilot, and the post-apply residue grep returns zero. +- [ ] The spec's own decision/finding =[/]= cookies compute correctly under the two-sequence keyword header (org, not hand counting). +- [ ] spec-create emits new specs into =docs/specs/= in the new shape; spec-review accepts legacy locations until =:LAST_SPEC_SORT:= is stamped and refuses them after. +- [ ] Every helper/workflow artifact of this feature lives canonical-side (=claude-templates/.ai/=, =claude-rules/=) with the mirror in sync — =scripts/sync-check.sh= exits clean after the build commits. +- [ ] A =DOING= spec whose =:SPEC_ID:=-bound parent task is closed or missing is flagged by task-audit's reconcile pass (exercised in the pilot or a fixture). +- [ ] =spec-sort --apply= on a dirty worktree refuses (absent the override); a forced mid-apply failure in the bats suite yields the named-recovery output, not a half-migrated tree. +- [ ] After the pilot, no link the sort *rewrote* uses =[[id:...]]= form and no rewritten root gained a new =id:= link targeting a spec (conversion is the gated follow-up); every rewritten link is a resolving =file:= link. The check scopes to actual rewritten and spec-target links — literal prose mentions of the id syntax (which already exist in =todo.org= and older specs) don't count, so a naive whole-file grep is the wrong implementation. +- [ ] A project with an unsorted =docs/design/= gets the startup nudge; one confirmed =spec-sort= run clears it via =:LAST_SPEC_SORT:=. + +* Readiness dimensions +- Data model & ownership: the spec file owns its state; the Metadata mirror is display-only. No external index to drift. +- Errors, empty states & failure: =spec-sort= on a project with no =docs/= is a silent no-op; an ambiguous classification is surfaced, never auto-moved; a relink pass that finds zero inbound links is normal. +- Security & privacy: N/A because the docs are already in-repo; no new exposure surface. +- Observability: the status grep is the dashboard; =spec-sort= prints every proposed move and every rewritten link. +- Performance & scale: N/A because collections are tens of files; everything is one-shot or grep-speed. +- Reuse & lost opportunities: reuses org TODO keywords, org-id, the existing scheme-header pattern of declared vocabularies, and spec-review's in-file findings convention. +- Architecture fit & weak points: weak point is the classification heuristic — mitigated by the confirm gate. The status heading is additive, so old readers of spec files see one extra heading and nothing breaks. +- Config surface: none new — one marker line (=:LAST_SPEC_SORT:=) in the existing Workflow State section. +- Documentation plan: the docs-lifecycle rule is the documentation; spec-create's template is the worked example. +- Dev tooling: =spec-sort= ships with bats tests under the existing glob-discovered suite. +- Rollout, compatibility & rollback: additive per project, one project at a time, rulesets first. Rollback of a sort is =git revert= of the pilot commit (moves + relinks are one commit). +- External APIs & deps: N/A — plain files, =rg=, =uuidgen=. + +* Risks, Rabbit Holes, and Drawbacks +- *Relink misses an inbound link shape* (org radio links, bare paths in scripts). Dodge: the pilot greps for the old path after moving and fails loudly on any residue. +- *Heuristic over-classifies notes as specs.* Dodge: the confirm gate is mandatory; the helper never moves unconfirmed. +- *Keyword vocabulary drift* between this spec, the rule, and spec-create's template. Dodge: the rule names the vocabulary once and the others link it. + +* Testing / Verification / Rollout +bats for =spec-sort= (classification, the evidence/confirm gate, plan validation, move + rename, relink, the clean-worktree preflight, forced mid-apply failure recovery output, idempotence, marker stamp). The pilot run on rulesets is the live verification; the post-move residue grep is the acceptance check. Rollout is per-project via the startup nudge, each run human-confirmed. + +* References / Appendix +- Source proposal: [[file:../design/2026-06-15-spec-storage-lifecycle-proposal.org]] (.emacs.d handoff, 2026-06-15). +- Decisions record: todo.org "Spec storage location + lifecycle-status convention" (settled 2026-06-28). +- This file is the convention's first resident: it lives in =docs/specs/=, carries the status heading + =:ID:=, and drops the status filename suffix. + +* Review and iteration history +** 2026-07-01 Wed @ 22:13:00 -0400 — Claude — author +- What: initial draft, written from the five pre-ratified decisions. +- Why: the queued-specs half of the 2026-06-30 session goal; decisions were settled 2026-06-28 and needed migration into a buildable spec. +- Artifacts: todo.org task "Spec storage location + lifecycle-status convention"; source proposal above. + +** 2026-07-01 Wed @ 22:22:34 -0400 — Codex — reviewer +- What changed or was recommended: rubric =Not ready=. Four blocking findings were added: preserve Org task keywords while adding lifecycle status, make =spec-sort= relinking executable and failure-safe, define the actual =.ai/notes.org= marker/startup-nudge contract, and avoid stranding legacy specs behind a stricter path precondition before retrofit. +- Why: current rulesets workflows still depend on =TODO= / =DONE= decision and finding tasks, startup state lives in =.ai/notes.org=, and the repo still contains formal specs outside =docs/specs/= until the migration runs. +- Artifacts: Review findings section; current-state checks against =.ai/workflows/spec-create.org=, =.ai/workflows/spec-review.org=, =.ai/workflows/startup.org=, =scripts/sync-check.sh=, and =todo.org=. + +** 2026-07-01 Wed @ 22:25:00 -0400 — Claude (fresh-context agent) — reviewer +- What: rubric =Not ready=. Independently found Codex's keyword-vocabulary blocker (adding the cross-sequence uniqueness wrinkle) and the stranded-legacy-specs and marker-surface gaps, plus five findings of its own: no owner for the =DOING= → =IMPLEMENTED= flip (blocking), the precedence-ambiguous classification heuristic, the missing =-spec.org= rename in spec-sort, org-id click-resolution in a live Emacs, and the criterion-2/mirror contradiction. +- Why: fresh-eyes adversarial pass requested by Craig after his own read found nothing; the two reviews converging on the same worst bug from independent context is the confidence signal. +- Artifacts: Review findings section (findings 5-9); spot-checks against real repo files (=docs/design/task-review.org=, the two stray root specs, =startup.org:154=). + +** 2026-07-01 Wed @ 22:46:52 -0400 — Claude — second responder pass +- What: fixed all five of Codex's re-review findings in place (fourteen of fourteen closed): canonical-placement contract for every synced artifact (+ sync-check acceptance criterion), the =:SPEC_ID:= spec-to-task binding with the parent-keyword audit query (dissolving the flip-task chicken-and-egg), the fail-safe =--apply= contract (clean-tree preflight, validate-then-write from a recorded plan, named recovery), staged id-link conversion (pilot rewrites =file:= links only; =id:= conversion gated on the concrete .emacs.d id-index mechanism — the fork Craig approved), and evidence-based status confirmation (evidence panel, conservative non-terminal default, reasons required for terminal states). Status stays DRAFT; the READY flip belongs to the reviewers this round. +- Why: Craig approved fixing all five ("1", 2026-07-01), including the keep-file:-links-through-pilot fork. +- Artifacts: per-finding responses inline; the fixed Design/phase/criteria sections. + +** 2026-07-01 Wed @ 23:22:50 -0400 — Codex — reviewer +- What changed or was recommended: rubric =Ready=. No new blocking findings. The second responder pass closed all five Codex re-review blockers without regressing the first nine findings, and the spec now gives implementers concrete contracts for canonical synced assets, =:SPEC_ID:= task binding, fail-safe =spec-sort --apply= behavior, staged id-link conversion, evidence-based status confirmation, phase sequencing, and test coverage. +- Why: the current spec can be implemented and tested without hidden product decisions; remaining vNext work is separately tracked. +- Artifacts: status heading flipped to =READY=; =* Decisions= [5/5]; =* Review findings= [14/14]; Emacs batch cookie check. + +** 2026-07-01 Wed @ 22:41:21 -0400 — Claude (fresh-context agent) — verify pass; Claude — READY flip +- What: the original reviewer re-read the fixed spec against its own nine findings: all held, none regressed, verdict ready. It re-ran the classification predicate live (exactly 5 candidates; task-review.org excluded) and confirmed org computes the cookies. Two non-blocking minors folded in before the flip: a refinement note under Decision 2 (whose frozen body still said "one keyword edit") and a wider nudge probe that also fires on stray =docs/*-spec.org= root files. Status flipped DRAFT → READY. +- Why: Craig authorized the flip contingent on the verify pass clearing; it did. +- Artifacts: the status heading's history line; verify-pass report in the session record. + +** 2026-07-01 Wed @ 22:41:33 -0400 — Codex — reviewer +- What changed or was recommended: rubric =Not ready=. Five new blocking findings were added after the response pass: make shared workflow/script edits obey the =claude-templates/.ai/= canonical plus =.ai/= mirror contract; define how task-audit binds a =DOING= spec to its implementation tasks; make =spec-sort --apply= failure-safe; turn the org-id Emacs prerequisite into an executable rollout step; and require status confirmation to be evidence-based rather than a rubber-stamp of stale fields. +- Why: the response fixed the original keyword/relink/precondition issues but introduced new integration points in synced template assets, task-audit, Emacs id resolution, and migration safety that are not yet buildable from the spec. +- Artifacts: Review findings section; checks against =CLAUDE.md=, =scripts/sync-check.sh=, =.ai/workflows/task-audit.org=, =.ai/workflows/startup.org=, =.ai/notes.org=, current =docs/= inventory, and Emacs batch/docstring checks for Org TODO cookies and =org-id-extra-files=. + +** 2026-07-01 Wed @ 22:30:06 -0400 — Claude — responder +- What: merged both reviews into one findings ledger (nine findings, all dispositioned accept) and fixed all nine in place: two-sequence keyword header (applied to this file itself), transition-ownership table with the tracked flip-to-IMPLEMENTED task, single classification predicate, the -spec.org rename step, the full relink data-safety contract, the =.ai/notes.org= marker + Phase A/C startup contract, the legacy-location compatibility rule, the org-id Emacs prerequisite, and the three-line transition definition. Acceptance criteria updated to match. +- Why: Craig approved fixing all nine ("1", 2026-07-01); none touched the five ratified decisions. +- Artifacts: Review findings section (responses inline per finding); the fixed sections themselves. diff --git a/docs/agent-knowledge-base-spec.org b/docs/specs/agent-knowledge-base-spec.org index 78ff9bd..e36d897 100644 --- a/docs/agent-knowledge-base-spec.org +++ b/docs/specs/agent-knowledge-base-spec.org @@ -1,12 +1,20 @@ #+TITLE: Agent Knowledge Base on Org-roam — Spec #+AUTHOR: Craig Jennings & Claude #+DATE: 2026-06-10 +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* IMPLEMENTED Agent Knowledge Base on Org-roam — Spec +:PROPERTIES: +:ID: 08a5ec99-9e1e-40e4-8241-e8a41e9de49f +:END: +- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to IMPLEMENTED (reason: v1 (Phases 0-4) shipped 2026-06-10 on Craig's go; KB live at ~/org/roam with the knowledge-base rule installed machine-wide) * Metadata -| Status | implemented — v1 (Phases 0-4) shipped 2026-06-10 on Craig's go; manual validation + other-machine clones outstanding (todo.org) | +| Status | implemented | | Owner | Craig Jennings | | Reviewer | Craig Jennings; Codex (2026-06-10) | -| Related | [[file:../todo.org][todo.org — "Check that memories are sync'd across machines via git"]] | +| Related | [[file:../../todo.org][todo.org — "Check that memories are sync'd across machines via git"]] | This spec supersedes the 2026-06-05 draft (formerly docs/design/2026-06-05-org-roam-knowledge-base-spec.org, removed; content in git history), folding in Craig's 2026-06-10 ratification answers and restructuring to the spec-create format. @@ -308,4 +316,4 @@ Modified recommendations from the 2026-06-10 Codex review, with reasons. Everyth ** 2026-06-10 Wed @ 17:31:10 -0500 — Codex — reviewer - What changed or was recommended: re-ran the spec-review workflow after the caveat resolution. Rubric: ready. No new blocking or medium-priority findings; no review file written. Confirmed the implementation phases and test-surface tasks are already represented under the existing parent task in todo.org. - Why: the prior blockers are dispositioned, the work-root denylist is confirmed, the pointer-rule install path matches the current Makefile RULES glob, and v1's manual/agent-runnable verification surface is explicit. -- Artifacts: this file; [[file:../todo.org][todo.org]] parent task "Check that memories are sync'd across machines via git". +- Artifacts: this file; [[file:../../todo.org][todo.org]] parent task "Check that memories are sync'd across machines via git". diff --git a/docs/inbox-workflow-consolidation-spec.org b/docs/specs/inbox-workflow-consolidation-spec.org index 2e158b6..4543e77 100644 --- a/docs/inbox-workflow-consolidation-spec.org +++ b/docs/specs/inbox-workflow-consolidation-spec.org @@ -1,16 +1,23 @@ #+TITLE: Inbox Workflow Consolidation — Spec #+AUTHOR: Craig Jennings & Claude #+DATE: 2026-06-23 -#+TODO: TODO | DONE SUPERSEDED CANCELLED +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* READY Inbox Workflow Consolidation — Spec +:PROPERTIES: +:ID: a7fe2a10-dfa8-4ba3-a11a-e7b1288b7573 +:END: +- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to READY (evidence-based, human-confirmed) * Metadata -| Status | Ready — review incorporated (Codex, 2026-06-23) | +| Status | ready | |----------+-------------------------------------------------------------| | Owner | Craig | |----------+-------------------------------------------------------------| | Reviewer | Craig | |----------+-------------------------------------------------------------| -| Related | [[file:../todo.org][Consolidate inbox/triage workflows + scheduled inbox check]] | +| Related | [[file:../../todo.org][Consolidate inbox/triage workflows + scheduled inbox check]] | |----------+-------------------------------------------------------------| * Summary diff --git a/docs/design/wrapup-routing-spec.org b/docs/specs/wrapup-routing-spec.org index 434f8d9..1a150fc 100644 --- a/docs/design/wrapup-routing-spec.org +++ b/docs/specs/wrapup-routing-spec.org @@ -1,16 +1,23 @@ #+TITLE: Wrap-Up Inbox/Transcript Routing — Spec #+AUTHOR: Craig Jennings #+DATE: 2026-06-13 -#+TODO: TODO | DONE SUPERSEDED CANCELLED +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* DOING Wrap-Up Inbox/Transcript Routing — Spec +:PROPERTIES: +:ID: 00b47414-2213-4a99-be35-48ceb266fc08 +:END: +- 2026-07-02 Thu @ 00:17:01 -0400 — retrofitted by spec-sort; status set to DOING (evidence-based, human-confirmed) * Metadata -| Status | Ready — review incorporated (spec-review, 2026-06-21) | +| Status | doing | |----------+-----------------------------------------------------| | Owner | Craig Jennings | |----------+-----------------------------------------------------| | Reviewer | Codex (spec-review) | |----------+-----------------------------------------------------| -| Related | [[file:../../todo.org][todo.org: wrap-up routing task]] · [[file:2026-06-13-wrapup-inbox-transcript-routing-proposal.org][archsetup proposal]] | +| Related | [[file:../../todo.org][todo.org: wrap-up routing task]] · [[file:../design/2026-06-13-wrapup-inbox-transcript-routing-proposal.org][archsetup proposal]] | |----------+-----------------------------------------------------| * Summary diff --git a/flush/SKILL.md b/flush/SKILL.md index 4c2709a..ca139c1 100644 --- a/flush/SKILL.md +++ b/flush/SKILL.md @@ -1,6 +1,6 @@ --- name: flush -description: Mid-session context flush — the checkpoint half of the wrap/restart rhythm. Refresh the session-context anchor in place, prompt the user to /clear, then resume the same logical session from the anchor without re-running startup. Cheaper tokens and a sharper context window without fragmenting the session into archive files. Agent-callable and agent-initiated: the agent may run the pre-clear checkpoint on its own judgment at a clean task boundary, but /clear is user-only — the agent does all the work, then prompts for the single /clear keystroke. Use when the current task has a clean boundary and the context window is large enough that a reset would sharpen the work. Do NOT use for end-of-day or done-for-now (use wrap-it-up, which archives to .ai/sessions/ and commits), or for a genuine fresh start after being away or on another machine (use startup, which pulls + syncs + surfaces inbox). +description: Mid-session context flush — the checkpoint half of the wrap/restart rhythm. Refresh the session-context anchor in place, prompt the user to /clear (or in auto mode self-inject it via tmux), then resume the same logical session from the anchor without re-running startup. Cheaper tokens and a sharper context window without fragmenting the session into archive files. Agent-callable and agent-initiated: the agent may run the pre-clear checkpoint on its own judgment at a clean task boundary; interactively /clear stays the user's keystroke, while auto mode ("/flush auto", for unattended runs like the no-approvals speedrun or a recurring loop) arms .ai/scripts/self-inject.sh so tmux types /clear and a resume line at the agent's own idle prompt — zero human keystrokes. Use when the current task has a clean boundary and the context window is large enough that a reset would sharpen the work. Do NOT use for end-of-day or done-for-now (use wrap-it-up, which archives to .ai/sessions/ and commits), or for a genuine fresh start after being away or on another machine (use startup, which pulls + syncs + surfaces inbox). --- # /flush — Mid-Session Context Checkpoint @@ -20,9 +20,11 @@ This is the checkpoint half of the wrap/restart rhythm. It is distinct from two The skill is agent-callable. The agent may also **initiate** a flush on its own judgment when the rhythm calls for it (see below) — it runs the pre-clear checkpoint, then prompts the user to type `/clear`. -## The hard constraint +## The constraint, and the auto-mode exception -`/clear` is a user-only command. The agent **cannot** execute it. "Agent-initiated" means the agent runs the pre-clear checkpoint (refresh the anchor + verify the write landed) on its own, then **prompts** the user: "checkpoint saved, type /clear to reset." The agent proposes and does all the work; the user supplies the single `/clear` keystroke. Never design or imply a flow where the agent self-triggers `/clear`. +`/clear` is a prompt command — the agent cannot execute it as a tool call. Interactively, "agent-initiated" means the agent runs the pre-clear checkpoint (refresh the anchor + verify the write landed) on its own, then **prompts** the user: "checkpoint saved, type /clear to reset." The user supplies the single `/clear` keystroke. + +**Auto mode** (`/flush auto`) is the sanctioned exception for unattended sessions: after the checkpoint, the agent arms the tmux server to type `/clear` and a resume line at its own idle prompt (see Auto mode below). The constraint that never bends is the **gate order**: the anchor write is verified on disk *before* anything arms or prompts a clear. There is no recovering the conversation afterward. ## When the agent should initiate @@ -68,6 +70,30 @@ That is the wrap/restart rhythm. When both conditions hold, run Phase 1 and end 6. **Hand off the clear.** Tell the user the checkpoint is saved, name the anchor path, and prompt: type `/clear` now, then send any message to resume. +## Auto mode — self-injected clear for unattended sessions + +`/flush auto` runs Phase 1 in full (steps 1-5, including the write-verified gate), then replaces step 6's user prompt with a self-injection. Proven live in the archsetup session 2026-07-02; the mechanism and its gotchas live in `.ai/scripts/self-inject.sh` (synced into every project). + +1. **Derive the pane first, synchronously, from this shell:** + + ```bash + pane=$(.ai/scripts/self-inject.sh) + ``` + + This must happen before arming: the armed step runs under the tmux *server*, where ancestry-based pane detection cannot work. + +2. **Arm the injection via the tmux server, then end the turn immediately:** + + ```bash + tmux run-shell -b ".ai/scripts/self-inject.sh -t $pane 25 '/clear' 15 'go (auto-flush resume: continue per Next Steps)'" + ``` + + `run-shell -b` is load-bearing — a detached child of a tool call (`setsid`/`nohup`/`&`) dies when the tool call ends; only a server-owned process survives the turn boundary. The delays let the turn fully end before `/clear` lands (25s) and let the `SessionStart` hook finish before the resume line lands (15s). + +3. **End the turn.** The prompt must be idle when the keys arrive. The injected `/clear` fires the same `SessionStart(clear)` hook as a hand-typed one; the injected resume line starts the next turn. Zero human keystrokes. + +**When to use it:** unattended runs only — the no-approvals speedrun, a recurring loop, any session where nobody is at the keyboard. The collision hazard is real: keys injected while a human is mid-keystroke merge into their input (`/clear` has become `/clearto`). If a user may be present, say the window is armed and to keep hands off, or use the interactive prompt instead. + ## Phase 2 — Post-clear resume (hook-driven) This half is driven by the `SessionStart(clear)` hook, not by this skill — but it is documented here so the loop is legible. diff --git a/scripts/sweep-gitignore-tooling.sh b/scripts/sweep-gitignore-tooling.sh index 63fc066..68bfe2d 100755 --- a/scripts/sweep-gitignore-tooling.sh +++ b/scripts/sweep-gitignore-tooling.sh @@ -10,13 +10,19 @@ # # For each AI project (a directory with .ai/protocols.org) under the search # roots, if it's a git checkout in gitignore mode (.ai/ already appears in its -# .gitignore), ensure .ai/, .claude/, CLAUDE.md, and AGENTS.md are all ignored. -# Append only the missing lines, so a re-run is a no-op. +# .gitignore, in either the unanchored `.ai/` or anchored `/.ai/` form), ensure +# .ai/, .claude/, CLAUDE.md, and AGENTS.md are all ignored. Append only the +# missing lines, in whichever style the file already uses, so a re-run is a +# no-op. # # Track-mode projects (.ai/ NOT in .gitignore) are skipped by design: they # track their tooling on purpose — team repos sharing config with teammates who # don't run rulesets, or private-remote personal repos where the history IS the -# project. +# project. But a track-mode project whose tracked tooling is reachable from a +# non-cjennings.net remote gets a loud WARN: per convention the tooling set is +# gitignored anywhere the repo can reach a public host, and a server-side +# mirror hook can publish even a "private" remote (the 2026-06-30 .emacs.d +# exposure rode exactly that). # # A line added here only stops *future* commits. If a target path is already # tracked, the ignore has no effect until it's untracked; the sweep warns so @@ -30,6 +36,14 @@ set -euo pipefail IGNORE_SET=('.ai/' '.claude/' 'CLAUDE.md' 'AGENTS.md') +# A pattern counts as present in either the unanchored (`.ai/`) or anchored +# (`/.ai/`) form — both ignore the root-level path; treating them as different +# is what silently skipped anchored-style projects. +has_ignore() { + local pat="$1" gi="$2" + grep -qFx "$pat" "$gi" || grep -qFx "/$pat" "$gi" +} + dry_run=0 roots=() for arg in "$@"; do @@ -72,16 +86,41 @@ for project in "${projects[@]}"; do continue fi - # Gitignore mode iff .ai/ is already ignored. Otherwise track-mode: leave it. - if [ ! -f "$gi" ] || ! grep -qFx '.ai/' "$gi"; then - echo "skip $name — track-mode (.ai/ not gitignored)" + # Gitignore mode iff .ai/ is already ignored (either style). Otherwise + # track-mode: leave the .gitignore alone, but warn when tracked tooling can + # reach a non-cjennings.net remote — a track-mode repo on a public host (or + # behind an invisible server-side mirror) is the exposure the convention + # exists to prevent. + if [ ! -f "$gi" ] || ! has_ignore '.ai/' "$gi"; then + tracked_tooling=() + for pat in "${IGNORE_SET[@]}"; do + path="${pat%/}" + if git -C "$project" ls-files --error-unmatch "$path" >/dev/null 2>&1; then + tracked_tooling+=("$path") + fi + done + # Private = the cjennings.net server, whether addressed by FQDN or by the + # bare `cjennings` ssh-config alias (git@cjennings:repo.git). + public_remote="$(git -C "$project" remote -v 2>/dev/null \ + | awk '{print $2}' | grep -vE '(@|://)cjennings(\.net)?[:/]' | sort -u | head -1 || true)" + if [ "${#tracked_tooling[@]}" -gt 0 ] && [ -n "$public_remote" ]; then + echo "skip $name — track-mode (.ai/ not gitignored)" + echo " WARN $name: tracked tooling (${tracked_tooling[*]}) is publicly reachable via $public_remote — gitignore the set and 'git -C $project rm --cached -r <path>' unless this is a deliberate team-shared config" + else + echo "skip $name — track-mode (.ai/ not gitignored)" + fi skipped=$((skipped + 1)) continue fi + # Append in the style the file already uses: anchored if its .ai/ marker + # line is the anchored form. + prefix="" + grep -qFx '/.ai/' "$gi" && prefix="/" + needed=() for pat in "${IGNORE_SET[@]}"; do - grep -qFx "$pat" "$gi" || needed+=("$pat") + has_ignore "$pat" "$gi" || needed+=("${prefix}${pat}") done if [ "${#needed[@]}" -eq 0 ]; then @@ -103,9 +142,11 @@ for project in "${projects[@]}"; do swept=$((swept + 1)) # Warn on any newly-ignored path that's already tracked — the ignore won't - # untrack it. + # untrack it. Strip the anchored prefix before asking git: the pattern + # `/CLAUDE.md` is the repo-relative path `CLAUDE.md`. for pat in "${needed[@]}"; do - path="${pat%/}" + path="${pat#/}" + path="${path%/}" if git -C "$project" ls-files --error-unmatch "$path" >/dev/null 2>&1; then echo " WARN $name: $path is currently tracked — 'git -C $project rm --cached -r $path' to untrack" fi diff --git a/scripts/tests/sweep-gitignore-tooling.bats b/scripts/tests/sweep-gitignore-tooling.bats index a28087e..f18eac5 100644 --- a/scripts/tests/sweep-gitignore-tooling.bats +++ b/scripts/tests/sweep-gitignore-tooling.bats @@ -109,3 +109,84 @@ make_project() { [ "$status" -eq 0 ] [[ "$output" == *"not a git checkout"* ]] } + +@test "sweep: anchored /.ai/ is recognized as gitignore-mode, appends anchored" { + make_project anchored $'/.ai/\n' + + run bash "$SWEEP" "$ROOT" + + [ "$status" -eq 0 ] + [[ "$output" != *"anchored — track-mode"* ]] + grep -qFx "/.claude/" "$ROOT/anchored/.gitignore" + grep -qFx "/CLAUDE.md" "$ROOT/anchored/.gitignore" + grep -qFx "/AGENTS.md" "$ROOT/anchored/.gitignore" +} + +@test "sweep: anchored partial project gets only the missing lines" { + make_project anchoredpartial $'/.ai/\n/.claude/\n' + + run bash "$SWEEP" "$ROOT" + + [ "$status" -eq 0 ] + # /.claude/ already present in anchored form — not re-added in either form. + [ "$(grep -cFx '/.claude/' "$ROOT/anchoredpartial/.gitignore")" -eq 1 ] + ! grep -qFx ".claude/" "$ROOT/anchoredpartial/.gitignore" + grep -qFx "/CLAUDE.md" "$ROOT/anchoredpartial/.gitignore" + grep -qFx "/AGENTS.md" "$ROOT/anchoredpartial/.gitignore" +} + +@test "sweep: anchored gitignore-mode is idempotent" { + make_project anchored2 $'/.ai/\n' + bash "$SWEEP" "$ROOT" >/dev/null + + run bash "$SWEEP" "$ROOT" + + [ "$status" -eq 0 ] + [[ "$output" == *"already complete"* ]] + [ "$(grep -cFx '/.claude/' "$ROOT/anchored2/.gitignore")" -eq 1 ] +} + +@test "sweep: track-mode with tracked tooling and a non-cjennings.net remote warns" { + make_project publictrack $'out/\n' + echo "# project rules" > "$ROOT/publictrack/CLAUDE.md" + (cd "$ROOT/publictrack" \ + && git add CLAUDE.md \ + && git -c user.email=t@t -c user.name=t commit -qm seed \ + && git remote add origin git@github.com:someone/publictrack.git) + + run bash "$SWEEP" "$ROOT" + + [ "$status" -eq 0 ] + [[ "$output" == *"WARN"* ]] + [[ "$output" == *"publicly reachable"* ]] + # Still track-mode: nothing written to its .gitignore. + ! grep -qFx ".claude/" "$ROOT/publictrack/.gitignore" +} + +@test "sweep: track-mode with tracked tooling on a cjennings.net remote stays quiet" { + make_project privatetrack $'out/\n' + echo "# project rules" > "$ROOT/privatetrack/CLAUDE.md" + (cd "$ROOT/privatetrack" \ + && git add CLAUDE.md \ + && git -c user.email=t@t -c user.name=t commit -qm seed \ + && git remote add origin git@cjennings.net:privatetrack.git) + + run bash "$SWEEP" "$ROOT" + + [ "$status" -eq 0 ] + [[ "$output" != *"publicly reachable"* ]] +} + +@test "sweep: the bare cjennings ssh-alias remote counts as private too" { + make_project aliastrack $'out/\n' + echo "# project rules" > "$ROOT/aliastrack/CLAUDE.md" + (cd "$ROOT/aliastrack" \ + && git add CLAUDE.md \ + && git -c user.email=t@t -c user.name=t commit -qm seed \ + && git remote add origin git@cjennings:aliastrack.git) + + run bash "$SWEEP" "$ROOT" + + [ "$status" -eq 0 ] + [[ "$output" != *"publicly reachable"* ]] +} @@ -39,7 +39,14 @@ Tags are assigned and refreshed by =task-audit=; =task-review= keeps them honest * Rulesets Open Work -** DOING [#B] wrap-it-up teardown + "wrap it up and shutdown" :feature: +** TODO [#C] KB orphan-node review pass :chore: +:PROPERTIES: +:CREATED: [2026-07-01 Wed] +:END: +The 2026-07-01 kb-hygiene report listed 42 agent KB nodes with no inbound id: links (of 53 agent nodes; 0 conflicts, no duplicate titles). Orphan-ness alone isn't a defect — agent nodes are found by rg, not only by links — but a periodic pass is worth doing: prune nodes that aged out, merge near-duplicates, add id: links where clusters exist. Regenerate the list with the kb-hygiene script rather than trusting the snapshot. Propose deletions/merges to Craig before applying (auto-cleanup allowed only for :agent:-tagged nodes after approval, per knowledge-base.md). + +** DONE [#B] wrap-it-up teardown + "wrap it up and shutdown" :feature: +CLOSED: [2026-07-01 Wed] :PROPERTIES: :CREATED: [2026-06-23 Tue] :LAST_REVIEWED: 2026-06-24 @@ -55,36 +62,11 @@ Per a roam-inbox FYI from .emacs.d (2026-06-23 23:38): both ai-term handoffs (mu *** 2026-06-24 Wed @ 06:51:13 -0400 Unblocked — .emacs.d companion landed; feature now live The three companion functions are in =.emacs.d/modules/ai-term.el= (=cj/ai-term-quit= 1068, =cj/ai-term-live-count= 1087, =cj/ai-term-shutdown-countdown= 1109), matching the contract — double-checked the bodies: quit kills session+buffer+restores layout idempotently, live-count returns the gate integer, shutdown-countdown re-checks the gate (TOCTOU guard), runs an abort-able =run-at-time= countdown (C-g cancels), then a configurable =cj/ai-term-shutdown-command=. 13 ERT tests, headless-verified live (.emacs.d FYI 2026-06-24 06:44). Dropped =:blocked:= / =:BLOCKED_BY:= — the build dependency is resolved; only the manual end-to-end validation below remains. NOTE: with the Stop hook wired and the companion present, the feature is now functional — the next bare "wrap it up" will actually tear the session down. Run the validation below before relying on it. -*** TODO Manual testing and validation :test: -What we're verifying: the wrap-teardown + shutdown feature end to end, once =.emacs.d/modules/ai-term.el= has the three companion functions and the =Stop= hook is installed (=make install-hooks= + the =settings-snippet.json= Stop block in =~/.claude/settings.json=). These need a live Emacs daemon + tmux + an =aiv-<proj>= ai-term session; they can't be driven from a script. - -**** Bare "wrap it up" tears down after the valediction -What we're verifying: teardown is the default and fires only after the valediction renders. -- In an ai-term =aiv-<proj>= session, say "wrap it up". -- Watch the wrap run (summary, archive, commit, push, valediction). -Expected: the valediction renders in full first, THEN the vterm buffer + =aiv-<proj>= tmux session + =claude= all disappear and the saved window geometry is restored. =/tmp/ai-wrap-teardown-<proj>= does not linger afterward. - -**** "wrap it up with summary" / "and summarize" keeps the buffer -What we're verifying: the explicit qualifier opts out of teardown. -- Say "wrap it up with summary" (then, separately, "wrap it up and summarize"). -Expected: the wrap completes (commit + push + archive), but the buffer and session stay up and readable. No teardown. - -**** "wrap it up and shutdown" aborts when another session is live -What we're verifying: the safety gate refuses to power off out from under another session. -- Start a second =aiv-*= ai-term session in another project. -- In the first, say "wrap it up and shutdown". -Expected: the wrap completes but the shutdown is refused; the other live =aiv-*= session is listed, and the valediction says it fell back to a normal wrap. No poweroff, no teardown, no sentinel dropped. - -**** "wrap it up and shutdown" as the sole session powers off (cancellable) -What we're verifying: the happy path and the abort window. -- With this as the only =aiv-*= session, say "wrap it up and shutdown". -- During the countdown, first run it once and cancel with =C-g=; then run it again and let it complete. (Stub =sudo shutdown now= to an echo while validating so the box doesn't actually power off.) -Expected: after commit + push, a 10→1 countdown renders one-per-second in the Emacs echo area; =C-g= cancels it cleanly (no shutdown); letting it finish fires =shutdown=. - -**** Teardown never precedes a verified push -What we're verifying: no sentinel is dropped before commit + push succeeds. -- Trigger a teardown wrap in a state where the push would fail (e.g. temporarily point the remote somewhere unreachable). -Expected: the wrap surfaces the push failure and stops; no =/tmp/ai-wrap-*-<proj>= sentinel is created, so no teardown fires. +*** 2026-07-01 Wed @ 21:52:15 -0400 Plumbing pre-flight re-verified; only the eyes-on tests remain +Fresh pre-flight, all green: Stop hook block in =~/.claude/settings.json= points at =~/.claude/hooks/ai-wrap-teardown.sh=, the symlink resolves to the rulesets canonical, no stale =/tmp/ai-wrap-teardown-*= sentinel, and all three companion functions are live in the daemon (=(t t t)=). Three =aiv-*= sessions live right now (=_emacs_d=, =archsetup=, =rulesets=), so the shutdown-gate refusal test has its multi-session condition available. Everything left in the checklist below needs Craig's eyes on a scratch session: buffer teardown + geometry restore, the qualifier opt-outs, the countdown render + C-g, and the push-failure guard. + +*** 2026-07-01 Wed @ 21:59:43 -0400 Manual end-to-end validation passed — all five tests, Craig's live run +Craig ran the full checklist in a live Emacs/tmux ai-term setup: (1) bare "wrap it up" tore down after the valediction rendered, geometry restored, no lingering sentinel; (2) "with summary" / "and summarize" both wrapped without teardown, buffer stayed readable; (3) "wrap it up and shutdown" with another aiv-* session live refused the shutdown, named the other session, fell back to a normal wrap; (4) as the sole session, the 10→1 echo-area countdown rendered one-per-second, C-g cancelled cleanly, and a full run fired the (stubbed) shutdown command; (5) with the push made to fail, the wrap stopped at the failure and no sentinel was dropped. Works great — feature validated and live. Both sides complete: rulesets Stop hook + wrap-it-up Teardown mode, .emacs.d companion functions. ** TODO [#B] Helper-agent instance support — concurrent same-project Claude :feature:spec: :PROPERTIES: @@ -135,7 +117,7 @@ Craig's call (2026-06-24): helper-instance is independent of the generic-runtime :CREATED: [2026-06-13 Sat] :LAST_REVIEWED: 2026-06-24 :END: -Optional wrap-up step that surfaces filed keepers belonging to another project, recommends a destination, and routes each to that project's =inbox/= via =inbox-send= (the destination's own =process-inbox= files it; transcript filing deferred to vNext). Spec: [[file:docs/design/wrapup-routing-spec.org]] — Ready, [9/9] decisions. Source proposal: [[file:docs/design/2026-06-13-wrapup-inbox-transcript-routing-proposal.org]]. +Optional wrap-up step that surfaces filed keepers belonging to another project, recommends a destination, and routes each to that project's =inbox/= via =inbox-send= (the destination's own =process-inbox= files it; transcript filing deferred to vNext). Spec: [[file:docs/specs/wrapup-routing-spec.org]] — Ready, [9/9] decisions. Source proposal: [[file:docs/design/2026-06-13-wrapup-inbox-transcript-routing-proposal.org]]. *** 2026-06-21 Sun @ 02:06:37 -0400 Spec-review + spec-response complete — Ready Craig's review challenge reshaped the design from a direct cross-repo =todo.org= move to =inbox-send= delivery into the destination's inbox (safer: reuses the sanctioned cross-project path, gets provenance + per-project filing for free, degrades gracefully where a destination has an =inbox/= but no =todo.org=). D2/D3 superseded; D7 (inbox-send delivery), D8 (=:ROUTE_CANDIDATE:= marker at file time), D9 (local source removal + reject-flow recovery) added. Spec-review file consumed and deleted. Implementation-task breakdown filed below (spec-response Phase 6). @@ -146,14 +128,8 @@ The 2026-06-23 inbox consolidation (24ca58d) merged =process-inbox= + =monitor-i *** 2026-06-28 Sun @ 13:02:42 -0400 Built the recommendation engine + destination discovery Added =.ai/scripts/route_recommend.py= (canonical + mirror): pure =recommend(item, projects) → (destination, confidence)= with strong (name/path literal, word-boundary matched, dot-stripped alias aware), weak (distinctive name-token overlap), and none tiers; a multi-way top-tier tie downgrades to weak with a deterministic pick (most overlap, then alphabetical); empty list → none. The CLI (=--item=, =--exclude=) reuses =inbox-send.py='s =discover_projects= via importlib so the candidate set matches inbox-send's project universe. 13 tests (the five spec'd cases + boundary/path/strong-beats-weak + 3 sandboxed CLI integration tests), full =make test= green. Covers spec Phases 1 + 3. Next sub-tasks (=:ROUTE_CANDIDATE:= marker, wrap-up router) call this engine. -*** TODO [#B] =:ROUTE_CANDIDATE:= marker in inbox process mode :feature:solo: -Extend =inbox.org= process mode's "File as TODO" disposition (core §3 / Phase D) to stamp =:ROUTE_CANDIDATE: <inferred-project>= on any keeper whose inferred home differs from the current project (uses the engine above). Edit the canonical, sync the =.ai/= mirror, verify sync-check clean. Spec Phase 2 / D8. Spec: [[file:docs/design/wrapup-routing-spec.org]]. (Originally targeted =process-inbox.org=, merged into =inbox.org= by the 2026-06-23 consolidation.) - -*** TODO [#B] Wrap-up router sub-step in wrap-it-up.org :feature:solo: -Add the optional router to =wrap-it-up.org= Step 3 after the inbox sanity check: surface the =:ROUTE_CANDIDATE:= batch (task / destination / delivery mode / confidence), go/skip; on go, per candidate =inbox-send= a one-task handoff to the destination's =inbox/= and remove the keeper from the local =todo.org=; empty set = silent. Name the gate-vs-optional split in the prose. Edit canonical + sync mirror. Spec Phase 4 / D7 / D9. Spec: [[file:docs/design/wrapup-routing-spec.org]]. - -*** TODO [#B] Wrap-up routing — test surface :test:solo: -Unit: recommendation engine (strong/weak/none, two-project tie, empty list); marker stamping (cross-project keeper tagged, local keeper not, standing backlog never). Integration (bats, fixture projects + temp =todo.org=): go issues N =inbox-send= calls to the right inboxes with sources removed; skip leaves all in place; empty set = zero interaction; a candidate whose destination has =inbox/= but no =todo.org= still delivers. Spec: [[file:docs/design/wrapup-routing-spec.org]] (Acceptance criteria). +*** 2026-07-02 Thu @ 00:36:12 -0400 Phases 2 + 4 + test surface landed — marker, router, route-batch helper +inbox.org's "File as TODO" disposition now runs route_recommend on each keeper and stamps =:ROUTE_CANDIDATE: <destination>= on strong/weak matches (none stamps nothing; local keepers stay unstamped) — spec Phase 2 / D8. wrap-it-up.org Step 3 gained the optional router directly after the inbox sanity check, with the gate-vs-optional split named in the prose: surface the batch (task / destination / delivery mode / confidence, weak visibly labeled), go/skip, empty set = zero interaction — spec Phase 4 / D7 / D9. The go path is mechanical: new =.ai/scripts/route-batch= (--list read-only, --go extracts the subtree minus the marker with children riding along and headings promoted, delivers via inbox-send for provenance, removes the local subtree only after a successful send; a failed send leaves the task in place and exits non-zero). Test surface: engine unit tests existed (13); route-batch adds a 9-test bats suite (list/backlog-exclusion, empty-set silence, list-modifies-nothing = skip semantics, delivery + provenance + children, local-task survival, drawer-minus-marker, inbox-without-todo.org delivery, empty go, failed-send recovery). cross-project.md notes the router as a sanctioned cross-project write path. make test green, sync clean. *** TODO [#B] Wrap-up routing — manual end-to-end validation :test: What we're verifying: a real keeper routes through a live wrap and the destination actually files it. @@ -164,7 +140,7 @@ What we're verifying: a real keeper routes through a live wrap and the destinati Expected: the task ends up in the destination's =todo.org=, gone from the source, with no foreign =todo.org= written directly. Not =:solo:= — needs a real cross-project wrap and the destination's next session. *** TODO [#D] Wrap-up routing — transcript filing (vNext) :feature:no-sync: -File a meeting recording into the destination =assets/= per =working-files.md=, batch go/skip mirroring the task router. Gated on the source-location decision (spec D4). Spec: [[file:docs/design/wrapup-routing-spec.org]] (Phase 5). +File a meeting recording into the destination =assets/= per =working-files.md=, batch go/skip mirroring the task router. Gated on the source-location decision (spec D4). Spec: [[file:docs/specs/wrapup-routing-spec.org]] (Phase 5). ** TODO [#C] Multiple agent-source improvements :spec: :PROPERTIES: @@ -262,10 +238,10 @@ Cancelled the follow-up brainstorm and undid the dedicated-repo migration at Cra *** 2026-06-05 Fri @ 05:57:35 -0500 Pivot: adopt the existing org-roam KB as the shared agent substrate Pressure-tested the two-tier idea, then Craig redirected: a shared org-roam knowledge base any project can read and write makes this simpler. Ground truth verified: =~/sync/org/roam/= already exists (484 org files, curated since 2023, Syncthing-synced, not git). So cross-machine sync is already solved, and the task stops being "build a memory-sync system" and becomes "point agents at the KB that already syncs." The dedicated-repo and two-tier approaches are both superseded for the storage+sync half. -Wrote a one-page spec: [[file:docs/agent-knowledge-base-spec.org][agent-knowledge-base-spec.org]] (originally docs/design/2026-06-05-org-roam-knowledge-base-spec.org; superseded by the 2026-06-10 spec-create rewrite at the new path). Five decisions, mechanics recommended: (1) KB is a queried substrate accessed as files (ripgrep + follow =[[id:]]= by grep), not via the org-roam package; (2) capture in harness memory, promote durable facts into the KB (same cadence as the pattern catalog) — resolves the at-risk problem since the valuable knowledge moves to the synced KB; (3) a =claude-rules/knowledge-base.md= pointer rule carries path/query/write-schema/boundary; (4) write schema = roam-valid node + =:agent:= filetag so agent notes stay distinguishable and index on the next =org-roam-db-sync=. The rules layer (=claude-rules/=, =CLAUDE.md=) is untouched — the KB replaces the memory tier, not the rules tier. +Wrote a one-page spec: [[file:docs/specs/agent-knowledge-base-spec.org][agent-knowledge-base-spec.org]] (originally docs/design/2026-06-05-org-roam-knowledge-base-spec.org; superseded by the 2026-06-10 spec-create rewrite at the new path). Five decisions, mechanics recommended: (1) KB is a queried substrate accessed as files (ripgrep + follow =[[id:]]= by grep), not via the org-roam package; (2) capture in harness memory, promote durable facts into the KB (same cadence as the pattern catalog) — resolves the at-risk problem since the valuable knowledge moves to the synced KB; (3) a =claude-rules/knowledge-base.md= pointer rule carries path/query/write-schema/boundary; (4) write schema = roam-valid node + =:agent:= filetag so agent notes stay distinguishable and index on the next =org-roam-db-sync=. The rules layer (=claude-rules/=, =CLAUDE.md=) is untouched — the KB replaces the memory tier, not the rules tier. *** 2026-06-10 Wed @ 14:29:20 -0500 Spec ratified — write boundary is option C; rewritten to spec-create format -Craig answered via cj annotations in the spec (2026-06-10): DECISION 5 is option C (read-shared, write-scoped — work agents never write the KB). Syncthing does replicate ~/sync/ to a work machine and Craig is fine with how C handles it. Node granularity: per-fact nodes. Write review: agent writes land freely in the KB only — explicitly not permission to post to email, Linear, or any public channel without review and consent. The spec was rewritten into the spec-create format at [[file:docs/agent-knowledge-base-spec.org][agent-knowledge-base-spec.org]] (old draft removed). Implementation explicitly held pending Craig's go-ahead; one decision still open (D7, next VERIFY). +Craig answered via cj annotations in the spec (2026-06-10): DECISION 5 is option C (read-shared, write-scoped — work agents never write the KB). Syncthing does replicate ~/sync/ to a work machine and Craig is fine with how C handles it. Node granularity: per-fact nodes. Write review: agent writes land freely in the KB only — explicitly not permission to post to email, Linear, or any public channel without review and consent. The spec was rewritten into the spec-create format at [[file:docs/specs/agent-knowledge-base-spec.org][agent-knowledge-base-spec.org]] (old draft removed). Implementation explicitly held pending Craig's go-ahead; one decision still open (D7, next VERIFY). *** 2026-06-10 Wed @ 14:35:40 -0500 Spec review — not ready Review written at docs/agent-knowledge-base-spec-review.org (deleted on disposition completion; content summarized in the spec's Review dispositions). Rubric: =Not ready=. Blockers: resolve D7 (keep vs retire harness memory) and define the executable personal/work/unknown write-boundary classifier plus work-side write/refusal destination. Medium notes: use concrete ripgrep commands that exclude =*.sync-conflict-*= files, and define seed-node approval/rollback. @@ -274,7 +250,7 @@ Review written at docs/agent-knowledge-base-spec-review.org (deleted on disposit Craig ratified "keep" in chat (2026-06-10). Harness memory stays the ephemeral, auto-recalled capture layer; the KB holds promoted durable facts; Phase 3's wrap-up promotion cadence is mandatory. Spec D7 flipped to accepted; D2 stands as written. *** 2026-06-10 Wed @ 14:44:00 -0500 Project classification defined — work-root denylist, unknown refuses -Resolved in the spec-response pass: =knowledge-base.md= carries an explicit work-root denylist (initially =~/projects/work=) as the source of truth. Personal = under a known project parent (=~/code/=, =~/projects/=, =~/.emacs.d=) and not denylisted → KB writes allowed. Work or unknown → no KB write; the agent reports the refusal with a one-line redacted summary of the fact. v1 adds no new work-side store — work projects keep their existing project-tree conventions. See the "Project classification and write routing" section of [[file:docs/agent-knowledge-base-spec.org][the spec]]. Denylist completeness is the one open caveat (next VERIFY). +Resolved in the spec-response pass: =knowledge-base.md= carries an explicit work-root denylist (initially =~/projects/work=) as the source of truth. Personal = under a known project parent (=~/code/=, =~/projects/=, =~/.emacs.d=) and not denylisted → KB writes allowed. Work or unknown → no KB write; the agent reports the refusal with a one-line redacted summary of the fact. v1 adds no new work-side store — work projects keep their existing project-tree conventions. See the "Project classification and write routing" section of [[file:docs/specs/agent-knowledge-base-spec.org][the spec]]. Denylist completeness is the one open caveat (next VERIFY). *** 2026-06-10 Wed @ 14:44:00 -0500 Codex review incorporated — spec ready with caveats Spec-response pass processed the 2026-06-10 Codex review with D7 = keep as a pre-agreed input. Both blockers cleared (D7 accepted; classification/write-routing section added). Mediums accepted: canonical rg commands with conflict-file exclusion, Phase 2 seed-node approval/rollback mechanics, Makefile no-change note, Testing/Verification section. Three recommendations modified, none rejected — see the spec's Review dispositions. Review file deleted per the workflow. Rubric: ready with caveats (denylist confirmation). Implementation tasks broken out below; implementation itself awaits Craig's go. @@ -320,6 +296,9 @@ Expected: all four behave per the spec; any miss promotes to a bug task. (Agent- *** 2026-06-30 Tue @ 13:53:34 -0400 ratio roam clone + sync-timer confirmed (cross-machine half done) Verified ratio over tailscale ssh: =~/org/roam= is a clone of =git@cjennings.net:roam.git= (HEAD auto-synced 13:11 today), and =roam-sync.timer= is enabled and actively firing (last run 5 min prior, next in 10). Both unit files present. velox was already confirmed, so the one-time clone+timer setup is now done on both daily drivers — the (b) half of this VERIFY's remaining work. Only the manual-validation child (work/unknown-project refusal checks needing Craig's eyes) is left before DONE. Cleared the matching "Current open instance" line in =daily-drivers.md=. +*** 2026-07-01 Wed @ 21:52:15 -0400 Manual-validation checks 1 + 4 (velox half) verified; ratio unreachable +Check 1 (seed node in org-roam + rg inventory): 55 =:agent:= nodes found by the rg inventory AND 55 nodes under =agents/= indexed in the live org-roam DB (emacsclient query) — match. Check 4 (cross-machine edit): created a temporary probe node =agents/20260701214910-kb-sync-validation-probe.org=, triggered =roam-sync= — committed and pushed to origin within seconds (f0252bb), zero =sync-conflict= files. The ratio half could NOT be verified tonight: =tailscale ping ratio= pongs via DERP, but ssh to 100.71.182.1:22 times out (machine likely suspended). Probe left in place; when ratio is back, confirm with: ssh cjennings@100.71.182.1 'ls ~/org/roam/agents/ | grep kb-sync-validation-probe' — then delete the probe node. Checks 2 + 3 (work/unknown-project refusal) still need live sessions in those projects, and "work machine has no KB clone" needs the work machine named + checked. + ** TODO [#C] Token-rotation helper for =@a-bonus/google-docs-mcp= OAuth refresh :feature:quick: :PROPERTIES: :LAST_REVIEWED: 2026-06-28 @@ -373,10 +352,11 @@ Codex ran the spec-review workflow. Outcome: the combined spec is =Not ready= be *** 2026-06-12 Fri @ 02:39:38 -0500 Second review after response pass Codex re-ran spec-review after the dispositions were folded in. Outcome by arc: Phase 1.5 helper instances =Ready with caveats=; phases 2-5 remain =Not ready= behind the explicit decisions/reverification gate. No new blocking findings for the helper slice. Review file updated in place: [[file:docs/design/2026-05-28-generic-agent-runtime-spec-review.org]]. -** TODO [#C] Spec storage location + lifecycle-status convention :spec: +** DOING [#C] Spec storage location + lifecycle-status convention :spec: :PROPERTIES: :CREATED: [2026-06-15 Mon] :LAST_REVIEWED: 2026-06-24 +:SPEC_ID: 80b0787b-4a60-4c82-8a16-b383d3e3c8f2 :END: Two coupled documentation conventions for rulesets to adopt, surfaced by .emacs.d while triaging ~28 design docs. Both land in =spec-create= ([[file:.ai/workflows/spec-create.org]]) and likely a new =docs-lifecycle= rule under =claude-rules/=. Source proposal: [[file:docs/design/2026-06-15-spec-storage-lifecycle-proposal.org]] (.emacs.d handoff 2026-06-15). @@ -395,10 +375,68 @@ We handle the task in priority order. Mechanism decided 2026-06-28; migrates int Follow-up once built: update spec-create to emit into =docs/specs/= with the org-keyword status; write the =docs-lifecycle= rule; ship the retrofit helper + startup nudge; retrofit rulesets' own =docs/design/= first as the pilot; send a note if .emacs.d should pilot before generalizing. +*** 2026-07-01 Wed @ 22:13:00 -0400 Spec drafted — first resident of docs/specs/, awaiting review +Wrote [[file:docs/specs/2026-07-01-docs-lifecycle-spec.org][the spec]] from the five settled decisions, dogfooding its own conventions: it lives in the new =docs/specs/=, opens with the =* DRAFT Docs lifecycle= status heading (org keyword authoritative, =:ID:= for id-links, dated history in the body), and drops the status filename suffix. It pins the one mechanism the decisions left open — where the keyword lives: a prepended top-level status heading with vocabulary =DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED=, additive and retrofittable, giving both the one-line =rg= board and free org-agenda scanning. Four build phases: rule + template updates → =spec-sort= helper (classify/confirm/move/relink, bats) → rulesets pilot (41 design files, 3 spec-spine candidates, 2 stray root specs) → startup nudge gated on =:LAST_SPEC_SORT:= + .emacs.d note. Status DRAFT until Craig's review flips it READY. + +*** 2026-07-01 Wed @ 22:22:34 -0400 Codex spec-review complete — Not ready +Review findings live in [[file:docs/specs/2026-07-01-docs-lifecycle-spec.org::*Review findings][the spec]]. Four blockers before implementation: the proposed lifecycle =#+TODO:= line drops the =TODO=/=DONE= states needed by Decisions and Review findings cookies; =spec-sort= needs an exact relink/unsupported-residue contract; the sort marker/startup nudge must name the actual =.ai/notes.org= state surface and detection flow; and the stricter =docs/specs/= review precondition must not strand legacy specs before retrofit. + +*** 2026-07-01 Wed @ 22:30:06 -0400 Second review merged + responder pass — all nine findings fixed +A fresh-context Claude reviewer independently rated the draft Not ready, converging on Codex's keyword-vocabulary blocker (adding the cross-sequence uniqueness wrinkle) and contributing five unique findings — the biggest: nobody owned the DOING→IMPLEMENTED flip, the exact mechanism whose failure produced this spec. Craig approved fixing all nine. Fixed in place: two-sequence collision-free keyword header (dogfooded in the spec's own header; org now computes the [5/5] and [9/9] cookies — verified in batch), transition-ownership table incl. spec-response's mandatory flip-to-IMPLEMENTED task + task-audit safety net, single classification predicate (Decisions AND Implementation phases), the -spec.org rename step, the full relink data-safety contract (rewritten roots / report-only surfaces / dry-run default / residue-grep gate), the =.ai/notes.org= marker + Phase A probe + Phase C nudge contract, the legacy-location compatibility rule, the org-id Emacs-resolution prerequisite for .emacs.d, and the three-line transition definition. Ledger + per-finding responses in the spec's Review findings section. Status stays DRAFT pending Craig's READY flip. + +*** 2026-07-01 Wed @ 22:41:33 -0400 Codex spec-review rerun — Not ready +Fresh review after the response/READY flip added five new blocking findings in [[file:docs/specs/2026-07-01-docs-lifecycle-spec.org::*Review findings][the spec]] and demoted the spec back to DRAFT. Remaining blockers: shared helper/workflow edits must name the canonical =claude-templates/.ai/= + mirror sync contract; task-audit needs an explicit spec-to-task binding before it can police =DOING= specs; =spec-sort --apply= needs a failure-safe/rollback contract; the org-id Emacs prerequisite must be executable before link conversion; and lifecycle status confirmation must be evidence-based so the retrofit does not encode stale reality. + +*** 2026-07-01 Wed @ 22:46:52 -0400 Second responder pass — all fourteen findings closed +Fixed Codex's five re-review findings: the canonical-placement contract now opens the retrofit section (helper + tests + workflow edits land in claude-templates first, sync-check --fix propagates, sync-check-clean is an acceptance criterion); spec-response stamps a =:SPEC_ID:= property on the build parent, and task-audit's query checks that parent's keyword — which dissolves the flip-task chicken-and-egg; =--apply= got the fail-safe contract (clean-tree preflight, validate-then-write from a recorded plan, named recovery recipe); id-link conversion is staged (pilot rewrites =file:= links only; =id:= conversion is a follow-up gated on the concrete .emacs.d id-index mechanism — Craig picked this fork); and status confirmation is evidence-based (evidence panel, conservative non-terminal default, terminal states need a stated reason). Also de-cookified bracket tokens in prose that org's cookie updater would mangle. Status stays DRAFT; the READY flip belongs to the reviewers this round — verify pass dispatched. + +*** 2026-07-01 Wed @ 23:22:50 -0400 Codex spec-review rerun — Ready +Codex re-read the revised [[file:docs/specs/2026-07-01-docs-lifecycle-spec.org][docs lifecycle spec]] after the second responder pass. All fourteen findings are closed, decisions remain [5/5], and the remaining implementation contracts are concrete enough to build and test. Status flipped to READY in the spec; implementation can proceed. + +*** 2026-07-01 Wed @ 23:34:15 -0400 Decomposed into build tasks; spec flipped READY → DOING +spec-response Phase 6 run: this parent now carries the =:SPEC_ID:= binding (the spec's status-heading UUID), the phase tasks below track the build, and the spec's status heading is DOING. Completeness pass done: all ten acceptance criteria have homes across the phase tasks; vNext (org-agenda view) was already filed as the [#D] task below. + +*** 2026-07-01 Wed @ 23:39:10 -0400 Phase 1 landed — docs-lifecycle rule + four spec-workflow updates +claude-rules/docs-lifecycle.md written and linked machine-wide (make install). Canonical-side updates: spec-create Phase 5 + template (docs/specs/ location, two-sequence keyword header, DRAFT status heading with :ID:, transition mechanics), spec-review (location expectation with the legacy compatibility rule keyed on :LAST_SPEC_SORT:, plus the DRAFT→READY flip — and the demote-back-to-DRAFT path a failed re-review takes), spec-response Phase 6 (owns READY→DOING, stamps :SPEC_ID: on the build parent, always emits the flip-to-IMPLEMENTED task), task-audit Phase B (the :SPEC_ID: reconcile query, checking the parent's keyword rather than counting children). Mirror synced; make test green end to end. + +*** 2026-07-01 Wed @ 23:57:44 -0400 Phase 2 landed — spec-sort helper + 30-test bats suite +Built claude-templates/.ai/scripts/spec-sort (Python, TDD — the 30-test bats suite written red-first in claude-templates/.ai/scripts/tests/spec-sort.bats) covering the full retrofit contract: spine classification with the -spec.org-name-without-spine anomaly case, evidence panel (Status field, cookies, linking todo.org task, dated history, artifact existence) with conservative non-terminal proposals, per-candidate --confirm/--skip gate with --reason required on terminal keywords, clean-worktree preflight (--allow-dirty prints what recovery loses), validate-then-write from a recorded plan file, relink across the rewritten roots (inbound AND the moved doc's own outbound relative links) with report-only for sessions + synced templates (naming the canonical claude-templates file), bare-path mentions blocking until --acknowledge-bare, named recovery on injected mid-apply failure, post-apply residue gate, idempotent :LAST_SPEC_SORT: stamp. Real-data dry run against rulesets' pile matched predictions: 5 candidates, 4 anomalies, 30 notes, 0 bare, 10 report-only (incl. the startup.org synced-template case Codex flagged). make test green; sync-check clean. + +*** 2026-07-02 Thu @ 00:18:28 -0400 Phase 3 pilot ran — rulesets' pile sorted, board live +Craig confirmed all five proposed keywords as-is plus the IMPLEMENTED reason; spec-sort --apply moved the five specs to docs/specs/ (agent-knowledge-base IMPLEMENTED, inbox-workflow-consolidation READY, autonomous-batch-execution READY, encourage-kb-contribution READY, wrapup-routing DOING — joining the docs-lifecycle spec's DOING on the board), rewrote 12 todo.org links plus the moved specs' own outbound links, and stamped :LAST_SPEC_SORT: 2026-07-02. Acceptance verified: status board matches reality, all re-homed specs carry -spec.org, residue zero in the rewritten roots (one acknowledged self bare mention rode along inside inbox-workflow-consolidation-spec), no id: links emitted, make test green. Surfaced and left in place: the four -spec.org-named files in docs/design without a spec spine (generic-agent-runtime, pattern-catalog, daily-prep-template, auto-triage-intake) — notes by predicate, misleading names; rename or leave is a Craig call. Report-only references: 9 frozen session archives + the synced startup.org (canonical edit lands with Phase 4's nudge work). + +*** 2026-07-02 Thu @ 00:23:32 -0400 Phase 4 landed — startup nudge live, .emacs.d notified +Added the spec-sort probe to startup.org Phase A (item 12) and the one-line nudge to Phase C's findings list, canonical-side, mirror synced. One refinement over the spec's sketch: the stray-root check uses find instead of compgen, because compgen is bash-only and zsh aborts on an unmatched glob — the original snippet false-negatived on stray root specs under zsh (spec snippet updated with a note). Fixture-verified in both shells: fires on an unsorted docs/design and on a stray docs/*-spec.org, silent with the marker stamped, silent with no docs at all. Also fixed startup.org's own stale reference to the moved encourage-kb-contribution spec (the pilot's report-only finding). Sent .emacs.d the convention-live note with its ~28-doc pile nudge and the id-index ask (org-id-extra-files enumeration or periodic org-id-update-id-locations, verify by clicking the docs-lifecycle spec's :ID:), asking it to tag the owning task :blocker: since rulesets' id-conversion task waits on it. + +*** TODO id-link conversion pass :solo: +Run the conversion pass: rewrite spec-target file: links in the rewritten roots to id: form, per project. Not part of any sort run. + +Gate CLEARED 2026-07-02: .emacs.d delivered the id-index (handoff 0056) — modules/org-spec-links.el enumerates every project's docs/specs/*.org into org-id-extra-files at org-id load, with cj/org-id-refresh-spec-locations for immediate re-scan. Verified live on their side: (org-id-find "80b0787b-4a60-4c82-8a16-b383d3e3c8f2") resolves to the docs-lifecycle spec. If a fresh id doesn't resolve on click, the fix is M-x cj/org-id-refresh-spec-locations on the .emacs.d side (also run it after each spec-sort pass or new project). + +*** TODO Flip the spec to IMPLEMENTED +When the final implementation phase completes: flip the spec's status heading DOING → IMPLEMENTED with a dated history line and the Metadata mirror, per the transition-ownership table. This task is the tracked obligation that closes the loop; the parent stays open until it runs. + +*** TODO Manual testing and validation :test: +What we're verifying: the human-eyes half of the acceptance surface. + +**** Startup nudge appears and clears +What we're verifying: the Phase A probe + Phase C nudge fire exactly once per project. +- Open a session in a project with an unsorted docs/design (before its sort) and read the startup output. +Expected: one line offering "run spec-sort"; after the pilot stamps :LAST_SPEC_SORT:, the next session shows nothing. + +**** Moved-spec links click through in Emacs +What we're verifying: the pilot's relink pass left todo.org and docs links working for the human reader, not just the residue grep. +- After the Phase 3 pilot, open todo.org in Emacs and click three links that point at moved specs (including one from a dated log entry). +Expected: each opens the spec at its new docs/specs/ path. + +*** TODO [#D] Docs lifecycle vNext — org-agenda spec-status view :feature: +Once specs carry lifecycle TODO keywords under =docs/specs/=, add a custom org-agenda view that lists =DRAFT= / =READY= / =DOING= / terminal specs by status. Deferred from [[file:docs/specs/2026-07-01-docs-lifecycle-spec.org][the docs-lifecycle spec]]; not part of v1 because the grep board is sufficient until the status headings exist. + ** DOING [#C] No-approvals speedrun — cross-project autonomous-batch mode :feature:spec: :PROPERTIES: :CREATED: [2026-06-15 Mon] :LAST_REVIEWED: 2026-06-24 +:SPEC_ID: 90f623cd-fdbe-4f5c-b63d-b2f84d9151cf :END: A named mode for coding projects: Craig names an ordered task set and says "speedrun" / "no approvals speedrun"; the set is worked autonomously, each task held to the full quality bar (TDD red→green, =/review-code=, =/voice= on the commit) and committed + pushed as its own logical commit, with all needed quick decisions gathered in one pre-flight Q&A (answer or "skip this") and a VERIFY filed for anything underspecified or needing deliberation, plus an end-of-set page listing completed + remaining + skipped tasks. Task size is not a gate — large tasks decompose into per-commit chunks. Surfaced by .emacs.d from a 2026-06-15 theme-studio session where the shape worked. Source proposal: [[file:docs/design/2026-06-15-fix-speedrun-workflow-proposal.org]] (.emacs.d handoff 2026-06-15). Build via =spec-create= when worked; we handle the task in priority order. @@ -408,14 +446,50 @@ Skeptical-review read (open design questions to resolve in the spec, not settled - *Auto-pull vs explicit list* — whether the set comes from an explicit ordered list or a tag/priority query. - *Guardrails* — must refuse to speedrun tasks needing design decisions or carrying data-loss risk without a checkpoint (the sender's biased-safe unused-tile flag is the worked example). +*** 2026-07-01 Wed @ 22:10:35 -0400 Phase 0 landed — hard tag definitions + review/audit enforcement +todo-format.md gained the "Hard definitions: :solo: and :quick:" subsection under the scheme header (fixed across projects: :solo: = buildable + agent-verifiable + no deliberation, with one-or-two upfront-answerable quick decisions allowed per the ratified spec; :quick: = ≤30-min effort hint, never a gate). task-review.org: the two tagging sections are now explicitly mandatory ("a review that skips them is incomplete") and gate 3 was realigned from "no upfront decision" to the spec's no-deliberation form — the stricter old wording predated the pre-flight-Q&A decision and would have wrongly excluded quick-question tasks. task-audit.org: the re-assess bullet is marked mandatory and points at the todo-format hard definitions as canonical. Phases 1-6 (work-the-backlog extraction, callers, commit gate, checklist/Q&A/page, metrics, synthesis) remain. + *** 2026-06-16 Tue @ 00:53:36 -0500 Spec written; design questions answered -Craig's "your call" (2026-06-16) answered in [[file:docs/design/2026-06-16-autonomous-batch-execution-spec.org][the autonomous-batch execution spec]], which reconciles this with Phase E into one feature: +Craig's "your call" (2026-06-16) answered in [[file:docs/specs/2026-06-16-autonomous-batch-execution-spec.org][the autonomous-batch execution spec]], which reconciles this with Phase E into one feature: - *Most effective / workflow-vs-preset:* one dedicated =work-the-backlog.org= workflow holds the execution loop; "fix speedrun" is a thin named preset (no-approvals + always-push + end page) feeding it an explicit list, and the inbox-zero loop feeds it a tag query. Pros of the shared workflow: one execution loop to audit, inbox-zero's three callers stay clean, both input shapes reuse one guardrail set. Cons: one more workflow file and a caller-to-workflow indirection. The con list is shorter and lighter than the duplication cost of two separate features, which is why the shared workflow wins. The pros carry the more important entries (single audit surface, clean seam). - *Paging:* end-of-set only, via =notify ... --persist= (reconciled past the removed page-signal wrapper). - *Auto-pull vs explicit list:* both — explicit list for the preset, tag/priority query for the loop. - *Effectiveness measurement (the trial Craig asked for):* the spec designs a per-task JSONL metrics log (=.ai/metrics/work-the-backlog.jsonl=), a corrections-in-next-session signal, and a periodic synthesis step that writes =:agent:metrics:= org-roam articles for later review — the "gather data + create org-roam articles" loop. *** 2026-06-29 Mon @ 03:48:09 -0400 Ratified the autonomous-batch execution spec -Craig ratified all eight decisions in [[file:docs/design/2026-06-16-autonomous-batch-execution-spec.org]] (revised this session — size gate removed, crisp four-item defer checklist, =:solo:= / =:quick:= definitions + task-review/audit enforcement, speedrun pre-flight Q&A). Spec Status → ready; implementation-ready across Phase 0–6. Decisions grew from six to eight during the revision. +Craig ratified all eight decisions in [[file:docs/specs/2026-06-16-autonomous-batch-execution-spec.org]] (revised this session — size gate removed, crisp four-item defer checklist, =:solo:= / =:quick:= definitions + task-review/audit enforcement, speedrun pre-flight Q&A). Spec Status → ready; implementation-ready across Phase 0–6. Decisions grew from six to eight during the revision. + +*** 2026-07-02 Thu @ 00:44:59 -0400 spec-response decomposition — :SPEC_ID: bound, spec DOING +Stamped the spec's UUID on this parent, broke Phases 1-6 into the build tasks below (plus the flip task and a live-trial validation child), and flipped the spec's status heading READY → DOING per the transition-ownership table. + +*** 2026-07-02 Thu @ 01:07:29 -0400 Phase 1 landed — execution loop extracted into work-the-backlog.org +work-the-backlog.org written (canonical + mirror): caller contract (task set + session mode + cap), five-outcome vocabulary, the loop, mechanical eligibility gate (TODO + :solo: per scheme header, safe-by-omission, no-scheme-header → don't run), four-item defer checklist, per-task quality bar, cap/kill-switch semantics, page + metrics stubs pointing at Phases 4-5. inbox.org's auto-mode per-cycle item 3 reverted to routing-only (yes-path execution removed; mode intro + closing line updated to match). INDEX.org entry added. make test green, sync clean; nothing invokes the new workflow yet. + +*** 2026-07-02 Thu @ 01:13:33 -0400 Phase 2 landed — both callers wired +inbox.org auto-mode item 3 regained its "run this batch next?" ask, now chaining into work-the-backlog as an explicit second step after routing (eligibility query + file-only + paging off + cap 1). work-the-backlog.org gained the two caller sections: the auto-loop contract and the no-approvals speedrun preset (seven-step pre-flight → autonomous-commit + always-push + paging-on over an explicit list; finer Q&A mechanics deferred to Phase 4). Speedrun trigger phrases live in the workflow + INDEX; "speedrun" always routes to the preset, with a disambiguation note in no-approvals.org and its INDEX entry. Each caller independently exercisable. + +*** 2026-07-02 Thu @ 01:18:07 -0400 Phase 3 landed — waiver-gated commit autonomy +Pinned the waiver format per D5: two marker lines in .ai/notes.org Workflow State — :COMMIT_AUTONOMY: yes (has the waiver) and :LOOP_MAY_COMMIT: yes (the unattended loop may also commit; requires the first). Absent or non-yes reads as no; the read is a fresh grep each run, never memory. Degrade contract written into work-the-backlog.org (surface in run intro + summary, never honor without the marker, never degrade silently); caller sections + Common Mistakes updated. Stamped rulesets' own :COMMIT_AUTONOMY: yes; :LOOP_MAY_COMMIT: deliberately not granted — Craig's call. .emacs.d holds the waiver too but its notes.org is its own scope; told via inbox-send to stamp its marker. + +*** 2026-07-02 Thu @ 01:21:47 -0400 Phase 4 landed — checklist mechanics, pre-flight Q&A contract, page +The four-item checklist (in since Phase 1) gained its mechanics: a VERIFY-filing subsection (dedup against an existing sibling first — the deferred task stays TODO, so without the check every run re-files; placement/heading/body per todo-format.md) and a quick-question routing subsection (discriminator: one-line factual/preference pick vs tradeoff-weighing; three-plus questions = underspecified = file; item 2 data-loss never routes to Q&A). Preset section gained the batch-ask contract (one message, recommendation-first numbered options per interaction.md, answers recorded as dated lines in the task bodies before the run). Page section finalized (fires once on set-done or cap-hit; notify --persist is the paging surface). Common Mistakes 12-13 added. Checklist only ever reduces what runs; pre-flight fires only under the preset. + +*** 2026-07-02 Thu @ 01:24:50 -0400 Phase 5 landed — per-task JSONL metrics log +Metrics section written into work-the-backlog.org: one record per task at outcome time, appended to the project's .ai/metrics/work-the-backlog.jsonl (git-tracked, append-only, dir+file created on first append). Full field table per the spec (ts, run_id, project, caller, task, outcome, defer_reason, upfront_decision, wall_clock_s, commit_sha, review_findings), outcome slugs mapped to the prose vocabulary, commit_sha flagged as the corrections-signal key (comma-separated when a task decomposed into several commits). Added the sixth outcome the spec's readiness section demanded but the enum missed: failed (tree left working, surfaced, run continues) — wired into the Outcomes vocabulary and loop step 4. A failed append warns in the run summary but never blocks, reorders, or aborts execution. + +*** 2026-07-02 Thu @ 01:27:43 -0400 Phase 6 landed — synthesis step to org-roam +Synthesis section written into work-the-backlog.org (trigger "synthesize backlog metrics", INDEX row added): discover the JSONL union across project roots, classify each project per knowledge-base.md's denylist before reading, exclude work/unknown projects with the refusal contract, compute per-run rollups + trends, compute the corrections signal (later revert/fix commit touching the same files within ~14 days — a flag for human review, not a conviction), write one :agent:metrics: KB node under ~/org/roam/agents/ with [[id:...]] links to prior synthesis nodes, pull-before/commit-push-after. Read-only over the logs plus the single KB write; never mutates JSONL, todo.org, or any tree. + +*** TODO [#C] Speedrun — live trial validation :test: +What we're verifying: the whole loop under a real run. Craig names a small ordered set in a coding project and says "no approvals speedrun": pre-flight Q&A fires once up front, each task lands as its own reviewed commit, ineligible/underspecified tasks get VERIFYs instead of half-work, the end-of-set page arrives via notify --persist, and the metrics JSONL carries one record per task. Not :solo: — needs Craig's set and his read on the run. + +*** TODO [#C] Flip the autonomous-batch spec to IMPLEMENTED +When the final phase completes and the live trial validates: flip docs/specs/2026-06-16-autonomous-batch-execution-spec.org DOING → IMPLEMENTED with a dated history line and the Metadata mirror, per the transition-ownership table. + +** TODO [#C] Template sync with gitignored-only local changes :feature: +From Craig via the roam inbox (2026-07-02, routed by archsetup): downstream projects should still pull template updates when their local changes sit entirely in gitignored files or directories — an inbox drop or a file left to read doesn't affect the templates, yet it currently holds the sync back and projects fall behind. When worked: verify how the sync gate actually detects dirtiness today, then let gitignored-only changes pass it. + +** TODO [#C] Wrap-it-up summary mode — keep or cut :feature: +From Craig via the roam inbox (2026-07-02, routed by archsetup). Teardown-by-default already shipped (bare "wrap it up" closes the window; "with summary" keeps it). Craig's follow-on: "maybe we cut the summary altogether. help me think through when I'd want a summary and how I would recognize it before confirming and then having it close." Run that think-through with him (brainstorm-shaped, not solo), then adjust wrap-it-up.org's Step 6 + trigger phrases to the outcome. ** TODO [#C] ntfy phone channel as general two-way agent-comms :feature:spec: :PROPERTIES: @@ -435,7 +509,7 @@ The work project (2026-06-18) added a "Push each sweep to Craig's phone (ntfy) :CREATED: [2026-06-23 Tue] :LAST_REVIEWED: 2026-06-28 :END: -vNext from the inbox-consolidation spec. =auto inbox zero= (v1) is the interactive =/loop= recurring check that waits for Craig's yes before executing. A fully-unattended =/schedule= cron pass that fires while Craig is away needs its own contract before it can ship: read-only vs may-mutate =todo.org= / =~/org/roam/inbox.org=, how a find surfaces asynchronously when Craig isn't at the session, how dedup state persists across runs that don't share a session, and what session/auth context a cron run carries. From the inbox-consolidation spec-review (Codex finding 1). See [[file:docs/inbox-workflow-consolidation-spec.org][spec]]. +vNext from the inbox-consolidation spec. =auto inbox zero= (v1) is the interactive =/loop= recurring check that waits for Craig's yes before executing. A fully-unattended =/schedule= cron pass that fires while Craig is away needs its own contract before it can ship: read-only vs may-mutate =todo.org= / =~/org/roam/inbox.org=, how a find surfaces asynchronously when Craig isn't at the session, how dedup state persists across runs that don't share a session, and what session/auth context a cron run carries. From the inbox-consolidation spec-review (Codex finding 1). See [[file:docs/specs/inbox-workflow-consolidation-spec.org][spec]]. Update 2026-06-28: the "design after v1 consolidation lands" precondition is cleared — the inbox engine consolidation (24ca58d) and the monitor-inbox 15-min loop (edb545d) both shipped. Now actionable backlog rather than blocked; design the unattended contract when prioritized. @@ -1286,1676 +1360,6 @@ speculatively — defense-specific notations are narrow enough that each skill should be driven by a concrete contract need, not aspiration. * Rulesets Resolved -** DONE [#C] Fix =cj-scan= false positives on cj fences nested inside other =#+begin_*= blocks :bug: -CLOSED: [2026-05-15 Fri] - -=cj-scan.py= was matching =#+begin_src cj:= / =#+end_src= line-by-line -without awareness of enclosing block scopes. A cj fence embedded inside a -=#+begin_example= block (typically when documenting what the =<cj= yasnippet -emits) or inside =#+begin_src snippet= (the yasnippet definition itself) was -misclassified as a live cj annotation. Surfaced from a /respond-to-cj-comments -run against the dotemacs =todo.org= that reported two false positives in the -=<cj= yasnippet documentation. - -Fix: track an active =wrapper_type= state. When the scanner sees =#+begin_<type>= -(for any =<type>= other than =cj:= via the more-specific cj-open regex, which -is checked first), it enters a wrapper state where every line is treated as -content until the matching =#+end_<type>= closer fires. Inside a wrapper, cj -fence patterns and legacy inline =cj:= lines are both suppressed. - -Tests: added =TestCjScanNestedFencesIgnored= (6 tests) to -=claude-templates/.ai/scripts/tests/test_cj_scan.py= covering nesting inside -=#+begin_example=, =#+begin_src <other-lang>=, and =#+begin_quote=, plus -regression guards that a wrapper closes cleanly (a subsequent real cj fence -is still detected) and that an unclosed wrapper doesn't silently swallow -later content into false-positive cj blocks. - -Full =make test-scripts= equivalent (=python3 -m pytest=): 302 passed, 1 -skipped, 0 failures. - -** DONE [#A] Add =make doctor= — verify ~/.claude/ matches repo + settings.json :feature: - -A drift detector that scans =~/.claude/= and reports anything inconsistent with what the repo expects. Single-command answer to "is my machine consistent with rulesets?" - -*** Why this matters - -A 2026-05-06 sweep found =~/.claude/hooks/= didn't exist on this machine even though =settings.json= referenced =~/.claude/hooks/precompact-priorities.sh= as a PreCompact hook. Compaction would have silently failed to invoke the hook. The fix was =make install-hooks=, but the breakage was invisible until I happened to grep for it. =make doctor= run regularly (or even as part of session start) would catch this kind of drift in seconds instead of after the fact. - -*** Checks - -- Every entry in =settings.json= ="hooks"= block points at a file that exists. -- Every entry in =enabledPlugins= has a matching install under =~/.claude/plugins/data/=. -- Every skill in =$(SKILLS)= has a working symlink at =~/.claude/skills/<name>=. -- Every rule in =$(RULES)= has a working symlink at =~/.claude/rules/<name>=. -- Every default hook has a symlink at =~/.claude/hooks/<name>= (warn-only — opt-out is legitimate). -- =settings.json= and =.mcp.json= symlinks resolve to the rulesets versions. -- =mcp/install.py= state matches =claude mcp list= (every server in =servers.json= is registered). -- No dangling symlinks anywhere under =~/.claude/=. - -*** Output - -One line per check: =ok= / =WARN= / =FAIL=. Final summary: =N ok, M warnings, K failures=. Exit non-zero on any failure so it can ride a pre-flight check. - -** DONE [#A] Build =voice= skill — combine =humanizer= with universal + personal style passes :feature: - -Combine =humanizer= with universal good-writing passes (Strunk & White, Orwell, Plain English) and the personal-style passes from =commits.md=. Two modes — =general= for arbitrary writing, =personal= for commits/PRs/comments — share a foundation and diverge on register. - -Built and shipped 2026-05-07: =voice/SKILL.md= with 39 numbered patterns walked sequentially. Patterns 1-25 carried over from humanizer, 26-31 are universal good-writing additions, 32-39 are personal-only. Migrated three callers (=commits.md=, =respond-to-cj-comments.md=, =start-work.md=). Removed the standalone =humanizer= skill since voice supersedes it. - -*** Why this matters - -Three transformations want to run together for personal-mode artifacts (commits, PR titles + bodies, PR comments) but lived in three places: =humanizer= as a skill, S&W-style universal rules nowhere (applied ad-hoc), and the personal-style passes as prose steps in =commits.md= that got re-applied by hand each time. Costs: (1) the "I forgot pass (e)" failure mode — skipping a pass without flagging is a defect but happens in practice. (2) No single-call invocation of the full transform. (3) General-mode writing (research notes, philosophy, history) got only humanizer with no universal-prose pass at all. Combining brings them under one skill with one invocation. - -*** Design - -Two modes: - -- *general* (default) — for arbitrary writing not bound for commit/PR/comment publishing (research notes, philosophy/history essays, emails, README prose). Runs: - - humanizer (current behavior — strip AI-generated-writing fingerprints) - - tier-1 universal passes (canonical good-writing rules) - - the 2 personal-style passes that have no register conflict (jargon-fragment rewrite, noun-ified verbs) - -- *personal* — for commits, PR titles + bodies, PR comments. Runs general PLUS: - - 8 personal-only passes (first-person rewrite, semicolons, contractions, sentence-split, felt-experience, sentence fragments, terse cut, public-artifact scope check) - -The 8 personal-only passes are explicitly *not* in general mode. They conflict with academic / literary / philosophical register. Forcing first-person on a Foucault essay or stripping felt-experience from a journal entry would damage the writing. - -*** Tier 1 universals (v1) - -From Strunk & White, Orwell's "Politics and the English Language", Plain English Campaign, and Garner's Modern English Usage. Each is a detection-pattern + rewrite-rule pair, mechanical enough to apply consistently across runs. - -- *Omit needless words* — curated phrase list (=the fact that= → =that=/=because=, =in order to= → =to=, =at this point in time= → =now=, =due to the fact that= → =because=, =for the purpose of= → =to=, =in spite of= → =despite=, etc.) -- *Long word → short word* — Plain English wordlist (~150 entries: =utilize=→=use=, =commence=→=start=, =terminate=→=end=, =facilitate=→=help=, =demonstrate=→=show=, =sufficient=→=enough=, =prior to=→=before=, =subsequent to=→=after=, =in the event that=→=if=, =a great deal of=→=much=) -- *Active over passive voice* — detect "to be + past-participle" patterns. Suggestion-only in v1 (auto-rewrite is risky in technical contexts where passive is appropriate); graduate to auto-rewrite for unambiguous cases in v2. -- *Comma splices* — detect independent clauses joined only by comma; rewrite to period or semicolon-then-period. -- *Cliché flag* — small curated list (=at the end of the day=, =moving forward=, =going forward=, =at this juncture=, =circle back=, =low-hanging fruit=, =deep dive=, =leverage= as verb). - -*** Tier 2 universals (v2) - -- *Positive over negative form* (S&W) — =not unlike= → =like=, =do not fail to= → =remember to=, =did not pay any attention= → =ignored= -- *Garner-style word-pair corrections* — comprise/compose, less/fewer, that/which (restrictive vs nonrestrictive), affect/effect, principal/principle -- *Parallelism in lists* — detect mismatched grammar in bullet items -- *Tense consistency* — flag mid-paragraph tense shifts -- *Acronym definition on first use* — detect uppercase tokens used before being expanded - -*** Tier 3 (v3, may not land) - -- *Concrete-over-abstract* preference -- *Emphatic word at sentence end* (S&W rule 18) -- *Vary sentence length / rhythm* -- *Reading-grade-level scoring* (Hemingway-style) - -*** Personal-style pass placement - -| # | Pass | Mode | Why | -|----+-------------------------------------+-------------------------------------+-------------------------------------| -| 1 | First-person voice rewrite | personal only | Forces "I" voice; wrong for | -| | | | academic prose where third-person | -| | | | and "we" are conventional | -|----+-------------------------------------+-------------------------------------+-------------------------------------| -| 2 | Jargon-fragment → complete sentence | both | Universal clarity, no genre | -| | | | conflict | -|----+-------------------------------------+-------------------------------------+-------------------------------------| -| 3 | Semicolon → period/comma | personal only | Semicolons are conventional in | -| | | | long-form / academic prose | -|----+-------------------------------------+-------------------------------------+-------------------------------------| -| 4 | Contractions ("it's", "don't") | personal only | Academic and formal writing | -| | | | typically avoids contractions | -|----+-------------------------------------+-------------------------------------+-------------------------------------| -| 5 | Sentence split on conjunctions | personal only | Foucault, Hegel, Adorno | -| | | | deliberately use long compound | -| | | | sentences | -|----+-------------------------------------+-------------------------------------+-------------------------------------| -| 6 | Felt-experience narration ("I'll | personal only | Personal essays *use* | -| | feel this every time") | | felt-experience as content | -|----+-------------------------------------+-------------------------------------+-------------------------------------| -| 7 | Noun-ified verbs ("the ask", "a | both | Targets corporate-speak with | -| | learn", "the spend") | | curated wordlist; doesn't catch | -| | | | philosophical nominalizations like | -| | | | "the becoming" | -|----+-------------------------------------+-------------------------------------+-------------------------------------| -| 8 | Sentence fragments → complete (in | personal only | Fragments are valid stylistic | -| | prose) | | devices in literary prose | -|----+-------------------------------------+-------------------------------------+-------------------------------------| -| 9 | Terse cut (rhetorical padding: | personal only | Tier 1 omit-needless-words covers | -| | "worth noting", "it's important to | | the worst offenders universally; | -| | understand") | | aggressive cut conflicts with | -| | | | academic register | -|----+-------------------------------------+-------------------------------------+-------------------------------------| -| 10 | Public-artifact scope check (local | personal only — *flag-only*, no | Operational/safety check, not | -| | paths, private repos, personal | auto-rewrite | stylistic; auto-masking risks | -| | tooling) | | silently editing meaningful text | -|----+-------------------------------------+-------------------------------------+-------------------------------------| - -*** Inclusive-language pass — explicitly excluded - -Considered and rejected. Conflicts with planned writing on philosophy/history topics (Foucault on sexuality and gender, history of slavery in New Orleans). Wordlist substitutions would override deliberate vocabulary choices in those genres. - -*** V1 scope - -- [ ] Skill at =~/code/rulesets/voice/= with =SKILL.md= -- [ ] Frontmatter with positive triggers (commit, PR, comment, "humanize", "voice pass") and negative triggers (code, structured data, plain bullet lists) -- [X] Mode invocation: default = =general= when invoked bare; =personal= invoked explicitly by publish-context callers -- [X] humanizer content migrated from =humanizer/= → =voice/= -- [X] Tier 1 universal passes implemented (5 patterns: #26-30, plus #31 noun-ified verbs as a universal personal addition) -- [X] 2 personal passes that run in both modes (#30 jargon-fragment, #31 noun-ified verbs) -- [X] 8 personal passes that run in personal mode only (#32 first-person, #33 semicolons, #34 contractions, #35 sentence-split, #36 felt-experience, #37 fragments, #38 terse cut, #39 scope check) -- [X] Each pass = detection-pattern + rewrite-rule pair (#39 is detection + flag-only) -- [X] Total v1 pattern count: 31 in general mode (humanizer's 25 + 4 tier-1 + 2 universal personal); +8 personal-only = 39 in personal mode -- [X] Update =commits.md= to invoke =/voice personal= instead of "run =humanizer= and apply five passes manually" -- [X] Remove the existing =humanizer/= skill (no callers outside this repo, all migrated) -- [X] =make doctor= still passes -- [X] =make lint= clean - -*** v2 (deferred) - -- [ ] Tier 2 universals (positive form, word-pair corrections, parallelism, tense consistency, acronym definition) -- [ ] Per-pass severity flags for Tier 1 active-voice (suggestion-only when actor is implicit; auto-rewrite when actor is named) -- [ ] Reporting mode: list which passes fired and which were no-ops - -*** v3 (aspirational, may not land) - -- [ ] Tier 3 (concrete-over-abstract, emphatic-word position, sentence-length variation, reading-grade scoring) -- [ ] Progressive disclosure split: =voice/SKILL.md= orchestrator + =voice/passes/<pass-name>.md= per pass with worked examples - -*** Migration (resolved) - -Decision: deleted =humanizer/= entirely. Three callers (=commits.md=, =respond-to-cj-comments.md=, =start-work.md=) all updated to invoke =/voice= directly. No alias needed since nothing outside the repo invoked humanizer. - -*** Naming alternatives considered - -- =voice= — chosen. Captures both modes; broad enough. -- =polish= — descriptive of multi-pass nature; less prescriptive about whose voice. -- =house-style= — signals "this is the house style"; appropriate for personal repo. -- =commit-voice= — too narrow (passes apply to research notes, emails, etc. in general mode). -- =humanize= (extending current) — undersells the universal + personal additions. - -*** Open questions before implementation - -Resolved during implementation: -- Default mode when =/voice= is invoked bare: =general=. Personal-context callers (=commits.md= publish flow, =respond-to-cj-comments.md=) invoke =/voice personal= explicitly. Avoids accidentally first-person-ifying research notes. -- Reporting: skill prints "Summary of changes" listing which patterns fired (audit value). -- Public-artifact scope check (#39): flag-only, user resolves manually. Blocking would frustrate on legitimate path mentions. -- Tier 1 active-voice detection: suggestion-only in v1. Auto-rewrite for unambiguous cases deferred to v2. - -** DONE [#B] Add =--archive-done= mode to =.ai/scripts/todo-cleanup.el= :feature: - -Opt-in mode that moves every level-2 subtree whose TODO state is DONE or CANCELLED out of the "Open Work" section and into the "Resolved" section of the same org file, subtree intact. - -- *Section matching.* Key on a top-level heading containing "Open Work" and one containing "Resolved" — that pairing is the only naming consistent across projects (=Work Open Work= / =Work Resolved= here; bare =Open Work= / =Resolved= elsewhere). Require exactly one match for each; otherwise skip with a clear message, no crash. -- *Modes.* =--check= previews and writes nothing, same as the existing hygiene pass. Idempotent. Not run by default in the wrap-up flow — archiving is consequential, so it stays opt-in: =emacs --batch -q -l todo-cleanup.el --archive-done FILE=. -- *Edge cases.* Source or target section missing; subtree at EOF; nested DONE subtree under an open parent stays put (only level-2 entries move); nothing to move → clean no-op. -- *Tests.* TDD with ERT — the project's first elisp tests. Fixtures (synthetic) under =.ai/scripts/tests/=; run via =make test= (rulesets) or =make test-scripts= (claude-templates), which run pytest + every =tests/test-*.el= ERT suite. Cases: one DONE level-2 moves; multiple; CANCELLED also moves; structural (no-state) headings don't move; nested DONE under an open parent stays; level-2 DONE with open level-3 children moves intact; subtree at EOF; missing source/target section; ambiguous "Resolved"; lowercase headings; nothing-to-do; idempotency; =--check= preview + its idempotency; realistic-sample integration. - -Origin: came up while scrubbing a project's todo.org on 2026-05-11 — moving a big completed PROJECT subtree (plus a few smaller ones) into the Resolved section by hand was the cue to build a reusable tool. - -Built and shipped 2026-05-11: =--archive-done= added to =.ai/scripts/todo-cleanup.el= test-first; 13-test ERT suite (=tests/test-todo-cleanup.el=) + realistic synthetic fixture (=tests/fixtures/todo-sample.org=), wired into =make test= / =make test-scripts= alongside pytest. The CLI dispatch moved into =tc-main= behind a guard so the suite can =require= the file without firing it. Section matching is case-insensitive and tolerates the =<Project> Open Work= / =<Project> Resolved= naming variants. Opt-in only — not wired into the wrap-up flow. Source of truth is =~/projects/claude-templates/=; rsync'd into this repo. - -** DONE [#B] Encode follow-up filing rules into =/start-work= -CLOSED: [2026-05-15 Fri] - -Phase 4 step 5 of =/start-work= ("refactor audit") says any candidate that isn't fix-now must land in one of three buckets: fold-into-related-commit, separate =refactor:= commit, or "file a ticket or todo.org entry." The third disposition doesn't say *where* — which leaves the orchestrator picking a location ad-hoc. Result: follow-ups buried under children of an epic parent get orphaned when the parent closes, or follow-ups for standalone tasks scatter across the file with no convention. - -Proposed placement rule (already memorized for this project as =feedback_followups_as_siblings.md=, generalizing): - -- *Epic-style parent task* (level-2 with multiple level-3 children) → follow-ups file as level-2 *siblings* of the parent. Stays visible after parent closure. -- *Standalone task* (level-2 with no children, or a level-3 inside another structure) → follow-up files as a new level-2 top-level entry in the same =* Open Work= section. Don't nest under the originating task. - -Both cases: include a "Triggered by: <date> <task or commit>" line so a future reader sees what surfaced it. - -Update =.claude/commands/start-work.md= Phase 4 step 5's "Disposition for each candidate" section to spell this out. Update any cross-references in =commits.md= or other files that touch the discipline. - -Triggered by: 2026-05-15 fold-epic session — Craig flagged the gap mid-flight after I'd surfaced a follow-up but hadn't filed it. -** DONE [#A] Consolidate =.ai/= template infrastructure (fold + audit + install-ai + ratio) :feature: -CLOSED: [2026-05-15 Fri] - -End-state: one repo (=rulesets=) is the single source of truth for =.ai/= template content. =make audit= verifies and applies drift across every =.ai/=-using project on the machine. =make install-ai= bootstraps new projects. Same setup propagated to ratio so both machines run the same way. - -Today (2026-05-15) the canonical-source rule got violated again: rulesets commit =372fb76= added a wrap-up subsection to =rulesets= without going through =claude-templates= first, and the next session's startup rsync was about to silently undo it. Two-repo coordination is the root cause; fold solves it. - -Build order: fold first (others depend on the new canonical path), then audit + install-ai in parallel, then test, then propagate to ratio. - -*** DONE [#A] Fold =claude-templates= into rulesets -CLOSED: [2026-05-15 Fri] - -Two repos, one source of truth. =~/projects/claude-templates/= is the canonical =.ai/= template that gets rsync'd into every project at session start. Keeping it standalone means a second =git pull= in startup Phase A.0, a second remote to push to at wrap-up, and a split history any time a change touches both. Folding it into =rulesets/claude-templates/= gives one repo to clone on a fresh machine and one place to edit templates. - -**** Open design choices - -- *History.* =git subtree add --prefix=claude-templates ~/projects/claude-templates main= preserves the 84-commit history under the new prefix. Plain content copy (=cp -a= + =git add=) is simpler but loses history. Either is fine since the standalone repo stays archived on =cjennings.net=. -- *Layout.* =rulesets/claude-templates/= mirrors the old repo name and sits next to =claude-rules/= cleanly. Alternative: absorb =.ai/= directly under a different name (=rulesets/.ai-template/= or similar). First option is clearer. -- *bin/ai.* The standalone Makefile symlinks =$HOME/.local/bin/ai → bin/ai=. After the move, fold that into rulesets' Makefile as another install target. - -**** Mechanical steps - -1. Subtree-merge or copy =~/projects/claude-templates/= into =rulesets/claude-templates/=. -2. Update 3 references in rulesets: - - =.ai/protocols.org= line 163 — pointer in the "Let's run/do the X workflow" section. - - =.ai/workflows/cross-agent-comms.org= line 8 — promotion-target path. - - =.ai/workflows/startup.org= lines 22, 96-98 — Phase A.0 pull + Phase A rsync sources. -3. Update Phase A.0 of =startup.org= to pull rulesets instead of claude-templates. Inside rulesets sessions, the existing project-repo pull already covers it. Outside rulesets (every other project's session), Phase A.0 needs an explicit =git pull= on =~/code/rulesets/= before the rsync — otherwise the templates will be stale. -4. Replace =~/projects/claude-templates/= with a symlink to =~/code/rulesets/claude-templates/= for transition continuity. -5. After every active project has had one session start (and rsync'd the new =startup.org=), drop the symlink and archive =cjennings.net:git/claude-templates.git=. - -**** Bootstrap gap - -Every project on the machine has a =.ai/workflows/startup.org= that rsyncs from =~/projects/claude-templates/=. Until each project's startup.org gets refreshed (which happens via the rsync itself), the old path needs to keep resolving. The symlink at step 4 is the bridge: old paths resolve into the new location, the rsync delivers the updated startup.org, next session uses the new path directly. - -*** DONE [#A] Add =make audit= — drift detector across all =.ai/=-using projects -CLOSED: [2026-05-15 Fri] - -Companion to =make doctor= (single-machine scope, checks =~/.claude/=). =audit= is cross-project scope: walks every directory on the machine that has a =.ai/=, diffs the synced template files against the canonical source, and reports drift. =--apply= flag rsyncs the drift into the project's working tree (no auto-commit). Catches stale projects without forcing a session start in each one. - -**** Open design choices - -- *Scope.* Template-sync drift is the useful flavor: for each project, diff =.ai/protocols.org=, =.ai/workflows/=, =.ai/scripts/= against the canonical source. -- *Source path.* Post-fold: =~/code/rulesets/claude-templates/.ai/=. Build =audit= against the new path from day one. -- *Project discovery.* Walk =~/code/=, =~/projects/=, =~/.emacs.d/= up to depth 3 for any directory containing =.ai/=. Skip the canonical source itself. -- *Default mode is report-only.* =--apply= triggers rsync; =--force= overrides the dirty-skip safety. - -**** Per-project flow (designed 2026-05-15) - -For each discovered project, in order: - -1. Verify =.ai/= exists (path probe). If missing → =FAIL=, skip, continue loop. -2. Detect git tracking via =git check-ignore .ai/= → =tracked= or =gitignored=. -3. Verify no uncommitted =.ai/= changes (=git status --porcelain .ai/=). Dirty → =WARN=, skip rsync unless =--force=. -4. Verify content matches canonical via three =rsync -a --dry-run --itemize-changes= calls (=protocols.org=, =workflows/=, =scripts/=). Zero items = clean. -5. Action (=--apply= only, drift detected): three =rsync -a [--delete]= calls. -6. Verify rsync converged (re-run the dry-runs; zero now). -7. Verify working-tree state after rsync (tracked projects). Report deltas. Do not auto-commit. -8. Verify no unpushed =.ai/= commits (=git log @{u}..HEAD -- .ai/=). Informational only. - -**** Output format (mirrors =doctor=) - -#+begin_example -Claude-templates source: - ok rulesets/claude-templates is current (origin/main) - -Per-project .ai/ drift: - ok ~/projects/work - applied ~/projects/homelab 3 files changed - skipped ~/code/winvm uncommitted .ai/ (use --force) - ok ~/projects/clipper - -Summary: 18 ok, 3 applied, 1 skipped, 0 failed -#+end_example - -Exit code: =0= if all clean, no skips, no failures. =1= otherwise. - -**** Why not extend =make doctor= instead - -=doctor= has a clean meaning today: "is this machine's =~/.claude/= consistent with rulesets?" Mixing in cross-project =.ai/= drift muddies the exit code. Keep them separate. =audit= can optionally invoke =doctor= as its last check since both ask "did the symlinks keep up with the source?". A future =make all-checks= can wrap both. - -*** DONE [#A] Add =make install-ai PROJECT=<path>= — bootstrap =.ai/= in a fresh project -CLOSED: [2026-05-15 Fri] - -Separate target from =audit= because operating on projects that lack =.ai/= is a distinct action. The absence might be intentional, so =audit= skips them. Bootstrap is explicit opt-in. - -**** Flow - -1. Refuse if =.ai/= already exists in =PROJECT=. Message: "already installed; use =make audit --apply= to update." -2. Verify =PROJECT= is a git checkout (warn if not — works without git, loses some lifecycle benefits). -3. Create =PROJECT/.ai/= directory. -4. Rsync canonical content: =protocols.org=, =workflows/=, =scripts/= (same three rsyncs as =audit=). -5. Seed =PROJECT/.ai/notes.org= from a canonical template with project-name placeholder. -6. Create empty =PROJECT/.ai/sessions/= (with =.gitkeep= for tracked projects). -7. Track or gitignore =.ai/=? Default: ask. Flag: =--track= / =--gitignore=. -8. Print next-steps banner: =make install-lang LANG=<lang> PROJECT=<path>=; open Claude Code in the project. - -**** Symmetry with existing install targets - -#+begin_example -make install-lang LANG=python PROJECT=/path # language bundle (existing) -make install-ai PROJECT=/path # .ai/ template (new) -make install-lang # no args → fzf-pick -make install-ai # no args → fzf-pick from - # ~/projects/* + ~/code/* dirs - # without an existing .ai/ -#+end_example - -*** DONE [#A] Test plan for audit + install-ai before propagating to ratio -CLOSED: [2026-05-15 Fri] - -Test against the current state of this machine before pushing changes to ratio. - -**** =make audit= tests - -1. Dry-run report only (no =--apply=). Should show: claude-templates current; per-project drift; correct =ok=/=drift= classifications; summary line and exit code match. -2. After the fold lands, every project should be reported as drift (their =startup.org= still points at the old path). Run =--apply= → rsync converges. Re-run audit → all =ok=. -3. Manually edit one =.ai/workflows/foo.org= in a tracked project. Re-run audit → should report =skipped: uncommitted .ai/=. Run =--apply --force= → rsync clobbers the edit. Verify the edit is gone. -4. Manually delete one =.ai/= dir. Re-run audit → =FAIL: .ai/ missing=. Loop continues. -5. Idempotency: =--apply= twice in a row converges to all =ok= on the second pass. - -**** =make install-ai= tests - -1. Create =/tmp/test-fresh-project= as a git repo. Run =make install-ai PROJECT=/tmp/test-fresh-project=. Verify =.ai/= structure matches canonical, =notes.org= has placeholder, =sessions/= exists. -2. Run =make install-ai PROJECT=/tmp/test-fresh-project= again → should refuse (=.ai/= already exists). -3. Open Claude Code in the new project. Startup workflow runs cleanly (Phase A.0 + Phase A rsync should be a no-op since the install just ran). -4. fzf form: =make install-ai= with no args. Lists candidate dirs (=~/projects/*=, =~/code/*= without =.ai/=). - -**** Pass criteria - -- =audit= behavior matches the per-project flow spec for every classification path. -- =install-ai= produces a project indistinguishable from one that's been running sessions for a while. -- =make doctor= still passes 36/0/0 after all the work. -- =make test= (pytest + ERT) passes. - -*** DONE [#A] Migrate projects on ratio (second machine) -CLOSED: [2026-05-15 Fri] - -After local fold + audit + install-ai are working, propagate to ratio. - -**** Steps - -1. On ratio: =git -C ~/code/rulesets pull= — picks up the folded =claude-templates/= subdir and updated =Makefile= targets. -2. On ratio: archive or =mv= the standalone =~/projects/claude-templates/= aside, replace with symlink to =~/code/rulesets/claude-templates/= (same bridge mechanic as local). -3. On ratio: =make audit= → see drift across ratio's projects. -4. On ratio: =make audit --apply= → rsync into each tracked/gitignored project. Surface projects with uncommitted =.ai/= drift for manual handling. -5. On ratio: =make doctor= → catch any =~/.claude/= install drift (likely some, since ratio hasn't seen recent rulesets updates). -6. Verify by opening Claude Code in a few ratio projects. Startup should be a no-op or near-zero rsync. - -**** Known unknowns - -- Ratio may have its own project list overlapping with this machine's but not identical. =audit= discovers projects via the walk, so this is automatic. -- Ratio might have uncommitted =.ai/= work in some projects that this machine doesn't. =audit= surfaces them; handle case-by-case. -- If anything goes wrong, ratio's archived =~/projects/claude-templates/= is the safety net — restore the symlink target and re-run audit. - -**** Adjacent: cross-machine memory sync - -The =[#A] DOING= memory-sync investigation (todo.org:10) is adjacent. Both involve "make my Claude setup portable across machines." Coordinate so the memory-sync stow approach (if approved) doesn't conflict with this fold's symlink mechanics. -** DONE [#B] Document startup pull-ordering rule in protocols.org -CLOSED: [2026-05-15 Fri] - -Phase A.0 of =startup.org= now pulls rulesets ff-only before the project repo -(shipped 2026-05-15 as part of the claude-templates fold — after the subtree -merge, there's no separate claude-templates pull, just rulesets-then-project). -The protocols.org paragraph stating the ordering and "resolve any issues -before proceeding" rule shipped 2026-05-15 in the =** Startup Pull Ordering= -subsection under =IMPORTANT - MUST DO=. -** DONE [#A] Build =/lint-org= skill + wrap-up integration -CLOSED: [2026-05-14 Thu] - -Spec: [[file:.ai/specs/lint-org-skill-spec.md]] - -A two-mode skill (=interactive=, =mechanical-only=) that runs =org-lint=, -auto-fixes safe categories (item-number, missing-language-in-src-block, -misplaced-planning-info, markdown-bold → single-asterisk), and walks judgment -items (broken local-file links, invalid fuzzy links, verbatim-asterisk false -positives, suspicious-language blocks) inline. - -Wrap-up integration: =wrap-it-up.org= invokes -=/lint-org todo.org --mode=mechanical-only= after the existing -=todo-cleanup.el --archive-done= pass. Judgment items defer to a -carry-forward file that the next morning's daily-prep merges in, so -wrap-up never blocks on a judgment call. - -Baseline that motivated this: the 2026-05-14 manual pass took =todo.org= -from 55 → 1 lint warnings across two commits (=0d10458= signal, -=9ad5b30= cosmetic). A nightly mechanical sweep keeps the count near -zero forever — each day's drift is small. -** DONE [#C] Test harness for =make audit= + =make install-ai= edge cases :test: -CLOSED: [2026-05-15 Fri] - -Three edge cases from the fold-epic test plan were not exercised because they're destructive on real projects: - -- =audit --force= clobbers uncommitted =.ai/= work — needs a project with intentionally dirty =.ai/= to verify the override path. -- =audit= reports =FAIL= when =.ai/= is missing — needs a project where the directory was deleted to verify the loop continues past the failure. -- =install-ai= fzf-pick form (no =PROJECT= arg) — needs interactive testing. - -Build a self-contained test harness under =.ai/scripts/tests/= that spins up =/tmp/audit-test-projects/= with a known matrix of project states (clean, dirty, missing =.ai/=, pristine, etc.), runs the audit + install-ai targets against it, and asserts expected outputs. The harness should clean up after itself. - -Pattern reference: bats or shell-based assertions (similar to the elisp ERT suites for =todo-cleanup= and =lint-org=, but for shell scripts). - -Triggered by: 2026-05-15 fold-epic, child 4 test plan; commits =94782ee= (audit) + =d364cf2= (install-ai). -** DONE [#A] wrap it up mentions github, which isn't the remote for many projects. :chore: -CLOSED: [2026-05-16 Sat] -For many of them, git.cjennings.net mirrors to github.com, and github.com isn't the remote. -For many others, git.cjennings.net is the remote with no mirror. -Remove or replace the reference to github.com -** DONE [#B] Phase A startup blind to =claude-templates/inbox/= post-fold :bug:fold: -CLOSED: [2026-05-19 Tue] - -Resolved on inspection: the bug is moot in current state. =inbox-send.py='s discovery scans =~/code/*= and =~/projects/*= single-level only, so =claude-templates/= (two levels under =~/code/=) is never a routable target; the 2026-05-15 incident was a one-time manual workaround because =rulesets/inbox/= didn't exist yet, and that root inbox was added in =470085f=. =claude-templates/inbox/= was removed 2026-05-15 and is no longer on disk. - -Phase A's inbox check at =startup.org:107= runs =\ls -la inbox/= against the project root. Post-fold, the canonical's inbox sits inside the subtree at =claude-templates/inbox/= and never gets scanned. A 2026-05-15 cross-project handoff from a dotemacs session dropped a record there; the next rulesets session (this one) missed it at startup entirely. Picked up only when the working-tree drift surfaced during the publish flow. - -Fix: extend Phase A's discovery to also scan =claude-templates/inbox/= when the canonical lives in-repo (i.e., when =claude-templates/.ai/= exists alongside =./.ai/=). The Phase B/C inbox-processing flow already handles per-file routing once a file is surfaced; the gap is only in discovery. - -Adjacent question worth answering at the same time: should cross-project handoffs file into =./inbox/= at the project root (matching what Phase A already scans), or stay in =claude-templates/inbox/= and rely on the discovery fix? The =inbox-send= script's target-project logic is the place to settle that. - -Triggered by: 2026-05-15 evening session, surfaced when committing the test-harness work. -** DONE [#A] Implement task-review daily-habit per spec -CLOSED: [2026-05-20 Wed] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-20 -:END: -Spec: [[file:docs/design/task-review.org]] - -Retires =wrap-it-up.org='s date-coverage scan and replaces it with a daily list-hygiene review (N=7 oldest-unreviewed top-level =[#A]= / =[#B]= / =[#C]= tasks per session, ~12-day rotation). Built as a pure Claude workflow — Shape B, no elisp; see the spec's Revision section for why the elisp approach was dropped. - -Status: -1. [X] =task-review-staleness.sh= + bats (count + =--list= modes). -2. [X] =wrap-it-up.org= health check (threshold 30). -3. [-] =task-review.el= — dropped (Shape B is a pure workflow, not an Emacs mode). -4. [X] New =task-review.org= workflow + INDEX entry (the existing listing workflow was renamed to =open-tasks.org= to free the name). -5. [X] Startup nudge in template =startup.org= (threshold 7), not the project-only startup-extras layer. -6. [X] Smoke test against live =todo.org= — first cycle run 2026-05-20 (7 tasks reviewed: 3 re-grades, 1 cancellation, 1 bump-and-tag). - -Triggered by: 2026-05-16 brainstorm on retiring the date-coverage scan. -** CANCELLED [#B] Build =ov-1= skill for DoDAF OV-1 (High-Level Operational Concept Graphic) -CLOSED: [2026-05-20 Wed] - -Cancelled during the 2026-05-20 task review. - -Triggered by SOFWeek (May 2026, Tampa) — DeepSat attending; DoD attendees -may ask for architecture diagrams. OV-1 is the universal informal -currency in DoD briefings ("show me the architecture" → OV-1 by default). - -Priority upgrades to =[#A]= if Craig confirms scenario 2 below (personal -load-bearing need at the event); stays =[#B]= or drops to =[#C]= if -scenario 1 (team already covers it, future asset only). - -*** Prior art (searched 2026-04-19) - -No existing Claude Code skill exists for DoDAF / OV-1 / SV-1 / SysML. - -- =anthropics/skills= — 17 skills, zero DoDAF/SysML/defense coverage. -- =awesome-claude-code= list — zero hits for DoDAF/OV-1/SysML/UAF. -- =mfsgr/sysml2dodaf= — empty repo (0 stars, no code). Vapor. -- =HowardKao-1130/mini-NEXEN= — broad SE methodology skill that - name-drops DoDAF as a trigger keyword; no artifact generation. 0 stars. -- =gaphor/gaphor= (Apache-2.0, 2.2k stars) — mature UML/SysML GUI - modeler. Not a skill; not a pipeline. Useful reference only. - -Nearest prior art to lean on when building: -- DoDAF 2.02 Viewpoints & Models reference (dodcio.defense.gov) — - canonical OV-1 exemplars. Embed 3-5 layouts as skill =references/=. -- Pattern from existing =c4-diagram= skill — same shape (prose → diagram - spec), swap the viewpoint vocabulary to DoDAF. -- PlantUML for SV-1 (when that skill comes later); Mermaid or draw.io - XML for OV-1 lightweight visuals. - -*** Build scope (when triggered) - -*In scope:* -- Input: prose description of a system + its operational context. -- Output: structured OV-1 *spec* — performers, external actors (other - systems, forces, adversaries), relationships (data/control flows), - narrative captions, classification marking, legend requirements. -- DoDAF 2.02 completeness checklist as a quality gate — verify the - produced spec contains every element a correct OV-1 requires. -- Optional lightweight visual: draw.io XML or Mermaid approximation for - quick review; NOT a finished rendering. - -*Out of scope:* -- Icon libraries, pictorial assets, finished PowerPoint export. OV-1 - final art belongs to a designer or Craig in Visio/PowerPoint; the - skill's job is the spec and the check, not the slide. -- SV-1, SV-2, UAF, IDEF1X, other viewpoints. Build only when a - concrete need triggers each. - -Estimate: 4-6 hours. - -*** Craig's investigation before kickoff - -1. Does DeepSat's systems-engineering or marketing team already have an - OV-1 (or the equivalent briefing artifact) for SOFWeek? -2. If yes (scenario 1) — skill is a future asset, not event-load-bearing. - Ship after SOFWeek. Priority drops to =[#C]=. -3. If no, or if the scenario is "Craig may need to produce/iterate an - OV-1 on the fly during the event" (scenario 2) — skill is load-bearing - for the event. Priority upgrades to =[#A]=; build before SOFWeek. -4. Confirm the classification level the skill needs to handle - (unclassified-only? or FOUO markings? affects the classification - block in the spec). -5. Confirm the target rendering format DeepSat uses for OV-1 - deliverables (PowerPoint slide? Cameo? Visio? affects whether the - skill emits draw.io XML vs Mermaid vs pure structured spec). - -*** Related - -See also the DoD-specific notations section under the later TODO -(=c4-*= rename revisit) — OV-1 is flagged there as the highest-value -starting point across the DoD notation landscape (SysML, DoDAF/UAF, -IDEF1X). This entry is the execution plan for that starting point. -** DONE [#A] Split team-specific publishing rules out of commits.md :commits: -CLOSED: [2026-05-22 Fri] -Shipped 3cb467e. Moved the DeepSat publishing steps (Linear ticket-state, the Slack notification protocol + channel ID, the GHE host, the team merge norm, the Linear ticket-body structure) out of the global =claude-rules/commits.md= into =teams/deepsat/claude/rules/publishing.md=. The global file keeps the universal skeleton and uses seams ("run the project's publishing overlay here if present") like startup-extras. Added =install-team= (targeted per-project copy, keyed on PROJECT, never globally symlinked) and generalized =sync-language-bundle.sh= to keep team overlays fresh at startup (3 new bats; make test green). - -Remaining deploy step (cross-project, surfaced to Craig): install the overlay into the DeepSat work project — =make install-team TEAM=deepsat PROJECT=<deepsat-path>= — so it actually loads there. -** DONE [#A] Define a /voice-unavailable fallback in the commits.md publish flow :commits: -CLOSED: [2026-05-22 Fri] -Added an "If =/voice= is unavailable" paragraph to the Single-skill gate in =commits.md=: walk the same patterns inline (the flow already names which matter), state the skill was unavailable and the pass was applied by hand ("/voice unavailable — patterns walked inline"), and flag the missing skill for install. The gate is the pattern walk, not the tooling. The original "=humanizer= unavailable" framing was moot (humanizer → /voice). -** DONE [#A] wrap-it-up Step 3.5 assumes GitHub-family remote :chore:quick: -CLOSED: [2026-05-22 Fri] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-20 -:END: -Documented the assumption inline at =wrap-it-up.org= Step 3.5 (chose the lightweight path over a provider-agnostic rewrite): the =gh= lookup expects a GitHub-family host, holds today via DeepSat on GHE, flagged for update if a future Linear project lands on GitLab/Gitea/Bitbucket. -Triggered by: 2026-05-16 wrap-it-up github.com cleanup (audit of the same file). - -Step 3.5 (Linear ticket-state hygiene) at =wrap-it-up.org:207= says "the project's GitHub remote — use =gh pr list ...=". Currently fine in practice: the step is Linear-gated, and the only Linear-using project is DeepSat (on =deepsat.ghe.com=, a GitHub-family host where =gh= works). Would break if a future Linear-using project lived on a non-GitHub host (gitlab, gitea, bitbucket). Either drop the GitHub-family assumption (provider-agnostic lookup, harder) or document the assumption explicitly so future projects know the step needs an update if they don't fit. -** DONE [#C] Review pass: tighten skills and rulesets after 2026-05-04 audit -CLOSED: [2026-05-22 Fri] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-20 -:END: -All 55 grouped-index items dispositioned (2026-05-22): ~49 edited across skills, commands, rule files, hooks, and the two playwright skills; several came out moot post-audit (humanizer→voice, skills→commands, typescript ruleset added); the two commits.md items shipped as the team-overlay split + /voice fallback. Freshness-checked each item against current reality before editing. - -Source notes used in this pass: -- C4 official docs: C4 is notation-independent; System Context and Container - diagrams are enough for most teams; every diagram needs title, key/legend, - explicit element types, and audience-appropriate abstraction. - [[https://c4model.com/diagrams][C4 diagrams]], - [[https://c4model.com/diagrams/notation][C4 notation]], - [[https://c4model.com/abstractions/component][C4 component]] -- arc42 docs: quality requirements need measurable scenarios; section 10 - should reference top quality goals and capture lesser quality requirements - with specific measures. [[https://docs.arc42.org/section-10/][arc42 section 10]], - [[https://quality.arc42.org/articles/specify-quality-requirements][specifying quality requirements]] -- ADR references: ADRs capture one justified architecturally significant - decision and its rationale; Nygard's original guidance emphasizes short, - numbered, repository-stored records and superseding rather than rewriting old - decisions. [[https://adr.github.io/][adr.github.io]], - [[https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions][Nygard ADR article]] -- Playwright docs: prefer user-visible locators and web assertions; locators - auto-wait and retry; =networkidle= is discouraged for testing readiness. - [[https://playwright.dev/docs/best-practices][Playwright best practices]], - [[https://playwright.dev/docs/locators][Playwright locators]], - [[https://playwright.dev/docs/next/api/class-page][Playwright page API]] -- OWASP references: Top 10 2021 includes Broken Access Control, - Cryptographic Failures, Injection, Insecure Design, Security - Misconfiguration, Vulnerable and Outdated Components, Identification and - Authentication Failures, Software and Data Integrity Failures, Security - Logging and Monitoring Failures, and SSRF; WSTG adds a broader testing map - across configuration, identity, authn/z, sessions, input validation, error - handling, cryptography, business logic, client-side, and API testing. - [[https://owasp.org/Top10/2021/][OWASP Top 10 2021]], - [[https://owasp.org/www-project-web-security-testing-guide/latest/4-Web_Application_Security_Testing/][OWASP WSTG]] -- V2MOM references: Salesforce calls the last M "Measures" and emphasizes a - simple alignment document with prioritized Methods, explicit Obstacles, and - measurable outcomes. [[https://trailhead.salesforce.com/content/learn/modules/selfmotivation/get-focused-with-your-personal-v2mom][Salesforce Trailhead personal V2MOM]], - [[https://www.salesforce.com/blog/?p=12][Salesforce V2MOM alignment]] -- Prompt research: the cited Meincke paper is titled "Call Me A Jerk: - Persuading AI to Comply with Objectionable Requests"; its scope is - persuasion increasing compliance with objectionable requests, not a general - proof that persuasion framing improves prompt quality. - [[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5357179][SSRN paper]] -- Combinatorial testing references: NIST supports t-way combinatorial testing - and notes pairwise is one covering strength, with higher-strength arrays - useful for failures requiring more interacting factors. - [[https://www.nist.gov/publications/practical-combinatorial-testing-beyond-pairwise][NIST beyond pairwise]], - [[https://www.nist.gov/publications/combinatorial-software-testing][NIST combinatorial testing]] - -*** Grouped index (for batching by area) - -Each item below is a one-line summary of a sub-TODO further down. Tick the box when the matching sub-TODO is moved to =DONE=. Items are grouped by area so they can be batched (e.g., "do all Playwright items in one session"). - -**** Browser testing -- [X] [#A] =playwright-js=: locator/assertion-first guidance (replace raw CSS, =networkidle=) -- [X] [#B] =playwright-js= + =playwright-py=: reconcile headless/visible defaults -- [X] [#B] =playwright-js= + =playwright-py=: remove emoji console markers from examples - -**** Frontend / UI -- [X] [#B] =frontend-design=: WCAG 2.2 alignment, accessibility non-optional -- [X] [#B] =frontend-design=: harmonize aesthetic guidance with anti-pattern rules - -**** Security -- [X] [#A] =security-check=: OWASP 2021 + WSTG coverage -- [X] [#B] =security-check=: tooling and offline/network caveats - -**** Combinatorial testing -- [X] [#B] =pairwise-tests=: t-way escalation guidance beyond pairwise -- [X] [#B] =pairwise-tests=: clarify negative value syntax + generator availability - -**** V2MOM -- [X] [#A] =create-v2mom=: rename Metrics → Measures (Salesforce alignment) -- [X] [#B] =create-v2mom=: prevent task migration from turning V2MOM into a backlog -- [X] [#B] =create-v2mom=: mitigation/owner fields for Obstacles - -**** Prompt engineering -- [X] [#A] =prompt-engineering=: correct/narrow Meincke citation -- [X] [#B] =prompt-engineering=: eval-harness requirement for production prompts - -**** Codify -- [X] [#B] =codify=: stale-entry review + privacy checks before writing project =CLAUDE.md= - -**** Code review -- [X] [#A] =review-code=: resolve local-verification vs CI boundary -- [X] [#B] =review-code=: =CLAUDE.md= citation scope for public artifacts -- [X] [#B] =review-code=: relax three-strengths rule for tiny/failing diffs - -**** PR / review responses -- [X] [#A] =respond-to-review=: remove review-process language from commit messages -- [X] [#B] =respond-to-review=: use unresolved threads + resolution state -- [X] [#B] =respond-to-cj-comments=: drop personal absolute paths from public-writing (moot — already clean) -- [X] [#B] =respond-to-cj-comments=: fallback when =humanizer= or =emacsclient= unavailable (moot — superseded by /voice + VERIFY pattern) - -**** Branch workflow -- [X] [#A] =finish-branch=: fix base-branch detection -- [X] [#B] =finish-branch=: worktree-aware pull/merge safety -- [X] [#B] =start-work=: tool-availability + ceremony-scaling rules -- [X] [#B] =start-work=: claim-before-justify rollback risk - -**** Tests / TDD -- [X] [#B] =add-tests=: fix missing =typescript-testing.md= reference or add ruleset (moot — ruleset now exists) -- [X] [#B] =add-tests=: explicit exceptions to "all three categories per function" - -**** Debugging / RCA -- [X] [#B] =debug=: capture environment + recent-change context before hypotheses -- [X] [#B] =root-cause-trace=: constrain defense-in-depth to trust boundaries -- [X] [#B] =five-whys=: require evidence + counterfactual validation per why - -**** Brainstorming -- [X] [#B] =brainstorm=: timebox + research/source rules for high-stakes designs - -**** Architecture -- [X] [#B] =arch-decide=: timeless examples, drop unverifiable claims -- [X] [#B] =arch-decide=: standardize statuses + immutability language -- [X] [#B] =arch-design=: threat modeling + privacy/compliance as first-class inputs -- [X] [#B] =arch-design=: separate paradigms from tactical patterns -- [X] [#B] =arch-document=: arc42/Q42 quality scenarios -- [X] [#B] =arch-document=: staleness + ownership metadata for generated docs -- [X] [#B] =arch-evaluate=: confidence levels for framework-agnostic findings -- [X] [#B] =arch-evaluate=: report skipped tool checks explicitly - -**** C4 modeling -- [X] [#A] =c4-analyze= + =c4-diagram=: notation/output fallback (not draw.io-only) -- [X] [#B] =c4-analyze= + =c4-diagram=: clarify abstraction boundaries - -**** Global rules -- [X] [#B] =commits.md=: split DeepSat/Linear/Slack-specific from global rules → promoted to a top-level task (deferred for Craig) -- [X] [#A] =commits.md= + publish flows: =humanizer=-unavailable fallback → promoted to a top-level task (deferred; humanizer premise moot) -- [X] [#B] =verification.md=: explicit "unable to verify" reporting standard -- [X] [#B] =testing.md=: property-based + mutation testing as escalation paths -- [X] [#B] =testing.md=: soften absolute TDD with explicit spike protocol -- [X] [#B] =subagents.md=: capability/availability + cost checks - -**** Languages -- [X] [#A] =python-testing.md=: revisit in-memory SQLite guidance -- [X] [#B] =python-testing.md=: separate "never mock ORM" from unit-test boundaries -- [X] [#B] =elisp.md=: drop tool-specific advice -- [X] [#B] =elisp-testing.md=: batch-mode + native-comp caveats - -**** Hooks -- [X] [#A] =hooks/README.md=: include =destructive-bash-confirm.py= in install/settings snippets -- [X] [#A] =hooks/git-commit-confirm.py= + =gh-pr-create-confirm.py=: inspect message/body files referenced by =-F= / =--body-file= -- [X] [#B] =hooks/destructive-bash-confirm.py=: shell-aware command parsing (not regex) - -*** 2026-05-22 Fri @ 15:47:10 -0500 Made playwright guidance locator/assertion-first, dropped networkidle-as-readiness - -Rewrote the readiness guidance in both =playwright-js/SKILL.md= and =playwright-py/SKILL.md=: reconnaissance now waits for a visible app landmark via a web assertion or locator (=expect(...).toBeVisible()= / =get_by_role(...).wait_for()=), not =networkidle= (which Playwright discourages). Updated the login/form examples to =getByLabel=/=getByRole= + web assertions, the API_REFERENCE.md waiting section, and =lib/helpers.js= defaults (=waitForPageReady= now defaults to =load= and prefers a caller-supplied landmark; =authenticate= races the success indicator over a =load= navigation). node --check passes. - -*** 2026-05-22 Fri @ 14:23:02 -0500 Added headed/headless decision tables to both playwright skills - -Added matching purpose-based decision tables to =playwright-js/SKILL.md= (was "always visible") and =playwright-py/SKILL.md= Best Practices (was "always headless"). Each names its own default and points at the other skill, so the difference is deliberate, not a habit-flip: headed for interactive debugging, headless for CI/pytest. Also softened the absolutist "Always launch... headless" comment in the py example. - -*** 2026-05-22 Fri @ 15:47:10 -0500 Removed emoji console markers from the playwright skills - -Replaced every emoji status marker with a plain ASCII prefix across =playwright-js/= (run.js, lib/helpers.js, SKILL.md) and =playwright-py/= (SKILL.md, examples/*.py): 📦/⚡/📄/📥/🎭/🚀/📋/✅/❌/🔍/📸/✓/✗ → =[setup]=/=[run]=/=[ok]=/=[error]=/=[fail]= etc. Post-change emoji grep is clean (excluding node_modules); node --check and py_compile pass. - -*** 2026-05-22 Fri @ 14:35:16 -0500 Made accessibility a non-optional WCAG 2.2 gate in frontend-design - -Added an "Accessibility Gate (required before handoff)" section to =frontend-design/SKILL.md= covering keyboard operation, focus visibility, focus-not-obscured (2.2), target size (2.2), contrast, reduced motion, labels, and semantic structure — a baseline for all frontend work, not just interactive components. Rewrote the Build/Review phases to build accessibly as you go and clear the gate before handoff, and bumped =references/accessibility.md= from WCAG 2.1 to 2.2 with backing detail for the new criteria. - -*** 2026-05-22 Fri @ 14:35:16 -0500 Added a "creative but bounded" section to frontend-design - -Added a subsection under Frontend Aesthetics framing the bold/maximalist directions as tools, not obligations: domain fit, readability first, responsive stability, and no decorative effect that degrades the workflow. Reconciles rather than contradicts the maximalist encouragement (maximalism stays on the table as deliberate usable density), and ties the readability bullet to the new accessibility gate. - -*** 2026-05-22 Fri @ 14:35:16 -0500 Updated security-check to OWASP Top 10 2021 + WSTG mapping - -Replaced the older six-category list in =.claude/commands/security-check.md= with the full Top 10 2021 set, each finding mapped to a 2021 category or WSTG area. Added the four missing categories (Insecure Design, Software and Data Integrity Failures, Security Logging and Monitoring Failures, SSRF) plus explicit checks for object/function-level authorization, SSRF on URL-fetch paths, update/plugin/dependency integrity, and logging/monitoring gaps. - -*** 2026-05-22 Fri @ 14:35:16 -0500 Added scanner tooling + network caveats to security-check - -Added an optional configured-scanners step (=gitleaks=/=trufflehog= secrets, =semgrep= source patterns, OSV scanner, lockfile-diff review) that supplements the manual scans, plus a network caveat: dependency audits that can't run (offline, tool absent, DB unreachable) must report "not run" naming the tool and reason, never read as a pass. Carried that into the no-issues summary. - -*** 2026-05-22 Fri @ 14:35:16 -0500 Added t-way escalation guidance to pairwise-tests - -Added an "Escalating Beyond Pairwise (t-way)" subsection: start with pairwise across the whole space, then escalate specific high-risk clusters to 3-way+ when history, safety, security, or domain coupling says a fault needs more than two interacting factors. Lists escalation triggers and shows the sub-model order syntax (={ A, B, C } @ 3=) vs a blanket =/o:3= bump, stressing targeted not uniform escalation. Cites NIST combinatorial-testing work. - -*** 2026-05-22 Fri @ 14:35:16 -0500 Clarified PICT ~ syntax + honest generator-availability path in pairwise-tests - -Added a "~ prefix" explanation (PICT marker tagging a value as negative/invalid, not an arithmetic operator; PICT pairs negatives with valid values once and strips the marker before the SUT) and a stop-at-the-model rule: if neither the =pict= binary nor =pypict= is present, produce the model and stop rather than hand-writing a table and passing it off as PICT output. - -*** 2026-05-22 Fri @ 14:43:17 -0500 Renamed Metrics → Measures throughout create-v2mom - -Full rename across =.claude/commands/create-v2mom.md= (acronym expansions, Phase 7 heading, the "Measures must be measurable" principle, exit criteria, review questions, red flags, examples) to match Salesforce's official term. Kept the "vanity metrics" idiom intact — it's the anti-pattern term, not a section reference. - -*** 2026-05-22 Fri @ 14:43:17 -0500 Split strategy from execution in create-v2mom task migration - -Rewrote Phase 8 (and tightened Phase 5.5): tasks stay in the backlog grouped by method, and each method gains a one-line link to where its tasks live, instead of transplanting the task tree into the V2MOM. Strategy (V2MOM) and execution (backlog) are now explicitly separate sources of truth, keeping the V2MOM concise. - -*** 2026-05-22 Fri @ 14:43:17 -0500 Made create-v2mom obstacles operational (mitigation/owner/cadence) - -Phase 6 now captures, per obstacle: name, manifestation, stakes, mitigation, owner, and review cadence — with a worked example per domain (health/finance/software), a "good obstacle" characteristic, a Phase 9 review question, and a red flag for candid-but-not-operational obstacles. An obstacle without a countermove is now flagged as an observation, not a plan. - -*** 2026-05-22 Fri @ 14:43:17 -0500 Corrected and narrowed the Meincke citation in prompt-engineering - -Fixed the title to "Call Me A Jerk: Persuading AI to Comply with Objectionable Requests" (SSRN abstract_id=5357179) in all three spots (frontmatter, Seven Principles intro, References). Reframed the ~33%→72% result as what it is — a prompt-safety caution that persuasion raises compliance with objectionable requests — explicitly not evidence that persuasion framing improves engineering prompt quality. Kept the seven principles as a tone vocabulary. - -*** 2026-05-22 Fri @ 14:43:17 -0500 Added an eval-harness requirement to prompt-engineering critique mode - -Added critique step 7 + a checklist line: for fragile or reusable/production prompts, write 3-5 adversarial/edge inputs, run both the old and new prompt against each, and record the behavioral delta. A throwaway prompt can ship on the rewrite alone; a discipline/reused/production one can't. Without examples, "the rewrite is better" is an assertion, not a result. - -*** 2026-05-22 Fri @ 14:43:17 -0500 Added mandatory stale-entry + privacy pre-write checks to codify - -Added a "Mandatory pre-write checks" block at the top of Phase 3 (Write) in =.claude/commands/codify.md=: a stale-entry scan (update/remove no-longer-true entries in place, don't append contradictions around them) and a privacy/leak check carrying both questions verbatim — "safe if the project were public?" and "belongs in private memory instead?" — routing private content to auto-memory. Gates, not background guidance. - -*** 2026-05-22 Fri @ 14:06:41 -0500 Scoped review-code's CI-trust rule to reviewing, not shipping - -Expanded the False-Positive Filter bullet in =review-code/SKILL.md=: "trust CI, don't run builds" applies to reading a diff, not producing one. A pre-commit/pre-push flow still owes the local verification =verification.md= requires (run the suite or state "not run because..."). Closes the apparent contradiction with =verification.md= / =finish-branch=. - -*** 2026-05-22 Fri @ 14:06:41 -0500 Added private-vs-public CLAUDE.md citation modes to review-code - -Expanded the Content scope section in =review-code/SKILL.md= with two modes: a private/internal review cites =CLAUDE.md= directly; a public/team review translates the rule into the engineering reason it encodes and doesn't name the rules file (a teammate can act on the reason, not on a file they can't reach). Same principle =commits.md= states for personal tooling in public artifacts. - -*** 2026-05-22 Fri @ 13:48:14 -0500 Relaxed review-code "three strengths" to up-to-three-or-none - -Changed all three "three minimum" spots in =review-code/SKILL.md= (Strengths section, Critical Rules DO list, Anti-Patterns) to "up to three specific; say none found on a tiny or weak diff." Reframed the old "No Strengths section" anti-pattern as "Skipping strengths out of laziness" so a substantive diff still demands them while a weak one can honestly report nothing notable. Landed alongside Craig's adjacent edit telling reviewers not to explain why a strength is good (sycophantic padding). - -*** 2026-05-22 Fri @ 14:12:24 -0500 Removed review-process language from respond-to-review commit guidance - -Replaced the =fix: Address review — [description]= example (and the matching description-line phrasing) in =.claude/commands/respond-to-review.md= with "name the actual fix (=fix: validate export filename=), not the review that prompted it." Killed the non-ASCII dash and the process-in-commit pattern that conflicted with =commits.md=. - -*** 2026-05-22 Fri @ 14:12:24 -0500 Made respond-to-review fetch unresolved threads + resolve after verification - -Rewrote section 1 (Gather) in =.claude/commands/respond-to-review.md= to pull =reviewThreads= via =gh api graphql= with =isResolved=, skipping already-resolved threads so settled feedback isn't re-processed; top-level conversation comments still come from REST. Added a section-4 step: reply and resolve a thread only after the fix is verified, never before. - -*** 2026-05-22 Fri @ 14:12:24 -0500 Verified respond-to-cj-comments no longer embeds an absolute path (moot) - -Already resolved by a prior migration: =grep= for =/home/= and =/Users/= in =.claude/commands/respond-to-cj-comments.md= returns nothing. The public-writing section refers to the rules by name, not by local path. No edit needed. - -*** 2026-05-22 Fri @ 14:12:24 -0500 Closed respond-to-cj-comments humanizer/emacsclient fallback (largely moot) - -Overtaken by two later changes: =/humanizer= was replaced by =/voice personal= (no =/humanizer= invocation remains), and the mandatory =emacsclient= summary-open was replaced by the in-place VERIFY-task pattern (workflow line ~262, Craig's 2026-05-12 standing instruction). Only a stale descriptive phrase remained — tidied "humanizer's signs of AI writing" to "the signs of AI writing." The original fresh-environment-fallback concern no longer applies as written. - -*** 2026-05-22 Fri @ 14:51:37 -0500 Fixed finish-branch base-branch detection - -Rewrote Phase 2: resolve the base *branch name* in priority order (open PR's =baseRefName=, then =git symbolic-ref --short refs/remotes/origin/HEAD= stripped, then ask), and compute the merge-base *SHA* separately only where a commit range is needed. Made the branch-name-vs-merge-base distinction explicit, since the old command returned a SHA where a branch name was needed. - -*** 2026-05-22 Fri @ 14:51:37 -0500 Made finish-branch merge safer + worktree-aware - -Added pre-flight checks to Option 1 (Merge Locally): dirty-tree refusal with no auto-stash, protected-branch awareness, upstream-gated =git pull --ff-only=, and merge-commit-vs-rebase as a team-policy choice instead of a hardcoded =--no-ff=. Replaced the fragile =git worktree list | grep <branch>= detection with a =git rev-parse --git-dir= vs =--git-common-dir= comparison plus =git worktree list --porcelain= for the path. - -*** 2026-05-22 Fri @ 14:51:37 -0500 Added tool-availability + ceremony-scale paths to start-work - -Added a "Tool availability" section (graceful degradation when Linear MCP / =gh= / =/voice= / Playwright are missing — do what's available, surface what isn't, don't block) and a "Ceremony scale" section (trivial / small / standard tiers so a two-line fix skips ticket+branch+gates unless asked). The =humanizer= reference in the original item is moot — the file already uses =/voice= throughout. - -*** 2026-05-22 Fri @ 14:51:37 -0500 Resolved start-work claim-before-justify rollback risk - -Split the claim by tracker type: personal todo.org claims defer to after the Justify gate (a killed task needs no rollback), while team trackers (Linear/GitHub) still claim first to signal intent but record prior state (status, assignee, label) so the Phase 2 rollback restores exactly it. Updated the per-tracker rollback steps and the matching anti-pattern. - -*** 2026-05-22 Fri @ 14:28:41 -0500 Verified add-tests typescript-testing.md reference resolves (moot) - -Resolved since the audit: =languages/typescript/claude/rules/typescript-testing.md= now exists, and =add-tests/SKILL.md:68= references it by bare filename, the same way it references =python-testing.md= (both get copied into a project's =.claude/rules/=). The "missing file" premise no longer holds. No edit needed. - -*** 2026-05-22 Fri @ 14:28:41 -0500 Added a category-exception protocol to add-tests - -Added an exception note to step 7 (proposal) in =add-tests/SKILL.md=: pure adapters, generated code, tiny pass-through wrappers, and framework glue may skip a category that would only re-test the framework, but the skip must be stated and justified in the plan and the behavior covered at integration/E2E level — never a silent omission. Step 12 (write) now points back to "honor documented category exceptions." - -*** 2026-05-22 Fri @ 14:25:37 -0500 Added environment + recent-change capture to debug Phase 1 - -Added a fourth Phase-1 step in =debug/SKILL.md=: record versions, feature-flag/config state, dataset/fixture, seed/clock, concurrency, and recent commits/config-infra changes. Noted that intermittent bugs usually live in environment/state transitions (and "what changed recently" is often the fastest route), while a deterministic local bug only needs a one-liner. Updated the phase's closing recap to include the context. - -*** 2026-05-22 Fri @ 14:25:37 -0500 Constrained root-cause-trace defense-in-depth to boundaries - -Rewrote step b in =root-cause-trace/SKILL.md=: instead of "add a check at each layer that could have caught it," add one only at a layer that owns a boundary or invariant — ingress/trust, persistence, invariant-owning service, final render. Added the explicit rule that a pass-through function owning neither shouldn't get a duplicate null check (validation spam). Recast the three example layers as the boundary types. - -*** 2026-05-22 Fri @ 14:25:37 -0500 Required evidence + counterfactual per why in five-whys - -Expanded step 2 in =five-whys/SKILL.md=: each link now owes an evidence field (a log/commit/metric/config you can point to) and a counterfactual check (remove this cause — does the symptom above plausibly not happen?). Framed the counterfactual as the main guard against monocausal storytelling, and updated the worked example to show both fields. - -*** 2026-05-22 Fri @ 15:51:59 -0500 Added timebox + fresh-sources rules to brainstorm - -Phase 1 gained a "Timebox the dialogue" rule (aim for the one-sentence restatement in ~5-8 questions, then move on and park the rest as open questions). Phase 2 gained "Ground high-stakes claims in fresh sources" (check load-bearing claims about markets/regulations/tools/vendors/APIs against a current source; mark unverified ones as assumptions). The design-doc skeleton gained an "## Assumptions" section that distinguishes researched facts (with source) from assumptions (to confirm before building). - -*** 2026-05-22 Fri @ 14:59:32 -0500 Made arch-decide examples timeless + required citations - -Dated the MongoDB multi-document-transaction example (scoped to 2024-01) with a backing reference, and added a "Cite, don't assert" Do: every concrete technical claim about a tool/version/platform carries a link, doc, version, or "checked YYYY-MM" date, or gets a domain-neutral placeholder — so unsourced "X can't do Y" doesn't rot into stale fact. - -*** 2026-05-22 Fri @ 14:59:32 -0500 Standardized arch-decide ADR statuses + immutability rule - -Declared a canonical five-status set (Proposed, Accepted, Rejected, Deprecated, Superseded) with an explicit "no synonyms" line, and spelled out the immutability rule in the Don'ts: an accepted ADR's body is frozen, only status/link metadata changes, a changed decision gets a new superseding ADR and the old one stays as the historical record. - -*** 2026-05-22 Fri @ 14:59:32 -0500 Added Trust/Data/Compliance phase to arch-design - -Added a new Phase 4 (Trust, Data, and Compliance) before the paradigm shortlist: trust boundaries, data classification, abuse/misuse cases, privacy constraints, compliance evidence, and operational ownership — surfaced early so the architecture is drawn around them, not retrofitted by a downstream =security-check=. Threaded into the workflow list, brief template (new §6), review checklist, and anti-patterns. - -*** 2026-05-22 Fri @ 14:59:32 -0500 Split paradigms from tactical patterns in arch-design - -Split Phase 5's single mixed table into Step 1 (pick one paradigm: monolith/microservices/layered/event-driven/serverless/pipeline/space-based) and Step 2 (compose tactical patterns: DDD, hexagonal, CQRS, event sourcing — several or none, often per-module), with composition examples and an anti-pattern against treating DDD/CQRS as alternatives to a paradigm. Recommendation + brief now name a paradigm plus composed patterns. - -*** 2026-05-22 Fri @ 14:59:32 -0500 Expanded arch-document quality scenarios to the Q42 six-part template - -Replaced §10's thin "Under [condition]..." template with the arc42/Q42 six-part structure (source, stimulus, environment, artifact, response, response measure), each glossed, with the cart-checkout example rewritten across all six parts. A one-line prose form stays acceptable once all six parts are recoverable. - -*** 2026-05-22 Fri @ 14:59:32 -0500 Added staleness/ownership metadata to arch-document output - -Added a per-section metadata block (owner, generated-against SHA + date, review cadence, "stale-when" conditions) as an HTML-comment header plus a visible Doc-status note, with field-fill guidance, and a whole-document Doc Status table replacing the README's "Last Updated" stub. Wired into the review checklist and an "Undated docs" anti-pattern. - -*** 2026-05-22 Fri @ 14:59:32 -0500 Added confidence levels to arch-evaluate findings - -Added a "Confidence and Provenance" subsection: every framework-agnostic finding carries High/Medium/Low + how it was determined, with a required "Not fully checked because..." note when scale, runtime imports, reflection, or dynamic dispatch cap certainty. Updated the example findings and review checklist; a finding with no note now asserts a full read. - -*** 2026-05-22 Fri @ 14:59:32 -0500 Made arch-evaluate report skipped tool checks explicitly - -Replaced "skip silently" with explicit reporting: for each detected language whose tool isn't configured or can't run, emit an Info "tool not configured / not run" finding (with an example) so the audit shows what was and wasn't verified. A check that didn't run no longer reads as a pass. Updated workflow step 4 and the review checklist. - -*** 2026-05-22 Fri @ 14:51:37 -0500 Added notation/output fallback to c4-analyze + c4-diagram - -Both commands now treat C4 as notation-independent: a "Choosing a notation" section (draw.io XML, Structurizr DSL, Mermaid with native C4 types, PlantUML/C4-PlantUML) and a headless fallback that emits a text notation (Mermaid or Structurizr DSL) and skips PNG-export/desktop-open when =drawio= or a GUI is absent, rather than failing. draw.io is now one option, not the only one. - -*** 2026-05-22 Fri @ 14:51:37 -0500 Clarified C4 abstraction boundaries in c4-analyze + c4-diagram - -Added an "Abstraction boundaries" section to both: a Container is a separately deployable/runnable unit (not synonymous with a Docker container — a SPA or managed DB counts), a Component lives inside one Container and isn't separately deployable. Added a 4e "Verify single abstraction level" check that walks every element and relationship to confirm it stays at the diagram's level, notation-independent. - -*** 2026-05-22 Fri @ 15:10:35 -0500 Added "When You Cannot Verify" standard to verification.md - -Added a section requiring, when a verification command can't run, a four-part report: command attempted, why it couldn't run, risk left unverified, and the smallest next command for the user. States the principle that a check that didn't run is never reported as a pass — "unable to verify" is a required honest outcome, not silence. Placed after Red Flags. - -*** 2026-05-22 Fri @ 15:10:35 -0500 Added property-based + mutation testing escalation to testing.md - -Added an "Escalation Beyond Category and Pairwise" section: property-based testing for invariants over a broad input domain (round-trips, idempotence, ordering — Hypothesis/fast-check/proptest) and mutation testing for when high line coverage hides thin assertions (mutmut/cosmic-ray/Stryker). Both framed as escalation paths to reach for on a gap, not gates on every unit. - -*** 2026-05-22 Fri @ 15:10:35 -0500 Added a disciplined spike protocol to testing.md - -Formalized the existing "I need to spike first" excuse-table row into a "Spike Exception (Disciplined)" subsection under TDD Discipline: TDD stays the default, but a spike is sanctioned when all three hold — timeboxed, spike code not committed, and the first failing test written before productionizing the discovered approach. Built on the existing row rather than contradicting it. - -*** 2026-05-22 Fri @ 15:10:35 -0500 Added pre-dispatch availability + cost checks to subagents.md - -Added a "Pre-Dispatch Checks" section with two gates: Availability (no Agent capability → do the work in the main thread under the same scope/constraints/output discipline the contract would enforce) and Cost (when writing the full contract costs more than the task, do it inline). Cross-references the existing "Don't Subagent At All" section and "Subagenting trivial work" anti-pattern rather than duplicating. - -*** 2026-05-22 Fri @ 15:06:04 -0500 Revised python-testing SQLite guidance toward production-like DBs - -Replaced "prefer in-memory SQLite for speed" with: run ORM/query tests against a production-like DB (same engine as prod, often containerized), since SQLite diverges from Postgres/MySQL on query semantics, constraints, transactions, JSON, time zones, and indexes (a test can pass on SQLite and fail in prod). SQLite stays only for pure unit tests with no DB-semantics dependency. - -*** 2026-05-22 Fri @ 15:06:04 -0500 Clarified python-testing ORM-mocking boundary - -Changed the "never mock" bullet from "ORM queries" to "ORM internals (querysets, sessions, model internals)" and added a paragraph: domain services use real model methods/validation, but a thin orchestration unit can inject a fake at a deliberate data-access port (a repository/interface the code owns). That's still mocking at a boundary, not at ORM internals. - -*** 2026-05-22 Fri @ 15:06:04 -0500 Made elisp.md editing advice tool-agnostic - -Rephrased the "prefer Write over repeated Edits" bullet around intent: land nontrivial Elisp as one cohesive change rather than dribbling it in over tiny partial edits (which accumulate paren mismatches), and run paren-balance + byte-compile checks immediately after, whatever editing mechanism the environment uses. - -*** 2026-05-22 Fri @ 15:06:04 -0500 Added batch-mode + native-comp caveats to elisp-testing.md - -Added three sections: Batch-Mode Reproducibility (=emacs --batch= as source of truth, no interactive-session state, no blocking prompts, deterministic), Isolating Emacs State (temp =user-emacs-directory=, explicit load-path, declared deps only, with an unwind-protect sandbox example), and Byte-Compile/Native-Comp Warnings (=byte-compile-error-on-warn=, native-comp gated on =native-comp-available-p= and kept opt-in/version-aware). - -*** 2026-05-22 Fri @ 15:16:22 -0500 Synced hooks/README install snippets with the destructive hook (opt-in) - -Brought the README's manual-install and settings-JSON snippets in line with the canonical =hooks/settings-snippet.json= (which already wires all three) and the Makefile's opt-in design: added the destructive-bash-confirm.py symlink as an opt-in step, added its settings entry, and reworded the note to say all three are no-op-safe but the destructive gate is opt-in (=make install-hooks= excludes it by default — link manually before relying on the snippet entry). - -*** 2026-05-22 Fri @ 15:35:06 -0500 Hooks now scan file-backed commit/PR messages - -Added =read_referenced_file()= to =_common.py= (safe local read: missing/oversize/non-UTF-8 → None) and wired it in: =git-commit-confirm.py= =extract_commit_message= now handles =-F=/=--file=/=--file===<path>= (reads + scans the file, falls through to UNPARSEABLE → asks if unreadable), and =gh-pr-create-confirm.py= reads =--body-file= content instead of a placeholder. Attribution scanning now sees the real committed/posted text. Built a pytest harness (=hooks/tests/=, importlib-by-path loader for the hyphen-named hooks) and wired =hooks/tests= into =make test=. 54 hook tests pass; full suite green. - -*** 2026-05-22 Fri @ 15:35:06 -0500 Rewrote destructive-bash rm parsing on shlex - -=detect_rm_rf= now tokenizes with =shlex.split= instead of a whitespace split, so quoted/spaced paths and combined/separate/reordered flags (=-rf=, =-r -f=, =-fr=, =--recursive=/=--force=) all parse. Fails toward asking — returns a sentinel that still fires the modal — on unbalanced quotes or when a forced recursive rm coexists with a compound/pipeline/substitution/redirect construct. Documented the supported/unsupported shell constructs in the docstrings, and extended the dangerous-path banner to =$HOME=-prefixed and wildcard targets. Covered by 25 new tests. (Pre-existing, out-of-scope: path-prefixed =rm= like =/bin/rm= still isn't matched.) -** DONE [#B] Add =make remove= for interactive ruleset removal via fzf -CLOSED: [2026-05-22 Fri] -Shipped: =scripts/remove.sh= (three modes — =--list=, =--remove-selected= reading stdin, and the default fzf-multi interactive flow) + =make remove= target + =scripts/tests/remove.bats= (5 cases). Lists only symlinks resolving into the repo (foreign links left alone); rm's picked links while leaving repo sources untouched; reports-and-continues on a missing target; quiet no-op on empty selection. shellcheck clean, make test green. Dropped the stale =bridge= entry per the note below. - -Add a Makefile target that lists every currently-installed ruleset entry -and lets me pick one or more to remove via fzf. Granular alternative to -=make uninstall= (removes everything) and =make uninstall-hooks= (removes -only hooks). - -*** Why this matters - -Tearing down a single skill, rule, hook, or config file currently means -either running =make uninstall= and re-installing what I want to keep, -or =rm=ing the symlink directly and remembering the exact path. Both are -friction. An interactive picker lets me filter, multi-select with Tab, -and confirm with Enter — the typical fzf flow. Costs about 3-5 seconds -per teardown instead of 15+ seconds of "what's the exact name?". - -*** Design - -The recipe builds a tab-separated list of every currently-installed item, -categorized by type, and pipes it to =fzf --multi=. The user filters, -marks with Tab, and confirms with Enter. The recipe parses the selections -and =rm=s the matching symlinks. - -#+begin_example - skill debug - rule commits.md - hook destructive-bash-confirm.py - config settings.json - commands commands - bridge claude-rules -#+end_example - -Each line is =<kind>\t<name>=. The recipe maps =<kind>= to the right path: - -- =skill= → =$(SKILLS_DIR)/<name>= -- =rule= → =$(RULES_DIR)/<name>= -- =hook= → =$(HOOKS_DIR)/<name>= -- =config= → =$(CLAUDE_DIR)/<name>= -- =commands= → =$(CLAUDE_DIR)/commands= -- =bridge= → =$(SKILLS_DIR)/claude-rules= - -Source files in =rulesets/= stay untouched. =make install= re-creates the -removed links if needed (the install loop is idempotent). - -*** Edge cases - -- Esc instead of Enter → empty selection → clean exit, no removal. -- Filter to nothing then Enter → same as Esc. -- Selected item already gone → =rm= fails visibly, processing continues - on the rest. -- =fzf= not installed → fail fast with a clear error (matches the pattern - used by =install-lang=). - -*** Possible extensions - -- Parallel =make pick-install= target that lists not-yet-installed items - and installs the chosen ones. Symmetric UX, same fzf flow. -- Confirmation prompt when more than N items selected (defense against - accidental select-all). -- =--source= flag that also runs =git rm= against the rulesets source for - the selected item. Probably bad idea — too easy to lose work. -- The =bridge → $(SKILLS_DIR)/claude-rules= entry above is stale — the - bridge symlink got removed in a later commit. Drop that bullet when the - recipe lands. -** DONE [#B] Document the =mcp/= install pipeline in =mcp/README.org= -CLOSED: [2026-05-22 Fri] -Wrote =mcp/README.org= covering everything in the "what to cover" list: the file layout (tracked vs gitignored), the secrets-bundle shape (plain =${VAR}= secrets + base64-bundled OAuth artifacts, AES256 symmetric =gpg -c=), the install flow (decrypt → materialize keys/token caches at mode 600 → expand → register unregistered, idempotent), the http/sse-vs-stdio transport split, token rotation when a Google refresh token is revoked, and adding a new server. Grounded in a read of the actual =install.py= + =servers.json=. - -=mcp/= has =install.py=, =servers.json=, =secrets.env.gpg=, =gcp-oauth.keys.json= (gitignored, regenerated at install). No README. Coming back to this in three months I'll re-discover how the bundle is structured, what =install.py= does, and how to rotate tokens. Saving that re-discovery is the whole point. - -*** What to cover - -- Layout: what each file is, which are tracked vs gitignored. -- Secrets bundle shape: how vars are listed in =secrets.env=, the symmetric-encryption pattern (=gpg -c --cipher-algo AES256=), the base64-bundled OAuth artifacts (=GCP_OAUTH_KEYS_JSON_B64=, =GOOGLE_DOCS_PERSONAL_TOKEN_B64=, =GOOGLE_DOCS_WORK_TOKEN_B64=). -- Install flow: =make install-mcp= → =install.py= decrypts, writes the keys file and Google Docs token caches at mode 600, expands =${VAR}= in =servers.json=, calls =claude mcp add --scope user= for unregistered servers. Idempotent. -- Token rotation: when a refresh token gets revoked, the recovery flow (re-auth on one machine, re-bundle, recommit). -- Adding a new server: edit =servers.json=, add any new =${VAR}= placeholders to the bundle, re-encrypt. -- The OAuth dance for HTTP-transport servers (linear, notion) versus stdio (google-docs-*) — different paths, different gotchas. -** DONE [#C] Add =make uninstall-mcp= + =mcp/install.py --check= for symmetry :feature:solo:quick: -CLOSED: [2026-05-28 Thu] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-28 -:END: - -Currently the MCP install pipeline only flows one direction. No way to remove rulesets-managed MCP servers in one command. No way to ask "what's the drift between =servers.json= and =claude mcp list=" without eyeballing. - -*** =make uninstall-mcp= - -Iterate over =servers.json=, run =claude mcp remove <name> -s user= for each. Ignore "not registered" errors. Idempotent. - -*** =mcp/install.py --check= - -Dry-run mode. Decrypt secrets, but instead of registering, print the drift report: - -- Servers in =servers.json= not in =claude mcp list= → =MISSING= -- Servers in =claude mcp list= not in =servers.json= → =EXTRA= -- Servers in both → =ok= - -Useful for diagnosing connection failures and for the eventual =make doctor= integration. -** DONE [#C] Update =README.org= with MCP install pipeline section :chore:solo:quick: -CLOSED: [2026-05-28 Thu] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-28 -:END: - -=README.org= covers global install, per-project language bundles, and design principles, but doesn't mention =make install-mcp= or the =mcp/= directory. Add a short section after "Per-project language bundles" describing the user-scope MCP install pattern (decrypt → expand → register) and pointing at the eventual =mcp/README.org=. -** DONE [#C] Consolidate =claude-templates/Makefile= after fold :chore:quick:solo: -CLOSED: [2026-05-28 Thu] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-28 -:END: - -Sibling follow-up from the fold child (2026-05-15). After the subtree merge, =rulesets/claude-templates/Makefile= still has its standalone =install= / =uninstall= / =list= / =test-scripts= targets. The =install= target's =bin/ai= logic is now duplicated in =rulesets/Makefile=. Both work; the redundancy is harmless but worth cleaning up. - -Options: -- *Delete* =claude-templates/Makefile= entirely — forces all install through rulesets root. Cleaner. -- *Strip down* to just =test-scripts= — the one piece not redundant with =rulesets/Makefile=. -- *Leave it* — slight redundancy, no functional harm. - -Triggered by: 2026-05-15 fold session's refactor audit (commit =2d645fc=). -** DONE [#C] Run =--archive-done= sweep at start of =open-tasks.org= Phase A :chore:quick:solo: -CLOSED: [2026-05-28 Thu] -:PROPERTIES: -:CREATED: [2026-05-28 Thu] -:LAST_REVIEWED: 2026-05-28 -:END: - -From pearl handoff 2026-05-28. =open-tasks.org= Next Mode reads =* Project Open Work= and skips =* Project Resolved= correctly, but a level-2 task that completed during a session sits as =** DONE= under Open Work until something archives it. Between cleanups, a freshly-DONE task can surface as a "what's next" candidate. - -Proposed fix: as the first step of =open-tasks.org= Phase A, run =emacs --batch -q -l .ai/scripts/todo-cleanup.el --archive-done todo.org=, then read =todo.org=. The cleanup tool already exists; this is wiring it into the workflow. - -Cost: a few hundred ms at the start of every "what's next" invocation. Win: recommendations never include DONE work. - -Optional refinement: gate behind a check for read-only / dry-run mode if that's ever introduced. The default invocation archives. -** DONE [#C] Triage Codex enhancement backlog :spec: -CLOSED: [2026-05-28 Thu] -:PROPERTIES: -:CREATED: [2026-05-28 Thu] -:LAST_REVIEWED: 2026-05-28 -:END: - -Triaged interactively 2026-05-28. Disposition table for all 14 items lives at [[file:docs/design/2026-05-28-rulesets-enhancement-backlog.org][2026-05-28-rulesets-enhancement-backlog.org]] under "Triage Dispositions": 3 accepted (filed below as TODOs), 3 pilot/scope-limited (filed below), 2 marked as conventions rather than tracked tasks, 6 rejected with rationale. Items #1 and #2 already had homes (#16 and the Phase-1 codex TODO). -** DONE [#C] Canonical/mirror drift detection via pre-commit hook or =make sync-check= :feature:quick:solo: -CLOSED: [2026-05-28 Thu] -:PROPERTIES: -:CREATED: [2026-05-28 Thu] -:LAST_REVIEWED: 2026-05-28 -:END: - -From the codex enhancement backlog (item #7), reframed: don't dedupe the dual source — the canonical-in-=claude-templates/= + mirror-in-=.ai/= pattern is a feature (other projects rsync from the canonical; the mirror lets rulesets-as-a-project have a working copy). The real pain is sync-discipline overhead — every workflow edit needs both copies updated, and forgetting one leaves the next startup's rsync to surface the drift. - -Scope: write a small =scripts/sync-check.sh= (or fold into the existing Makefile) that diffs =claude-templates/.ai/workflows/= against =.ai/workflows/=, exits non-zero on drift. Wire as a pre-commit hook (=githooks/pre-commit= or equivalent) so the discipline is enforced before publish, not at the next startup. =make sync-check= as a manual entry point. - -Verification: introduce a deliberate diff, commit, hook should block. Restore parity, hook should pass. -** DONE [#C] Add =make status= — compose audit + doctor + open-task count :feature:quick:solo: -CLOSED: [2026-05-28 Thu] -:PROPERTIES: -:CREATED: [2026-05-28 Thu] -:LAST_REVIEWED: 2026-05-28 -:END: - -From the codex enhancement backlog (item #12), scope-limited: =make status= only. Reject the rest of #12 (=make sync= duplicates the existing sync flow; =make health= wraps existing checks without adding signal; =make bootstrap-project= duplicates =install-ai= + =install-lang=). - -Scope: one Makefile target that prints a compact summary of: - -- Install audit state (clean / drift, calling =make audit=). -- Machine-global doctor state (calling =make doctor=). -- Open-task count (top-level entries in =todo.org= under =* Rulesets Open Work=). -- Inbox count (files in =inbox/= excluding =.gitkeep= and =PROCESSED-= prefixes). -- Git working-tree status (clean / dirty, ahead/behind upstream). - -Output should be roughly 10 lines, scannable in one glance. Composes the existing checks; no new logic except the summary formatting. -** DONE [#C] Iteration-history backfill for spec-review and spec-response :docs:followup: -CLOSED: [2026-05-28 Thu] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-28 -:END: -Source: org-drill inbox 2026-05-28. - -Once the in-flight WIP lands (the requirement that specs carry a bottom =Review and iteration history= section, with iteration / date / contributor / role / what / why / artifacts), backfill the two workflow files themselves using rulesets' session history as evidence. - -Files to update: -- =claude-templates/.ai/workflows/spec-review.org= -- =claude-templates/.ai/workflows/spec-response.org= - -Investigation: search =.ai/sessions/=, =.ai/notes.org=, inbox archive, and git log for mentions of these workflow docs. Identify review/response/design iterations, dates, and contributors (including agents where known: Claude Code, Codex, local models). Distinguish high-confidence history (commits, dated session entries) from inferred (chat-only context). Recommend whether enough evidence exists to populate the section, and draft the entries if so. - -Dependency: spec-review.org and spec-response.org have uncommitted edits in flight. Wait for those to land before writing to the files. The read-only research portion (search sessions, identify iterations, draft entries to a scratch file) can run in parallel without conflict. -** DONE [#B] Startup Phase A rsync propagates dirty rulesets WIP into downstream projects :feature: -CLOSED: [2026-05-30 Sat] -:PROPERTIES: -:CREATED: [2026-05-29 Fri] -:LAST_REVIEWED: 2026-05-29 -:END: -Fixed via option 1 (skip-when-dirty), scoped to the synced source paths: startup.org Phase A now guards the protocols/workflows/scripts rsyncs behind a =git status --porcelain= check on =claude-templates/.ai/{protocols.org,workflows/,scripts/}=, skipping the sync when any are dirty. The propagation anomaly (cross-project-broadcast.org / page-signal.org not reaching jr-estate) was a timeline artifact: both files were added in 664bf01 on 2026-05-29, after jr-estate's Phase A rsync had already run — correct behavior, not a bug. - -From jr-estate handoff 2026-05-29. When rulesets has uncommitted WIP at the moment a downstream project starts a session, Phase A.0 reports "dirty, skipping pull" and proceeds. Phase A's =rsync -a --delete= then runs against the dirty rulesets working tree and copies the WIP state into the downstream project's =.ai/workflows/= and =.ai/scripts/=. The downstream project's =git status= then shows drift the user did not author. Two bad recovery paths: commit the drift as "chore: sync .ai tooling from templates" (creates fake commit history about template state) or leave it dirty (noisy wrap-ups, pressure to commit anyway). - -Three options proposed in the handoff: -1. *Skip-when-dirty.* Make Phase A's workflows/ and scripts/ rsync no-op when Phase A.0 reports rulesets dirty. Simplest defense. -2. *Clean-files-only.* Restrict the rsync to files git considers unmodified in rulesets. Untracked files in rulesets do not propagate. Most precise. -3. *Clean-ref-based.* Cache the last-known-clean state as a git tag or ref and rsync from that ref rather than the working tree. Most decoupled, also the most infrastructure. - -Recommendation (mine): option 1. The downstream impact of skipping a sync once is small (the next session with rulesets clean catches up), and the implementation is one =if [ "$dirty" -eq 0 ]= guard around the existing rsync block. Option 2 adds shellout complexity per file; option 3 requires tagging discipline that has no other reason to exist. - -The original handoff also noted a related anomaly: even with =--delete=, two files that DO exist in rulesets canonical (=cross-project-broadcast.org=, =page-signal.org=) did NOT propagate to jr-estate. Worth confirming whether that was a transient rsync issue or evidence of a deeper Phase A bug. Could be ordering: those files were added to rulesets AFTER the jr-estate Phase A rsync ran, in which case the behavior is correct and the report is misreading the timeline. - -Source: =inbox/2026-05-29-0832-from-jr-estate-investigate-startup-rsync-carried-dirty.org= (processed and deleted). -** DONE [#B] Codex Phase 1 — AI_AGENT_ID + session-context.d/<id>.org :feature: -CLOSED: [2026-05-30 Sat] -:PROPERTIES: -:CREATED: [2026-05-28 Thu] -:LAST_REVIEWED: 2026-05-28 -:END: -Shipped backward-compatibly. New =.ai/scripts/session-context-path= helper resolves the active path from =AI_AGENT_ID=: unset → the legacy =.ai/session-context.org= singleton (one-agent default unchanged, per the spec's compatibility rule), set → =.ai/session-context.d/<sanitized-id>.org=. startup.org's existence check and wrap-it-up.org's rename now resolve through the helper (with a singleton fallback for older checkouts); wrap folds the agent id into the archive name. protocols.org documents the rule. Verified: 5 bats cases + a two-agent simulation showing distinct paths per id. Larger runtime-neutral arc (runtimes/ manifests, launcher refactor) stays parked under the parent spec. - -Lifted from the broader codex runtime spec ([[file:docs/design/2026-05-28-generic-agent-runtime-spec.org]]) as the immediate-correctness slice independent of the larger arc. The singleton =.ai/session-context.org= is unsafe under simultaneous agents — two LLMs running in the same project at the same time would overwrite each other's session state. - -Scope: introduce an =AI_AGENT_ID= environment variable and split the single =session-context.org= into a per-agent =session-context.d/<id>.org= directory. No other phases of the runtime refactor are in this task — keep the surface small, fix the race, ship. - -Touches: =.ai/protocols.org= (rename rule + recovery anchor), =.ai/workflows/startup.org= (Phase A check), wrap-up workflow (rename target), per-project session record discoverability. - -Verification: simulate two agents sharing a project (separate AI_AGENT_ID values) and confirm session-context writes land in distinct files without interleaving. - -Parent: see [[Generic agent runtime support — Codex spec v0]] above for the larger arc this is sliced from. -** DONE [#C] Decide on category-3 rule copies in the deepsat tree :chore:quick:solo: -CLOSED: [2026-05-31 Sun] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-28 -:END: -Diffed 2026-05-31. Both copies (coding-rulesets vendored + orchestration_dashboard_mvp) are byte-identical to each other and stale against canonical: =testing.md= 221 lines behind with 5 lines unique to the copies (older wording or a small team tweak), =verification.md= 40 behind with nothing unique. Same older vendored version in both spots. Left untouched per the A1 decision — team-owned, and canonicalizing would create a cross-repo dependency on the private rulesets (the orchestration_dashboard_mvp pair is team-visible from Vrezh's PR thread). No files modified. - -While symlinking personal-project =.claude/rules/= mirrors to the rulesets canonical on 2026-05-07, two locations didn't fit the "personal mirror → symlink" pattern and were left untouched pending judgment: - -- =~/projects/work/deepsat/code/coding-rulesets/claude-rules/{testing,verification}.md= — looks like a vendored team-shared copy. -- =~/projects/work/deepsat/code/orchestration_dashboard_mvp/.claude/rules/{testing,verification}.md= — could be project-specific overrides. - -For each: read the file, diff against the rulesets canonical, decide whether it's an intentional diverge (leave alone), stale (sync content), or should canonicalize (replace with symlink and accept the cross-repo dependency). The orchestration_dashboard_mvp pair is the project where Vrezh's PR review surfaced this whole thread, so any decision there has team-visibility implications. - -Decision (Craig, 2026-05-31): *leave team-tree copies alone.* Personal rulesets does not reach into team repos — canonicalizing would create a cross-repo dependency on the private rulesets, and the orchestration_dashboard_mvp copy is team-visible. This makes the task solo: diff each copy against canonical, record whether it's identical / drifted / overridden in the disposition, and close as "left alone (team-owned)" without modifying the team-tree files. -** DONE [#C] Audit language-specific rule files for cross-project duplication :chore:solo: -CLOSED: [2026-05-31 Sun] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-28 -:END: -Audited 2026-05-31. Findings: in sync with canonical (=languages/<lang>/claude/rules/=) — work =python-testing.md=, deepsat =typescript-testing.md=, =.emacs.d= =elisp-testing.md= + =elisp.md=. Drifted — =gloss= and =chime= (byte-identical to each other): =elisp-testing.md= 44 lines behind (canonical added Batch-Mode Reproducibility + Isolating Emacs State; zero lines unique to the copies), =elisp.md= one line behind (canonical expanded the edit-cohesively guidance). No project-specific additions anywhere — every copy is either current or purely stale. - -Disposition: *leave them project-local* (the task's own option). The language-rule copies in code projects are the bundle's deliberate copy-and-sync model, not the symlink pattern the generic rules (commits/testing/verification/subagents) use in personal doc-projects. =sync-language-bundle.sh= auto-fixes drifted bundle rules on each startup, so gloss/chime self-heal the moment those projects next boot — no canonicalize/symlink needed, and symlinking would fight the bundle model. Did not reach into work/deepsat/gloss/chime/.emacs.d from here (cross-project boundary; team copies left alone per the 2026-05-31 category-3 decision). - -The four canonical rules (=commits=, =testing=, =verification=, =subagents=) are now symlinked across the five personal-project mirrors as of 2026-05-07. But several language-specific rule files exist in multiple project mirrors and may be duplicated or drifted: - -- =python-testing.md= in =~/projects/work/.claude/rules/= -- =typescript-testing.md= in =~/projects/work/deepsat/code/.claude/rules/= -- =elisp-testing.md= and =elisp.md= in =~/.emacs.d/=, =~/code/gloss/=, =~/code/chime/= - -The Elisp pair is the most suspicious — three repos using essentially the same rules. Audit: diff these across the projects, check for drift, then decide whether to canonicalize them under =~/code/rulesets/claude-rules/languages/<lang>/= and symlink, or leave them as project-local. -** DONE [#C] Refactor =daily-prep.org= to delegate to =triage-intake.org= for the triage section :chore:solo: -CLOSED: [2026-05-31 Sun] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-28 -:END: -Collapsed Phase 3's inline source scans (sub-steps 3b email / 3c mark-read / 3d Slack / 3e Linear / 3f PRs / 3g dedup, ~280 lines) into four: 3b runs the triage-intake engine, 3c surfaces today's reactive items as Day's Priorities thin links, 3d re-sorts by urgency, 3e writes the audit footer from the engine's coverage. Source coverage carries via the engine's Phase 0 two-dir glob (general + .ai/project-workflows/ plugins), so the work account's Gmail/Slack/Linear/GHE plugins still get scanned. Adapted the downstream refs (Prep Doc Structure rule, Heads-up FYI source, Recommended Approach Pattern reframed as engine-applied), removed the orphaned Linear-digest note, added a Living Document entry. Verified: workflow-integrity clean (no dangling script refs), sync-check clean, full suite green. daily-prep.org went 825 → 576 lines. - -=daily-prep.org= still does its own inline triage (Gmail × 3 accounts, Slack, Linear, GHE PRs, calendars) as part of the full prep flow. =triage-intake.org= is now a source-agnostic engine that loads =triage-intake.<source>.org= plugins (refactored 2026-05-26), so daily-prep could call the engine and consume its synthesis instead of duplicating the source-scan logic. That DRYs up a large workflow and keeps both flows in sync when sources change — a source change now lives in one plugin that both flows pick up. - -Scope: -- Identify the sections in =daily-prep.org= that do the inline triage (the email / Slack / Linear / PR / calendar fan-out, plus the "Sources checked: ..." footer at the top of each generated prep doc). -- Replace those sections with "run the =triage-intake.org= engine" and adapt the downstream sections (Heads-up, Day's Priorities, Carry-forwards) to read the engine's synthesis output rather than the inline scan results. -- Verify the generated prep doc still has the same shape (Heads-up + Day's Priorities + Carry-forwards + Sources checked). -- Reconcile source coverage: daily-prep's inline triage scans work accounts (3 Gmail, Slack, Linear, GHE PRs) that are project-specific plugins under =.ai/project-workflows/=, not general plugins. The delegation must ensure the engine loads those project plugins (Phase 0 globs both dirs) so nothing daily-prep currently scans drops out. - -Origin: came up while authoring =triage-intake.org= on 2026-05-11; body refreshed after the engine/plugin refactor on 2026-05-26. -** DONE [#C] Templatize =make coverage-summary= into the language bundles (Elisp pilot) :feature:solo: -CLOSED: [2026-05-31 Sun] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-28 -:END: - -Done 2026-05-31 (Elisp pilot, the scoped milestone): ported the kernel into the elisp bundle as a self-contained =languages/elisp/claude/scripts/coverage-summary.el= (no coverage-core dependency), proven end-to-end against the real dotemacs SimpleCov report (93 tracked, 27 untested modules surfaced, project number 66.4%). The missing-file-as-0% + unit-weighted number is the kernel. Delivery: the script ships under =.claude/scripts/= (gitignored, auto-fixed on drift by =sync-language-bundle.sh=); =languages/elisp/coverage-makefile.txt= holds the project-owned Makefile fragment, seeded at project root by =install-lang.sh= and dropped into =.ai/inbox/= by sync when that convention exists. Tests: 12 ERT (=languages/elisp/tests/=, wired into =make test=), 5 new sync bats, 2 new install-lang bats. The fan-out to Python/Go/TS is the follow-up below. - -Borrow dotemacs's =make coverage-summary= into the language bundles. After =make coverage= writes a coverage file, =coverage-summary= prints per-unit covered/total with percentages, a unit-weighted project number, and a list of source files present on disk but missing from the coverage report. - -*The kernel — the only part worth building.* Weight the project number by file/module rather than by line, and count a source file absent from the report as 0% instead of omitting it. A module no test imports just doesn't appear in coverage.py or nyc output, so it silently fails to drag the number down. That missing-file detection is the value; everything else (per-file table, total) the built-in reporters already print, so don't reimplement those. - -*Scope Elisp-first.* Port the proven dotemacs version into the elisp bundle, prove the pattern end-to-end, then fan out. Don't open all four bundles at once. - -*Delivery (settled 2026-05-25).* Two rulesets-owned pieces per language: -- The summary *script* ships in the bundle under =.claude/= (inside the now-gitignored tooling footprint), copied in on install and auto-fixed on drift by =sync-language-bundle.sh=, never committed by the project. -- One *text file per language* holding the Makefile fragment (the =coverage-summary= target plus its =coverage= prerequisite) and a block recommending how to set up coverage for that language. The bundle never edits the project's own Makefile. - - *New project:* install copies that file in for the project to own. - - *Existing project:* sync drops the fragment into the project's =inbox/= rather than touching its Makefile — the project adopts it deliberately. - -*Prerequisite caveat.* The summary presumes a coverage harness exists (undercover, coverage.py, nyc, =go cover=). Several bundles may have no =make coverage= yet, so for those this task implies adding the harness first — or the per-language file documents it as a prereq. - -Per-language parser (the script is ~40 lines over each tool's output): -- Elisp: undercover SimpleCov JSON (=.coverage/simplecov.json=) — dotemacs/auto-dim scripts already parse this. -- Go: =go test -coverprofile=cover.out=; parse =cover.out= (simple text), or lean on =go tool cover -func=. -- Python: =coverage json= per-file JSON, or lean on =coverage report=. -- TypeScript/JS: nyc/Istanbul =coverage-final.json= / json-summary. - -Reference (dotemacs): =scripts/coverage-summary.el=, =modules/coverage-core.el=, and the =coverage= / =coverage-summary= Makefile targets. - -Origin: handoff from the .emacs.d session, 2026-05-25. -** DONE [#C] Fan out coverage-summary across all language bundles :feature: -CLOSED: [2026-05-31 Sun] -:PROPERTIES: -:CREATED: [2026-05-31 Sun] -:END: - -Done 2026-05-31: coverage-summary now ships in all four bundles. Elisp pilot, then Python, Go, and TypeScript. Each parses its tool's report (SimpleCov / coverage.py JSON / Go cover.out / Istanbul json-summary), counts on-disk source files absent from the report as 0%, and file-weights the project number. The plumbing proved generic: =install-lang.sh= seeds the project-owned =coverage-makefile.txt= and ships the script into the gitignored =.claude/scripts/=; =make test= discovers ERT (=test-*.el=), pytest (=test_*.py=), =go test= (=*_test.go=), and =node --test= (=*.test.js=) under =languages/*/tests/=, each guarded on its toolchain. TypeScript and Go scripts were dogfooded (Go against a live profile, TS against the CLI); Python and TS weren't run against a live coverage tool (coverage.py / nyc not installed) — proven against faithful fixtures matching each tool's stable schema. - -Remaining follow-ups (not blockers): -- Go is a coverage-only slice — =languages/go/= has no rule file, so =sync-language-bundle.sh= can't fingerprint it and won't sync-maintain the script. Build out the real Go bundle (=go.md= / =go-testing.md= + =CLAUDE.md=) to close that. -- First real adopters of the Python and TS scripts should sanity-check against a live =coverage json= / nyc =coverage-summary.json= run. - -Original notes retained below for the next person. - -The Elisp pilot proved the pattern; Python and Go followed. The plumbing is generic: =install-lang.sh= seeds the fragment, and =make test= now discovers ERT (=test-*.el=), pytest (=test_*.py=), and =go test= (=*_test.go=) under =languages/*/tests/=. TypeScript is the last one. - -- TypeScript/JS: nyc/Istanbul =coverage-final.json= / =coverage-summary.json=. Same kernel: file-weighted project number, on-disk =*.ts=/=*.js= absent from the report counted as 0%. nyc prints its own table, so the script focuses on the missing-file list and the number. Needs a vitest/jest (or =node --test=) discovery path in =make test=, mirroring the go-test block. - -Notes for the next person, from the Python + Go runs: -- Python: parses coverage.py's =files[path].summary.{covered_lines,num_statements}= (stable since coverage 5.x), resolves report paths against the report's parent dir. Proven against a synthetic report, not a live =coverage json= run (coverage.py wasn't installed). Sanity-check against a real one. -- Go: =languages/go/= is a coverage-only slice with no rule file, so =sync-language-bundle.sh= can't fingerprint it (detection keys on a bundle's own =.claude/rules/*.md=). The script is delivered by =make install-lang LANG=go= but is not sync-maintained until the Go bundle gets a real rule file + =CLAUDE.md=. Building out that bundle is the natural companion task. Also: modern =go test ./...= already lists every module package in the profile at 0%, so the missing-file list is usually empty for in-module code; it earns its keep on build-tagged files and dirs outside =./...=. -** DONE [#C] Enumerate implementation tasks in =spec-review.org= Phase 6 :feature:solo: -CLOSED: [2026-05-31 Sun] -:PROPERTIES: -:CREATED: [2026-05-28 Thu] -:LAST_REVIEWED: 2026-05-28 -:END: -Added a Phase 6 step that lifts the spec's =Implementation phases= into a drop-in =todo.org= block (one =[#B]= per phase + a test-surface entry mirroring =Acceptance criteria=); a spec lacking phase decomposition raises that as a finding instead. Added Exit Criterion 6 and a review-history entry. Pure workflow-doc change. - -From pearl handoff 2026-05-28. =spec-review.org= Phase 6 currently says "log deferred work to =todo.org=: v1 implementation = [#B] ... vNext/someday = [#D]." That covers deferred and v1 in passing but doesn't lift the spec's =Implementation phases= section into a drop-in =todo.org= block. - -Proposed addition to Phase 6: a structured step that reads the spec's =Implementation phases= section and produces a =[#B] TODO= entry per phase (subject line, tags, one-line body, pointer back to spec), plus a final entry for the test surface (unit / integration / e2e / manual-verify mirroring the spec's =Acceptance criteria= when present). Emit under a new section "Implementation tasks (drop-in for todo.org)" in the review file. Format follows =todo-format.md= (terse heading, body holds context, tags on heading). - -Three wins: handoff is one paste not a re-read; forces specs to be implementable in pieces (a spec without a phase decomposition fails this step, surfacing the shape problem); closes the loop on =Acceptance criteria= as manual-verify entries. - -If the spec lacks an =Implementation phases= section, the step is the prompt to ask the author to add one before =Ready=. -** DONE [#C] Add =.aiignore= for agent inventory exclusions :chore:solo: -CLOSED: [2026-05-31 Sun] -:PROPERTIES: -:CREATED: [2026-05-28 Thu] -:LAST_REVIEWED: 2026-05-28 -:END: -Shipped a gitignore-syntax =.aiignore= at the rulesets root (deps, build output, language caches, editor cruft, token artifacts, lockfiles-as-agent-read-skip) and documented the convention + defaults + lockfile policy in protocols.org ("Recursive Reads"). Per Craig's scope call (2026-05-31): did NOT wire audit.sh / diff-lang.sh / sync-language-bundle.sh — they do targeted finds over .ai/.claude/bundle dirs, never naive whole-tree walks, so honoring .aiignore there would be dead code. Script-side honoring belongs in a future catalog/inventory tool if one ships; the real consumer today is agent recursive reads (the protocols guidance). - -From the codex enhancement backlog (item #8). Filesystem scans by agents and helper scripts pick up =node_modules=, =__pycache__=, =.pytest_cache=, lockfiles, generated OAuth artifacts, and test caches, even when those are gitignored. Token waste during exploration and skewed project summaries. - -Scope: add a shared =.aiignore= file (or =rulesets-ignore.json= if a more structured format helps) listing default exclusions. Teach the scripts that walk the project (=audit.sh=, =diff-lang.sh=, =sync-language-bundle.sh=, future =catalog= work if any) to honor it. Document in =protocols.org= so agents know to consult it before naive recursive reads. - -Keep the lockfile policy explicit: ignored when a local skill dependency cache, tracked when reproducibility matters. -** DONE [#C] Workflow test harness — drift + integrity tests :feature:solo: -CLOSED: [2026-05-31 Sun] -:PROPERTIES: -:CREATED: [2026-05-28 Thu] -:LAST_REVIEWED: 2026-05-28 -:END: - -From the codex enhancement backlog (item #10). Startup's drift check catches index-vs-directory mismatches but not deeper integrity: a workflow that references a script that's been renamed, a plugin whose parent engine has been deleted, a required section missing from a newly-added workflow. - -Scope: add =scripts/tests/workflow-integrity.bats= (or pytest equivalent) verifying: - -- Every =.org= file in =.ai/workflows/= is either indexed in =INDEX.org= or classifiable as a source plugin under an indexed engine. -- Every indexed workflow file actually exists. -- Every =file:= or shell-command reference inside a workflow to a script under =.ai/scripts/= or =scripts/= resolves to an existing file. -- Every source plugin maps to a parent workflow that exists and is indexed. -- Required sections (Overview, When to Use, the workflow's main phases) are present in each workflow. -- Workflow trigger phrases are unique enough to route — no two workflows claim the same exact trigger. - -Wire into =make test=. Run on the canonical =claude-templates/.ai/workflows/= as the source of truth. -** DONE [#C] Token-tier pilot on largest workflows :feature:solo: -CLOSED: [2026-05-31 Sun] -:PROPERTIES: -:CREATED: [2026-05-28 Thu] -:LAST_REVIEWED: 2026-05-28 -:END: - -Done 2026-05-31: restructured both =startup.org= and =triage-intake.org= into the four-lane structure (Summary / Execution / Reference / History), preserving every existing instruction. triage-intake's reorder ran through a content-preservation guard (the multiset of content lines is unchanged; only heading depth and lane grouping moved). workflow-integrity, sync-check, and the full test suite pass. - -From the codex enhancement backlog (item #5), scope-limited to a pilot rather than a universal template change. - -Apply a standardized section structure to the largest workflow files first — =startup.org= and =triage-intake.org= are the prime candidates. Sections: - -- *Summary* / *Quick Contract* — one-screen purpose and outputs. -- *Execution* — the steps an agent must follow. -- *Reference* — examples, edge cases, rationale, old decisions. -- *History* / *Design Notes* — durable context not needed every run. - -Decision (Craig, 2026-05-31): *approved the four-lane structure (Summary/Execution/Reference/History) and the scope — restructure both =startup.org= and =triage-intake.org= now.* Makes the task solo: apply the lanes to both, preserving every existing instruction (reorganize, don't rewrite), verify the workflows still read coherently and the drift/integrity checks pass. - -Teach startup/routing to read =Summary= only at routing time, then =Execution= only for the selected workflow. Other sections become opt-in. - -After the pilot, evaluate: did the savings show up in real session token use? Did the structure constrain the workflow expressiveness too much? If yes to savings and no to constraint, expand to the next-largest workflows. If not, document why and stop. Don't templatize universally — shorter workflows don't need tiering. -** DONE [#B] Add Signal MCP server (rymurr/signal-mcp) :feature: -CLOSED: [2026-06-02 Tue] -:PROPERTIES: -:CREATED: [2026-05-29 Fri] -:LAST_REVIEWED: 2026-05-29 -:END: -Done 2026-06-02. Registered signal-cli to the Google Voice pager account, added the signal-mcp entry to servers.json, installed via make install-mcp (claude mcp list shows it connected), and documented the signal-cli + GV dependency in mcp/README.org. The GV-registration dependency this task flagged is resolved. Shipped in cfaff12 (page-signal routing) and this commit (README). - -Install [[https://github.com/rymurr/signal-mcp][rymurr/signal-mcp]] so Claude can call =send_message_to_user=, =send_message_to_group=, and =receive_message= natively rather than shelling out to the =page-signal= wrapper. Python, MCP framework, depends on =signal-cli= being configured locally. - -Two-way capability is the differentiator over the CLI: =receive_message= lets the agent listen for replies on the phone, enabling page-as-confirm flows, "should I proceed?" loops over Signal, and structured Q&A across devices. - -*** Dependency - -This depends on the Google Voice account being registered with =signal-cli= first. Sending from Craig's primary number to itself doesn't notify (Signal treats it as one account on linked devices). The MCP server takes =--user-id= at startup, one account per instance, so it has to point at the GV account, with the primary as the per-send recipient. - -If GV registration is still pending when this task runs, block here and surface that. - -*** Implementation - -- =mcp/servers.json= — add =signal-mcp= entry under stdio transport (=command=, =args=, optional =env= for the user-id pointer). -- =mcp/README.org= — document the signal-cli + GV-registration dependency and the user-id pattern. -- =mcp/secrets.env.gpg= — only if the MCP server's user-id needs to be encrypted (probably not; the GV number isn't a secret beyond being personal). -- Verify: =make install-mcp= followed by =make check-mcp= shows =signal-mcp ok=; smoke-test via a Claude tool call sending a message + waiting on =receive_message=. - -*** Why this matters - -=page-signal= is the fast path (a hook, a script, a make recipe can call it without an MCP round-trip). The MCP server is the smart path. When Claude wants to send and then *react to the reply*, the CLI can't do that — only the MCP server can. The two complement each other; this task adds the second half. -** DONE [#C] task-review pass at end of task-audit :chore:solo: -CLOSED: [2026-06-02 Tue] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-02 -:END: -Have the =task-audit= workflow chain a =task-review= pass as its final phase, so a freshly-audited list also gets the lighter staleness/honesty sweep without a second invocation. The legend already notes the division of labor — task-audit assigns and refreshes tags, task-review keeps them honest in passing — so running task-review at the tail of task-audit closes the loop in one pass. Edit =claude-templates/.ai/workflows/task-audit.org= (and the synced mirror) to add the final phase; check whether =open-tasks.org= already invokes task-review so the chaining stays consistent. -** DONE [#C] lint-followups drift — reconcile-on-write + audit dead-link reaping :feature:solo: -CLOSED: [2026-06-02 Tue] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-02 -:END: -From an .emacs.d handoff (2026-06-02): running task-audit against a large todo.org proved several =.ai/lint-followups.org= entries stale (four dead-link flags pointed at docs that now exist; three near-duplicate dated lint runs had piled up). Two fixes, scoped separately. - -1. =lint-org= workflow/script (the real fix): reconcile-on-write. Before appending a run, drop entries whose finding no longer reproduces (dead link now resolves, flagged block/timestamp now clean) and dedupe against the prior run instead of re-logging. Key entries by content/finding rather than line number, so they survive edits to the target file (line numbers go stale immediately). -2. =task-audit.org= (small, narrow): in the Phase C link-hygiene step, when fixing/verifying a =file:= link, also reap any matching dead-link entry in the project's lint-followups file so the two artifacts don't drift. Scope explicitly to dead-link entries — do NOT pull general lint cleanup into the audit; that mixes two concerns and slows the audit. -** DONE [#C] start-work Justify gate: explicit "reasons not to do this" item :feature:quick:solo: -CLOSED: [2026-06-02 Tue] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-02 -:END: -From a work handoff (2026-06-02, surfaced running /start-work on a clean low-risk refactor). The Phase 2 Justify gate has "Downsides" and "Alternatives considered" but no forced devil's-advocate verdict on "should we even do this?" Add a "top reasons not to do this" item: surface the top three objections if any exist; when none rise to a real objection, state one line instead of manufacturing three (e.g. "Nothing material argues against this; no reason to defer or drop it"). Building the case against the work before committing is cheapest exactly at this gate, which is its purpose. Edit the start-work skill's Justify-gate phase. -** DONE [#C] start-work Approach gate: spec-needed check :feature:quick:solo: -CLOSED: [2026-06-02 Tue] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-02 -:END: -From Craig (2026-06-02). The Approach phase should consider whether the work needs a spec when one doesn't already exist. For a big task, this isn't a silent skip — the pre-confirmation summary must explicitly report why a spec isn't needed, so the decision is visible and challengeable at the gate rather than assumed. Small tasks can pass without comment. Edit the start-work skill's Approach-gate phase to add the spec-needed consideration and the big-task report-why-not requirement. -** DONE [#B] Cross-project pattern catalog :spec:thinking: -CLOSED: [2026-06-05 Fri] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-02 -:END: - -From pearl handoffs [[file:docs/design/2026-05-27-pattern-catalog-pearl-notes.org][2026-05-27]] + [[file:docs/design/2026-05-28-pattern-catalog-no-empty-input.org][2026-05-28 follow-up]]. - -Meta-question: how do good patterns travel from project A to project B? Pearl shipped three worked examples worth capturing — one-prompt picker with typed prefix (pearl-pick-source), magit-transient state buttons, and "no empty input as meaningful" (none-sentinel as first candidate). Each is a small principle with wide surface area; without a catalog, every project re-derives them from scratch. - -Open design questions before any implementation: -- Catalog format — structured (one pattern per file with frontmatter) vs free-form doc -- Surfacing mechanism — agent-driven (model spots opportunity) vs human-driven (Craig grep-searches) -- Anti-patterns included or only what worked -- Intake cadence — every time one lands, or batch review -- Home — rulesets repo (agent visibility) vs Linear doc vs per-project cross-links - -Pearl recommends a one-page spec (problem + design + open questions + acceptance) before implementation. Pearl available to come back for spec-review iterations. - -*** 2026-05-28 Thu @ 08:12:55 -0500 Pearl shipped patterns 4-6, filed alongside the prior two -Three more pearl handoffs landed and were filed during this audit. Filed: [[file:docs/design/2026-05-28-pattern-catalog-prompt-labels-and-defaults.org][prompt-labels-and-defaults]] (patterns 4-5: label-matches-behavior, default-most-common with friction-proportional-to-consequence) and [[file:docs/design/2026-05-28-pattern-catalog-prompt-collapse.org][prompt-collapse]] (pattern 6: collapse N orthogonal prompts into one enriched prompt). The catalog's evidence base is now four pearl notes in =docs/design/= covering six patterns plus the synthesizing principle Pearl articulated — "choices on screen, accurately labeled, ordered by what the user most often wants, friction sized to the cost of being wrong." - -*** 2026-06-05 Fri @ 00:47:59 -0500 Spec approved as written — all 5 decisions + 3 open questions accepted -Craig approved the spec ([[file:docs/design/2026-06-02-pattern-catalog-spec.org][2026-06-02-pattern-catalog-spec.org]]) as written. Confirmed: one file per pattern with frontmatter; home =patterns/= in rulesets; thin =claude-rules/patterns.md= pointer, agent-driven; anti-patterns as a per-pattern field; capture-on-landing/promote-on-review intake. Open questions resolved to the spec's leans: directory name =patterns/=; concrete-now, generalize-on-second-use; manual promote flow first, no =/pattern= skill yet. Built as =.org= files with =#+KEYWORD= frontmatter (Craig's call over the initial =.md= draft); the =claude-rules/patterns.md= pointer stays =.md= since the rules layer and the Makefile glob require it. - -*** 2026-06-05 Fri @ 00:47:59 -0500 Built the catalog — 6 seed patterns + pointer + README -Created =patterns/= with the six seed patterns (one-prompt-picker-typed-prefix, transient-state-buttons, no-empty-input-as-meaningful, label-matches-behavior, default-most-common-friction-proportional, collapse-orthogonal-prompts), each carrying the frontmatter contract (name/principle/problem/tags/source/examples) plus Problem/Do/Anti-pattern/Applicability/Related sections. =patterns/README.org= states the root principle, the frontmatter contract, and the intake cadence. =claude-rules/patterns.md= is the agent-facing pointer, auto-installed via the Makefile RULES glob. Sourced from the four pearl notes in =docs/design/=. -** CANCELLED [#C] Try Skill Seekers on a real DeepSat docs-briefing need :chore: -CLOSED: [2026-06-10 Wed] -:PROPERTIES: -:LAST_REVIEWED: 2026-05-28 -:END: - -=Skill Seekers= ([[https://github.com/yusufkaraaslan/Skill_Seekers]]) is a Python -CLI + MCP server that ingests 18 source types (docs sites, PDFs, GitHub -repos, YouTube videos, Confluence, Notion, OpenAPI specs, etc.) and -exports to 20+ AI targets including Claude skills. MIT licensed, 12.9k -stars, active as of 2026-04-12. - -*Evaluated: 2026-04-19 — not adopted for rulesets.* Generates -*reference-style* skills (encyclopedic dumps of scraped source material), -not *operational* skills (opinionated how-we-do-things content). Doesn't -fit the rulesets curation pattern. - -*Next-trigger experiment (this TODO):* the next time a DeepSat task needs -Claude briefed deeply on a specific library, API, or docs site — try: -#+begin_src bash -pip install skill-seekers -skill-seekers create <url> --target claude -#+end_src -Measure output quality vs hand-curated briefing. If usable, consider -installing as a persistent tool. If output is bloated / under-structured, -discard and stick with hand briefing. - -*Candidate first experiments (pick one from an actual need, don't invent):* -- A Django ORM reference skill scoped to the version DeepSat pins -- An OpenAPI-to-skill conversion for a partner-vendor API -- A React hooks reference skill for the frontend team's current patterns -- A specific AWS service's docs (e.g. GovCloud-flavored) - -*Patterns worth borrowing into rulesets even without adopting the tool:* -- Enhancement-via-agent pipeline (scrape raw → LLM pass → structured - SKILL.md). Applicable if we ever build internal-docs-to-skill tooling. -- Multi-target export abstraction (one knowledge extraction → many output - formats). Clean design for any future multi-AI-tool workflow. - -*Concerns to verify on actual use:* -- =LICENSE= has an unfilled =[Your Name/Username]= placeholder (MIT is - unambiguous, but sloppy for a 12k-star project) -- Default branch is =development=, not =main= — pin with care -- Heavy commercialization signals (website at skillseekersweb.com, - Trendshift promo, branded badges) — license might shift later; watch -- Companion =skill-seekers-configs= community repo has only 8 stars - despite main's 12.9k — ecosystem thinner than headline adoption -** DONE [#C] Promote meeting-prep to a template workflow :feature:solo: -CLOSED: [2026-06-10 Wed] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-10 -:END: -meeting-prep lives in the work project's =project-workflows/= and is general-purpose — it builds a per-meeting prep doc — but its body carries project-specific references: =deepsat/assets/= transcript paths, Linear as the tracker, =knowledge.org=. Promoting to =claude-templates= means generalizing those to project-neutral terms (the project's transcript home, the project's tracker), adding it plus its =meeting-prep.pre-wire.org= supporting doc to the =.ai/= mirror and INDEX.org, and a workflow-integrity pass. Once promoted, the daily-prep 5-Day Look-Ahead's conditional "where the project has one" reference can become a direct link. - -Out of the 2026-06-10 daily-prep handoff from the work project. -** DONE [#C] Build Craig's writing voice profile from real corpora :spec: -CLOSED: [2026-06-10 Wed] -:PROPERTIES: -:CREATED: [2026-05-29 Fri] -:LAST_REVIEWED: 2026-05-29 -:END: -Shipped across 2026-05-29 → 2026-06-10. =voice/references/voice-profile.org= is the canonical paired file: Phases 1-2 corpora measured (commit bodies 128k words + email/PR/review registers), all 45 patterns carry entries with basis and history, and every reconciliation delta landed in =voice/SKILL.md= (#13/#33 self-discipline reframing, #7 soft flag, new corpus-derived #43-#45). Extension corpora (Slack, long-form, syntactic fragment detection) deliberately not pursued. - -Build a grounded profile of Craig's actual writing voice by mining the corpora he's produced over time. The =voice/SKILL.md= patterns today are observation-derived (em-dash zero-tolerance, semicolon → period, contractions kept, sentence-fragment rewrite, felt-experience cut, etc.). Some are spot-on; others are intuition. A real corpus pass would tell us which patterns are genuinely Craig's voice and which were guesses, plus surface idioms, sentence structures, and vocabulary the current ruleset misses. - -*** Sources to mine - -- *Email* — sent folders across all three accounts (=gmail=, =dmail/DeepSat=, =cmail/Proton=). Filter to Craig-authored (not forwards or replies-just-quoting). Separate work voice (=dmail=) from personal voice (=gmail=, =cmail=) since they're likely distinct registers. -- *Commit messages* — =git log --author= across his repos. Captures terse-imperative voice. -- *PR descriptions and review comments* — same corpora. More deliberate prose than commits. -- *Org files he authored* — =notes.org=, todo bodies he typed, design docs in =docs/design/=, journal entries. Heavier on first-person voice than emails. -- *Slack/messages* — DeepSat work slack, family group, friends. Casual register. -- *Long-form artifacts* — résumé, proposals, white papers, blog posts (if any). - -Skip session-context files, which are Claude-co-written and would muddy the signal. - -*** Output - -- =voice/references/voice-profile.org= (or =.md=) — the canonical reference doc: - - Vocabulary tendencies (preferred verbs, avoided cliché classes, technical-vs-plain word choice). - - Sentence structures (typical length, conjunction patterns, parenthetical use). - - Punctuation patterns (em-dash actual frequency, semicolon vs period split, contraction rate). - - Register markers (signs of formal vs casual mode, work vs personal). - - Idioms and recurring phrasings. - - "Anti-patterns" — phrasings Craig consistently avoids that show up in AI-generated prose. -- Updated =voice/SKILL.md= patterns grounded in evidence rather than intuition. Patterns that the corpus confirms get strengthened; patterns the corpus contradicts get rewritten or removed. - -Each finding should cite at least two evidence samples from the corpora so the basis for a rule is reviewable. - -*** Approach - -Phase 1 (corpus assembly) — pull the relevant slices: sent-mail dumps, =git log --author --no-merges --pretty=format:'%B'=, =gh pr list --author= bodies, org-file extracts. Strip headers, replies-quoted blocks, signatures. Land in =voice/corpus/= (gitignored if the project's =.ai/= is gitignored, tracked if private repo with private remote). - -Phase 2 (analysis) — pass over the corpus with focused queries: distribution of em-dashes per 1000 words, semicolon count, contraction frequency by register, sentence-length histogram, top-N adjectives/adverbs, etc. Subagent dispatch fits here. - -Phase 3 (draft profile) — write =voice-profile.org= with findings + evidence. Surface contradictions with the current ruleset. - -Phase 4 (reconcile with voice/SKILL.md) — present the deltas to Craig. Each delta is one of: confirm existing rule with evidence, strengthen rule, weaken rule, add new pattern, remove unsupported pattern. Apply approved deltas. - -*** Privacy - -Email and Slack content is private. The corpus must NOT enter any commit unless rulesets stays on the private cjennings.net remote (which it does today). If a future move to a public remote is on the table, the corpus and any direct quotes have to go before that happens. The profile doc itself can stay (it's analysis, not raw content), but cite by pattern not by verbatim quote. - -*** Why this matters - -The voice skill earns its place when Craig sees the rewrite and recognizes it as his own voice rather than a "clean" AI voice that approximates him. Today the skill catches common AI tells (em-dashes, semicolons, the felt-experience tic), which is useful. Corpus-grounding would make it catch the absence of *Craig-specific positive traits* — the phrasings he actually reaches for — not just the AI traits he doesn't. - -Likely improves =/voice personal= output quality on PR bodies, commit messages, and email drafts. Compound interest over the long run. -** DONE [#C] Wide org-table handling — helper/lint/standard :spec: -CLOSED: [2026-06-11 Thu] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-11 -:END: -The org-table standard keeps project-doc tables <=120 cols with multi-line wrapped cells and a rule between rows, but nothing enforces it and hand-wrapping a wide cell into multi-row form is tedious and error-prone. Decide among: (a) a helper that auto-wraps a wide table into multi-row cells at a target width, (b) a lint check that flags tables over the width budget, (c) tighten the written standard with a worked before/after example. Likely some combination. A worked before/after example exists in a work-project prep doc (a 6-col table reformatted by hand to a 4-col multi-row-cell version), to be reproduced generically when this lands. - -Out of a work-project handoff 2026-06-09. - -Resolution 2026-06-11: all three shipped. (c) The standard, generalized from the work project's notes.org local copy, is now claude-rules/org-tables.md (globally loaded; render-width semantics — links measure at their visible label, never split a link) with the worked wrapped-table example. (a) .ai/scripts/wrap-org-table.el reflows tables mechanically: render-width measurement, link-atomic tokenizing, column shrink-to-floor allocation, continuation rows, rules between logical rows; idempotent (rule-delimited continuation groups merge back before re-wrapping); 23 ERT tests. (b) lint-org.el gained an org-table-standard judgment check (width overruns, missing rules; conformant wrapped tables not false-flagged); 5 new ERT tests, 32 total. Verified end-to-end on a demo file: 150-col table reflowed to budget, idempotent second pass, lint clean on the result. -** DONE [#C] SessionStart-on-clear hook for auto-resume :feature: -CLOSED: [2026-06-11 Thu] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-11 -:END: -Add a SessionStart hook (matcher: clear) in settings.json that auto-injects "read .ai/session-context.org and resume if present, else run startup.org". Today /flush prompts the user to /clear and the next session relies on the model re-reading session-context; the hook makes resume automatic on /clear. Keep full startup.org for genuine fresh starts (new day, other machine, been away). Likely lands as claude-templates workflow notes plus the hook in settings.json. - -The checkpoint+resume halves already shipped as /flush. This is the remaining automation piece. Out of a work-project handoff 2026-06-09 (process tooling, belongs in rulesets not the work project). - -Resolution 2026-06-11: the hook itself had already shipped 2026-06-02 (hooks/session-clear-resume.sh + the SessionStart clear entry in the tracked settings.json — this task duplicated it). What was actually broken: make install didn't cover hooks, so the symlink never reached machines that hadn't run make install-hooks by hand, and the hook errored silently on every /clear. Fixed by folding default-hook linking into make install (startup's Phase A.0 now propagates hooks machine-wide), with bats coverage in scripts/tests/install-hooks-link.bats. Both hook branches verified on ratio; the live /clear fire is a one-keystroke manual test. -*** TODO Manual testing and validation :test: -**** /clear mid-session resumes from the anchor -What we're verifying: the SessionStart(clear) hook fires and the fresh context resumes instead of cold-starting. -- In any project session with a live .ai/session-context.org (this rulesets session qualifies), type /clear -- Send any short message (the injected context loads but the model waits for your next keystroke) -Expected: the reply starts with "flushed." on its own line, restates the Active Goal and immediate Next Step, and does NOT run the startup workflow. -** DONE New personal projects are home regroupings — no mechanism needed -CLOSED: [2026-06-12 Fri] -Craig's call (2026-06-12): new personal projects will live in home, and there's no project-creation mechanism to build — he'll be working in home and simply decide to group some things differently. Nothing to do. - -Concurrence, verified: no template doc directs new personal work into ~/projects (first-session.org, install-ai.sh, and the README carry no such guidance; the only ~/projects references are discovery-root scans, which home and work still need). The situation as it stands: a new personal "project" is an area dir plus tasks inside home's existing =.ai/= machinery, no bootstrap step; =first-session.org= remains the bootstrap for standalone code projects in ~/code, unchanged and correct; "launch finances"-style trigger phrases for folded names degrade politely to the no-match candidate list, worth work only if real friction shows up. - -** DONE [#C] Build =/update-skills= skill for keeping forks in sync with upstream :feature: -CLOSED: [2026-06-11 Thu] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-10 -:END: - -The rulesets repo has a growing set of forks (=arch-decide= from -wshobson/agents, =playwright-js= from lackeyjb/playwright-skill, =playwright-py= -from anthropics/skills/webapp-testing). Over time, upstream releases fixes, -new templates, or scope expansions that we'd want to pull in without losing -our local modifications. A skill should handle this deliberately rather than -by manual re-cloning. - -Shipped 2026-06-11: [[file:.claude/commands/update-skills.md][/update-skills command]] + [[file:scripts/update-skills.py][helper script]] (17 bats tests) + three bootstrapped manifests under [[file:upstreams/][upstreams/]]. The first real upstream drift will exercise the interactive per-file/per-hunk flow end to end; the merge mechanics are covered by the test suite. - -*** 2026-06-11 Thu @ 17:05:28 -0500 Specification written as the shipped artifacts -The command doc ([[file:.claude/commands/update-skills.md][update-skills.md]]) carries the user-facing spec: discovery, classification statuses, the per-file confirmation and per-hunk conflict flow, mark-synced semantics, and the missing-baseline fallback. The script's module docstring specifies the manifest schema. Two deviations from the 2026-05-16 design, with reasons: manifests live centrally at =upstreams/<name>/= instead of per-skill =.skill-upstream= dotfile dirs (arch-decide became two flat files in =commands/= and can't carry one — a =files= rename map covers it); baselines were seeded from the 2026-06-11 upstream HEADs since the true fork-point commits are unrecoverable, so pre-existing local modifications classify as =local-only= going forward. - -*** 2026-05-16 Sat @ 01:14:20 -0500 original goals and decisions -**** Design decisions (agreed) - -- *Upstream tracking:* per-fork manifest =.skill-upstream= (YAML or JSON): - - =url= (GitHub URL) - - =ref= (branch or tag) - - =subpath= (path inside the upstream repo when it's a monorepo) - - =last_synced_commit= (updated on successful sync) -- *Local modifications:* 3-way merge. Requires a pristine baseline snapshot of - the upstream-at-time-of-fork. Store under =.skill-upstream/baseline/= or - similar; committed to the rulesets repo so the merge base is reproducible. -- *Apply changes:* skill edits files directly with per-file confirmation. -- *Conflict policy:* per-hunk prompt inside the skill. When a 3-way merge - produces a conflict, the skill walks each conflicting hunk and asks Craig: - keep-local / take-upstream / both / skip. Editor-independent; works on - machines where Emacs isn't available. Fallback when baseline is missing - or corrupt (can't run 3-way merge): write =.local=, =.upstream=, - =.baseline= files side-by-side and surface as manual review. - -**** V1 Scope - -- [ ] Skill at =~/code/rulesets/update-skills/= -- [ ] Discovery: scan sibling skill dirs for =.skill-upstream= manifests -- [ ] Helper script (bash or python) to: - - Clone each upstream at =ref= shallowly into =/tmp/= - - Compare current skill state vs latest upstream vs stored baseline - - Classify each file: =unchanged= / =upstream-only= / =local-only= / =both-changed= - - For =both-changed=: run =git merge-file --stdout <local> <baseline> <upstream>=; - if clean, write result directly; if conflicts, parse the conflict-marker - output and feed each hunk into the per-hunk prompt loop -- [ ] Per-hunk prompt loop: - - Show base / local / upstream side-by-side for each conflicting hunk - - Ask: keep-local / take-upstream / both (concatenate) / skip (leave marker) - - Assemble resolved hunks into the final file content -- [ ] Per-fork summary output with file-level classification table -- [ ] Per-file confirmation flow (yes / no / show-diff) BEFORE per-hunk loop -- [ ] On successful sync: update =last_synced_commit= in the manifest -- [ ] =--dry-run= to preview without writing - -**** V2+ (deferred) - -- [ ] Track upstream *releases* (tags) not just branches, so skill can propose - "upgrade from v1.2 to v1.3" with release notes pulled in -- [ ] Generate patch files as an alternative apply method (for users who prefer - =git apply= / =patch= over in-place edits) -- [ ] Non-interactive mode (=--non-interactive= / CI): skip conflict resolution, - emit side-by-side files for later manual review -- [ ] Auto-run on a schedule via Claude Code background agent -- [ ] Summary of aggregate upstream activity across all forks (which forks have - upstream changes waiting, which don't) -- [ ] Optional editor integration: on machines with Emacs, offer - =M-x smerge-ediff= as an alternate path for users who prefer ediff over - per-hunk prompts - -**** Initial forks to enumerate (for manifest bootstrap) - -- [ ] =arch-decide= → =wshobson/agents= :: =plugins/documentation-generation/skills/architecture-decision-records= :: MIT -- [ ] =playwright-js= → =lackeyjb/playwright-skill= :: =skills/playwright-skill= :: MIT -- [ ] =playwright-py= → =anthropics/skills= :: =skills/webapp-testing= :: Apache-2.0 - -**** Open questions - -- [ ] What happens when upstream *renames* a file we fork? Skill would see - "file gone from upstream, still present locally" — drop, keep, or prompt? -- [ ] What happens when upstream splits into multiple forks (e.g., a plugin - reshuffles its structure)? Probably out of scope for v1; manual migration. -- [ ] Rate-limit / offline mode: if GitHub is unreachable, should skill fail - or degrade gracefully? Likely degrade; print warning per fork. - -** DONE [#C] Monthly session-harvest workflow :feature: -CLOSED: [2026-06-11 Thu] -:PROPERTIES: -:CREATED: [2026-06-11 Thu] -:LAST_REVIEWED: 2026-06-11 -:END: -A monthly pass over recent =.ai/sessions/= summaries across projects proposing promotion candidates: patterns for the catalog, durable facts for the KB, rule refinements, workflow learnings. Sibling cadence to the roam-hygiene timer; a workflow run on schedule, not a standing agent. From the 2026-06-11 insights report's "Canonical-Aware Knowledge & Workflow Curator" — the capture/promote machinery exists (pattern catalog, /codify, KB); this adds the mining cadence. - -Shipped 2026-06-11 as [[file:.ai/workflows/session-harvest.org][session-harvest.org]] (template + INDEX entry): five phases, four promotion lanes, /codify-grade gates + work-confidentiality scrub, =:LAST_HARVEST:= marker in notes.org, and the KB receipt-line metrics readout for the ~2026-07-10 checkpoint. Window filter reads session-filename date prefixes (mtime proved unreliable in a live test). First run due ~2026-07-11. - -** CANCELLED [#B] todo-cleanup.el per-area Open Work / Resolved pairs :feature: -CLOSED: [2026-06-11 Thu] -=--archive-done= assumes exactly one level-1 "Open Work" and one "Resolved" heading per todo.org. Home's consolidated file briefly carried per-area pairs and the pass skipped. Filed from home's 2026-06-11 addendum, then held the same evening when Craig flagged that he expected a single pair. - -Cancelled 2026-06-11: Craig confirmed the decision — one todo queue with a single Open Work / Resolved pair. Home reshapes its consolidated file to that form, and the existing single-pair tooling works unmodified. No code change needed. - -** CANCELLED [#D] todo-cleanup =--archive-done= reports 0 moves while moving subtrees :bug: -CLOSED: [2026-06-12 Fri] -:PROPERTIES: -:CREATED: [2026-06-12 Fri] -:END: -Observed at the 2026-06-12 wrap: the pass relocated closed subtrees from Open Work to Resolved while printing "todo-cleanup --archive-done: 0 subtree(s) moved". - -CANCELLED 2026-06-12 — cannot reproduce. =todo-cleanup.el= is unchanged since the wrap that logged this, and =tc-archived= is incremented inline with each move and read straight in the report, so no move can go uncounted. Running the exact pre-archive state (=b6d286f:todo.org=) through the tool reports the right count (3 moved, all listed). The "0 moved" was a correct second-run report: =open-tasks.org= Phase A runs =--archive-done= after wrap-it-up already archived, so the second pass finds nothing to move and prints 0 next to the first pass's git diff. Not a code defect. -** DONE [#C] Session title hostname-project, no space :feature:quick: -CLOSED: [2026-06-13 Sat] -:PROPERTIES: -:CREATED: [2026-06-13 Sat] -:LAST_REVIEWED: 2026-06-13 -:END: -Routed from the roam global inbox via inbox-zero 2026-06-13. The SessionStart hook (=hooks/session-title.sh=) emitted =<host> <project>= with a space; Craig wanted =<host>-<project>= with a hyphen and no space. Changed the =sessionTitle= join to ="$host-$project"= plus the header comments, and updated the three =session-title-hook.bats= expectations (test-first; 6/6 green). -** DONE [#B] ~/.dotfiles discovery added to ai launcher; bootstrapped on velox -CLOSED: [2026-06-20 Sat] -Craig reported =~/.dotfiles= missing from the launcher picker. Two root causes, both fixed: (1) applied the parked one-liner =maybe_add_candidate "$HOME/.dotfiles"= in =build_candidates()= (=claude-templates/bin/ai=, after the =~/.emacs.d= line); (2) =~/.dotfiles/.ai/= was absent on velox — the 2026-06-16 bootstrap was on another machine and =.ai/= is gitignored, so it never traveled — re-bootstrapped via =install-ai.sh --gitignore ~/.dotfiles=. - -Verified end-to-end: =build_candidates()= now lists =~/.dotfiles= (protocols.org guard passes). sync-check clean (bin/ai is single-canonical, no mirror). By-name launch =ai ~/.dotfiles= already worked via single_mode's marker-only check. working/ai-dotfiles-discovery/ staging dir removed. -** DONE Phase E spec'd — folded into the autonomous-batch spec -CLOSED: [2026-06-16 Tue] -:PROPERTIES: -:CREATED: [2026-06-16 Tue] -:END: -Craig's answer (2026-06-16): spec it. Phase E reconciles with the "fix speedrun" proposal into one feature — see [[file:docs/design/2026-06-16-autonomous-batch-execution-spec.org][the autonomous-batch execution spec]]: a dedicated =work-the-backlog.org= holds the execution loop, inbox-zero keeps its A-D routing, and "fix speedrun" is a thin preset over the same loop. The prepared Phase E change stays under [[file:working/inbox-zero-phase-e/]] as a source. Tracked from here under the "fix speedrun" / autonomous-batch task below, where the spec-review VERIFY lives. -** DONE [#C] Encourage org-roam KB contribution across workflows :feature: -CLOSED: [2026-06-20 Sat] -:PROPERTIES: -:CREATED: [2026-06-16 Tue] -:END: -From the roam global inbox (Craig, 2026-06-16). Encourage agents to keep durable, strategic knowledge in the org-roam KB so it compounds into a cross-project asset: -- Curate a best-practices node (good note-taking + org-roam practices, drawing on established advice) and link it from =startup.org= with encouragement to contribute through the session. -- Add a reminder at the end of =triage-intake.org= and =inbox-zero.org= to store strategic / durable / useful info in the KB. -- Add an early =wrap-it-up.org= prompt asking the agent what it learned worth remembering, then to write it to the KB before proceeding. -Touches four synced template workflows and needs a curation pass on the best-practices content, so it's a design task — not a loop auto-implement. Filed from a =:next:=-tagged roam item; the eligibility tag was dropped on filing because the work needs a design decision (see the loop guardrail). Pairs with [[file:claude-rules/knowledge-base.md]] and the agent-knowledge-base spec. - -*** 2026-06-16 Tue @ 00:53:36 -0500 Spec written for review -Drafted [[file:docs/design/2026-06-16-encourage-kb-contribution-spec.org][the KB-contribution spec]]: four light workflow prompts (startup nudge, triage-intake + inbox-zero end-of-flow reminders, an early wrap-up reflection feeding the existing KB receipt) plus one Craig-authored best-practices node curated from Ahrens / Matuschak / org-roam guidance. Five open sub-decisions filed as decisions-as-TODO in the spec. -*** 2026-06-20 Sat @ 23:29:10 -0400 Spec ratified + built -Craig ratified all five decisions (2026-06-20) and added D6 — a read-side startup consult-nudge surfacing project-relevant KB node titles, the counterpart the original write-only design lacked. Built all of it: the best-practices node (=~/org/roam/agents/20260620232112-agent-kb-best-practices.org=), startup's two Phase C nudges (consult + contribute, gated on the roam clone), the conditional capture reminders in triage-intake + inbox-zero, and the early wrap-up reflection feeding the existing receipt. Commits 76e5559 (workflows + spec) and the related lint checker f6dde4e. Trigger for the build: receipt data showed "promoted 0 / consulted no" across recent sessions. ** DONE [#C] Bash/shell language bundle :feature: CLOSED: [2026-06-23 Tue] :PROPERTIES: @@ -2969,7 +1373,7 @@ CLOSED: [2026-06-23 Tue] :PROPERTIES: :CREATED: [2026-06-23 Tue] :END: -Built per the Ready spec: =process-inbox= + =monitor-inbox= + =inbox-zero= merged into one =inbox.org= engine (shared core + process/monitor/roam modes + the interactive =auto inbox zero= =/loop= mode); =triage-intake= and =no-approvals= stay separate. Callers repointed (INDEX, protocols, startup Phase C, wrap-up Step 3), old files deleted, stale-ref grep clean, workflow-integrity + sync-check + full suite green. The fully-unattended =/schedule= cron pass is vNext (see the =[#D]= task above). [[file:docs/inbox-workflow-consolidation-spec.org][spec]]. +Built per the Ready spec: =process-inbox= + =monitor-inbox= + =inbox-zero= merged into one =inbox.org= engine (shared core + process/monitor/roam modes + the interactive =auto inbox zero= =/loop= mode); =triage-intake= and =no-approvals= stay separate. Callers repointed (INDEX, protocols, startup Phase C, wrap-up Step 3), old files deleted, stale-ref grep clean, workflow-integrity + sync-check + full suite green. The fully-unattended =/schedule= cron pass is vNext (see the =[#D]= task above). [[file:docs/specs/inbox-workflow-consolidation-spec.org][spec]]. ** DONE [#C] inbox-zero: delete empty roam entries on triage :feature: CLOSED: [2026-06-23 Tue] :PROPERTIES: |
