diff options
Diffstat (limited to 'docs/specs')
| -rw-r--r-- | docs/specs/google-keep-emacs-integration-spec.org | 220 | ||||
| -rw-r--r-- | docs/specs/gptel-gh-tool-spec.org | 1065 | ||||
| -rw-r--r-- | docs/specs/gptel-git-tools-magit-backend-spec.org | 196 | ||||
| -rw-r--r-- | docs/specs/gptel-network-tools-spec.org | 411 | ||||
| -rw-r--r-- | docs/specs/mcp-el-gptel-integration-spec-doing.org | 1438 | ||||
| -rw-r--r-- | docs/specs/messenger-unification-spec.org | 138 | ||||
| -rw-r--r-- | docs/specs/theme-studio-completion-preview-spec.org | 211 | ||||
| -rw-r--r-- | docs/specs/theme-studio-nerd-icons-colors-spec.org | 380 | ||||
| -rw-r--r-- | docs/specs/theme-studio-preview-locate-spec.org | 271 |
9 files changed, 1186 insertions, 3144 deletions
diff --git a/docs/specs/google-keep-emacs-integration-spec.org b/docs/specs/google-keep-emacs-integration-spec.org new file mode 100644 index 000000000..376522ab4 --- /dev/null +++ b/docs/specs/google-keep-emacs-integration-spec.org @@ -0,0 +1,220 @@ +#+TITLE: Google Keep <-> Emacs integration — Spec +#+AUTHOR: Craig Jennings & Claude +#+DATE: 2026-06-24 +#+TODO: TODO | DONE SUPERSEDED CANCELLED + +* Metadata +| Status | v1 implemented (Phases 1-3); live setup pending; v2 next | +|----------+------------------------------------------------------------| +| Owner | Craig | +|----------+------------------------------------------------------------| +| Reviewer | Codex (spec-review) | +|----------+------------------------------------------------------------| +| Related | [[file:../../todo.org][todo.org: google-keep in-editor integration]] | + +* Problem / Context + +Craig keeps quick notes in Google Keep but works almost entirely in Emacs. Today, reading or acting on a Keep note means leaving Emacs for the phone or the web app — a context switch for content that wants to live next to his org files. He wants Keep notes native to Emacs: browsable, searchable, greppable, and editable, with a path to publish the result as a standalone package. + +Two hard constraints shape every choice: + +1. *No official API.* Google Keep has no public API. Every live client (gkeepapi and the tools built on it) reverse-engineers the private mobile endpoint. That layer is fragile: it breaks when Google changes auth, and it needs a Google master token, not a password. +2. *The existing MCP is agent-only.* Craig already has a google-keep MCP with full read/write (create/update/find/labels/archive/list-items). But MCP tools are invoked by the *agent* (Claude), not from elisp — there is no elisp MCP client. So the in-editor integration cannot reuse the MCP as its data path; it needs its own. + +Researched 2026-06-24: there is no in-editor Emacs Google Keep package on MELPA or GitHub — only KeepToOrg, a one-shot Takeout-HTML-to-org importer (unmaintained). So this is a build, not an adopt. + +The closest prior art in this config is =calendar-sync.el=: an existing engine that fetches external data and writes generated org files under =data/=. This spec follows its conventions (generated file under =data/=, atomic temp-then-rename writes, async =make-process= + sentinel, =auth-source= via the house helper) rather than reinventing them. + +* Goals and Non-Goals +** Goals +- Keep notes visible and usable inside Emacs without leaving the editor. +- An org-native representation, so notes are searchable/greppable and reuse org machinery. +- A structure that starts as glue in =.emacs.d= and can be extracted to a publishable package (the VAMP / pearl module-to-package pattern). +- Read-write (create/edit notes from Emacs) as the immediate v2 increment — v1 ships read-only first, but the read path is built to carry write, so v2 is additive. +** Non-Goals +- Full bidirectional offline sync, conflict resolution, or real-time updates. +- Faithful round-tripping of every Keep feature (list checkboxes, collaborators, drawings, images). +- Reusing the MCP from elisp (infeasible — agent-only). +** Scope tiers +- v1: read-only. Fetch Keep notes through the gkeepapi bridge and render them as an org page (each note an org header). A manual refresh command. Auth via auth-source. Graceful degradation when the bridge or credentials are missing. The read path establishes a round-trip-ready data model and a stable per-note identity, because v2 builds on them. +- v2 (immediate follow-on, not deferred): read-write — create a note from a region or capture, and edit a note back to Keep. Reuses v1's bridge, auth, data model, and note-identity; adds the inverse direction and a staleness check. +- Out of scope: list/checkbox fidelity, collaborators, drawings/images, and any background or real-time sync. +- Later: list/checkbox rendering, and extracting the core to a standalone package. + +* Design + +** For the user + +A command (=cj/keep-refresh=) pulls the current Keep notes and writes them into one generated org file (=data/keep.org= under =user-emacs-directory=, a constant in =user-constants.el= alongside =gcal-file= — a generated file, not a hand-authored one in =~/org/=). Each note becomes a top-level org heading: the title (or a derived title) as the heading text, the note body as the entry, and Keep metadata as properties — labels as org tags, plus =:KEEP_ID:=, =:PINNED:=, =:COLOR:=, =:ARCHIVED:=, =:UPDATED:= in a drawer. Pinned notes sort first; each header carries an org link back to the source note (=https://keep.google.com/#NOTE/<id>=). The file is plain org, so it is searchable with the agenda, greppable, and linkable. Opening it is just visiting the file; a keybinding and a dashboard entry make it one keystroke. v1 is read-only: editing the org file does not push back to Keep (a header line says so), so there is no accidental-mutation risk while the integration is young. The =:KEEP_ID:= and =:UPDATED:= on each header are what v2 later uses to target an update and detect a stale local copy. + +** For the implementer + +Three layers, cleanly separable so the core can later be a package: + +1. *Data bridge (Python).* A small script using gkeepapi: authenticate with a stored master token, fetch notes, emit JSON on stdout per the schema below. =id= is the gkeepapi server id (immutable across title/content edits — the safe v2 targeting key, never derived from content), and =updated= is the freshness anchor v2's staleness guard reads. This is the one place the unofficial API lives, isolated so a break is contained and swappable; the same bridge gains a write subcommand in v2. On failure it exits non-zero and prints a single reason token on stderr (=no-gkeepapi=, =no-token=, =auth-failed=, =network=) so the elisp sentinel can surface which piece broke. +2. *Org renderer (elisp core).* Runs the bridge with =make-process= + a sentinel (async, so a slow or hung auth never blocks the interactive thread — the calendar-sync pattern), parses its JSON, and writes the org page via a temp file + atomic rename (never a partial =keep.org= under the user's eyes). Records =:KEEP_ID:=/=:UPDATED:= per note, labels as tags, pinned-first, archived notes tagged =:archived:=. =cj/keep-refresh= is the entry point. The master token is read with =cj/auth-source-secret-value= (system-lib.el, the house helper calendar-sync/slack/transcription use) against a documented =:host= entry. The JSON-to-org transform is the round-trip contract v2 inverts. This core takes its file path, auth host, and keymap injected from the glue layer — it never reaches into =user-constants= or binds keys itself, so it lifts out cleanly as a package. +3. *Glue (=modules/google-keep-config.el=).* Supplies the core its =data/keep.org= path, the =:host=, a Keep keybinding prefix, the dashboard entry, and the =(require 'google-keep-config)= in =init.el=. A load-time =cj/executable-find-or-warn= for =python3= warns early when the interpreter is absent (the gkeepapi-missing and token-missing cases surface at refresh time via the bridge's stderr reason token). + +** Bridge JSON schema (the load-bearing seam) + +Both the v1 transform and the v2 inverse are written against this, so it is pinned here. On success the bridge prints a JSON array of note objects: + +#+begin_src js +[ { "id": "<gkeepapi server id, string>", + "title": "<string, may be empty>", + "text": "<string; newlines are real \n inside the JSON string>", + "labels": ["<label>", ...], // [] when none + "pinned": true|false, + "archived": true|false, + "color": "<keep color name, e.g. \"WHITE\">", + "updated": "<ISO8601 UTC, e.g. 2026-06-25T04:12:00Z>" } ] +#+end_src + +- =updated= is ISO8601 UTC — sortable, parseable by =parse-iso8601-time-string=, and readable in the drawer. v2's staleness compare depends on this exact, comparable form. +- An empty Keep returns =[]= (a valid empty page, not an error). +- =labels= is always an array (empty, never null). =text= newlines are real =\n= inside the JSON string (standard JSON string encoding). +- Failure is not expressed in JSON: the bridge exits non-zero with one stderr reason token (above). The sentinel maps the token to a =display-warning=. + +* Alternatives Considered + +** A — Takeout import (one-shot HTML -> org) +- Good, because no auth, fully offline, dead simple, and zero ongoing breakage risk. +- Bad, because it is not live — Craig must manually export a Takeout archive, so notes are stale the moment they are imported, and it cannot write, so it is useless for the v2 write path. +- Neutral, because it is the right tool for an archival snapshot, the wrong one for daily use. Deferred — not built in v1; a possible later no-auth archive importer if ever wanted. + +** B (chosen for the data path) — gkeepapi via a Python subprocess bridge +- Good, because it is the only path that gives live notes and (in v2) write-back from inside Emacs, with the full note model. +- Bad, because gkeepapi reverse-engineers a private API: it breaks on Google auth changes, needs Python plus a stored master token, and the bridge is glue Craig owns and maintains. +- Neutral, because the fragility is isolated to one script; when it breaks, the renderer degrades to a warning. + +** C — Reuse the google-keep MCP from elisp +- Good, because the MCP already has full read/write and is maintained outside the config. +- Bad, because MCP tools are invoked by the agent, not elisp — there is no elisp MCP client, so this is infeasible for an in-editor feature. +- Neutral, because the MCP stays the right tool for agent-driven Keep access; it just can't power an in-editor integration. + +** D — A local HTTP server wrapping gkeepapi +- Good, because elisp would talk clean HTTP instead of spawning a subprocess each refresh. +- Bad, because a long-running personal server is more infrastructure than a single-user note view warrants. +- Neutral, because it is a heavier variant of B; revisit only if subprocess latency ever bites. + +* Decisions [5/5] + +** DONE Presentation shape: org page of headers +- Decision: We will render notes as one *org page*, each note a top-level header (title + body + metadata properties, labels as tags). Confirmed by Craig 2026-06-25. +- Consequences: easier — reuses org search/agenda/grep/links, nothing bespoke to render, and read-only is safe. Harder — a large Keep collection makes a long file (mitigated by pinned-first sort and org folding). + +** DONE Direction: read-only v1, read-write v2 (immediate) +- Decision: We will ship v1 read-only (fetch + render + refresh), then move directly into v2 read-write (create/edit back to Keep). v2 is the immediate next increment, not a deferred someday. The read path must establish a round-trip-ready data model and a stable per-note identity up front, so v2 is additive rather than a rewrite. Confirmed by Craig 2026-06-25. +- Consequences: easier — v1 ships the visible value fast on a path write reuses wholesale, and a parse bug can't corrupt real Keep data while the model is being proven. Harder — the read path carries design weight it wouldn't if write were truly far off (note-identity and the freshness field have to be right in v1), and the write work — note targeting, staleness/overwrite handling — still has to be built right after. + +** DONE Data path: gkeepapi subprocess bridge (Takeout deferred) +- Decision: We will use a small Python gkeepapi bridge that emits JSON as the sole data path; it powers read in v1 and gains a write subcommand in v2. A failure degrades to a clear warning. The Takeout-import path is deferred (read-only and stale, so it can't serve v2). The MCP is not in the data path. Confirmed by Craig 2026-06-25. +- Consequences: easier — one live path that serves both read and write, fragility isolated to one swappable script. Harder — a Python + gkeepapi dependency and a maintained bridge; the unofficial API can break and needs a master token, with no second read path as a safety net (the warning is the degradation). + +** DONE Auth: master token in authinfo.gpg via auth-source +- Decision: We will store the master token in =authinfo.gpg= and read it via =auth-source= (with =cj/auth-source-secret-value=, the house helper), and document the one-time master-token retrieval. Confirmed by Craig 2026-06-25. +- Consequences: easier — consistent with existing credential handling, no plaintext secret, the daemon's auth-source cache applies. Harder — the one-time token retrieval is a manual setup step, and a revoked/expired token surfaces as an auth failure the renderer must report cleanly. + +** DONE Structure: google-keep-config.el glue + extractable core +- Decision: We will build the integration as =modules/google-keep-config.el= (the =.emacs.d= glue: paths, keys, dashboard, auth wiring) plus a self-contained core (the bridge runner + org renderer) written so the core can later move to a standalone =keep.el=-style package, mirroring the VAMP / pearl migration. The core takes paths/keys/host injected from the glue — it never reaches into =user-constants= or binds keys itself. Confirmed by Craig 2026-06-25. +- Consequences: easier — usable immediately in =.emacs.d=, with a clean seam for later extraction. Harder — the discipline of keeping the core free of =.emacs.d=-specific assumptions from the start. + +* Review findings [8/8] +** DONE Generated-file path + atomic write (round 1, non-blocking) +- Finding: the draft put the file at =~/org/keep.org= (hand-authored org area) with no atomic write, diverging from calendar-sync (generated org under =data/=, temp-then-rename). Risk: a partial write leaves a truncated file the user is viewing, and a generated file in =~/org/= invites edits that get overwritten. +- Resolution: path is =data/keep.org= as a =user-constants.el= constant; the renderer writes via temp file + atomic rename. + +** DONE Async subprocess via make-process + sentinel (round 1, non-blocking) +- Finding: the draft said "run as a subprocess" without committing to async; gkeepapi can hang on auth churn, and a synchronous =call-process= would freeze Emacs. +- Resolution: specified =make-process= + sentinel (the calendar-sync pattern); the warning-on-failure degradation lives in the sentinel. + +** DONE Name the auth-source house helper (round 1, non-blocking) +- Finding: the draft read the token via "auth-source" generically; the house helper is =cj/auth-source-secret-value= (system-lib.el). +- Resolution: spec now calls it out with a documented =:host= entry. + +** DONE Name the missing-tool helper (round 1, non-blocking) +- Finding: the draft promised a display-warning for a missing tool but didn't cite =cj/executable-find-or-warn=, and didn't separate the gkeepapi-missing vs token-missing failure modes. +- Resolution: glue uses =cj/executable-find-or-warn= for =python3=; the bridge emits distinct stderr reason tokens the sentinel surfaces. + +** DONE Pin the bridge JSON schema (round 1, BLOCKING — cleared) +- Finding: Phase 1 listed JSON fields but left the =updated= format, label/newline serialization, empty-result, and error envelope unstated — the round-trip contract both v1 and v2's staleness guard depend on. +- Resolution: added the "Bridge JSON schema" subsection — ISO8601-UTC =updated=, always-array =labels=, standard JSON newline encoding, =[]= for empty, and exit-nonzero + stderr-token for failure. + +** DONE v2 staleness guard re-fetches before write (round 1, non-blocking) +- Finding: v2 must not trust the possibly-stale =:UPDATED:= in the generated file; a phone edit between refresh and edit-back would be clobbered. +- Resolution: Phase 4 states the write path re-fetches the target note's current =updated= from Keep and compares before overwriting, prompting on mismatch. + +** DONE Note KEEP_ID is the immutable server id (round 1, non-blocking) +- Finding: "stable" id asserted but not anchored. +- Resolution: spec states =:KEEP_ID:= is the gkeepapi server id, immutable across edits, never derived from content. + +** DONE Small enhancements: source link + archived handling (round 1, non-blocking, optional) +- Finding: a back-link to the Keep web note and archived-note handling were free given the data already carried. +- Resolution: each header links to =keep.google.com/#NOTE/<id>=; archived notes are tagged =:archived:= (the JSON carries =archived=). + +* Implementation phases + +** Phase 1 — Data bridge +A Python script (gkeepapi) that authenticates with the stored master token and prints the JSON array per the Bridge JSON schema on stdout (=id= = immutable server id, =updated= = ISO8601 UTC). On failure: exit non-zero with one stderr reason token. Standalone and testable from the shell with a fixture; no Emacs yet. Tree stays working (new files only). + +** Phase 2 — Org renderer + refresh +The elisp core + =modules/google-keep-config.el=: run the bridge with =make-process= + a sentinel, parse the JSON, and write =data/keep.org= via temp file + atomic rename (heading + body + property drawer per note, recording =:KEEP_ID:= and =:UPDATED:=, labels as tags, archived tagged =:archived:=, pinned-first, a source back-link per header). =cj/keep-refresh= regenerates it; auth via =cj/auth-source-secret-value=. A header line marks the file a read-only view. The sentinel maps a bridge failure (stderr token) to a =display-warning= naming the missing piece — never errors at load. + +** Phase 3 — Access UX + un-orphan +Keybindings (a Keep prefix), a dashboard entry, a load-time =cj/executable-find-or-warn= for =python3=, and the =(require 'google-keep-config)= in =init.el=. Optional: a dedicated read-only major mode for the buffer. + +** Phase 4 (v2) — read-write +The immediate next increment after v1 lands. A write subcommand on the bridge (gkeepapi create/update), and an elisp path that targets a note by =:KEEP_ID:= and, before overwriting, re-fetches that note's current =updated= from Keep and compares it against the drawer's =:UPDATED:= (a staleness guard — abort/prompt on mismatch so a phone edit since the last refresh isn't clobbered). Entry points to create a note from a region/capture and edit one back. Specced in detail once v1's read model is proven on real notes; listed here so Phases 1-3 don't paint into a read-only corner. + +(Later, not specced here, logged to todo.org: list/checkbox rendering; extract the core to a package.) + +* Acceptance criteria +- [ ] =cj/keep-refresh= fetches the current Keep notes and writes =data/keep.org= via temp + atomic rename, one header per note with title/body/labels/metadata and a source back-link. +- [ ] Pinned notes sort to the top; labels render as org tags; archived notes are tagged =:archived:=; Keep id/color/pinned/archived/updated land in a property drawer. +- [ ] Each note header carries an immutable =:KEEP_ID:= (gkeepapi server id) and an ISO8601 =:UPDATED:= (the v2 targeting + staleness anchors). +- [ ] The bridge runs async (=make-process= + sentinel); a slow/hung auth does not block Emacs. +- [ ] The master token is read via =cj/auth-source-secret-value= from =authinfo.gpg=; no secret is hardcoded. +- [ ] A missing python3 / gkeepapi / token (distinct bridge stderr tokens) produces a clear =display-warning=, not a load error or a crash; an empty Keep yields an empty page, not an error. +- [ ] =make validate-modules= + launch smoke clean with =google-keep-config= required. + +* Readiness dimensions +- Data model & ownership: Keep is the source of truth; =data/keep.org= is a generated read-only view (v1). Each note maps to one org header keyed by the immutable =:KEEP_ID:=; the bridge JSON (pinned schema) is the contract between Python and elisp, and the JSON-to-org transform is the round-trip contract v2 inverts. +- Errors, empty states & failure: auth failure, a broken gkeepapi, missing Python/token, or zero notes each degrade to a warning (named by the bridge's stderr token) or an empty page, never a crash. The unofficial API breaking is expected, not exceptional. +- Security & privacy: the master token lives in =authinfo.gpg= (gpg-encrypted), read via =cj/auth-source-secret-value=; note content lands in a local generated org file under =data/=. No secret in the repo. The token grants broad Google access — documented as a risk. +- Observability: the warning path names which piece is missing (python3 / gkeepapi / token / auth) from the bridge's stderr token. The generated page's header shows the last refresh time. +- Performance & scale: one async subprocess per manual refresh over N notes (tens to low hundreds); trivial. No background polling in v1. +- Reuse & lost opportunities: follows calendar-sync conventions (generated org under =data/=, atomic temp+rename, =make-process= + sentinel) and reuses =cj/auth-source-secret-value=, =cj/executable-find-or-warn=, org rendering, and the dashboard. gkeepapi supplies the API client, so no endpoint code is written here. +- Architecture fit & weak points: three layers (Python bridge / elisp core / glue) with the fragile API isolated in layer 1 and the core kept free of =.emacs.d= specifics for extraction. Weak point: gkeepapi maintenance and Google auth churn — mitigated by isolation, graceful degradation, and a swappable bridge. +- Config surface: the =data/keep.org= path constant, the auth-source =:host= entry, and a keybinding prefix. No tuning knobs in v1. +- Documentation plan: a setup note (one-time master-token retrieval, =pip install gkeepapi=, the authinfo entry) and the module commentary. No user-migration doc (personal config). +- Dev tooling: the bridge is shell-testable with a JSON fixture; the core gets ERT over the JSON-to-org transform (against the pinned schema); =make validate-modules= + launch smoke for the module. +- Rollout, compatibility & rollback: additive — a new module + a require. Rollback = drop the require and delete =data/keep.org=. No existing behavior changes. +- External APIs & deps: gkeepapi (PyPI) and the unofficial Google Keep mobile endpoint it wraps — the single load-bearing external dependency and the central risk. Python 3 on PATH. A Google master token. + +* Risks, Rabbit Holes, and Drawbacks +- The central risk is gkeepapi breaking when Google changes auth or the private endpoint. It has a history of auth churn. With Takeout deferred there is no second read path, so the mitigations are: isolate gkeepapi in the bridge, degrade to a clear warning (named by stderr token), keep the bridge swappable, and never block Emacs load on it. +- Credential risk: the master token grants broad account access. Keep it in =authinfo.gpg=, never the repo; document revocation. +- v2 staleness: because write follows immediately, the edit-back path must re-fetch the note's current =updated= from Keep before writing, not trust the generated file's =:UPDATED:= (which is only as fresh as the last refresh). Carrying =:UPDATED:= correctly in v1 is what makes that guard possible. +- Scope creep: the pull toward full bidirectional sync, list fidelity, and real-time. v1 is a read-only org view and v2 is targeted read-write; richness and sync stay later, gated behind those landing. + +* Review and iteration history +** 2026-06-24 Wed @ 22:40:00 -0400 — Claude — author +- What: initial draft. +- Why: Craig asked to spec the google-keep in-editor integration before building. It spans a fragile external API, an auth-source credential, a Python/elisp bridge, and a module-to-package trajectory, with real trade-offs on shape, direction, and data path — worth pinning before code. +- Artifacts: docs/specs/google-keep-emacs-integration-spec.org; the todo.org google-keep task. +** 2026-06-25 Thu @ 00:40:00 -0400 — Claude — author +- What: resolved all five decisions with Craig (one by one) and folded them in. Shape = org page (popup dropped, a separate later discussion). Direction = read-only v1 then read-write v2 immediately, so the read path now carries a round-trip-ready model + stable note-identity. Data path = gkeepapi sole bridge, Takeout deferred. Auth = authinfo.gpg via auth-source. Structure = extractable core + glue. Added Phase 4 (v2) and the v2-staleness risk. +- Why: decisions resolved, so the spec could go to spec-review. +- Artifacts: this spec. +** 2026-06-25 Thu @ 00:55:00 -0400 — spec-review (round 1) + author response +- What: independent review against the readiness gate and Phase 4 dimensions, reading calendar-sync.el as the analog. Eight findings (one blocking: the bridge JSON schema; seven non-blocking: data/ path + atomic write, async make-process, the two house helpers, v2 re-fetch, KEEP_ID server id, source-link + archived). Craig accepted all; folded in here. Added the Bridge JSON schema subsection, the calendar-sync-convention reuse, and the Review findings section. +- Why: clears the blocker and the readiness gaps; rubric moves to Ready. +- Artifacts: this spec (Review findings [8/8], Bridge JSON schema). +** 2026-06-25 Thu @ 01:05:00 -0400 — spec-review (round 2) — reviewer +- What: fresh readiness pass on the revised spec. Verified the round-1 resolutions are coherent (data/keep.org consistent throughout, the JSON schema is the single referenced contract, the async/sentinel + stderr-token story lines up across Design/Phase 2/acceptance). Confirmed cj/auth-source-secret-value (host &optional user) exists and is buildable as written, and that the data/keep.org constant matches user-constants.el's gcal-file form. No new findings. +- Why: confirms the spec is implementation-ready after two rounds. +- Artifacts: this spec. Verdict: Ready. +** 2026-06-25 Thu @ 01:30:00 -0400 — Claude — implementer +- What: built v1 (Phases 1-3) in a no-approvals speedrun. Phase 1: the gkeepapi bridge (scripts/google-keep/keep-bridge.py) emitting the pinned JSON schema, degrading with a stderr reason token; 12 Python tests. Phase 2: the elisp pure core (parse/tag/heading/render, extractable) + cj/keep-refresh (async make-process + sentinel, atomic temp-then-rename to keep-file, stderr-token to display-warning); 15 ERT tests. Phase 3: keep-file constant in user-constants.el, C-c k prefix (refresh/open), executable warning, required in init.el. validate-modules clean; no-token path degrades without error. +- Why: spec was Ready and Craig green-lit a no-approvals build of v1. +- Artifacts: scripts/google-keep/keep-bridge.py + test; modules/google-keep-config.el + tests/test-google-keep-config.el; user-constants.el; init.el. Remaining: the one-time gkeepapi + master-token setup (a VERIFY under todo.org Manual testing), then v2. diff --git a/docs/specs/gptel-gh-tool-spec.org b/docs/specs/gptel-gh-tool-spec.org deleted file mode 100644 index 80ecc0ab6..000000000 --- a/docs/specs/gptel-gh-tool-spec.org +++ /dev/null @@ -1,1065 +0,0 @@ -:PROPERTIES: -:ID: a124dd0f-1f40-4533-aeb8-595d93e20865 -:STATUS: not-started -:END: -#+TITLE: Design: Wrap the gh CLI as a GPTel tool -#+AUTHOR: Craig Jennings -#+DATE: 2026-05-16 -#+OPTIONS: toc:nil num:nil - -* Status - -Draft (revision 2). Pre-implementation; no code shipped yet. =gh= -v2.92.0 installed and authenticated against both =github.com= (account -=cjennings=) and =deepsat.ghe.com= (account =craig-jennings=). - -Revision 2 incorporates a code-review pass that caught several -incorrect =gh= CLI assumptions and a critical safety gap: with -=gptel-confirm-tool-calls= set to =nil= in =modules/ai-config.el:386=, -the "irreversible-only blocklist" V1 from revision 1 would let the -agent run unconfirmed writes (=pr merge=, =run cancel=, =api -f -body=...=, =release create=, etc.). Revision 2 keeps Craig's intent -of a general-capability tool but applies =:confirm t= to every -classified write, drops the "scan command string for =--hostname= -and =-H=" approach in favor of argv-list builders, and expands the -wrapper set so most agent workflows don't need to fall back to the -general tool. - -* Problem - -GPTel agents can read git history locally (=git_log=, =git_diff=, -=git_status=) but have no access to the GitHub side of the workflow: -PRs, issues, reviews, CI runs, releases, gists, repo metadata. The -=gh= CLI does all of this and is already authenticated against both -hosts the user works with. Wrapping it as a GPTel tool gives the -agent the same GitHub surface the user has. - -The wrapper has to handle four complications the local git tools -don't: - -1. *Two authenticated hosts* -- personal (=github.com=) and work - GHE (=deepsat.ghe.com=) -- with different policies and different - blast radii for destructive operations. -2. *A vast subcommand surface* (~30 top-level subcommands, hundreds - of leaves) that doesn't fit cleanly into one typed schema per - leaf. -3. *Inconsistent flag semantics across subcommands.* =--hostname= - exists only on =gh api= and =gh auth status=; the rest use - =--repo [HOST/]OWNER/REPO=. =-H= means =--head= on =gh pr list= - but =--header= on =gh api=. Naive string-scanning to detect - user intent is wrong. -4. *Global GPTel setting* (=gptel-confirm-tool-calls nil=) means - the agent can call any registered tool without user confirmation - unless that specific tool is registered with =:confirm t=. V1 - safety must come from per-tool flags, not from a session-level - gate. - -* Goals - -1. The agent can invoke the non-interactive subset of =gh= against - either host, gated by a safety policy. -2. High-frequency reads have typed wrappers with sensible defaults - (host/repo resolution, JSON-by-default with minimal fields, - output cap, timeout). Wrappers auto-execute. -3. The active host is resolved from a single context object that - considers (in order): explicit =hostname=/=repo= args, branch - upstream remote, =origin= remote, =cj/gh-default-host=. Every - tool response prefixes the resolved host/repo so cross-host - mistakes are visible. -4. Writes via the general tool execute only after user - confirmation (=:confirm t= on the gptel tool registration). - Unknown classifications fail closed (=:confirm t=). - Irreversible commands hard-block. -5. Interactive commands (=auth login=, =codespace ssh=, =--web=, - =browse=, =pr checkout=, =repo clone=, =config set=, - =alias set=, =extension exec=) are hard-blocked in V1. -6. File exfiltration paths (=api --input=, =api -F key=@file=, - =release upload=, =gist create=) are hard-blocked in V1. -7. Output is capped *during capture*, not after, so a runaway - =--paginate= or =--log= can't fill memory or block Emacs. - Long-output commands have higher timeouts but the same byte - cap. -8. Every call produces a structured debug record (host, repo, cwd, - sanitized argv, classification, policy decision, exit code, - duration, bytes captured, truncation flag, error kind) inspectable - via =cj/gh-tool-last-error=. -9. =cj/gh-doctor= diagnoses missing prerequisites (=gh= - executable, version floor, auth per host, cwd repo detection, - environment overrides) before they fail at runtime. -10. V2 adds true per-host policy (currently uniform across both - hosts), profile-based tool subsets, and bridge helpers between - local git tools and the gh wrappers. - -* Non-Goals - -- *Interactive subcommands.* =gh auth login=, =gh codespace ssh=, - =gh pr checkout= (modifies working tree), =gh repo clone= - (writes outside controlled paths) are not in scope. The user - runs those in a real terminal. -- *Replicating gh extensions.* Core gh only. Extensions can be - invoked from a regular terminal if needed. -- *Streaming long-running output.* =gh run watch=, =gh - --paginate= for unbounded result sets, =gh repo clone= for large - repos. V1 blocks =--paginate= and =watch= verbs; future - versions may add streaming support. -- *Per-host write policies* (work-read-only / personal-full-write). - V1 applies the same write-confirmation policy to both hosts. - V2 makes the policy host-keyed. -- *Conversation-context injection* and *local/remote bridge - helpers* (current branch → PR, changed files → PR comments). - Future enhancements; tracked as a follow-up. -- *Artifact download* (=gh run download=). Writes files; needs - its own confirmation flow. Deferred to V2. - -* Verified gh CLI Contracts - -These were checked against =gh --version 2.92.0= help output before -the spec was revised. Implementation can rely on them. - -** Host / repo selection per subcommand - -- =gh api= and =gh auth status=: use =--hostname HOST= (long form - only; no short form). -- =gh pr {view,list,diff,checks}=, =gh issue {view,list}=, - =gh run {view,list}=, =gh release *=, =gh search *=: use - =--repo [HOST/]OWNER/REPO= (short =-R=). -- =gh repo view= and similar root-level commands: infer from cwd - remote; no =--hostname= or =--repo= flag. -- Environment variables: =GH_HOST= and =GH_REPO= work for commands - that lack explicit flags. - -** Flags with overloaded short forms - -- =gh pr list -H FOO= → =--head FOO= (branch). -- =gh search prs -H FOO= → =--head FOO= (branch). -- =gh api -H KEY:VAL= → =--header KEY:VAL=. - -Substring scanning =-H= as "hostname" is wrong on every command -that uses it. - -** =gh api= method classification - -The help text explicitly says: "adding request parameters will -automatically switch the method to POST". This means: - -- =gh api repos/x/y/issues= → GET (read). -- =gh api repos/x/y/issues -f title=Foo= → POST (write). -- =gh api repos/x/y/issues -F body=@file= → POST (write + - exfiltration). -- =gh api --method GET repos/x/y/issues -f q=x= → GET (read, - explicit override). - -Classifier must inspect for =-f=, =-F=, =--field=, =--raw-field=, -=--input=, and =--method= before defaulting to GET. - -** Wrappers without =--json= support - -- =gh pr diff= produces patch output; no =--json= flag. Wrapper - is text-only. Structured file metadata available via - =gh pr view --json files= (new wrapper =gh_pr_files=). - -** Environment variables for non-interactive use - -All confirmed via =gh help environment=: - -- =GH_PROMPT_DISABLED= -- disable interactive prompting. -- =GH_PAGER= / =PAGER= -- set to =cat= so no pager fires. -- =GH_NO_UPDATE_NOTIFIER= -- silence the "new version" banner. -- =GH_NO_EXTENSION_UPDATE_NOTIFIER= -- same for extensions. -- =GH_SPINNER_DISABLED= -- silence the progress spinner. -- =NO_COLOR= -- disable ANSI color (cleaner LLM context). - -V1 subprocess env always includes all of the above. - -** Authentication - -#+begin_example -✓ Logged in to github.com account cjennings (keyring) -✓ Logged in to deepsat.ghe.com account craig-jennings (keyring) -#+end_example - -Both stored in the system keyring. =gh= reads the keyring per -call; the wrapper does not handle tokens. Expired auth surfaces -as exit code 4 with a clear stderr message; recovery is interactive -(=gh auth login --hostname HOST= in a terminal). - -* Current State - -** =modules/ai-config.el= - -- =gptel-confirm-tool-calls nil= at line 386. Per-tool - =:confirm t= is the only confirmation mechanism in V1. -- =cj/gptel-load-local-tools= (lines 71-96) loads tools from - =gptel-tools/=. Add new gh feature names to - =cj/gptel-local-tool-features=. - -** =gptel-tools/= - -Ten tools today. =git_log.el= is the closest analogue: validates -input, runs subprocess with =process-file=, caps output, signals -clear errors. The gh wrappers follow the same shape with shared -helpers in =gh-common.el=. - -* Design - -** File layout - -Two files plus per-wrapper registrations. Wrappers are -deliberately small (argv builders + JSON-field defaults) so they -all live in one file: - -- =gptel-tools/gh-common.el= -- shared helpers: host/repo context, - argv runner, subprocess env, classifier, blocklist, redaction, - debug record, last-error buffer. No tool registrations. -- =gptel-tools/gh.el= -- the general tool + every wrapper - registration. Pulls in =gh-common= via =require=. - -This matches Craig's preference for low loader surface (per the -MCP revision discussion). If =gh.el= grows past ~600 lines we -split per-resource later (=gh-pr.el=, =gh-issue.el=, etc.). - -** Host / repo resolution - -A single helper returns a structured context object used by every -tool entry point: - -#+begin_src emacs-lisp -(defun cj/gh--resolve-context (cwd hostname repo) - "Return a context plist for a gh invocation. -Resolution order: -1. Explicit REPO argument (`[HOST/]OWNER/REPO'). If it contains - `HOST/', split off the host. -2. Explicit HOSTNAME argument. -3. CWD's branch upstream remote URL (`git rev-parse --abbrev-ref - @{u}' then `git remote get-url REMOTE'). -4. CWD's `origin' remote URL. -5. `cj/gh-default-host'. - -Returns: - (:host HOST :owner OWNER :repo REPO :source SYM) -where SOURCE is one of: explicit-repo, explicit-host, upstream, -origin, default." - ...) -#+end_src - -URL parsing handles all three forms produced by git: - -- =git@HOST:OWNER/REPO.git= -- =ssh://git@HOST/OWNER/REPO.git= -- =https://HOST/OWNER/REPO.git= - -Returns =nil host= if the URL doesn't match a known host; the -caller uses =cj/gh-default-host= in that case. - -#+begin_src emacs-lisp -(defcustom cj/gh-default-host "github.com" - "Host used when no other resolution path yields one." - :type 'string - :group 'cj) - -(defcustom cj/gh-known-hosts '("github.com" "deepsat.ghe.com") - "Hosts the gh CLI is authenticated against." - :type '(repeat string) - :group 'cj) -#+end_src - -Context appears in every tool response as a one-line prefix: - -#+begin_example -[gh github.com/cjennings/dotemacs read ok 12.4KB] ... -[gh deepsat.ghe.com/org/repo write confirmed 0.8KB] ... -[gh github.com api unknown blocked] ... -#+end_example - -This makes cross-host accidents visible. - -** Argv-list builders, never command strings - -Every tool builds an explicit argv list and hands it to a shared -runner. Wrappers expose typed parameters that map directly into -argv positions: - -#+begin_src emacs-lisp -(defun cj/gh-pr-view--build-argv (context number include-comments fields) - "Return an argv list for `gh pr view NUMBER ...'." - (let ((repo-arg (format "%s/%s/%s" - (plist-get context :host) - (plist-get context :owner) - (plist-get context :repo)))) - (append (list "pr" "view" (number-to-string number) - "--repo" repo-arg) - (when include-comments '("--comments")) - (when fields (list "--json" fields))))) -#+end_src - -The general tool takes an =args= parameter typed as array-of-string -(not a shell string) so the agent constructs argv directly: - -#+begin_src emacs-lisp -;; agent passes: ["pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo"] -;; runner runs: gh pr view 171 --repo deepsat.ghe.com/org/repo -#+end_src - -If the agent accidentally includes leading =gh= in args[0], the -runner strips it and logs the normalization. - -** Subprocess contract - -One runner for everything: - -#+begin_src emacs-lisp -(defun cj/gh--run (argv &key timeout context) - "Run gh with ARGV (a list of strings). -Returns a plist: - (:exit-code N :stdout STR :stderr STR :truncated BOOL :duration-ms N - :argv ARGV :context CONTEXT) -Enforces TIMEOUT (default `cj/gh--default-timeout'). -Captures output in-flight, killing the process at -`cj/gh--max-bytes' to prevent runaway memory use." - ...) -#+end_src - -Contract: - -- *Env vars set every call:* =GH_PROMPT_DISABLED=1=, =GH_PAGER=cat=, - =PAGER=cat=, =NO_COLOR=1=, =GH_NO_UPDATE_NOTIFIER=1=, - =GH_NO_EXTENSION_UPDATE_NOTIFIER=1=, =GH_SPINNER_DISABLED=1=. -- *Timeout:* default 20 s for reads, 60 s for diffs/logs/search. - Per-tool override allowed but capped at 120 s. Timeout kills - the process and returns =:error-kind 'timeout=. -- *Output cap during capture:* a process filter accumulates bytes - up to =cj/gh--max-bytes= (64 KB). At the cap, the filter sets - =truncated=, ignores further output, and sends SIGTERM after a - short grace. Truncation marker appended to returned stdout. -- *TRAMP rejection:* if =(file-remote-p default-directory)= is - non-nil, return =:error-kind 'remote-cwd= without invoking gh. -- *Executable check:* if =(executable-find cj/gh--executable)= is - nil, return =:error-kind 'no-executable= with install - instructions. -- *Version check:* the first call per session verifies - =gh --version= meets =cj/gh--min-version= (default "2.50.0"); - cached for the session. - -** Read / write / destructive classifier - -Used to apply =:confirm t= and the irreversible blocklist: - -#+begin_src emacs-lisp -(defconst cj/gh--read-verbs - '("view" "list" "status" "search" "diff" "checks" "describe" - "show" "logs" "auth-status")) - -(defconst cj/gh--write-verbs - '("create" "edit" "merge" "close" "reopen" "comment" "review" - "upload" "set" "add" "remove" "rerun" "cancel" "delete" - "fork" "archive" "unarchive" "lock" "unlock" "pin" "unpin" - "ready" "draft" "rename" "transfer" "approve" "label" "assign")) - -(defconst cj/gh--blocked-verbs - '(;; interactive / opens UI - "login" "logout" "checkout" "clone" "ssh" "code" "edit-prompt" - ;; modifies user config - "alias" "config" "extension" - ;; opens browser - "browse")) - -(defconst cj/gh--blocked-flags - '("--web" ; many commands - "--paginate" ; can produce unbounded output - "--input" ; gh api file upload - "--editor")) ; opens editor - -(defconst cj/gh--irreversible-patterns - '("\\`repo delete\\b" - "\\`release delete\\b" - "\\`secret delete\\b" - "\\`ssh-key delete\\b" - "\\`gpg-key delete\\b" - "\\`org delete\\b" - "\\`project delete\\b" - "\\`variable delete\\b" - "\\`ruleset delete\\b" - "\\`label delete\\b.*--yes")) -#+end_src - -Classifier: - -#+begin_src emacs-lisp -(defun cj/gh--classify (argv) - "Return one of: read, write, destructive, blocked, unknown." - (let* ((stripped (cj/gh--strip-flags argv)) - (resource (car stripped)) - (verb (cadr stripped))) - (cond - ;; Hard blocks first. - ((cj/gh--has-blocked-verb-p argv) 'blocked) - ((cj/gh--has-blocked-flag-p argv) 'blocked) - ((cj/gh--has-file-arg-p argv) 'blocked) ; -F key=@file - ((cj/gh--matches-irreversible-p argv) 'destructive) - ;; gh api is special. - ((string= resource "api") (cj/gh--classify-api argv)) - ;; Verb match. - ((member verb cj/gh--read-verbs) 'read) - ((member verb cj/gh--write-verbs) 'write) - (t 'unknown)))) - -(defun cj/gh--classify-api (argv) - "Classify a `gh api ...' invocation. -Reads: explicit method GET/HEAD with no -f/-F/--input. -Writes: any -f/-F/--field/--raw-field/--input, OR explicit -non-GET/HEAD method. -Default (no method, no field): read (matches gh's GET default)." - (let* ((explicit-method (or (cadr (member "-X" argv)) - (cadr (member "--method" argv)))) - (has-field (cl-some - (lambda (f) (cl-some (lambda (a) (string-prefix-p f a)) - argv)) - '("-f" "--raw-field" "-F" "--field"))) - (has-input (member "--input" argv))) - (cond - (has-input 'blocked) ; file exfiltration - ((and explicit-method - (not (member (upcase explicit-method) '("GET" "HEAD")))) - 'write) - (has-field 'write) ; -f/-F auto-promotes to POST - ((and explicit-method - (member (upcase explicit-method) '("GET" "HEAD"))) - 'read) - (t 'read)))) -#+end_src - -Classifier-driven policy table: - -| Classification | Policy | -|----------------+--------| -| =read= | auto-execute | -| =write= | =:confirm t= on registration; agent's call shows in confirm prompt | -| =destructive= | hard-block; return =:error-kind 'irreversible-blocked= | -| =blocked= | hard-block; return =:error-kind 'policy-blocked= with reason | -| =unknown= | =:confirm t= (fail closed) | - -** Safety policy - -V1 uniform policy applied to both hosts: - -#+begin_src emacs-lisp -(defcustom cj/gh-policy - '((read . auto) - (write . confirm) - (destructive . block) - (blocked . block) - (unknown . confirm)) - "Per-classification policy. V1 applies uniformly to both hosts." - :type '(alist :key-type symbol :value-type symbol) - :group 'cj) -#+end_src - -Since GPTel's =:confirm t= flag is per-tool-registration (not -per-call), the general tool is registered with =:confirm t= -always. The wrappers are registered per their classification: - -| Tool | Registered with | -|------+-----------------| -| Read wrappers (=gh_pr_view=, etc.) | =:confirm nil= | -| =gh= general tool | =:confirm t= | - -The general tool then applies the policy table at invocation -time: reads execute without further prompting (already past -GPTel's confirm because :confirm t fires once); writes show -detail before invocation; destructive/blocked never reach gh. - -** General tool: =gh= - -Single tool registered with =:confirm t= covering everything the -wrappers don't. Schema: - -| Arg | Type | Required | Purpose | -|-----+------+----------+---------| -| =args= | array of string | yes | argv list, e.g. =["pr", "view", "171", "--repo", "deepsat.ghe.com/org/repo"]= | -| =hostname= | string | no | Override host for commands that accept =--hostname= (=api=, =auth status=); ignored otherwise | -| =repo= | string | no | =[HOST/]OWNER/REPO= for repo-scoped commands; if HOST is present in repo, hostname arg is overridden | -| =cwd= | string | no | Working directory; defaults to current buffer; must be under =$HOME=, must not be TRAMP | -| =timeout= | integer | no | Seconds before kill; default 20, max 120 | - -Description (registered) explicitly says: - -#+begin_example -Use this only when no task-specific gh_* tool fits. Prefer -gh_pr_view, gh_pr_list, gh_pr_checks, gh_issue_view, gh_issue_list, -gh_run_view, gh_run_list, gh_run_logs_failed, gh_repo_view, -gh_search_prs, gh_search_issues, gh_api_get. - -Writes (create/edit/merge/etc.) require user confirmation. -Destructive (repo/release/secret delete) are hard-blocked. -Interactive commands (auth login, codespace ssh, --web, browse) -are hard-blocked. File uploads (api --input, -F @file, release -upload, gist create) are hard-blocked. -#+end_example - -** Wrapper inventory - -Twelve wrappers grouped by resource. Each has typed args, JSON -field defaults, output truncation, timeout override, and a -description that names its scope. - -| Tool | gh command | Defaults | Timeout | -|------+------------+----------+---------| -| =gh_repo_view= | =repo view --json= | JSON fields: =name,nameWithOwner,description,defaultBranchRef,url,visibility= | 20s | -| =gh_pr_view= | =pr view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | -| =gh_pr_list= | =pr list --json= | JSON fields: =number,title,state,author,createdAt,headRefName=; default =--limit 30= (capped at 100) | 20s | -| =gh_pr_diff= | =pr diff N --color never= | text only; capped at 64 KB | 60s | -| =gh_pr_checks= | =pr checks N --json= | JSON fields: =name,status,conclusion,startedAt,completedAt,link= | 20s | -| =gh_pr_files= | =pr view N --json files= | JSON fields: =files= (path, additions, deletions, mode) | 20s | -| =gh_pr_current= | =pr view --json= (no number — auto-detect) | Same as =gh_pr_view= | 20s | -| =gh_issue_view= | =issue view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | -| =gh_issue_list= | =issue list --json= | JSON fields: =number,title,state,author,createdAt,labels=; default =--limit 30= | 20s | -| =gh_run_view= | =run view RUN-ID --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch,event,url= | 20s | -| =gh_run_list= | =run list --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch=; default =--limit 20= | 20s | -| =gh_run_logs_failed= | =run view RUN-ID --log-failed= | text only; capped at 64 KB | 60s | -| =gh_search_prs= | =search prs --json= | JSON fields: =number,title,state,author,repository,url=; default =--limit 30= (capped at 100) | 30s | -| =gh_search_issues= | =search issues --json= | Same as PR search | 30s | -| =gh_api_get= | =api ENDPOINT --method GET= | text/JSON pass-through; rejects fields/input args | 30s | - -Common args for every wrapper (unless noted): - -| Arg | Type | Required | Purpose | -|-----+------+----------+---------| -| =repo= | string | no | =[HOST/]OWNER/REPO=; resolved from context otherwise | -| =hostname= | string | no | Override host for context resolution | -| =limit= | integer | no | =--limit= for list/search wrappers; clamped to per-wrapper max | - -The =gh_api_get= wrapper *explicitly* rejects =-f=, =-F=, -=--field=, =--raw-field=, =--input=, =--method=, and any method -override. It only accepts =ENDPOINT= and optional =-H= headers -that don't carry secrets (the runner redacts =Authorization:= and -similar regardless). Writes via API go through the general tool -with confirmation. - -** JSON field defaults - -Per-wrapper defaults are minimal -- enough for the agent to decide -whether to drill in, not so much that one call fills the context. - -For =gh_pr_view= specifically: - -- *Default* (no =fields= override): the small list above (number, - title, state, author, createdAt, url, body-truncated-to-2KB). -- *Override*: agent passes =fields= as a comma-separated string; - wrapper validates against a per-resource allowlist (so the agent - can't request =reviews,comments,files= in one call to bypass the - cap). -- *Include flags*: =include-body t/nil=, =include-comments t/nil=, - =include-reviews t/nil= as boolean args. Each adds the - corresponding JSON field; agent opts in only when needed. - -** Output truncation - -Process filter pattern, not post-hoc cap: - -#+begin_src emacs-lisp -(defun cj/gh--make-filter (state-var) - "Return a process filter that accumulates into STATE-VAR's :stdout, -stops collecting after `cj/gh--max-bytes', sets :truncated, and -sends SIGTERM to the process." - (lambda (proc output) - (let* ((state (symbol-value state-var)) - (current (plist-get state :stdout)) - (current-len (length current)) - (remaining (- cj/gh--max-bytes current-len))) - (cond - ((<= remaining 0) nil) ; already at cap - ((<= (length output) remaining) - (plist-put state :stdout (concat current output))) - (t - (plist-put state :stdout - (concat current (substring output 0 remaining))) - (plist-put state :truncated t) - (ignore-errors (delete-process proc))))))) -#+end_src - -Truncation marker appended before return: - -#+begin_example -[truncated at 64KB; use --limit, narrower fields, or a specific -wrapper to reduce output] -#+end_example - -** Error classification + debug record - -Every call returns (and =cj/gh-tool-last-error= caches) a debug -record: - -#+begin_src emacs-lisp -(defun cj/gh--debug-record (argv context exit-code stdout stderr - duration-ms truncated) - (list :host (plist-get context :host) - :repo (cj/gh--repo-arg context) - :cwd default-directory - :argv (cj/gh--redact-argv argv) - :classification (cj/gh--classify argv) - :policy (cj/gh--policy-decision argv) - :exit-code exit-code - :duration-ms duration-ms - :bytes-captured (length stdout) - :truncated truncated - :error-kind (cj/gh--error-kind exit-code stderr))) -#+end_src - -=:error-kind= mapping: - -| Condition | =:error-kind= | Returned message | -|-----------+---------------+------------------| -| Exit 4 + "authentication required" | =auth= | "Run =gh auth login --hostname HOST= in a terminal." | -| Process killed by timeout timer | =timeout= | "Command exceeded N seconds; narrow the query or use a more specific wrapper." | -| Policy block | =policy-blocked= | "Blocked by V1 policy: REASON." | -| Irreversible match | =irreversible-blocked= | "Hard-blocked irreversible command: COMMAND." | -| Truncated output | =truncated= | "Output truncated at 64KB; reduce scope." | -| TRAMP cwd | =remote-cwd= | "Cannot run gh from remote directory: CWD." | -| Missing executable | =no-executable= | "gh not found at CJ/GH--EXECUTABLE; install via 'pacman -S github-cli' (or equivalent)." | -| Other non-zero exit | =gh-exit= | (raw stderr, redacted) | - -Each error includes a sanitized reproduce line: - -#+begin_example -Reproduce: GH_PROMPT_DISABLED=1 GH_PAGER=cat gh pr view 171 --repo HOST/OWNER/REPO -#+end_example - -(Secrets, body text, file paths redacted via =cj/gh--redact-argv=.) - -** Audit log (V1, opt-out) - -Every call appends one line to -=~/.emacs.d/data/gh-tool-log/YYYY-MM-DD.log=: - -#+begin_example -2026-05-16T14:23:45-0500 host=github.com repo=cjennings/dotemacs class=read policy=auto exit=0 duration=128ms bytes=4321 -2026-05-16T14:24:02-0500 host=deepsat.ghe.com repo=org/repo class=write policy=confirm exit=0 duration=412ms bytes=88 -#+end_example - -Metadata only, not output bodies. Defcustom -=cj/gh-tool-audit-log-enabled= (default =t=). Daily rotation -implicit (one file per day). Cleanup manual. - -** Secrets redaction - -=cj/gh--redact-argv= masks: - -- Anything after =--token=, =--secret=, =--password= flags. -- Authorization headers (=-H "Authorization: ..."=). -- =--figma-api-key=KEY= (in case general tool spawns figma-mcp - somehow). -- Bearer tokens in URLs (=?token=...=). -- Values for =-f=/=-F= keys named like =body=, =text=, - =description= (private content; metadata still logged). - -Applied to: -- All stderr returned to the agent. -- All audit-log lines. -- All debug records. -- The reproduce line on error. - -* Commands & UX - -** =cj/gh-doctor= - -Diagnostic command. No side effects. Checks: - -- =gh= executable found at =cj/gh--executable=. -- =gh --version= meets =cj/gh--min-version=. -- =gh auth status= for each host in =cj/gh-known-hosts=. -- Current buffer's cwd repo detection: resolves to which host/repo? -- Environment overrides effective (=GH_PROMPT_DISABLED= etc. would - be set by the runner). -- Active account per host. -- Warnings if any block-list-relevant env is set externally - (e.g. user already has =GH_PAGER= set to something that pages). - -Output: a buffer with PASS / FAIL / WARN per check + recovery -actions for failures. - -** =cj/gh-tool-last-error= - -Opens a buffer showing the last call's debug record: - -#+begin_example -Host: deepsat.ghe.com -Repo: org/repo -Source: upstream -CWD: ~/projects/work/foo -Argv: ("pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo") -Classification: read -Policy: auto -Exit code: 0 -Duration: 412 ms -Bytes captured: 4321 -Truncated: no -Error kind: none - -Reproduce: - GH_PROMPT_DISABLED=1 GH_PAGER=cat NO_COLOR=1 \ - gh pr view 171 --repo deepsat.ghe.com/org/repo -#+end_example - -** Tool response header - -Every tool result begins with a one-line header so cross-host / -policy decisions are visible: - -#+begin_example -[gh github.com/cjennings/dotemacs read ok 4.3KB] -{ ... } -[gh deepsat.ghe.com/org/repo write confirmed 0.2KB] -{ ... } -[gh github.com/cjennings/dotemacs api blocked policy-blocked] -Error: Hard-blocked file-upload path: --input file. -Reproduce: gh api repos/cjennings/dotemacs/contents/foo --input file -#+end_example - -* Implementation Plan - -Eight phases. Each ends with green ERT tests + manual smoke -before the next. - -** Phase 1 -- Common helpers + context resolver - -=gh-common.el=: =cj/gh--executable=, =cj/gh--available-p=, -=cj/gh--version= (cached), =cj/gh--validate-cwd= (HOME + non-TRAMP), -=cj/gh--parse-remote-url=, =cj/gh--resolve-context=, -=cj/gh--redact-argv=. No subprocess execution yet. - -Tests cover all helpers against fixture remote URLs and synthetic -git directories. No real gh calls. - -** Phase 2 -- Runner with subprocess env + timeout + in-flight cap - -=cj/gh--run=, the process filter, the timer kill path, env-var -setup. Tests stub =make-process= to simulate output / exit / hang -/ truncation paths. - -Acceptance: with stub configured to produce 100 KB of output, -returned stdout is exactly 64 KB plus the truncation marker, and -the process gets SIGTERM. - -** Phase 3 -- Classifier + blocklist + policy - -=cj/gh--classify=, =cj/gh--classify-api=, blocklist constants, -=cj/gh--policy-decision=. Tests for every blocklist pattern -(verbs + flags + file-arg paths), API edge cases (=-f= promotes -to POST, =--method GET -f x=y= stays GET, etc.), and the -unknown-fails-closed contract. - -** Phase 4 -- Read wrappers (5 first) - -=gh_repo_view=, =gh_pr_view=, =gh_pr_list=, =gh_pr_diff=, -=gh_issue_view=. Each is a thin schema + argv builder + delegate -to =cj/gh--run=. Tests verify argv shape for typical args. - -Manual smoke against both hosts. First real gh calls. - -** Phase 5 -- Remaining wrappers + JSON defaults - -Eight more wrappers (=gh_pr_checks=, =gh_pr_files=, -=gh_pr_current=, =gh_issue_list=, =gh_run_view=, =gh_run_list=, -=gh_run_logs_failed=, =gh_search_prs=, =gh_search_issues=, -=gh_api_get=). JSON-field defaults per wrapper. Tests for the -=gh_api_get= flag-rejection contract. - -** Phase 6 -- General tool - -=gh.el= general tool registration with =:confirm t=, blocklist -enforcement, policy decision applied before invocation. Tests -verify confirmation gate (stub gptel's confirm flow), blocked -commands never reach =cj/gh--run=, destructive commands return -=:irreversible-blocked= without prompting. - -** Phase 7 -- UX: doctor, last-error, response header - -=cj/gh-doctor=, =cj/gh-tool-last-error=, response header -formatting. Audit log writer. Defcustom for log enable. - -** Phase 8 -- Loader wiring + integration - -Add the 16 feature names to =cj/gptel-local-tool-features= (one -per wrapper + the general tool + the helpers feature). Verify -they land in =gptel-tools=. - -* Test Plan - -Target: 55-70 ERT tests across four files. No real subprocesses, -no real network, no real =~/.claude.json= (gh tools don't use it, -but the no-real-process rule applies uniformly). - -** =tests/test-gh-common.el= -- pure helpers (~25 tests) - -- =cj/gh--parse-remote-url=: ssh-scp, ssh-url, https, with/without - =.git=, with/without trailing slash, unknown host returns =:host - nil=, =github.com.evil.example= does NOT match =github.com=. -- =cj/gh--resolve-context=: explicit repo wins; explicit hostname - wins over remote; upstream beats origin; origin beats default; - default fires when no git. -- =cj/gh--validate-cwd=: HOME-rooted ok; outside HOME errors; - TRAMP errors; non-directory errors. -- =cj/gh--redact-argv=: =--token=, =-H "Authorization: ..."=, - =--figma-api-key=, =-f body=...= → body redacted; sentinel - =REDACTED_TEST_SECRET= never appears in any output of any - helper. -- =cj/gh--available-p=: nil when =executable-find= fails; t - otherwise. -- =cj/gh--version=: caches per session; floor check rejects - too-old. - -** =tests/test-gh-runner.el= -- runner contract (~15 tests) - -Stub =make-process=: -- Normal exit 0 with short output: returned verbatim, no - truncation flag. -- Long output (100 KB stub): truncated at 64 KB exactly, - truncation marker present, =:truncated t=. -- Process hangs past timeout: timer fires, SIGTERM sent, returns - =:error-kind 'timeout=. -- Exit 4 + auth stderr: returns =:error-kind 'auth= with recovery - message. -- TRAMP cwd: never invokes =make-process=, returns =:error-kind - 'remote-cwd=. -- Missing executable: returns =:error-kind 'no-executable=. -- Environment: =process-environment= includes all six required - vars before =make-process= call. -- Argv with leading "gh" stripped + logged. - -** =tests/test-gh-classifier.el= -- policy logic (~20 tests) - -- Every read verb classifies as read. -- Every write verb classifies as write. -- Every destructive pattern matches. -- Every blocked verb (=login=, =browse=, =clone=, etc.) classifies - as blocked. -- Every blocked flag (=--web=, =--paginate=, =--input=) classifies - as blocked. -- File-upload (=-F key=@/path=) classifies as blocked. -- =gh api= GET (no fields): read. -- =gh api= GET (=-f x=y=): write (auto-POST per gh's rules). -- =gh api --method GET -f q=x=: read (explicit override). -- =gh api -X DELETE=: destructive. -- =gh api -X PATCH=: write. -- =gh api --input file=: blocked. -- Unknown verb (=gh frobnicate=): unknown → confirm. -- =-H= as branch (=gh pr list -H feature=) doesn't trigger host - treatment. -- =-H= as header (=gh api -H Accept:json=) doesn't trigger host - treatment. -- Policy decision: read → auto; write → confirm; destructive → - block; blocked → block; unknown → confirm. - -** =tests/test-gh-wrappers.el= -- per-wrapper builders + schemas (~15 tests) - -- Every wrapper's schema is valid (correct =:name=, =:type=, - =:description=, =:args= shape). -- =gh_pr_view= argv with number 171 and repo "host/o/r" produces - =("pr" "view" "171" "--repo" "host/o/r" "--json" "DEFAULTS")=. -- =gh_pr_diff= rejects =format='json=. -- =gh_api_get= rejects =-f=, =-F=, =--field=, =--raw-field=, - =--input=, =--method= other than GET. -- =gh_pr_current= invokes without a number arg (uses cwd). -- =limit= clamped to per-wrapper max. -- =fields= validated against per-resource allowlist. - -** Manual smoke (every phase) - -| Phase | Smoke | -|-------+-------| -| 4 | =gh_pr_view N= against both hosts | -| 4 | =gh_pr_list= in =~/.emacs.d= → uses github.com/cjennings/dotemacs | -| 5 | =gh_pr_checks= shows CI status without full logs | -| 5 | =gh_run_logs_failed= cap kicks in on a long-failed run | -| 5 | =gh_api_get= rejects a =-f= arg with clear error | -| 6 | =gh= general tool: agent asked to merge a PR triggers GPTel confirm prompt | -| 6 | Agent asked to =gh repo delete= gets irreversible-blocked | -| 6 | Agent asked to =gh --web=... gets policy-blocked | -| 7 | =cj/gh-doctor= correctly identifies an unauthenticated host | -| 7 | =cj/gh-tool-last-error= shows debug record after a failing call | - -** Opt-in integration suite - -A small set of real-gh tests (in =tests/test-gh-integration.el= -marked =:tag :integration=, default skipped): - -- =gh auth status --hostname github.com= ok. -- =gh auth status --hostname deepsat.ghe.com= ok. -- =gh repo view cjennings/dotemacs --json name= - returns parseable JSON. -- =gh pr list --repo cjennings/dotemacs --limit 1= returns ≤ 1 - PR. - -Run manually via =make test-name TEST=gh-integration=. - -* Acceptance Criteria - -1. *Argv contract.* No tool produces a command string for execution; - every call goes through the argv-list runner. -2. *No silent writes.* Every classified write either prompts - GPTel's confirm or hard-blocks. Verified by an end-to-end - test where the agent attempts =pr merge= and the test fails if - =make-process= is invoked before confirm. -3. *In-flight cap.* A stubbed process emitting 1 MB returns - exactly 64 KB; the runner never holds more than 65 KB in - memory. -4. *Host visibility.* Every successful tool response begins with - =[gh HOST/REPO ...]=. Verified by a test that greps the - response text. -5. *Doctor coverage.* =cj/gh-doctor= correctly identifies (a) no - gh executable, (b) too-old gh, (c) unauthenticated host, (d) - non-git cwd, (e) git cwd whose remote points to an unknown - host. -6. *No secret leakage.* Test fixtures containing - =REDACTED_TEST_SECRET= in every secret-bearing slot - (=--token=, =-H Authorization=, =-f body=, etc.) produce zero - matches when grepping audit log, debug record, and any - user-facing message. - -* Risks - -** R1 -- gh CLI evolves and verbs drift - -New =gh= versions may add subcommands the blocklist doesn't -cover. Or rename verbs. - -*Mitigation:* the blocklist works on verbs (not full subcommand -paths) so most additions are caught. Doctor includes a -=gh --version= floor. Periodic review when gh bumps a major -version. - -** R2 -- The "all reads auto-execute" default may still be too broad - -Some reads expose private content (issue bodies, PR descriptions -from private repos). An agent surfacing a confidential issue -body into a saved conversation has data-leak implications. - -*Mitigation:* response header makes the host/repo visible in -every result, so the saved conversation makes the privacy -boundary auditable. Wrappers truncate body/comments by default; -agent must explicitly opt-in to include them. Documented in -commentary. - -** R3 -- The general tool's =:confirm t= prompt may become click-fatigue - -If the agent uses the general tool heavily during a write-heavy -workflow (PR creation, label management), confirming every call -becomes tedious. - -*Mitigation:* the expanded wrapper set covers most reads, so the -general tool fires mainly for writes -- where confirmation is -exactly the right behavior. If usage shows confirm fatigue, -V2's per-host policy can add =:auto= for explicit -write-confirmed contexts. - -** R4 -- =--paginate= block conflicts with legitimate large queries - -Blocking =--paginate= globally means the agent can't get -historical CI runs (which may need pagination). - -*Mitigation:* =gh_run_list= and =gh_search_*= accept a clamped -=--limit= which usually substitutes. If a use case needs more, -the agent can request multiple non-paginated pages explicitly. - -** R5 -- Token expiry surfaces as cryptic exit 4 - -When a host's keyring entry expires, every call returns exit 4 -with "authentication required" on stderr. The agent sees the -error but may not realize the fix is interactive. - -*Mitigation:* the runner's =:error-kind 'auth= mapping prepends -the recovery message before returning to the agent. =cj/gh-doctor= -proactively checks auth status. - -** R6 -- TRAMP cwd silently runs gh remotely - -Without the explicit TRAMP rejection, =process-file= would try to -spawn =gh= on the remote host (where it may not exist, or may -authenticate against the wrong keyring). - -*Mitigation:* runner checks =(file-remote-p default-directory)= -first and returns =:error-kind 'remote-cwd= with a clear message. - -** R7 -- =gh search= behaves differently on GHE - -GHE may not support every advanced search operator -=github.com= does. Search wrappers may return inconsistent -results across hosts. - -*Mitigation:* documented in =gh_search_*= wrapper descriptions. -Result header makes the host visible so the agent can adjust. - -** R8 -- Audit log grows unbounded - -One file per day, but no automatic cleanup. - -*Mitigation:* metadata-only entries are tiny (~150 bytes); a -year of heavy use is a few MB. Manual cleanup acceptable. -Defcustom to disable for users who don't want it. - -* Open Questions - -** Q1 -- Should the general tool's confirmation prompt include the classification? - -When GPTel asks "Run gh tool? (y/n)" the prompt shows the argv but -not the classification. Showing "WRITE: gh pr merge 171" gives the -user more context. Need to investigate gptel's confirm-prompt -extensibility. - -** Q2 -- Should =gh_pr_diff= cap differently from text wrappers? - -A PR diff can legitimately be 100KB+ for a large refactor. The -64KB cap is the same as everywhere else. If diffs need a higher -cap (256KB?), that's per-wrapper config. - -** Q3 -- Should wrappers expose =include-body=, =include-comments=, etc., as separate args, or as a comma-separated list? - -The spec proposes separate boolean args (=:include-body t=, -=:include-comments t=). Alternative: one =:include= comma-list -arg. Separate args are more discoverable; comma-list is more -compact. Decide during Phase 4. - -** Q4 -- Should =cj/gh-tool-audit-log= grow into a query interface? - -V1 writes one line per call. Future: a command to query the -log (=cj/gh-audit-search REGEX=) for surfacing "what did the agent -do to this PR last week?". - -* V2 Roadmap - -Items intentionally deferred: - -- *Per-host policy.* =cj/gh-host-policy= alist keyed by hostname - (mirror of the MCP spec's structure) so work GHE can be - read-only while personal allows writes-with-confirm. -- *Conversation context injection.* After a PR view, the wrapper - inserts a "GitHub context: HOST/REPO PR #N at URL" line into the - GPTel buffer so saved conversations stay traceable without - bundling full output. -- *Local/remote bridge helpers.* current-branch → PR-number, - changed-files → matching PR file comments, etc. -- *Artifact download.* =gh_run_artifacts=, =gh_release_download= - with explicit confirm and write to a controlled directory. -- *Async general tool.* =make-process= + sentinel for the cases - where 60s timeout isn't enough (rare, but real for some - =--paginate= scenarios). -- *Audit log query interface.* =cj/gh-audit-search=, - =cj/gh-audit-by-host=. -- *Profile-based tool subsets.* e.g. read-only profile - vs. write-capable profile per buffer. - -* References - -- [[file:../../gptel-tools/git_log.el][gptel-tools/git_log.el]] -- pattern reference for new tool files. -- [[file:../../modules/ai-config.el][modules/ai-config.el]] -- =gptel-confirm-tool-calls nil= at - line 386; loader at lines 71-96. -- [[file:gptel-agentic-tool-ideas.org][gptel-agentic-tool-ideas.org]] -- broader agentic-tool design; - =gh= sits alongside the MCP integration as the - collaboration tier. -- [[id:b4c274c5-8572-4a7b-b657-d315712bd6af][mcp-el-gptel-integration-spec-doing.org]] -- sibling design; same - confirm-on-write pattern for safety. -- [[https://cli.github.com/manual/][gh CLI manual]] -- subcommand reference. -- =gh --version 2.92.0= help output -- verified flag semantics - per subcommand. -- =gh help environment= -- verified env-var names for non-interactive - mode. diff --git a/docs/specs/gptel-git-tools-magit-backend-spec.org b/docs/specs/gptel-git-tools-magit-backend-spec.org deleted file mode 100644 index bd84b0595..000000000 --- a/docs/specs/gptel-git-tools-magit-backend-spec.org +++ /dev/null @@ -1,196 +0,0 @@ -:PROPERTIES: -:ID: bd47c9a8-aae1-4a3d-ad5b-b8767f2fd580 -:STATUS: not-started -:END: -#+TITLE: Design: gptel git tools on a magit backend -#+AUTHOR: Craig Jennings -#+DATE: 2026-05-16 - -* Status - -Draft. Supersedes the three current git-tool implementations -(=gptel-tools/git_status.el=, =gptel-tools/git_log.el=, -=gptel-tools/git_diff.el=) shipped in commit =ceeae9b5=. Trigger: -Craig flagged that magit already does much of this and could carry -the backend for more git tools cheaply. - -* Problem - -The three current git_* tools shell out to git directly via -=process-file= and parse stdout. Each carries: - -- Its own =--is-inside-work-tree= path-validation step. -- Its own =-c color.ui=false= color suppression workaround (`git - status' doesn't accept =--no-color= the way `git log' / `git diff' - do). -- Boilerplate to set up a temp buffer, run =process-file=, capture - output, return the string. - -There's also an opportunity cost: adding more git context tools -(=git_blame=, =git_show=, =git_branches=, etc.) would mean -duplicating the same boilerplate per tool. - -* Wins from a magit backend - -Three concrete things magit provides: - -1. *Path validation via =magit-toplevel=.* One call replaces the - two-step =process-file= + =rev-parse --is-inside-work-tree= - check. Returns the working-tree root or nil. - -2. *Process plumbing via =magit-git-insert= / =magit-git-string= / - =magit-git-lines=.* These wrap git invocation with magit's - environment, encoding handling, and the right color posture. - Drops the per-subcommand color-flag bikeshedding. - -3. *Typed helpers for higher-level concepts* -- =magit-get-current-branch=, - =magit-list-branches=, =magit-rev-ancestor-p=, etc. Most - relevant for the *new* tools (branches, show, blame), not the - three we already wrote. - -What magit doesn't give us: high-level "give me status as a string" -helpers. =magit-status= / =magit-log-current= etc. populate -interactive magit buffers, not strings. For tool output we'd still -call =magit-git-insert "status" "--short" "--branch"= and grab the -buffer string. Same shape, less boilerplate. - -* Costs - -- *Magit loads on first invocation* of any git_* tool. Magit pulls - in transient, with-editor, magit-section, magit-core -- heavyweight. - Mitigation: lazy =(require 'magit)= inside each tool's function - body so cold-start Emacs sessions don't pay the cost unless the - user actually calls a git tool. -- *Tools no longer portable* to a no-magit Emacs. Acceptable here - because magit is a non-negotiable in this config; a future - drop-in distribution would need to publish a magit-free fallback. - -* Proposed shape - -** Single-file module: =gptel-tools/git_tools.el= - -The current "one file per tool" convention exists because the -existing tools share little. These six tools share a lot -(validate-path, run-git, truncate-output), so a single file with -shared helpers is more honest. - -** Shared helpers - -- =cj/gptel-git--toplevel-or-error PATH= - - Wraps =magit-toplevel=. Signals =user-error= when PATH escapes - HOME, doesn't exist, or isn't inside a working tree. - - Returns the resolved working-tree root on success. - -- =cj/gptel-git--insert ARGS...= - - Wraps =magit-git-insert= in a =with-temp-buffer=, returns - =buffer-string=. Single chokepoint for color / encoding / error - handling. - -- =cj/gptel-git--truncate TEXT MAX-BYTES= - - Caps output, appends a one-line truncation marker when - triggered. - - Open question: consolidate the matching helper from =web_fetch.el= - (=cj/gptel-web-fetch--truncate=) and the - =cj/update-text-file--*= analogue into a shared - =cj/gptel-tools--truncate-bytes= in =system-lib.el=, or keep - per-tool. - -** Six tools - -| Name | Magit-flavored shape | -|------------------+--------------------------------------------------------------------| -| =git_status= | =magit-git-insert "status" "--short" "--branch"= | -| =git_log= | =magit-git-insert "log" "--oneline" (format "-n%d" N) ?--since= | -| =git_diff= | =magit-git-insert "diff" REF1 REF2 "--" FILE= (each optional) | -| =git_blame= | =magit-git-insert "blame" "--line-porcelain" FILE [-L S,E]= | -| =git_show= | =magit-git-insert "show" REF= (message + full diff) | -| =git_branches= | =magit-list-branches= (optionally filtered by =--list PATTERN=) | - -Each tool: -- Validates =path= via =cj/gptel-git--toplevel-or-error=. -- Calls =cj/gptel-git--insert= with the appropriate args. -- Truncates via =cj/gptel-git--truncate=. -- Registered as a separate tool with =gptel-make-tool= for - description / argv clarity at the model side. - -** Caps - -| Tool | Default cap | Hard cap | -|---------------+----------------+--------------| -| =git_status= | uncapped | uncapped | -| =git_log= | 100 commits | 100 commits | -| =git_diff= | 500 KB | 500 KB | -| =git_blame= | 500 KB | 500 KB | -| =git_show= | 500 KB | 500 KB | -| =git_branches=| uncapped | uncapped | - -=git_log='s cap is on commit count; the rest cap output bytes. - -** :confirm posture - -All six tools are read-only. Same posture as the current -implementation: =:confirm nil= (the model can call them -autonomously, since they can't mutate state). The current -git_status / git_log / git_diff already ship with =:confirm nil= -- -keeping it. - -** Tests - -Single file =tests/test-gptel-tools-git-tools.el=, replacing the -three current per-tool test files. Real temp git repos via -=process-file= (same pattern as current tests). Coverage per tool: -Normal / Boundary / Error. - -Rough count: ~12 shared-helper tests (validator, insert wrapper, -truncate) + ~7 per tool × 6 tools = ~54 tests total. - -* Migration - -1. Delete =gptel-tools/git_status.el=, =git_log.el=, =git_diff.el=. -2. Delete =tests/test-gptel-tools-git-status.el=, - =test-gptel-tools-git-log.el=, =test-gptel-tools-git-diff.el=. -3. Create =gptel-tools/git_tools.el= containing all six tools + - shared helpers. -4. Create =tests/test-gptel-tools-git-tools.el=. -5. Update =cj/gptel-local-tool-features= in =modules/ai-config.el=: - replace the three =git_*= symbols with one =git_tools= symbol - (or six if each tool wants its own feature file -- decide during - implementation). -6. Make sure =modules/ai-config.el= can re-load without breaking the - live gptel session if the old tool symbols are still registered - from a prior Emacs. - -* Risks - -| Risk | Mitigation | -|-------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------| -| Magit load slows first git_* tool call | One-time hit per session, scoped to the tool's :function body. Acceptable for opt-in tools. | -| Tool registration name collision with the old git_* symbols | Use distinct names (git_status / git_log / git_diff stay; new tools join them). Or remove + restart. | -| =magit-toplevel= behavior on TRAMP / remote paths | Validator rejects paths outside HOME first, so TRAMP paths can't reach magit-toplevel. | -| =git_blame= exposes code surfaces the model shouldn't read | =:confirm nil= is the wrong posture if blame is sensitive. Open question for review. | -| =git_show= reveals past-self commit message wording | Same as blame -- low risk on personal repo, but worth flagging. | - -* Open questions - -1. Build all six tools in one push, or phase status/log/diff first - and add blame/show/branches in a follow-up? My read: one push. - The helpers are shared, marginal cost of three more tools is - small, and the model gets meaningfully more useful git context. -2. Consolidate the output-truncation helper into =system-lib.el=, - touching =web_fetch.el= and =update_text_file.el= for a cleaner - API? Or defer that to a separate refactor commit? -3. =git_blame= and =git_show= -- =:confirm nil= or =:confirm t=? - Personal repo lowers the stakes but the model could ask for - blame on /any/ file under HOME. -4. Tool feature symbols: one =git_tools= entry in - =cj/gptel-local-tool-features=, or six (one per tool)? - Currently each tool lives in its own provide-symbol file. With - the single-file design we'd register one feature symbol that - loads all six. - -* Effort estimate - -M (1-3 hours). Helpers + six tool wrappers + ~50 tests + migration. -Most of the time is test authoring; the production code is small -because magit absorbs the boilerplate. diff --git a/docs/specs/gptel-network-tools-spec.org b/docs/specs/gptel-network-tools-spec.org deleted file mode 100644 index c28d54694..000000000 --- a/docs/specs/gptel-network-tools-spec.org +++ /dev/null @@ -1,411 +0,0 @@ -:PROPERTIES: -:ID: 6388588c-dac2-4c52-97ad-2343ba1443fc -:STATUS: not-started -:END: -#+TITLE: Design: gptel network tools -#+AUTHOR: Craig Jennings -#+DATE: 2026-05-16 -#+OPTIONS: toc:nil num:nil - -* Status - -Draft. Brainstorm output captured from a =/brainstorm= session on -2026-05-16. Sibling to -=docs/specs/gptel-git-tools-magit-backend-spec.org= and the broader theme -hierarchy under =** TODO [#B] GPTel Tool Work= in =todo.org=. - -The conventional vs tail-sample exploration covered three categories -(network, text/data, build/code). Network was selected as the next -build target; this doc captures the network slice in full. The other -two categories are referenced briefly and live as theme stubs under -=*** TODO [#B] Filesystem Related Tools= and -=*** TODO [#B] Development Workflow Related Tools= in =todo.org=. - -* Problem - -The current =gptel-tools/= set covers filesystem CRUD, web fetch, and -git status/log/diff. When the user asks the agent "why can't I reach -X?" or "what's on my LAN right now?" the agent has no affordances -- -it can only suggest commands the user runs manually. - -Network diagnosis is a recurring task on this laptop (homelab, mixed -wifi/wired, occasional VPN, NetworkManager-managed connections). The -agent should be able to run read-only network probes directly, return -structured findings, and synthesize an explanation. Anything that -mutates network state (=nmcli connection up=, route changes) stays -behind =:confirm t=. - -* Non-goals - -- Active offensive scanning, vulnerability probes, or exploitation - tooling. Out of scope at the wrapper boundary -- nmap's - =-A=/=-O=/aggressive modes are rejected, NSE is deferred. -- Scanning networks the user doesn't own. Public targets are gated - behind an explicit =external=t= flag and =:confirm t=. -- Real-time/streaming inspection (=iftop=, =nethogs=, =tcpdump - follow=). Snapshot tools only; streaming tools don't fit the - request/response shape of gptel tools. -- Replacing Magit's git tooling, mu4e's mail handling, or any other - Emacs-native workflow. Network tooling is the gap. - -* Approaches considered - -The =/brainstorm= run generated six candidate themes across three -categories. Three conventional (high-prior), three tail samples -(genuinely different regions of the option space). Network was -chosen as the first build target; the others are recorded for -follow-up sessions. - -** Recommended: network triage bundle (conventional #1) - -Five tools covering discovery, diagnostics, and inspection: - -| Tool | Purpose | -|-------------------+--------------------------------------------------| -| =net_diagnose= | "Why can't I reach X?" -- composite probe | -| =net_discover= | "What's on this subnet?" -- LAN host discovery | -| =net_services= | "What's listening on host X?" -- service detect | -| =network_status= | "What's my current network state?" -- snapshot | -| =dns_lookup= | Typed DNS query (A/AAAA/MX/NS/TXT/SRV/CAA) | - -Detailed in =* Design= below. - -*** Pros - -- Hits the highest-leverage daily question (connectivity diagnosis) - with a single mental entry point (=net_diagnose=). -- Atomic tools (=dns_lookup=, =network_status=) for cases the - composite is too coarse for. -- All read-only at the network layer; =:confirm nil= for RFC1918, - =:confirm t= for public targets. -- nmap's two genuinely-unique capabilities (subnet discovery, service - enumeration) get first-class wrappers. - -*** Cons - -- Five tools is heavy for one category. Some are thin wrappers around - a single command. -- Composite =net_diagnose= hides which sub-check fired; debugging the - tool itself is harder than debugging atomic tools. -- nmap is the one tool that *can* get the user in trouble. Target - gating must be airtight or it's the wrong tool to ship. - -** Rejected: code-quality fan-out (conventional #2) - -=shellcheck_run=, =format_check= (black/prettier/gofmt/rustfmt/elisp, -returns unified diff), =lint_run= (eslint/ruff/golangci-lint), -=dot_render=, =mermaid_render=. - -Folded into =*** TODO [#B] Development Workflow Related Tools= as -per-language work rather than a standalone bundle. Most of the per- -language wins land in the existing prog-*.el modules' format-on-save -and LSP attachments; the agent benefits more from /reading/ those -buffers than from re-running the formatters via tool calls. - -** Rejected: GitHub workspace (conventional #3) - -=gh_pr_view=, =gh_issue_search=, =gh_run_logs=, =gh_pr_diff=. - -Overlaps with the magit-backend track (=gptel-git-tools-magit-backend=) -for several queries. Better treated as a follow-on once the magit -backend lands -- some queries are local (magit) and some are remote -(gh), and the seam is clearer after the local side is built. - -** Rejected: DNS-chain inspector (tail sample) - -=dns_chain= walks NS -> A/AAAA -> MX -> SPF -> DMARC -> DKIM for a -domain and returns a structured assessment with red flags ("MX -missing TLS-RPT", "SPF includes >10 lookups", "DMARC policy=none"). - -Real value when it's useful but probably 5 calls/year for this -laptop. =dns_lookup= covers 90% of the recurring need; the chain -walker is parked for a possible follow-on. - -** Rejected: awk_eval / sed_eval with explanation (tail sample) - -Accept snippet + sample input, return both the transformed output and -a plain-English explanation of what the snippet does. - -Doubles work the model already does internally -- the model is -already good at generating and explaining awk/sed. Real win would -only be the actual execution against actual data, which the eshell -escape hatch in the Filesystem section already covers. - -** Adopted as project convention: plan/apply split (tail sample) - -=rsync_plan= / =rsync_apply= split: plan always runs =--dry-run= and -returns the file list and byte counts that *would* transfer; apply is -a separate tool registration with =:confirm t=. Same shape for -=nmcli= (status read vs connection mutate) and any other mutating -tool. - -Promoted to a documented convention rather than a single tool: any -mutating wrapper in =gptel-tools/= should split into a preview and an -apply. The preview is =:confirm nil= so the agent can plan -autonomously; the apply is =:confirm t= and stops cleanly for human -review. Applies to =rsync=, =nmcli connection up=, =ssh= mutations, -and the pandoc/ffmpeg/imagemagick output-writing tools in the -Filesystem section. - -* Design - -** Tool 1: =net_diagnose= - -Composite "why can't I reach X?" probe. Given a target (hostname or -IP), runs a sequence of sub-checks and returns a structured result: - -1. =dig +short= on the name (skip if target is an IP literal). -2. =ping -c 3 -W 2= against the resolved IP. -3. =traceroute -n -w 2 -q 1 -m 20= to the IP. -4. If a port is given: =curl --max-time 5 -o /dev/null -sw '%{http_code}\n'= - for ports 80/443, or =nc -zv -w 3= for arbitrary TCP ports. - -Output shape (alist or plist returned to the model): - -#+begin_src text - ((target . "example.com") - (resolved-to . "93.184.216.34") - (dns-time-ms . 12) - (ping . ((sent . 3) (received . 3) (avg-ms . 14.2))) - (traceroute . ((hops . 8) (last-hop . "93.184.216.34"))) - (port-check . ((port . 443) (status . "200") (tls . "ok")))) -#+end_src - -Caps: total runtime <30s. Each sub-check has its own timeout. If a -sub-check fails (no ping reply, no route, no DNS), the field carries -the failure mode rather than aborting the whole call -- the agent -needs the partial picture to reason. - -=:confirm nil=. Read-only. - -** Tool 2: =net_discover= - -Wraps =nmap -sn <subnet>= for LAN host discovery. Two argv shapes: - -- =net_discover ()= -- defaults to the current LAN, derived from - =ip route get 1.1.1.1= and the matching interface's =/24=. -- =net_discover :subnet "192.168.1.0/24"= -- explicit subnet. - -Guardrails: - -- Subnet must be RFC1918, link-local (169.254/16), CGNAT (100.64/10), - or loopback. Public subnets rejected at the validator. -- Subnet mask must be /22 or smaller (no /16 or wider). At /22 that's - ~1024 hosts -- enough for any homelab. Default home network is /24. -- =--host-timeout 30s --max-retries 1= to bound runtime. - -Output: list of =(ip mac hostname state)= tuples. - -=:confirm nil= for RFC1918 / link-local / CGNAT / loopback. Public -subnets never reach this tool (validator rejects). - -** Tool 3: =net_services= - -Wraps =nmap -sV= for service/version detection on a single host. - -Argv: - -- =:host= -- required. RFC1918 / link-local / CGNAT / loopback by - default. Public hosts require =:external t= which flips - =:confirm t=. -- =:ports= -- optional port spec. Default: top-100 (=--top-ports - 100=). Custom lists allowed: ="22,80,443,5432,6379"= or - ="1-1024"=. Hard cap: 1024 ports total. -- =:fast= -- if t, uses =--top-ports 20= for a quick check. - -Mode allowlist enforced at the wrapper: only =-sV= with optional -=-p=. Reject =-A=, =-O=, =-T4=/=-T5=, =--script=, raw-packet flags. - -Output: list of =(port protocol state service version banner)= -tuples, parsed from =-oG -= (greppable output). - -=:confirm nil= for RFC1918 / link-local / CGNAT / loopback. -=:confirm t= for any target reachable only as a public IP/hostname. - -** Tool 4: =network_status= - -Snapshot of the local network state. Composite of: - -- =ip -br addr= -- interfaces and their addresses. -- =ip route= -- routing table. -- =nmcli -t -f NAME,TYPE,DEVICE,STATE connection show --active= -- - active NetworkManager connections. -- =ss -tulpn= (or =netstat -tulpn= fallback) -- listening sockets. -- =resolvectl status= (or =/etc/resolv.conf= fallback) -- DNS - resolver state. - -Output: structured alist with sections for each. - -=:confirm nil=. Read-only. - -Note: this is also the candidate target for the plan/apply split if -=nmcli connection up=/=down= ever lands as a tool -- =network_status= -becomes the "plan" side and any mutation is a separate tool. - -** Tool 5: =dns_lookup= - -Typed DNS query. Argv: - -- =:name= -- required. The DNS name to query. -- =:type= -- record type. Default =A=. Allowed: =A=, =AAAA=, =MX=, - =NS=, =TXT=, =SRV=, =CAA=, =CNAME=, =PTR=, =SOA=. -- =:server= -- optional resolver. Default uses system resolver. - When set, must be RFC1918 or one of a small allowlist (=1.1.1.1=, - =8.8.8.8=, =9.9.9.9=) so the tool can't be used to probe arbitrary - hosts via DNS. - -Output: list of records with TTL. For =MX= and =SRV=, includes -priority/weight/port. For =TXT=, the records are split into the -quoted segments dig returns. - -=:confirm nil=. Read-only. - -** Shared helpers - -In =gptel-tools/network_tools.el= (single file, mirrors the -magit-backend plan for git tools): - -- =cj/gptel-net--validate-target HOST &optional ALLOW-PUBLIC= - - Resolves HOST. Rejects unless resolved IP is RFC1918 / - link-local / CGNAT / loopback, unless ALLOW-PUBLIC is non-nil. - - Returns the resolved IP on success. - -- =cj/gptel-net--validate-subnet CIDR= - - Rejects non-private subnets and subnets wider than /22. - - Returns =(network mask)= on success. - -- =cj/gptel-net--current-lan= - - Derives the current /24 from =ip route get 1.1.1.1=. - -- =cj/gptel-net--run ARGS &key TIMEOUT= - - Wraps =process-file= with a uniform timeout, color/encoding - posture, and structured return =(exit-code stdout stderr)=. - -- =cj/gptel-net--parse-nmap-greppable STRING= - - Parses nmap =-oG -= output into structured tuples. - -- =cj/gptel-net--truncate TEXT MAX-BYTES= - - Same shape as the existing per-tool truncate helpers. Open - question whether this consolidates into =system-lib.el= alongside - the matching helpers in =web_fetch.el= and =update_text_file.el=. - -** Caps - -| Tool | Default cap | Hard cap | -|------------------+------------------------+------------------------| -| =net_diagnose= | <30s total runtime | <30s total runtime | -| =net_discover= | /24 default, /22 max | /22 | -| =net_services= | top-100 ports | 1024 ports | -| =network_status= | uncapped (snapshot) | uncapped | -| =dns_lookup= | uncapped | uncapped | - -** =:confirm= posture - -| Tool | RFC1918 target | Public target | -|------------------+-------------------+-------------------------| -| =net_diagnose= | =:confirm nil= | =:confirm t= | -| =net_discover= | =:confirm nil= | rejected at validator | -| =net_services= | =:confirm nil= | =:confirm t= | -| =network_status= | =:confirm nil= | n/a (local snapshot) | -| =dns_lookup= | =:confirm nil= | =:confirm nil= | - -=dns_lookup= stays =:confirm nil= for public names because DNS is -read-only and innocuous. =net_diagnose= and =net_services= against -public targets are gated because pinging/probing public hosts isn't -*illegal* but it can trip rate-limits or get the user flagged on a -managed network. - -** Tests - -Single file =tests/test-gptel-tools-network-tools.el=. Real subnets -are not available in CI, so: - -- =net_discover= and =net_services= are stubbed via =cl-letf= on - =cj/gptel-net--run=, returning canned nmap output. Real nmap - invocation tested via one =:tags '(:integration)= test that runs - =nmap -sn 127.0.0.1/32= and asserts the parser handles the real - format. -- =net_diagnose= sub-checks stubbed individually so each failure mode - can be exercised. -- =network_status= sections stubbed per-command; one integration test - runs against the live system and asserts the structure parses. -- =dns_lookup= stubbed against canned =dig= output; one integration - test against =localhost= via the system resolver. - -Rough count: ~12 shared-helper tests (validators, current-lan -detector, parsers) + ~7 per tool x 5 tools = ~47 tests. - -** Risk surface - -| Risk | Mitigation | -|-----------------------------------------------------------+---------------------------------------------------------------------| -| nmap scan against an unintended target | Validator gates on resolved IP, not on the input string. Public | -| | targets require explicit =:external t= flag + =:confirm t=. | -| Scan triggers IDS/IPS on a corporate/managed network | Default modes are non-aggressive (=-sn=, =-sV= only). No =-A=, no | -| | NSE, no high T-level. =:confirm t= for non-RFC1918 targets gives | -| | the user a manual checkpoint. | -| =net_diagnose= hangs on a slow target | Per-sub-check timeouts; total runtime cap; partial-failure return | -| | rather than abort. | -| nmap not installed on the system | =:command= check at module load via =cj/executable-find-or-warn= | -| | (matching the prettier/pyright pattern documented in CLAUDE.md). | -| Network tools shell out via =process-file= | argv-list invocation, no shell. =shell-quote-argument= unused | -| | because no shell is involved. | -| /tmp pollution or banner output writing to disk | All output captured to buffer via =process-file=, never written. | - -* Open questions - -1. *Default port set for =net_services=.* Top-100 (nmap default), - top-1000 (full default scan, slower), or a custom homelab-tuned - list (=22, 80, 443, 445, 3389, 5432, 6379, 8080, 8443, 9090, 9000, - 631=)? My read: top-100 default + =:fast t= for top-20 + custom - override for the homelab list when needed. -2. *NSE in v1 or deferred?* Skip entirely (clean v1) or ship a small - allowlist (=ssl-cert=, =http-title=, =ssh-hostkey=)? My read: - skip in v1. If a real use case shows up (TLS audit), add a single - =net_tls_audit= tool wrapping just =ssl-enum-ciphers=/=ssl-cert= - rather than a generic NSE escape hatch. -3. *Consolidate the truncate helper.* Same open question as the - magit-backend doc: move =cj/gptel-net--truncate= and its siblings - into =system-lib.el= as =cj/gptel-tools--truncate-bytes=, or keep - per-module? My read: consolidate when there are three callers - (web_fetch, update_text_file, network_tools all qualify). -4. *Composite vs atomic for =net_diagnose=.* Build it as one - composite, or break it into =ping_run=, =traceroute_run=, - =port_check= and let the agent compose? My read: composite is - better -- the agent reasons in "diagnose-this-target" terms more - often than in "just-ping-this". Atomic sub-tools can be added - later if the composite proves coarse-grained. -5. *Promote plan/apply split to documented convention now?* Or wait - until a second tool exercises it (post-rsync)? My read: document - the convention in the Filesystem section body now, since pandoc / - ffmpeg / imagemagick all benefit, even before any of them ship. -6. *nmcli mutation tools.* Out of scope for this doc but worth - flagging: =nmcli connection up <name>= / =nmcli connection down - <name>= / =nmcli device wifi connect <ssid>=. These would be the - first apply-side tools under the plan/apply convention, with - =network_status= as the plan side. - -* Effort estimate - -M (1-3 hours). Five tools + shared helpers + ~47 tests. Most of the -time is test authoring (canned nmap output, dig output, ss output); -production code is small because each tool is a thin =process-file= -wrapper plus a parser. - -* Next steps - -- Resolve open questions #1 and #2 before any code lands (the - =net_services= shape can't be finalized without them). -- Once approved, the work attaches to =*** TODO [#B] (Network bundle: - net_diagnose / net_discover / net_services / network_status / - dns_lookup)= -- a new theme under =*** TODO [#B] (Networking tools - category)= which itself becomes a new top-level under =** TODO [#B] - GPTel Tool Work= in =todo.org=, peer to the existing Filesystem - section. -- Implementation follows =/start-work= flow: TDD, characterization - tests for the parsers first (canned nmap/dig/ss fixtures), then - the wrappers, then the registrations in - =cj/gptel-local-tool-features=. -- After landing, revisit candidate #6 (plan/apply split) -- the - first apply-side tool (=nmcli connection up=, =rsync_apply=, - pandoc-output) exercises the convention end-to-end. diff --git a/docs/specs/mcp-el-gptel-integration-spec-doing.org b/docs/specs/mcp-el-gptel-integration-spec-doing.org deleted file mode 100644 index f22e91959..000000000 --- a/docs/specs/mcp-el-gptel-integration-spec-doing.org +++ /dev/null @@ -1,1438 +0,0 @@ -:PROPERTIES: -:ID: b4c274c5-8572-4a7b-b657-d315712bd6af -:STATUS: doing -:END: -#+TITLE: Design: Wire mcp.el into GPTel for MCP server access -#+AUTHOR: Craig Jennings -#+DATE: 2026-05-16 -#+OPTIONS: toc:nil num:nil - -* Status - -Draft (revision 3). Pre-implementation; no code shipped yet. The -mcp.el package is cloned at =~/code/mcp.el/= (fork of -[[https://github.com/lizqwerscott/mcp.el][lizqwerscott/mcp.el]]) but not wired into the config. - -Revision 3 tightens seven contracts the revision-2 review flagged: - -1. *GPTel confirmation* -- =gptel-confirm-tool-calls= is =nil= at - =ai-config.el:386=, which short-circuits every per-tool - =:confirm= slot. The integration flips it to =auto= as a hard - precondition. -2. *Async timeout mechanics* -- replaced =with-timeout= (which - only supervises dynamic extent) with an explicit timer/callback - race for async tool calls. -3. *Startup completion semantics* -- the hub's completion callback - is opportunistic, not authoritative; the stall timer + polling - =mcp-server-connections= is the source of truth. -4. *Server identity at registration* -- walk - =mcp-server-connections= directly instead of parsing - =:category mcp-SERVER= out of =mcp-hub-get-all-tool=. -5. *Server enablement* -- =cj/mcp-enabled-servers= defcustom lets - users disable a server without writing code. Profiles still - deferred. -6. *Keymap pinned* -- =C-; a C= (Connect) is the MCP subprefix. - =M= (=gptel-menu=) and =m= (=cj/gptel-change-model=) stay - where they are. -7. *mcp.el private-API isolation* -- a compat layer wraps every - =mcp--*= call so version drift surfaces in one place. - -Plus several smaller changes: every MCP tool registers async, -description normalization adds a server-name prefix and a write -risk note, =cj/mcp-start-on-entry-points= defcustom scopes -startup triggers (default: full chat only), TRAMP processes -local-only, doctor gains live-auth-check, =cj/mcp-wait-until-ready= -command added, audit buffer surfaces failed servers prominently. - -* Problem - -GPTel exposes ten local tools today (=read_buffer=, =read_text_file=, -=write_text_file=, =update_text_file=, =list_directory_files=, -=move_to_trash=, =git_status=, =git_log=, =git_diff=, =web_fetch= -- -see =gptel-tools/=). Claude Code, by contrast, has access to nine -external MCP servers (linear, notion, figma, slack-deepsat, -google-calendar, google-docs-personal, google-docs-work, drawio, -google-keep), each exposing 10-70 additional tools. - -The asymmetry means agentic work done in GPTel can't touch the same -external systems Claude Code can. Wiring [[https://github.com/lizqwerscott/mcp.el][mcp.el]] into the config -closes the gap: GPTel gains access to every MCP server Claude Code -uses, modulo three claude.ai-hosted servers whose OAuth is bound to -the Claude.ai session (see Non-Goals). - -* Goals - -1. GPTel sees every tool from the enabled subset of nine reusable - MCP servers in =gptel-menu=, grouped by server via the tool's - =:category= field. -2. Servers spawn *asynchronously*. Opening GPTel never blocks on - MCP startup; tools arrive incrementally and =gptel-tools= updates - as each server reports its inventory. =cj/toggle-gptel= must - return without waiting for any MCP subprocess. -3. Write/destructive MCP tools are gated by a confirmation prompt - the user actually sees. Two preconditions: - =gptel-confirm-tool-calls= is set to =auto= (so the per-tool - =:confirm= slot is honored), and write/destructive tools are - registered with =:confirm t=. Read-only tools execute without - confirmation. -4. Secrets stay in =~/.claude.json= (single source of truth, shared - with Claude Code). The Emacs config reads env vars from there - at server-spawn time, with an mtime-aware cache. Secrets are - never echoed to status, errors, hub buffers, or tests. -5. A per-server status alist tracks each server's lifecycle (idle / - starting / ready / failed / stopped) and is inspectable via - =cj/mcp-status= and a =cj/mcp-list-tools= audit buffer. -6. Server-management commands live under a =C-; a C= (Connect) - subprefix so existing GPTel keys (=C-; a M=, =m=) aren't - disturbed. -7. A failed server (network down, OAuth token expired, npx package - 404) is surfaced clearly via the OAuth-recovery pattern matcher - and does not block GPTel itself. Successful servers' tools are - available immediately; failed servers' tools are absent (not - stale). -8. The config can swap between MELPA mcp.el and the local - =~/code/mcp.el/= checkout with a one-line uncomment, gated by a - capability check that asserts required API functions exist. -9. A first-run =cj/mcp-doctor= command diagnoses missing - prerequisites (=npx=, =uvx=, =~/.claude.json=, per-server - commands, known local endpoints) and optionally runs a - live-auth probe before they fail at runtime. - -* Non-Goals - -- The three claude.ai-hosted MCP servers (Gmail / Drive / Calendar - served from =*.googleapis.com/mcp/v1=). Their OAuth is issued - by the Claude.ai session and is not transferable to GPTel. -- *MCP resources and prompts.* v1 registers tools only. - Resource browsing and prompt invocation are tracked as - follow-ups; the local checkout has the API surface ready. -- *Per-conversation tool profiles.* v1 ships - =cj/mcp-enabled-servers= for whole-server enable/disable; - profiles (different tool subsets per chat) wait for v1.5 once - usage shows whether they're needed. -- *Auth-source migration.* Deferred until the OAuth re-auth flow - for expiring tokens is understood. Tracked in Open Questions - §Q3. -- *Automated OAuth re-auth when tokens expire.* Out of scope; the - user re-authenticates via Claude Code, and the next GPTel - invocation picks up the refreshed values from =~/.claude.json=. -- *Modifying mcp.el itself in this repo.* Upstream patches and - tests live in =~/code/mcp.el/= and ship via PRs to lizqwerscott's - master. - -* Verified API Contracts - -These were checked against the actual source before each revision. -Behavior summarized here so implementation can rely on it. - -** GPTel confirmation gating (=gptel.el:2244=) - -The confirmation check is: - -#+begin_src emacs-lisp -(if (and gptel-confirm-tool-calls - (or (eq gptel-confirm-tool-calls t) - (gptel-tool-confirm tool-spec))) - ;; ask user - ...) -#+end_src - -When =gptel-confirm-tool-calls= is =nil=, the =(and ...)= -short-circuits and the tool's =:confirm= slot is ignored. - -The defcustom default is ='auto=, which "seeks confirmation only -when the corresponding tool spec has a non-nil :confirm slot" -(=gptel.el:1601-1603=). =ai-config.el:386= currently sets it to -=nil=. - -*Implementation consequence:* =ai-mcp.el= must =setq -gptel-confirm-tool-calls 'auto= as part of its setup (and -=ai-config.el= drops the explicit =nil= setting). Without this, -write-gated tools register =:confirm t= and gptel ignores it. - -** mcp-hub callback ownership (=~/code/mcp.el/mcp-hub.el:53-90=) - -=mcp-hub--start-server= unconditionally appends its own six -callbacks (=:initial-callback=, =:tools-callback=, -=:prompts-callback=, =:resources-callback=, -=:resources-templates-callback=, =:error-callback=) to whatever -the caller passes. Per-server custom callbacks in the alist -result in duplicate keyword arguments to =mcp-connect-server= -- -behavior implementation-defined. - -*Implementation consequence:* the integration does not slip custom -callbacks through =mcp-hub-servers=. It uses -=mcp-hub-start-all-server='s top-level completion callback as an -opportunistic signal, walks =mcp-server-connections= directly for -authoritative state, and uses a stall timer as the deadline. - -** mcp-hub-start-all-server completion semantics - -The hub's completion callback (=mcp-hub-start-all-server='s -=CALLBACK= argument) fires when its internal counter reaches the -total server count. The counter increments: - -- On immediate Elisp errors from =mcp-hub--start-server=. -- When the =:inited-callback= passed to =mcp-hub--start-server= - fires, which happens inside the hub's =:tools-callback=. - -Async error paths flow through =:error-callback= -- which the hub -also installs but does *not* obviously chain into the inited -callback. Servers without tools may not pass through the tools -callback in the same way. - -*Implementation consequence:* the callback is treated as an -opportunistic readiness signal, *not* as "all initialized or -failed". The authoritative state comes from polling -=mcp-server-connections= (each entry has =mcp--status= of -=connected= / =error= / =starting=) and from the stall timer -deadline. - -** gptel-make-tool registration semantics (=gptel.el:1729-1820=) - -=gptel-make-tool= registers the tool into =gptel--known-tools= -keyed by category + name. It does *not* add the tool to -=gptel-tools= (the per-buffer active list). The existing local -tools (=gptel-tools/git_log.el:97=) explicitly do: - -#+begin_src emacs-lisp -(gptel-make-tool ...) -(add-to-list 'gptel-tools (gptel-get-tool '("category" "name"))) -#+end_src - -*Implementation consequence:* the registration pipeline does -both calls per tool, tracks tool names per server in a hash, and -deregisters cleanly on restart/stop without disturbing local -tools. - -** MELPA vs local checkout - -The local checkout (=~/code/mcp.el/=, tip =f10768e=) has HTTP -transport (=mcp-http-process-connection=) and recent UX -improvements (resource reading, imenu support, detail mode). -MELPA parity not yet verified. - -*Implementation consequence:* =cj/mcp--assert-capabilities= -checks for required functions at load time and signals a clear -=user-error= if missing. Use-package block defaults to MELPA; -the local-checkout =:load-path= line stays commented until the -capability check tells us MELPA is missing something. - -* GPTel Confirmation Contract - -The single most consequential precondition for the safety story: - -** Current state - -=modules/ai-config.el:386= sets =gptel-confirm-tool-calls= to -=nil=. This was a deliberate "allow tool access by default" -choice when only the ten local tools existed -- all of which are -either read-only (git_log, git_status, list_directory_files, etc.) -or already wrap their own confirm prompts (web_fetch uses -=:confirm t= but is ignored under the current setting; this was -acceptable because the only "real" tool there is on a user-typed -URL). - -** Required state - -The MCP integration cannot ship without flipping this to =auto=. -Specifically, =modules/ai-mcp.el= must: - -1. =setq gptel-confirm-tool-calls 'auto= during its load. -2. Audit the existing local tools and add =:confirm t= to any - that should be gated. =web_fetch= is the obvious candidate; - =write_text_file=, =update_text_file=, =move_to_trash= may - also warrant it depending on Craig's preference. - -The existing =ai-config.el:386= line is removed. A comment -points readers at =ai-mcp.el= for the new value. - -** Verification test - -A test in =tests/test-ai-mcp-confirm-contract.el= asserts: - -- After =ai-mcp= loads, =gptel-confirm-tool-calls= is ='auto=. -- A write-classified MCP tool registered with =:confirm t= takes - the confirmation branch in =gptel-send='s tool-dispatch code - (verified by stubbing gptel's confirm-prompt and checking it - fires). -- A read-classified MCP tool registered with =:confirm nil= does - not take the confirmation branch. -- Local =git_log= (=:confirm nil=) still runs without prompting. - -* Current State - -** =modules/ai-config.el= - -- =use-package gptel= block at lines 363-414, defer-loaded on the - =gptel= / =gptel-send= / =gptel-menu= commands. -- =gptel-confirm-tool-calls nil= at line 386. Removed by this - integration; see § GPTel Confirmation Contract. -- =cj/gptel-load-local-tools= (lines 71-96) loads the ten local - tools from =gptel-tools/=. -- =cj/toggle-gptel= (lines 418-441) is the primary entry point - (=C-; a t=). Other entry points: =cj/gptel-quick-ask= - (=C-; a q=), =gptel-magit-commit-generate= (=g= in magit), - =cj/gptel-rewrite-with-directive= (=C-; a r=), =gptel-send= - (=C-RET= in gptel buffer). -- =cj/ai-keymap= (lines 510-528) currently uses keys A B M d . f - b l m p q r R c s t x. =C= (uppercase) is free and becomes the - MCP subprefix. - -** =gptel-tools/= - -Ten =.el= files. =git_log.el= is the closest analogue; -=web_fetch.el= demonstrates the =:confirm t= pattern. - -** =~/.claude.json= - -Mode 0600, ~75 KB. Top-level =mcpServers= key holds the nine -servers we want. Env-var names per server (values redacted): - -| Server | Env vars | -|----------------------+-------------------------------------------------------| -| google-calendar | =GOOGLE_OAUTH_CREDENTIALS= | -| google-docs-personal | =GOOGLE_CLIENT_ID=, =GOOGLE_CLIENT_SECRET=, =GOOGLE_MCP_PROFILE= | -| google-docs-work | Same three vars (different values) | -| google-keep | =GOOGLE_EMAIL=, =GOOGLE_MASTER_TOKEN= | - -* Design - -** Module split - -Implementation lives in =modules/ai-mcp.el=. =modules/ai-config.el= -gains only autoload declarations and the =C-; a C= subprefix -wiring. - -** Code organization outline (=ai-mcp.el=) - -The file is organized in seven sections so it stays readable as -features land: - -1. *Constants and defcustoms* -- =cj/mcp-server-specs=, - =cj/mcp-claude-config=, =cj/mcp-enabled-servers=, - =cj/mcp-start-on-entry-points=, =cj/mcp-startup-timeout=, - =cj/mcp-tool-timeout=, =cj/mcp-tool-confirm-overrides=, - audit-log defcustoms. -2. *Public commands* -- =cj/mcp-ensure-started=, =cj/mcp-hub=, - =cj/mcp-status=, =cj/mcp-list-tools=, - =cj/mcp-restart-failed=, =cj/mcp-restart-server=, - =cj/mcp-stop-all=, =cj/mcp-doctor=, - =cj/mcp-wait-until-ready=. -3. *Pure helpers* -- Claude config reader, =cj/mcp--build-server-alist=, - =cj/mcp--redact=, =cj/mcp--confirm-p=, - =cj/mcp--normalize-description=. -4. *mcp.el compatibility layer* -- 3-5 wrappers around private - API (=mcp--status=, =mcp--tools=, etc.). Single source of - version-drift risk. -5. *Registration pipeline* -- =cj/mcp--register-tool=, - =cj/mcp--register-server-tools=, - =cj/mcp--deregister-server-tools=, - =cj/mcp--registered-tools= hash. -6. *Async state machine* -- =cj/mcp--state=, - =cj/mcp--server-status=, =cj/mcp--on-all-started=, - =cj/mcp--stall-timer=, =cj/mcp--poll-status=. -7. *UI* -- audit-buffer mode, doctor buffer, recovery-pattern - matcher, response prefixes. - -This explicit outline doubles as the file's table of contents in -its commentary block. - -** Server inventory: data first - -The nine servers are described as a defconst of plists, with no -secrets baked in: - -#+begin_src emacs-lisp -(defconst cj/mcp-server-specs - '((:name "linear" - :transport http - :url "https://mcp.linear.app/mcp" - :auth in-protocol - :risk write-capable) - (:name "notion" - :transport http - :url "https://mcp.notion.com/mcp" - :auth in-protocol - :risk write-capable) - (:name "figma" - :transport stdio - :command "npx" - :args ("-y" "figma-developer-mcp" "--stdio") - :secret-args ("--figma-api-key" :figma-api-key) - :auth args-token - :risk arg-leak) - (:name "slack-deepsat" - :transport sse - :url "http://127.0.0.1:13080/sse" - :auth local - :risk write-capable) - (:name "drawio" - :transport stdio - :command "npx" - :args ("-y" "@drawio/mcp") - :auth none - :risk none) - (:name "google-calendar" - :transport stdio - :command "npx" - :args ("-y" "@cocal/google-calendar-mcp") - :env (:GOOGLE_OAUTH_CREDENTIALS t) - :auth oauth - :risk write-capable) - (:name "google-docs-personal" - :transport stdio - :command "npx" - :args ("-y" "@a-bonus/google-docs-mcp") - :env (:GOOGLE_CLIENT_ID t :GOOGLE_CLIENT_SECRET t :GOOGLE_MCP_PROFILE t) - :auth oauth - :risk write-capable) - (:name "google-docs-work" - :transport stdio - :command "npx" - :args ("-y" "@a-bonus/google-docs-mcp") - :env (:GOOGLE_CLIENT_ID t :GOOGLE_CLIENT_SECRET t :GOOGLE_MCP_PROFILE t) - :auth oauth - :risk write-capable) - (:name "google-keep" - :transport stdio - :command "uvx" - :args ("--from" "keep-mcp" "python" "-m" "server.cli") - :env (:GOOGLE_EMAIL t :GOOGLE_MASTER_TOKEN t) - :auth token - :risk write-capable))) -#+end_src - -The same data drives the doctor check list, status labels, and -recovery messages -- a single source of truth keeps them from -drifting. - -** Server enablement - -#+begin_src emacs-lisp -(defcustom cj/mcp-enabled-servers - (mapcar (lambda (s) (plist-get s :name)) cj/mcp-server-specs) - "List of MCP server names to start. -Defaults to every server in `cj/mcp-server-specs'. Set to a -shorter list to disable specific servers without editing the -spec. Changes take effect on next `cj/mcp-restart-failed' or -Emacs restart." - :type '(repeat string) - :group 'cj) -#+end_src - -=cj/mcp--build-server-alist= filters by this list before -returning. A user who wants only =linear= and =drawio= sets: - -#+begin_src emacs-lisp -(setq cj/mcp-enabled-servers '("linear" "drawio")) -#+end_src - -This is the answer to "100+ tools is overwhelming" without -needing per-conversation profiles. - -** Entry-point policy - -Not every GPTel entry point should trigger MCP startup. Quick -ask, rewrite, and magit commit-message generation are -lightweight; spinning up nine subprocesses for a 50-word commit -message is surprising overhead. - -#+begin_src emacs-lisp -(defcustom cj/mcp-start-on-entry-points - '(toggle-gptel) - "GPTel entry points that trigger MCP startup. -Symbols correspond to commands: `toggle-gptel', `gptel-send', -`gptel-quick-ask', `gptel-rewrite-with-directive', -`gptel-magit-generate-message'. Default: only full chat -(`toggle-gptel')." - :type '(repeat symbol) - :group 'cj) -#+end_src - -Each entry-point command checks membership before calling -=cj/mcp-ensure-started=: - -#+begin_src emacs-lisp -(defun cj/toggle-gptel () - ... - (when (memq 'toggle-gptel cj/mcp-start-on-entry-points) - (cj/mcp-ensure-started)) - ...) -#+end_src - -** Claude config reader (mtime-cached, structured returns) - -#+begin_src emacs-lisp -(defcustom cj/mcp-claude-config - (expand-file-name "~/.claude.json") - "Path to the Claude Code config that holds MCP server env vars." - :type 'file - :group 'cj) - -(defvar cj/mcp--config-cache nil - "Cons of (MTIME . PARSED) for `cj/mcp-claude-config'.") - -(defun cj/mcp--read-claude-config () - "Return a structured result describing the Claude config state. -Result shape: - (:ok t :data PLIST) - (:ok nil :reason missing-file) - (:ok nil :reason unreadable) - (:ok nil :reason malformed-json :message STR) -Cached by mtime; subsequent calls reparse only on change." - ...) -#+end_src - -** mcp.el compatibility layer - -All private-API access lives in 3-5 helpers documented with the -upstream commit they target. This is the only file that touches -=mcp--*= names; everything else calls these wrappers. - -#+begin_src emacs-lisp -;; ai-mcp-compat -- isolates private mcp.el API. -;; Verified against upstream commit f10768e (2026-05-16). - -(defun cj/mcp--server-status (connection) - "Return CONNECTION's lifecycle status: connected, error, starting." - (mcp--status connection)) - -(defun cj/mcp--server-tools (connection) - "Return CONNECTION's discovered tool list (plists)." - (mcp--tools connection)) - -(defun cj/mcp--server-name (connection) - "Return CONNECTION's logical server name." - (jsonrpc-name connection)) - -(defun cj/mcp--assert-capabilities () - "Signal `user-error' if any required mcp.el function is missing." - (dolist (fn '(mcp-connect-server mcp-make-text-tool - mcp-hub mcp-hub-start-all-server - mcp-hub-get-all-tool mcp-server-connections)) - (unless (fboundp fn) - (user-error "mcp.el too old; missing %s. Upgrade or switch \ -to local checkout in `ai-mcp.el' use-package block" fn)))) -#+end_src - -If mcp.el renames a slot or changes a return shape, only these -helpers break. Tests cover each helper against stub objects. - -** Startup model: async + state machine + polling - -Three state structures capture lifecycle: - -#+begin_src emacs-lisp -(defvar cj/mcp--state 'idle - "Overall MCP integration state: idle, starting, partial, ready, failed.") - -(defvar cj/mcp--server-status nil - "Alist mapping server name to status plist: - (:state STATE :tool-count N :tools (NAME ...) :last-error STR - :started-at TIME :ready-at TIME)") - -(defvar cj/mcp--stall-timer nil - "Timer guarding against servers that never call back.") -#+end_src - -=cj/mcp-ensure-started= is the only entry point consumers call: - -#+begin_src emacs-lisp -(defun cj/mcp-ensure-started () - "Schedule MCP startup if it hasn't run yet this session. -Returns immediately. Servers spawn asynchronously." - (when (eq cj/mcp--state 'idle) - (setq cj/mcp--state 'starting) - (cj/mcp--assert-capabilities) - (cj/mcp--build-status-from-specs) - (setq mcp-hub-servers (cj/mcp--build-server-alist)) - (message "MCP: starting %d server(s) in background..." - (length mcp-hub-servers)) - ;; The hub callback is opportunistic. We poll status on each - ;; tick and the stall timer is the authoritative deadline. - (mcp-hub-start-all-server #'cj/mcp--on-hub-callback nil nil) - (cj/mcp--start-stall-timer))) -#+end_src - -State transitions and authority: - -- *Hub completion callback (opportunistic).* Triggers an - immediate poll + registration pass for whatever's ready. Does - not signal completion by itself. -- *Stall timer (authoritative deadline).* After - =cj/mcp-startup-timeout= (default 30 s), marks every server - still in =starting= state as =failed= with reason =timeout=, - registers tools from servers that did become ready, transitions - =cj/mcp--state= to its final value (=ready=, =partial=, or - =failed=). -- *Polling (authoritative state).* =cj/mcp--poll-status= walks - =mcp-server-connections= and maps each entry's =mcp--status= to - =cj/mcp--server-status=. Called from the hub callback and from - the stall timer. Servers that transitioned through - =:error-callback= (which the hub doesn't chain into the inited - callback) show up here. - -Properties: - -- =cj/mcp-ensure-started= returns in <100 ms regardless of - subprocess state. -- =mcp-hub-start-all-server= itself is async (third =SYNCP= arg - is =nil=). -- Servers transition independently; tools land in =gptel-tools= - as each server reports inventory. -- A server that never connects, never errors, and never reports - tools is caught by the stall timer. - -** Tool registration pipeline - -Walks =mcp-server-connections= directly, per server, after each -status poll. This gives clean per-server bookkeeping without -parsing the =mcp-SERVER= category prefix: - -#+begin_src emacs-lisp -(defun cj/mcp--register-server-tools (server-name) - "Register every tool from the connected server SERVER-NAME. -Idempotent: re-registration replaces the function pointer -without duplicating menu entries." - (let ((connection (gethash server-name mcp-server-connections))) - (when (and connection - (eq (cj/mcp--server-status connection) 'connected)) - ;; First, deregister any existing tools for this server. - (cj/mcp--deregister-server-tools server-name) - (dolist (raw-tool (cj/mcp--server-tools connection)) - (cj/mcp--register-tool server-name raw-tool)) - (cj/mcp--update-server-status server-name :state 'ready)))) - -(defun cj/mcp--register-tool (server-name raw-tool) - "Register one tool from SERVER-NAME. -RAW-TOOL is the plist from `mcp--tools' (untransformed)." - (let* ((remote-name (plist-get raw-tool :name)) - (gptel-name (format "mcp__%s__%s" server-name remote-name)) - (description (cj/mcp--normalize-description - server-name raw-tool)) - (confirm-p (cj/mcp--confirm-p gptel-name remote-name)) - ;; mcp-make-text-tool builds the closure; we async by default. - (mcp-plist (mcp-make-text-tool server-name remote-name t)) - ;; Rewrite name + description + confirm after mcp.el builds the closure. - (gptel-plist (cj/mcp--rewrite-plist - mcp-plist - :name gptel-name - :description description - :confirm confirm-p - :async t - :category (format "mcp-%s" server-name)))) - (apply #'gptel-make-tool gptel-plist) - (add-to-list 'gptel-tools - (gptel-get-tool - (list (format "mcp-%s" server-name) gptel-name))) - (push gptel-name - (gethash server-name cj/mcp--registered-tools)))) -#+end_src - -Key properties: - -- *Async by default.* All MCP tools register with =:async t=. - This avoids any sync MCP tool call blocking Emacs during - =gptel-send='s tool dispatch. Per-call timeout uses the - timer-race pattern (next subsection). -- *Closure preserves remote name.* =mcp-make-text-tool= built - the function before we rewrote =:name=, so the closure calls - =mcp-call-tool SERVER REMOTE-NAME=, not the prefixed - =mcp__SERVER__TOOL=. -- *Idempotent.* Each registration deregisters first, so - callbacks firing multiple times or restarts don't accumulate - duplicate entries. -- *Description normalization.* See next subsection. - -** Per-call timeout: explicit timer/callback race - -=with-timeout= only supervises the dynamic extent of the form it -wraps. For async tools where the function returns immediately -and the callback fires later, it does nothing. The correct -pattern is an explicit timer: - -#+begin_src emacs-lisp -(defun cj/mcp--wrap-async-with-timeout (server-name remote-name - mcp-callback) - "Return a gptel-compatible async wrapper for an MCP tool. -GPTel calls the returned function with (CALLBACK . ARGS). -The wrapper races MCP's response against `cj/mcp-tool-timeout'; -whichever fires first wins, and late callbacks are ignored." - (lambda (gptel-callback &rest args) - (let* ((done nil) - (timer (run-at-time cj/mcp-tool-timeout nil - (lambda () - (unless done - (setq done t) - (funcall gptel-callback - (format "MCP tool %s/%s \ -timed out after %ds" - server-name - remote-name - cj/mcp-tool-timeout))))))) - (mcp-async-call-tool - (gethash server-name mcp-server-connections) - remote-name - (cj/mcp--args-to-plist args) - (lambda (result) - (cancel-timer timer) - (unless done - (setq done t) - (funcall gptel-callback - (mcp--parse-tool-call-result result)))) - (lambda (code message) - (cancel-timer timer) - (unless done - (setq done t) - (funcall gptel-callback - (format "MCP error %s: %s" code - (cj/mcp--redact message))))))))) -#+end_src - -Both branches set =done= before invoking gptel's callback so a -late response (e.g., MCP responds after the timer fired but -before its sentinel cancels) doesn't deliver twice. Timer is -canceled on the success and error paths. - -The closure that =mcp-make-text-tool= built gets replaced with -this wrapper during registration (the =:function= slot of the -rewritten plist). - -** Confirmation / safety policy - -All enabled tools are registered (per the goal of full -visibility), but write/destructive tools get =:confirm t=. -Classification is name-based: - -#+begin_src emacs-lisp -(defconst cj/mcp--write-name-patterns - '("\\`create\\b" "\\`update\\b" "\\`delete\\b" "\\`remove\\b" - "\\`send\\b" "\\`post\\b" "\\`add\\b" "\\`move\\b" - "\\`invite\\b" "\\`share\\b" "\\`upload\\b" "\\`set\\b" - "\\`patch\\b" "\\`import\\b" "\\`sync\\b" "\\`merge\\b" - "\\`close\\b" "\\`reopen\\b" "\\`archive\\b" "\\`unarchive\\b" - "\\`approve\\b" "\\`reject\\b" "\\`label\\b" "\\`assign\\b" - "\\`reply\\b" "\\`comment\\b" "\\`trash\\b" "\\`restore\\b" - "\\`pin\\b" "\\`unpin\\b" "\\`copy\\b" "\\`rename\\b")) - -(defconst cj/mcp--read-name-patterns - '("\\`get\\b" "\\`list\\b" "\\`read\\b" "\\`search\\b" - "\\`find\\b" "\\`fetch\\b" "\\`view\\b" "\\`query\\b" - "\\`describe\\b" "\\`show\\b" "\\`check\\b")) - -(defcustom cj/mcp-tool-confirm-overrides nil - "Per-tool confirmation overrides. -Alist mapping fully qualified MCP tool name (e.g., -\"mcp__linear__create_issue\") to t or nil. Wins over the -pattern-based classifier." - :type '(alist :key-type string :value-type boolean) - :group 'cj) - -(defun cj/mcp--confirm-p (gptel-name remote-name) - "Return non-nil if the tool should register with `:confirm t'." - (let ((override (assoc gptel-name cj/mcp-tool-confirm-overrides))) - (cond - (override (cdr override)) - ((seq-some (lambda (p) (string-match-p p remote-name)) - cj/mcp--write-name-patterns) t) - ((seq-some (lambda (p) (string-match-p p remote-name)) - cj/mcp--read-name-patterns) nil) - (t t)))) ; unknown → confirm -#+end_src - -This is the second half of the safety story; the first half -(=gptel-confirm-tool-calls 'auto=) is enforced by ai-mcp.el at -load time. - -** Description normalization - -mcp-side descriptions are written for an arbitrary client and -vary in quality. The registration pipeline prefixes a stable -"server / write-risk" header so the agent and the user have -consistent context: - -#+begin_src emacs-lisp -(defun cj/mcp--normalize-description (server-name raw-tool) - "Return a normalized description string for RAW-TOOL. -Prefix: [SERVER] for reads, [SERVER WRITE] for writes, -[SERVER ?] for unknown classification. Then the upstream -description, unchanged." - (let* ((remote-name (plist-get raw-tool :name)) - (upstream (or (plist-get raw-tool :description) - "(no description provided by server)")) - (cls (cond - ((seq-some (lambda (p) (string-match-p p remote-name)) - cj/mcp--write-name-patterns) "WRITE") - ((seq-some (lambda (p) (string-match-p p remote-name)) - cj/mcp--read-name-patterns) "") - (t "?")))) - (format "[%s%s] %s" server-name - (if (string-empty-p cls) "" (concat " " cls)) - upstream))) -#+end_src - -Tools in =gptel-menu= show as: - -#+begin_example -[linear WRITE] Create a new Linear issue in a team. -[linear] List issues in a Linear team. -[google-keep ?] Frobnicate a note (unknown classification). -#+end_example - -** Tool deregistration - -Three triggers remove tools from =gptel-tools=: - -- =cj/mcp-stop-all= -- removes every MCP-registered tool, clears - =cj/mcp--registered-tools=, calls =mcp-stop-server= per server. - Local tools untouched. -- =cj/mcp-restart-server= -- removes tools for the named server - before re-registering. -- =cj/mcp-restart-failed= -- deregister + re-register each - =failed= server. - -#+begin_src emacs-lisp -(defun cj/mcp--deregister-server-tools (server-name) - "Remove every GPTel tool this integration registered for SERVER-NAME. -Local tools (those not in `cj/mcp--registered-tools') are -preserved." - (let ((mcp-tools (gethash server-name cj/mcp--registered-tools))) - (dolist (tool-name mcp-tools) - (setq gptel-tools - (cl-remove-if (lambda (tool) - (string= (gptel-tool-name tool) tool-name)) - gptel-tools))) - (remhash server-name cj/mcp--registered-tools))) -#+end_src - -** Secrets redaction - -=cj/mcp--redact= masks every secret-bearing field before any -string surfaces in messages, errors, status buffers, hub -displays, or test fixtures. Used by status formatting, audit -buffer, failure surfacing, and error wrappers. Tests assert -sentinel =REDACTED_TEST_SECRET= never appears in any user-facing -output. - -** Process cleanup - -=kill-emacs-hook= gets one entry: =cj/mcp-stop-all=. Stdio -process sentinels record abnormal exits into the per-server -status. - -*Local-only constraint.* MCP server processes are always spawned -under the local Emacs's =default-directory='s root. TRAMP / -remote buffers do not relocate the spawn; MCP processes are -local-only. This is enforced implicitly (mcp-hub-start-all-server -uses =make-process= which is local) and documented in the -commentary. - -** Privacy: external tool output in saved conversations - -MCP tool results land in the GPTel buffer. GPTel's autosave (when -enabled) persists those results to -=~/.emacs.d/ai-conversations/=. Concrete implications: - -- A Slack channel excerpt the agent fetched is now on disk. -- A Google Docs snippet the agent quoted is now on disk. -- A Linear issue body the agent read is now on disk. - -This is normal external-tool behavior, but users may not realize -it. Two mitigations: - -- =ai-mcp.el='s commentary explicitly documents the autosave - privacy implication. -- The audit buffer (=cj/mcp-list-tools=) includes a header note: - "Tool results land in =gptel-tools= responses; saved - conversations persist them. Use =cj/gptel-autosave-toggle= per - buffer to opt out." - -A future enhancement (V1.5+) could mark MCP-sourced tool output -with a visible delimiter in the GPTel buffer so the user sees -"this came from an external server" during the chat. Not v1. - -** Per-server auth matrix - -| =:auth= value | Servers | How it works | Recovery if failed | -|---------------+---------+--------------+--------------------| -| =in-protocol= | linear, notion | mcp.el HTTP transport handles OAuth handshake | Open URL surfaced in =cj/mcp-status= | -| =local= | slack-deepsat | Local SSE; no auth but proxy must run | Start the local proxy | -| =none= | drawio | No auth | n/a | -| =args-token= | figma | API key in process args | Update Claude config, restart | -| =oauth= | google-calendar, google-docs-* | OAuth token in env; refresh out-of-band | Re-auth via Claude Code, restart | -| =token= | google-keep | Long-lived token in env | Regenerate, update Claude config, restart | - -Recovery surfaces via pattern matching on errors: - -#+begin_src emacs-lisp -(defconst cj/mcp--recovery-patterns - '(("\\(401\\|unauthorized\\|token expired\\)" - . "Token expired -- re-auth via Claude Code, then C-; a C r SERVER") - ("\\(connection refused\\|ECONNREFUSED\\)" - . "Local endpoint unreachable -- check the upstream service is running") - ("\\(ENOENT\\|command not found\\)" - . "Missing dependency -- run `cj/mcp-doctor' to diagnose"))) -#+end_src - -** Timeouts - -| Timeout | Variable | Default | Behavior | -|---------+----------+---------+----------| -| Startup / tool discovery | =cj/mcp-startup-timeout= | 30 s | Server marked =failed/timeout=; integration continues with ready servers | -| Per-call tool execution | =cj/mcp-tool-timeout= | 60 s | Tool call returns timeout string to agent via the timer-race wrapper; server stays connected | - -Both via defcustoms. Per-tool override possible via -=cj/mcp-tool-timeout-overrides= alist. - -* Status UX - -** Echo-area summary: =cj/mcp-status= - -Single-line summary keyed off =cj/mcp--state=: - -| State | Echo | -|-------+------| -| idle | =MCP: not started. (C-; a t triggers it.)= | -| starting | =MCP: starting (3/9 ready)...= | -| partial | =MCP: 7/9 ready (failed: google-calendar, slack-deepsat).= | -| ready | =MCP: 9/9 ready, 87 tools registered.= | -| failed | =MCP: all 9 servers failed. C-; a C d to diagnose.= | - -** Wait until ready: =cj/mcp-wait-until-ready= - -For the case where the agent asks for a Calendar tool immediately -after =cj/toggle-gptel=: - -#+begin_src emacs-lisp -(defun cj/mcp-wait-until-ready (&optional timeout) - "Block until MCP startup completes or TIMEOUT seconds pass. -TIMEOUT defaults to `cj/mcp-startup-timeout'. Returns the final -state symbol (`ready', `partial', `failed', `starting')." - (interactive) - ...) -#+end_src - -Bound to =C-; a C w=. Reports progress every second via -=message= so the user sees countdown. - -** Audit buffer: =cj/mcp-list-tools= - -Tabulated-list buffer. Failed servers appear at the top with a -red face so they're visually obvious. Each row: - -#+begin_example -Server State Tools Confirm Description --------------------- -------- ----- ------- ------------------------------ -google-calendar FAILED 0 - Token expired; see status -slack-deepsat FAILED 0 - Local proxy unreachable --------------------- -------- ----- ------- ------------------------------ -linear/list_issues ready - no List issues in a Linear team -linear/create_issue ready - YES Create a new Linear issue -linear/... ready 44 total -... -#+end_example - -Sort: failed servers first, then by server name, then by tool -name. Keys: =g= refresh, =RET= jump to tool's category in -gptel-menu, =r= restart server under point, =c= toggle confirm -override for tool under point. - -* Commands & Keymap - -=C-; a C= becomes the MCP (Connect) subprefix. Existing keys are -preserved: =M= keeps =gptel-menu=, =m= keeps -=cj/gptel-change-model=. - -| Key | Command | Purpose | -|-----+---------+---------| -| =C-; a C h= | =cj/mcp-hub= | Open server-management buffer | -| =C-; a C s= | =cj/mcp-status= | Echo state summary | -| =C-; a C l= | =cj/mcp-list-tools= | Open audit buffer | -| =C-; a C r= | =cj/mcp-restart-failed= | Restart failed servers | -| =C-; a C R= | =cj/mcp-restart-server= | Restart a named server | -| =C-; a C S= | =cj/mcp-stop-all= | Stop everything | -| =C-; a C d= | =cj/mcp-doctor= | Diagnose prerequisites | -| =C-; a C w= | =cj/mcp-wait-until-ready= | Block until ready | - -which-key labels mirror the table. - -** cj/mcp-doctor - -Diagnostic command. Two modes: - -- *Static* (default) -- no side effects, no network: capability - check, =npx=/=uvx= on PATH, Claude config parseability, - per-server env-var presence, local endpoint reachability. -- *Live* (=C-u C-; a C d=) -- opt-in: invokes a single safe read - per auth class to verify OAuth tokens haven't silently expired. - For example, =gh_search= against linear is one tool call; same - for notion, google-calendar, google-docs, google-keep. Static - checks first; live probe only fires if static passes. - -Output buffer keys: - -- =c= copy diagnostic summary to kill ring (for pasting into - bug reports / notes). -- =r= rerun all failed checks. -- =q= quit. - -Each check row formats as: =PASS / FAIL / WARN CHECK RECOVERY=. - -* Implementation Plan - -Eight phases (was seven in rev 2; added Phase 1.5 for the -confirmation-contract setup). Each ends with green ERT tests + -a manual smoke test before the next. - -** Phase 1 -- Module + pure helpers - -Create =modules/ai-mcp.el=. Implement: =cj/mcp-server-specs=, -=cj/mcp-enabled-servers=, =cj/mcp-start-on-entry-points=, -=cj/mcp--read-claude-config=, =cj/mcp--get-env=, -=cj/mcp--build-server-alist= (pure transformer; filters by -=cj/mcp-enabled-servers=), =cj/mcp--redact=, -=cj/mcp--confirm-p=, =cj/mcp--normalize-description=. - -Tests cover all of the above against fixtures. - -** Phase 1.5 -- Confirmation contract - -Flip =gptel-confirm-tool-calls= to ='auto= in =ai-mcp.el='s -setup. Remove the =(setq gptel-confirm-tool-calls nil)= line -from =ai-config.el=. Audit existing local tools and add -=:confirm t= to any that should be gated (=web_fetch= -guaranteed; =write_text_file=, =update_text_file=, -=move_to_trash= per Craig's decision). - -Verification test in =tests/test-ai-mcp-confirm-contract.el= -asserts the setting, the local-tool gating behavior, and that -=git_log= (=:confirm nil=) still runs without prompting. - -** Phase 2 -- Compat layer + tool registration with fake inventory - -Add =ai-mcp-compat= helpers. Build the registration pipeline -against a stubbed =mcp-server-connections=. Verify: -- Tool name rewriting (=remote-name= stays in closure; - =gptel-name= is =mcp__SERVER__TOOL=). -- =gptel-make-tool= + explicit =add-to-list 'gptel-tools=. -- =:confirm= application per policy + overrides. -- Description normalization adds expected prefix. -- =cj/mcp--registered-tools= bookkeeping. -- Deregister removes from =gptel-tools= without disturbing local - tools (test pre-populates =gptel-tools= with a local tool and - asserts it survives). -- Re-register after deregister doesn't duplicate. -- All tools register with =:async t=. - -** Phase 3 -- Async state machine + timeout wrapper - -Implement =cj/mcp-ensure-started=, =cj/mcp--on-hub-callback=, -=cj/mcp--state= transitions, stall timer, polling. Implement -=cj/mcp--wrap-async-with-timeout=. Stub -=mcp-hub-start-all-server= with synthetic delayed callbacks and -synthetic async errors. - -Verify: -- =cj/mcp-ensure-started= returns in <100 ms regardless of stubs. -- Hub callback triggers status poll + registration. -- Stall timer fires for stuck servers. -- Async error path (=:error-callback= without inited callback) - reaches =cj/mcp--server-status= via polling. -- =cj/mcp--wrap-async-with-timeout=: timer-first ignores late MCP - response; MCP-first cancels timer; both branches deliver - exactly once. - -** Phase 4 -- First real connection (no auth) - -Wire one =drawio= or =slack-deepsat= server. Verify the stubbed -Phase 3 behavior matches real subprocesses. - -** Phase 5 -- Status UX + commands + doctor (static) - -Implement =cj/mcp-status=, =cj/mcp-list-tools= (with failed -servers at top, red face), =cj/mcp-doctor= (static mode only), -=cj/mcp-wait-until-ready=, restart commands, keymap entries, -which-key labels. - -Investigate =gptel-menu= refresh behavior: if the transient is -already open when a new tool registers, does it pick it up on -next invocation? Document and add an acceptance test: -"register new tool while gptel-menu is open; close and reopen; -new tool appears." - -** Phase 6 -- HTTP servers - -Add =linear= and =notion=. Test in-protocol OAuth handshake. -Add live-auth-check mode to doctor. - -** Phase 7 -- Env-dependent stdio servers - -Add =figma=, =google-calendar=, =google-docs-personal=, -=google-docs-work=, =google-keep=. - -** Phase 8 -- Privacy + audit polish - -Add audit buffer's privacy header, autosave commentary, audit-log -hygiene (=cj/mcp-tool-audit-log-enabled= defcustom). Update -=ai-mcp.el= commentary with the code-organization outline as a -TOC. - -* Test Plan - -** Confirmation contract (Phase 1.5) - -- =gptel-confirm-tool-calls= is ='auto= after =ai-mcp= loads. -- Write-classified MCP tool with =:confirm t= triggers confirm - prompt (stub gptel's prompt and assert it fires). -- Read-classified MCP tool with =:confirm nil= does not trigger - prompt. -- Pre-existing =git_log= (=:confirm nil=) does not trigger - prompt. -- =web_fetch= (newly gated with =:confirm t=) triggers prompt. - -** Pure helpers (Phase 1-2) - -- =cj/mcp--read-claude-config=: good fixture, missing, unreadable, - malformed JSON, missing =:mcpServers=, missing server, empty - env, non-string env values. Cache invalidation on mtime - change. -- =cj/mcp--build-server-alist=: each transport, each auth class, - env merge, args splicing for figma, no mutation of - =cj/mcp-server-specs=, filter by =cj/mcp-enabled-servers=. -- =cj/mcp--redact=: bearer tokens, OAuth credentials, - TOKEN/KEY/SECRET/CREDENTIALS-suffixed env vars, URL query - tokens, figma args slot. Sentinel never leaks. -- =cj/mcp--confirm-p=: read patterns, write patterns, unknown → - t, override map wins. -- =cj/mcp--normalize-description=: prefix shape per - classification. - -** Registration pipeline (Phase 2) - -- Single tool registers into all three structures. -- Two tools with same =:name= from different servers don't - collide. -- Re-registration replaces function pointer without duplicating. -- Deregister removes from =gptel-tools= without touching a - pre-populated local tool. -- All tools register with =:async t=. -- Confirm overrides win over patterns. - -** Compat layer (Phase 2) - -- Each =cj/mcp--*-server-*= helper returns expected value against - stub object. -- =cj/mcp--assert-capabilities= signals when a required function - is missing. - -** State machine + timeout (Phase 3) - -- =cj/mcp-ensure-started= from idle returns in <100 ms with - delayed-callback stub. -- Second call from =starting= is no-op. -- Hub callback triggers status poll. -- Stall timer marks slow servers =failed/timeout=. -- Async error path: server emits error callback only (no inited - callback); polling catches it and marks =failed/error= within - stall window. -- =cj/mcp--wrap-async-with-timeout=: - - MCP responds first, before timer: gptel callback fires once - with result; timer canceled. - - Timer fires first: gptel callback fires once with timeout - message; late MCP response is ignored (done flag). - - Error callback: cancels timer, fires once with redacted - error. - -** Async / freeze (Phase 3) - -- Stub =mcp-hub-start-all-server= to delay callbacks 5 s. - =cj/toggle-gptel= returns within 250 ms; buffer is - interactive. -- Stub a server that never responds. After - =cj/mcp-startup-timeout=, server is =failed=, - =cj/mcp--state= is =partial= or =failed=. - -** Entry-point policy (Phase 5) - -- With =cj/mcp-start-on-entry-points '(toggle-gptel)=, calling - =cj/toggle-gptel= triggers startup; calling - =cj/gptel-quick-ask= does not. -- Adding =gptel-quick-ask= to the list makes quick-ask trigger. - -** Local-tool preservation (Phase 2) - -- =cj/mcp-stop-all= removes only MCP-registered tools from - =gptel-tools=; local tools like =git_log= remain. -- =cj/mcp-restart-server= removes only that server's tools; - other MCP servers' tools and local tools both remain. - -** Process / cleanup (Phase 4+) - -- =kill-emacs-hook= calls =cj/mcp-stop-all=; subprocesses exit. -- =cj/mcp-stop-all= clears =gptel-tools= MCP entries and - =cj/mcp--registered-tools=. -- Restart doesn't leak process buffers or duplicate process - objects. -- Process sentinel records abnormal exits into status. - -** Partial availability (Phase 4+) - -- 8 of 9 servers ready, 1 failed: ready tools available; - failed-server tools absent. -- Restart-failed only retries the failed one. - -** Saved-conversation behavior (Phase 7+) - -- After a successful MCP tool call, GPTel autosave (when on) - persists the tool result. Test asserts the saved file - contains the result text and the audit buffer's privacy - header is updated. -- =cj/gptel-autosave-toggle= off → result not saved. - -** Keymap (Phase 5) - -- =C-; a C h/s/l/r/R/S/d/w= all bound after =ai-mcp= loads. -- Existing =C-; a M=, =C-; a m= still bound to - =gptel-menu=, =cj/gptel-change-model=. -- which-key labels present for every new binding. -- No duplicate labels. - -** =gptel-menu= refresh (Phase 5) - -- Register new tool while a previously-opened gptel-menu is - closed; reopen; new tool appears. Document whether mid-open - refresh works. - -** No-real-process rule - -All tests in =tests/test-ai-mcp*= use stubs for =process-file=, -=make-process=, =mcp-hub-start-all-server=, -=mcp-server-connections=, =mcp--tools=, =mcp--status=. No real -=npx=, no network, no real =~/.claude.json=. - -** Manual test matrix - -| Scenario | Expected | -|----------+----------| -| No =~/.claude.json= | Doctor warns; env-free servers still start | -| Malformed Claude config | Status shows =malformed-json=; integration =failed= cleanly | -| Network offline | HTTP servers fail; stdio servers start; status =partial= | -| =npx= not on PATH | Doctor flags it; stdio servers fail with clear message | -| One stdio server exits immediately | Sentinel records failure; others continue | -| slack-deepsat endpoint down | Server =failed=; recovery message points at local proxy | -| Google token expired | Server starts; tool calls fail; live-auth check (=C-u doctor=) surfaces it | -| All servers available | =MCP: 9/9 ready, ~N tools registered= | -| =cj/mcp-restart-failed= after fix | Only retried servers transition | -| =cj/mcp-stop-all= then call a tool | Tool absent from =gptel-tools= | -| Disable a server via defcustom | Doctor and status reflect the absence | -| TRAMP buffer open + =cj/toggle-gptel= | MCP starts locally; no remote spawn | - -* Acceptance Criteria - -1. *No freeze.* =cj/toggle-gptel= returns in <250 ms with mcp.el - wired and nine real servers starting in background. -2. *Incremental registration.* As each server reports tools, - =gptel-tools= updates; in-flight =gptel-send= calls see - newly-added tools on the next request. -3. *No MCP failure breaks ordinary GPTel chat.* With every MCP - server failing, =cj/toggle-gptel= still opens a usable chat - buffer; non-tool prompts work normally; local tools (git_log - etc.) still callable. -4. *Confirm gate works.* After =ai-mcp= loads, a write-classified - MCP tool actually prompts before invocation. Verified by a - test that fails if =mcp-async-call-tool= is invoked before - gptel's confirm-prompt stub fires. -5. *Local-tool preservation.* =cj/mcp-stop-all= and per-server - restart remove only MCP-owned tools. -6. *Partial availability.* With one failed server, status is - =partial=, ready servers' tools work, failed server's tools - absent. -7. *Idempotent restart.* Calling =cj/mcp-restart-failed= twice - with no intervening change produces identical state. -8. *No secret leakage.* Grep every user-facing output for - sentinel fixture secrets; zero matches. -9. *Doctor coverage (static).* Identifies each diagnosable - failure in the manual test matrix. -10. *Server enablement.* Setting =cj/mcp-enabled-servers= to a - subset starts only those servers; doctor reports the disabled - ones as expected-absent. - -* Risks - -** R1 -- mcp.el API drift behind compat layer - -Even with the compat layer, an upstream rename could break us if -the capability check misses it. - -*Mitigation:* compat helpers document the upstream commit and -file location. Tests cover each helper against stub objects; -when mcp.el bumps, run those tests first. - -** R2 -- Cold-start latency for nine subprocesses - -Nine =npx -y= invocations cold-start over several seconds. Time -to =ready= state may be 10-30 seconds on a cold machine. - -*Mitigation:* async model means user doesn't wait. Tools arrive -incrementally; status indicator shows progress. -=cj/mcp-wait-until-ready= for the rare case where the agent needs -a specific tool immediately. - -** R3 -- OAuth token expiry surfaces silently - -A Google server starts cleanly but every tool call fails with -auth errors. - -*Mitigation:* the OAuth recovery pattern matcher inspects every -tool-call error. Live-auth-check mode in doctor proactively -calls one safe read per auth class. - -** R4 -- Tool count balloons gptel-menu - -Up to 100+ tools. Even with category grouping, the transient -menu is large. - -*Mitigation:* =cj/mcp-enabled-servers= lets users disable -servers they don't need. Audit buffer is the alternate browser. -Per-conversation profiles deferred to v1.5. - -** R5 -- =~/.claude.json= schema change - -Parser breaks if Anthropic restructures the file. - -*Mitigation:* =cj/mcp--read-claude-config= returns structured -errors that surface in status and doctor. Integration degrades -to "env-free servers work, env-dependent servers fail" rather -than crashing. - -** R6 -- Process argument leakage (figma) - -figma's API key is in process args -- visible via =ps=, -=/proc/PID/cmdline=, =list-system-processes=. - -*Mitigation:* accepted risk (the figma package only supports -args-token). =cj/mcp--redact= ensures the key never appears in -Emacs-side output. Commentary flags this. - -** R7 -- Confirmation fatigue from unknown-classification tools - -Default for unknown is =:confirm t=. A server with many tools -matching neither read nor write pattern produces many prompts. - -*Mitigation:* audit buffer surfaces unknown classifications with -their confirm state. =cj/mcp-tool-confirm-overrides= alist lets -the user pin specific tools to =nil= once vetted. A "review -unknowns" doctor pass could enumerate them on demand (v1.5). - -** R8 -- Subprocess accumulation across sessions - -If =kill-emacs-hook= is bypassed (kill -9, crash), subprocesses -persist. - -*Mitigation:* =cj/mcp-doctor= can detect orphaned mcp processes -via =list-system-processes= (v1.5 enhancement). - -* Open Questions - -** Q1 -- Should =gptel-menu= refresh after mid-call tool registration? - -Investigation during Phase 5. If gptel-menu's transient caches -=gptel-tools= at open time, mid-call additions won't appear -until close+reopen. Document the behavior; if it's a real -pain, file a gptel upstream issue. - -** Q2 -- Should write-confirmation overrides be per-host? - -A v1.5 question: when per-conversation profiles land, the -override alist could also be scoped (e.g., "in /work/ folder, -auto-confirm google-docs-work writes"). Out of v1 scope. - -** Q3 -- Auth-source migration path for OAuth tokens - -Three candidate paths (Elisp OAuth client / Claude Code refresh -script / timer-based refresh) all have meaningful complexity. -Until one is viable, =~/.claude.json= stays the source. - -** Q4 -- Live-auth check cadence - -Doctor's live-auth mode is opt-in. Should there be a periodic -auto-check (every N hours via timer) that catches expiry between -explicit doctor runs? Adds complexity; defer until usage shows -need. - -* Considered Alternatives - -** =gptel-mcp.el= (declined; cited as prior art) - -[[https://github.com/lizqwerscott/gptel-mcp.el][lizqwerscott/gptel-mcp.el]] is a 96-line wrapper from the same -author as mcp.el that bridges mcp.el's tool inventory into gptel. -It exposes five functions: - -- =gptel-mcp-register-tool= -- walks =mcp-hub-get-all-tool= and - calls =gptel-make-tool= on each plist. -- =gptel-mcp-activate-all-tool= -- pushes tools into - =gptel-tools=. -- =gptel-mcp-deactivate-all-tool= -- removes them by category + - name lookup. -- =gptel-mcp-start-all-server-and-register= -- chains - =mcp-hub-start-all-server= with the register callback. -- =gptel-mcp-dispatch= -- a transient menu with three keys (=s= - start, =A= activate, =C= deactivate). - -*Why this matters for the spec.* The package independently -validates the integration shape this spec converged on: -=mcp-hub-start-all-server= → walk connections → =gptel-make-tool= -→ =add-to-list gptel-tools= is the canonical path. - -*Why we are not adopting it.* The package solves the trivial -"wire tools through" problem but skips every concern this spec -exists to address: - -| Concern | Spec | gptel-mcp.el | -|---------+------+--------------| -| GPTel confirmation contract (=gptel-confirm-tool-calls 'auto=) | Required precondition | Not addressed | -| Tool-name collisions | Rewrites to =mcp__SERVER__TOOL= | Silent overwrite | -| Confirm-on-write policy | Per-tool =:confirm t= for writes | All tools register with default | -| Async startup contract | =cj/mcp-ensure-started= returns in <100 ms | Synchronous-feel start | -| Async per-call timeout | Explicit timer/callback race | None | -| State machine | =cj/mcp--state= + per-server status | None | -| Server identity strategy | Walks =mcp-server-connections= directly | Uses =mcp-hub-get-all-tool= + parses =mcp-SERVER= | -| Secrets handling | Reads env from =~/.claude.json= with mtime cache | None | -| Deregistration tracking | =cj/mcp--registered-tools= hash, preserves local tools | Removes by category, no local-tool guarantee | -| Server enablement | =cj/mcp-enabled-servers= defcustom | None | -| Entry-point scoping | =cj/mcp-start-on-entry-points= defcustom | Manual via dispatch menu | -| Status UX | =cj/mcp-status=, =cj/mcp-list-tools= audit buffer | Just dispatch menu | -| OAuth recovery | Pattern matcher with per-auth-class recovery | None | -| Secrets redaction | =cj/mcp--redact= applied everywhere | None | -| mcp.el compat layer | Isolated wrappers around private API | Direct =mcp--*= access scattered | -| Tests | 10 acceptance criteria + manual matrix + no-real-process | None | -| Doctor / live-auth check | Static + live-probe diagnostic | None | - -Adopting it would force shipping safely or wrapping with -everything specified above (two layers, no code reduction). - -*What we are taking from it.* Confidence the API path is right. -The transient-dispatch UX was considered for the keymap (rev 2), -but the keymap is now pinned to discrete commands under =C-; a -C= so existing GPTel keys aren't disturbed (rev 3). - -* References - -- [[file:../../modules/ai-config.el][modules/ai-config.el]] -- =gptel-confirm-tool-calls nil= at - line 386 (removed by this integration); loader at lines 71-96. -- [[file:gptel-tools-shortlist.org][gptel-tools-shortlist.org]] -- local-tools shortlist; MCP servers - slot in as the "external" tier. -- [[file:gptel-agentic-tool-ideas.org][gptel-agentic-tool-ideas.org]] -- broader agentic-tool design. -- [[id:a124dd0f-1f40-4533-aeb8-595d93e20865][gptel-gh-tool-spec.org]] -- sibling design; same confirm-on-write - pattern. -- [[https://github.com/lizqwerscott/mcp.el][lizqwerscott/mcp.el]] -- upstream. -- [[https://github.com/lizqwerscott/gptel-mcp.el][lizqwerscott/gptel-mcp.el]] -- considered and declined; see § - Considered Alternatives. -- =~/code/mcp.el/mcp-hub.el:53-90,131-160= -- verified callback - ownership and start-all helper. -- =elpa/gptel-0.9.8.5/gptel.el:1595-1607,2244-2245= -- verified - =gptel-confirm-tool-calls= semantics and tool-confirm gate. -- =elpa/gptel-0.9.8.5/gptel.el:1729-1820= -- verified - =gptel-make-tool= registration. -- =~/.claude.json= -- Claude Code config. diff --git a/docs/specs/messenger-unification-spec.org b/docs/specs/messenger-unification-spec.org index f8d3b4734..92985f596 100644 --- a/docs/specs/messenger-unification-spec.org +++ b/docs/specs/messenger-unification-spec.org @@ -189,6 +189,18 @@ directions. (registry, predicate, display rule, minor mode, dispatchers), TDD: ERT over the pure parts (registration shape, buffer matching, dispatch with stub fns, nil-verb error). Wire signel; retire its private display entry. + - /Smoke-first parity (Craig 2026-06-16)./ Signal is the least-built backend + and the only one whose UX Craig fully controls (no upstream package fighting + back), so the smoke rebuild is where the shape gets dialed in first: build + smoke to implement every core leaf (=j a u m k C Q=) and the in-buffer + chords natively, tune the feel against real use, and only then adapt telega + and slack to the now-proven contract. The guardrail: design the contract to + the /capability floor/, not to smoke's ceiling. Smoke can do anything, which + makes it the least representative backend — validate each core leaf against + the others' known limits as it is built (telega keeps its modal root buffer; + ERC has no threads/reactions/files; slack has no file upload or search), so + the reference does not bake in something a thinner backend can never match. + Rich verbs (=r e f /=) stay optional per-backend extensions, never core. - *Phase 2 — telega.* Registration + the decision-3 cancel ladder; audit what else the minor-mode map hides in =telega-chat-mode-map=. - *Phase 3 — slack.* Per decision 5; conform compose buffers either way. @@ -200,6 +212,132 @@ Each phase ends with a manual-test checklist filed under the "Manual testing and validation" parent in =todo.org= (placement, each chord, the not-supported message), per the verification discipline. +* Global Prefix Keybinding Alphabet (per-app) + +/DRAFT addition, Claude 2026-06-16, for Craig's review (his request: "put this/ +/in the spec and I'll review it"). Folds the per-action prefix-key analysis/ +/into the held-open spec./ + +This is the third of three keybinding surfaces, and the only one the rest of the +spec doesn't already cover. The first is the in-chat buffer chords (C-c C-c +confirm, C-c C-k cancel, C-c C-a attach) owned by =cj/messenger-mode=. The second +is the cross-app aggregate verb (decision 6's jump-to-unread, one global chord +that raises the newest unread conversation in /any/ backend). This third surface +is the per-app global prefix: each messenger hangs off =C-;= with its own second +key (=S= Slack, =M= Signal, =T= Telega, =E= ERC), and today the third key, the +action leaf, is ad hoc per app. The goal: one leaf alphabet so the same action +is the same final keystroke under every messenger. + +** The problem: the same key means different things today + +| Action | Slack (C-; S) | Signal (C-; M) | Telega (C-; T) | ERC (C-; E) | +|----------------------+---------------+----------------+----------------+-------------| +| Open a chat by name | C | m | unbound | c | +|----------------------+---------------+----------------+----------------+-------------| +| Directory: all | C | d | root buffer | b | +|----------------------+---------------+----------------+----------------+-------------| +| Directory: unread | c | none | unbound | unbound | +|----------------------+---------------+----------------+----------------+-------------| +| New DM / message | d | m | unbound | unbound | +|----------------------+---------------+----------------+----------------+-------------| +| Close this chat | unbound | none | C-x k | q | +|----------------------+---------------+----------------+----------------+-------------| +| Mark read + bury | q | none | unbound | n/a | +|----------------------+---------------+----------------+----------------+-------------| +| Connect / start | s | SPC | T = launch | C | +|----------------------+---------------+----------------+----------------+-------------| +| Disconnect / stop | S | q | Q (in-buffer) | Q | +|----------------------+---------------+----------------+----------------+-------------| + +Read down a column and the leaves are arbitrary; read across a row and they +disagree. Worse, the same letter collides on meaning: =C= opens Slack's roster +but connects an ERC server; =q= disconnects Signal, marks-read in Slack, and +parts a channel in ERC. There is no muscle-memory transfer between messengers. + +** Canonical per-app actions (spelled out) + +Daily verbs (per-conversation): open a specific chat by name; view the directory +of all chats/channels; view the directory of only unread / reply-needed chats; +message someone new; reply in a thread; close the current chat window; mark the +current chat read and bury it; jump to next / previous unread chat; add a +reaction; send an attachment; search messages. + +Session verbs (lifecycle): connect / bring online; disconnect / take offline; +open the dashboard / roster overview. + +** Proposed unified leaf alphabet + +Keep each app's second key (=S M T E=); make the third key identical across all +four so the action is the same tail-keystroke regardless of app. + +| Leaf | Action | Backends that can bind it today | +|-------+------------------------------+--------------------------------------| +| j | jump to / open a chat | Slack, Signal, Telega*, ERC | +|-------+------------------------------+--------------------------------------| +| a | directory: all chats | Slack, Signal, Telega, ERC | +|-------+------------------------------+--------------------------------------| +| u | directory: unread only | Slack, Telega*, ERC* (signel: gap) | +|-------+------------------------------+--------------------------------------| +| m | message someone new / DM | Slack, Signal, Telega, ERC* | +|-------+------------------------------+--------------------------------------| +| k | close (bury) this chat | all four (thin wrappers) | +|-------+------------------------------+--------------------------------------| +| SPC | mark read + bury | Slack, Telega* (others: gap) | +|-------+------------------------------+--------------------------------------| +| n / p | next / previous unread | Telega, ERC* (others: gap) | +|-------+------------------------------+--------------------------------------| +| r | reply in thread | Slack (others: protocol N/A) | +|-------+------------------------------+--------------------------------------| +| e | reaction / emoji | Slack, Telega (others: protocol N/A) | +|-------+------------------------------+--------------------------------------| +| f | attach file | Telega (others: protocol N/A) | +|-------+------------------------------+--------------------------------------| +| / | search | Telega in-chat (others: gap) | +|-------+------------------------------+--------------------------------------| +| C | connect / start | all four | +|-------+------------------------------+--------------------------------------| +| Q | disconnect / stop | all four | +|-------+------------------------------+--------------------------------------| + +(* = the package command exists but needs a global wrapper or a binding added.) + +The core seven (=j a u m k C Q=) are the unifiable floor: every backend can +support them (once signel's gaps are filled). The richer verbs (=r e f /=) bind +only where the protocol and package allow; which-key then shows fewer options +under a thinner backend, which is honest rather than confusing. An unsupported +leaf is simply absent under that app's prefix, the same "nil verb = not +supported" stance the registry already takes for the in-buffer chords. + +** How this rides the registry + +These leaves are the global-prefix face of the same verb set decision 6 is +already growing. jump-to-unread, jump-to-chat-picker, and mark-read-and-bury in +decision 6 map directly to =u= (or the cross-app aggregate), =j=, and =SPC= +here. The registry should carry an optional per-backend command for each leaf +(=:open=, =:roster=, =:unread=, =:message=, =:close= ...), and the library +builds each app's =C-; <key>= submap from whatever the backend registered, so a +new verb lands everywhere in one place, exactly as the in-buffer verbs do. A +backend that registers nil for a leaf gets no binding for it. + +** A caveat on visual UX (keys unify cleanly; rendering does not) + +The leaf can be identical, but "open the directory" still /looks/ different per +backend, because the packages have different display models: Slack and ERC pop a +minibuffer completing-read picker; Telega and (smoke) Signal show a persistent +roster buffer. The bottom-drawer placement rule unifies where a /chat/ lands; +it does not make Slack grow a persistent roster. Unify the keys and the action +vocabulary; accept that the roster rendering differs per backend rather than +fighting each package's design. + +** Open questions for Craig + +1. The leaf letters: are =j a u m k C Q= (+ =r e f / n p SPC=) right, or do you + want different mnemonics (=o= open, =l= list, ...)? This is the muscle-memory + commitment, so it is yours to set before any binding lands. +2. Signal parity (now in scope per Craig 2026-06-16): the smoke rebuild is the + place to hit every core leaf natively. See the smoke-first note added to the + Phases section. + * Risks - Minor-mode shadowing in telega beyond C-c C-c — Phase 2 audits the C-c diff --git a/docs/specs/theme-studio-completion-preview-spec.org b/docs/specs/theme-studio-completion-preview-spec.org new file mode 100644 index 000000000..588f35a93 --- /dev/null +++ b/docs/specs/theme-studio-completion-preview-spec.org @@ -0,0 +1,211 @@ +#+TITLE: Theme Studio Minibuffer-Completion Preview — Spec +#+AUTHOR: Craig Jennings +#+DATE: 2026-06-23 +#+TODO: TODO | DONE SUPERSEDED CANCELLED + +* Metadata +| Status | Not ready — first Codex review found implementation-readiness blockers (2026-06-23) | +|----------+-----------------------------------------------------------------------------| +| Owner | Craig Jennings | +|----------+-----------------------------------------------------------------------------| +| Reviewer | Craig Jennings | +|----------+-----------------------------------------------------------------------------| +| Related | [[file:../../todo.org][todo.org: theme-studio minibuffer-completion preview]] | +|----------+-----------------------------------------------------------------------------| + +* Summary + +Add a single "minibuffer completion" entry to theme-studio's application dropdown that previews the completing-read surface — the prompt you see at M-x, switch-buffer, or a Slack channel pick. The entry composes itself from the package inventory: a vanilla *Completions* baseline that's always present, plus Vertico, Orderless, and Marginalia sections that appear only when those packages are installed. A container radio and modifier checkboxes let the designer flip between the vanilla list and Vertico and watch what orderless and marginalia layer on. It exists because the completion surface spans several packages plus two built-in faces, and theme-studio currently previews each package alone, so a designer can never see their actual prompt assembled. + +* Problem / Context + +theme-studio previews one package at a time. But a real completion prompt is composited from several sources at once: a frontend draws the candidate list (Vertico, or Ivy, or the built-in *Completions* buffer), a matching layer highlights the typed input (orderless, or ivy's own match faces, or the built-in completions-common-part), an annotation layer adds detail (marginalia, or ivy-rich), and two built-in faces — minibuffer-prompt and highlight — sit underneath everything (vertico-current inherits highlight, which is why a selected row with no highlight background looks the way it does). Nothing in theme-studio shows these together, so the designer assigns vertico-current, orderless-match-face-0..3, and minibuffer-prompt in three different places and never sees the result until they export, deploy, and open a real prompt. + +It also has to work for other people, not just this machine. A user of theme-studio may have none of these packages (stock Emacs, vanilla *Completions*), or Vertico+Orderless+Marginalia, or Ivy+Counsel, or any combination. The preview has to adapt to what's installed rather than assume one stack. + +* Goals and Non-Goals + +** Goals +- One application-dropdown entry that renders a faithful, composited completion prompt. +- Inventory-driven composition: only installed providers contribute sections; their faces come from the inventory, so the lists stay accurate per machine. +- A vanilla *Completions* baseline that is always present, including on a stock Emacs with no completion packages. +- A container radio (vanilla / Vertico) plus modifier checkboxes (orderless, marginalia) that reflect the real coupling — modifiers apply under the default/Vertico family and are inert under Ivy. +- The entry owns the completion-specific faces (completions-*, vertico-*, orderless-*, marginalia-*); the redundant generic vertico/orderless/marginalia entries are suppressed so each face is assignable in exactly one place. +- Faithful rendering: match faces stack over the current row, and face inheritance is honored (vertico-current shows highlight's value, no background). + +** Non-Goals +- No hover/click-to-locate in this spec — that's the separate preview-locate feature ([[file:theme-studio-preview-locate-spec.org][theme-studio-preview-locate-spec.org]]); this entry consumes it when it lands and uses a caption until then. +- Not in-buffer completion (company / corfu) — a different surface (popup while typing in a buffer), out of scope. +- Not non-completion minibuffer reads (read-string, y-or-n-p) — nothing to render. +- Not Swiper's in-buffer search highlight in the candidate list — Swiper is a search surface, not a candidate frontend. +- No change to theme-studio's face data model. + +** Scope tiers +- v1: the dropdown entry; the inventory composer; the vanilla baseline plus Vertico / Orderless / Marginalia sections; the container radio + modifier checkboxes; dedup of the generic vertico/orderless/marginalia entries; a caption naming the off-entry built-ins; the renderer plus browser-gates coverage. +- Out of scope: click/hover locate (preview-locate owns it); company/corfu; Swiper in the candidate list. +- vNext (log to todo.org): the Ivy / Counsel provider section; a separate Swiper "Search" preview; consult / embark sections. + +* Design + +** For the user + +You pick "minibuffer completion (vertico / orderless)" from the application dropdown — the same selector where org-faces, mu4e, and elfeed live. The pane shows a mock prompt: a prompt label, a typed query, and a list of candidate rows with one row selected, all drawn in your real theme-in-design colors. Above it sit controls. A container radio chooses how the list is drawn — vanilla *Completions* or Vertico (and Ivy, once that section exists) — and starts on your actual frontend. Modifier checkboxes — Orderless, Marginalia — toggle the match coloring and the annotations so you can see exactly what each one adds on top of the bare list. Flip the radio to vanilla and you see the stock *Completions* buffer that every Emacs has; tick Orderless and the matched characters light up; tick Marginalia and annotations appear. A caption under the preview names the two faces that come from elsewhere — minibuffer-prompt and highlight, both in the UI Faces pane — so you know where to tune the prompt color and the selected-row background. + +What you see is gated by what you have: with only Vertico, Orderless, and Marginalia installed, the radio offers vanilla and Vertico and the two modifier checkboxes; there's no Ivy option. On a stock Emacs the radio has only vanilla and the modifiers are absent — you still get a real, themed *Completions* preview. + +** For the implementer + +The entry is assembled at generate time from the inventory, not hardcoded. A small provider table names each completion provider, the package key that detects it (the same key build-inventory.el writes into package-inventory.json), and the preview fragment it contributes. At build, the composer walks the table, keeps the providers whose package is present, unions their inventory faces with the always-present baseline faces (minibuffer-prompt plus the built-in completions-* set), and produces two things: the entry's assignable face list, and an "active providers" descriptor the renderer reads. The composer also adds the present provider package keys to the bespoke-suppression set so add_inventory_apps stops emitting their generic swatch entries — the faces then live only in the completion entry. + +The renderer, renderCompletionPreview in previews.js, reads the active-providers descriptor and the current control state and builds the sections with the existing os(app, face, text) / previewLines helpers, exactly like the other package previews. The baseline section is always emitted; provider sections are emitted when present and enabled. Match highlighting is drawn with whichever face the active container's family uses — orderless-match-face-N under default/Vertico, completions-common-part / completions-first-difference under bare vanilla — because the matching layer is frontend-specific, not universal. + +The controls are the one genuinely new piece: theme-studio previews are static today, rendered once by buildPkgPreview. This entry needs a small block of interactive state (selected container, enabled modifiers) that, on change, re-renders the preview through the same path. State is local to the entry and ephemeral — not persisted, not part of the exported theme. + +The shared built-in faces (minibuffer-prompt, highlight) stay owned by the UI Faces pane. The preview renders them for context by resolving their current value from shared state — the same cross-pane rendering the preview-locate spec already describes — and they are the hover-only, non-clickable elements once locate ships. + +* Alternatives Considered + +** A — Attach the preview to the existing "vertico" entry +- Good, because it's the cheapest: one PREVIEW_KEYS line, no composer. +- Bad, because the orderless match faces show in the preview but are assigned under a different entry, so the thing you judge and the controls for it are split; and the built-in prompt face is elsewhere again. +- Neutral, because it still reuses the inventory and the os() helpers. + +** B — Wire the same renderer to both the vertico and orderless entries +- Good, because each face stays in its package home and both entries show the shared preview. +- Bad, because there's still no single owner for the surface, minibuffer-prompt is still off in the UI table, and a stock-Emacs user with neither package gets nothing. +- Neutral, because it's a small change over A. + +** C — A synthetic, inventory-composed "minibuffer completion" entry (chosen) +- Good, because it's one stop for the whole surface, matches theme-studio's existing cross-surface grouping (shr (HTML: nov/eww/mail), ansi-color (vterm/eshell/…), gnus (mu4e article view)), and degrades to a useful vanilla baseline for any install. +- Bad, because it needs the inventory composer, the dedup, and the first interactive-control preview — more than a static entry. +- Neutral, because the renderer is the same shape as every other package preview. + +** Controls: auto-show sections from the inventory, no toggles +- Good, because it's simpler — render whatever's installed. +- Bad, because it can't show "what does orderless add" — the explicit goal — and can't show the vanilla baseline to a Vertico user. +- Neutral, because the composer work is identical; only the renderer's control surface differs. + +* Decisions [/] + +** DONE Entry shape — one synthetic "minibuffer completion" entry +- Context: the surface spans vertico + orderless + marginalia (or ivy, or vanilla) plus built-in faces; theme-studio previews packages singly. +- Decision: We will add one synthetic, inventory-composed entry rather than attaching the preview to an existing package entry. +- Consequences: easier — a single design surface, matches house grouping, degrades to vanilla. Harder — needs the composer + dedup + the first interactive preview. + +** DONE Control model — container radio + modifier checkboxes, two families +- Context: frontends take over completing-read globally and only one runs at a time; orderless/marginalia are additive but coupled to the default/Vertico machinery and inert under Ivy. +- Decision: We will use a radio for the mutually-exclusive container (vanilla / Vertico / Ivy) and checkboxes for the additive modifiers (orderless, marginalia), with modifiers disabled when the container is Ivy. +- Consequences: easier — truthful "see what each layer adds" interaction. Harder — introduces interactive control state into a previously-static preview. + +** DONE Discoverability — delegate to preview-locate, caption in the interim +- Context: minibuffer-prompt and highlight shape the prompt but belong to the UI Faces pane, not this entry. +- Decision: We will not build click/hover here; we depend on [[file:theme-studio-preview-locate-spec.org][preview-locate]] for hover/locate and ship a caption naming the off-entry built-ins until it lands. +- Consequences: easier — no duplicated navigation logic, one mechanism for all previews. Harder — full discoverability waits on a second not-started feature; the caption is the stopgap. + +** TODO Entry label text +- Owner / by-when: Craig / before build +- Context: "completing-read" names the Elisp call and is jargon; "Completions" collides with the built-in *Completions* buffer and with in-buffer completion. +- Decision: We will label it "minibuffer completion (<active frontends>)" — e.g. "minibuffer completion (vertico / orderless)". +- Consequences: easier — names the visual surface, disambiguates from company/corfu, reads to non-Lisp users. Harder — the parenthetical is dynamic per inventory, so the label is composed, not a constant. + +** TODO Face dedup — suppress the generic vertico/orderless/marginalia entries +- Owner / by-when: Craig / before build +- Context: a synthetic entry pulls in vertico-current and the orderless/marginalia faces, but those packages still appear as their own generic inventory entries, so a face would be assignable in two places. +- Decision: We will add the present provider package keys to the bespoke-suppression set so the generic entries disappear and each completion face is assignable only in the completion entry; the built-in minibuffer-prompt / highlight stay in the UI Faces pane. +- Consequences: easier — one place per face, no divergent values. Harder — a user who looked for "vertico" in the dropdown now finds those faces under "minibuffer completion" instead; needs the caption/label to make that discoverable. + +** TODO Ivy / Counsel — v1 or vNext +- Owner / by-when: Craig / before build +- Context: shareability argues for Ivy support, but ivy isn't installed on the owner's machine, so an Ivy section can't be rendered or verified here. +- Decision: We will defer the Ivy / Counsel section to vNext and build the composer provider-generic so adding it later is a provider-table row plus a render fragment, validated by someone running Ivy. +- Consequences: easier — v1 ships only what's testable on the owner's machine. Harder — the shareable promise is partial until the Ivy fast-follow lands. + +** TODO Swiper — separate Search preview, not in the candidate list +- Owner / by-when: Craig / before build +- Context: Swiper's faces highlight the buffer during in-buffer search, not the minibuffer candidate list. +- Decision: We will keep Swiper out of the completion candidate list and, if wanted, give it its own small "Search" preview (vNext). +- Consequences: easier — the completion preview stays honest about the candidate-list surface. Harder — Swiper users get no preview until the separate Search one exists. + +* Review findings [0/6] + +** TODO Open decisions still block implementation readiness :blocking: +The workflow's readiness gate requires the =* Decisions= cookie to be complete, but this spec still has four =TODO= decisions: entry label text, face dedup, Ivy/Counsel scope, and Swiper scope. These are not bookkeeping-only; they affect generated app labels, suppression behavior, v1 test cases, and user-visible scope. Resolve or explicitly defer each decision before implementation begins. In particular, the spec should not say v1 includes an Ivy-disabled control path if the Ivy decision is vNext, and it should not leave the dedup behavior as both a goal and an undecided item. (blocking) + +** TODO Locate-preview dependency is not explicit enough in the phase plan :blocking: +Craig clarified this feature will wait for preview-locate's =previewSpan= API. The spec still says =renderCompletionPreview= uses the existing =os(app, face, text)= / =previewLines= helpers "exactly like the other package previews", and Phase 1 promises the entry can appear with a static render before the locate work. That is unsafe because this preview must render UI-owned faces (=minibuffer-prompt= and =highlight=) inside a package-style preview; the pre-locate =os= path only styles package faces through =PKGMAP=. Add an explicit prerequisite phase: preview-locate v1, including =previewSpan(owner, face, text)= and owner-aware gates, must land first. Then update the renderer contract to use =previewSpan= for both package-owned completion faces and =@ui= built-ins, not raw =os= for cross-owner spans. (blocking) + +** TODO Always-present baseline faces are not sourced from current data :blocking: +The spec says the vanilla baseline always includes built-in =completions-*= faces, but current =package-inventory.json= does not contain those faces, and the generic inventory path in =app_inventory.add_inventory_apps= only emits installed-package inventory rows. =minibuffer-prompt= and =highlight= exist in =UI_FACES=, but =completions-common-part=, =completions-first-difference=, =completions-annotations=, and related built-ins are currently only available in the captured defaults snapshot, not as app rows. Define the baseline face source explicitly: name the exact built-in completion face list, say whether these rows are synthetic package-owned rows under the new completion app, and seed them via =DefaultFaces.seed(face)= even though they are absent from package inventory. Add a generation test for a stock inventory proving the completion app still contains the baseline faces and their defaults. (blocking) + +** TODO Generated app descriptor schema is underspecified :blocking: +The spec says the composer emits an "active providers" descriptor that the renderer reads, but today's =APPS= entries only carry =label=, =preview=, and =faces=, and =buildPkgPreview= only passes the current app key through the global =APPS[app].preview= lookup. Without a concrete schema, the implementer has to invent where provider presence, default selected container, modifier availability, and dynamic label parts live. Define the generated app key and full =APPS[completion-key]= shape, for example =label=, =preview: "completion"=, =faces=, and a =completion= descriptor with =containers=, =modifiers=, =defaultContainer=, and =suppressedPackageKeys=. Also state how tests inspect it and how =renderCompletionPreview= obtains it. (blocking) + +** TODO V1 Ivy behavior is contradictory :blocking: +Scope tiers defer Ivy/Counsel to vNext, but the v1 design and acceptance criteria still say the container radio includes Ivy "once that section exists", modifiers are disabled under Ivy, and browser-gates verify modifier controls are inert under the Ivy container. If Ivy is vNext, there should be no Ivy container in v1 and no v1 gate requiring an Ivy state. If the disabled-under-Ivy rule is meant to be contract-tested now, the spec needs a synthetic Ivy descriptor fixture even though no owner machine has Ivy installed. Choose one: remove Ivy from v1 controls/tests entirely while keeping the provider-table extension point, or add a fixture-only Ivy test contract. (blocking) + +** TODO Interactive control lifecycle and DOM contract are not precise enough :blocking: +This is the first package preview with local controls, but the spec only says "a small block of interactive state" re-renders the preview. Current =buildPkgPreview= replaces =#pkgpreview.innerHTML= wholesale and =#pkgprevlabel= is a plain label, so the implementation must decide where controls live, whether they are rebuilt on every render, how state is keyed and reset when leaving/re-entering the entry, and how defaults are chosen from the generated descriptor. Specify the DOM contract and lifecycle: whether controls are inside =#pkgpreview= or the label/control bar, the state object shape, initialization from =defaultContainer= + installed modifiers, reset behavior on view switches/import/regenerate, and exact browser gates for toggling container/modifiers without losing assignment-table state. (blocking) + +* Implementation phases + +** Phase 1 — Inventory composer +Add the provider table and the composer in app_inventory.py / generate.py: detect present providers from package-inventory.json, union their inventory faces with the always-present baseline (minibuffer-prompt + completions-*), emit the entry plus an active-providers descriptor, and add present provider keys to the bespoke-suppression set. Leaves the tree working: the entry appears in the dropdown with a static stacked render and the generic entries are gone. Covered by test_generate.py. + +** Phase 2 — renderCompletionPreview +Add the renderer to previews.js and register it (PACKAGE_PREVIEWS + the entry's preview key). Render the baseline plus present provider sections statically from the descriptor, using os()/previewLines. Match coloring uses the family-correct face. browser-gates asserts every data-face is a real face and that section presence tracks the simulated inventory. Sample candidate data is generic (no real channel/contact names) since the tool is shared. + +** Phase 3 — Interactive controls +Add the container radio + modifier checkboxes in app.js; thread their state into the render so a change re-renders through the existing preview path; gate modifier availability on the container family. Keep the state local and ephemeral. + +** Phase 4 — Caption + locate hook +Add the caption naming minibuffer-prompt + highlight as living in UI Faces. When preview-locate ships, the off-entry built-ins become its hover-only elements with no further work here. + +* Acceptance criteria +- [ ] "minibuffer completion (…)" appears in the application dropdown. +- [ ] On the owner's inventory (vertico + orderless + marginalia, no ivy): preview shows the vanilla baseline plus a Vertico section with orderless match coloring and marginalia annotations; no Ivy option in the radio. +- [ ] On a stock-Emacs inventory (no completion packages): preview shows only the vanilla *Completions* block; no modifier checkboxes. +- [ ] The container radio switches vanilla / Vertico; the modifier checkboxes toggle orderless and marginalia; modifiers are disabled when the container is Ivy. +- [ ] vertico-current renders with no background (inherits highlight), matching live Emacs. +- [ ] The generic vertico / orderless / marginalia entries no longer appear separately; their faces are assignable under the completion entry. +- [ ] Every data-face in the preview is a real face (browser-gates passes). +- [ ] A caption names minibuffer-prompt and highlight as living in the UI Faces pane. + +* Readiness dimensions +- Data model & ownership: faces are inventory-generated; the entry owns the completion-specific faces, the UI Faces pane owns the built-in minibuffer-prompt / highlight; the active-providers descriptor is generated at build; control state is ephemeral UI state, never persisted or exported. +- Errors, empty states & failure: the empty state (no providers) is a first-class render — the vanilla baseline. Read-only preview, so no partial-failure or data-loss path. A provider in the table but absent from the inventory is simply skipped. +- Security & privacy: N/A — no credentials. One concrete rule: the shipped sample candidate data must be generic (no real Slack channels or contact names), since the tool is shared. The /tmp spike used realistic names; the product must not. +- Observability: N/A for a preview tool; browser-gates + test_generate.py are the checks that it renders correctly. +- Performance & scale: trivial — a static render of a dozen mock rows; re-render on a control toggle is cheap. +- Reuse & lost opportunities: reuses the inventory model, os()/previewLines, the PACKAGE_PREVIEWS registry, the bespoke-suppression mechanism, and — for discoverability — the preview-locate feature. Nothing here reinvents those. +- Architecture fit & weak points: integration points are the app composer (app_inventory.py / generate.py), the PACKAGE_PREVIEWS registry, the app.js control rendering, and browser-gates. Weak point: the interactive controls are the first of their kind in theme-studio. Mitigation — keep the state local to this entry and re-render through the existing buildPkgPreview path; do not generalize a controls framework until a second preview needs one. +- Config surface: the provider table (package key → section + fragment) and the sample candidate data are the knobs. Defaults: container starts on the detected frontend, modifiers on if installed. +- Documentation plan: a short note in theme-studio's README and a mention in theme-coloring-guide.org; the dropdown label is otherwise self-documenting. +- Dev tooling: existing — make theme-studio (regenerate), run-tests.sh / browser-gates, test_generate.py. Add cases for the composer and the renderer; no new targets. +- Rollout, compatibility & rollback: additive (new dropdown entry). The dedup removes the generic vertico/orderless/marginalia entries — a mild behavior change, but the faces remain assignable in the new entry. Rollback = remove the entry and the suppression. No persisted-data migration. +- External APIs & deps: N/A — no external API. Depends on build-inventory.el having captured installed packages' faces (theme-studio's existing model). Ivy/Counsel face names are verified-when-installed, which is why that section is deferred. + +* Risks, Rabbit Holes, and Drawbacks +- First interactive-control preview — risk of sliding into a general preview-controls framework. Dodge: keep the controls local to this entry; generalize only when a second preview needs it. +- Inventory variance across machines — relies on build-inventory.el being refreshed. Dodge: the vanilla baseline renders with zero providers, so the entry is never empty. +- Ivy can't be verified on the owner's machine. Dodge: defer Ivy to vNext; keep the composer provider-generic so the fast-follow is a table row plus a fragment. + +* Testing / Verification / Rollout +- test_generate.py: given a synthetic inventory dict, the composed entry's face list equals the expected union and the generic provider entries are suppressed; given an empty inventory, only the baseline faces are present. +- browser-gates: every data-face in renderCompletionPreview is a real face; section presence tracks the simulated inventory; modifier controls are inert under the Ivy container. +- Manual: open theme-studio in Chrome on the owner's inventory and confirm the Vertico section + baseline render, orderless/marginalia toggle, and vertico-current shows no background. + +* References / Appendix +- Reuse: [[file:theme-studio-preview-locate-spec.org][theme-studio-preview-locate-spec.org]] (hover/click locate), [[file:theme-studio-package-faces-spec-doing.org][theme-studio-package-faces-spec-doing.org]]. +- Spike: /tmp completion-face-preview.el (verified render; not committed — informs this spec, not grown into it). +- Live face values captured 2026-06-23 (WIP theme): minibuffer-prompt #899bb1/#100f0f bold; orderless-match-face-0..3 #cbd0d6 / #c99990 / #c5d4ae / #bea9dc bold italic; vertico-current inherits highlight (#eddba7 bold, no background). + +* Review and iteration history +** 2026-06-23 Tue @ 12:18:04 -0400 — Craig Jennings — author +- What: initial draft. +- Why: a multi-layer, shared-tool feature with a new interactive-preview pattern and several open sub-decisions warranted speccing before code. +- Artifacts: docs/specs/theme-studio-completion-preview-spec.org; todo.org task; the /tmp spike. + +** 2026-06-23 Tue @ 13:07:57 -0400 — Codex (emacs-d) — reviewer +- *What changed or was recommended:* First review, =Not ready=. Added six blocking findings: the remaining TODO decisions block readiness; the phase plan must explicitly wait for preview-locate's =previewSpan= API; the always-present built-in completion baseline needs a concrete source outside package inventory; the generated =APPS= descriptor schema is underspecified; v1 Ivy behavior contradicts the vNext deferral; and the first interactive preview needs an explicit DOM/state lifecycle. +- *Why:* The existing implementation has package previews wired through =APPS[app].preview= and =PACKAGE_PREVIEWS=, with generic inventory rows coming only from =package-inventory.json=. Vertico/orderless/marginalia are inventory packages, but the built-in =completions-*= baseline is not, and the pre-locate =os= path cannot style =@ui= faces in a package preview. Those gaps would force implementers to invent architecture and tests mid-build. +- *Artifacts:* findings [0/6]; code read: =scripts/theme-studio/app_inventory.py= (=add_inventory_apps= / =PREVIEW_KEYS=), =scripts/theme-studio/generate.py= (=APPS= generation), =scripts/theme-studio/app.js= (=buildPkgPreview= / =PACKAGE_PREVIEWS= / =curApp=), =scripts/theme-studio/previews.js= (=os= path), =scripts/theme-studio/package-inventory.json= (provider packages present, built-in completions faces absent), =docs/specs/theme-studio-preview-locate-spec.org= (=previewSpan= prerequisite). diff --git a/docs/specs/theme-studio-nerd-icons-colors-spec.org b/docs/specs/theme-studio-nerd-icons-colors-spec.org new file mode 100644 index 000000000..c0f07b6dd --- /dev/null +++ b/docs/specs/theme-studio-nerd-icons-colors-spec.org @@ -0,0 +1,380 @@ +#+TITLE: Theme-driven nerd-icons colors + theme-studio filetype legend — Spec +#+AUTHOR: Craig Jennings & Claude +#+DATE: 2026-06-23 +#+TODO: TODO | DONE SUPERSEDED CANCELLED + +* Metadata +| Status | Ready pending Craig's go — Codex review rounds 1-3 incorporated | +|----------+------------------------------------------------------------| +| Owner | Craig | +|----------+------------------------------------------------------------| +| Reviewer | Codex (spec-review) | +|----------+------------------------------------------------------------| +| Related | [[file:../../todo.org][todo.org: theme-driven nerd-icons colors]] | +|----------+------------------------------------------------------------| + +* Summary + +nerd-icons glyphs (completion icons, dirvish, dashboard, ibuffer) are forced to a single color by a runtime tint in =nerd-icons-config.el=, which flattens the per-filetype color information nerd-icons ships and prevents the theme from controlling icon color. This spec drops that tint so icon color becomes theme-driven, and adds a theme-studio representation — a filetype legend over the 34 =nerd-icons-*= color faces — so the assignments are visible and editable where the rest of the theme is built. + +* Problem / Context + +=nerd-icons-config.el= defines =cj/nerd-icons-tint-color= (default "darkgoldenrod") and =cj/nerd-icons-apply-tint=, which sets the foreground of all 34 =nerd-icons-*= color faces to that one color, applied in the =nerd-icons= =:config= and a =with-eval-after-load= safety net. Every nerd-icons glyph therefore renders darkgoldenrod regardless of type. + +Two costs. First, the per-filetype color nerd-icons provides is lost: =nerd-icons-extension-icon-alist= (330 entries) maps each filetype onto one of the 34 shared color faces (e.g. =.el/.sh= → =nerd-icons-purple=, an M-x command → =nerd-icons-blue=, =.zsh= → =nerd-icons-lcyan=), and the tint collapses all of that to one hue. Second, the color is not theme-driven: the tint runs at load and overrides whatever the theme set, so a theme can't own icon color. + +theme-studio already inventories the 34 =nerd-icons-*= faces (via the generic package-inventory path), so they are technically editable today — but only as a bare list of face names with no preview of what each color actually hits. There is no representation of the filetype→color mapping, which is the thing that makes coloring these faces meaningful. + +* Goals and Non-Goals +** Goals +- Icon color is theme-driven: the 34 =nerd-icons-*= color faces are set by the theme, not by a runtime tint. +- theme-studio shows a filetype legend — a representative set of filetypes rendered with their real icon glyph in the assigned color — that updates live as a face is recolored. +- The =nerd-icons-*= color-face assignments export into the theme and round-trip on import, like every other themed face. +** Non-Goals +- Per-filetype unique colors. nerd-icons shares 34 faces across 330 filetypes; the model is "color the 34 faces", not "assign 330 colors". +- Changing nerd-icons' glyph selection (=icon-for-file= / =icon-for-buffer= logic) or its icon set. +- An exhaustive 330-row legend. The legend is a curated representative sample, not the full alist. +** Scope tiers +- v1: capture a curated filetype legend (=nerd-icons-legend.json=); a bespoke nerd-icons preview rendering it; assign the theme colors and drop the tint atomically; export/round-trip; live verification in completion/dirvish/dashboard. Native colors already ride the existing default-face seed pipeline, so there is no new color capture. +- Out of scope: per-filetype assignment; editing the filetype→face mapping itself (that lives in nerd-icons). +- vNext (gallery, SHIPPED): the nerd-icons pane becomes an icon grid — one row per color face, rows ordered by hue so families cluster, distinct icons (deduped within a color) drawn in their color with the icon name beneath, plus a per-size preview dropdown. Replaces the v1 legend in the pane (legend data retained). See "vNext — nerd-icons gallery" below. Subsumes the two roam asks (one representative icon per color; group the colors together). +- vNext (later): extend the legend beyond file extensions to buffer-mode and command/symbol categories if the file set proves insufficient; a "reset to nerd-icons native palette" button. + +* Design + +** For the user + +theme-studio gains a nerd-icons pane. On the left, the 34 =nerd-icons-*= color faces as editable foreground rows (icons are foreground-colored, so foreground is the only relevant attribute). On the right, a filetype legend: a representative list of filetypes — a handful of languages, a directory, a command, a buffer — each drawn as its real icon glyph plus a sample name, in the color of the face that filetype maps to. Recoloring =nerd-icons-purple= immediately repaints every legend row that resolves to purple (=.el=, =.sh=, …), so the assignment's reach is visible at edit time. Export writes the face colors into the theme; with the runtime tint gone, those colors are what render in completing-read, dirvish, dashboard, and ibuffer. + +** For the implementer + +Three integration points, plus the config removal. + +1. Data capture — two artifacts, two owners. Native seed colors are *not* captured here. They already ride theme-studio's existing default-face pipeline (=capture-default-faces.py= → =emacs-default-faces.json= → =apply_default_face_seeds= in =app_inventory.py=), which runs in clean =emacs -Q --batch= and stores each face's untinted =face-default-spec=. All 34 color faces plus =nerd-icons-completion-dir-face= are already present there with native colors (e.g. =nerd-icons-blue= #6A9FB5), so the seed is correct and untinted with no new code. The only *new* capture is the legend metadata: =nerd-icons-legend.json=, emitted by =build-nerd-icons-legend.el= and embedded by =generate.py=, listing the curated legend rows. The schema is in "Legend data contract" below. + +2. Bespoke preview. nerd-icons moves from a generic inventory app to a bespoke app (a =BESPOKE_APP_SPECS= / =PREVIEW_KEYS= entry) with a new renderer in =previews.js= that draws the legend rows: for each curated filetype, its glyph + name styled with the effective color of its mapped face, read through the same registry the other previews use. The 34 faces remain editable rows. + +3. Export/theme. The package-export path already serializes edited package faces into the theme; nerd-icons rides it. Confirm =build-theme.el= emits =nerd-icons-*= face specs and that WIP round-trips. + +4. Config. Remove =cj/nerd-icons-tint-color=, =cj/--nerd-icons-color-faces=, =cj/nerd-icons-apply-tint=, and its two call sites in =nerd-icons-config.el=. The dir-icon advice (=cj/--nerd-icons-color-dir=, which layers =nerd-icons-yellow= onto directory glyphs that otherwise carry no color face) stays — its problem is independent of the tint — but now points at a theme-owned face. + +** Legend data contract + +The legend artifact is an array of rows, captured once in Emacs (it reads the nerd-icons alists), embedded by =generate.py= like the rest of =APPS=, and rendered by a =previews.js= renderer. Each row is: + +- =key= — unique row id (=ext:el=, =dir=, =cmd=, =buf=). +- =label= — the sample name the row shows (=init.el=, =src/=, =M-x command=, =*scratch*=). +- =face= — the owner =nerd-icons-*= color face being themed. The row reads its effective color through the same registry the other previews use, so recoloring the face repaints the glyph live. +- =category= — =extension= | =dir= | =command= | =buffer=, naming the source. +- =glyph= — the nerd-font glyph string. The v1 renderer draws it in the owner face's color (see the rendering decision). + +Source per category: =extension= rows from the =:face= of =nerd-icons-extension-icon-alist= entries; the =dir= row from =nerd-icons-yellow= (the dir-advice face, which wins — see the dir decision); =command=/symbol rows from the face in =nerd-icons-completion-category-icons=; =buffer= rows from the face resolved through =nerd-icons-mode-icon-alist=. v1 includes the curated extension rows plus one representative row each for dir, command, and buffer — the categories that actually surface in completing-read. + +*** v1 legend rows +The exact v1 table (=key= | =label= | =category= | source lookup → owner =face=), chosen to span a diverse set of the color faces rather than cover all 34: +- =ext:el= | init.el | extension | ext-alist "el" → =nerd-icons-purple= +- =ext:py= | app.py | extension | "py" → =nerd-icons-dblue= +- =ext:org= | notes.org | extension | "org" → =nerd-icons-lgreen= +- =ext:md= | README.md | extension | "md" → =nerd-icons-lblue= +- =ext:ts= | main.ts | extension | "ts" → =nerd-icons-blue-alt= +- =ext:html= | index.html | extension | "html" → =nerd-icons-orange= +- =ext:rs= | lib.rs | extension | "rs" → =nerd-icons-maroon= +- =ext:js= | app.js | extension | "js" → =nerd-icons-yellow= +- =ext:yml= | ci.yml | extension | "yml" → =nerd-icons-dyellow= +- =ext:c= | main.c | extension | "c" → =nerd-icons-blue= +- =dir= | src/ | dir | =nerd-icons-icon-for-dir= + advice → =nerd-icons-yellow= +- =cmd= | M-x command | command | completion-category =command= → =nerd-icons-blue= +- =buf= | *scratch* | buffer | mode-alist =emacs-lisp-mode= → =nerd-icons-purple= + +The =face= column is *resolved at capture time* from the live nerd-icons alists, not hardcoded here — the values above are the current resolution, recorded so the spec is self-checking. Coverage target: a representative showcase (the ~10 distinct faces above), not all 34 and not all 330 filetypes. A curated source key absent from the installed alist at capture time is skipped and logged, so the legend degrades gracefully across nerd-icons versions. + +* Alternatives Considered + +** A — Keep the tint, make it themeable (theme sets cj/nerd-icons-tint-color) +- Good, because it is the smallest change and preserves the single-knob simplicity. +- Bad, because it keeps one color for all icons, which is the exact thing this feature exists to undo — there is no per-filetype representation to build. +- Neutral, because it would still move color into the theme, just at the wrong granularity. + +** B — Per-filetype color assignment (one row per extension) +- Good, because it is maximally granular. +- Bad, because 330 rows is unusable, and nerd-icons shares faces, so most rows would be redundant edits of the same underlying face. +- Neutral, because it matches no real user intent — nobody colors 330 extensions by hand. + +** C (chosen) — Theme the 34 shared faces + a filetype legend +- Good, because it matches nerd-icons' actual color model, keeps the editable surface small (34), and the legend gives the "what does this color hit" representation Craig asked for. +- Neutral, because the legend is a curated subset of the 330 filetypes, so a rarely-used extension may not appear in the preview (though its color still themes via its shared face). + +** D — Enhance the generic preview instead of a bespoke one +- Good, because it is less code. +- Bad, because the generic preview renders face-name rows; it cannot draw icon glyphs grouped by filetype, which is the whole representation. + +* Decisions [6/6] + +** DONE Color model: theme the 34 shared faces, not per-filetype +- Context: 330 filetypes map onto 34 shared color faces; the data fixes the granularity. +- Decision: We will expose the 34 =nerd-icons-*= color faces as the editable surface and use the filetype list only as a read-only legend. +- Consequences: easier — small editable surface, matches nerd-icons. Harder — a user wanting one filetype a different color from its face-mates can't, without nerd-icons changes (out of scope). + +** DONE Legend scope: curated representative filetypes in v1 +- Context: 330 entries is too noisy to preview; a representative set communicates the mapping. The row schema is in "Legend data contract". +- Decision: We will hand-curate a representative legend (common languages, a dir, a command, a buffer) covering the faces that actually appear in Craig's completion/dirvish use, and mark the set as the v1 legend. +- Consequences: easier — a clean, readable preview. Harder — the curated set needs occasional maintenance as nerd-icons' alist shifts; an uncovered face has no legend row (still themeable). + +** DONE Seed colors: native colors ride the existing default-face pipeline +- Context: theme-studio needs nerd-icons' native palette, not the runtime tint, to seed from. +- Decision: We rely on theme-studio's existing default-face capture (=capture-default-faces.py= → =emacs-default-faces.json= → =apply_default_face_seeds=), which already runs in clean =emacs -Q --batch= and stores each face's untinted =face-default-spec=. No new color capture in =build-inventory.el=. +- Consequences: easier — zero new color-capture code, and the seed is already untinted. Harder — none. Supersedes the original draft's "build-inventory.el emits native defaults" (finding 3), which would have double-seeded. Verified: all 34 color faces + =nerd-icons-completion-dir-face= already present with native colors (=nerd-icons-blue= #6A9FB5). + +** DONE Config sequencing: assign theme colors in the change that drops the tint +- Context: dropping the tint before the theme assigns the 34 faces makes completion icons jump from uniform darkgoldenrod to nerd-icons' native multicolor palette until the theme overrides them. +- Decision: We land the theme's explicit nerd-icons color assignments (Phase 4) in or before the change that removes the tint (Phase 3), so there is no uncolored interim. (Proposed by the author; reopen if you'd rather drop the tint standalone and accept the native-palette interim.) +- Consequences: easier — no ugly interim, one coherent switch. Harder — couples the config change to a theme edit rather than landing independently. + +** DONE Dir advice, precedence, and cross-package ownership +- Context: directory glyphs carry no intrinsic color face. =cj/--nerd-icons-color-dir= layers =nerd-icons-yellow= via =add-face-text-property … nil= — the =nil= APPEND *prepends* the face, so it outranks any =:face= already on the string. =nerd-icons-completion-get-icon= passes =:face 'nerd-icons-completion-dir-face= to =nerd-icons-icon-for-dir=, but the advice's prepended =nerd-icons-yellow= wins. =nerd-icons-completion-dir-face= is owned by a *different* package (=nerd-icons-completion=) in =package-inventory.json=, and =apply_package_overrides= merges seed JSON by app key, so a face cannot be relocated into another app's pane. +- Decision: directory icons are colored by =nerd-icons-yellow= (prepended, wins in both completion and dirvish), so the legend's dir row models =nerd-icons-yellow= — a =nerd-icons= face the bespoke pane already owns. We keep =cj/--nerd-icons-color-dir=. =nerd-icons-completion-dir-face= stays under its own generic =nerd-icons-completion= app (export / import / lock keys unchanged) and is *not* pulled into the bespoke pane. A precedence ERT probe locks "yellow wins" (see Testing). +- Consequences: easier — the bespoke pane owns only =nerd-icons= faces, the dir row points at the face that actually renders, no cross-package merge tangle. Harder — =nerd-icons-completion-dir-face= is effectively inert for color while the advice is active (documented, not a bug). + +** DONE Legend rendering: render the real glyph in its color (v1) +- Context: the glyphs are private-use nerd-font codepoints, so they only render where a Nerd Font is available. theme-studio's CSS declares none — but Nerd Fonts are installed system-wide on Craig's machine (verified via =fc-list=: JetBrainsMono, Hack, Meslo Nerd Font Mono), and Chrome uses system fonts, so a =font-family= rule renders the glyphs with no =@font-face= and no embedded font file. +- Decision: v1 legend rows render the captured =glyph= in the owner face's effective color (inline color style), via =font-family: "Symbols Nerd Font Mono", "Hack Nerd Font Mono", monospace=. The headless gate asserts the glyph char and the inline color (both in the DOM, font-independent), not the rendered pixels. The monospace fallback shows a placeholder box only where no Nerd Font exists, which won't happen on Craig's setup. +- Consequences: easier — the preview mirrors completing-read authentically (real glyph in real color). Harder — a machine-dependent font assumption (fine for a personal tool), and the gate asserts the glyph char + inline color rather than the glyph's pixels. + +* Review findings [10/10] + +** DONE Open decisions block implementation readiness :blocking: +Disposition (accepted): all six decisions are now resolved — the original five plus the legend-rendering decision the schema work surfaced — and the =[6/6]= cookie reads complete. Sequencing (Phase 4 lands with/before Phase 3) and the =nerd-icons-completion-dir-face= scope are both settled; two author-proposed calls (sequencing, glyph rendering) are marked reopen-if-disagree. +The spec still has five =TODO= decisions, including the sequencing of the tint removal relative to theme assignment and whether =nerd-icons-completion-dir-face= is in scope. That blocks readiness because an implementer would have to decide whether Phase 3 can land before Phase 4, and whether the directory completion face is part of the exported/verified surface. Resolve or explicitly risk-accept every decision before implementation starts. (blocking) + +** DONE Legend row contract is underspecified :blocking: +Disposition (accepted): added the "Legend data contract" section — row schema (=key= / =label= / =face= / =category= / =glyph=), the per-category source for each row, and the v1 scope call (curated extension rows plus one representative dir / command / buffer row each). The browser data shape and the non-extension sources are now concrete. +The design says the legend includes filetypes plus "a directory, a command, a buffer," but Phase 1 only names =nerd-icons-extension-icon-alist= as the source. In the current code, package preview data is embedded through =APPS= in =scripts/theme-studio/generate.py= and rendered by =scripts/theme-studio/previews.js=, so the implementer needs a concrete browser data shape: the curated row key, display label/sample name, glyph text, owner face, source category, and what source supplies non-extension rows. Define that schema and state whether v1 really includes directory/command/buffer rows or only extension-backed rows. (blocking) + +** DONE Native-palette capture path conflicts with the current seed pipeline :blocking: +Disposition (accepted): the finding is right — native colors are already owned by the existing default-face pipeline, which runs in =emacs -Q --batch= and stores untinted =face-default-spec= (verified: 35 nerd-icons faces already in =emacs-default-faces.json= with native colors). Dropped the draft's "build-inventory.el emits native defaults" entirely (it would have double-seeded). The only new capture is the legend metadata; the Seed-colors decision was rewritten to match. +The spec says =build-inventory.el= should emit native defface defaults, but the current Theme Studio default-color path already lives in =scripts/theme-studio/capture-default-faces.py= / =emacs-default-faces.json= and is applied in =scripts/theme-studio/app_inventory.py= via =apply_default_face_seeds=. =build-inventory.el= currently emits only package→face ownership. If the implementation adds native colors in the wrong artifact, nerd-icons could be seeded twice or the package default snapshot could keep carrying the runtime tint. Specify the intended owner: either extend the default-face capture to preserve the untinted =face-default-spec= values for =nerd-icons-*=, or add a separate nerd-icons metadata artifact and define exactly how it overrides =apply_default_face_seeds=. (blocking) + +** DONE Curated legend contents are still a product decision :blocking: +Disposition (accepted): added the explicit "v1 legend rows" table (13 rows spanning ~10 distinct color faces), the coverage target (a representative showcase, not all 34 faces or all 330 filetypes), and the missing-source-key rule (a curated key absent from the installed alist is skipped + logged at capture). +The spec says to "hand-curate a representative legend" covering common languages plus dir/command/buffer rows, but it never names the actual v1 rows or the coverage target. That would force the implementer to decide whether the preview should cover every one of the 34 color faces, only Craig's common filetypes, only rows currently visible in completion/dirvish, or a smaller showcase. Define the exact v1 legend table (at least =key=, =label= / sample name, category, and source lookup key), plus the rule for a curated source key disappearing from the installed nerd-icons alist. (blocking) + +** DONE Cross-package face ownership is undefined :blocking: +Disposition (accepted): resolved in the dir decision — the bespoke =nerd-icons= pane owns only =nerd-icons= faces; =nerd-icons-completion-dir-face= stays under its own generic =nerd-icons-completion= app (=apply_package_overrides= keys merges by app, verified, so a cross-package face can't be relocated). The dir legend row's owner is =nerd-icons-yellow=, a =nerd-icons= face, which is what actually colors dir icons. +The spec puts =nerd-icons-completion-dir-face= in the nerd-icons story, but Theme Studio's current model keys package rows by app/package: =package-inventory.json= owns the 34 color faces under =nerd-icons= and =nerd-icons-completion-dir-face= under =nerd-icons-completion=; =apply_package_overrides= merges seed JSON by app key; and =BESPOKE_APPS= only suppresses generic inventory for keys listed in =BESPOKE_APP_SPECS=. If the implementer places =nerd-icons-completion-dir-face= inside the new =nerd-icons= bespoke app, import/reset/lock/generic-suppression behavior is a design call. Specify whether v1 creates a composite "nerd-icons" pane that intentionally owns this cross-package face, keeps a separate =nerd-icons-completion= app, or excludes the completion dir face from the pane; include the export/import key and generic-inventory suppression rule. (blocking) + +** DONE Directory color precedence is not defined :blocking: +Disposition (accepted): resolved in the dir decision — =cj/--nerd-icons-color-dir= prepends =nerd-icons-yellow= (=add-face-text-property … nil=, verified in source), so it wins over the =:face 'nerd-icons-completion-dir-face= that completion passes. Dir icons render =nerd-icons-yellow= in both completion and dirvish; the dir row models it; a precedence ERT probe locks it (Testing). +=nerd-icons-completion-get-icon= passes =:face 'nerd-icons-completion-dir-face= to =nerd-icons-icon-for-dir=, while =cj/--nerd-icons-color-dir= later layers =nerd-icons-yellow= onto the returned string. The spec says the dir row falls back to =nerd-icons-yellow= while =nerd-icons-completion-dir-face= is unset, but live rendering depends on the resulting face stack and Emacs face precedence when both faces carry foregrounds. Define the intended precedence for directory icons in completion and dirvish: which face wins when both are themed, whether one should remain unset, and what the preview row should model. Require an ERT or live probe that locks that precedence. (blocking) + +** DONE Legend metadata artifact and failure behavior are open :blocking: +Disposition (accepted): Phase 1 now names the concrete artifact (=scripts/theme-studio/nerd-icons-legend.json=, committed like =package-inventory.json=), the generator (=build-nerd-icons-legend.el= sibling dump that loads =nerd-icons=), the refresh step, and the =generate.py= failure behavior (absent / malformed / empty → fall back to the generic nerd-icons app, warn, never error). +Phase 1 still says the legend rows are emitted by =build-inventory.el= "or a sibling dump," which leaves implementers to choose the artifact name, refresh command, checked-in status, and package-loading behavior. Define the concrete artifact path and generator entry point, whether it is committed like =package-inventory.json= / =emacs-default-faces.json=, which features it loads (=nerd-icons=, =nerd-icons-completion=), and what =generate.py= does when the artifact is absent, stale, malformed, or a required package is missing. (blocking) + +** DONE Phase order contradicts the sequencing decision :blocking: +Disposition (accepted): merged tint removal and theme assignment into one atomic Phase 3 (assign the 34 colors + remove the tint together, no interim), with verification as Phase 4 — now consistent with the sequencing decision. +The resolved sequencing decision says explicit theme assignments must land in or before the tint removal, but =Implementation phases= still lists "Drop the runtime tint" as Phase 3 and "Theme assignment + verification" as Phase 4. That leaves the implementer to decide whether to follow the written phase order or the decision. Reorder or combine the phases so the build path cannot land a broken/interim state: theme assignment before tint removal, or a single phase that updates the theme and removes the tint atomically. (blocking) + +** DONE Test plan does not name the contracts this feature changes :blocking: +Disposition (accepted): added a "Testing / Verification" section naming each contract's target — delete the apply-tint test, extend color-dir with the precedence probe, Python legend-schema + fallback tests, browser gates (live recolor, valid =data-face= owners), export/import round-trip, and the config smoke. +The acceptance criteria say =run-tests.sh= and launch smoke should pass, but the feature changes specific contracts that need named tests. =tests/test-nerd-icons-config--apply-tint.el= will become obsolete when the tint functions are removed; =tests/test-nerd-icons-config--color-dir.el= should remain and probably gain a precedence case; Theme Studio needs Python tests for the generated legend schema/composite app behavior, browser gates for live recolor + valid =data-face= owners, and an export/import round-trip covering =nerd-icons= plus any cross-package completion face. Add these test targets to the spec so implementation does not invent the verification surface. (blocking) + +** DONE Stale summary sections contradict resolved decisions :blocking: +Disposition (accepted): reconciled every flagged section with the resolved decisions — Scope tiers (no new color capture; legend → =nerd-icons-legend.json=; atomic assign+drop), For-the-implementer summary (named =build-nerd-icons-legend.el= / =nerd-icons-legend.json=), the legend source paragraph (dir row is =nerd-icons-yellow= only), Architecture-fit and Dev-tooling and External-APIs readiness dimensions (=build-nerd-icons-legend.el=, no defface capture), and the Risks section (replaced the moot defface-reading risk with the stale-artifact / fallback risk). +The resolved decisions and Phase 1 now say native colors ride =capture-default-faces.py= / =emacs-default-faces.json=, legend metadata comes from =build-nerd-icons-legend.el= into =nerd-icons-legend.json=, and the dir row models only =nerd-icons-yellow=. Earlier text still says v1 captures native default colors "into the inventory," that legend metadata may be emitted by =build-inventory.el= or a sibling dump, that dir rows come from both =nerd-icons-yellow= and =nerd-icons-completion-dir-face=, that architecture integration uses =build-inventory.el= for capture, and that defface-default capture remains a Phase 1 assumption. Those contradictions would force an implementer to decide which contract wins. Update the Scope, For-the-implementer summary, Legend source paragraph, Readiness dimensions, and Risks so they all match the resolved decisions and phases. (blocking) + +* Implementation phases + +** Phase 1 — Legend capture +Emit the v1 legend rows (=key= / =label= / =face= / =category= / =glyph=) to =scripts/theme-studio/nerd-icons-legend.json=, a committed artifact like =package-inventory.json= / =emacs-default-faces.json=. The generator is a new sibling dump =scripts/theme-studio/build-nerd-icons-legend.el= (loads =nerd-icons=; resolves each curated key's glyph + owner face from the live alists), run via the same =emacsclient -e '(load …)'= step the inventory dumps use. =generate.py= embeds the JSON; if it is absent, malformed, or empty (nerd-icons not installed), =generate.py= logs a warning and nerd-icons falls back to its generic inventory app (no bespoke legend) — never an error. Native seed colors are *not* captured here; they already ride the existing default-face pipeline (=capture-default-faces.py= / =emacs-default-faces.json= / =apply_default_face_seeds=), which stores untinted =face-default-spec= (verified: 35 nerd-icons faces present). Tree stays working (data only; no UI change yet). + +** Phase 2 — Bespoke nerd-icons preview +Register nerd-icons as a bespoke app with a legend renderer in =previews.js= drawing each curated filetype's glyph + name in its mapped face's effective color, live-updating on recolor. Browser-gated; existing previews unaffected. + +** Phase 3 — Theme assignment + tint removal (atomic) +One change: assign the 34 =nerd-icons-*= colors in the WIP theme (and =WIP-theme.el=) *and* remove the tint defcustom / defconst / function + its two call sites from =nerd-icons-config.el=, together, so the icons never pass through an uncolored native-palette interim (the sequencing decision). Keep =cj/--nerd-icons-color-dir=. Live-reload + launch smoke. + +** Phase 4 — Verification +Export → =WIP-theme.el= and re-import round-trip over the =nerd-icons= faces; live check in completing-read / dirvish / dashboard / ibuffer; run the dir-precedence ERT probe (yellow wins). See Testing. + +* Acceptance criteria +- [ ] theme-studio shows a nerd-icons pane: 34 editable foreground rows + a filetype-legend preview that renders each row's real nerd-font glyph in its assigned color (monospace fallback). +- [ ] Recoloring a =nerd-icons-*= face repaints every legend row mapped to it, live. +- [ ] theme-studio opens seeded with nerd-icons' native palette (not darkgoldenrod). +- [ ] Export includes the =nerd-icons-*= face specs; re-import round-trips to the same state. +- [ ] =nerd-icons-config.el= no longer tints; the 34 faces are owned by the theme. +- [ ] In a live frame, completion / dirvish / dashboard icons render the theme's per-filetype colors. +- [ ] run-tests.sh green (Node + browser gates + ERT + Python); =make validate-modules= + launch smoke clean. + +* Readiness dimensions +- Data model & ownership: the 34 color faces are user-authored via theme-studio and owned by the theme; the filetype→face legend and native defaults are generated (captured from nerd-icons), read-only. No remote/cache. +- Errors, empty states & failure: a face with no curated legend row simply has no preview row (still editable); a missing nerd-icons (package absent) skips the bespoke app — capture must degrade to the generic path, not error. +- Security & privacy: N/A because no credentials or sensitive data. +- Observability: the legend preview *is* the observability — the user sees exactly which filetypes a color hits before committing. +- Performance & scale: 34 faces + a curated legend (tens of rows); trivial. The capture dump runs once in Emacs, not per render. +- Reuse & lost opportunities: reuses the existing inventory pipeline, the package-export path, and the preview registry from preview-locate; nerd-icons already supplies the extension→face mapping, so we read it rather than re-derive. +- Architecture fit & weak points: integration points are =build-nerd-icons-legend.el= → =nerd-icons-legend.json= (legend capture), =app_inventory.py= / =face_data.py= (bespoke registration), =previews.js= (renderer), =generate.py= (embed), =nerd-icons-config.el= (tint removal). Weak point: the curated legend can drift from nerd-icons' alist over versions — mitigated by deriving the mapping from the live alist at capture time, curating only which filetypes to show, plus the missing-key skip. +- Config surface: removes =cj/nerd-icons-tint-color= and its machinery; no new public knob (the theme is the surface). =cj/--nerd-icons-color-dir= advice retained. +- Documentation plan: update =nerd-icons-config.el= commentary; a note in theme-studio's =theme-coloring-guide.org= on the nerd-icons pane. No user-facing migration doc needed (personal config). +- Dev tooling: run-tests.sh (theme-studio), =make validate-modules= + launch smoke (config); the legend dump is an =emacsclient -e '(load build-nerd-icons-legend.el)'= step, mirroring the existing inventory dumps. +- Rollout, compatibility & rollback: the change alters the persisted theme (adds nerd-icons face specs) and removes a config knob. Rollback = restore the tint block and drop the nerd-icons specs from the theme. The sequencing decision exists to avoid an uncolored interim. +- External APIs & deps: depends on nerd-icons internals — =nerd-icons-extension-icon-alist= entry shape (=("ext" fn "glyph" :face SYM)=), the dir/mode/completion-category alists, and the 34 =nerd-icons-*= color faces. Verified live this session (330 entries, sample confirmed; 34 faces inventoried; native colors already in =emacs-default-faces.json=). No new defface capture — seeding rides the existing pipeline. + +* Risks, Rabbit Holes, and Drawbacks +- The legend artifact (=nerd-icons-legend.json=) is captured once and committed, so it can go stale if nerd-icons' alists shift — mitigated by the =generate.py= fallback (absent/malformed → generic app), the missing-key skip at capture, and the refresh step in dev tooling. No defface reading in this feature: native seed colors already live untinted in =emacs-default-faces.json= (verified), so the earlier defface-introspection risk is moot. +- The legend is HTML, not Emacs faces: rows render the captured glyph char in a CSS color from the registry, so there's no live Emacs dependency at preview time. The one font dependency is a Nerd Font for the glyph shape — present system-wide on Craig's machine, with a monospace fallback to a placeholder box elsewhere. The gate asserts the glyph char and inline color, not the rendered pixels. + +* Testing / Verification + +The feature changes specific contracts; each gets a named target rather than leaning on a blanket "run-tests.sh passes". + +- =tests/test-nerd-icons-config--apply-tint.el= — *delete* when the tint functions are removed (Phase 3); it tests code that no longer exists. +- =tests/test-nerd-icons-config--color-dir.el= — *keep and extend* with a precedence case: with both =nerd-icons-yellow= and =nerd-icons-completion-dir-face= carrying foregrounds, the advice's prepended =nerd-icons-yellow= is first in the face list (wins). This is the ERT probe that locks the dir-precedence decision. +- Python (=test_generate.py= / a new test): the =nerd-icons-legend.json= schema (every row has =key= / =label= / =face= / =category= / =glyph=; face is a known =nerd-icons-*= face), and the bespoke-vs-generic fallback (absent/malformed artifact → generic app, no crash). +- Browser gates: the nerd-icons legend renders; recoloring a face repaints every row mapped to it; every legend element carries a valid owner =data-face= (the owner-aware gate from preview-locate); the dir row models =nerd-icons-yellow=. +- Export/import round-trip: a WIP with assigned =nerd-icons-*= colors exports and re-imports to the same state; =nerd-icons-completion-dir-face= (separate app) is untouched by the nerd-icons pane. +- Config: =make validate-modules= + launch smoke after the tint removal (Phase 3). + +* vNext — nerd-icons gallery (icon grid by color) — SHIPPED + +This increment builds on shipped v1 and the shipped glyph-rendering infrastructure (the embedded =ThemeStudioNerd= woff2 + the unquoted-inline-font fix that lets nerd-font glyphs render in the browser). It is purely additive display — no config change, no theme change, no new editable surface. The 34 =nerd-icons-*= faces are already themed and editable from v1; the gallery is a read-only view of every distinct icon, organized by the color it renders in. + +The design evolved during the build. The first cut rendered the *full* catalog (every face-bearing mapping, ~700 glyphs, duplicates kept) as a sequence of flowing color sections below the v1 legend, ordered by glyph count, with the source key on hover. Craig redesigned it after a live look into the shipped shape below: a grid of *distinct* icons (deduped within a color), rows ordered by *hue* so families cluster, the icon *name* shown beneath each glyph, the v1 legend dropped from the pane, and a *per-size preview dropdown* so the designer can view the grid at different font sizes. The decisions below record the shipped choices; the superseded ones are noted inline. + +** Summary + +The nerd-icons pane is a grid: one row per =nerd-icons-*= color face, the rows ordered by hue (ascending) so color families sit together (pinks/reds/oranges, yellows, greens, cyans, the grays, blues, purples). Each row is a swatch + face-name header over a wrapping set of cells; each cell draws one icon in the face's color with the icon's nerd-font name (=nf-dev-terminal=) beneath it. Icons are deduplicated within a color, so the ~700 face-bearing mappings collapse to ~314 distinct glyphs. Recoloring a face repaints its swatch and every icon in its row live. Above the grid, a "preview:" dropdown selects the glyph size. + +** For the user + +The pane shows the grid (the v1 legend preview is gone from view; its data is still captured for round-trip and reference). Color-level locate is preserved: clicking a color in the faces table flashes every icon in that row, and clicking an icon flashes its color's row — icons aren't individually editable, only their color is. A "preview:" dropdown above the grid picks the font size; Left/Right arrows step through the sizes when it is focused. + +** For the implementer + +Three integration points; no config or theme path is touched. + +1. Catalog capture. =build-nerd-icons-legend.el= is now a library (capture functions + one entry point, =cj/nerd-icons-write-legend=), so the pure logic unit-tests without nerd-icons. It walks every =:face=-bearing alist (=nerd-icons-extension-icon-alist=, =nerd-icons-regexp-icon-alist=, =nerd-icons-mode-icon-alist=), dedupes icons by name within each owner face, sorts each face's icons by name, computes each face's native hue from its defface foreground, and orders the groups by hue (ascending, ties by descending lightness). It emits into the *same* =nerd-icons-legend.json= under a =gallery= key (v1 rows stay under =legend=). nerd-icons is required only inside the writer, so the file loads tint-free for tests; the daemon invocation is =(progn (load …) (cj/nerd-icons-write-legend))=. =generate.py='s fallback covers absent/malformed/empty → generic app, never an error. + +2. Grid renderer. =renderNerdIconsPreview(sizePt)= in =previews.js= draws the =gallery= groups: per group a swatch + face-name header, then a cell per icon (the glyph in the face's color at =sizePt=, the icon name beneath). It reads color through the same effective-color registry, so recolor repaints live; glyphs render in =ThemeStudioNerd=. With no gallery it falls back to =genericPreview=. Registered under the existing bespoke nerd-icons app. + +3. Preview-pane dropdown. =previewPanes(app)= / =buildPkgPreview= in =app.js= turn the old static preview label into "preview:" + a =<select>=. A single-pane app shows its name disabled; nerd-icons (when it has a gallery) is multi-pane, one pane per font size, and selecting a size re-renders the grid at it. The selected size persists per app. The hover-wayfinding info line moved to its own span beside the dropdown. + +** Gallery data contract + +=nerd-icons-legend.json= gains a =gallery= key alongside =legend=. =gallery= is an array of color groups, ordered by ascending =hue= (families cluster), ties by descending lightness: + +- =face= — the owner =nerd-icons-*= color face. The header and every cell read effective color through the registry, so recoloring repaints the whole row live. +- =hue= — the face's native hue in degrees (0-360), computed from its defface foreground at capture time. Drives the group order and is the gate's ordering check. +- =glyphs= — an array of entries, each: =glyph= (the nerd-font glyph string) and =name= (the icon's nerd-font name, e.g. =nf-dev-terminal=, shown beneath the cell). Deduplicated by name within the face and sorted by name. + +A face with zero resolvable icons, or with no native color, produces no group. Faces resolve from the live alists at capture time, so the catalog tracks the installed nerd-icons version; a malformed or missing alist is skipped. + +** Gallery decisions [6/6] + +*** DONE Gallery content: distinct icons in a grid, grouped by color +- Context: the first cut rendered the full catalog with duplicates as flowing sections. After a live look, Craig wanted a clean grid of distinct icons. +- Decision: render *distinct* icons (deduplicated by name within each color) as a grid, one row per color face. Supersedes the original "full ~713-mapping catalog, duplicates kept" choice. ~700 mappings collapse to ~314 glyphs. +- Consequences: easier — a clean, scannable catalog with no repeated cells. Harder — the per-color count is "distinct icons," not "mappings," so it no longer doubles as a mapping-density signal. + +*** DONE Row order: by hue so color families cluster +- Context: the original order was by descending glyph count, which interleaves hues and reads as random (Craig's words). He asked for "blues together, reds together." +- Decision: order the rows by each face's native hue (ascending), ties by descending lightness. Hue is computed from the defface foreground at capture and frozen in the artifact, so the order doesn't reshuffle as the user recolors. Supersedes the count ordering. +- Consequences: easier — families cluster (spectral order); the gate's ordering check is a simple non-decreasing-hue assertion. Harder — a lone outlier (a near-red light-pink) can land at the far end of the spectrum. + +*** DONE Per-cell label: the icon's nerd-font name, shown beneath +- Context: the original cut used a hover tooltip for the source key and no visible label. Craig wanted the name visible under each icon. +- Decision: show the icon's nerd-font name (=nf-…=) beneath each cell. The capture stores =name= per glyph; the source key is dropped (dedupe collapses multiple sources anyway). Supersedes the hover-only source-key tooltip. +- Consequences: easier — the icon's identity is always on screen. Harder — wider cells; the source filetype/mode is no longer surfaced (the icon name is the identity that matters in a catalog). + +*** DONE Coexistence: the grid replaces the legend in the pane +- Context: the first cut kept the v1 legend above the gallery. Craig wanted the pane to be just the grid. +- Decision: the pane renders only the grid; the v1 legend is dropped from the render. The legend *data* stays in the artifact (still captured, still loaded, still round-trips) so v1's contract and tests are intact. Supersedes "second section below the legend." +- Consequences: easier — a single, focused view. Harder — the legend data rides unused in the artifact (kept for round-trip and possible future use). + +*** DONE Preview-pane size dropdown +- Context: the designer needs to see the glyphs at different sizes. A general preview-pane dropdown solves it. +- Decision: replace the static preview label with "preview:" + a =<select>=. One pane → disabled, names the preview. Multiple panes → enabled. nerd-icons gets one pane per font size in *points* (10/12/14/16/20/24/32/48, default 14) — pt because Emacs sizes fonts in =:height= (1/10 pt), so a pane maps to a real buffer size. The top sizes (32/48) are for inspecting a glyph's detail; the cell width scales with the size, so beyond ~48 pt the 314-icon grid becomes mostly scrolling. Left/Right arrows step the focused dropdown; the size persists per app. The dropdown is multi-pane only when the gallery actually exists, so a failed capture can't promise sizes the renderer can't draw. +- Consequences: easier — size selection with no new UI surface; the mechanism generalizes to any future multi-pane app. Harder — a second piece of per-pane state (the selected index). + +*** DONE Font embed: full Symbols Nerd Font Mono stays embedded +- Context: v1 deferred the full-font (2.1M HTML) vs glyph-subset call; the gallery forces it. +- Decision: keep the full =ThemeStudioNerd= woff2 embedded as a data: URI — the grid draws the whole glyph set, so a v1-curated subset would not cover it. +- Consequences: easier — every glyph renders, no subset bookkeeping. Harder — the ~2.1M self-contained HTML stays (fine for a personal tool). + +** Locate under the grid (color-level association) + +The gallery inverts the studio's usual ~1:1 element↔face association: the visible unit is the icon, but the only editable, locatable handle is the color, which owns ~10-40 icons. Decision (Craig): keep it color-level. Clicking a color in the faces table flashes all its icons; clicking an icon flashes its color row. Icons get no individual editable identity — their name is already on screen, so no flash is needed to identify one. The size dropdown rides cleanly on top: only the selected pane is ever rendered, so a flash always targets the visible size. The alternative (icons as first-class selectable entities) was rejected — there's nothing per-icon to edit, so it would invent a selection concept that fights the rest of the studio. + +** vNext implementation phases (as shipped) + +*** Phase G1 — Catalog capture (library + grid data) +=build-nerd-icons-legend.el= refactored to a library with =cj/nerd-icons-write-legend= (runtime nerd-icons require). Emits the deduped, hue-ordered =gallery= groups (=face= / =hue= / =glyphs:[{glyph,name}]=) into =nerd-icons-legend.json=. =generate.py= parses it (=load_nerd_icons_gallery= with absent/malformed/legacy-array/invalid-group fallbacks) and attaches it via =add_nerd_icons_app=. Unit-tested by =test-nerd-icons-legend-dump.el= (synthetic alists/faces: dedupe, hue order, lightness tiebreak, name sort, the skip rules). + +*** Phase G2 — Grid renderer +=renderNerdIconsPreview(sizePt)= draws the per-color grid (swatch + name header, a cell per icon with the name beneath), replacing the legend render. Live recolor; =genericPreview= fallback when no gallery. Browser-gated (=#nerdiconstest=: grid, hue order, dedupe, recolor). + +*** Phase G3 — Preview-pane dropdown +=previewPanes= / =buildPkgPreview= + the template's "preview:" =<select>= + the moved hover-info span. Size panes for nerd-icons (gated on gallery presence), disabled single pane elsewhere, Left/Right arrow nav, lighter dropdown background. Browser-gated (=#previewpanetest=; =#locatehovertest= updated for the moved info line). + +** vNext acceptance criteria +- [X] The nerd-icons pane is a grid: one row per owning =nerd-icons-*= face, swatch + name header, a cell per distinct icon with the icon name beneath, in the face's color. +- [X] Rows are ordered by hue so color families cluster; icons are deduplicated within a color. +- [X] Recoloring a =nerd-icons-*= face repaints its swatch and every icon in its row live. +- [X] A "preview:" dropdown selects the font size (pt); single-pane apps show a disabled dropdown naming the preview; Left/Right arrows step the sizes. +- [X] =nerd-icons-legend.json= carries a =gallery= key (=face= / =hue= / =glyphs:[{glyph,name}]=); nerd-icons absent/malformed → gallery omitted (generic-app fallback), never an error. +- [X] run-tests.sh green (Python + Node + ERT + browser gates). + +** vNext testing / verification (as shipped) +- Elisp (=test-nerd-icons-legend-dump.el=, 7 ERT): the capture logic with synthetic alists + faces — dedupe within a color, hue ordering, the lightness tiebreak, within-color name sort, and the three skip rules (no =:face=, unresolvable glyph, face with no native color). Runs under the theme-studio batch without nerd-icons. +- Python (=test_generate.py=): the =gallery= schema (each group a known =nerd-icons-*= face + numeric =hue= + non-empty =glyphs= each carrying =glyph= / =name=), hue ordering, dedupe, and the fallback edges (absent / malformed / legacy-array / empty glyphs / foreign face / non-numeric hue / non-dict entry → gallery omitted). +- Browser gates: =#nerdiconstest= (grid renders, hue order, dedupe, valid owner =data-face=, recolor repaints the row); =#previewpanetest= (multi-pane enabled with a pane per size, single-pane disabled, size drives the glyph pt, default 14, no-gallery → single pane + generic fallback, stale-index reset); =#locatehovertest= (hover info in its own span, cleared on leave). + +* Review and iteration history +** 2026-06-24 Wed @ 16:12:27 -0400 — Claude — author (gallery redesign, shipped) +- *What changed:* Rewrote the vNext section to the shipped shape after Craig's live review. The gallery is now a grid (one row per color face, rows ordered by hue so families cluster, distinct icons deduped within a color, the icon name beneath each cell) that replaces the v1 legend in the pane; a per-size "preview:" dropdown (pt, default 14, Left/Right arrows, gated on gallery presence) selects the glyph size. Updated the data contract (=glyphs:[{glyph,name}]= + a per-group =hue=), the decisions (now [6/6]: content → distinct-icon grid, order → hue, label → icon name, coexistence → grid replaces legend, the new size-dropdown decision; the superseded full-catalog/count-order/hover-label/second-section choices noted inline), the phases (G1 capture as a library + G2 grid renderer + G3 dropdown), the locate section (color-level association, Craig's call), and the testing (the new =test-nerd-icons-legend-dump.el= ERT, the Python edges, =#previewpanetest=). Marked acceptance done. +- *Why:* The first cut (full catalog, flowing sections, count order, hover-only source key, legend kept above) shipped, then Craig redesigned it on sight. A latent =face-hsl= bug surfaced while writing the ERT (=cadr= grabbed the keyword, not the plist) and was fixed. +- *Artifacts:* build-nerd-icons-legend.el (library refactor), nerd-icons-legend.json (regenerated: 34 groups, 314 distinct glyphs), previews.js / app.js / theme-studio.template.html (grid + dropdown), test-nerd-icons-legend-dump.el (7 ERT), test_generate.py (Python edges), browser-gates.js (=#previewpanetest= + updates). Decisions [6/6]. + +** 2026-06-24 Wed @ 14:12:54 -0400 — Claude — author (vNext gallery) +- *What changed:* Added the "vNext — nerd-icons gallery (full colored catalog)" section: every nerd-icons glyph in its real color, grouped under a per-face swatch + name header, as a second read-only section below the v1 legend in the same pane. Added the gallery data contract (=nerd-icons-legend.json= gains a =gallery= key), five gallery decisions [5/5], two vNext phases (G1 catalog capture extending =build-nerd-icons-legend.el=; G2 =previews.js= renderer), vNext acceptance criteria, and vNext testing. Updated Scope tiers to name the gallery as the primary vNext item. +- *Why:* Craig asked to widen the curated v1 legend to the full colored catalog. He chose the full-catalog-grouped-by-color scope over the representative-icon and deduped cuts; that choice subsumes the two roam asks (representative-icon-per-color, group-colors-together). The increment is purely additive display — no config/theme change — and rides the shipped glyph-rendering infrastructure (embedded =ThemeStudioNerd= woff2 + the unquoted-inline-font fix). The deferred v1 full-font-vs-subset call is settled here toward full-font, since the gallery needs the whole glyph set. +- *Artifacts:* docs/specs/theme-studio-nerd-icons-colors-spec.org (vNext section). Author-proposed calls (artifact layout, label-on-hover, coexistence) marked reopen-if-disagree. + +** 2026-06-23 Tue @ 22:17:25 -0400 — Claude — author +- What: initial draft. +- Why: Craig chose to spec the drop-the-tint + theme-studio filetype-legend feature before building (spans config + three theme-studio layers + the theme, with real trade-offs on color model, legend scope, seeding, and sequencing). +- Artifacts: docs/specs/theme-studio-nerd-icons-colors-spec.org; todo.org task (to be created at hand-off). + +** 2026-06-23 Tue @ 22:29:01 -0400 — Codex (emacs-d) — reviewer +- *What changed or was recommended:* First review, =Not ready=. Added three blocking findings: unresolved decisions still gate readiness; the legend needs a concrete generated data contract, especially for non-extension rows; and native-palette capture must be assigned to the existing default-face seed pipeline or to a clearly defined metadata override. +- *Why:* The code read confirmed the main direction fits the current implementation: =modules/nerd-icons-config.el= owns the bulk tint and dir advice; Theme Studio package views are generated from =BESPOKE_APP_SPECS= / =APPS= and rendered through =PACKAGE_PREVIEWS=; package export already serializes package face specs. The remaining gaps would still force implementers to invent schema, seed ownership, and sequencing behavior mid-build. +- *Artifacts:* findings [0/3]; code read: =modules/nerd-icons-config.el= (=cj/nerd-icons-apply-tint=, =cj/--nerd-icons-color-dir=), =scripts/theme-studio/build-inventory.el=, =scripts/theme-studio/capture-default-faces.py= / =emacs-default-faces.json=, =scripts/theme-studio/app_inventory.py=, =scripts/theme-studio/face_data.py=, =scripts/theme-studio/generate.py=, =scripts/theme-studio/previews.js=, =scripts/theme-studio/build-theme.el=, =tests/test-nerd-icons-config--apply-tint.el=, =tests/test-nerd-icons-config--color-dir.el=, =todo.org=. + +** 2026-06-23 Tue @ 22:37:50 -0400 — Claude — responder +- *What changed:* All three blocking findings accepted and folded in. Added a "Legend data contract" section (row schema + per-category sources + v1 scope). Corrected the data-capture story: native seed colors are owned by the existing default-face pipeline (verified untinted in =emacs-default-faces.json=), so the draft's =build-inventory.el= native-color capture is dropped; only the legend metadata is newly captured. Resolved all five open decisions and added a sixth (legend renders a swatch + label in v1, not the nerd-font glyph). Phase 1 retitled to "Legend capture". +- *Why:* The findings were correct against the code read. The seed-pipeline conflict was the load-bearing one — capturing native colors in =build-inventory.el= would have double-seeded against =apply_default_face_seeds=. Two calls (sequencing, glyph rendering) are author-proposed and marked reopen-if-disagree. +- *Artifacts:* findings [3/3]; decisions [6/6]; code verified this session: =capture-default-faces.py= (reads =face-default-spec= under =-Q=), =emacs-default-faces.json= (35 nerd-icons faces, native colors), =nerd-icons-extension-icon-alist= (330) / =nerd-icons-dir-icon-alist= (27) / =nerd-icons-completion-category-icons= sources. + +** 2026-06-23 Tue @ 22:56:49 -0400 — Claude — responder (decision flip) +- *What changed:* Flipped the legend-rendering decision from "swatch + label, glyph deferred to vNext" to v1 rendering the real glyph in its color, with a monospace fallback. Updated the schema's glyph/face notes, the acceptance criterion, and the font-dependency risk. +- *Why:* Craig asked why glyph rendering was deferred. The deferral rested on an unverified font dependency. Verified Nerd Fonts are installed system-wide (=fc-list=), so Chrome renders the glyphs from a =font-family= rule with no =@font-face= or embedded font file — v1-cheap. The authentic glyph beats the swatch, and the gate can still assert the glyph char + inline color. +- *Artifacts:* =fc-list= (system Nerd Fonts present: JetBrainsMono / Hack / Meslo Nerd Font Mono); decisions unchanged [6/6]. + +** 2026-06-23 Tue @ 23:03:39 -0400 — Codex (emacs-d) — reviewer +- *What changed or was recommended:* Second review, =Not ready=. Added six blocking findings: exact curated legend rows are still unspecified; =nerd-icons-completion-dir-face= crosses package ownership boundaries; directory face precedence between =nerd-icons-completion-dir-face= and =nerd-icons-yellow= is undefined; the legend metadata artifact/generator/failure behavior is open; the implementation phase order contradicts the sequencing decision; and the test plan needs named contract tests. +- *Why:* Reading as the implementer, the remaining gaps are all answer-changing. The current Theme Studio model keys package faces by app in =APPS= / =PKGMAP= and merges seed JSON by that key, while the spec wants one preview pane to reason across =nerd-icons= and =nerd-icons-completion=. The installed packages confirm the source shapes: =nerd-icons-extension-icon-alist=, =nerd-icons-dir-icon-alist=, =nerd-icons-mode-icon-alist=, and =nerd-icons-completion-category-icons= exist, but they do not by themselves choose the curated UX rows, ownership, precedence, or tests. +- *Artifacts:* findings [3/9]; code read: =scripts/theme-studio/app_inventory.py= (=BESPOKE_APPS= / =apply_package_overrides=), =scripts/theme-studio/generate.py= (=APPS= construction), =scripts/theme-studio/app.js= (=PKGMAP= / package table / =PACKAGE_PREVIEWS=), =scripts/theme-studio/browser-gates.js= (=assertPreviewFaces= owner-aware validation), =elpa/nerd-icons-20260325.346/nerd-icons.el= (=nerd-icons-extension-icon-alist=, =nerd-icons-dir-icon-alist=, =nerd-icons-mode-icon-alist=, =nerd-icons-icon-for-dir=), =elpa/nerd-icons-completion-20251029.2106/nerd-icons-completion.el= (=nerd-icons-completion-category-icons=, =nerd-icons-completion-dir-face=, =nerd-icons-completion-get-icon=), =tests/test-nerd-icons-config--apply-tint.el=, =tests/test-nerd-icons-config--color-dir.el=, =scripts/theme-studio/run-tests.sh=. + +** 2026-06-23 Tue @ 23:12:16 -0400 — Claude — responder +- *What changed:* All six round-2 findings accepted and folded in. Added the explicit "v1 legend rows" table (13 rows, ~10 distinct faces) + the missing-key rule. Rewrote the dir decision to settle precedence (=cj/--nerd-icons-color-dir= prepends =nerd-icons-yellow=, so it wins — verified in source) and cross-package ownership (the bespoke pane owns only =nerd-icons= faces; =nerd-icons-completion-dir-face= stays in its own generic app, since =apply_package_overrides= keys by app). Phase 1 names the concrete artifact (=nerd-icons-legend.json= + =build-nerd-icons-legend.el= generator + =generate.py= fallback). Reordered phases into an atomic Phase 3 (assign + drop-tint together) and a Phase 4 verification, matching the sequencing decision. Added a "Testing / Verification" section naming each changed contract's target. +- *Why:* The round was a strong implementer-eyed pass; every finding would have forced an invented contract mid-build. The load-bearing ones were the cross-package ownership (a face can't move between app-keyed panes) and the dir precedence (resolved by reading the prepend semantics), which together let the dir row point at =nerd-icons-yellow= and sidestep the tangle. +- *Artifacts:* findings [9/9]; decisions [6/6]; verified: extension/command/buffer face resolution from the live alists (the 13-row table), =app_inventory.py= =apply_package_overrides= (app-keyed merge), =cj/--nerd-icons-color-dir= prepend (=add-face-text-property … nil=). + +** 2026-06-23 Tue @ 23:17:51 -0400 — Codex (emacs-d) — reviewer +- *What changed or was recommended:* Third review, =Not ready=. Added one blocking finding: stale summary/readiness/risk text still contradicts the resolved implementation contract for native-color capture, legend artifact ownership, directory row ownership, and Phase 1 verification. +- *Why:* The main decisions now look implementation-ready, but an implementer should not have to reconcile earlier prose against the later decisions. The contradictions point at different code paths (=build-inventory.el= vs =build-nerd-icons-legend.el=, color capture vs default-face seeding, =nerd-icons-completion-dir-face= vs =nerd-icons-yellow=) and could produce the wrong implementation if followed literally. +- *Artifacts:* findings [9/10]; code read: =modules/nerd-icons-config.el= (=cj/--nerd-icons-color-dir= and tint removal target), =scripts/theme-studio/app_inventory.py= (=BESPOKE_APPS=, app-keyed merge), =scripts/theme-studio/generate.py= (=APPS= construction), =scripts/theme-studio/face_data.py= (=BESPOKE_APP_SPECS=), =elpa/nerd-icons-20260325.346/nerd-icons.el= and =elpa/nerd-icons-completion-20251029.2106/nerd-icons-completion.el= (legend source facts), =todo.org=. + +** 2026-06-23 Tue @ 23:24:56 -0400 — Claude — responder +- *What changed:* Accepted the one round-3 finding (pure consistency cleanup) and reconciled the stale summary text with the resolved decisions: Scope tiers, the For-the-implementer data-capture summary, the legend source paragraph (dir row = =nerd-icons-yellow= only), the Architecture-fit / Dev-tooling / External-APIs readiness dimensions (=build-nerd-icons-legend.el= → =nerd-icons-legend.json=, no defface capture), and the Risks section (dropped the moot defface-introspection risk for the stale-artifact / fallback risk). No decision or contract changed — only the prose that lagged behind them. +- *Why:* Round 2 settled the contracts but left earlier prose describing the superseded ones (color capture, =build-inventory.el=, dual dir sources), which an implementer could follow literally into the wrong path. +- *Artifacts:* findings [10/10]; decisions [6/6]; no code re-verification needed — the edits align prose to already-verified decisions. diff --git a/docs/specs/theme-studio-preview-locate-spec.org b/docs/specs/theme-studio-preview-locate-spec.org index bb77a2248..dee27e8c4 100644 --- a/docs/specs/theme-studio-preview-locate-spec.org +++ b/docs/specs/theme-studio-preview-locate-spec.org @@ -1,6 +1,6 @@ :PROPERTIES: :ID: fbcf0e20-1328-42b4-aa36-3401509e7816 -:STATUS: not-started +:STATUS: ready-pending-go :END: #+TITLE: Theme Studio Preview Element Locate — Spec #+AUTHOR: Craig Jennings @@ -8,14 +8,14 @@ #+TODO: TODO | DONE SUPERSEDED CANCELLED * Metadata -| Status | not-started | -|----------+----------------------------------------------------------------| -| Owner | Craig Jennings | -|----------+----------------------------------------------------------------| -| Reviewer | Craig Jennings | -|----------+----------------------------------------------------------------| -| Related | [[file:../../todo.org][todo.org: theme-studio preview locate + org-agenda app]] | -|----------+----------------------------------------------------------------| +| Status | Ready pending Craig's go — four reviews incorporated (Codex, 2026-06-23) | +|----------+------------------------------------------------------------| +| Owner | Craig Jennings | +|----------+------------------------------------------------------------| +| Reviewer | Craig Jennings | +|----------+------------------------------------------------------------| +| Related | [[file:../../todo.org][todo.org: theme-studio preview locate + org-agenda app]] | +|----------+------------------------------------------------------------| * Summary @@ -28,10 +28,10 @@ Previews render representative text styled by many faces, but there is no way to * Goals and Non-Goals ** Goals -- Hover any previewed element to see its section (owning app), element (face name), and current value/attributes. +- Hover any data-face previewed element to see its section (owning app), element (face name), and current value/attributes. - Click an element whose face is assigned on the current pane to flash it and scroll/select its assignment row. - Leave off-pane elements non-clickable, with the hover still naming section and element so the user can navigate there. -- A face -> (owning app, value, attributes) registry built from the app/face model, read by previews for both hover content and the clickable test. +- A face -> (owning app, value, attributes) registry spanning the two data-face surfaces (the UI mock via UIMAP, package previews via PKGMAP/APPS), read by previews for both hover content and the clickable test. - Let previews render substrings in faces owned by other panes live from the shared state (the org-agenda agenda mock is the first to need this); those render correctly and are the hover-only, non-clickable elements. ** Non-Goals @@ -41,21 +41,53 @@ Previews render representative text styled by many faces, but there is no way to - Not a search/filter over faces; this is point-and-identify, not query. ** Scope tiers -- v1: the face registry; hover tooltips (section + element + value/attributes) on every previewed element; click-to-row for current-pane faces; off-pane elements non-clickable; the os() preview tag that carries each element's face; one existing preview wired as the showcase. -- Out of scope: cross-pane click navigation; editing from the preview; persisting any of this. -- vNext: a "reveal in pane" affordance for off-pane elements (switch pane + scroll) if the hover-only model proves too manual. +- v1: the face registry over the two data-face surfaces (UI mock + package previews); hover tooltips (section + element + value/attributes) via the =title= attribute on every data-face previewed element; click-to-row for current-pane faces; off-pane elements non-clickable; the os() preview tag carrying owning app + face + an on-pane flag; updated preview gates that validate each data-face against its owner app; a gate-only synthetic showcase fixture (one package-owned + one =@ui= off-pane span) — no user-facing preview changes in v1. +- Out of scope: the syntax/code (=data-k=) tier in the registry — it already has its own =cp.onclick= -> =flashAssign= locate path; cross-pane click navigation; a custom (non-=title=) tooltip; editing from the preview; persisting any of this. +- vNext: folding the syntax/code tier into the unified registry; a focusable hover/focus info strip for keyboard-only wayfinding (v1 is pointer-driven); a "reveal in pane" affordance for off-pane elements (switch pane + scroll) if the hover-only model proves too manual. * Design ** For the user -Move the pointer over any colored element in a preview. A tooltip names the section it belongs to (the app, e.g. "org-faces"), the element (the face, e.g. "org-faces-todo"), and its current value and attributes (e.g. foreground #8fbf73, bold). If that face is one you can edit on the pane you are looking at, click it: the element flashes and the pane scrolls to and highlights its assignment row, ready to tune. If the element is colored by a face from another pane -- a keyword in the agenda is owned by org-faces, a tag by org-mode -- clicking does nothing, but the tooltip has already told you which section and element to go find. Every preview becomes a legend you can interrogate. +Move the pointer over any colored element in a preview. A tooltip names the section it belongs to (the app, e.g. "org-faces"), the element (the face, e.g. "org-faces-todo"), and its current value and attributes (e.g. foreground #8fbf73, bold). If that face is one you can edit on the pane you are looking at, click it: the element flashes and the pane scrolls to and highlights its assignment row, ready to tune. If the element is colored by a face from another pane -- a keyword in the agenda is owned by org-faces, a tag by org-mode -- clicking does nothing, but the hover has already told you which section and element to go find. As you move over elements, a single info line in the preview-label area updates immediately with "section > face — value", so wayfinding doesn't wait on the browser's delayed native tooltip; the =title= stays as the deterministic fallback. (v1 wayfinding is pointer-driven; keyboard-focus traversal of preview spans is a vNext accessibility item.) Every preview becomes a legend you can interrogate. ** For the implementer -- Registry: a derived map from face name to its owning app, current value, and resolved attributes, built from the same app/face assignment state the editor already holds. One source for hover content and the is-on-this-pane test. Rebuilt when assignments change. -- Preview tagging: the os(app, face, text) preview helper already wraps each element in a span keyed by face; extend it to also carry the face name and owning app as data attributes so hover and click can resolve without re-parsing. -- Hover: a tooltip (or info strip) that reads the registry for the hovered element's face and shows section + element + value/attributes. -- Click: if the element's face is owned by the current pane, flash the element and scroll/select its assignment row; otherwise the element is non-interactive (cursor and affordance reflect that). -- Cross-pane rendering: os() resolves a face's current color from the registry regardless of which pane owns it, so a preview can faithfully render off-pane faces (the agenda mock's keywords/priorities/tags) while marking them non-clickable. +- Registry: a derived map keyed by *owner-qualified identity* — ={surface, owner, face}=, where surface is =ui= or =package=, owner is an internal key (=@ui= for the UI surface, the app-key for a package), and face is the face name. Keying by face name alone is unsafe: the same face name can appear under two owners on one DOM surface, so the owner is part of the key, and =data-owner-app= carries that internal owner key (never the display label like "UI faces", which is shown separately). Each entry holds the owning app/surface, current value, and resolved attributes, built from the assignment state the editor already holds (UIMAP for UI faces, PKGMAP/APPS for package faces). The syntax/code tier (=data-k=, keyed by syntax *kind* not face) is deliberately not in the registry for v1; its code preview already locates via =cp.onclick= -> =flashAssign=. All lookups (=locateFaceMeta=, =isLocateOnPane=, =previewFaceAttrs=, =assertPreviewFaces=) take =(owner, face)=, never face alone. +- Preview tagging: rendering goes through =previewSpan(owner, face, text)=, which dispatches by the owner's surface — a package owner uses the existing =ofs=/=PKGMAP= path, =@ui= uses the =uiCss=/=resolveUiAttr=/=UIMAP= path — and emits the shared locate attributes in both branches: =data-owner-app= (the owner key), =data-face=, and a derived =locate-onpane= flag set when the owner is the pane being viewed. =os(app, face, text)= stays as a thin backward-compatible wrapper that calls =previewSpan= with the package owner, so existing previews are unchanged. +- Hover: read the registry for the hovered element's =(owner, face)= and show section + element + value/source (see the tooltip contract below). +- Click: if =locate-onpane= is set, scroll the element's assignment row into view and flash it (=flashRow=) in the owning pane's table (=flashUi= for UI faces, =flashPkg= for package faces). "Select" is deliberately not used — v1 is the =flashRow= animation only, no persistent selected-row state. Off-pane and unassigned elements have no click handler. +- Affordance: on-pane spans get =cursor:pointer= plus a =locate-onpane= class (the class the gate checks); off-pane and unassigned spans get the default cursor and no handler. Every element's =title= begins with the owning section, so an element that doesn't respond to a click explains why on hover rather than feeling broken. +- Cross-pane rendering: =os= resolves a face's current color from the registry regardless of which pane owns it, so a preview can faithfully render off-pane faces (the agenda mock's keywords/priorities/tags; the completion preview's minibuffer-prompt and highlight) while marking them hover-only. +- Gates: =assertPreviewFaces= currently asserts every =data-face= belongs to the *current* app's faces; change it to validate each element against its =data-owner-app='s face set (UIMAP keys for UI faces, =APPS[owner].faces= for package faces), so an intentional cross-pane span passes instead of failing. + +** Implementation shape (refactor boundaries) +The current paths spread preview ownership across global =PKGMAP= / =UIMAP= / =APPS= state, =os= in previews.js, click handlers in =buildMockFrame= / =buildPkgPreview=, the flash helpers in app.js, and the membership check in browser-gates.js. To keep the state matrix determinate, the logic lives as pure helpers in =app-core.js= (already the home of =faceCss=, =faceAttrs=, =buildPkgmap=, =packagesForExport=), with DOM code as thin adapters: +Pure helpers in app-core.js — all state passed in, no globals, returning data never HTML: +- =buildLocateRegistry(apps, pkgmap, uimap, map)= — the derived ={surface,owner,face}= -> value/attributes/source map (=map= is the ground palette for the =effFg=/=effBg= floors). +- =locateFaceMeta(owner, face, registry)= — value + attributes + per-attribute source for one owner-qualified face. +- =formatLocateTitle(meta)= — the =title= string from effective value + source notes. +- =previewFaceAttrs(owner, face, registry)= / owner validation — the owner-aware membership check the gate calls. +- =isLocateOnPane(owner, currentApp)= — the clickable predicate (recomputed at render time, not cached). + +The one stateful piece, =previewSpan(owner, face, text)=, lives in previews.js (not app-core.js): a thin adapter that reads the current globals (PKGMAP / UIMAP / MAP), dispatches by surface to =faceCss= (package) or =uiCss= (UI), calls the pure helpers for the locate attributes, owns HTML escaping (=esc=), and emits the span. =os= delegates to it for package owners. This is the boundary: pure data helpers in app-core.js (Node-tested), the escaped-HTML renderer in previews.js (browser-gated). + +After the gates are green, the post-feature cleanup (Phase 5) replaces the =buildPkgPreview= / =buildMockFrame= face-click branches with a single locate dispatcher; the =data-k= syntax-click path stays separate for v1. + +** Tooltip contract (v1) +The deterministic hover surface in v1 is the element's native =title= attribute (the preview-label info line mirrors it on mouse hover — see wayfinding). =value= is always the *effective rendered* color — what the eye actually sees — never the raw assignment, because a cleared or unset attribute still renders a floored/inherited default and a tooltip that disagreed with the pixels would mislead. The =title= reads, comma-separated: +- *section* — the owning app / surface display label (e.g. "org-faces", "UI faces"). +- *element* — the face name (e.g. "vertico-current"). +- *value* — the effective rendered foreground and, when set, background as hex (after =effFg=/=effBg= flooring and inherit resolution). +- *attributes* — the non-default ones among weight, slant, underline, strike, box, inherit, height, inverse, extend. +- *source note* — per attribute, how that effective value arose: "direct", "inherited from <face>", "default foreground" / "default background", or "cleared, rendering as default". + +The source note is what disambiguates the states the value alone can't: +- *direct* — a value was assigned on this face. +- *inherited* — the value comes from an =:inherit= (package via =effResolve=, UI via =resolveUiAttr=); names the source face. +- *cleared* — the assignment was explicitly removed; the note says "cleared, rendering as default fg/bg" while =value= still shows the rendered default, so the tooltip matches the pixels. +- *unassigned* — the face resolves to no owning app; the element is hover-only, never a dead click. + +** Registry lifecycle +One cached, module-level registry. It is rebuilt *once per assignment / import / reset / clear batch*, before preview spans render — hooked into the existing rebuild points (=uiSelect=->=paintUI=/=buildMockFrame=, =pkgChanged=, import replacing =UIMAP=/=PKGMAP=, =resetApp=/=resetUnlocked=) — never per hover and never per =previewSpan= call. Palette edits that change effective displayed values without changing ownership refresh the values on that same rebuild. =locate-onpane= is *not* stored in the registry: it's recomputed from the current view at render time, because switching the viewed pane changes clickability but not ownership. Budget: registry build is linear in face count, does no DOM queries, and a pure-Node test holds it under a small millisecond threshold over the full UI+package face set. * Alternatives Considered @@ -101,41 +133,153 @@ Move the pointer over any colored element in a preview. A tooltip names the sect - Decision: os() resolves any face's current color from the registry, so previews render off-pane faces live; those elements are the non-clickable, hover-only ones. - Consequences: easier -- previews read real and the agenda mock is possible; harder -- the preview must distinguish own-pane (clickable) from off-pane (hover-only) elements. +* Review findings [14/14] + +** DONE Registry scope does not cover all preview element tiers :blocking: +The spec promises "every rendered preview element" and "every previewed element", but the current theme-studio has three separate assignment surfaces: syntax/code preview spans use =data-k= and =SYNTAX/MAP= with =flashAssign=, the UI mock uses =data-face= and =UIMAP= with =flashUi=, and package previews use =data-face= and =PKGMAP/APPS= with =flashPkg=. A face -> owning-app registry built only from the app/face model would cover package panes but not syntax categories or UI faces, so an implementer would have to invent whether v1 is package-preview-only or how =@code= / =@ui= elements enter the registry. Define the registry's full key and owner model across all three tiers, or explicitly narrow v1 to package previews and adjust the "every previewed element" acceptance criteria. (blocking) + +Disposition: *modified.* The finding is right that the unscoped "every previewed element" language hides three surfaces; verified in code (data-k/flashAssign syntax, data-face/UIMAP/flashUi UI mock, data-face/PKGMAP/flashPkg package). Rather than build a registry across all three, v1 covers the *two data-face surfaces* — the UI mock (UIMAP) and package previews (PKGMAP/APPS) — and explicitly leaves the syntax/code tier out of the registry, because that tier already has its own go-from-render-to-assignment path (=cp.onclick= -> =flashAssign= on =data-k= spans). Scoping to the data-face surfaces is also what the two known consumers actually need: the org-agenda showcase mixes org-faces and org-mode (both package faces), and the planned minibuffer-completion preview needs minibuffer-prompt and highlight (UI faces). Folded into Goals, Scope tiers, Design (implementer registry model), and the acceptance criteria, which now say "every data-face previewed element." + +** DONE Cross-pane preview contract and gates are underspecified :blocking: +The current =os(app, face, text)= helper both styles an element and emits only =data-face=, while =buildPkgPreview= clicks any =data-face= by calling =flashPkg= against the current =#pkgbody= row. The browser helper =assertPreviewFaces= currently asserts every preview =data-face= belongs to the current app's =APPS[app].faces=. A preview that intentionally renders =org-faces= or =org-mode= faces from the future =org-agenda= pane will either fail the existing gate or silently produce non-clicks with no explicit hover/cursor contract. Specify the data attributes and tests for cross-pane spans, e.g. =data-owner-app= plus =data-face= and a derived clickable class/flag, state that =os= takes the owning app rather than the current preview app, and update preview gates to validate off-pane faces against their owner app. (blocking) + +Disposition: *accepted as written* — precise and code-correct. Verified: =os= at previews.js:6 emits only =data-face=; =fr.onclick=/=buildPkgPreview= route by =data-face= against the current pane; =assertPreviewFaces= (browser-gates.js:41) checks membership in the current app's faces. Folded the contract into Design (implementer): =os= carries the owning app and emits =data-owner-app= + =data-face= + a derived =locate-onpane= flag; click routes by owner (on-pane -> flash that pane's row; off-pane -> inert with a non-interactive cursor); =assertPreviewFaces= validates each =data-face= against its =data-owner-app='s face set (UIMAP for UI faces, PKGMAP/APPS for package faces) rather than the current app. Reflected in Phases 2 and 4, the acceptance criteria, and a gate note. + +** DONE Tooltip value formatting is not testable yet +The spec says the hover shows current value and attributes, but the current model has raw assignments, effective inherited values, cleared/default source states, and extra attributes such as weight, slant, underline, strike, box, inherit, height, inverse, and extend. Without a minimal display contract, tests can only assert that "some value" appears, and implementations may disagree on whether to show raw =fg=null=, effective foreground, inherited source, or only non-default attributes. Define the v1 tooltip fields and fallback strings for assigned, default/inherited, cleared, and unassigned elements; this can reuse the existing =title= hover surface if a custom tooltip is not part of v1. (non-blocking) + +Disposition: *accepted*, taking the reviewer's own escape hatch — v1 reuses the existing =title= attribute as the hover surface (no custom tooltip), with a defined field contract so it's testable. Added a "Tooltip contract (v1)" subsection to Design fixing the fields (section, face, effective fg/bg, non-default attributes) and the four fallback strings (assigned / inherited-from-X / cleared / unassigned). Reflected in the acceptance criteria and the Observability + Errors readiness dimensions. + +** DONE Deterministic test matrix is not explicit enough :blocking: +The spec names "unit-tested", "browser-gate tested", and "one existing preview wired as the showcase", but that is not enough to judge whether the corner cases are covered. This feature has a small state matrix that can fail silently: UI-owned on-pane, package-owned on-pane, package-owned off-pane, UI-owned off-pane from a package preview, missing owner, inherited value, cleared value, explicit fg/bg, non-default structural attributes, assignment edits that rebuild the registry, and view switches that change the on-pane flag. V1 should require a deterministic test matrix: pure Node tests for registry construction, owner lookup, tooltip formatting, and owner-aware preview-face validation; browser gates for exact =title= strings, =data-owner-app= / =locate-onpane= attributes, on-pane click flashing the correct row, off-pane click leaving rows unflashed, stale-registry prevention after an edit, and =assertPreviewFaces= accepting intentional off-pane spans while rejecting bad owners. Manual inspection can remain only for the subjective "is this discoverable enough?" check. (blocking) + +Disposition: *accepted as written.* The matrix matches the project's deterministic-test discipline and the existing harness — run-tests.sh auto-discovers pure-Node =*.mjs= tests and drives headless-Chrome hash gates, so both halves have a home. Added a "* Testing strategy" section enumerating the full state matrix split into pure-Node unit cases (registry construction, owner lookup, tooltip formatting including the four fallbacks, owner-aware face validation) and browser gates (exact =title= strings, =data-owner-app= / =locate-onpane= attributes, on-pane click flashes the right row, off-pane click leaves rows unflashed, post-edit registry rebuild, =assertPreviewFaces= accepts intentional off-pane spans and rejects bad owners), with manual inspection limited to the subjective discoverability read. Mirrored in the acceptance criteria and the Dev-tooling dimension. + +** DONE Refactoring boundaries are missing before the feature adds more coupling :blocking: +The current implementation already spreads preview ownership across global =PKGMAP= / =UIMAP= / =APPS= state, =os= in =previews.js=, click handlers in =buildMockFrame= and =buildPkgPreview=, row flash helpers in =app.js=, and gate membership checks in =browser-gates.js=. Adding owner-aware registry, tooltip formatting, click routing, and cross-pane rendering directly inside those paths will multiply special cases and make the test matrix harder to keep determinate. Specify the refactor shape before implementation: extract pure helpers in =app-core.js= (or another imported core module) for =buildLocateRegistry=, =locateFaceMeta=, =formatLocateTitle=, =previewFaceAttrs= / owner validation, and =isLocateOnPane=; keep DOM code as thin adapters that call those helpers. Also name the after-feature cleanup: replace the direct =buildPkgPreview= / =buildMockFrame= face-click branches with one locate dispatcher once the gates are green, while leaving the =data-k= syntax path separate for v1. (blocking) + +Disposition: *accepted*, with the refactor split into bracketing phases so each leaves a working tree. Confirmed app-core.js is already the pure-helper home (faceCss, faceAttrs, buildPkgmap, packagesForExport are pure and Node-tested), so the named helpers fit the existing pattern. Added "Phase 0 — Pure-helper extraction" before the feature work (buildLocateRegistry, locateFaceMeta, formatLocateTitle, previewFaceAttrs/owner-validation, isLocateOnPane in app-core.js, DOM left as thin adapters, each helper unit-tested) and "Phase 5 — Locate-dispatch cleanup" after the click phase (replace the buildPkgPreview / buildMockFrame face-click branches with one locate dispatcher once gates are green; the data-k syntax path stays separate for v1). Recorded in an "Implementation shape" note in Design. + +** DONE Preview affordance and row selection semantics are still ambiguous +The spec says off-pane elements are "non-clickable" and current-pane elements "scroll/select" their assignment row, but it does not define the visible affordance. Existing CSS already makes =.mock [data-face]= look clickable, and package preview spans may inherit no distinct cursor at all; an inert off-pane element with a delayed native title can feel broken rather than intentionally hover-only. V1 should define exact affordances: on-pane spans get pointer cursor plus whatever clickable class/name the gate checks; off-pane and unassigned spans get default/help cursor and no click handler; the title begins with the owning section so the user can tell why a click is unavailable; and "select row" means either an existing persistent selected-row state or only the current =flashRow= animation, but not both words. Add browser-gate assertions for the cursor/class split and for the clicked row's visible result. (non-blocking) + +Disposition: *accepted*, resolving the "select row" ambiguity toward flash-only. =flashRow= (app.js:560) already does scrollIntoView + a flash animation and there is no persistent selected-row state for these tables; adding one is scope creep, so v1 means "scroll the assignment row into view and flash it (=flashRow=)", and the word "select" is dropped. Defined the affordance split: on-pane spans get =cursor:pointer= plus a =locate-onpane= class the gate checks; off-pane and unassigned spans get the default cursor and no click handler; every element's =title= begins with the owning section so an unavailable click explains itself. Folded into Design (For the user + implementer), with matching browser-gate assertions for the cursor/class split and the flashed-row result added to the Testing strategy and acceptance criteria. + +** DONE Registry identity is under-specified and face-name keys can collide :blocking: +The design still says the registry is keyed by face name, while also spanning UI faces and package faces. That is not a safe implementation contract: =data-owner-app= exists precisely because a face name alone is not enough once package-owned spans, UI-owned spans, and future cross-pane spans share one DOM surface. It also makes missing-owner and bad-owner validation ambiguous. Define the registry key as an owner-qualified identity, e.g. ={surface:"ui", owner:"@ui", face}= and ={surface:"package", owner:<app-key>, face}=, and make =data-owner-app= an internal owner key, not the display label "UI faces". =locateFaceMeta=, =isLocateOnPane=, =previewFaceAttrs=, and =assertPreviewFaces= should all take owner+face, never face alone. Add a test with the same face name under two owners (or a synthetic constructed collision) to prove the registry does not collapse them. (blocking) + +Disposition: *accepted as written* — a real correctness bug in the draft (face-name keys collide across surfaces, and "UI faces" was being used as both display label and owner key). The registry is now keyed by an owner-qualified identity: ={surface, owner, face}=, owner being an internal key (=@ui= for the UI surface, the app-key for a package), distinct from the display label. =data-owner-app= carries that owner key. =locateFaceMeta=, =isLocateOnPane=, =previewFaceAttrs=, and =assertPreviewFaces= all take =(owner, face)=, never face alone. Folded into Design (registry model + implementation shape); a same-face-name-under-two-owners collision case is in the Testing strategy. + +** DONE Cross-surface rendering API is not explicit enough :blocking: +=os(app, face, text)= currently styles through =ofs=, which resolves =PKGMAP[app][face]= and package inherit chains. The spec says cross-pane previews can render UI faces such as =minibuffer-prompt= and =highlight=, but it does not define how =os= styles a UI-owned face from a package preview. Passing "UI faces" or =@ui= to =ofs= would not work because UI faces live in =UIMAP= and use =uiCss= / =resolveUiAttr=, not =PKGMAP=. Define the preview-span API before implementation: either a single =previewSpan(owner, face, text)= that dispatches by owner surface, or separate wrappers such as =os(app, face, text)= for package faces and =uos(face, text)= for UI faces, both sharing the locate attributes. The API should remain backward-compatible for existing package previews so Phase 2 is mechanical and stoppable. (blocking) + +Disposition: *accepted as written* — verified: =ofs= resolves =PKGMAP[app][face]= via =pkgEffFg/pkgEffBg= -> =effResolve=, while UI faces render through =uiCss= + =resolveUiAttr= over =UIMAP=, so =os= cannot style a UI face. The API is now =previewSpan(owner, face, text)= dispatching by owner surface (=@ui= -> the =uiCss=/=UIMAP= path, a package owner -> the existing =ofs=/=PKGMAP= path), emitting the shared locate attributes (=data-owner-app=, =data-face=, the =locate-onpane= class) in both branches. =os(app, face, text)= stays as a thin backward-compatible wrapper that calls =previewSpan= with the package owner, so existing previews are untouched and Phase 2 stays mechanical. Folded into Design (preview tagging + implementation shape) and the phase split. + +** DONE Tooltip source-state semantics still conflate raw and effective values :blocking: +The tooltip contract names assigned, inherited, cleared, and unassigned, but it does not say whether =value= is raw assignment, effective rendered value, or both. Current rendering floors unset foreground/background through =effFg= / =effBg=, package faces can inherit via =effResolve=, UI faces can inherit via =resolveUiAttr= for a small built-in chain, and =source:"cleared"= can still render an effective default color. A tooltip saying "cleared (no foreground)" while the preview visibly has a foreground would be confusing. Specify the exact contract: show effective rendered fg/bg as the value, and separately show source notes such as "direct", "inherited from X", "default foreground", "default background", or "cleared, rendering as default fg/bg". Cover fg-cleared, bg-cleared, inherited package value, inherited UI value, and fully unassigned in pure tests. (blocking) + +Disposition: *accepted* — this supersedes the round-1 tooltip contract (finding 3), which conflated the two. Verified the floors: =effFg/effBg= fall back to =MAP['p']=/=MAP['bg']=, package inherit via =effResolve=, UI inherit via =resolveUiAttr=, and a =cleared= source still renders an effective default. Rewrote the Tooltip contract: =value= is always the *effective rendered* fg/bg (what the eye sees), and a separate *source note* per attribute says "direct", "inherited from X", "default foreground/background", or "cleared, rendering as default" — so a cleared face reads "cleared, rendering as default fg" rather than the misleading "no foreground". Pure tests cover fg-cleared, bg-cleared, inherited-package, inherited-UI, and fully-unassigned. + +** DONE Phase boundaries are not clean enough for a stop-between-phases implementation :blocking: +The phases now include useful pre/post refactors, but Phase 2 still implies changing =os=, every preview span, owner-aware validation, cross-pane styling, and gate behavior in one step. Because =os= is used by every bespoke preview, a half-done Phase 2 can break most package previews or invalidate existing preview gates. Split the phase so each stop leaves a working tree: first add backward-compatible span helpers and owner attributes with package owners defaulting to the current app; then update =assertPreviewFaces= to accept owner-qualified package spans while preserving existing same-app previews; then add UI-owner rendering support; only then wire a showcase off-pane span. Each subphase should name the gate that proves existing previews still pass before the next step. (blocking) + +Disposition: *accepted as written.* Split the old Phase 2 into four stoppable subphases, each naming the gate that proves existing previews still pass: 2a — add =previewSpan= + owner attributes, package owner defaulting to the current app, =os= delegating to it (gate: all existing package-preview gates still pass unchanged); 2b — update =assertPreviewFaces= to accept owner-qualified package spans while same-app previews still pass (gate: existing previews green under the new validator); 2c — add =@ui= rendering support to =previewSpan= (gate: a UI face renders correctly off a package preview); 2d — wire one showcase off-pane span (gate: the off-pane span renders, is hover-only, and passes the owner-aware validator). Reflected in the Implementation phases. + +** DONE Hover-only wayfinding may be too weak with native title alone +The rejected auto-switch alternative had one strong property: it gave an immediate path to the owning pane. V1 rejects the jump, but the spec currently keeps only delayed native =title= text as the substitute, which is weak for dense previews and inaccessible to keyboard-only exploration. Pull the useful part of the rejected scenario without adding auto-navigation: update a small existing status surface (for example the preview label area or a single non-card info strip) on hover/focus with "section > face — value", while keeping =title= as the deterministic browser fallback. This preserves off-pane non-clickability but makes the wayfinding immediate and testable. If v1 deliberately stays native-title-only, record that as an accepted UX caveat and add a vNext task for a hover/focus info strip. (non-blocking) + +Disposition: *modified.* Took the immediate-wayfinding half: v1 updates a single info line in the existing preview-label area on mouse hover (=mouseover=) with "section > face — value", with =title= kept as the deterministic fallback and the gate target. Deferred the *keyboard/focus* half to vNext: the previews aren't keyboard-navigable today, so making spans focusable is a separate lift, not a bolt-on — recorded as an explicit accepted accessibility caveat (v1 wayfinding is pointer-driven) plus a vNext task for a focusable hover/focus info strip. This keeps v1 bounded while still fixing the "delayed title is too weak" complaint for the common (mouse) path. Folded into Design (For the user + a wayfinding note), Scope tiers (vNext), and a browser-gate for the info-line content. + +** DONE Registry lifecycle and performance budget are underspecified :blocking: +The likely cost of a registry over UI faces plus package faces is acceptable, but the spec does not define the cache lifecycle. Current edit paths rebuild tables/previews from many places: =uiSelect= calls =paintUI= + =buildMockFrame=, package controls call =pkgChanged=, import replaces =UIMAP= / =PKGMAP=, reset/clear paths mutate many faces, and palette edits can change effective displayed values without changing face ownership. Without an explicit lifecycle, an implementation may rebuild the registry for every =os()= span, or worse, leave stale =title= and clickability after import/reset/edit/view-switch. Define a single cached registry variable plus an invalidation/rebuild contract: rebuild once per assignment/import/reset batch before rendering preview spans; never rebuild per hover/span; recompute =locate-onpane= from current view at render time; and include a rough budget/gate such as "registry build over all current UI + package faces stays below a small millisecond threshold in a Node test" or at least "linear in face count, no DOM queries." (blocking) + +Disposition: *accepted as written.* Verified the many edit paths (=uiSelect=->=paintUI=/=buildMockFrame=, =pkgChanged=, import replacing =UIMAP=/=PKGMAP=, =resetApp=/=resetUnlocked=). Added a "Registry lifecycle" subsection to Design: one cached module-level registry; rebuilt once per assignment/import/reset/clear batch *before* preview spans render (hooked into the existing rebuild points, not per span and never per hover); =locate-onpane= is *not* stored in the registry — it's recomputed from the current view at render time, since a view switch changes clickability but not ownership; palette edits that change effective values without changing ownership refresh values on the same rebuild. Perf gate: a pure-Node test asserts the build is linear in face count, does no DOM queries, and stays under a small ms threshold over the full UI+package face set. Added to the Testing strategy. + +** DONE Pure-helper boundary and state inputs are still inconsistent :blocking: +The spec says the locate logic lives as pure helpers in =app-core.js=, but the named helper signatures still depend on state they do not accept. =buildLocateRegistry(apps, pkgmap, uimap)= is supposed to store effective rendered values and source notes, yet effective package/UI rendering also depends on default ground colors from =MAP= (=effFg=/=effBg= floor through =MAP['p']= / =MAP['bg']=) and UI/package inheritance helpers that need the current maps. Likewise =previewSpan(owner, face, text)= is listed as a pure helper, but the current render path in =previews.js= uses globals (=PKGMAP=, =MAP=, =faceCss=, =effFg=, =pkgEffFg=, =pkgEffBg=), while UI rendering uses =uiCss= / =resolveUiAttr= over =UIMAP=. Two implementers could both follow this spec and put the stateful string renderer either in =app-core.js= or in the DOM/global layer, with different test seams and cache behavior. Decide the boundary explicitly before implementation: either pass all required state into pure helpers (for example registry/map/uimap/pkgmap/current-owner plus any escaping/style dependencies), or split it into pure =locateAttrs= / =locateMeta= / =formatLocateTitle= helpers in =app-core.js= and a stateful =previewSpan= adapter in =previews.js= / =app.js=. The final spec should name the exact signatures and which layer owns HTML escaping. (blocking) + +Disposition: *accepted*, resolved toward the split (the second option). The spec previously listed =previewSpan= among the app-core.js pure helpers, which was the inconsistency — it can't be pure because it reads =PKGMAP=/=UIMAP=/=MAP= and emits escaped HTML. The boundary is now explicit. Pure helpers in app-core.js, all state passed in, no globals, returning data (never HTML): =buildLocateRegistry(apps, pkgmap, uimap, map)= (=map= added for the =effFg=/=effBg= ground floors), =locateFaceMeta(owner, face, registry)=, =formatLocateTitle(meta)=, =previewFaceAttrs(owner, face, registry)=, =isLocateOnPane(owner, currentApp)=. The one stateful piece, =previewSpan(owner, face, text)=, lives in previews.js as a thin adapter: it reads the current globals, calls the pure helpers plus the existing =faceCss=/=uiCss= style functions, owns HTML escaping (=esc=), and emits the span. =os= delegates to it. Recorded the exact signatures and the escaping owner in the Implementation shape note, Phase 0 (pure helpers only), and Phase 2a (the adapter). + +** DONE Showcase preview is not named precisely enough :blocking: +The spec repeatedly requires "one existing preview wired as the showcase" and Phase 2d says to "wire one showcase off-pane span", but it never names the preview, the exact off-pane owner(s), or the text spans to add. That matters because the natural motivating examples are not both obviously available in v1: the org-agenda app is still planned, while the completion preview appears to be a separate spec. Leaving this open forces the implementer to decide whether to mutate the current =org-mode= preview with cross-owner spans, wait for another feature, add a synthetic browser-only fixture, or build a new showcase preview as part of this work. Choose one concrete showcase path and define the exact spans and owners it must exercise (for example a named existing package preview with one package-owned off-pane span and one =@ui= span, or explicitly allow a gate-only fixture if no user-facing existing preview should change yet). The acceptance criteria and Phase 2d gate should reference that named showcase. (blocking) + +Disposition: *accepted*, choosing the gate-only fixture. Neither natural consumer is available in v1 (org-agenda is unbuilt; the completion preview is a separate spec), and grafting demo cross-owner spans onto a real preview like =org-mode= would be misleading — the org-mode preview has no business showing a UI face. So v1's showcase is a *gate-only synthetic fixture* in browser-gates (no user-facing preview changes): a host package-preview context rendering exactly two off-pane spans — one package-owned (a face from a different package app) and one =@ui= span (=minibuffer-prompt=) — asserting each renders with its owner's real color, is hover-only (no =locate-onpane=, default cursor), and passes the owner-aware =assertPreviewFaces=. The first real cross-owner preview to ship (org-agenda or the completion preview) becomes the organic showcase then. Phase 2d and the acceptance criteria now reference this named fixture. + * Implementation phases +** Phase 0 — Pure-helper extraction (pre-work) +Before any feature behavior, land the pure helpers in app-core.js — buildLocateRegistry(apps, pkgmap, uimap, map), locateFaceMeta, formatLocateTitle, previewFaceAttrs/owner-validation, isLocateOnPane — all state passed in, no globals, returning data not HTML. The stateful previewSpan adapter is NOT here; it lands in Phase 2a. Each helper gets pure-Node unit tests. Leaves the tree working: helpers exist and are tested; no preview behavior has changed yet. + ** Phase 1 — Face registry -Build the derived face -> (owning app, value, attributes) map from the app/face assignment state, rebuilt on change. Unit-tested against a constructed assignment model. +Build the derived face -> (owning app, value, attributes) map over the two data-face surfaces (UIMAP for UI faces, PKGMAP/APPS for package faces), rebuilt on assignment change, via buildLocateRegistry. The syntax/code tier stays out (it has its own data-k -> flashAssign path). Unit-tested against a constructed assignment model. -** Phase 2 — Preview tagging + cross-pane resolution -Extend os() so each previewed element carries its face name and owning app, and resolves its color from the registry regardless of owning pane. Tested on a preview that references an off-pane face. +** Phase 2 — Preview tagging + cross-pane resolution + gates (four stoppable steps) +Each step leaves every existing preview green; the named gate proves it before the next. +- *2a* — add the =previewSpan(owner, face, text)= adapter in previews.js (reads globals, owns =esc=, calls the pure helpers) emitting =data-owner-app= + =data-face= + the =locate-onpane= flag, with package owners defaulting to the current app; =os= delegates to it. Gate: all existing package-preview gates pass unchanged. +- *2b* — update =assertPreviewFaces= to validate each element against its =data-owner-app='s face set (owner-qualified). Gate: existing same-app previews still pass under the new validator; a bad owner is rejected. +- *2c* — add =@ui= rendering support to =previewSpan= (the =uiCss=/=UIMAP= path). Gate: a UI face renders correctly off a package preview. +- *2d* — add the gate-only showcase fixture (browser-gates, no user-facing preview change): a host package-preview context with one package-owned off-pane span and one =@ui= off-pane span (=minibuffer-prompt=). Gate: each renders with its owner's real color, is hover-only (no =locate-onpane=, default cursor), and passes the owner-aware validator. ** Phase 3 — Hover -Tooltip showing section + element + value/attributes for the hovered element, from the registry. Browser-gate tested. +Populate each element's =title= with section + element + value/attributes per the tooltip contract, including the inherited/cleared/unassigned fallbacks. Browser-gate tested against the four source states. ** Phase 4 — Click -Flash + scroll/select the assignment row for a current-pane element; off-pane elements non-clickable (cursor/affordance reflects it). Browser-gate tested; one existing preview wired as the showcase. +On a =locate-onpane= element, scroll its assignment row into view and flash it (=flashRow=) in the owning pane's table; off-pane and unassigned elements have no handler and the default cursor. Browser-gate tested (on-pane flashes the right row, off-pane leaves rows unflashed, cursor/class split); one existing preview wired as the showcase. + +** Phase 5 — Locate-dispatch cleanup (post-feature) +With the gates green, replace the =buildPkgPreview= / =buildMockFrame= face-click branches with a single locate dispatcher that routes by owner. The =data-k= syntax-click path stays separate for v1. Leaves the tree working: behavior identical, one click path instead of two. * Acceptance criteria -- [ ] Hovering any previewed element shows its section, element, and current value/attributes. -- [ ] Clicking a current-pane element flashes it and scrolls/selects its assignment row. +- [ ] Hovering any data-face previewed element shows its section, element, effective value, and source note via =title=, and the preview-label info line updates immediately with "section > face — value". +- [ ] A cleared face shows its rendered default as the value with a "cleared, rendering as default" source note (value matches the pixels); inherited shows "inherited from X"; unassigned shows "unassigned". +- [ ] The registry is keyed by owner-qualified identity ={surface, owner, face}=; the same face name under two owners stays distinct. +- [ ] The registry is a single cached structure rebuilt once per assignment/import/reset batch (never per hover/span); =locate-onpane= is recomputed from the current view; the build meets the linear/no-DOM/ms-threshold budget. +- [ ] Clicking an on-pane element scrolls its assignment row into view and flashes it (=flashRow=); no persistent selection state is introduced. +- [ ] On-pane spans carry =cursor:pointer= + the =locate-onpane= class; off-pane and unassigned spans carry the default cursor and no handler. - [ ] Clicking an off-pane element does nothing and is visibly non-interactive. -- [ ] A preview can render an off-pane face's real color (cross-pane resolution) and that element is hover-only. -- [ ] The registry resolves every previewed face to its owning app, and updates when an assignment changes. +- [ ] The gate-only showcase fixture renders a package-owned off-pane span and a =@ui= (=minibuffer-prompt=) span, each in its owner's real color and hover-only, and the updated =assertPreviewFaces= passes them against their owner app while rejecting a bad owner. No user-facing preview changes in v1. +- [ ] The locate logic splits cleanly: the app-core.js helpers are pure (state passed in, no globals, return data), and only =previewSpan= in previews.js reads globals and emits escaped HTML. +- [ ] The registry resolves every data-face previewed face (UI + package surfaces) to its owning app, and rebuilds when an assignment changes. +- [ ] The deterministic test matrix (below) passes: pure-Node unit cases and the browser gates, with manual inspection limited to the subjective discoverability read. * Readiness dimensions -- Data model & ownership: the registry is derived, not authoritative; the app/face assignment state remains the source of truth. -- Errors, empty states & failure: an element whose face resolves to no owning app falls back to hover-only with an "unassigned" note rather than a dead click. +- Data model & ownership: the registry is derived, not authoritative; the assignment state (UIMAP, PKGMAP/APPS) remains the source of truth. It spans the two data-face surfaces; the syntax/code tier is out of v1 (served by its own data-k path). +- Errors, empty states & failure: an element whose face resolves to no owning app falls back to hover-only with an "unassigned" note rather than a dead click; cleared and inherited sources have their own fallback strings (see the tooltip contract) so no state renders as a blank or misleading value. - Security & privacy: N/A -- browser-local theme editor state. -- Observability: the tooltip is the surface; a face that fails to resolve shows that in the tooltip. +- Observability: the element's =title= is the surface; a face that fails to resolve shows "unassigned" there rather than silently. - Performance & scale: the registry is rebuilt on assignment change, not per hover; hover/click read it in O(1) by face name. - Reuse & lost opportunities: reuses the existing os() preview helper and the assignment state; every current and future preview benefits. -- Architecture fit & weak points: the registry is the new shared structure; the weak point is keeping it in sync with edits, addressed by rebuilding on change. +- Architecture fit & weak points: the registry and the locate logic live as pure helpers in app-core.js (buildLocateRegistry, locateFaceMeta, formatLocateTitle, owner-validation, isLocateOnPane) with DOM as thin adapters; the post-feature cleanup collapses the two face-click branches into one dispatcher. The weak point is registry/edit sync, addressed by rebuilding on change; pulling the logic into tested pure helpers is what keeps the state matrix determinate. - Config surface: none beyond the existing theme-studio build. - Documentation plan: the theme-studio test suite and this spec; a note in the tool's help if the interaction isn't self-evident. -- Dev tooling: existing make theme-studio-test and the browser-gate harness. +- Dev tooling: existing — run-tests.sh drives pure-Node =*.mjs= unit tests (auto-discovered) and the headless-Chrome hash gates; the deterministic matrix (see Testing strategy) lands as new cases in both, no new targets. - Rollout, compatibility & rollback: additive; rollback removes the tagging and the registry. Existing previews keep working without the interaction. - External APIs & deps: none -- browser JS plus the existing generate pipeline. +* Testing strategy +A deterministic matrix, no flaky dependencies. Pure-Node tests (=*.mjs=, auto-discovered by run-tests.sh) cover the logic; browser hash gates cover the DOM contract; manual inspection is limited to the subjective "discoverable enough?" read. + +Pure-Node unit cases (over app-core.js helpers): +- =buildLocateRegistry= construction over a constructed apps/PKGMAP/UIMAP model; rebuild after a simulated assignment edit (stale-registry prevention). +- *owner-qualified identity:* the same face name under two owners (a constructed collision) stays two entries, never collapsed. +- =locateFaceMeta(owner, face)= lookup: UI-owned, package-owned, missing owner, bad owner. +- =formatLocateTitle= for each source state with *effective value*: direct (fg only; fg+bg; non-default structural attributes), inherited-from-X (package via effResolve, UI via resolveUiAttr), fg-cleared and bg-cleared (value shows the rendered default, note says "cleared, rendering as default"), fully unassigned. +- owner-aware face validation =previewFaceAttrs(owner, face)=: an off-pane face validates against its owner; a bad owner is rejected. +- =isLocateOnPane(owner, currentApp)= true/false across UI-owned and package-owned faces under a given current pane. +- *lifecycle/perf:* registry build is linear in face count, does no DOM queries, and stays under the ms threshold over the full UI+package set. + +Browser gates: +- exact =title= strings for representative elements in each source state (effective value + source note). +- the preview-label info line updates on mouse hover with "section > face — value" (the immediate-wayfinding surface), =title= present as fallback. +- =data-owner-app= (owner key) and =locate-onpane= attributes present and correct on on-pane vs off-pane spans. +- a =@ui= face renders correctly when emitted from a package preview (cross-surface =previewSpan=). +- on-pane click flashes the correct assignment row (=flashRow=, no persistent selection); off-pane click leaves all rows unflashed. +- the cursor/class split (pointer + =locate-onpane= on-pane; default cursor, no handler off-pane). +- =assertPreviewFaces= accepts an intentional off-pane span and rejects a bad owner. +- the per-subphase regression gate: every existing package preview still passes after 2a and 2b. +- helper purity: the app-core.js helpers are called with state passed in (no globals) in the Node tests; only =previewSpan= touches globals. +- the gate-only showcase fixture: its package-owned off-pane span and its =@ui= (=minibuffer-prompt=) span each render with the owner's color, are hover-only, and pass the owner-aware validator. + * Risks, Rabbit Holes, and Drawbacks - Registry staleness: a stale registry mislabels a hover. Dodge: rebuild on assignment change; derive, never cache independently. - Tooltip noise on dense previews: hovering everything could be busy. Dodge: tooltip on deliberate hover only, not a persistent overlay. @@ -146,3 +290,62 @@ Flash + scroll/select the assignment row for a current-pane element; off-pane el - What: initial draft. - Why: breaking the org-agenda faces into their own pane surfaced the need to go from a rendered preview element back to its assignment; the interaction (hover-identifies, click-locates-on-pane, off-pane-inert) is general and worth settling before the agenda preview consumes it. - Artifacts: the org-faces app (the bespoke-app pattern); the planned org-agenda app (first consumer); os() preview helper. + +** 2026-06-23 Tue @ 12:26:40 -0400 — Codex (emacs-d) — reviewer +- *What changed or was recommended:* Assigned =Not ready=. Added three review findings: the registry scope must cover the actual syntax/UI/package preview tiers or narrow v1, the cross-pane =os()= / data-attribute / browser-gate contract must be specified before the org-agenda showcase can use off-pane faces, and tooltip value formatting needs a minimal testable display contract. +- *Why:* The current implementation has separate =data-k= syntax assignments, =UIMAP= UI faces, and =PKGMAP/APPS= package faces, while the draft registry only named app/face assignments. The existing package preview click path and =assertPreviewFaces= gate also assume preview faces belong to the current app, which conflicts with the planned cross-pane org-agenda preview. +- *Artifacts:* =* Review findings= section in this spec; code read: =scripts/theme-studio/previews.js= (=os= / package preview spans), =scripts/theme-studio/app.js= (=flashAssign=, =flashUi=, =flashPkg=, =buildPkgPreview=), =scripts/theme-studio/browser-gates.js= (=assertPreviewFaces=, =#hovertest=). + +** 2026-06-23 Tue @ 12:32:08 -0400 — Claude Code (emacs-d) — responder +- *What changed:* Folded all three findings ([3/3] DONE). Modified finding 1: scoped the v1 registry to the two data-face surfaces (UI mock via UIMAP, package previews via PKGMAP/APPS) and left the syntax/code =data-k= tier out (it already locates via =cp.onclick= -> =flashAssign=); "every previewed element" -> "every data-face previewed element". Accepted finding 2 as written: =os= now takes the owning app and emits =data-owner-app= + =data-face= + a =locate-onpane= flag, click routes by owner, =assertPreviewFaces= validates each element against its owner app. Accepted finding 3 via the reviewer's escape hatch: v1 reuses the =title= attribute with a fixed field contract and four source-state fallbacks (assigned / inherited / cleared / unassigned) — added as a "Tooltip contract (v1)" subsection. +- *Why:* The findings were code-verified and correct. The finding-1 narrowing keeps v1 bounded while still covering both known consumers — the org-agenda showcase (package faces) and the planned minibuffer-completion preview (UI faces minibuffer-prompt + highlight, which is why the UI surface must be in v1). No new unproven obligations introduced; the response only specified existing ambiguity, so the readiness rubric rises rather than regresses. +- *Cross-spec:* enables [[file:theme-studio-completion-preview-spec.org][theme-studio-completion-preview-spec.org]] — its off-entry discoverability (minibuffer-prompt, highlight in UI Faces) depends on the UI surface being in this registry, now guaranteed by the finding-1 scope. +- *Artifacts:* findings [3/3] in this spec; code reads previews.js:6, app.js:565-570/644, browser-gates.js:41-47. + +** 2026-06-23 Tue @ 12:40:48 -0400 — Codex (emacs-d) — reviewer +- *What changed or was recommended:* Second review, =Not ready= again. Added three findings: the test plan needs an explicit deterministic state matrix (pure-Node + browser gates) rather than one happy-path showcase; the feature needs pre-work pure-helper extraction and a post-feature locate-dispatch cleanup to avoid coupling growth; and the preview affordance / "select row" semantics need exact UX and gate assertions. +- *Why:* The first response resolved scope and tooltip, but the plan still under-specified corner-case proof and would have grown coupling by wiring owner-aware behavior straight into the global/DOM paths. +- *Artifacts:* findings 4-6; code read: run-tests.sh, app-core.js, app.js (buildMockFrame, buildPkgPreview, flashRow), browser-gates.js, styles.css. + +** 2026-06-23 Tue @ 12:40:48 -0400 — Claude Code (emacs-d) — responder +- *What changed:* Folded findings 4-6 ([6/6] DONE). Accepted the deterministic matrix as written — added a "* Testing strategy" section splitting pure-Node unit cases (registry build/rebuild, owner lookup, title formatting incl. four fallbacks, owner-aware validation, isLocateOnPane) from browser gates (exact titles, data-owner-app/locate-onpane attrs, on-pane flash, off-pane unflashed, cursor/class split, gate accepts off-pane + rejects bad owner). Accepted the refactor with bracketing phases — Phase 0 pure-helper extraction in app-core.js before the feature, Phase 5 locate-dispatch cleanup after, recorded as an "Implementation shape" note. Accepted the affordance finding and resolved "select row" -> flashRow only (scroll + flash, no persistent selection), with the on-pane pointer + locate-onpane class vs off-pane default-cursor split defined. +- *Why:* All three were code-verified and match the project's deterministic-test and pure-helper-extraction discipline (app-core.js is already the pure-helper home; flashRow already does scroll + flash with no selection state). The refactor phases keep the tree working at each step; the matrix makes the corner cases provable rather than eyeballed. +- *Cross-spec:* unchanged — the completion preview still rides the UI surface in the registry. +- *Artifacts:* findings [6/6]; code reads app-core.js (pure-helper home), app.js:560 (flashRow), run-tests.sh (*.mjs + hash gates), styles.css (cursor rules). + +** 2026-06-23 Tue @ 12:50:38 -0400 — Codex (emacs-d) — reviewer +- *What changed or was recommended:* Third review, =Not ready=. Six findings (7-12, five blocking): the registry must be keyed by owner-qualified identity rather than face name (collision risk); the cross-surface render API must be defined because =os= can't style a UI face; the tooltip must show effective value plus a source note instead of conflating raw/effective; Phase 2 must split into stoppable steps since =os= is used by every preview; native-title wayfinding is weak; and the registry needs a defined cache lifecycle + perf budget. +- *Why:* The prior responses settled scope, tooltip fields, test matrix, and refactor home, but the keying/API/lifecycle contracts were still implicit and would have surfaced as correctness or coupling bugs mid-build. +- *Artifacts:* findings 7-12; code read: previews.js (=os=/=ofs=), app.js (=effFg=/=effBg=/=uiCss=/=resolveUiAttr=/=uiSelect=/=pkgChanged=/=resetApp=), app-core.js (=effResolve=), browser-gates.js. + +** 2026-06-23 Tue @ 12:50:38 -0400 — Claude Code (emacs-d) — responder +- *What changed:* Folded findings 7-12 ([12/12] DONE). Five accepted as written, one modified. Registry re-keyed to owner-qualified ={surface, owner, face}= with all lookups taking =(owner, face)= (finding 7). Defined =previewSpan(owner, face, text)= dispatching by surface, with =os= as a backward-compatible package wrapper (finding 8). Rewrote the tooltip contract: =value= is always the effective rendered color, with a per-attribute source note (direct / inherited-from-X / default / cleared-rendering-as-default), superseding the round-1 four-fallback contract (finding 9). Split Phase 2 into 2a-2d, each naming the regression gate that keeps existing previews green (finding 10). Added a "Registry lifecycle" subsection — one cached registry, rebuilt per assignment/import/reset batch, =locate-onpane= recomputed at render, linear/no-DOM/ms-budget gate (finding 12). *Modified* the wayfinding finding: took the immediate mouse-hover info line in the preview-label area (with =title= as fallback) for v1, deferred keyboard-focus traversal to vNext as a recorded accessibility caveat (finding 11). +- *Why:* All six were code-verified. The keying and API findings were genuine correctness gaps (face-name collisions; =os= structurally can't reach =UIMAP=); the tooltip and lifecycle findings prevent the tooltip lying about cleared/inherited values and the registry going stale or rebuilding per span. The phase split keeps =os=, used by every preview, from breaking mid-change. The one modify keeps v1 from growing a keyboard-navigation model while still fixing the delayed-title complaint for the mouse path. +- *Cross-spec:* unchanged — the completion preview still rides the UI surface, now via owner =@ui= in the registry. +- *Artifacts:* findings [12/12]; code reads previews.js:5-6 (=ofs=/=os=), app.js:43-44/572/649/699/752/793, app-core.js:120/155 (=effResolve=/=resolveUiAttr=). + +** 2026-06-23 Tue @ 13:00:43 -0400 — Codex (emacs-d) — reviewer +- *What changed or was recommended:* Fourth review, =Not ready=. Two implementation-readiness blockers: the pure/stateful boundary is inconsistent (=previewSpan= listed as pure but needs globals; =buildLocateRegistry= needs =MAP= for the ground floors) — name exact signatures and which layer owns HTML escaping; and the showcase is never named (org-agenda unbuilt, completion preview separate) — pick a concrete showcase or allow a gate-only fixture. +- *Why:* Two implementers could split the stateful renderer differently, and the unnamed showcase left Phase 2d's deliverable undefined. +- *Artifacts:* findings 13-14; code read: previews.js (=os=/=ofs= globals), app.js (=effFg=/=effBg=/=MAP=), app-core.js (pure-helper home). + +** 2026-06-23 Tue @ 13:00:43 -0400 — Claude Code (emacs-d) — responder +- *What changed:* Folded findings 13-14 ([14/14] DONE), both accepted with concrete resolutions. Boundary (13): split explicitly — pure helpers in app-core.js take all state as args and return data (=buildLocateRegistry(apps, pkgmap, uimap, map)=, =locateFaceMeta=, =formatLocateTitle=, =previewFaceAttrs=, =isLocateOnPane=); the one stateful piece =previewSpan(owner, face, text)= lives in previews.js, reads globals, owns =esc= HTML escaping, and lands in Phase 2a (removed from the Phase 0 pure set). Showcase (14): chose a gate-only synthetic fixture (one package-owned off-pane span + one =@ui= =minibuffer-prompt= span) with no user-facing preview change — neither natural consumer (org-agenda, completion) exists in v1, and grafting demo spans onto a real preview would mislead; the first real cross-owner preview becomes the organic showcase later. Phase 2d, scope tiers, testing strategy, and acceptance now name the fixture and the boundary. +- *Why:* Both were genuine readiness gaps — the boundary ambiguity would have produced divergent test seams, and the unnamed showcase left a phase deliverable undefined. The gate-only fixture keeps v1 from mutating user-facing previews or blocking on an unbuilt consumer. +- *Cross-spec:* unchanged — completion preview still rides =@ui= in the registry; it (or org-agenda) is the eventual organic showcase. +- *Artifacts:* findings [14/14]; code reads previews.js (=os=/=esc=), app.js:43-44 (=effFg=/=effBg= over =MAP=), app-core.js (pure-helper home). + +** 2026-06-23 Tue @ 12:37:50 -0400 — Codex (emacs-d) — reviewer +- *What changed or was recommended:* Assigned =Not ready= again. Added three findings focused on Craig's second-review questions: the test strategy needs an explicit deterministic matrix rather than one happy-path browser showcase; the feature needs pre-work pure-helper refactoring and a post-feature locate-dispatch cleanup to avoid coupling growth; and the preview affordance / "select row" semantics need exact UX and gate assertions. +- *Why:* The response resolved the first review's scope and tooltip blockers, but the remaining plan still under-specifies how to prove all corner cases. Current code routes preview ownership through globals, separate click handlers, and a current-app-only gate helper; adding cross-pane behavior without pure helper boundaries would make correctness depend on DOM side effects and manual inspection. +- *Artifacts:* new =* Review findings= entries; code read: =scripts/theme-studio/run-tests.sh= (auto-discovered browser gates), =scripts/theme-studio/app-core.js= (pure helper home), =scripts/theme-studio/app.js= (=buildMockFrame=, =buildPkgPreview=, =flashRow=), =scripts/theme-studio/browser-gates.js= (=assertPreviewFaces=), =scripts/theme-studio/styles.css= (existing preview cursor rules). + +** 2026-06-23 Tue @ 12:47:12 -0400 — Codex (emacs-d) — reviewer +- *What changed or was recommended:* Third review, =Not ready=. Added six findings after walking the implementation path as if starting the work: owner-qualified registry identity is required; cross-surface preview rendering needs an explicit API for UI-owned faces; tooltip source-state semantics must distinguish raw assignment from effective rendered value; Phase 2 must be split into stoppable substeps; native-title-only hover may not carry enough wayfinding; and the registry cache lifecycle/performance budget must be specified. +- *Why:* The spec looked ready after the prior response, but implementation would still force decisions in code. The current =os= path only knows package faces via =PKGMAP=, UI preview rendering uses =UIMAP= / =uiCss= separately, and the proposed face-name registry would not be robust once owner-qualified cross-pane spans are real. +- *Artifacts:* new =* Review findings= entries; code read: =scripts/theme-studio/previews.js= (=os= / =ofs=), =scripts/theme-studio/app.js= (=buildMockFrame=, =buildPkgPreview=, =uiCss=, =pkgChanged=, =uiSelect=, =importState=), =scripts/theme-studio/app-core.js= (=effResolve=, =resolveUiAttr=, pure-helper pattern), =scripts/theme-studio/face_data.py= and =package-inventory.json= (face-name namespace shape). + +** 2026-06-23 Tue @ 12:57:00 -0400 — Codex (emacs-d) — reviewer +- *What changed or was recommended:* Fourth review, =Not ready=. Added two blocking findings: the pure-helper/refactor boundary is still internally inconsistent because the listed helper signatures omit the state needed to compute effective rendered values and =previewSpan= is stateful despite being listed under pure helpers; and the showcase preview remains unnamed, leaving implementers to choose between mutating an existing preview, waiting for another feature, or creating a synthetic fixture. +- *Why:* The prior response closed the owner identity, API, tooltip, phase, UX, and lifecycle blockers, but these two gaps still affect architecture and phase execution. They would make Phase 0 and Phase 2d diverge by implementer preference rather than by a settled spec decision. +- *Artifacts:* findings 13-14; code read: =scripts/theme-studio/previews.js= (=ofs= / =os= use =PKGMAP= and =MAP= globals), =scripts/theme-studio/app.js= (=effFg= / =effBg= / =uiCss= / render rebuild points), =scripts/theme-studio/app-core.js= (=faceCss=, =effResolve=, =resolveUiAttr= pure-helper pattern). |
