diff options
| author | Craig Jennings <c@cjennings.net> | 2026-05-17 01:16:03 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-05-17 01:16:03 -0500 |
| commit | 2b61aa82afa39a0ff1b165fa9ff09d55d21bfabf (patch) | |
| tree | 0ff4916e842caff081bfa2a70d8753dc6fa21044 /docs | |
| parent | 1fef9aed248aaf7586950475aa22849e413f1c04 (diff) | |
| download | dotemacs-2b61aa82afa39a0ff1b165fa9ff09d55d21bfabf.tar.gz dotemacs-2b61aa82afa39a0ff1b165fa9ff09d55d21bfabf.zip | |
docs(design): MCP-into-gptel + gh-as-gptel-tool specs + MCP phases
Two new design docs in docs/design/ covering the next two GPTel
work items, plus matching task scaffolding in todo.org.
mcp-el-gptel-integration.org wires mcp.el into the config so GPTel
gets access to the nine MCP servers Claude Code already uses
(linear, notion, figma, slack-deepsat, drawio, google-calendar,
google-docs-personal, google-docs-work, google-keep). The design
covers async startup, the write-confirmation policy, a
server-enablement defcustom, a doctor with live-auth-check, the
audit buffer, and the mcp.el compatibility layer. The spec is at
revision 3 after two code-review passes flagged a critical
confirmation gap (gptel-confirm-tool-calls nil at ai-config.el:386
silently ignored per-tool :confirm slots) and several incorrect
mcp.el API assumptions. Both are addressed.
gptel-gh-tool.org wraps the gh CLI as a hybrid surface: 14 typed
read wrappers plus one general write tool gated by :confirm t.
Host/repo resolution is command-aware: --repo HOST/OWNER/REPO for
repo commands, --hostname only for api and auth status. The runner
enforces an irreversible-command blocklist, a 64KB in-flight output
cap, and a debug-record plus last-error-buffer story. The spec is
at revision 2 after a code-review pass corrected gh flag
assumptions and reframed the safety story around per-tool confirm.
todo.org gains a link to the MCP spec under the parent task plus
nine TODO sub-tasks (one per implementation phase), and a new
gh-tool TODO with the same spec-link shape.
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/design/gptel-gh-tool.org | 1061 | ||||
| -rw-r--r-- | docs/design/mcp-el-gptel-integration.org | 1434 |
2 files changed, 2495 insertions, 0 deletions
diff --git a/docs/design/gptel-gh-tool.org b/docs/design/gptel-gh-tool.org new file mode 100644 index 00000000..a9ba22bb --- /dev/null +++ b/docs/design/gptel-gh-tool.org @@ -0,0 +1,1061 @@ +#+TITLE: Design: Wrap the gh CLI as a GPTel tool +#+AUTHOR: Craig Jennings +#+DATE: 2026-05-16 +#+OPTIONS: toc:nil num:nil + +* Status + +Draft (revision 2). Pre-implementation; no code shipped yet. =gh= +v2.92.0 installed and authenticated against both =github.com= (account +=cjennings=) and =deepsat.ghe.com= (account =craig-jennings=). + +Revision 2 incorporates a code-review pass that caught several +incorrect =gh= CLI assumptions and a critical safety gap: with +=gptel-confirm-tool-calls= set to =nil= in =modules/ai-config.el:386=, +the "irreversible-only blocklist" V1 from revision 1 would let the +agent run unconfirmed writes (=pr merge=, =run cancel=, =api -f +body=...=, =release create=, etc.). Revision 2 keeps Craig's intent +of a general-capability tool but applies =:confirm t= to every +classified write, drops the "scan command string for =--hostname= +and =-H=" approach in favor of argv-list builders, and expands the +wrapper set so most agent workflows don't need to fall back to the +general tool. + +* Problem + +GPTel agents can read git history locally (=git_log=, =git_diff=, +=git_status=) but have no access to the GitHub side of the workflow: +PRs, issues, reviews, CI runs, releases, gists, repo metadata. The +=gh= CLI does all of this and is already authenticated against both +hosts the user works with. Wrapping it as a GPTel tool gives the +agent the same GitHub surface the user has. + +The wrapper has to handle four complications the local git tools +don't: + +1. *Two authenticated hosts* -- personal (=github.com=) and work + GHE (=deepsat.ghe.com=) -- with different policies and different + blast radii for destructive operations. +2. *A vast subcommand surface* (~30 top-level subcommands, hundreds + of leaves) that doesn't fit cleanly into one typed schema per + leaf. +3. *Inconsistent flag semantics across subcommands.* =--hostname= + exists only on =gh api= and =gh auth status=; the rest use + =--repo [HOST/]OWNER/REPO=. =-H= means =--head= on =gh pr list= + but =--header= on =gh api=. Naive string-scanning to detect + user intent is wrong. +4. *Global GPTel setting* (=gptel-confirm-tool-calls nil=) means + the agent can call any registered tool without user confirmation + unless that specific tool is registered with =:confirm t=. V1 + safety must come from per-tool flags, not from a session-level + gate. + +* Goals + +1. The agent can invoke the non-interactive subset of =gh= against + either host, gated by a safety policy. +2. High-frequency reads have typed wrappers with sensible defaults + (host/repo resolution, JSON-by-default with minimal fields, + output cap, timeout). Wrappers auto-execute. +3. The active host is resolved from a single context object that + considers (in order): explicit =hostname=/=repo= args, branch + upstream remote, =origin= remote, =cj/gh-default-host=. Every + tool response prefixes the resolved host/repo so cross-host + mistakes are visible. +4. Writes via the general tool execute only after user + confirmation (=:confirm t= on the gptel tool registration). + Unknown classifications fail closed (=:confirm t=). + Irreversible commands hard-block. +5. Interactive commands (=auth login=, =codespace ssh=, =--web=, + =browse=, =pr checkout=, =repo clone=, =config set=, + =alias set=, =extension exec=) are hard-blocked in V1. +6. File exfiltration paths (=api --input=, =api -F key=@file=, + =release upload=, =gist create=) are hard-blocked in V1. +7. Output is capped *during capture*, not after, so a runaway + =--paginate= or =--log= can't fill memory or block Emacs. + Long-output commands have higher timeouts but the same byte + cap. +8. Every call produces a structured debug record (host, repo, cwd, + sanitized argv, classification, policy decision, exit code, + duration, bytes captured, truncation flag, error kind) inspectable + via =cj/gh-tool-last-error=. +9. =cj/gh-doctor= diagnoses missing prerequisites (=gh= + executable, version floor, auth per host, cwd repo detection, + environment overrides) before they fail at runtime. +10. V2 adds true per-host policy (currently uniform across both + hosts), profile-based tool subsets, and bridge helpers between + local git tools and the gh wrappers. + +* Non-Goals + +- *Interactive subcommands.* =gh auth login=, =gh codespace ssh=, + =gh pr checkout= (modifies working tree), =gh repo clone= + (writes outside controlled paths) are not in scope. The user + runs those in a real terminal. +- *Replicating gh extensions.* Core gh only. Extensions can be + invoked from a regular terminal if needed. +- *Streaming long-running output.* =gh run watch=, =gh + --paginate= for unbounded result sets, =gh repo clone= for large + repos. V1 blocks =--paginate= and =watch= verbs; future + versions may add streaming support. +- *Per-host write policies* (work-read-only / personal-full-write). + V1 applies the same write-confirmation policy to both hosts. + V2 makes the policy host-keyed. +- *Conversation-context injection* and *local/remote bridge + helpers* (current branch → PR, changed files → PR comments). + Future enhancements; tracked as a follow-up. +- *Artifact download* (=gh run download=). Writes files; needs + its own confirmation flow. Deferred to V2. + +* Verified gh CLI Contracts + +These were checked against =gh --version 2.92.0= help output before +the spec was revised. Implementation can rely on them. + +** Host / repo selection per subcommand + +- =gh api= and =gh auth status=: use =--hostname HOST= (long form + only; no short form). +- =gh pr {view,list,diff,checks}=, =gh issue {view,list}=, + =gh run {view,list}=, =gh release *=, =gh search *=: use + =--repo [HOST/]OWNER/REPO= (short =-R=). +- =gh repo view= and similar root-level commands: infer from cwd + remote; no =--hostname= or =--repo= flag. +- Environment variables: =GH_HOST= and =GH_REPO= work for commands + that lack explicit flags. + +** Flags with overloaded short forms + +- =gh pr list -H FOO= → =--head FOO= (branch). +- =gh search prs -H FOO= → =--head FOO= (branch). +- =gh api -H KEY:VAL= → =--header KEY:VAL=. + +Substring scanning =-H= as "hostname" is wrong on every command +that uses it. + +** =gh api= method classification + +The help text explicitly says: "adding request parameters will +automatically switch the method to POST". This means: + +- =gh api repos/x/y/issues= → GET (read). +- =gh api repos/x/y/issues -f title=Foo= → POST (write). +- =gh api repos/x/y/issues -F body=@file= → POST (write + + exfiltration). +- =gh api --method GET repos/x/y/issues -f q=x= → GET (read, + explicit override). + +Classifier must inspect for =-f=, =-F=, =--field=, =--raw-field=, +=--input=, and =--method= before defaulting to GET. + +** Wrappers without =--json= support + +- =gh pr diff= produces patch output; no =--json= flag. Wrapper + is text-only. Structured file metadata available via + =gh pr view --json files= (new wrapper =gh_pr_files=). + +** Environment variables for non-interactive use + +All confirmed via =gh help environment=: + +- =GH_PROMPT_DISABLED= -- disable interactive prompting. +- =GH_PAGER= / =PAGER= -- set to =cat= so no pager fires. +- =GH_NO_UPDATE_NOTIFIER= -- silence the "new version" banner. +- =GH_NO_EXTENSION_UPDATE_NOTIFIER= -- same for extensions. +- =GH_SPINNER_DISABLED= -- silence the progress spinner. +- =NO_COLOR= -- disable ANSI color (cleaner LLM context). + +V1 subprocess env always includes all of the above. + +** Authentication + +#+begin_example +✓ Logged in to github.com account cjennings (keyring) +✓ Logged in to deepsat.ghe.com account craig-jennings (keyring) +#+end_example + +Both stored in the system keyring. =gh= reads the keyring per +call; the wrapper does not handle tokens. Expired auth surfaces +as exit code 4 with a clear stderr message; recovery is interactive +(=gh auth login --hostname HOST= in a terminal). + +* Current State + +** =modules/ai-config.el= + +- =gptel-confirm-tool-calls nil= at line 386. Per-tool + =:confirm t= is the only confirmation mechanism in V1. +- =cj/gptel-load-local-tools= (lines 71-96) loads tools from + =gptel-tools/=. Add new gh feature names to + =cj/gptel-local-tool-features=. + +** =gptel-tools/= + +Ten tools today. =git_log.el= is the closest analogue: validates +input, runs subprocess with =process-file=, caps output, signals +clear errors. The gh wrappers follow the same shape with shared +helpers in =gh-common.el=. + +* Design + +** File layout + +Two files plus per-wrapper registrations. Wrappers are +deliberately small (argv builders + JSON-field defaults) so they +all live in one file: + +- =gptel-tools/gh-common.el= -- shared helpers: host/repo context, + argv runner, subprocess env, classifier, blocklist, redaction, + debug record, last-error buffer. No tool registrations. +- =gptel-tools/gh.el= -- the general tool + every wrapper + registration. Pulls in =gh-common= via =require=. + +This matches Craig's preference for low loader surface (per the +MCP revision discussion). If =gh.el= grows past ~600 lines we +split per-resource later (=gh-pr.el=, =gh-issue.el=, etc.). + +** Host / repo resolution + +A single helper returns a structured context object used by every +tool entry point: + +#+begin_src emacs-lisp +(defun cj/gh--resolve-context (cwd hostname repo) + "Return a context plist for a gh invocation. +Resolution order: +1. Explicit REPO argument (`[HOST/]OWNER/REPO'). If it contains + `HOST/', split off the host. +2. Explicit HOSTNAME argument. +3. CWD's branch upstream remote URL (`git rev-parse --abbrev-ref + @{u}' then `git remote get-url REMOTE'). +4. CWD's `origin' remote URL. +5. `cj/gh-default-host'. + +Returns: + (:host HOST :owner OWNER :repo REPO :source SYM) +where SOURCE is one of: explicit-repo, explicit-host, upstream, +origin, default." + ...) +#+end_src + +URL parsing handles all three forms produced by git: + +- =git@HOST:OWNER/REPO.git= +- =ssh://git@HOST/OWNER/REPO.git= +- =https://HOST/OWNER/REPO.git= + +Returns =nil host= if the URL doesn't match a known host; the +caller uses =cj/gh-default-host= in that case. + +#+begin_src emacs-lisp +(defcustom cj/gh-default-host "github.com" + "Host used when no other resolution path yields one." + :type 'string + :group 'cj) + +(defcustom cj/gh-known-hosts '("github.com" "deepsat.ghe.com") + "Hosts the gh CLI is authenticated against." + :type '(repeat string) + :group 'cj) +#+end_src + +Context appears in every tool response as a one-line prefix: + +#+begin_example +[gh github.com/cjennings/dotemacs read ok 12.4KB] ... +[gh deepsat.ghe.com/org/repo write confirmed 0.8KB] ... +[gh github.com api unknown blocked] ... +#+end_example + +This makes cross-host accidents visible. + +** Argv-list builders, never command strings + +Every tool builds an explicit argv list and hands it to a shared +runner. Wrappers expose typed parameters that map directly into +argv positions: + +#+begin_src emacs-lisp +(defun cj/gh-pr-view--build-argv (context number include-comments fields) + "Return an argv list for `gh pr view NUMBER ...'." + (let ((repo-arg (format "%s/%s/%s" + (plist-get context :host) + (plist-get context :owner) + (plist-get context :repo)))) + (append (list "pr" "view" (number-to-string number) + "--repo" repo-arg) + (when include-comments '("--comments")) + (when fields (list "--json" fields))))) +#+end_src + +The general tool takes an =args= parameter typed as array-of-string +(not a shell string) so the agent constructs argv directly: + +#+begin_src emacs-lisp +;; agent passes: ["pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo"] +;; runner runs: gh pr view 171 --repo deepsat.ghe.com/org/repo +#+end_src + +If the agent accidentally includes leading =gh= in args[0], the +runner strips it and logs the normalization. + +** Subprocess contract + +One runner for everything: + +#+begin_src emacs-lisp +(defun cj/gh--run (argv &key timeout context) + "Run gh with ARGV (a list of strings). +Returns a plist: + (:exit-code N :stdout STR :stderr STR :truncated BOOL :duration-ms N + :argv ARGV :context CONTEXT) +Enforces TIMEOUT (default `cj/gh--default-timeout'). +Captures output in-flight, killing the process at +`cj/gh--max-bytes' to prevent runaway memory use." + ...) +#+end_src + +Contract: + +- *Env vars set every call:* =GH_PROMPT_DISABLED=1=, =GH_PAGER=cat=, + =PAGER=cat=, =NO_COLOR=1=, =GH_NO_UPDATE_NOTIFIER=1=, + =GH_NO_EXTENSION_UPDATE_NOTIFIER=1=, =GH_SPINNER_DISABLED=1=. +- *Timeout:* default 20 s for reads, 60 s for diffs/logs/search. + Per-tool override allowed but capped at 120 s. Timeout kills + the process and returns =:error-kind 'timeout=. +- *Output cap during capture:* a process filter accumulates bytes + up to =cj/gh--max-bytes= (64 KB). At the cap, the filter sets + =truncated=, ignores further output, and sends SIGTERM after a + short grace. Truncation marker appended to returned stdout. +- *TRAMP rejection:* if =(file-remote-p default-directory)= is + non-nil, return =:error-kind 'remote-cwd= without invoking gh. +- *Executable check:* if =(executable-find cj/gh--executable)= is + nil, return =:error-kind 'no-executable= with install + instructions. +- *Version check:* the first call per session verifies + =gh --version= meets =cj/gh--min-version= (default "2.50.0"); + cached for the session. + +** Read / write / destructive classifier + +Used to apply =:confirm t= and the irreversible blocklist: + +#+begin_src emacs-lisp +(defconst cj/gh--read-verbs + '("view" "list" "status" "search" "diff" "checks" "describe" + "show" "logs" "auth-status")) + +(defconst cj/gh--write-verbs + '("create" "edit" "merge" "close" "reopen" "comment" "review" + "upload" "set" "add" "remove" "rerun" "cancel" "delete" + "fork" "archive" "unarchive" "lock" "unlock" "pin" "unpin" + "ready" "draft" "rename" "transfer" "approve" "label" "assign")) + +(defconst cj/gh--blocked-verbs + '(;; interactive / opens UI + "login" "logout" "checkout" "clone" "ssh" "code" "edit-prompt" + ;; modifies user config + "alias" "config" "extension" + ;; opens browser + "browse")) + +(defconst cj/gh--blocked-flags + '("--web" ; many commands + "--paginate" ; can produce unbounded output + "--input" ; gh api file upload + "--editor")) ; opens editor + +(defconst cj/gh--irreversible-patterns + '("\\`repo delete\\b" + "\\`release delete\\b" + "\\`secret delete\\b" + "\\`ssh-key delete\\b" + "\\`gpg-key delete\\b" + "\\`org delete\\b" + "\\`project delete\\b" + "\\`variable delete\\b" + "\\`ruleset delete\\b" + "\\`label delete\\b.*--yes")) +#+end_src + +Classifier: + +#+begin_src emacs-lisp +(defun cj/gh--classify (argv) + "Return one of: read, write, destructive, blocked, unknown." + (let* ((stripped (cj/gh--strip-flags argv)) + (resource (car stripped)) + (verb (cadr stripped))) + (cond + ;; Hard blocks first. + ((cj/gh--has-blocked-verb-p argv) 'blocked) + ((cj/gh--has-blocked-flag-p argv) 'blocked) + ((cj/gh--has-file-arg-p argv) 'blocked) ; -F key=@file + ((cj/gh--matches-irreversible-p argv) 'destructive) + ;; gh api is special. + ((string= resource "api") (cj/gh--classify-api argv)) + ;; Verb match. + ((member verb cj/gh--read-verbs) 'read) + ((member verb cj/gh--write-verbs) 'write) + (t 'unknown)))) + +(defun cj/gh--classify-api (argv) + "Classify a `gh api ...' invocation. +Reads: explicit method GET/HEAD with no -f/-F/--input. +Writes: any -f/-F/--field/--raw-field/--input, OR explicit +non-GET/HEAD method. +Default (no method, no field): read (matches gh's GET default)." + (let* ((explicit-method (or (cadr (member "-X" argv)) + (cadr (member "--method" argv)))) + (has-field (cl-some + (lambda (f) (cl-some (lambda (a) (string-prefix-p f a)) + argv)) + '("-f" "--raw-field" "-F" "--field"))) + (has-input (member "--input" argv))) + (cond + (has-input 'blocked) ; file exfiltration + ((and explicit-method + (not (member (upcase explicit-method) '("GET" "HEAD")))) + 'write) + (has-field 'write) ; -f/-F auto-promotes to POST + ((and explicit-method + (member (upcase explicit-method) '("GET" "HEAD"))) + 'read) + (t 'read)))) +#+end_src + +Classifier-driven policy table: + +| Classification | Policy | +|----------------+--------| +| =read= | auto-execute | +| =write= | =:confirm t= on registration; agent's call shows in confirm prompt | +| =destructive= | hard-block; return =:error-kind 'irreversible-blocked= | +| =blocked= | hard-block; return =:error-kind 'policy-blocked= with reason | +| =unknown= | =:confirm t= (fail closed) | + +** Safety policy + +V1 uniform policy applied to both hosts: + +#+begin_src emacs-lisp +(defcustom cj/gh-policy + '((read . auto) + (write . confirm) + (destructive . block) + (blocked . block) + (unknown . confirm)) + "Per-classification policy. V1 applies uniformly to both hosts." + :type '(alist :key-type symbol :value-type symbol) + :group 'cj) +#+end_src + +Since GPTel's =:confirm t= flag is per-tool-registration (not +per-call), the general tool is registered with =:confirm t= +always. The wrappers are registered per their classification: + +| Tool | Registered with | +|------+-----------------| +| Read wrappers (=gh_pr_view=, etc.) | =:confirm nil= | +| =gh= general tool | =:confirm t= | + +The general tool then applies the policy table at invocation +time: reads execute without further prompting (already past +GPTel's confirm because :confirm t fires once); writes show +detail before invocation; destructive/blocked never reach gh. + +** General tool: =gh= + +Single tool registered with =:confirm t= covering everything the +wrappers don't. Schema: + +| Arg | Type | Required | Purpose | +|-----+------+----------+---------| +| =args= | array of string | yes | argv list, e.g. =["pr", "view", "171", "--repo", "deepsat.ghe.com/org/repo"]= | +| =hostname= | string | no | Override host for commands that accept =--hostname= (=api=, =auth status=); ignored otherwise | +| =repo= | string | no | =[HOST/]OWNER/REPO= for repo-scoped commands; if HOST is present in repo, hostname arg is overridden | +| =cwd= | string | no | Working directory; defaults to current buffer; must be under =$HOME=, must not be TRAMP | +| =timeout= | integer | no | Seconds before kill; default 20, max 120 | + +Description (registered) explicitly says: + +#+begin_example +Use this only when no task-specific gh_* tool fits. Prefer +gh_pr_view, gh_pr_list, gh_pr_checks, gh_issue_view, gh_issue_list, +gh_run_view, gh_run_list, gh_run_logs_failed, gh_repo_view, +gh_search_prs, gh_search_issues, gh_api_get. + +Writes (create/edit/merge/etc.) require user confirmation. +Destructive (repo/release/secret delete) are hard-blocked. +Interactive commands (auth login, codespace ssh, --web, browse) +are hard-blocked. File uploads (api --input, -F @file, release +upload, gist create) are hard-blocked. +#+end_example + +** Wrapper inventory + +Twelve wrappers grouped by resource. Each has typed args, JSON +field defaults, output truncation, timeout override, and a +description that names its scope. + +| Tool | gh command | Defaults | Timeout | +|------+------------+----------+---------| +| =gh_repo_view= | =repo view --json= | JSON fields: =name,nameWithOwner,description,defaultBranchRef,url,visibility= | 20s | +| =gh_pr_view= | =pr view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | +| =gh_pr_list= | =pr list --json= | JSON fields: =number,title,state,author,createdAt,headRefName=; default =--limit 30= (capped at 100) | 20s | +| =gh_pr_diff= | =pr diff N --color never= | text only; capped at 64 KB | 60s | +| =gh_pr_checks= | =pr checks N --json= | JSON fields: =name,status,conclusion,startedAt,completedAt,link= | 20s | +| =gh_pr_files= | =pr view N --json files= | JSON fields: =files= (path, additions, deletions, mode) | 20s | +| =gh_pr_current= | =pr view --json= (no number — auto-detect) | Same as =gh_pr_view= | 20s | +| =gh_issue_view= | =issue view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | +| =gh_issue_list= | =issue list --json= | JSON fields: =number,title,state,author,createdAt,labels=; default =--limit 30= | 20s | +| =gh_run_view= | =run view RUN-ID --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch,event,url= | 20s | +| =gh_run_list= | =run list --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch=; default =--limit 20= | 20s | +| =gh_run_logs_failed= | =run view RUN-ID --log-failed= | text only; capped at 64 KB | 60s | +| =gh_search_prs= | =search prs --json= | JSON fields: =number,title,state,author,repository,url=; default =--limit 30= (capped at 100) | 30s | +| =gh_search_issues= | =search issues --json= | Same as PR search | 30s | +| =gh_api_get= | =api ENDPOINT --method GET= | text/JSON pass-through; rejects fields/input args | 30s | + +Common args for every wrapper (unless noted): + +| Arg | Type | Required | Purpose | +|-----+------+----------+---------| +| =repo= | string | no | =[HOST/]OWNER/REPO=; resolved from context otherwise | +| =hostname= | string | no | Override host for context resolution | +| =limit= | integer | no | =--limit= for list/search wrappers; clamped to per-wrapper max | + +The =gh_api_get= wrapper *explicitly* rejects =-f=, =-F=, +=--field=, =--raw-field=, =--input=, =--method=, and any method +override. It only accepts =ENDPOINT= and optional =-H= headers +that don't carry secrets (the runner redacts =Authorization:= and +similar regardless). Writes via API go through the general tool +with confirmation. + +** JSON field defaults + +Per-wrapper defaults are minimal -- enough for the agent to decide +whether to drill in, not so much that one call fills the context. + +For =gh_pr_view= specifically: + +- *Default* (no =fields= override): the small list above (number, + title, state, author, createdAt, url, body-truncated-to-2KB). +- *Override*: agent passes =fields= as a comma-separated string; + wrapper validates against a per-resource allowlist (so the agent + can't request =reviews,comments,files= in one call to bypass the + cap). +- *Include flags*: =include-body t/nil=, =include-comments t/nil=, + =include-reviews t/nil= as boolean args. Each adds the + corresponding JSON field; agent opts in only when needed. + +** Output truncation + +Process filter pattern, not post-hoc cap: + +#+begin_src emacs-lisp +(defun cj/gh--make-filter (state-var) + "Return a process filter that accumulates into STATE-VAR's :stdout, +stops collecting after `cj/gh--max-bytes', sets :truncated, and +sends SIGTERM to the process." + (lambda (proc output) + (let* ((state (symbol-value state-var)) + (current (plist-get state :stdout)) + (current-len (length current)) + (remaining (- cj/gh--max-bytes current-len))) + (cond + ((<= remaining 0) nil) ; already at cap + ((<= (length output) remaining) + (plist-put state :stdout (concat current output))) + (t + (plist-put state :stdout + (concat current (substring output 0 remaining))) + (plist-put state :truncated t) + (ignore-errors (delete-process proc))))))) +#+end_src + +Truncation marker appended before return: + +#+begin_example +[truncated at 64KB; use --limit, narrower fields, or a specific +wrapper to reduce output] +#+end_example + +** Error classification + debug record + +Every call returns (and =cj/gh-tool-last-error= caches) a debug +record: + +#+begin_src emacs-lisp +(defun cj/gh--debug-record (argv context exit-code stdout stderr + duration-ms truncated) + (list :host (plist-get context :host) + :repo (cj/gh--repo-arg context) + :cwd default-directory + :argv (cj/gh--redact-argv argv) + :classification (cj/gh--classify argv) + :policy (cj/gh--policy-decision argv) + :exit-code exit-code + :duration-ms duration-ms + :bytes-captured (length stdout) + :truncated truncated + :error-kind (cj/gh--error-kind exit-code stderr))) +#+end_src + +=:error-kind= mapping: + +| Condition | =:error-kind= | Returned message | +|-----------+---------------+------------------| +| Exit 4 + "authentication required" | =auth= | "Run =gh auth login --hostname HOST= in a terminal." | +| Process killed by timeout timer | =timeout= | "Command exceeded N seconds; narrow the query or use a more specific wrapper." | +| Policy block | =policy-blocked= | "Blocked by V1 policy: REASON." | +| Irreversible match | =irreversible-blocked= | "Hard-blocked irreversible command: COMMAND." | +| Truncated output | =truncated= | "Output truncated at 64KB; reduce scope." | +| TRAMP cwd | =remote-cwd= | "Cannot run gh from remote directory: CWD." | +| Missing executable | =no-executable= | "gh not found at CJ/GH--EXECUTABLE; install via 'pacman -S github-cli' (or equivalent)." | +| Other non-zero exit | =gh-exit= | (raw stderr, redacted) | + +Each error includes a sanitized reproduce line: + +#+begin_example +Reproduce: GH_PROMPT_DISABLED=1 GH_PAGER=cat gh pr view 171 --repo HOST/OWNER/REPO +#+end_example + +(Secrets, body text, file paths redacted via =cj/gh--redact-argv=.) + +** Audit log (V1, opt-out) + +Every call appends one line to +=~/.emacs.d/data/gh-tool-log/YYYY-MM-DD.log=: + +#+begin_example +2026-05-16T14:23:45-0500 host=github.com repo=cjennings/dotemacs class=read policy=auto exit=0 duration=128ms bytes=4321 +2026-05-16T14:24:02-0500 host=deepsat.ghe.com repo=org/repo class=write policy=confirm exit=0 duration=412ms bytes=88 +#+end_example + +Metadata only, not output bodies. Defcustom +=cj/gh-tool-audit-log-enabled= (default =t=). Daily rotation +implicit (one file per day). Cleanup manual. + +** Secrets redaction + +=cj/gh--redact-argv= masks: + +- Anything after =--token=, =--secret=, =--password= flags. +- Authorization headers (=-H "Authorization: ..."=). +- =--figma-api-key=KEY= (in case general tool spawns figma-mcp + somehow). +- Bearer tokens in URLs (=?token=...=). +- Values for =-f=/=-F= keys named like =body=, =text=, + =description= (private content; metadata still logged). + +Applied to: +- All stderr returned to the agent. +- All audit-log lines. +- All debug records. +- The reproduce line on error. + +* Commands & UX + +** =cj/gh-doctor= + +Diagnostic command. No side effects. Checks: + +- =gh= executable found at =cj/gh--executable=. +- =gh --version= meets =cj/gh--min-version=. +- =gh auth status= for each host in =cj/gh-known-hosts=. +- Current buffer's cwd repo detection: resolves to which host/repo? +- Environment overrides effective (=GH_PROMPT_DISABLED= etc. would + be set by the runner). +- Active account per host. +- Warnings if any block-list-relevant env is set externally + (e.g. user already has =GH_PAGER= set to something that pages). + +Output: a buffer with PASS / FAIL / WARN per check + recovery +actions for failures. + +** =cj/gh-tool-last-error= + +Opens a buffer showing the last call's debug record: + +#+begin_example +Host: deepsat.ghe.com +Repo: org/repo +Source: upstream +CWD: ~/projects/work/foo +Argv: ("pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo") +Classification: read +Policy: auto +Exit code: 0 +Duration: 412 ms +Bytes captured: 4321 +Truncated: no +Error kind: none + +Reproduce: + GH_PROMPT_DISABLED=1 GH_PAGER=cat NO_COLOR=1 \ + gh pr view 171 --repo deepsat.ghe.com/org/repo +#+end_example + +** Tool response header + +Every tool result begins with a one-line header so cross-host / +policy decisions are visible: + +#+begin_example +[gh github.com/cjennings/dotemacs read ok 4.3KB] +{ ... } +[gh deepsat.ghe.com/org/repo write confirmed 0.2KB] +{ ... } +[gh github.com/cjennings/dotemacs api blocked policy-blocked] +Error: Hard-blocked file-upload path: --input file. +Reproduce: gh api repos/cjennings/dotemacs/contents/foo --input file +#+end_example + +* Implementation Plan + +Eight phases. Each ends with green ERT tests + manual smoke +before the next. + +** Phase 1 -- Common helpers + context resolver + +=gh-common.el=: =cj/gh--executable=, =cj/gh--available-p=, +=cj/gh--version= (cached), =cj/gh--validate-cwd= (HOME + non-TRAMP), +=cj/gh--parse-remote-url=, =cj/gh--resolve-context=, +=cj/gh--redact-argv=. No subprocess execution yet. + +Tests cover all helpers against fixture remote URLs and synthetic +git directories. No real gh calls. + +** Phase 2 -- Runner with subprocess env + timeout + in-flight cap + +=cj/gh--run=, the process filter, the timer kill path, env-var +setup. Tests stub =make-process= to simulate output / exit / hang +/ truncation paths. + +Acceptance: with stub configured to produce 100 KB of output, +returned stdout is exactly 64 KB plus the truncation marker, and +the process gets SIGTERM. + +** Phase 3 -- Classifier + blocklist + policy + +=cj/gh--classify=, =cj/gh--classify-api=, blocklist constants, +=cj/gh--policy-decision=. Tests for every blocklist pattern +(verbs + flags + file-arg paths), API edge cases (=-f= promotes +to POST, =--method GET -f x=y= stays GET, etc.), and the +unknown-fails-closed contract. + +** Phase 4 -- Read wrappers (5 first) + +=gh_repo_view=, =gh_pr_view=, =gh_pr_list=, =gh_pr_diff=, +=gh_issue_view=. Each is a thin schema + argv builder + delegate +to =cj/gh--run=. Tests verify argv shape for typical args. + +Manual smoke against both hosts. First real gh calls. + +** Phase 5 -- Remaining wrappers + JSON defaults + +Eight more wrappers (=gh_pr_checks=, =gh_pr_files=, +=gh_pr_current=, =gh_issue_list=, =gh_run_view=, =gh_run_list=, +=gh_run_logs_failed=, =gh_search_prs=, =gh_search_issues=, +=gh_api_get=). JSON-field defaults per wrapper. Tests for the +=gh_api_get= flag-rejection contract. + +** Phase 6 -- General tool + +=gh.el= general tool registration with =:confirm t=, blocklist +enforcement, policy decision applied before invocation. Tests +verify confirmation gate (stub gptel's confirm flow), blocked +commands never reach =cj/gh--run=, destructive commands return +=:irreversible-blocked= without prompting. + +** Phase 7 -- UX: doctor, last-error, response header + +=cj/gh-doctor=, =cj/gh-tool-last-error=, response header +formatting. Audit log writer. Defcustom for log enable. + +** Phase 8 -- Loader wiring + integration + +Add the 16 feature names to =cj/gptel-local-tool-features= (one +per wrapper + the general tool + the helpers feature). Verify +they land in =gptel-tools=. + +* Test Plan + +Target: 55-70 ERT tests across four files. No real subprocesses, +no real network, no real =~/.claude.json= (gh tools don't use it, +but the no-real-process rule applies uniformly). + +** =tests/test-gh-common.el= -- pure helpers (~25 tests) + +- =cj/gh--parse-remote-url=: ssh-scp, ssh-url, https, with/without + =.git=, with/without trailing slash, unknown host returns =:host + nil=, =github.com.evil.example= does NOT match =github.com=. +- =cj/gh--resolve-context=: explicit repo wins; explicit hostname + wins over remote; upstream beats origin; origin beats default; + default fires when no git. +- =cj/gh--validate-cwd=: HOME-rooted ok; outside HOME errors; + TRAMP errors; non-directory errors. +- =cj/gh--redact-argv=: =--token=, =-H "Authorization: ..."=, + =--figma-api-key=, =-f body=...= → body redacted; sentinel + =REDACTED_TEST_SECRET= never appears in any output of any + helper. +- =cj/gh--available-p=: nil when =executable-find= fails; t + otherwise. +- =cj/gh--version=: caches per session; floor check rejects + too-old. + +** =tests/test-gh-runner.el= -- runner contract (~15 tests) + +Stub =make-process=: +- Normal exit 0 with short output: returned verbatim, no + truncation flag. +- Long output (100 KB stub): truncated at 64 KB exactly, + truncation marker present, =:truncated t=. +- Process hangs past timeout: timer fires, SIGTERM sent, returns + =:error-kind 'timeout=. +- Exit 4 + auth stderr: returns =:error-kind 'auth= with recovery + message. +- TRAMP cwd: never invokes =make-process=, returns =:error-kind + 'remote-cwd=. +- Missing executable: returns =:error-kind 'no-executable=. +- Environment: =process-environment= includes all six required + vars before =make-process= call. +- Argv with leading "gh" stripped + logged. + +** =tests/test-gh-classifier.el= -- policy logic (~20 tests) + +- Every read verb classifies as read. +- Every write verb classifies as write. +- Every destructive pattern matches. +- Every blocked verb (=login=, =browse=, =clone=, etc.) classifies + as blocked. +- Every blocked flag (=--web=, =--paginate=, =--input=) classifies + as blocked. +- File-upload (=-F key=@/path=) classifies as blocked. +- =gh api= GET (no fields): read. +- =gh api= GET (=-f x=y=): write (auto-POST per gh's rules). +- =gh api --method GET -f q=x=: read (explicit override). +- =gh api -X DELETE=: destructive. +- =gh api -X PATCH=: write. +- =gh api --input file=: blocked. +- Unknown verb (=gh frobnicate=): unknown → confirm. +- =-H= as branch (=gh pr list -H feature=) doesn't trigger host + treatment. +- =-H= as header (=gh api -H Accept:json=) doesn't trigger host + treatment. +- Policy decision: read → auto; write → confirm; destructive → + block; blocked → block; unknown → confirm. + +** =tests/test-gh-wrappers.el= -- per-wrapper builders + schemas (~15 tests) + +- Every wrapper's schema is valid (correct =:name=, =:type=, + =:description=, =:args= shape). +- =gh_pr_view= argv with number 171 and repo "host/o/r" produces + =("pr" "view" "171" "--repo" "host/o/r" "--json" "DEFAULTS")=. +- =gh_pr_diff= rejects =format='json=. +- =gh_api_get= rejects =-f=, =-F=, =--field=, =--raw-field=, + =--input=, =--method= other than GET. +- =gh_pr_current= invokes without a number arg (uses cwd). +- =limit= clamped to per-wrapper max. +- =fields= validated against per-resource allowlist. + +** Manual smoke (every phase) + +| Phase | Smoke | +|-------+-------| +| 4 | =gh_pr_view N= against both hosts | +| 4 | =gh_pr_list= in =~/.emacs.d= → uses github.com/cjennings/dotemacs | +| 5 | =gh_pr_checks= shows CI status without full logs | +| 5 | =gh_run_logs_failed= cap kicks in on a long-failed run | +| 5 | =gh_api_get= rejects a =-f= arg with clear error | +| 6 | =gh= general tool: agent asked to merge a PR triggers GPTel confirm prompt | +| 6 | Agent asked to =gh repo delete= gets irreversible-blocked | +| 6 | Agent asked to =gh --web=... gets policy-blocked | +| 7 | =cj/gh-doctor= correctly identifies an unauthenticated host | +| 7 | =cj/gh-tool-last-error= shows debug record after a failing call | + +** Opt-in integration suite + +A small set of real-gh tests (in =tests/test-gh-integration.el= +marked =:tag :integration=, default skipped): + +- =gh auth status --hostname github.com= ok. +- =gh auth status --hostname deepsat.ghe.com= ok. +- =gh repo view cjennings/dotemacs --json name= + returns parseable JSON. +- =gh pr list --repo cjennings/dotemacs --limit 1= returns ≤ 1 + PR. + +Run manually via =make test-name TEST=gh-integration=. + +* Acceptance Criteria + +1. *Argv contract.* No tool produces a command string for execution; + every call goes through the argv-list runner. +2. *No silent writes.* Every classified write either prompts + GPTel's confirm or hard-blocks. Verified by an end-to-end + test where the agent attempts =pr merge= and the test fails if + =make-process= is invoked before confirm. +3. *In-flight cap.* A stubbed process emitting 1 MB returns + exactly 64 KB; the runner never holds more than 65 KB in + memory. +4. *Host visibility.* Every successful tool response begins with + =[gh HOST/REPO ...]=. Verified by a test that greps the + response text. +5. *Doctor coverage.* =cj/gh-doctor= correctly identifies (a) no + gh executable, (b) too-old gh, (c) unauthenticated host, (d) + non-git cwd, (e) git cwd whose remote points to an unknown + host. +6. *No secret leakage.* Test fixtures containing + =REDACTED_TEST_SECRET= in every secret-bearing slot + (=--token=, =-H Authorization=, =-f body=, etc.) produce zero + matches when grepping audit log, debug record, and any + user-facing message. + +* Risks + +** R1 -- gh CLI evolves and verbs drift + +New =gh= versions may add subcommands the blocklist doesn't +cover. Or rename verbs. + +*Mitigation:* the blocklist works on verbs (not full subcommand +paths) so most additions are caught. Doctor includes a +=gh --version= floor. Periodic review when gh bumps a major +version. + +** R2 -- The "all reads auto-execute" default may still be too broad + +Some reads expose private content (issue bodies, PR descriptions +from private repos). An agent surfacing a confidential issue +body into a saved conversation has data-leak implications. + +*Mitigation:* response header makes the host/repo visible in +every result, so the saved conversation makes the privacy +boundary auditable. Wrappers truncate body/comments by default; +agent must explicitly opt-in to include them. Documented in +commentary. + +** R3 -- The general tool's =:confirm t= prompt may become click-fatigue + +If the agent uses the general tool heavily during a write-heavy +workflow (PR creation, label management), confirming every call +becomes tedious. + +*Mitigation:* the expanded wrapper set covers most reads, so the +general tool fires mainly for writes -- where confirmation is +exactly the right behavior. If usage shows confirm fatigue, +V2's per-host policy can add =:auto= for explicit +write-confirmed contexts. + +** R4 -- =--paginate= block conflicts with legitimate large queries + +Blocking =--paginate= globally means the agent can't get +historical CI runs (which may need pagination). + +*Mitigation:* =gh_run_list= and =gh_search_*= accept a clamped +=--limit= which usually substitutes. If a use case needs more, +the agent can request multiple non-paginated pages explicitly. + +** R5 -- Token expiry surfaces as cryptic exit 4 + +When a host's keyring entry expires, every call returns exit 4 +with "authentication required" on stderr. The agent sees the +error but may not realize the fix is interactive. + +*Mitigation:* the runner's =:error-kind 'auth= mapping prepends +the recovery message before returning to the agent. =cj/gh-doctor= +proactively checks auth status. + +** R6 -- TRAMP cwd silently runs gh remotely + +Without the explicit TRAMP rejection, =process-file= would try to +spawn =gh= on the remote host (where it may not exist, or may +authenticate against the wrong keyring). + +*Mitigation:* runner checks =(file-remote-p default-directory)= +first and returns =:error-kind 'remote-cwd= with a clear message. + +** R7 -- =gh search= behaves differently on GHE + +GHE may not support every advanced search operator +=github.com= does. Search wrappers may return inconsistent +results across hosts. + +*Mitigation:* documented in =gh_search_*= wrapper descriptions. +Result header makes the host visible so the agent can adjust. + +** R8 -- Audit log grows unbounded + +One file per day, but no automatic cleanup. + +*Mitigation:* metadata-only entries are tiny (~150 bytes); a +year of heavy use is a few MB. Manual cleanup acceptable. +Defcustom to disable for users who don't want it. + +* Open Questions + +** Q1 -- Should the general tool's confirmation prompt include the classification? + +When GPTel asks "Run gh tool? (y/n)" the prompt shows the argv but +not the classification. Showing "WRITE: gh pr merge 171" gives the +user more context. Need to investigate gptel's confirm-prompt +extensibility. + +** Q2 -- Should =gh_pr_diff= cap differently from text wrappers? + +A PR diff can legitimately be 100KB+ for a large refactor. The +64KB cap is the same as everywhere else. If diffs need a higher +cap (256KB?), that's per-wrapper config. + +** Q3 -- Should wrappers expose =include-body=, =include-comments=, etc., as separate args, or as a comma-separated list? + +The spec proposes separate boolean args (=:include-body t=, +=:include-comments t=). Alternative: one =:include= comma-list +arg. Separate args are more discoverable; comma-list is more +compact. Decide during Phase 4. + +** Q4 -- Should =cj/gh-tool-audit-log= grow into a query interface? + +V1 writes one line per call. Future: a command to query the +log (=cj/gh-audit-search REGEX=) for surfacing "what did the agent +do to this PR last week?". + +* V2 Roadmap + +Items intentionally deferred: + +- *Per-host policy.* =cj/gh-host-policy= alist keyed by hostname + (mirror of the MCP spec's structure) so work GHE can be + read-only while personal allows writes-with-confirm. +- *Conversation context injection.* After a PR view, the wrapper + inserts a "GitHub context: HOST/REPO PR #N at URL" line into the + GPTel buffer so saved conversations stay traceable without + bundling full output. +- *Local/remote bridge helpers.* current-branch → PR-number, + changed-files → matching PR file comments, etc. +- *Artifact download.* =gh_run_artifacts=, =gh_release_download= + with explicit confirm and write to a controlled directory. +- *Async general tool.* =make-process= + sentinel for the cases + where 60s timeout isn't enough (rare, but real for some + =--paginate= scenarios). +- *Audit log query interface.* =cj/gh-audit-search=, + =cj/gh-audit-by-host=. +- *Profile-based tool subsets.* e.g. read-only profile + vs. write-capable profile per buffer. + +* References + +- [[file:../../gptel-tools/git_log.el][gptel-tools/git_log.el]] -- pattern reference for new tool files. +- [[file:../../modules/ai-config.el][modules/ai-config.el]] -- =gptel-confirm-tool-calls nil= at + line 386; loader at lines 71-96. +- [[file:gptel-agentic-tool-ideas.org][gptel-agentic-tool-ideas.org]] -- broader agentic-tool design; + =gh= sits alongside the MCP integration as the + collaboration tier. +- [[file:mcp-el-gptel-integration.org][mcp-el-gptel-integration.org]] -- sibling design; same + confirm-on-write pattern for safety. +- [[https://cli.github.com/manual/][gh CLI manual]] -- subcommand reference. +- =gh --version 2.92.0= help output -- verified flag semantics + per subcommand. +- =gh help environment= -- verified env-var names for non-interactive + mode. diff --git a/docs/design/mcp-el-gptel-integration.org b/docs/design/mcp-el-gptel-integration.org new file mode 100644 index 00000000..6bac77c5 --- /dev/null +++ b/docs/design/mcp-el-gptel-integration.org @@ -0,0 +1,1434 @@ +#+TITLE: Design: Wire mcp.el into GPTel for MCP server access +#+AUTHOR: Craig Jennings +#+DATE: 2026-05-16 +#+OPTIONS: toc:nil num:nil + +* Status + +Draft (revision 3). Pre-implementation; no code shipped yet. The +mcp.el package is cloned at =~/code/mcp.el/= (fork of +[[https://github.com/lizqwerscott/mcp.el][lizqwerscott/mcp.el]]) but not wired into the config. + +Revision 3 tightens seven contracts the revision-2 review flagged: + +1. *GPTel confirmation* -- =gptel-confirm-tool-calls= is =nil= at + =ai-config.el:386=, which short-circuits every per-tool + =:confirm= slot. The integration flips it to =auto= as a hard + precondition. +2. *Async timeout mechanics* -- replaced =with-timeout= (which + only supervises dynamic extent) with an explicit timer/callback + race for async tool calls. +3. *Startup completion semantics* -- the hub's completion callback + is opportunistic, not authoritative; the stall timer + polling + =mcp-server-connections= is the source of truth. +4. *Server identity at registration* -- walk + =mcp-server-connections= directly instead of parsing + =:category mcp-SERVER= out of =mcp-hub-get-all-tool=. +5. *Server enablement* -- =cj/mcp-enabled-servers= defcustom lets + users disable a server without writing code. Profiles still + deferred. +6. *Keymap pinned* -- =C-; a C= (Connect) is the MCP subprefix. + =M= (=gptel-menu=) and =m= (=cj/gptel-change-model=) stay + where they are. +7. *mcp.el private-API isolation* -- a compat layer wraps every + =mcp--*= call so version drift surfaces in one place. + +Plus several smaller changes: every MCP tool registers async, +description normalization adds a server-name prefix and a write +risk note, =cj/mcp-start-on-entry-points= defcustom scopes +startup triggers (default: full chat only), TRAMP processes +local-only, doctor gains live-auth-check, =cj/mcp-wait-until-ready= +command added, audit buffer surfaces failed servers prominently. + +* Problem + +GPTel exposes ten local tools today (=read_buffer=, =read_text_file=, +=write_text_file=, =update_text_file=, =list_directory_files=, +=move_to_trash=, =git_status=, =git_log=, =git_diff=, =web_fetch= -- +see =gptel-tools/=). Claude Code, by contrast, has access to nine +external MCP servers (linear, notion, figma, slack-deepsat, +google-calendar, google-docs-personal, google-docs-work, drawio, +google-keep), each exposing 10-70 additional tools. + +The asymmetry means agentic work done in GPTel can't touch the same +external systems Claude Code can. Wiring [[https://github.com/lizqwerscott/mcp.el][mcp.el]] into the config +closes the gap: GPTel gains access to every MCP server Claude Code +uses, modulo three claude.ai-hosted servers whose OAuth is bound to +the Claude.ai session (see Non-Goals). + +* Goals + +1. GPTel sees every tool from the enabled subset of nine reusable + MCP servers in =gptel-menu=, grouped by server via the tool's + =:category= field. +2. Servers spawn *asynchronously*. Opening GPTel never blocks on + MCP startup; tools arrive incrementally and =gptel-tools= updates + as each server reports its inventory. =cj/toggle-gptel= must + return without waiting for any MCP subprocess. +3. Write/destructive MCP tools are gated by a confirmation prompt + the user actually sees. Two preconditions: + =gptel-confirm-tool-calls= is set to =auto= (so the per-tool + =:confirm= slot is honored), and write/destructive tools are + registered with =:confirm t=. Read-only tools execute without + confirmation. +4. Secrets stay in =~/.claude.json= (single source of truth, shared + with Claude Code). The Emacs config reads env vars from there + at server-spawn time, with an mtime-aware cache. Secrets are + never echoed to status, errors, hub buffers, or tests. +5. A per-server status alist tracks each server's lifecycle (idle / + starting / ready / failed / stopped) and is inspectable via + =cj/mcp-status= and a =cj/mcp-list-tools= audit buffer. +6. Server-management commands live under a =C-; a C= (Connect) + subprefix so existing GPTel keys (=C-; a M=, =m=) aren't + disturbed. +7. A failed server (network down, OAuth token expired, npx package + 404) is surfaced clearly via the OAuth-recovery pattern matcher + and does not block GPTel itself. Successful servers' tools are + available immediately; failed servers' tools are absent (not + stale). +8. The config can swap between MELPA mcp.el and the local + =~/code/mcp.el/= checkout with a one-line uncomment, gated by a + capability check that asserts required API functions exist. +9. A first-run =cj/mcp-doctor= command diagnoses missing + prerequisites (=npx=, =uvx=, =~/.claude.json=, per-server + commands, known local endpoints) and optionally runs a + live-auth probe before they fail at runtime. + +* Non-Goals + +- The three claude.ai-hosted MCP servers (Gmail / Drive / Calendar + served from =*.googleapis.com/mcp/v1=). Their OAuth is issued + by the Claude.ai session and is not transferable to GPTel. +- *MCP resources and prompts.* v1 registers tools only. + Resource browsing and prompt invocation are tracked as + follow-ups; the local checkout has the API surface ready. +- *Per-conversation tool profiles.* v1 ships + =cj/mcp-enabled-servers= for whole-server enable/disable; + profiles (different tool subsets per chat) wait for v1.5 once + usage shows whether they're needed. +- *Auth-source migration.* Deferred until the OAuth re-auth flow + for expiring tokens is understood. Tracked in Open Questions + §Q3. +- *Automated OAuth re-auth when tokens expire.* Out of scope; the + user re-authenticates via Claude Code, and the next GPTel + invocation picks up the refreshed values from =~/.claude.json=. +- *Modifying mcp.el itself in this repo.* Upstream patches and + tests live in =~/code/mcp.el/= and ship via PRs to lizqwerscott's + master. + +* Verified API Contracts + +These were checked against the actual source before each revision. +Behavior summarized here so implementation can rely on it. + +** GPTel confirmation gating (=gptel.el:2244=) + +The confirmation check is: + +#+begin_src emacs-lisp +(if (and gptel-confirm-tool-calls + (or (eq gptel-confirm-tool-calls t) + (gptel-tool-confirm tool-spec))) + ;; ask user + ...) +#+end_src + +When =gptel-confirm-tool-calls= is =nil=, the =(and ...)= +short-circuits and the tool's =:confirm= slot is ignored. + +The defcustom default is ='auto=, which "seeks confirmation only +when the corresponding tool spec has a non-nil :confirm slot" +(=gptel.el:1601-1603=). =ai-config.el:386= currently sets it to +=nil=. + +*Implementation consequence:* =ai-mcp.el= must =setq +gptel-confirm-tool-calls 'auto= as part of its setup (and +=ai-config.el= drops the explicit =nil= setting). Without this, +write-gated tools register =:confirm t= and gptel ignores it. + +** mcp-hub callback ownership (=~/code/mcp.el/mcp-hub.el:53-90=) + +=mcp-hub--start-server= unconditionally appends its own six +callbacks (=:initial-callback=, =:tools-callback=, +=:prompts-callback=, =:resources-callback=, +=:resources-templates-callback=, =:error-callback=) to whatever +the caller passes. Per-server custom callbacks in the alist +result in duplicate keyword arguments to =mcp-connect-server= -- +behavior implementation-defined. + +*Implementation consequence:* the integration does not slip custom +callbacks through =mcp-hub-servers=. It uses +=mcp-hub-start-all-server='s top-level completion callback as an +opportunistic signal, walks =mcp-server-connections= directly for +authoritative state, and uses a stall timer as the deadline. + +** mcp-hub-start-all-server completion semantics + +The hub's completion callback (=mcp-hub-start-all-server='s +=CALLBACK= argument) fires when its internal counter reaches the +total server count. The counter increments: + +- On immediate Elisp errors from =mcp-hub--start-server=. +- When the =:inited-callback= passed to =mcp-hub--start-server= + fires, which happens inside the hub's =:tools-callback=. + +Async error paths flow through =:error-callback= -- which the hub +also installs but does *not* obviously chain into the inited +callback. Servers without tools may not pass through the tools +callback in the same way. + +*Implementation consequence:* the callback is treated as an +opportunistic readiness signal, *not* as "all initialized or +failed". The authoritative state comes from polling +=mcp-server-connections= (each entry has =mcp--status= of +=connected= / =error= / =starting=) and from the stall timer +deadline. + +** gptel-make-tool registration semantics (=gptel.el:1729-1820=) + +=gptel-make-tool= registers the tool into =gptel--known-tools= +keyed by category + name. It does *not* add the tool to +=gptel-tools= (the per-buffer active list). The existing local +tools (=gptel-tools/git_log.el:97=) explicitly do: + +#+begin_src emacs-lisp +(gptel-make-tool ...) +(add-to-list 'gptel-tools (gptel-get-tool '("category" "name"))) +#+end_src + +*Implementation consequence:* the registration pipeline does +both calls per tool, tracks tool names per server in a hash, and +deregisters cleanly on restart/stop without disturbing local +tools. + +** MELPA vs local checkout + +The local checkout (=~/code/mcp.el/=, tip =f10768e=) has HTTP +transport (=mcp-http-process-connection=) and recent UX +improvements (resource reading, imenu support, detail mode). +MELPA parity not yet verified. + +*Implementation consequence:* =cj/mcp--assert-capabilities= +checks for required functions at load time and signals a clear +=user-error= if missing. Use-package block defaults to MELPA; +the local-checkout =:load-path= line stays commented until the +capability check tells us MELPA is missing something. + +* GPTel Confirmation Contract + +The single most consequential precondition for the safety story: + +** Current state + +=modules/ai-config.el:386= sets =gptel-confirm-tool-calls= to +=nil=. This was a deliberate "allow tool access by default" +choice when only the ten local tools existed -- all of which are +either read-only (git_log, git_status, list_directory_files, etc.) +or already wrap their own confirm prompts (web_fetch uses +=:confirm t= but is ignored under the current setting; this was +acceptable because the only "real" tool there is on a user-typed +URL). + +** Required state + +The MCP integration cannot ship without flipping this to =auto=. +Specifically, =modules/ai-mcp.el= must: + +1. =setq gptel-confirm-tool-calls 'auto= during its load. +2. Audit the existing local tools and add =:confirm t= to any + that should be gated. =web_fetch= is the obvious candidate; + =write_text_file=, =update_text_file=, =move_to_trash= may + also warrant it depending on Craig's preference. + +The existing =ai-config.el:386= line is removed. A comment +points readers at =ai-mcp.el= for the new value. + +** Verification test + +A test in =tests/test-ai-mcp-confirm-contract.el= asserts: + +- After =ai-mcp= loads, =gptel-confirm-tool-calls= is ='auto=. +- A write-classified MCP tool registered with =:confirm t= takes + the confirmation branch in =gptel-send='s tool-dispatch code + (verified by stubbing gptel's confirm-prompt and checking it + fires). +- A read-classified MCP tool registered with =:confirm nil= does + not take the confirmation branch. +- Local =git_log= (=:confirm nil=) still runs without prompting. + +* Current State + +** =modules/ai-config.el= + +- =use-package gptel= block at lines 363-414, defer-loaded on the + =gptel= / =gptel-send= / =gptel-menu= commands. +- =gptel-confirm-tool-calls nil= at line 386. Removed by this + integration; see § GPTel Confirmation Contract. +- =cj/gptel-load-local-tools= (lines 71-96) loads the ten local + tools from =gptel-tools/=. +- =cj/toggle-gptel= (lines 418-441) is the primary entry point + (=C-; a t=). Other entry points: =cj/gptel-quick-ask= + (=C-; a q=), =gptel-magit-commit-generate= (=g= in magit), + =cj/gptel-rewrite-with-directive= (=C-; a r=), =gptel-send= + (=C-RET= in gptel buffer). +- =cj/ai-keymap= (lines 510-528) currently uses keys A B M d . f + b l m p q r R c s t x. =C= (uppercase) is free and becomes the + MCP subprefix. + +** =gptel-tools/= + +Ten =.el= files. =git_log.el= is the closest analogue; +=web_fetch.el= demonstrates the =:confirm t= pattern. + +** =~/.claude.json= + +Mode 0600, ~75 KB. Top-level =mcpServers= key holds the nine +servers we want. Env-var names per server (values redacted): + +| Server | Env vars | +|----------------------+-------------------------------------------------------| +| google-calendar | =GOOGLE_OAUTH_CREDENTIALS= | +| google-docs-personal | =GOOGLE_CLIENT_ID=, =GOOGLE_CLIENT_SECRET=, =GOOGLE_MCP_PROFILE= | +| google-docs-work | Same three vars (different values) | +| google-keep | =GOOGLE_EMAIL=, =GOOGLE_MASTER_TOKEN= | + +* Design + +** Module split + +Implementation lives in =modules/ai-mcp.el=. =modules/ai-config.el= +gains only autoload declarations and the =C-; a C= subprefix +wiring. + +** Code organization outline (=ai-mcp.el=) + +The file is organized in seven sections so it stays readable as +features land: + +1. *Constants and defcustoms* -- =cj/mcp-server-specs=, + =cj/mcp-claude-config=, =cj/mcp-enabled-servers=, + =cj/mcp-start-on-entry-points=, =cj/mcp-startup-timeout=, + =cj/mcp-tool-timeout=, =cj/mcp-tool-confirm-overrides=, + audit-log defcustoms. +2. *Public commands* -- =cj/mcp-ensure-started=, =cj/mcp-hub=, + =cj/mcp-status=, =cj/mcp-list-tools=, + =cj/mcp-restart-failed=, =cj/mcp-restart-server=, + =cj/mcp-stop-all=, =cj/mcp-doctor=, + =cj/mcp-wait-until-ready=. +3. *Pure helpers* -- Claude config reader, =cj/mcp--build-server-alist=, + =cj/mcp--redact=, =cj/mcp--confirm-p=, + =cj/mcp--normalize-description=. +4. *mcp.el compatibility layer* -- 3-5 wrappers around private + API (=mcp--status=, =mcp--tools=, etc.). Single source of + version-drift risk. +5. *Registration pipeline* -- =cj/mcp--register-tool=, + =cj/mcp--register-server-tools=, + =cj/mcp--deregister-server-tools=, + =cj/mcp--registered-tools= hash. +6. *Async state machine* -- =cj/mcp--state=, + =cj/mcp--server-status=, =cj/mcp--on-all-started=, + =cj/mcp--stall-timer=, =cj/mcp--poll-status=. +7. *UI* -- audit-buffer mode, doctor buffer, recovery-pattern + matcher, response prefixes. + +This explicit outline doubles as the file's table of contents in +its commentary block. + +** Server inventory: data first + +The nine servers are described as a defconst of plists, with no +secrets baked in: + +#+begin_src emacs-lisp +(defconst cj/mcp-server-specs + '((:name "linear" + :transport http + :url "https://mcp.linear.app/mcp" + :auth in-protocol + :risk write-capable) + (:name "notion" + :transport http + :url "https://mcp.notion.com/mcp" + :auth in-protocol + :risk write-capable) + (:name "figma" + :transport stdio + :command "npx" + :args ("-y" "figma-developer-mcp" "--stdio") + :secret-args ("--figma-api-key" :figma-api-key) + :auth args-token + :risk arg-leak) + (:name "slack-deepsat" + :transport sse + :url "http://127.0.0.1:13080/sse" + :auth local + :risk write-capable) + (:name "drawio" + :transport stdio + :command "npx" + :args ("-y" "@drawio/mcp") + :auth none + :risk none) + (:name "google-calendar" + :transport stdio + :command "npx" + :args ("-y" "@cocal/google-calendar-mcp") + :env (:GOOGLE_OAUTH_CREDENTIALS t) + :auth oauth + :risk write-capable) + (:name "google-docs-personal" + :transport stdio + :command "npx" + :args ("-y" "@a-bonus/google-docs-mcp") + :env (:GOOGLE_CLIENT_ID t :GOOGLE_CLIENT_SECRET t :GOOGLE_MCP_PROFILE t) + :auth oauth + :risk write-capable) + (:name "google-docs-work" + :transport stdio + :command "npx" + :args ("-y" "@a-bonus/google-docs-mcp") + :env (:GOOGLE_CLIENT_ID t :GOOGLE_CLIENT_SECRET t :GOOGLE_MCP_PROFILE t) + :auth oauth + :risk write-capable) + (:name "google-keep" + :transport stdio + :command "uvx" + :args ("--from" "keep-mcp" "python" "-m" "server.cli") + :env (:GOOGLE_EMAIL t :GOOGLE_MASTER_TOKEN t) + :auth token + :risk write-capable))) +#+end_src + +The same data drives the doctor check list, status labels, and +recovery messages -- a single source of truth keeps them from +drifting. + +** Server enablement + +#+begin_src emacs-lisp +(defcustom cj/mcp-enabled-servers + (mapcar (lambda (s) (plist-get s :name)) cj/mcp-server-specs) + "List of MCP server names to start. +Defaults to every server in `cj/mcp-server-specs'. Set to a +shorter list to disable specific servers without editing the +spec. Changes take effect on next `cj/mcp-restart-failed' or +Emacs restart." + :type '(repeat string) + :group 'cj) +#+end_src + +=cj/mcp--build-server-alist= filters by this list before +returning. A user who wants only =linear= and =drawio= sets: + +#+begin_src emacs-lisp +(setq cj/mcp-enabled-servers '("linear" "drawio")) +#+end_src + +This is the answer to "100+ tools is overwhelming" without +needing per-conversation profiles. + +** Entry-point policy + +Not every GPTel entry point should trigger MCP startup. Quick +ask, rewrite, and magit commit-message generation are +lightweight; spinning up nine subprocesses for a 50-word commit +message is surprising overhead. + +#+begin_src emacs-lisp +(defcustom cj/mcp-start-on-entry-points + '(toggle-gptel) + "GPTel entry points that trigger MCP startup. +Symbols correspond to commands: `toggle-gptel', `gptel-send', +`gptel-quick-ask', `gptel-rewrite-with-directive', +`gptel-magit-generate-message'. Default: only full chat +(`toggle-gptel')." + :type '(repeat symbol) + :group 'cj) +#+end_src + +Each entry-point command checks membership before calling +=cj/mcp-ensure-started=: + +#+begin_src emacs-lisp +(defun cj/toggle-gptel () + ... + (when (memq 'toggle-gptel cj/mcp-start-on-entry-points) + (cj/mcp-ensure-started)) + ...) +#+end_src + +** Claude config reader (mtime-cached, structured returns) + +#+begin_src emacs-lisp +(defcustom cj/mcp-claude-config + (expand-file-name "~/.claude.json") + "Path to the Claude Code config that holds MCP server env vars." + :type 'file + :group 'cj) + +(defvar cj/mcp--config-cache nil + "Cons of (MTIME . PARSED) for `cj/mcp-claude-config'.") + +(defun cj/mcp--read-claude-config () + "Return a structured result describing the Claude config state. +Result shape: + (:ok t :data PLIST) + (:ok nil :reason missing-file) + (:ok nil :reason unreadable) + (:ok nil :reason malformed-json :message STR) +Cached by mtime; subsequent calls reparse only on change." + ...) +#+end_src + +** mcp.el compatibility layer + +All private-API access lives in 3-5 helpers documented with the +upstream commit they target. This is the only file that touches +=mcp--*= names; everything else calls these wrappers. + +#+begin_src emacs-lisp +;; ai-mcp-compat -- isolates private mcp.el API. +;; Verified against upstream commit f10768e (2026-05-16). + +(defun cj/mcp--server-status (connection) + "Return CONNECTION's lifecycle status: connected, error, starting." + (mcp--status connection)) + +(defun cj/mcp--server-tools (connection) + "Return CONNECTION's discovered tool list (plists)." + (mcp--tools connection)) + +(defun cj/mcp--server-name (connection) + "Return CONNECTION's logical server name." + (jsonrpc-name connection)) + +(defun cj/mcp--assert-capabilities () + "Signal `user-error' if any required mcp.el function is missing." + (dolist (fn '(mcp-connect-server mcp-make-text-tool + mcp-hub mcp-hub-start-all-server + mcp-hub-get-all-tool mcp-server-connections)) + (unless (fboundp fn) + (user-error "mcp.el too old; missing %s. Upgrade or switch \ +to local checkout in `ai-mcp.el' use-package block" fn)))) +#+end_src + +If mcp.el renames a slot or changes a return shape, only these +helpers break. Tests cover each helper against stub objects. + +** Startup model: async + state machine + polling + +Three state structures capture lifecycle: + +#+begin_src emacs-lisp +(defvar cj/mcp--state 'idle + "Overall MCP integration state: idle, starting, partial, ready, failed.") + +(defvar cj/mcp--server-status nil + "Alist mapping server name to status plist: + (:state STATE :tool-count N :tools (NAME ...) :last-error STR + :started-at TIME :ready-at TIME)") + +(defvar cj/mcp--stall-timer nil + "Timer guarding against servers that never call back.") +#+end_src + +=cj/mcp-ensure-started= is the only entry point consumers call: + +#+begin_src emacs-lisp +(defun cj/mcp-ensure-started () + "Schedule MCP startup if it hasn't run yet this session. +Returns immediately. Servers spawn asynchronously." + (when (eq cj/mcp--state 'idle) + (setq cj/mcp--state 'starting) + (cj/mcp--assert-capabilities) + (cj/mcp--build-status-from-specs) + (setq mcp-hub-servers (cj/mcp--build-server-alist)) + (message "MCP: starting %d server(s) in background..." + (length mcp-hub-servers)) + ;; The hub callback is opportunistic. We poll status on each + ;; tick and the stall timer is the authoritative deadline. + (mcp-hub-start-all-server #'cj/mcp--on-hub-callback nil nil) + (cj/mcp--start-stall-timer))) +#+end_src + +State transitions and authority: + +- *Hub completion callback (opportunistic).* Triggers an + immediate poll + registration pass for whatever's ready. Does + not signal completion by itself. +- *Stall timer (authoritative deadline).* After + =cj/mcp-startup-timeout= (default 30 s), marks every server + still in =starting= state as =failed= with reason =timeout=, + registers tools from servers that did become ready, transitions + =cj/mcp--state= to its final value (=ready=, =partial=, or + =failed=). +- *Polling (authoritative state).* =cj/mcp--poll-status= walks + =mcp-server-connections= and maps each entry's =mcp--status= to + =cj/mcp--server-status=. Called from the hub callback and from + the stall timer. Servers that transitioned through + =:error-callback= (which the hub doesn't chain into the inited + callback) show up here. + +Properties: + +- =cj/mcp-ensure-started= returns in <100 ms regardless of + subprocess state. +- =mcp-hub-start-all-server= itself is async (third =SYNCP= arg + is =nil=). +- Servers transition independently; tools land in =gptel-tools= + as each server reports inventory. +- A server that never connects, never errors, and never reports + tools is caught by the stall timer. + +** Tool registration pipeline + +Walks =mcp-server-connections= directly, per server, after each +status poll. This gives clean per-server bookkeeping without +parsing the =mcp-SERVER= category prefix: + +#+begin_src emacs-lisp +(defun cj/mcp--register-server-tools (server-name) + "Register every tool from the connected server SERVER-NAME. +Idempotent: re-registration replaces the function pointer +without duplicating menu entries." + (let ((connection (gethash server-name mcp-server-connections))) + (when (and connection + (eq (cj/mcp--server-status connection) 'connected)) + ;; First, deregister any existing tools for this server. + (cj/mcp--deregister-server-tools server-name) + (dolist (raw-tool (cj/mcp--server-tools connection)) + (cj/mcp--register-tool server-name raw-tool)) + (cj/mcp--update-server-status server-name :state 'ready)))) + +(defun cj/mcp--register-tool (server-name raw-tool) + "Register one tool from SERVER-NAME. +RAW-TOOL is the plist from `mcp--tools' (untransformed)." + (let* ((remote-name (plist-get raw-tool :name)) + (gptel-name (format "mcp__%s__%s" server-name remote-name)) + (description (cj/mcp--normalize-description + server-name raw-tool)) + (confirm-p (cj/mcp--confirm-p gptel-name remote-name)) + ;; mcp-make-text-tool builds the closure; we async by default. + (mcp-plist (mcp-make-text-tool server-name remote-name t)) + ;; Rewrite name + description + confirm after mcp.el builds the closure. + (gptel-plist (cj/mcp--rewrite-plist + mcp-plist + :name gptel-name + :description description + :confirm confirm-p + :async t + :category (format "mcp-%s" server-name)))) + (apply #'gptel-make-tool gptel-plist) + (add-to-list 'gptel-tools + (gptel-get-tool + (list (format "mcp-%s" server-name) gptel-name))) + (push gptel-name + (gethash server-name cj/mcp--registered-tools)))) +#+end_src + +Key properties: + +- *Async by default.* All MCP tools register with =:async t=. + This avoids any sync MCP tool call blocking Emacs during + =gptel-send='s tool dispatch. Per-call timeout uses the + timer-race pattern (next subsection). +- *Closure preserves remote name.* =mcp-make-text-tool= built + the function before we rewrote =:name=, so the closure calls + =mcp-call-tool SERVER REMOTE-NAME=, not the prefixed + =mcp__SERVER__TOOL=. +- *Idempotent.* Each registration deregisters first, so + callbacks firing multiple times or restarts don't accumulate + duplicate entries. +- *Description normalization.* See next subsection. + +** Per-call timeout: explicit timer/callback race + +=with-timeout= only supervises the dynamic extent of the form it +wraps. For async tools where the function returns immediately +and the callback fires later, it does nothing. The correct +pattern is an explicit timer: + +#+begin_src emacs-lisp +(defun cj/mcp--wrap-async-with-timeout (server-name remote-name + mcp-callback) + "Return a gptel-compatible async wrapper for an MCP tool. +GPTel calls the returned function with (CALLBACK . ARGS). +The wrapper races MCP's response against `cj/mcp-tool-timeout'; +whichever fires first wins, and late callbacks are ignored." + (lambda (gptel-callback &rest args) + (let* ((done nil) + (timer (run-at-time cj/mcp-tool-timeout nil + (lambda () + (unless done + (setq done t) + (funcall gptel-callback + (format "MCP tool %s/%s \ +timed out after %ds" + server-name + remote-name + cj/mcp-tool-timeout))))))) + (mcp-async-call-tool + (gethash server-name mcp-server-connections) + remote-name + (cj/mcp--args-to-plist args) + (lambda (result) + (cancel-timer timer) + (unless done + (setq done t) + (funcall gptel-callback + (mcp--parse-tool-call-result result)))) + (lambda (code message) + (cancel-timer timer) + (unless done + (setq done t) + (funcall gptel-callback + (format "MCP error %s: %s" code + (cj/mcp--redact message))))))))) +#+end_src + +Both branches set =done= before invoking gptel's callback so a +late response (e.g., MCP responds after the timer fired but +before its sentinel cancels) doesn't deliver twice. Timer is +canceled on the success and error paths. + +The closure that =mcp-make-text-tool= built gets replaced with +this wrapper during registration (the =:function= slot of the +rewritten plist). + +** Confirmation / safety policy + +All enabled tools are registered (per the goal of full +visibility), but write/destructive tools get =:confirm t=. +Classification is name-based: + +#+begin_src emacs-lisp +(defconst cj/mcp--write-name-patterns + '("\\`create\\b" "\\`update\\b" "\\`delete\\b" "\\`remove\\b" + "\\`send\\b" "\\`post\\b" "\\`add\\b" "\\`move\\b" + "\\`invite\\b" "\\`share\\b" "\\`upload\\b" "\\`set\\b" + "\\`patch\\b" "\\`import\\b" "\\`sync\\b" "\\`merge\\b" + "\\`close\\b" "\\`reopen\\b" "\\`archive\\b" "\\`unarchive\\b" + "\\`approve\\b" "\\`reject\\b" "\\`label\\b" "\\`assign\\b" + "\\`reply\\b" "\\`comment\\b" "\\`trash\\b" "\\`restore\\b" + "\\`pin\\b" "\\`unpin\\b" "\\`copy\\b" "\\`rename\\b")) + +(defconst cj/mcp--read-name-patterns + '("\\`get\\b" "\\`list\\b" "\\`read\\b" "\\`search\\b" + "\\`find\\b" "\\`fetch\\b" "\\`view\\b" "\\`query\\b" + "\\`describe\\b" "\\`show\\b" "\\`check\\b")) + +(defcustom cj/mcp-tool-confirm-overrides nil + "Per-tool confirmation overrides. +Alist mapping fully qualified MCP tool name (e.g., +\"mcp__linear__create_issue\") to t or nil. Wins over the +pattern-based classifier." + :type '(alist :key-type string :value-type boolean) + :group 'cj) + +(defun cj/mcp--confirm-p (gptel-name remote-name) + "Return non-nil if the tool should register with `:confirm t'." + (let ((override (assoc gptel-name cj/mcp-tool-confirm-overrides))) + (cond + (override (cdr override)) + ((seq-some (lambda (p) (string-match-p p remote-name)) + cj/mcp--write-name-patterns) t) + ((seq-some (lambda (p) (string-match-p p remote-name)) + cj/mcp--read-name-patterns) nil) + (t t)))) ; unknown → confirm +#+end_src + +This is the second half of the safety story; the first half +(=gptel-confirm-tool-calls 'auto=) is enforced by ai-mcp.el at +load time. + +** Description normalization + +mcp-side descriptions are written for an arbitrary client and +vary in quality. The registration pipeline prefixes a stable +"server / write-risk" header so the agent and the user have +consistent context: + +#+begin_src emacs-lisp +(defun cj/mcp--normalize-description (server-name raw-tool) + "Return a normalized description string for RAW-TOOL. +Prefix: [SERVER] for reads, [SERVER WRITE] for writes, +[SERVER ?] for unknown classification. Then the upstream +description, unchanged." + (let* ((remote-name (plist-get raw-tool :name)) + (upstream (or (plist-get raw-tool :description) + "(no description provided by server)")) + (cls (cond + ((seq-some (lambda (p) (string-match-p p remote-name)) + cj/mcp--write-name-patterns) "WRITE") + ((seq-some (lambda (p) (string-match-p p remote-name)) + cj/mcp--read-name-patterns) "") + (t "?")))) + (format "[%s%s] %s" server-name + (if (string-empty-p cls) "" (concat " " cls)) + upstream))) +#+end_src + +Tools in =gptel-menu= show as: + +#+begin_example +[linear WRITE] Create a new Linear issue in a team. +[linear] List issues in a Linear team. +[google-keep ?] Frobnicate a note (unknown classification). +#+end_example + +** Tool deregistration + +Three triggers remove tools from =gptel-tools=: + +- =cj/mcp-stop-all= -- removes every MCP-registered tool, clears + =cj/mcp--registered-tools=, calls =mcp-stop-server= per server. + Local tools untouched. +- =cj/mcp-restart-server= -- removes tools for the named server + before re-registering. +- =cj/mcp-restart-failed= -- deregister + re-register each + =failed= server. + +#+begin_src emacs-lisp +(defun cj/mcp--deregister-server-tools (server-name) + "Remove every GPTel tool this integration registered for SERVER-NAME. +Local tools (those not in `cj/mcp--registered-tools') are +preserved." + (let ((mcp-tools (gethash server-name cj/mcp--registered-tools))) + (dolist (tool-name mcp-tools) + (setq gptel-tools + (cl-remove-if (lambda (tool) + (string= (gptel-tool-name tool) tool-name)) + gptel-tools))) + (remhash server-name cj/mcp--registered-tools))) +#+end_src + +** Secrets redaction + +=cj/mcp--redact= masks every secret-bearing field before any +string surfaces in messages, errors, status buffers, hub +displays, or test fixtures. Used by status formatting, audit +buffer, failure surfacing, and error wrappers. Tests assert +sentinel =REDACTED_TEST_SECRET= never appears in any user-facing +output. + +** Process cleanup + +=kill-emacs-hook= gets one entry: =cj/mcp-stop-all=. Stdio +process sentinels record abnormal exits into the per-server +status. + +*Local-only constraint.* MCP server processes are always spawned +under the local Emacs's =default-directory='s root. TRAMP / +remote buffers do not relocate the spawn; MCP processes are +local-only. This is enforced implicitly (mcp-hub-start-all-server +uses =make-process= which is local) and documented in the +commentary. + +** Privacy: external tool output in saved conversations + +MCP tool results land in the GPTel buffer. GPTel's autosave (when +enabled) persists those results to +=~/.emacs.d/ai-conversations/=. Concrete implications: + +- A Slack channel excerpt the agent fetched is now on disk. +- A Google Docs snippet the agent quoted is now on disk. +- A Linear issue body the agent read is now on disk. + +This is normal external-tool behavior, but users may not realize +it. Two mitigations: + +- =ai-mcp.el='s commentary explicitly documents the autosave + privacy implication. +- The audit buffer (=cj/mcp-list-tools=) includes a header note: + "Tool results land in =gptel-tools= responses; saved + conversations persist them. Use =cj/gptel-autosave-toggle= per + buffer to opt out." + +A future enhancement (V1.5+) could mark MCP-sourced tool output +with a visible delimiter in the GPTel buffer so the user sees +"this came from an external server" during the chat. Not v1. + +** Per-server auth matrix + +| =:auth= value | Servers | How it works | Recovery if failed | +|---------------+---------+--------------+--------------------| +| =in-protocol= | linear, notion | mcp.el HTTP transport handles OAuth handshake | Open URL surfaced in =cj/mcp-status= | +| =local= | slack-deepsat | Local SSE; no auth but proxy must run | Start the local proxy | +| =none= | drawio | No auth | n/a | +| =args-token= | figma | API key in process args | Update Claude config, restart | +| =oauth= | google-calendar, google-docs-* | OAuth token in env; refresh out-of-band | Re-auth via Claude Code, restart | +| =token= | google-keep | Long-lived token in env | Regenerate, update Claude config, restart | + +Recovery surfaces via pattern matching on errors: + +#+begin_src emacs-lisp +(defconst cj/mcp--recovery-patterns + '(("\\(401\\|unauthorized\\|token expired\\)" + . "Token expired -- re-auth via Claude Code, then C-; a C r SERVER") + ("\\(connection refused\\|ECONNREFUSED\\)" + . "Local endpoint unreachable -- check the upstream service is running") + ("\\(ENOENT\\|command not found\\)" + . "Missing dependency -- run `cj/mcp-doctor' to diagnose"))) +#+end_src + +** Timeouts + +| Timeout | Variable | Default | Behavior | +|---------+----------+---------+----------| +| Startup / tool discovery | =cj/mcp-startup-timeout= | 30 s | Server marked =failed/timeout=; integration continues with ready servers | +| Per-call tool execution | =cj/mcp-tool-timeout= | 60 s | Tool call returns timeout string to agent via the timer-race wrapper; server stays connected | + +Both via defcustoms. Per-tool override possible via +=cj/mcp-tool-timeout-overrides= alist. + +* Status UX + +** Echo-area summary: =cj/mcp-status= + +Single-line summary keyed off =cj/mcp--state=: + +| State | Echo | +|-------+------| +| idle | =MCP: not started. (C-; a t triggers it.)= | +| starting | =MCP: starting (3/9 ready)...= | +| partial | =MCP: 7/9 ready (failed: google-calendar, slack-deepsat).= | +| ready | =MCP: 9/9 ready, 87 tools registered.= | +| failed | =MCP: all 9 servers failed. C-; a C d to diagnose.= | + +** Wait until ready: =cj/mcp-wait-until-ready= + +For the case where the agent asks for a Calendar tool immediately +after =cj/toggle-gptel=: + +#+begin_src emacs-lisp +(defun cj/mcp-wait-until-ready (&optional timeout) + "Block until MCP startup completes or TIMEOUT seconds pass. +TIMEOUT defaults to `cj/mcp-startup-timeout'. Returns the final +state symbol (`ready', `partial', `failed', `starting')." + (interactive) + ...) +#+end_src + +Bound to =C-; a C w=. Reports progress every second via +=message= so the user sees countdown. + +** Audit buffer: =cj/mcp-list-tools= + +Tabulated-list buffer. Failed servers appear at the top with a +red face so they're visually obvious. Each row: + +#+begin_example +Server State Tools Confirm Description +-------------------- -------- ----- ------- ------------------------------ +google-calendar FAILED 0 - Token expired; see status +slack-deepsat FAILED 0 - Local proxy unreachable +-------------------- -------- ----- ------- ------------------------------ +linear/list_issues ready - no List issues in a Linear team +linear/create_issue ready - YES Create a new Linear issue +linear/... ready 44 total +... +#+end_example + +Sort: failed servers first, then by server name, then by tool +name. Keys: =g= refresh, =RET= jump to tool's category in +gptel-menu, =r= restart server under point, =c= toggle confirm +override for tool under point. + +* Commands & Keymap + +=C-; a C= becomes the MCP (Connect) subprefix. Existing keys are +preserved: =M= keeps =gptel-menu=, =m= keeps +=cj/gptel-change-model=. + +| Key | Command | Purpose | +|-----+---------+---------| +| =C-; a C h= | =cj/mcp-hub= | Open server-management buffer | +| =C-; a C s= | =cj/mcp-status= | Echo state summary | +| =C-; a C l= | =cj/mcp-list-tools= | Open audit buffer | +| =C-; a C r= | =cj/mcp-restart-failed= | Restart failed servers | +| =C-; a C R= | =cj/mcp-restart-server= | Restart a named server | +| =C-; a C S= | =cj/mcp-stop-all= | Stop everything | +| =C-; a C d= | =cj/mcp-doctor= | Diagnose prerequisites | +| =C-; a C w= | =cj/mcp-wait-until-ready= | Block until ready | + +which-key labels mirror the table. + +** cj/mcp-doctor + +Diagnostic command. Two modes: + +- *Static* (default) -- no side effects, no network: capability + check, =npx=/=uvx= on PATH, Claude config parseability, + per-server env-var presence, local endpoint reachability. +- *Live* (=C-u C-; a C d=) -- opt-in: invokes a single safe read + per auth class to verify OAuth tokens haven't silently expired. + For example, =gh_search= against linear is one tool call; same + for notion, google-calendar, google-docs, google-keep. Static + checks first; live probe only fires if static passes. + +Output buffer keys: + +- =c= copy diagnostic summary to kill ring (for pasting into + bug reports / notes). +- =r= rerun all failed checks. +- =q= quit. + +Each check row formats as: =PASS / FAIL / WARN CHECK RECOVERY=. + +* Implementation Plan + +Eight phases (was seven in rev 2; added Phase 1.5 for the +confirmation-contract setup). Each ends with green ERT tests + +a manual smoke test before the next. + +** Phase 1 -- Module + pure helpers + +Create =modules/ai-mcp.el=. Implement: =cj/mcp-server-specs=, +=cj/mcp-enabled-servers=, =cj/mcp-start-on-entry-points=, +=cj/mcp--read-claude-config=, =cj/mcp--get-env=, +=cj/mcp--build-server-alist= (pure transformer; filters by +=cj/mcp-enabled-servers=), =cj/mcp--redact=, +=cj/mcp--confirm-p=, =cj/mcp--normalize-description=. + +Tests cover all of the above against fixtures. + +** Phase 1.5 -- Confirmation contract + +Flip =gptel-confirm-tool-calls= to ='auto= in =ai-mcp.el='s +setup. Remove the =(setq gptel-confirm-tool-calls nil)= line +from =ai-config.el=. Audit existing local tools and add +=:confirm t= to any that should be gated (=web_fetch= +guaranteed; =write_text_file=, =update_text_file=, +=move_to_trash= per Craig's decision). + +Verification test in =tests/test-ai-mcp-confirm-contract.el= +asserts the setting, the local-tool gating behavior, and that +=git_log= (=:confirm nil=) still runs without prompting. + +** Phase 2 -- Compat layer + tool registration with fake inventory + +Add =ai-mcp-compat= helpers. Build the registration pipeline +against a stubbed =mcp-server-connections=. Verify: +- Tool name rewriting (=remote-name= stays in closure; + =gptel-name= is =mcp__SERVER__TOOL=). +- =gptel-make-tool= + explicit =add-to-list 'gptel-tools=. +- =:confirm= application per policy + overrides. +- Description normalization adds expected prefix. +- =cj/mcp--registered-tools= bookkeeping. +- Deregister removes from =gptel-tools= without disturbing local + tools (test pre-populates =gptel-tools= with a local tool and + asserts it survives). +- Re-register after deregister doesn't duplicate. +- All tools register with =:async t=. + +** Phase 3 -- Async state machine + timeout wrapper + +Implement =cj/mcp-ensure-started=, =cj/mcp--on-hub-callback=, +=cj/mcp--state= transitions, stall timer, polling. Implement +=cj/mcp--wrap-async-with-timeout=. Stub +=mcp-hub-start-all-server= with synthetic delayed callbacks and +synthetic async errors. + +Verify: +- =cj/mcp-ensure-started= returns in <100 ms regardless of stubs. +- Hub callback triggers status poll + registration. +- Stall timer fires for stuck servers. +- Async error path (=:error-callback= without inited callback) + reaches =cj/mcp--server-status= via polling. +- =cj/mcp--wrap-async-with-timeout=: timer-first ignores late MCP + response; MCP-first cancels timer; both branches deliver + exactly once. + +** Phase 4 -- First real connection (no auth) + +Wire one =drawio= or =slack-deepsat= server. Verify the stubbed +Phase 3 behavior matches real subprocesses. + +** Phase 5 -- Status UX + commands + doctor (static) + +Implement =cj/mcp-status=, =cj/mcp-list-tools= (with failed +servers at top, red face), =cj/mcp-doctor= (static mode only), +=cj/mcp-wait-until-ready=, restart commands, keymap entries, +which-key labels. + +Investigate =gptel-menu= refresh behavior: if the transient is +already open when a new tool registers, does it pick it up on +next invocation? Document and add an acceptance test: +"register new tool while gptel-menu is open; close and reopen; +new tool appears." + +** Phase 6 -- HTTP servers + +Add =linear= and =notion=. Test in-protocol OAuth handshake. +Add live-auth-check mode to doctor. + +** Phase 7 -- Env-dependent stdio servers + +Add =figma=, =google-calendar=, =google-docs-personal=, +=google-docs-work=, =google-keep=. + +** Phase 8 -- Privacy + audit polish + +Add audit buffer's privacy header, autosave commentary, audit-log +hygiene (=cj/mcp-tool-audit-log-enabled= defcustom). Update +=ai-mcp.el= commentary with the code-organization outline as a +TOC. + +* Test Plan + +** Confirmation contract (Phase 1.5) + +- =gptel-confirm-tool-calls= is ='auto= after =ai-mcp= loads. +- Write-classified MCP tool with =:confirm t= triggers confirm + prompt (stub gptel's prompt and assert it fires). +- Read-classified MCP tool with =:confirm nil= does not trigger + prompt. +- Pre-existing =git_log= (=:confirm nil=) does not trigger + prompt. +- =web_fetch= (newly gated with =:confirm t=) triggers prompt. + +** Pure helpers (Phase 1-2) + +- =cj/mcp--read-claude-config=: good fixture, missing, unreadable, + malformed JSON, missing =:mcpServers=, missing server, empty + env, non-string env values. Cache invalidation on mtime + change. +- =cj/mcp--build-server-alist=: each transport, each auth class, + env merge, args splicing for figma, no mutation of + =cj/mcp-server-specs=, filter by =cj/mcp-enabled-servers=. +- =cj/mcp--redact=: bearer tokens, OAuth credentials, + TOKEN/KEY/SECRET/CREDENTIALS-suffixed env vars, URL query + tokens, figma args slot. Sentinel never leaks. +- =cj/mcp--confirm-p=: read patterns, write patterns, unknown → + t, override map wins. +- =cj/mcp--normalize-description=: prefix shape per + classification. + +** Registration pipeline (Phase 2) + +- Single tool registers into all three structures. +- Two tools with same =:name= from different servers don't + collide. +- Re-registration replaces function pointer without duplicating. +- Deregister removes from =gptel-tools= without touching a + pre-populated local tool. +- All tools register with =:async t=. +- Confirm overrides win over patterns. + +** Compat layer (Phase 2) + +- Each =cj/mcp--*-server-*= helper returns expected value against + stub object. +- =cj/mcp--assert-capabilities= signals when a required function + is missing. + +** State machine + timeout (Phase 3) + +- =cj/mcp-ensure-started= from idle returns in <100 ms with + delayed-callback stub. +- Second call from =starting= is no-op. +- Hub callback triggers status poll. +- Stall timer marks slow servers =failed/timeout=. +- Async error path: server emits error callback only (no inited + callback); polling catches it and marks =failed/error= within + stall window. +- =cj/mcp--wrap-async-with-timeout=: + - MCP responds first, before timer: gptel callback fires once + with result; timer canceled. + - Timer fires first: gptel callback fires once with timeout + message; late MCP response is ignored (done flag). + - Error callback: cancels timer, fires once with redacted + error. + +** Async / freeze (Phase 3) + +- Stub =mcp-hub-start-all-server= to delay callbacks 5 s. + =cj/toggle-gptel= returns within 250 ms; buffer is + interactive. +- Stub a server that never responds. After + =cj/mcp-startup-timeout=, server is =failed=, + =cj/mcp--state= is =partial= or =failed=. + +** Entry-point policy (Phase 5) + +- With =cj/mcp-start-on-entry-points '(toggle-gptel)=, calling + =cj/toggle-gptel= triggers startup; calling + =cj/gptel-quick-ask= does not. +- Adding =gptel-quick-ask= to the list makes quick-ask trigger. + +** Local-tool preservation (Phase 2) + +- =cj/mcp-stop-all= removes only MCP-registered tools from + =gptel-tools=; local tools like =git_log= remain. +- =cj/mcp-restart-server= removes only that server's tools; + other MCP servers' tools and local tools both remain. + +** Process / cleanup (Phase 4+) + +- =kill-emacs-hook= calls =cj/mcp-stop-all=; subprocesses exit. +- =cj/mcp-stop-all= clears =gptel-tools= MCP entries and + =cj/mcp--registered-tools=. +- Restart doesn't leak process buffers or duplicate process + objects. +- Process sentinel records abnormal exits into status. + +** Partial availability (Phase 4+) + +- 8 of 9 servers ready, 1 failed: ready tools available; + failed-server tools absent. +- Restart-failed only retries the failed one. + +** Saved-conversation behavior (Phase 7+) + +- After a successful MCP tool call, GPTel autosave (when on) + persists the tool result. Test asserts the saved file + contains the result text and the audit buffer's privacy + header is updated. +- =cj/gptel-autosave-toggle= off → result not saved. + +** Keymap (Phase 5) + +- =C-; a C h/s/l/r/R/S/d/w= all bound after =ai-mcp= loads. +- Existing =C-; a M=, =C-; a m= still bound to + =gptel-menu=, =cj/gptel-change-model=. +- which-key labels present for every new binding. +- No duplicate labels. + +** =gptel-menu= refresh (Phase 5) + +- Register new tool while a previously-opened gptel-menu is + closed; reopen; new tool appears. Document whether mid-open + refresh works. + +** No-real-process rule + +All tests in =tests/test-ai-mcp*= use stubs for =process-file=, +=make-process=, =mcp-hub-start-all-server=, +=mcp-server-connections=, =mcp--tools=, =mcp--status=. No real +=npx=, no network, no real =~/.claude.json=. + +** Manual test matrix + +| Scenario | Expected | +|----------+----------| +| No =~/.claude.json= | Doctor warns; env-free servers still start | +| Malformed Claude config | Status shows =malformed-json=; integration =failed= cleanly | +| Network offline | HTTP servers fail; stdio servers start; status =partial= | +| =npx= not on PATH | Doctor flags it; stdio servers fail with clear message | +| One stdio server exits immediately | Sentinel records failure; others continue | +| slack-deepsat endpoint down | Server =failed=; recovery message points at local proxy | +| Google token expired | Server starts; tool calls fail; live-auth check (=C-u doctor=) surfaces it | +| All servers available | =MCP: 9/9 ready, ~N tools registered= | +| =cj/mcp-restart-failed= after fix | Only retried servers transition | +| =cj/mcp-stop-all= then call a tool | Tool absent from =gptel-tools= | +| Disable a server via defcustom | Doctor and status reflect the absence | +| TRAMP buffer open + =cj/toggle-gptel= | MCP starts locally; no remote spawn | + +* Acceptance Criteria + +1. *No freeze.* =cj/toggle-gptel= returns in <250 ms with mcp.el + wired and nine real servers starting in background. +2. *Incremental registration.* As each server reports tools, + =gptel-tools= updates; in-flight =gptel-send= calls see + newly-added tools on the next request. +3. *No MCP failure breaks ordinary GPTel chat.* With every MCP + server failing, =cj/toggle-gptel= still opens a usable chat + buffer; non-tool prompts work normally; local tools (git_log + etc.) still callable. +4. *Confirm gate works.* After =ai-mcp= loads, a write-classified + MCP tool actually prompts before invocation. Verified by a + test that fails if =mcp-async-call-tool= is invoked before + gptel's confirm-prompt stub fires. +5. *Local-tool preservation.* =cj/mcp-stop-all= and per-server + restart remove only MCP-owned tools. +6. *Partial availability.* With one failed server, status is + =partial=, ready servers' tools work, failed server's tools + absent. +7. *Idempotent restart.* Calling =cj/mcp-restart-failed= twice + with no intervening change produces identical state. +8. *No secret leakage.* Grep every user-facing output for + sentinel fixture secrets; zero matches. +9. *Doctor coverage (static).* Identifies each diagnosable + failure in the manual test matrix. +10. *Server enablement.* Setting =cj/mcp-enabled-servers= to a + subset starts only those servers; doctor reports the disabled + ones as expected-absent. + +* Risks + +** R1 -- mcp.el API drift behind compat layer + +Even with the compat layer, an upstream rename could break us if +the capability check misses it. + +*Mitigation:* compat helpers document the upstream commit and +file location. Tests cover each helper against stub objects; +when mcp.el bumps, run those tests first. + +** R2 -- Cold-start latency for nine subprocesses + +Nine =npx -y= invocations cold-start over several seconds. Time +to =ready= state may be 10-30 seconds on a cold machine. + +*Mitigation:* async model means user doesn't wait. Tools arrive +incrementally; status indicator shows progress. +=cj/mcp-wait-until-ready= for the rare case where the agent needs +a specific tool immediately. + +** R3 -- OAuth token expiry surfaces silently + +A Google server starts cleanly but every tool call fails with +auth errors. + +*Mitigation:* the OAuth recovery pattern matcher inspects every +tool-call error. Live-auth-check mode in doctor proactively +calls one safe read per auth class. + +** R4 -- Tool count balloons gptel-menu + +Up to 100+ tools. Even with category grouping, the transient +menu is large. + +*Mitigation:* =cj/mcp-enabled-servers= lets users disable +servers they don't need. Audit buffer is the alternate browser. +Per-conversation profiles deferred to v1.5. + +** R5 -- =~/.claude.json= schema change + +Parser breaks if Anthropic restructures the file. + +*Mitigation:* =cj/mcp--read-claude-config= returns structured +errors that surface in status and doctor. Integration degrades +to "env-free servers work, env-dependent servers fail" rather +than crashing. + +** R6 -- Process argument leakage (figma) + +figma's API key is in process args -- visible via =ps=, +=/proc/PID/cmdline=, =list-system-processes=. + +*Mitigation:* accepted risk (the figma package only supports +args-token). =cj/mcp--redact= ensures the key never appears in +Emacs-side output. Commentary flags this. + +** R7 -- Confirmation fatigue from unknown-classification tools + +Default for unknown is =:confirm t=. A server with many tools +matching neither read nor write pattern produces many prompts. + +*Mitigation:* audit buffer surfaces unknown classifications with +their confirm state. =cj/mcp-tool-confirm-overrides= alist lets +the user pin specific tools to =nil= once vetted. A "review +unknowns" doctor pass could enumerate them on demand (v1.5). + +** R8 -- Subprocess accumulation across sessions + +If =kill-emacs-hook= is bypassed (kill -9, crash), subprocesses +persist. + +*Mitigation:* =cj/mcp-doctor= can detect orphaned mcp processes +via =list-system-processes= (v1.5 enhancement). + +* Open Questions + +** Q1 -- Should =gptel-menu= refresh after mid-call tool registration? + +Investigation during Phase 5. If gptel-menu's transient caches +=gptel-tools= at open time, mid-call additions won't appear +until close+reopen. Document the behavior; if it's a real +pain, file a gptel upstream issue. + +** Q2 -- Should write-confirmation overrides be per-host? + +A v1.5 question: when per-conversation profiles land, the +override alist could also be scoped (e.g., "in /work/ folder, +auto-confirm google-docs-work writes"). Out of v1 scope. + +** Q3 -- Auth-source migration path for OAuth tokens + +Three candidate paths (Elisp OAuth client / Claude Code refresh +script / timer-based refresh) all have meaningful complexity. +Until one is viable, =~/.claude.json= stays the source. + +** Q4 -- Live-auth check cadence + +Doctor's live-auth mode is opt-in. Should there be a periodic +auto-check (every N hours via timer) that catches expiry between +explicit doctor runs? Adds complexity; defer until usage shows +need. + +* Considered Alternatives + +** =gptel-mcp.el= (declined; cited as prior art) + +[[https://github.com/lizqwerscott/gptel-mcp.el][lizqwerscott/gptel-mcp.el]] is a 96-line wrapper from the same +author as mcp.el that bridges mcp.el's tool inventory into gptel. +It exposes five functions: + +- =gptel-mcp-register-tool= -- walks =mcp-hub-get-all-tool= and + calls =gptel-make-tool= on each plist. +- =gptel-mcp-activate-all-tool= -- pushes tools into + =gptel-tools=. +- =gptel-mcp-deactivate-all-tool= -- removes them by category + + name lookup. +- =gptel-mcp-start-all-server-and-register= -- chains + =mcp-hub-start-all-server= with the register callback. +- =gptel-mcp-dispatch= -- a transient menu with three keys (=s= + start, =A= activate, =C= deactivate). + +*Why this matters for the spec.* The package independently +validates the integration shape this spec converged on: +=mcp-hub-start-all-server= → walk connections → =gptel-make-tool= +→ =add-to-list gptel-tools= is the canonical path. + +*Why we are not adopting it.* The package solves the trivial +"wire tools through" problem but skips every concern this spec +exists to address: + +| Concern | Spec | gptel-mcp.el | +|---------+------+--------------| +| GPTel confirmation contract (=gptel-confirm-tool-calls 'auto=) | Required precondition | Not addressed | +| Tool-name collisions | Rewrites to =mcp__SERVER__TOOL= | Silent overwrite | +| Confirm-on-write policy | Per-tool =:confirm t= for writes | All tools register with default | +| Async startup contract | =cj/mcp-ensure-started= returns in <100 ms | Synchronous-feel start | +| Async per-call timeout | Explicit timer/callback race | None | +| State machine | =cj/mcp--state= + per-server status | None | +| Server identity strategy | Walks =mcp-server-connections= directly | Uses =mcp-hub-get-all-tool= + parses =mcp-SERVER= | +| Secrets handling | Reads env from =~/.claude.json= with mtime cache | None | +| Deregistration tracking | =cj/mcp--registered-tools= hash, preserves local tools | Removes by category, no local-tool guarantee | +| Server enablement | =cj/mcp-enabled-servers= defcustom | None | +| Entry-point scoping | =cj/mcp-start-on-entry-points= defcustom | Manual via dispatch menu | +| Status UX | =cj/mcp-status=, =cj/mcp-list-tools= audit buffer | Just dispatch menu | +| OAuth recovery | Pattern matcher with per-auth-class recovery | None | +| Secrets redaction | =cj/mcp--redact= applied everywhere | None | +| mcp.el compat layer | Isolated wrappers around private API | Direct =mcp--*= access scattered | +| Tests | 10 acceptance criteria + manual matrix + no-real-process | None | +| Doctor / live-auth check | Static + live-probe diagnostic | None | + +Adopting it would force shipping safely or wrapping with +everything specified above (two layers, no code reduction). + +*What we are taking from it.* Confidence the API path is right. +The transient-dispatch UX was considered for the keymap (rev 2), +but the keymap is now pinned to discrete commands under =C-; a +C= so existing GPTel keys aren't disturbed (rev 3). + +* References + +- [[file:../../modules/ai-config.el][modules/ai-config.el]] -- =gptel-confirm-tool-calls nil= at + line 386 (removed by this integration); loader at lines 71-96. +- [[file:gptel-tools-shortlist.org][gptel-tools-shortlist.org]] -- local-tools shortlist; MCP servers + slot in as the "external" tier. +- [[file:gptel-agentic-tool-ideas.org][gptel-agentic-tool-ideas.org]] -- broader agentic-tool design. +- [[file:gptel-gh-tool.org][gptel-gh-tool.org]] -- sibling design; same confirm-on-write + pattern. +- [[https://github.com/lizqwerscott/mcp.el][lizqwerscott/mcp.el]] -- upstream. +- [[https://github.com/lizqwerscott/gptel-mcp.el][lizqwerscott/gptel-mcp.el]] -- considered and declined; see § + Considered Alternatives. +- =~/code/mcp.el/mcp-hub.el:53-90,131-160= -- verified callback + ownership and start-all helper. +- =elpa/gptel-0.9.8.5/gptel.el:1595-1607,2244-2245= -- verified + =gptel-confirm-tool-calls= semantics and tool-confirm gate. +- =elpa/gptel-0.9.8.5/gptel.el:1729-1820= -- verified + =gptel-make-tool= registration. +- =~/.claude.json= -- Claude Code config. |
