diff options
Diffstat (limited to 'docs/specs')
| -rw-r--r-- | docs/specs/gptel-gh-tool-spec.org | 1065 | ||||
| -rw-r--r-- | docs/specs/gptel-git-tools-magit-backend-spec.org | 196 | ||||
| -rw-r--r-- | docs/specs/gptel-network-tools-spec.org | 411 | ||||
| -rw-r--r-- | docs/specs/mcp-el-gptel-integration-spec-doing.org | 1438 |
4 files changed, 0 insertions, 3110 deletions
diff --git a/docs/specs/gptel-gh-tool-spec.org b/docs/specs/gptel-gh-tool-spec.org deleted file mode 100644 index 80ecc0ab6..000000000 --- a/docs/specs/gptel-gh-tool-spec.org +++ /dev/null @@ -1,1065 +0,0 @@ -:PROPERTIES: -:ID: a124dd0f-1f40-4533-aeb8-595d93e20865 -:STATUS: not-started -:END: -#+TITLE: Design: Wrap the gh CLI as a GPTel tool -#+AUTHOR: Craig Jennings -#+DATE: 2026-05-16 -#+OPTIONS: toc:nil num:nil - -* Status - -Draft (revision 2). Pre-implementation; no code shipped yet. =gh= -v2.92.0 installed and authenticated against both =github.com= (account -=cjennings=) and =deepsat.ghe.com= (account =craig-jennings=). - -Revision 2 incorporates a code-review pass that caught several -incorrect =gh= CLI assumptions and a critical safety gap: with -=gptel-confirm-tool-calls= set to =nil= in =modules/ai-config.el:386=, -the "irreversible-only blocklist" V1 from revision 1 would let the -agent run unconfirmed writes (=pr merge=, =run cancel=, =api -f -body=...=, =release create=, etc.). Revision 2 keeps Craig's intent -of a general-capability tool but applies =:confirm t= to every -classified write, drops the "scan command string for =--hostname= -and =-H=" approach in favor of argv-list builders, and expands the -wrapper set so most agent workflows don't need to fall back to the -general tool. - -* Problem - -GPTel agents can read git history locally (=git_log=, =git_diff=, -=git_status=) but have no access to the GitHub side of the workflow: -PRs, issues, reviews, CI runs, releases, gists, repo metadata. The -=gh= CLI does all of this and is already authenticated against both -hosts the user works with. Wrapping it as a GPTel tool gives the -agent the same GitHub surface the user has. - -The wrapper has to handle four complications the local git tools -don't: - -1. *Two authenticated hosts* -- personal (=github.com=) and work - GHE (=deepsat.ghe.com=) -- with different policies and different - blast radii for destructive operations. -2. *A vast subcommand surface* (~30 top-level subcommands, hundreds - of leaves) that doesn't fit cleanly into one typed schema per - leaf. -3. *Inconsistent flag semantics across subcommands.* =--hostname= - exists only on =gh api= and =gh auth status=; the rest use - =--repo [HOST/]OWNER/REPO=. =-H= means =--head= on =gh pr list= - but =--header= on =gh api=. Naive string-scanning to detect - user intent is wrong. -4. *Global GPTel setting* (=gptel-confirm-tool-calls nil=) means - the agent can call any registered tool without user confirmation - unless that specific tool is registered with =:confirm t=. V1 - safety must come from per-tool flags, not from a session-level - gate. - -* Goals - -1. The agent can invoke the non-interactive subset of =gh= against - either host, gated by a safety policy. -2. High-frequency reads have typed wrappers with sensible defaults - (host/repo resolution, JSON-by-default with minimal fields, - output cap, timeout). Wrappers auto-execute. -3. The active host is resolved from a single context object that - considers (in order): explicit =hostname=/=repo= args, branch - upstream remote, =origin= remote, =cj/gh-default-host=. Every - tool response prefixes the resolved host/repo so cross-host - mistakes are visible. -4. Writes via the general tool execute only after user - confirmation (=:confirm t= on the gptel tool registration). - Unknown classifications fail closed (=:confirm t=). - Irreversible commands hard-block. -5. Interactive commands (=auth login=, =codespace ssh=, =--web=, - =browse=, =pr checkout=, =repo clone=, =config set=, - =alias set=, =extension exec=) are hard-blocked in V1. -6. File exfiltration paths (=api --input=, =api -F key=@file=, - =release upload=, =gist create=) are hard-blocked in V1. -7. Output is capped *during capture*, not after, so a runaway - =--paginate= or =--log= can't fill memory or block Emacs. - Long-output commands have higher timeouts but the same byte - cap. -8. Every call produces a structured debug record (host, repo, cwd, - sanitized argv, classification, policy decision, exit code, - duration, bytes captured, truncation flag, error kind) inspectable - via =cj/gh-tool-last-error=. -9. =cj/gh-doctor= diagnoses missing prerequisites (=gh= - executable, version floor, auth per host, cwd repo detection, - environment overrides) before they fail at runtime. -10. V2 adds true per-host policy (currently uniform across both - hosts), profile-based tool subsets, and bridge helpers between - local git tools and the gh wrappers. - -* Non-Goals - -- *Interactive subcommands.* =gh auth login=, =gh codespace ssh=, - =gh pr checkout= (modifies working tree), =gh repo clone= - (writes outside controlled paths) are not in scope. The user - runs those in a real terminal. -- *Replicating gh extensions.* Core gh only. Extensions can be - invoked from a regular terminal if needed. -- *Streaming long-running output.* =gh run watch=, =gh - --paginate= for unbounded result sets, =gh repo clone= for large - repos. V1 blocks =--paginate= and =watch= verbs; future - versions may add streaming support. -- *Per-host write policies* (work-read-only / personal-full-write). - V1 applies the same write-confirmation policy to both hosts. - V2 makes the policy host-keyed. -- *Conversation-context injection* and *local/remote bridge - helpers* (current branch → PR, changed files → PR comments). - Future enhancements; tracked as a follow-up. -- *Artifact download* (=gh run download=). Writes files; needs - its own confirmation flow. Deferred to V2. - -* Verified gh CLI Contracts - -These were checked against =gh --version 2.92.0= help output before -the spec was revised. Implementation can rely on them. - -** Host / repo selection per subcommand - -- =gh api= and =gh auth status=: use =--hostname HOST= (long form - only; no short form). -- =gh pr {view,list,diff,checks}=, =gh issue {view,list}=, - =gh run {view,list}=, =gh release *=, =gh search *=: use - =--repo [HOST/]OWNER/REPO= (short =-R=). -- =gh repo view= and similar root-level commands: infer from cwd - remote; no =--hostname= or =--repo= flag. -- Environment variables: =GH_HOST= and =GH_REPO= work for commands - that lack explicit flags. - -** Flags with overloaded short forms - -- =gh pr list -H FOO= → =--head FOO= (branch). -- =gh search prs -H FOO= → =--head FOO= (branch). -- =gh api -H KEY:VAL= → =--header KEY:VAL=. - -Substring scanning =-H= as "hostname" is wrong on every command -that uses it. - -** =gh api= method classification - -The help text explicitly says: "adding request parameters will -automatically switch the method to POST". This means: - -- =gh api repos/x/y/issues= → GET (read). -- =gh api repos/x/y/issues -f title=Foo= → POST (write). -- =gh api repos/x/y/issues -F body=@file= → POST (write + - exfiltration). -- =gh api --method GET repos/x/y/issues -f q=x= → GET (read, - explicit override). - -Classifier must inspect for =-f=, =-F=, =--field=, =--raw-field=, -=--input=, and =--method= before defaulting to GET. - -** Wrappers without =--json= support - -- =gh pr diff= produces patch output; no =--json= flag. Wrapper - is text-only. Structured file metadata available via - =gh pr view --json files= (new wrapper =gh_pr_files=). - -** Environment variables for non-interactive use - -All confirmed via =gh help environment=: - -- =GH_PROMPT_DISABLED= -- disable interactive prompting. -- =GH_PAGER= / =PAGER= -- set to =cat= so no pager fires. -- =GH_NO_UPDATE_NOTIFIER= -- silence the "new version" banner. -- =GH_NO_EXTENSION_UPDATE_NOTIFIER= -- same for extensions. -- =GH_SPINNER_DISABLED= -- silence the progress spinner. -- =NO_COLOR= -- disable ANSI color (cleaner LLM context). - -V1 subprocess env always includes all of the above. - -** Authentication - -#+begin_example -✓ Logged in to github.com account cjennings (keyring) -✓ Logged in to deepsat.ghe.com account craig-jennings (keyring) -#+end_example - -Both stored in the system keyring. =gh= reads the keyring per -call; the wrapper does not handle tokens. Expired auth surfaces -as exit code 4 with a clear stderr message; recovery is interactive -(=gh auth login --hostname HOST= in a terminal). - -* Current State - -** =modules/ai-config.el= - -- =gptel-confirm-tool-calls nil= at line 386. Per-tool - =:confirm t= is the only confirmation mechanism in V1. -- =cj/gptel-load-local-tools= (lines 71-96) loads tools from - =gptel-tools/=. Add new gh feature names to - =cj/gptel-local-tool-features=. - -** =gptel-tools/= - -Ten tools today. =git_log.el= is the closest analogue: validates -input, runs subprocess with =process-file=, caps output, signals -clear errors. The gh wrappers follow the same shape with shared -helpers in =gh-common.el=. - -* Design - -** File layout - -Two files plus per-wrapper registrations. Wrappers are -deliberately small (argv builders + JSON-field defaults) so they -all live in one file: - -- =gptel-tools/gh-common.el= -- shared helpers: host/repo context, - argv runner, subprocess env, classifier, blocklist, redaction, - debug record, last-error buffer. No tool registrations. -- =gptel-tools/gh.el= -- the general tool + every wrapper - registration. Pulls in =gh-common= via =require=. - -This matches Craig's preference for low loader surface (per the -MCP revision discussion). If =gh.el= grows past ~600 lines we -split per-resource later (=gh-pr.el=, =gh-issue.el=, etc.). - -** Host / repo resolution - -A single helper returns a structured context object used by every -tool entry point: - -#+begin_src emacs-lisp -(defun cj/gh--resolve-context (cwd hostname repo) - "Return a context plist for a gh invocation. -Resolution order: -1. Explicit REPO argument (`[HOST/]OWNER/REPO'). If it contains - `HOST/', split off the host. -2. Explicit HOSTNAME argument. -3. CWD's branch upstream remote URL (`git rev-parse --abbrev-ref - @{u}' then `git remote get-url REMOTE'). -4. CWD's `origin' remote URL. -5. `cj/gh-default-host'. - -Returns: - (:host HOST :owner OWNER :repo REPO :source SYM) -where SOURCE is one of: explicit-repo, explicit-host, upstream, -origin, default." - ...) -#+end_src - -URL parsing handles all three forms produced by git: - -- =git@HOST:OWNER/REPO.git= -- =ssh://git@HOST/OWNER/REPO.git= -- =https://HOST/OWNER/REPO.git= - -Returns =nil host= if the URL doesn't match a known host; the -caller uses =cj/gh-default-host= in that case. - -#+begin_src emacs-lisp -(defcustom cj/gh-default-host "github.com" - "Host used when no other resolution path yields one." - :type 'string - :group 'cj) - -(defcustom cj/gh-known-hosts '("github.com" "deepsat.ghe.com") - "Hosts the gh CLI is authenticated against." - :type '(repeat string) - :group 'cj) -#+end_src - -Context appears in every tool response as a one-line prefix: - -#+begin_example -[gh github.com/cjennings/dotemacs read ok 12.4KB] ... -[gh deepsat.ghe.com/org/repo write confirmed 0.8KB] ... -[gh github.com api unknown blocked] ... -#+end_example - -This makes cross-host accidents visible. - -** Argv-list builders, never command strings - -Every tool builds an explicit argv list and hands it to a shared -runner. Wrappers expose typed parameters that map directly into -argv positions: - -#+begin_src emacs-lisp -(defun cj/gh-pr-view--build-argv (context number include-comments fields) - "Return an argv list for `gh pr view NUMBER ...'." - (let ((repo-arg (format "%s/%s/%s" - (plist-get context :host) - (plist-get context :owner) - (plist-get context :repo)))) - (append (list "pr" "view" (number-to-string number) - "--repo" repo-arg) - (when include-comments '("--comments")) - (when fields (list "--json" fields))))) -#+end_src - -The general tool takes an =args= parameter typed as array-of-string -(not a shell string) so the agent constructs argv directly: - -#+begin_src emacs-lisp -;; agent passes: ["pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo"] -;; runner runs: gh pr view 171 --repo deepsat.ghe.com/org/repo -#+end_src - -If the agent accidentally includes leading =gh= in args[0], the -runner strips it and logs the normalization. - -** Subprocess contract - -One runner for everything: - -#+begin_src emacs-lisp -(defun cj/gh--run (argv &key timeout context) - "Run gh with ARGV (a list of strings). -Returns a plist: - (:exit-code N :stdout STR :stderr STR :truncated BOOL :duration-ms N - :argv ARGV :context CONTEXT) -Enforces TIMEOUT (default `cj/gh--default-timeout'). -Captures output in-flight, killing the process at -`cj/gh--max-bytes' to prevent runaway memory use." - ...) -#+end_src - -Contract: - -- *Env vars set every call:* =GH_PROMPT_DISABLED=1=, =GH_PAGER=cat=, - =PAGER=cat=, =NO_COLOR=1=, =GH_NO_UPDATE_NOTIFIER=1=, - =GH_NO_EXTENSION_UPDATE_NOTIFIER=1=, =GH_SPINNER_DISABLED=1=. -- *Timeout:* default 20 s for reads, 60 s for diffs/logs/search. - Per-tool override allowed but capped at 120 s. Timeout kills - the process and returns =:error-kind 'timeout=. -- *Output cap during capture:* a process filter accumulates bytes - up to =cj/gh--max-bytes= (64 KB). At the cap, the filter sets - =truncated=, ignores further output, and sends SIGTERM after a - short grace. Truncation marker appended to returned stdout. -- *TRAMP rejection:* if =(file-remote-p default-directory)= is - non-nil, return =:error-kind 'remote-cwd= without invoking gh. -- *Executable check:* if =(executable-find cj/gh--executable)= is - nil, return =:error-kind 'no-executable= with install - instructions. -- *Version check:* the first call per session verifies - =gh --version= meets =cj/gh--min-version= (default "2.50.0"); - cached for the session. - -** Read / write / destructive classifier - -Used to apply =:confirm t= and the irreversible blocklist: - -#+begin_src emacs-lisp -(defconst cj/gh--read-verbs - '("view" "list" "status" "search" "diff" "checks" "describe" - "show" "logs" "auth-status")) - -(defconst cj/gh--write-verbs - '("create" "edit" "merge" "close" "reopen" "comment" "review" - "upload" "set" "add" "remove" "rerun" "cancel" "delete" - "fork" "archive" "unarchive" "lock" "unlock" "pin" "unpin" - "ready" "draft" "rename" "transfer" "approve" "label" "assign")) - -(defconst cj/gh--blocked-verbs - '(;; interactive / opens UI - "login" "logout" "checkout" "clone" "ssh" "code" "edit-prompt" - ;; modifies user config - "alias" "config" "extension" - ;; opens browser - "browse")) - -(defconst cj/gh--blocked-flags - '("--web" ; many commands - "--paginate" ; can produce unbounded output - "--input" ; gh api file upload - "--editor")) ; opens editor - -(defconst cj/gh--irreversible-patterns - '("\\`repo delete\\b" - "\\`release delete\\b" - "\\`secret delete\\b" - "\\`ssh-key delete\\b" - "\\`gpg-key delete\\b" - "\\`org delete\\b" - "\\`project delete\\b" - "\\`variable delete\\b" - "\\`ruleset delete\\b" - "\\`label delete\\b.*--yes")) -#+end_src - -Classifier: - -#+begin_src emacs-lisp -(defun cj/gh--classify (argv) - "Return one of: read, write, destructive, blocked, unknown." - (let* ((stripped (cj/gh--strip-flags argv)) - (resource (car stripped)) - (verb (cadr stripped))) - (cond - ;; Hard blocks first. - ((cj/gh--has-blocked-verb-p argv) 'blocked) - ((cj/gh--has-blocked-flag-p argv) 'blocked) - ((cj/gh--has-file-arg-p argv) 'blocked) ; -F key=@file - ((cj/gh--matches-irreversible-p argv) 'destructive) - ;; gh api is special. - ((string= resource "api") (cj/gh--classify-api argv)) - ;; Verb match. - ((member verb cj/gh--read-verbs) 'read) - ((member verb cj/gh--write-verbs) 'write) - (t 'unknown)))) - -(defun cj/gh--classify-api (argv) - "Classify a `gh api ...' invocation. -Reads: explicit method GET/HEAD with no -f/-F/--input. -Writes: any -f/-F/--field/--raw-field/--input, OR explicit -non-GET/HEAD method. -Default (no method, no field): read (matches gh's GET default)." - (let* ((explicit-method (or (cadr (member "-X" argv)) - (cadr (member "--method" argv)))) - (has-field (cl-some - (lambda (f) (cl-some (lambda (a) (string-prefix-p f a)) - argv)) - '("-f" "--raw-field" "-F" "--field"))) - (has-input (member "--input" argv))) - (cond - (has-input 'blocked) ; file exfiltration - ((and explicit-method - (not (member (upcase explicit-method) '("GET" "HEAD")))) - 'write) - (has-field 'write) ; -f/-F auto-promotes to POST - ((and explicit-method - (member (upcase explicit-method) '("GET" "HEAD"))) - 'read) - (t 'read)))) -#+end_src - -Classifier-driven policy table: - -| Classification | Policy | -|----------------+--------| -| =read= | auto-execute | -| =write= | =:confirm t= on registration; agent's call shows in confirm prompt | -| =destructive= | hard-block; return =:error-kind 'irreversible-blocked= | -| =blocked= | hard-block; return =:error-kind 'policy-blocked= with reason | -| =unknown= | =:confirm t= (fail closed) | - -** Safety policy - -V1 uniform policy applied to both hosts: - -#+begin_src emacs-lisp -(defcustom cj/gh-policy - '((read . auto) - (write . confirm) - (destructive . block) - (blocked . block) - (unknown . confirm)) - "Per-classification policy. V1 applies uniformly to both hosts." - :type '(alist :key-type symbol :value-type symbol) - :group 'cj) -#+end_src - -Since GPTel's =:confirm t= flag is per-tool-registration (not -per-call), the general tool is registered with =:confirm t= -always. The wrappers are registered per their classification: - -| Tool | Registered with | -|------+-----------------| -| Read wrappers (=gh_pr_view=, etc.) | =:confirm nil= | -| =gh= general tool | =:confirm t= | - -The general tool then applies the policy table at invocation -time: reads execute without further prompting (already past -GPTel's confirm because :confirm t fires once); writes show -detail before invocation; destructive/blocked never reach gh. - -** General tool: =gh= - -Single tool registered with =:confirm t= covering everything the -wrappers don't. Schema: - -| Arg | Type | Required | Purpose | -|-----+------+----------+---------| -| =args= | array of string | yes | argv list, e.g. =["pr", "view", "171", "--repo", "deepsat.ghe.com/org/repo"]= | -| =hostname= | string | no | Override host for commands that accept =--hostname= (=api=, =auth status=); ignored otherwise | -| =repo= | string | no | =[HOST/]OWNER/REPO= for repo-scoped commands; if HOST is present in repo, hostname arg is overridden | -| =cwd= | string | no | Working directory; defaults to current buffer; must be under =$HOME=, must not be TRAMP | -| =timeout= | integer | no | Seconds before kill; default 20, max 120 | - -Description (registered) explicitly says: - -#+begin_example -Use this only when no task-specific gh_* tool fits. Prefer -gh_pr_view, gh_pr_list, gh_pr_checks, gh_issue_view, gh_issue_list, -gh_run_view, gh_run_list, gh_run_logs_failed, gh_repo_view, -gh_search_prs, gh_search_issues, gh_api_get. - -Writes (create/edit/merge/etc.) require user confirmation. -Destructive (repo/release/secret delete) are hard-blocked. -Interactive commands (auth login, codespace ssh, --web, browse) -are hard-blocked. File uploads (api --input, -F @file, release -upload, gist create) are hard-blocked. -#+end_example - -** Wrapper inventory - -Twelve wrappers grouped by resource. Each has typed args, JSON -field defaults, output truncation, timeout override, and a -description that names its scope. - -| Tool | gh command | Defaults | Timeout | -|------+------------+----------+---------| -| =gh_repo_view= | =repo view --json= | JSON fields: =name,nameWithOwner,description,defaultBranchRef,url,visibility= | 20s | -| =gh_pr_view= | =pr view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | -| =gh_pr_list= | =pr list --json= | JSON fields: =number,title,state,author,createdAt,headRefName=; default =--limit 30= (capped at 100) | 20s | -| =gh_pr_diff= | =pr diff N --color never= | text only; capped at 64 KB | 60s | -| =gh_pr_checks= | =pr checks N --json= | JSON fields: =name,status,conclusion,startedAt,completedAt,link= | 20s | -| =gh_pr_files= | =pr view N --json files= | JSON fields: =files= (path, additions, deletions, mode) | 20s | -| =gh_pr_current= | =pr view --json= (no number — auto-detect) | Same as =gh_pr_view= | 20s | -| =gh_issue_view= | =issue view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | -| =gh_issue_list= | =issue list --json= | JSON fields: =number,title,state,author,createdAt,labels=; default =--limit 30= | 20s | -| =gh_run_view= | =run view RUN-ID --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch,event,url= | 20s | -| =gh_run_list= | =run list --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch=; default =--limit 20= | 20s | -| =gh_run_logs_failed= | =run view RUN-ID --log-failed= | text only; capped at 64 KB | 60s | -| =gh_search_prs= | =search prs --json= | JSON fields: =number,title,state,author,repository,url=; default =--limit 30= (capped at 100) | 30s | -| =gh_search_issues= | =search issues --json= | Same as PR search | 30s | -| =gh_api_get= | =api ENDPOINT --method GET= | text/JSON pass-through; rejects fields/input args | 30s | - -Common args for every wrapper (unless noted): - -| Arg | Type | Required | Purpose | -|-----+------+----------+---------| -| =repo= | string | no | =[HOST/]OWNER/REPO=; resolved from context otherwise | -| =hostname= | string | no | Override host for context resolution | -| =limit= | integer | no | =--limit= for list/search wrappers; clamped to per-wrapper max | - -The =gh_api_get= wrapper *explicitly* rejects =-f=, =-F=, -=--field=, =--raw-field=, =--input=, =--method=, and any method -override. It only accepts =ENDPOINT= and optional =-H= headers -that don't carry secrets (the runner redacts =Authorization:= and -similar regardless). Writes via API go through the general tool -with confirmation. - -** JSON field defaults - -Per-wrapper defaults are minimal -- enough for the agent to decide -whether to drill in, not so much that one call fills the context. - -For =gh_pr_view= specifically: - -- *Default* (no =fields= override): the small list above (number, - title, state, author, createdAt, url, body-truncated-to-2KB). -- *Override*: agent passes =fields= as a comma-separated string; - wrapper validates against a per-resource allowlist (so the agent - can't request =reviews,comments,files= in one call to bypass the - cap). -- *Include flags*: =include-body t/nil=, =include-comments t/nil=, - =include-reviews t/nil= as boolean args. Each adds the - corresponding JSON field; agent opts in only when needed. - -** Output truncation - -Process filter pattern, not post-hoc cap: - -#+begin_src emacs-lisp -(defun cj/gh--make-filter (state-var) - "Return a process filter that accumulates into STATE-VAR's :stdout, -stops collecting after `cj/gh--max-bytes', sets :truncated, and -sends SIGTERM to the process." - (lambda (proc output) - (let* ((state (symbol-value state-var)) - (current (plist-get state :stdout)) - (current-len (length current)) - (remaining (- cj/gh--max-bytes current-len))) - (cond - ((<= remaining 0) nil) ; already at cap - ((<= (length output) remaining) - (plist-put state :stdout (concat current output))) - (t - (plist-put state :stdout - (concat current (substring output 0 remaining))) - (plist-put state :truncated t) - (ignore-errors (delete-process proc))))))) -#+end_src - -Truncation marker appended before return: - -#+begin_example -[truncated at 64KB; use --limit, narrower fields, or a specific -wrapper to reduce output] -#+end_example - -** Error classification + debug record - -Every call returns (and =cj/gh-tool-last-error= caches) a debug -record: - -#+begin_src emacs-lisp -(defun cj/gh--debug-record (argv context exit-code stdout stderr - duration-ms truncated) - (list :host (plist-get context :host) - :repo (cj/gh--repo-arg context) - :cwd default-directory - :argv (cj/gh--redact-argv argv) - :classification (cj/gh--classify argv) - :policy (cj/gh--policy-decision argv) - :exit-code exit-code - :duration-ms duration-ms - :bytes-captured (length stdout) - :truncated truncated - :error-kind (cj/gh--error-kind exit-code stderr))) -#+end_src - -=:error-kind= mapping: - -| Condition | =:error-kind= | Returned message | -|-----------+---------------+------------------| -| Exit 4 + "authentication required" | =auth= | "Run =gh auth login --hostname HOST= in a terminal." | -| Process killed by timeout timer | =timeout= | "Command exceeded N seconds; narrow the query or use a more specific wrapper." | -| Policy block | =policy-blocked= | "Blocked by V1 policy: REASON." | -| Irreversible match | =irreversible-blocked= | "Hard-blocked irreversible command: COMMAND." | -| Truncated output | =truncated= | "Output truncated at 64KB; reduce scope." | -| TRAMP cwd | =remote-cwd= | "Cannot run gh from remote directory: CWD." | -| Missing executable | =no-executable= | "gh not found at CJ/GH--EXECUTABLE; install via 'pacman -S github-cli' (or equivalent)." | -| Other non-zero exit | =gh-exit= | (raw stderr, redacted) | - -Each error includes a sanitized reproduce line: - -#+begin_example -Reproduce: GH_PROMPT_DISABLED=1 GH_PAGER=cat gh pr view 171 --repo HOST/OWNER/REPO -#+end_example - -(Secrets, body text, file paths redacted via =cj/gh--redact-argv=.) - -** Audit log (V1, opt-out) - -Every call appends one line to -=~/.emacs.d/data/gh-tool-log/YYYY-MM-DD.log=: - -#+begin_example -2026-05-16T14:23:45-0500 host=github.com repo=cjennings/dotemacs class=read policy=auto exit=0 duration=128ms bytes=4321 -2026-05-16T14:24:02-0500 host=deepsat.ghe.com repo=org/repo class=write policy=confirm exit=0 duration=412ms bytes=88 -#+end_example - -Metadata only, not output bodies. Defcustom -=cj/gh-tool-audit-log-enabled= (default =t=). Daily rotation -implicit (one file per day). Cleanup manual. - -** Secrets redaction - -=cj/gh--redact-argv= masks: - -- Anything after =--token=, =--secret=, =--password= flags. -- Authorization headers (=-H "Authorization: ..."=). -- =--figma-api-key=KEY= (in case general tool spawns figma-mcp - somehow). -- Bearer tokens in URLs (=?token=...=). -- Values for =-f=/=-F= keys named like =body=, =text=, - =description= (private content; metadata still logged). - -Applied to: -- All stderr returned to the agent. -- All audit-log lines. -- All debug records. -- The reproduce line on error. - -* Commands & UX - -** =cj/gh-doctor= - -Diagnostic command. No side effects. Checks: - -- =gh= executable found at =cj/gh--executable=. -- =gh --version= meets =cj/gh--min-version=. -- =gh auth status= for each host in =cj/gh-known-hosts=. -- Current buffer's cwd repo detection: resolves to which host/repo? -- Environment overrides effective (=GH_PROMPT_DISABLED= etc. would - be set by the runner). -- Active account per host. -- Warnings if any block-list-relevant env is set externally - (e.g. user already has =GH_PAGER= set to something that pages). - -Output: a buffer with PASS / FAIL / WARN per check + recovery -actions for failures. - -** =cj/gh-tool-last-error= - -Opens a buffer showing the last call's debug record: - -#+begin_example -Host: deepsat.ghe.com -Repo: org/repo -Source: upstream -CWD: ~/projects/work/foo -Argv: ("pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo") -Classification: read -Policy: auto -Exit code: 0 -Duration: 412 ms -Bytes captured: 4321 -Truncated: no -Error kind: none - -Reproduce: - GH_PROMPT_DISABLED=1 GH_PAGER=cat NO_COLOR=1 \ - gh pr view 171 --repo deepsat.ghe.com/org/repo -#+end_example - -** Tool response header - -Every tool result begins with a one-line header so cross-host / -policy decisions are visible: - -#+begin_example -[gh github.com/cjennings/dotemacs read ok 4.3KB] -{ ... } -[gh deepsat.ghe.com/org/repo write confirmed 0.2KB] -{ ... } -[gh github.com/cjennings/dotemacs api blocked policy-blocked] -Error: Hard-blocked file-upload path: --input file. -Reproduce: gh api repos/cjennings/dotemacs/contents/foo --input file -#+end_example - -* Implementation Plan - -Eight phases. Each ends with green ERT tests + manual smoke -before the next. - -** Phase 1 -- Common helpers + context resolver - -=gh-common.el=: =cj/gh--executable=, =cj/gh--available-p=, -=cj/gh--version= (cached), =cj/gh--validate-cwd= (HOME + non-TRAMP), -=cj/gh--parse-remote-url=, =cj/gh--resolve-context=, -=cj/gh--redact-argv=. No subprocess execution yet. - -Tests cover all helpers against fixture remote URLs and synthetic -git directories. No real gh calls. - -** Phase 2 -- Runner with subprocess env + timeout + in-flight cap - -=cj/gh--run=, the process filter, the timer kill path, env-var -setup. Tests stub =make-process= to simulate output / exit / hang -/ truncation paths. - -Acceptance: with stub configured to produce 100 KB of output, -returned stdout is exactly 64 KB plus the truncation marker, and -the process gets SIGTERM. - -** Phase 3 -- Classifier + blocklist + policy - -=cj/gh--classify=, =cj/gh--classify-api=, blocklist constants, -=cj/gh--policy-decision=. Tests for every blocklist pattern -(verbs + flags + file-arg paths), API edge cases (=-f= promotes -to POST, =--method GET -f x=y= stays GET, etc.), and the -unknown-fails-closed contract. - -** Phase 4 -- Read wrappers (5 first) - -=gh_repo_view=, =gh_pr_view=, =gh_pr_list=, =gh_pr_diff=, -=gh_issue_view=. Each is a thin schema + argv builder + delegate -to =cj/gh--run=. Tests verify argv shape for typical args. - -Manual smoke against both hosts. First real gh calls. - -** Phase 5 -- Remaining wrappers + JSON defaults - -Eight more wrappers (=gh_pr_checks=, =gh_pr_files=, -=gh_pr_current=, =gh_issue_list=, =gh_run_view=, =gh_run_list=, -=gh_run_logs_failed=, =gh_search_prs=, =gh_search_issues=, -=gh_api_get=). JSON-field defaults per wrapper. Tests for the -=gh_api_get= flag-rejection contract. - -** Phase 6 -- General tool - -=gh.el= general tool registration with =:confirm t=, blocklist -enforcement, policy decision applied before invocation. Tests -verify confirmation gate (stub gptel's confirm flow), blocked -commands never reach =cj/gh--run=, destructive commands return -=:irreversible-blocked= without prompting. - -** Phase 7 -- UX: doctor, last-error, response header - -=cj/gh-doctor=, =cj/gh-tool-last-error=, response header -formatting. Audit log writer. Defcustom for log enable. - -** Phase 8 -- Loader wiring + integration - -Add the 16 feature names to =cj/gptel-local-tool-features= (one -per wrapper + the general tool + the helpers feature). Verify -they land in =gptel-tools=. - -* Test Plan - -Target: 55-70 ERT tests across four files. No real subprocesses, -no real network, no real =~/.claude.json= (gh tools don't use it, -but the no-real-process rule applies uniformly). - -** =tests/test-gh-common.el= -- pure helpers (~25 tests) - -- =cj/gh--parse-remote-url=: ssh-scp, ssh-url, https, with/without - =.git=, with/without trailing slash, unknown host returns =:host - nil=, =github.com.evil.example= does NOT match =github.com=. -- =cj/gh--resolve-context=: explicit repo wins; explicit hostname - wins over remote; upstream beats origin; origin beats default; - default fires when no git. -- =cj/gh--validate-cwd=: HOME-rooted ok; outside HOME errors; - TRAMP errors; non-directory errors. -- =cj/gh--redact-argv=: =--token=, =-H "Authorization: ..."=, - =--figma-api-key=, =-f body=...= → body redacted; sentinel - =REDACTED_TEST_SECRET= never appears in any output of any - helper. -- =cj/gh--available-p=: nil when =executable-find= fails; t - otherwise. -- =cj/gh--version=: caches per session; floor check rejects - too-old. - -** =tests/test-gh-runner.el= -- runner contract (~15 tests) - -Stub =make-process=: -- Normal exit 0 with short output: returned verbatim, no - truncation flag. -- Long output (100 KB stub): truncated at 64 KB exactly, - truncation marker present, =:truncated t=. -- Process hangs past timeout: timer fires, SIGTERM sent, returns - =:error-kind 'timeout=. -- Exit 4 + auth stderr: returns =:error-kind 'auth= with recovery - message. -- TRAMP cwd: never invokes =make-process=, returns =:error-kind - 'remote-cwd=. -- Missing executable: returns =:error-kind 'no-executable=. -- Environment: =process-environment= includes all six required - vars before =make-process= call. -- Argv with leading "gh" stripped + logged. - -** =tests/test-gh-classifier.el= -- policy logic (~20 tests) - -- Every read verb classifies as read. -- Every write verb classifies as write. -- Every destructive pattern matches. -- Every blocked verb (=login=, =browse=, =clone=, etc.) classifies - as blocked. -- Every blocked flag (=--web=, =--paginate=, =--input=) classifies - as blocked. -- File-upload (=-F key=@/path=) classifies as blocked. -- =gh api= GET (no fields): read. -- =gh api= GET (=-f x=y=): write (auto-POST per gh's rules). -- =gh api --method GET -f q=x=: read (explicit override). -- =gh api -X DELETE=: destructive. -- =gh api -X PATCH=: write. -- =gh api --input file=: blocked. -- Unknown verb (=gh frobnicate=): unknown → confirm. -- =-H= as branch (=gh pr list -H feature=) doesn't trigger host - treatment. -- =-H= as header (=gh api -H Accept:json=) doesn't trigger host - treatment. -- Policy decision: read → auto; write → confirm; destructive → - block; blocked → block; unknown → confirm. - -** =tests/test-gh-wrappers.el= -- per-wrapper builders + schemas (~15 tests) - -- Every wrapper's schema is valid (correct =:name=, =:type=, - =:description=, =:args= shape). -- =gh_pr_view= argv with number 171 and repo "host/o/r" produces - =("pr" "view" "171" "--repo" "host/o/r" "--json" "DEFAULTS")=. -- =gh_pr_diff= rejects =format='json=. -- =gh_api_get= rejects =-f=, =-F=, =--field=, =--raw-field=, - =--input=, =--method= other than GET. -- =gh_pr_current= invokes without a number arg (uses cwd). -- =limit= clamped to per-wrapper max. -- =fields= validated against per-resource allowlist. - -** Manual smoke (every phase) - -| Phase | Smoke | -|-------+-------| -| 4 | =gh_pr_view N= against both hosts | -| 4 | =gh_pr_list= in =~/.emacs.d= → uses github.com/cjennings/dotemacs | -| 5 | =gh_pr_checks= shows CI status without full logs | -| 5 | =gh_run_logs_failed= cap kicks in on a long-failed run | -| 5 | =gh_api_get= rejects a =-f= arg with clear error | -| 6 | =gh= general tool: agent asked to merge a PR triggers GPTel confirm prompt | -| 6 | Agent asked to =gh repo delete= gets irreversible-blocked | -| 6 | Agent asked to =gh --web=... gets policy-blocked | -| 7 | =cj/gh-doctor= correctly identifies an unauthenticated host | -| 7 | =cj/gh-tool-last-error= shows debug record after a failing call | - -** Opt-in integration suite - -A small set of real-gh tests (in =tests/test-gh-integration.el= -marked =:tag :integration=, default skipped): - -- =gh auth status --hostname github.com= ok. -- =gh auth status --hostname deepsat.ghe.com= ok. -- =gh repo view cjennings/dotemacs --json name= - returns parseable JSON. -- =gh pr list --repo cjennings/dotemacs --limit 1= returns ≤ 1 - PR. - -Run manually via =make test-name TEST=gh-integration=. - -* Acceptance Criteria - -1. *Argv contract.* No tool produces a command string for execution; - every call goes through the argv-list runner. -2. *No silent writes.* Every classified write either prompts - GPTel's confirm or hard-blocks. Verified by an end-to-end - test where the agent attempts =pr merge= and the test fails if - =make-process= is invoked before confirm. -3. *In-flight cap.* A stubbed process emitting 1 MB returns - exactly 64 KB; the runner never holds more than 65 KB in - memory. -4. *Host visibility.* Every successful tool response begins with - =[gh HOST/REPO ...]=. Verified by a test that greps the - response text. -5. *Doctor coverage.* =cj/gh-doctor= correctly identifies (a) no - gh executable, (b) too-old gh, (c) unauthenticated host, (d) - non-git cwd, (e) git cwd whose remote points to an unknown - host. -6. *No secret leakage.* Test fixtures containing - =REDACTED_TEST_SECRET= in every secret-bearing slot - (=--token=, =-H Authorization=, =-f body=, etc.) produce zero - matches when grepping audit log, debug record, and any - user-facing message. - -* Risks - -** R1 -- gh CLI evolves and verbs drift - -New =gh= versions may add subcommands the blocklist doesn't -cover. Or rename verbs. - -*Mitigation:* the blocklist works on verbs (not full subcommand -paths) so most additions are caught. Doctor includes a -=gh --version= floor. Periodic review when gh bumps a major -version. - -** R2 -- The "all reads auto-execute" default may still be too broad - -Some reads expose private content (issue bodies, PR descriptions -from private repos). An agent surfacing a confidential issue -body into a saved conversation has data-leak implications. - -*Mitigation:* response header makes the host/repo visible in -every result, so the saved conversation makes the privacy -boundary auditable. Wrappers truncate body/comments by default; -agent must explicitly opt-in to include them. Documented in -commentary. - -** R3 -- The general tool's =:confirm t= prompt may become click-fatigue - -If the agent uses the general tool heavily during a write-heavy -workflow (PR creation, label management), confirming every call -becomes tedious. - -*Mitigation:* the expanded wrapper set covers most reads, so the -general tool fires mainly for writes -- where confirmation is -exactly the right behavior. If usage shows confirm fatigue, -V2's per-host policy can add =:auto= for explicit -write-confirmed contexts. - -** R4 -- =--paginate= block conflicts with legitimate large queries - -Blocking =--paginate= globally means the agent can't get -historical CI runs (which may need pagination). - -*Mitigation:* =gh_run_list= and =gh_search_*= accept a clamped -=--limit= which usually substitutes. If a use case needs more, -the agent can request multiple non-paginated pages explicitly. - -** R5 -- Token expiry surfaces as cryptic exit 4 - -When a host's keyring entry expires, every call returns exit 4 -with "authentication required" on stderr. The agent sees the -error but may not realize the fix is interactive. - -*Mitigation:* the runner's =:error-kind 'auth= mapping prepends -the recovery message before returning to the agent. =cj/gh-doctor= -proactively checks auth status. - -** R6 -- TRAMP cwd silently runs gh remotely - -Without the explicit TRAMP rejection, =process-file= would try to -spawn =gh= on the remote host (where it may not exist, or may -authenticate against the wrong keyring). - -*Mitigation:* runner checks =(file-remote-p default-directory)= -first and returns =:error-kind 'remote-cwd= with a clear message. - -** R7 -- =gh search= behaves differently on GHE - -GHE may not support every advanced search operator -=github.com= does. Search wrappers may return inconsistent -results across hosts. - -*Mitigation:* documented in =gh_search_*= wrapper descriptions. -Result header makes the host visible so the agent can adjust. - -** R8 -- Audit log grows unbounded - -One file per day, but no automatic cleanup. - -*Mitigation:* metadata-only entries are tiny (~150 bytes); a -year of heavy use is a few MB. Manual cleanup acceptable. -Defcustom to disable for users who don't want it. - -* Open Questions - -** Q1 -- Should the general tool's confirmation prompt include the classification? - -When GPTel asks "Run gh tool? (y/n)" the prompt shows the argv but -not the classification. Showing "WRITE: gh pr merge 171" gives the -user more context. Need to investigate gptel's confirm-prompt -extensibility. - -** Q2 -- Should =gh_pr_diff= cap differently from text wrappers? - -A PR diff can legitimately be 100KB+ for a large refactor. The -64KB cap is the same as everywhere else. If diffs need a higher -cap (256KB?), that's per-wrapper config. - -** Q3 -- Should wrappers expose =include-body=, =include-comments=, etc., as separate args, or as a comma-separated list? - -The spec proposes separate boolean args (=:include-body t=, -=:include-comments t=). Alternative: one =:include= comma-list -arg. Separate args are more discoverable; comma-list is more -compact. Decide during Phase 4. - -** Q4 -- Should =cj/gh-tool-audit-log= grow into a query interface? - -V1 writes one line per call. Future: a command to query the -log (=cj/gh-audit-search REGEX=) for surfacing "what did the agent -do to this PR last week?". - -* V2 Roadmap - -Items intentionally deferred: - -- *Per-host policy.* =cj/gh-host-policy= alist keyed by hostname - (mirror of the MCP spec's structure) so work GHE can be - read-only while personal allows writes-with-confirm. -- *Conversation context injection.* After a PR view, the wrapper - inserts a "GitHub context: HOST/REPO PR #N at URL" line into the - GPTel buffer so saved conversations stay traceable without - bundling full output. -- *Local/remote bridge helpers.* current-branch → PR-number, - changed-files → matching PR file comments, etc. -- *Artifact download.* =gh_run_artifacts=, =gh_release_download= - with explicit confirm and write to a controlled directory. -- *Async general tool.* =make-process= + sentinel for the cases - where 60s timeout isn't enough (rare, but real for some - =--paginate= scenarios). -- *Audit log query interface.* =cj/gh-audit-search=, - =cj/gh-audit-by-host=. -- *Profile-based tool subsets.* e.g. read-only profile - vs. write-capable profile per buffer. - -* References - -- [[file:../../gptel-tools/git_log.el][gptel-tools/git_log.el]] -- pattern reference for new tool files. -- [[file:../../modules/ai-config.el][modules/ai-config.el]] -- =gptel-confirm-tool-calls nil= at - line 386; loader at lines 71-96. -- [[file:gptel-agentic-tool-ideas.org][gptel-agentic-tool-ideas.org]] -- broader agentic-tool design; - =gh= sits alongside the MCP integration as the - collaboration tier. -- [[id:b4c274c5-8572-4a7b-b657-d315712bd6af][mcp-el-gptel-integration-spec-doing.org]] -- sibling design; same - confirm-on-write pattern for safety. -- [[https://cli.github.com/manual/][gh CLI manual]] -- subcommand reference. -- =gh --version 2.92.0= help output -- verified flag semantics - per subcommand. -- =gh help environment= -- verified env-var names for non-interactive - mode. diff --git a/docs/specs/gptel-git-tools-magit-backend-spec.org b/docs/specs/gptel-git-tools-magit-backend-spec.org deleted file mode 100644 index bd84b0595..000000000 --- a/docs/specs/gptel-git-tools-magit-backend-spec.org +++ /dev/null @@ -1,196 +0,0 @@ -:PROPERTIES: -:ID: bd47c9a8-aae1-4a3d-ad5b-b8767f2fd580 -:STATUS: not-started -:END: -#+TITLE: Design: gptel git tools on a magit backend -#+AUTHOR: Craig Jennings -#+DATE: 2026-05-16 - -* Status - -Draft. Supersedes the three current git-tool implementations -(=gptel-tools/git_status.el=, =gptel-tools/git_log.el=, -=gptel-tools/git_diff.el=) shipped in commit =ceeae9b5=. Trigger: -Craig flagged that magit already does much of this and could carry -the backend for more git tools cheaply. - -* Problem - -The three current git_* tools shell out to git directly via -=process-file= and parse stdout. Each carries: - -- Its own =--is-inside-work-tree= path-validation step. -- Its own =-c color.ui=false= color suppression workaround (`git - status' doesn't accept =--no-color= the way `git log' / `git diff' - do). -- Boilerplate to set up a temp buffer, run =process-file=, capture - output, return the string. - -There's also an opportunity cost: adding more git context tools -(=git_blame=, =git_show=, =git_branches=, etc.) would mean -duplicating the same boilerplate per tool. - -* Wins from a magit backend - -Three concrete things magit provides: - -1. *Path validation via =magit-toplevel=.* One call replaces the - two-step =process-file= + =rev-parse --is-inside-work-tree= - check. Returns the working-tree root or nil. - -2. *Process plumbing via =magit-git-insert= / =magit-git-string= / - =magit-git-lines=.* These wrap git invocation with magit's - environment, encoding handling, and the right color posture. - Drops the per-subcommand color-flag bikeshedding. - -3. *Typed helpers for higher-level concepts* -- =magit-get-current-branch=, - =magit-list-branches=, =magit-rev-ancestor-p=, etc. Most - relevant for the *new* tools (branches, show, blame), not the - three we already wrote. - -What magit doesn't give us: high-level "give me status as a string" -helpers. =magit-status= / =magit-log-current= etc. populate -interactive magit buffers, not strings. For tool output we'd still -call =magit-git-insert "status" "--short" "--branch"= and grab the -buffer string. Same shape, less boilerplate. - -* Costs - -- *Magit loads on first invocation* of any git_* tool. Magit pulls - in transient, with-editor, magit-section, magit-core -- heavyweight. - Mitigation: lazy =(require 'magit)= inside each tool's function - body so cold-start Emacs sessions don't pay the cost unless the - user actually calls a git tool. -- *Tools no longer portable* to a no-magit Emacs. Acceptable here - because magit is a non-negotiable in this config; a future - drop-in distribution would need to publish a magit-free fallback. - -* Proposed shape - -** Single-file module: =gptel-tools/git_tools.el= - -The current "one file per tool" convention exists because the -existing tools share little. These six tools share a lot -(validate-path, run-git, truncate-output), so a single file with -shared helpers is more honest. - -** Shared helpers - -- =cj/gptel-git--toplevel-or-error PATH= - - Wraps =magit-toplevel=. Signals =user-error= when PATH escapes - HOME, doesn't exist, or isn't inside a working tree. - - Returns the resolved working-tree root on success. - -- =cj/gptel-git--insert ARGS...= - - Wraps =magit-git-insert= in a =with-temp-buffer=, returns - =buffer-string=. Single chokepoint for color / encoding / error - handling. - -- =cj/gptel-git--truncate TEXT MAX-BYTES= - - Caps output, appends a one-line truncation marker when - triggered. - - Open question: consolidate the matching helper from =web_fetch.el= - (=cj/gptel-web-fetch--truncate=) and the - =cj/update-text-file--*= analogue into a shared - =cj/gptel-tools--truncate-bytes= in =system-lib.el=, or keep - per-tool. - -** Six tools - -| Name | Magit-flavored shape | -|------------------+--------------------------------------------------------------------| -| =git_status= | =magit-git-insert "status" "--short" "--branch"= | -| =git_log= | =magit-git-insert "log" "--oneline" (format "-n%d" N) ?--since= | -| =git_diff= | =magit-git-insert "diff" REF1 REF2 "--" FILE= (each optional) | -| =git_blame= | =magit-git-insert "blame" "--line-porcelain" FILE [-L S,E]= | -| =git_show= | =magit-git-insert "show" REF= (message + full diff) | -| =git_branches= | =magit-list-branches= (optionally filtered by =--list PATTERN=) | - -Each tool: -- Validates =path= via =cj/gptel-git--toplevel-or-error=. -- Calls =cj/gptel-git--insert= with the appropriate args. -- Truncates via =cj/gptel-git--truncate=. -- Registered as a separate tool with =gptel-make-tool= for - description / argv clarity at the model side. - -** Caps - -| Tool | Default cap | Hard cap | -|---------------+----------------+--------------| -| =git_status= | uncapped | uncapped | -| =git_log= | 100 commits | 100 commits | -| =git_diff= | 500 KB | 500 KB | -| =git_blame= | 500 KB | 500 KB | -| =git_show= | 500 KB | 500 KB | -| =git_branches=| uncapped | uncapped | - -=git_log='s cap is on commit count; the rest cap output bytes. - -** :confirm posture - -All six tools are read-only. Same posture as the current -implementation: =:confirm nil= (the model can call them -autonomously, since they can't mutate state). The current -git_status / git_log / git_diff already ship with =:confirm nil= -- -keeping it. - -** Tests - -Single file =tests/test-gptel-tools-git-tools.el=, replacing the -three current per-tool test files. Real temp git repos via -=process-file= (same pattern as current tests). Coverage per tool: -Normal / Boundary / Error. - -Rough count: ~12 shared-helper tests (validator, insert wrapper, -truncate) + ~7 per tool × 6 tools = ~54 tests total. - -* Migration - -1. Delete =gptel-tools/git_status.el=, =git_log.el=, =git_diff.el=. -2. Delete =tests/test-gptel-tools-git-status.el=, - =test-gptel-tools-git-log.el=, =test-gptel-tools-git-diff.el=. -3. Create =gptel-tools/git_tools.el= containing all six tools + - shared helpers. -4. Create =tests/test-gptel-tools-git-tools.el=. -5. Update =cj/gptel-local-tool-features= in =modules/ai-config.el=: - replace the three =git_*= symbols with one =git_tools= symbol - (or six if each tool wants its own feature file -- decide during - implementation). -6. Make sure =modules/ai-config.el= can re-load without breaking the - live gptel session if the old tool symbols are still registered - from a prior Emacs. - -* Risks - -| Risk | Mitigation | -|-------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------| -| Magit load slows first git_* tool call | One-time hit per session, scoped to the tool's :function body. Acceptable for opt-in tools. | -| Tool registration name collision with the old git_* symbols | Use distinct names (git_status / git_log / git_diff stay; new tools join them). Or remove + restart. | -| =magit-toplevel= behavior on TRAMP / remote paths | Validator rejects paths outside HOME first, so TRAMP paths can't reach magit-toplevel. | -| =git_blame= exposes code surfaces the model shouldn't read | =:confirm nil= is the wrong posture if blame is sensitive. Open question for review. | -| =git_show= reveals past-self commit message wording | Same as blame -- low risk on personal repo, but worth flagging. | - -* Open questions - -1. Build all six tools in one push, or phase status/log/diff first - and add blame/show/branches in a follow-up? My read: one push. - The helpers are shared, marginal cost of three more tools is - small, and the model gets meaningfully more useful git context. -2. Consolidate the output-truncation helper into =system-lib.el=, - touching =web_fetch.el= and =update_text_file.el= for a cleaner - API? Or defer that to a separate refactor commit? -3. =git_blame= and =git_show= -- =:confirm nil= or =:confirm t=? - Personal repo lowers the stakes but the model could ask for - blame on /any/ file under HOME. -4. Tool feature symbols: one =git_tools= entry in - =cj/gptel-local-tool-features=, or six (one per tool)? - Currently each tool lives in its own provide-symbol file. With - the single-file design we'd register one feature symbol that - loads all six. - -* Effort estimate - -M (1-3 hours). Helpers + six tool wrappers + ~50 tests + migration. -Most of the time is test authoring; the production code is small -because magit absorbs the boilerplate. diff --git a/docs/specs/gptel-network-tools-spec.org b/docs/specs/gptel-network-tools-spec.org deleted file mode 100644 index c28d54694..000000000 --- a/docs/specs/gptel-network-tools-spec.org +++ /dev/null @@ -1,411 +0,0 @@ -:PROPERTIES: -:ID: 6388588c-dac2-4c52-97ad-2343ba1443fc -:STATUS: not-started -:END: -#+TITLE: Design: gptel network tools -#+AUTHOR: Craig Jennings -#+DATE: 2026-05-16 -#+OPTIONS: toc:nil num:nil - -* Status - -Draft. Brainstorm output captured from a =/brainstorm= session on -2026-05-16. Sibling to -=docs/specs/gptel-git-tools-magit-backend-spec.org= and the broader theme -hierarchy under =** TODO [#B] GPTel Tool Work= in =todo.org=. - -The conventional vs tail-sample exploration covered three categories -(network, text/data, build/code). Network was selected as the next -build target; this doc captures the network slice in full. The other -two categories are referenced briefly and live as theme stubs under -=*** TODO [#B] Filesystem Related Tools= and -=*** TODO [#B] Development Workflow Related Tools= in =todo.org=. - -* Problem - -The current =gptel-tools/= set covers filesystem CRUD, web fetch, and -git status/log/diff. When the user asks the agent "why can't I reach -X?" or "what's on my LAN right now?" the agent has no affordances -- -it can only suggest commands the user runs manually. - -Network diagnosis is a recurring task on this laptop (homelab, mixed -wifi/wired, occasional VPN, NetworkManager-managed connections). The -agent should be able to run read-only network probes directly, return -structured findings, and synthesize an explanation. Anything that -mutates network state (=nmcli connection up=, route changes) stays -behind =:confirm t=. - -* Non-goals - -- Active offensive scanning, vulnerability probes, or exploitation - tooling. Out of scope at the wrapper boundary -- nmap's - =-A=/=-O=/aggressive modes are rejected, NSE is deferred. -- Scanning networks the user doesn't own. Public targets are gated - behind an explicit =external=t= flag and =:confirm t=. -- Real-time/streaming inspection (=iftop=, =nethogs=, =tcpdump - follow=). Snapshot tools only; streaming tools don't fit the - request/response shape of gptel tools. -- Replacing Magit's git tooling, mu4e's mail handling, or any other - Emacs-native workflow. Network tooling is the gap. - -* Approaches considered - -The =/brainstorm= run generated six candidate themes across three -categories. Three conventional (high-prior), three tail samples -(genuinely different regions of the option space). Network was -chosen as the first build target; the others are recorded for -follow-up sessions. - -** Recommended: network triage bundle (conventional #1) - -Five tools covering discovery, diagnostics, and inspection: - -| Tool | Purpose | -|-------------------+--------------------------------------------------| -| =net_diagnose= | "Why can't I reach X?" -- composite probe | -| =net_discover= | "What's on this subnet?" -- LAN host discovery | -| =net_services= | "What's listening on host X?" -- service detect | -| =network_status= | "What's my current network state?" -- snapshot | -| =dns_lookup= | Typed DNS query (A/AAAA/MX/NS/TXT/SRV/CAA) | - -Detailed in =* Design= below. - -*** Pros - -- Hits the highest-leverage daily question (connectivity diagnosis) - with a single mental entry point (=net_diagnose=). -- Atomic tools (=dns_lookup=, =network_status=) for cases the - composite is too coarse for. -- All read-only at the network layer; =:confirm nil= for RFC1918, - =:confirm t= for public targets. -- nmap's two genuinely-unique capabilities (subnet discovery, service - enumeration) get first-class wrappers. - -*** Cons - -- Five tools is heavy for one category. Some are thin wrappers around - a single command. -- Composite =net_diagnose= hides which sub-check fired; debugging the - tool itself is harder than debugging atomic tools. -- nmap is the one tool that *can* get the user in trouble. Target - gating must be airtight or it's the wrong tool to ship. - -** Rejected: code-quality fan-out (conventional #2) - -=shellcheck_run=, =format_check= (black/prettier/gofmt/rustfmt/elisp, -returns unified diff), =lint_run= (eslint/ruff/golangci-lint), -=dot_render=, =mermaid_render=. - -Folded into =*** TODO [#B] Development Workflow Related Tools= as -per-language work rather than a standalone bundle. Most of the per- -language wins land in the existing prog-*.el modules' format-on-save -and LSP attachments; the agent benefits more from /reading/ those -buffers than from re-running the formatters via tool calls. - -** Rejected: GitHub workspace (conventional #3) - -=gh_pr_view=, =gh_issue_search=, =gh_run_logs=, =gh_pr_diff=. - -Overlaps with the magit-backend track (=gptel-git-tools-magit-backend=) -for several queries. Better treated as a follow-on once the magit -backend lands -- some queries are local (magit) and some are remote -(gh), and the seam is clearer after the local side is built. - -** Rejected: DNS-chain inspector (tail sample) - -=dns_chain= walks NS -> A/AAAA -> MX -> SPF -> DMARC -> DKIM for a -domain and returns a structured assessment with red flags ("MX -missing TLS-RPT", "SPF includes >10 lookups", "DMARC policy=none"). - -Real value when it's useful but probably 5 calls/year for this -laptop. =dns_lookup= covers 90% of the recurring need; the chain -walker is parked for a possible follow-on. - -** Rejected: awk_eval / sed_eval with explanation (tail sample) - -Accept snippet + sample input, return both the transformed output and -a plain-English explanation of what the snippet does. - -Doubles work the model already does internally -- the model is -already good at generating and explaining awk/sed. Real win would -only be the actual execution against actual data, which the eshell -escape hatch in the Filesystem section already covers. - -** Adopted as project convention: plan/apply split (tail sample) - -=rsync_plan= / =rsync_apply= split: plan always runs =--dry-run= and -returns the file list and byte counts that *would* transfer; apply is -a separate tool registration with =:confirm t=. Same shape for -=nmcli= (status read vs connection mutate) and any other mutating -tool. - -Promoted to a documented convention rather than a single tool: any -mutating wrapper in =gptel-tools/= should split into a preview and an -apply. The preview is =:confirm nil= so the agent can plan -autonomously; the apply is =:confirm t= and stops cleanly for human -review. Applies to =rsync=, =nmcli connection up=, =ssh= mutations, -and the pandoc/ffmpeg/imagemagick output-writing tools in the -Filesystem section. - -* Design - -** Tool 1: =net_diagnose= - -Composite "why can't I reach X?" probe. Given a target (hostname or -IP), runs a sequence of sub-checks and returns a structured result: - -1. =dig +short= on the name (skip if target is an IP literal). -2. =ping -c 3 -W 2= against the resolved IP. -3. =traceroute -n -w 2 -q 1 -m 20= to the IP. -4. If a port is given: =curl --max-time 5 -o /dev/null -sw '%{http_code}\n'= - for ports 80/443, or =nc -zv -w 3= for arbitrary TCP ports. - -Output shape (alist or plist returned to the model): - -#+begin_src text - ((target . "example.com") - (resolved-to . "93.184.216.34") - (dns-time-ms . 12) - (ping . ((sent . 3) (received . 3) (avg-ms . 14.2))) - (traceroute . ((hops . 8) (last-hop . "93.184.216.34"))) - (port-check . ((port . 443) (status . "200") (tls . "ok")))) -#+end_src - -Caps: total runtime <30s. Each sub-check has its own timeout. If a -sub-check fails (no ping reply, no route, no DNS), the field carries -the failure mode rather than aborting the whole call -- the agent -needs the partial picture to reason. - -=:confirm nil=. Read-only. - -** Tool 2: =net_discover= - -Wraps =nmap -sn <subnet>= for LAN host discovery. Two argv shapes: - -- =net_discover ()= -- defaults to the current LAN, derived from - =ip route get 1.1.1.1= and the matching interface's =/24=. -- =net_discover :subnet "192.168.1.0/24"= -- explicit subnet. - -Guardrails: - -- Subnet must be RFC1918, link-local (169.254/16), CGNAT (100.64/10), - or loopback. Public subnets rejected at the validator. -- Subnet mask must be /22 or smaller (no /16 or wider). At /22 that's - ~1024 hosts -- enough for any homelab. Default home network is /24. -- =--host-timeout 30s --max-retries 1= to bound runtime. - -Output: list of =(ip mac hostname state)= tuples. - -=:confirm nil= for RFC1918 / link-local / CGNAT / loopback. Public -subnets never reach this tool (validator rejects). - -** Tool 3: =net_services= - -Wraps =nmap -sV= for service/version detection on a single host. - -Argv: - -- =:host= -- required. RFC1918 / link-local / CGNAT / loopback by - default. Public hosts require =:external t= which flips - =:confirm t=. -- =:ports= -- optional port spec. Default: top-100 (=--top-ports - 100=). Custom lists allowed: ="22,80,443,5432,6379"= or - ="1-1024"=. Hard cap: 1024 ports total. -- =:fast= -- if t, uses =--top-ports 20= for a quick check. - -Mode allowlist enforced at the wrapper: only =-sV= with optional -=-p=. Reject =-A=, =-O=, =-T4=/=-T5=, =--script=, raw-packet flags. - -Output: list of =(port protocol state service version banner)= -tuples, parsed from =-oG -= (greppable output). - -=:confirm nil= for RFC1918 / link-local / CGNAT / loopback. -=:confirm t= for any target reachable only as a public IP/hostname. - -** Tool 4: =network_status= - -Snapshot of the local network state. Composite of: - -- =ip -br addr= -- interfaces and their addresses. -- =ip route= -- routing table. -- =nmcli -t -f NAME,TYPE,DEVICE,STATE connection show --active= -- - active NetworkManager connections. -- =ss -tulpn= (or =netstat -tulpn= fallback) -- listening sockets. -- =resolvectl status= (or =/etc/resolv.conf= fallback) -- DNS - resolver state. - -Output: structured alist with sections for each. - -=:confirm nil=. Read-only. - -Note: this is also the candidate target for the plan/apply split if -=nmcli connection up=/=down= ever lands as a tool -- =network_status= -becomes the "plan" side and any mutation is a separate tool. - -** Tool 5: =dns_lookup= - -Typed DNS query. Argv: - -- =:name= -- required. The DNS name to query. -- =:type= -- record type. Default =A=. Allowed: =A=, =AAAA=, =MX=, - =NS=, =TXT=, =SRV=, =CAA=, =CNAME=, =PTR=, =SOA=. -- =:server= -- optional resolver. Default uses system resolver. - When set, must be RFC1918 or one of a small allowlist (=1.1.1.1=, - =8.8.8.8=, =9.9.9.9=) so the tool can't be used to probe arbitrary - hosts via DNS. - -Output: list of records with TTL. For =MX= and =SRV=, includes -priority/weight/port. For =TXT=, the records are split into the -quoted segments dig returns. - -=:confirm nil=. Read-only. - -** Shared helpers - -In =gptel-tools/network_tools.el= (single file, mirrors the -magit-backend plan for git tools): - -- =cj/gptel-net--validate-target HOST &optional ALLOW-PUBLIC= - - Resolves HOST. Rejects unless resolved IP is RFC1918 / - link-local / CGNAT / loopback, unless ALLOW-PUBLIC is non-nil. - - Returns the resolved IP on success. - -- =cj/gptel-net--validate-subnet CIDR= - - Rejects non-private subnets and subnets wider than /22. - - Returns =(network mask)= on success. - -- =cj/gptel-net--current-lan= - - Derives the current /24 from =ip route get 1.1.1.1=. - -- =cj/gptel-net--run ARGS &key TIMEOUT= - - Wraps =process-file= with a uniform timeout, color/encoding - posture, and structured return =(exit-code stdout stderr)=. - -- =cj/gptel-net--parse-nmap-greppable STRING= - - Parses nmap =-oG -= output into structured tuples. - -- =cj/gptel-net--truncate TEXT MAX-BYTES= - - Same shape as the existing per-tool truncate helpers. Open - question whether this consolidates into =system-lib.el= alongside - the matching helpers in =web_fetch.el= and =update_text_file.el=. - -** Caps - -| Tool | Default cap | Hard cap | -|------------------+------------------------+------------------------| -| =net_diagnose= | <30s total runtime | <30s total runtime | -| =net_discover= | /24 default, /22 max | /22 | -| =net_services= | top-100 ports | 1024 ports | -| =network_status= | uncapped (snapshot) | uncapped | -| =dns_lookup= | uncapped | uncapped | - -** =:confirm= posture - -| Tool | RFC1918 target | Public target | -|------------------+-------------------+-------------------------| -| =net_diagnose= | =:confirm nil= | =:confirm t= | -| =net_discover= | =:confirm nil= | rejected at validator | -| =net_services= | =:confirm nil= | =:confirm t= | -| =network_status= | =:confirm nil= | n/a (local snapshot) | -| =dns_lookup= | =:confirm nil= | =:confirm nil= | - -=dns_lookup= stays =:confirm nil= for public names because DNS is -read-only and innocuous. =net_diagnose= and =net_services= against -public targets are gated because pinging/probing public hosts isn't -*illegal* but it can trip rate-limits or get the user flagged on a -managed network. - -** Tests - -Single file =tests/test-gptel-tools-network-tools.el=. Real subnets -are not available in CI, so: - -- =net_discover= and =net_services= are stubbed via =cl-letf= on - =cj/gptel-net--run=, returning canned nmap output. Real nmap - invocation tested via one =:tags '(:integration)= test that runs - =nmap -sn 127.0.0.1/32= and asserts the parser handles the real - format. -- =net_diagnose= sub-checks stubbed individually so each failure mode - can be exercised. -- =network_status= sections stubbed per-command; one integration test - runs against the live system and asserts the structure parses. -- =dns_lookup= stubbed against canned =dig= output; one integration - test against =localhost= via the system resolver. - -Rough count: ~12 shared-helper tests (validators, current-lan -detector, parsers) + ~7 per tool x 5 tools = ~47 tests. - -** Risk surface - -| Risk | Mitigation | -|-----------------------------------------------------------+---------------------------------------------------------------------| -| nmap scan against an unintended target | Validator gates on resolved IP, not on the input string. Public | -| | targets require explicit =:external t= flag + =:confirm t=. | -| Scan triggers IDS/IPS on a corporate/managed network | Default modes are non-aggressive (=-sn=, =-sV= only). No =-A=, no | -| | NSE, no high T-level. =:confirm t= for non-RFC1918 targets gives | -| | the user a manual checkpoint. | -| =net_diagnose= hangs on a slow target | Per-sub-check timeouts; total runtime cap; partial-failure return | -| | rather than abort. | -| nmap not installed on the system | =:command= check at module load via =cj/executable-find-or-warn= | -| | (matching the prettier/pyright pattern documented in CLAUDE.md). | -| Network tools shell out via =process-file= | argv-list invocation, no shell. =shell-quote-argument= unused | -| | because no shell is involved. | -| /tmp pollution or banner output writing to disk | All output captured to buffer via =process-file=, never written. | - -* Open questions - -1. *Default port set for =net_services=.* Top-100 (nmap default), - top-1000 (full default scan, slower), or a custom homelab-tuned - list (=22, 80, 443, 445, 3389, 5432, 6379, 8080, 8443, 9090, 9000, - 631=)? My read: top-100 default + =:fast t= for top-20 + custom - override for the homelab list when needed. -2. *NSE in v1 or deferred?* Skip entirely (clean v1) or ship a small - allowlist (=ssl-cert=, =http-title=, =ssh-hostkey=)? My read: - skip in v1. If a real use case shows up (TLS audit), add a single - =net_tls_audit= tool wrapping just =ssl-enum-ciphers=/=ssl-cert= - rather than a generic NSE escape hatch. -3. *Consolidate the truncate helper.* Same open question as the - magit-backend doc: move =cj/gptel-net--truncate= and its siblings - into =system-lib.el= as =cj/gptel-tools--truncate-bytes=, or keep - per-module? My read: consolidate when there are three callers - (web_fetch, update_text_file, network_tools all qualify). -4. *Composite vs atomic for =net_diagnose=.* Build it as one - composite, or break it into =ping_run=, =traceroute_run=, - =port_check= and let the agent compose? My read: composite is - better -- the agent reasons in "diagnose-this-target" terms more - often than in "just-ping-this". Atomic sub-tools can be added - later if the composite proves coarse-grained. -5. *Promote plan/apply split to documented convention now?* Or wait - until a second tool exercises it (post-rsync)? My read: document - the convention in the Filesystem section body now, since pandoc / - ffmpeg / imagemagick all benefit, even before any of them ship. -6. *nmcli mutation tools.* Out of scope for this doc but worth - flagging: =nmcli connection up <name>= / =nmcli connection down - <name>= / =nmcli device wifi connect <ssid>=. These would be the - first apply-side tools under the plan/apply convention, with - =network_status= as the plan side. - -* Effort estimate - -M (1-3 hours). Five tools + shared helpers + ~47 tests. Most of the -time is test authoring (canned nmap output, dig output, ss output); -production code is small because each tool is a thin =process-file= -wrapper plus a parser. - -* Next steps - -- Resolve open questions #1 and #2 before any code lands (the - =net_services= shape can't be finalized without them). -- Once approved, the work attaches to =*** TODO [#B] (Network bundle: - net_diagnose / net_discover / net_services / network_status / - dns_lookup)= -- a new theme under =*** TODO [#B] (Networking tools - category)= which itself becomes a new top-level under =** TODO [#B] - GPTel Tool Work= in =todo.org=, peer to the existing Filesystem - section. -- Implementation follows =/start-work= flow: TDD, characterization - tests for the parsers first (canned nmap/dig/ss fixtures), then - the wrappers, then the registrations in - =cj/gptel-local-tool-features=. -- After landing, revisit candidate #6 (plan/apply split) -- the - first apply-side tool (=nmcli connection up=, =rsync_apply=, - pandoc-output) exercises the convention end-to-end. diff --git a/docs/specs/mcp-el-gptel-integration-spec-doing.org b/docs/specs/mcp-el-gptel-integration-spec-doing.org deleted file mode 100644 index f22e91959..000000000 --- a/docs/specs/mcp-el-gptel-integration-spec-doing.org +++ /dev/null @@ -1,1438 +0,0 @@ -:PROPERTIES: -:ID: b4c274c5-8572-4a7b-b657-d315712bd6af -:STATUS: doing -:END: -#+TITLE: Design: Wire mcp.el into GPTel for MCP server access -#+AUTHOR: Craig Jennings -#+DATE: 2026-05-16 -#+OPTIONS: toc:nil num:nil - -* Status - -Draft (revision 3). Pre-implementation; no code shipped yet. The -mcp.el package is cloned at =~/code/mcp.el/= (fork of -[[https://github.com/lizqwerscott/mcp.el][lizqwerscott/mcp.el]]) but not wired into the config. - -Revision 3 tightens seven contracts the revision-2 review flagged: - -1. *GPTel confirmation* -- =gptel-confirm-tool-calls= is =nil= at - =ai-config.el:386=, which short-circuits every per-tool - =:confirm= slot. The integration flips it to =auto= as a hard - precondition. -2. *Async timeout mechanics* -- replaced =with-timeout= (which - only supervises dynamic extent) with an explicit timer/callback - race for async tool calls. -3. *Startup completion semantics* -- the hub's completion callback - is opportunistic, not authoritative; the stall timer + polling - =mcp-server-connections= is the source of truth. -4. *Server identity at registration* -- walk - =mcp-server-connections= directly instead of parsing - =:category mcp-SERVER= out of =mcp-hub-get-all-tool=. -5. *Server enablement* -- =cj/mcp-enabled-servers= defcustom lets - users disable a server without writing code. Profiles still - deferred. -6. *Keymap pinned* -- =C-; a C= (Connect) is the MCP subprefix. - =M= (=gptel-menu=) and =m= (=cj/gptel-change-model=) stay - where they are. -7. *mcp.el private-API isolation* -- a compat layer wraps every - =mcp--*= call so version drift surfaces in one place. - -Plus several smaller changes: every MCP tool registers async, -description normalization adds a server-name prefix and a write -risk note, =cj/mcp-start-on-entry-points= defcustom scopes -startup triggers (default: full chat only), TRAMP processes -local-only, doctor gains live-auth-check, =cj/mcp-wait-until-ready= -command added, audit buffer surfaces failed servers prominently. - -* Problem - -GPTel exposes ten local tools today (=read_buffer=, =read_text_file=, -=write_text_file=, =update_text_file=, =list_directory_files=, -=move_to_trash=, =git_status=, =git_log=, =git_diff=, =web_fetch= -- -see =gptel-tools/=). Claude Code, by contrast, has access to nine -external MCP servers (linear, notion, figma, slack-deepsat, -google-calendar, google-docs-personal, google-docs-work, drawio, -google-keep), each exposing 10-70 additional tools. - -The asymmetry means agentic work done in GPTel can't touch the same -external systems Claude Code can. Wiring [[https://github.com/lizqwerscott/mcp.el][mcp.el]] into the config -closes the gap: GPTel gains access to every MCP server Claude Code -uses, modulo three claude.ai-hosted servers whose OAuth is bound to -the Claude.ai session (see Non-Goals). - -* Goals - -1. GPTel sees every tool from the enabled subset of nine reusable - MCP servers in =gptel-menu=, grouped by server via the tool's - =:category= field. -2. Servers spawn *asynchronously*. Opening GPTel never blocks on - MCP startup; tools arrive incrementally and =gptel-tools= updates - as each server reports its inventory. =cj/toggle-gptel= must - return without waiting for any MCP subprocess. -3. Write/destructive MCP tools are gated by a confirmation prompt - the user actually sees. Two preconditions: - =gptel-confirm-tool-calls= is set to =auto= (so the per-tool - =:confirm= slot is honored), and write/destructive tools are - registered with =:confirm t=. Read-only tools execute without - confirmation. -4. Secrets stay in =~/.claude.json= (single source of truth, shared - with Claude Code). The Emacs config reads env vars from there - at server-spawn time, with an mtime-aware cache. Secrets are - never echoed to status, errors, hub buffers, or tests. -5. A per-server status alist tracks each server's lifecycle (idle / - starting / ready / failed / stopped) and is inspectable via - =cj/mcp-status= and a =cj/mcp-list-tools= audit buffer. -6. Server-management commands live under a =C-; a C= (Connect) - subprefix so existing GPTel keys (=C-; a M=, =m=) aren't - disturbed. -7. A failed server (network down, OAuth token expired, npx package - 404) is surfaced clearly via the OAuth-recovery pattern matcher - and does not block GPTel itself. Successful servers' tools are - available immediately; failed servers' tools are absent (not - stale). -8. The config can swap between MELPA mcp.el and the local - =~/code/mcp.el/= checkout with a one-line uncomment, gated by a - capability check that asserts required API functions exist. -9. A first-run =cj/mcp-doctor= command diagnoses missing - prerequisites (=npx=, =uvx=, =~/.claude.json=, per-server - commands, known local endpoints) and optionally runs a - live-auth probe before they fail at runtime. - -* Non-Goals - -- The three claude.ai-hosted MCP servers (Gmail / Drive / Calendar - served from =*.googleapis.com/mcp/v1=). Their OAuth is issued - by the Claude.ai session and is not transferable to GPTel. -- *MCP resources and prompts.* v1 registers tools only. - Resource browsing and prompt invocation are tracked as - follow-ups; the local checkout has the API surface ready. -- *Per-conversation tool profiles.* v1 ships - =cj/mcp-enabled-servers= for whole-server enable/disable; - profiles (different tool subsets per chat) wait for v1.5 once - usage shows whether they're needed. -- *Auth-source migration.* Deferred until the OAuth re-auth flow - for expiring tokens is understood. Tracked in Open Questions - §Q3. -- *Automated OAuth re-auth when tokens expire.* Out of scope; the - user re-authenticates via Claude Code, and the next GPTel - invocation picks up the refreshed values from =~/.claude.json=. -- *Modifying mcp.el itself in this repo.* Upstream patches and - tests live in =~/code/mcp.el/= and ship via PRs to lizqwerscott's - master. - -* Verified API Contracts - -These were checked against the actual source before each revision. -Behavior summarized here so implementation can rely on it. - -** GPTel confirmation gating (=gptel.el:2244=) - -The confirmation check is: - -#+begin_src emacs-lisp -(if (and gptel-confirm-tool-calls - (or (eq gptel-confirm-tool-calls t) - (gptel-tool-confirm tool-spec))) - ;; ask user - ...) -#+end_src - -When =gptel-confirm-tool-calls= is =nil=, the =(and ...)= -short-circuits and the tool's =:confirm= slot is ignored. - -The defcustom default is ='auto=, which "seeks confirmation only -when the corresponding tool spec has a non-nil :confirm slot" -(=gptel.el:1601-1603=). =ai-config.el:386= currently sets it to -=nil=. - -*Implementation consequence:* =ai-mcp.el= must =setq -gptel-confirm-tool-calls 'auto= as part of its setup (and -=ai-config.el= drops the explicit =nil= setting). Without this, -write-gated tools register =:confirm t= and gptel ignores it. - -** mcp-hub callback ownership (=~/code/mcp.el/mcp-hub.el:53-90=) - -=mcp-hub--start-server= unconditionally appends its own six -callbacks (=:initial-callback=, =:tools-callback=, -=:prompts-callback=, =:resources-callback=, -=:resources-templates-callback=, =:error-callback=) to whatever -the caller passes. Per-server custom callbacks in the alist -result in duplicate keyword arguments to =mcp-connect-server= -- -behavior implementation-defined. - -*Implementation consequence:* the integration does not slip custom -callbacks through =mcp-hub-servers=. It uses -=mcp-hub-start-all-server='s top-level completion callback as an -opportunistic signal, walks =mcp-server-connections= directly for -authoritative state, and uses a stall timer as the deadline. - -** mcp-hub-start-all-server completion semantics - -The hub's completion callback (=mcp-hub-start-all-server='s -=CALLBACK= argument) fires when its internal counter reaches the -total server count. The counter increments: - -- On immediate Elisp errors from =mcp-hub--start-server=. -- When the =:inited-callback= passed to =mcp-hub--start-server= - fires, which happens inside the hub's =:tools-callback=. - -Async error paths flow through =:error-callback= -- which the hub -also installs but does *not* obviously chain into the inited -callback. Servers without tools may not pass through the tools -callback in the same way. - -*Implementation consequence:* the callback is treated as an -opportunistic readiness signal, *not* as "all initialized or -failed". The authoritative state comes from polling -=mcp-server-connections= (each entry has =mcp--status= of -=connected= / =error= / =starting=) and from the stall timer -deadline. - -** gptel-make-tool registration semantics (=gptel.el:1729-1820=) - -=gptel-make-tool= registers the tool into =gptel--known-tools= -keyed by category + name. It does *not* add the tool to -=gptel-tools= (the per-buffer active list). The existing local -tools (=gptel-tools/git_log.el:97=) explicitly do: - -#+begin_src emacs-lisp -(gptel-make-tool ...) -(add-to-list 'gptel-tools (gptel-get-tool '("category" "name"))) -#+end_src - -*Implementation consequence:* the registration pipeline does -both calls per tool, tracks tool names per server in a hash, and -deregisters cleanly on restart/stop without disturbing local -tools. - -** MELPA vs local checkout - -The local checkout (=~/code/mcp.el/=, tip =f10768e=) has HTTP -transport (=mcp-http-process-connection=) and recent UX -improvements (resource reading, imenu support, detail mode). -MELPA parity not yet verified. - -*Implementation consequence:* =cj/mcp--assert-capabilities= -checks for required functions at load time and signals a clear -=user-error= if missing. Use-package block defaults to MELPA; -the local-checkout =:load-path= line stays commented until the -capability check tells us MELPA is missing something. - -* GPTel Confirmation Contract - -The single most consequential precondition for the safety story: - -** Current state - -=modules/ai-config.el:386= sets =gptel-confirm-tool-calls= to -=nil=. This was a deliberate "allow tool access by default" -choice when only the ten local tools existed -- all of which are -either read-only (git_log, git_status, list_directory_files, etc.) -or already wrap their own confirm prompts (web_fetch uses -=:confirm t= but is ignored under the current setting; this was -acceptable because the only "real" tool there is on a user-typed -URL). - -** Required state - -The MCP integration cannot ship without flipping this to =auto=. -Specifically, =modules/ai-mcp.el= must: - -1. =setq gptel-confirm-tool-calls 'auto= during its load. -2. Audit the existing local tools and add =:confirm t= to any - that should be gated. =web_fetch= is the obvious candidate; - =write_text_file=, =update_text_file=, =move_to_trash= may - also warrant it depending on Craig's preference. - -The existing =ai-config.el:386= line is removed. A comment -points readers at =ai-mcp.el= for the new value. - -** Verification test - -A test in =tests/test-ai-mcp-confirm-contract.el= asserts: - -- After =ai-mcp= loads, =gptel-confirm-tool-calls= is ='auto=. -- A write-classified MCP tool registered with =:confirm t= takes - the confirmation branch in =gptel-send='s tool-dispatch code - (verified by stubbing gptel's confirm-prompt and checking it - fires). -- A read-classified MCP tool registered with =:confirm nil= does - not take the confirmation branch. -- Local =git_log= (=:confirm nil=) still runs without prompting. - -* Current State - -** =modules/ai-config.el= - -- =use-package gptel= block at lines 363-414, defer-loaded on the - =gptel= / =gptel-send= / =gptel-menu= commands. -- =gptel-confirm-tool-calls nil= at line 386. Removed by this - integration; see § GPTel Confirmation Contract. -- =cj/gptel-load-local-tools= (lines 71-96) loads the ten local - tools from =gptel-tools/=. -- =cj/toggle-gptel= (lines 418-441) is the primary entry point - (=C-; a t=). Other entry points: =cj/gptel-quick-ask= - (=C-; a q=), =gptel-magit-commit-generate= (=g= in magit), - =cj/gptel-rewrite-with-directive= (=C-; a r=), =gptel-send= - (=C-RET= in gptel buffer). -- =cj/ai-keymap= (lines 510-528) currently uses keys A B M d . f - b l m p q r R c s t x. =C= (uppercase) is free and becomes the - MCP subprefix. - -** =gptel-tools/= - -Ten =.el= files. =git_log.el= is the closest analogue; -=web_fetch.el= demonstrates the =:confirm t= pattern. - -** =~/.claude.json= - -Mode 0600, ~75 KB. Top-level =mcpServers= key holds the nine -servers we want. Env-var names per server (values redacted): - -| Server | Env vars | -|----------------------+-------------------------------------------------------| -| google-calendar | =GOOGLE_OAUTH_CREDENTIALS= | -| google-docs-personal | =GOOGLE_CLIENT_ID=, =GOOGLE_CLIENT_SECRET=, =GOOGLE_MCP_PROFILE= | -| google-docs-work | Same three vars (different values) | -| google-keep | =GOOGLE_EMAIL=, =GOOGLE_MASTER_TOKEN= | - -* Design - -** Module split - -Implementation lives in =modules/ai-mcp.el=. =modules/ai-config.el= -gains only autoload declarations and the =C-; a C= subprefix -wiring. - -** Code organization outline (=ai-mcp.el=) - -The file is organized in seven sections so it stays readable as -features land: - -1. *Constants and defcustoms* -- =cj/mcp-server-specs=, - =cj/mcp-claude-config=, =cj/mcp-enabled-servers=, - =cj/mcp-start-on-entry-points=, =cj/mcp-startup-timeout=, - =cj/mcp-tool-timeout=, =cj/mcp-tool-confirm-overrides=, - audit-log defcustoms. -2. *Public commands* -- =cj/mcp-ensure-started=, =cj/mcp-hub=, - =cj/mcp-status=, =cj/mcp-list-tools=, - =cj/mcp-restart-failed=, =cj/mcp-restart-server=, - =cj/mcp-stop-all=, =cj/mcp-doctor=, - =cj/mcp-wait-until-ready=. -3. *Pure helpers* -- Claude config reader, =cj/mcp--build-server-alist=, - =cj/mcp--redact=, =cj/mcp--confirm-p=, - =cj/mcp--normalize-description=. -4. *mcp.el compatibility layer* -- 3-5 wrappers around private - API (=mcp--status=, =mcp--tools=, etc.). Single source of - version-drift risk. -5. *Registration pipeline* -- =cj/mcp--register-tool=, - =cj/mcp--register-server-tools=, - =cj/mcp--deregister-server-tools=, - =cj/mcp--registered-tools= hash. -6. *Async state machine* -- =cj/mcp--state=, - =cj/mcp--server-status=, =cj/mcp--on-all-started=, - =cj/mcp--stall-timer=, =cj/mcp--poll-status=. -7. *UI* -- audit-buffer mode, doctor buffer, recovery-pattern - matcher, response prefixes. - -This explicit outline doubles as the file's table of contents in -its commentary block. - -** Server inventory: data first - -The nine servers are described as a defconst of plists, with no -secrets baked in: - -#+begin_src emacs-lisp -(defconst cj/mcp-server-specs - '((:name "linear" - :transport http - :url "https://mcp.linear.app/mcp" - :auth in-protocol - :risk write-capable) - (:name "notion" - :transport http - :url "https://mcp.notion.com/mcp" - :auth in-protocol - :risk write-capable) - (:name "figma" - :transport stdio - :command "npx" - :args ("-y" "figma-developer-mcp" "--stdio") - :secret-args ("--figma-api-key" :figma-api-key) - :auth args-token - :risk arg-leak) - (:name "slack-deepsat" - :transport sse - :url "http://127.0.0.1:13080/sse" - :auth local - :risk write-capable) - (:name "drawio" - :transport stdio - :command "npx" - :args ("-y" "@drawio/mcp") - :auth none - :risk none) - (:name "google-calendar" - :transport stdio - :command "npx" - :args ("-y" "@cocal/google-calendar-mcp") - :env (:GOOGLE_OAUTH_CREDENTIALS t) - :auth oauth - :risk write-capable) - (:name "google-docs-personal" - :transport stdio - :command "npx" - :args ("-y" "@a-bonus/google-docs-mcp") - :env (:GOOGLE_CLIENT_ID t :GOOGLE_CLIENT_SECRET t :GOOGLE_MCP_PROFILE t) - :auth oauth - :risk write-capable) - (:name "google-docs-work" - :transport stdio - :command "npx" - :args ("-y" "@a-bonus/google-docs-mcp") - :env (:GOOGLE_CLIENT_ID t :GOOGLE_CLIENT_SECRET t :GOOGLE_MCP_PROFILE t) - :auth oauth - :risk write-capable) - (:name "google-keep" - :transport stdio - :command "uvx" - :args ("--from" "keep-mcp" "python" "-m" "server.cli") - :env (:GOOGLE_EMAIL t :GOOGLE_MASTER_TOKEN t) - :auth token - :risk write-capable))) -#+end_src - -The same data drives the doctor check list, status labels, and -recovery messages -- a single source of truth keeps them from -drifting. - -** Server enablement - -#+begin_src emacs-lisp -(defcustom cj/mcp-enabled-servers - (mapcar (lambda (s) (plist-get s :name)) cj/mcp-server-specs) - "List of MCP server names to start. -Defaults to every server in `cj/mcp-server-specs'. Set to a -shorter list to disable specific servers without editing the -spec. Changes take effect on next `cj/mcp-restart-failed' or -Emacs restart." - :type '(repeat string) - :group 'cj) -#+end_src - -=cj/mcp--build-server-alist= filters by this list before -returning. A user who wants only =linear= and =drawio= sets: - -#+begin_src emacs-lisp -(setq cj/mcp-enabled-servers '("linear" "drawio")) -#+end_src - -This is the answer to "100+ tools is overwhelming" without -needing per-conversation profiles. - -** Entry-point policy - -Not every GPTel entry point should trigger MCP startup. Quick -ask, rewrite, and magit commit-message generation are -lightweight; spinning up nine subprocesses for a 50-word commit -message is surprising overhead. - -#+begin_src emacs-lisp -(defcustom cj/mcp-start-on-entry-points - '(toggle-gptel) - "GPTel entry points that trigger MCP startup. -Symbols correspond to commands: `toggle-gptel', `gptel-send', -`gptel-quick-ask', `gptel-rewrite-with-directive', -`gptel-magit-generate-message'. Default: only full chat -(`toggle-gptel')." - :type '(repeat symbol) - :group 'cj) -#+end_src - -Each entry-point command checks membership before calling -=cj/mcp-ensure-started=: - -#+begin_src emacs-lisp -(defun cj/toggle-gptel () - ... - (when (memq 'toggle-gptel cj/mcp-start-on-entry-points) - (cj/mcp-ensure-started)) - ...) -#+end_src - -** Claude config reader (mtime-cached, structured returns) - -#+begin_src emacs-lisp -(defcustom cj/mcp-claude-config - (expand-file-name "~/.claude.json") - "Path to the Claude Code config that holds MCP server env vars." - :type 'file - :group 'cj) - -(defvar cj/mcp--config-cache nil - "Cons of (MTIME . PARSED) for `cj/mcp-claude-config'.") - -(defun cj/mcp--read-claude-config () - "Return a structured result describing the Claude config state. -Result shape: - (:ok t :data PLIST) - (:ok nil :reason missing-file) - (:ok nil :reason unreadable) - (:ok nil :reason malformed-json :message STR) -Cached by mtime; subsequent calls reparse only on change." - ...) -#+end_src - -** mcp.el compatibility layer - -All private-API access lives in 3-5 helpers documented with the -upstream commit they target. This is the only file that touches -=mcp--*= names; everything else calls these wrappers. - -#+begin_src emacs-lisp -;; ai-mcp-compat -- isolates private mcp.el API. -;; Verified against upstream commit f10768e (2026-05-16). - -(defun cj/mcp--server-status (connection) - "Return CONNECTION's lifecycle status: connected, error, starting." - (mcp--status connection)) - -(defun cj/mcp--server-tools (connection) - "Return CONNECTION's discovered tool list (plists)." - (mcp--tools connection)) - -(defun cj/mcp--server-name (connection) - "Return CONNECTION's logical server name." - (jsonrpc-name connection)) - -(defun cj/mcp--assert-capabilities () - "Signal `user-error' if any required mcp.el function is missing." - (dolist (fn '(mcp-connect-server mcp-make-text-tool - mcp-hub mcp-hub-start-all-server - mcp-hub-get-all-tool mcp-server-connections)) - (unless (fboundp fn) - (user-error "mcp.el too old; missing %s. Upgrade or switch \ -to local checkout in `ai-mcp.el' use-package block" fn)))) -#+end_src - -If mcp.el renames a slot or changes a return shape, only these -helpers break. Tests cover each helper against stub objects. - -** Startup model: async + state machine + polling - -Three state structures capture lifecycle: - -#+begin_src emacs-lisp -(defvar cj/mcp--state 'idle - "Overall MCP integration state: idle, starting, partial, ready, failed.") - -(defvar cj/mcp--server-status nil - "Alist mapping server name to status plist: - (:state STATE :tool-count N :tools (NAME ...) :last-error STR - :started-at TIME :ready-at TIME)") - -(defvar cj/mcp--stall-timer nil - "Timer guarding against servers that never call back.") -#+end_src - -=cj/mcp-ensure-started= is the only entry point consumers call: - -#+begin_src emacs-lisp -(defun cj/mcp-ensure-started () - "Schedule MCP startup if it hasn't run yet this session. -Returns immediately. Servers spawn asynchronously." - (when (eq cj/mcp--state 'idle) - (setq cj/mcp--state 'starting) - (cj/mcp--assert-capabilities) - (cj/mcp--build-status-from-specs) - (setq mcp-hub-servers (cj/mcp--build-server-alist)) - (message "MCP: starting %d server(s) in background..." - (length mcp-hub-servers)) - ;; The hub callback is opportunistic. We poll status on each - ;; tick and the stall timer is the authoritative deadline. - (mcp-hub-start-all-server #'cj/mcp--on-hub-callback nil nil) - (cj/mcp--start-stall-timer))) -#+end_src - -State transitions and authority: - -- *Hub completion callback (opportunistic).* Triggers an - immediate poll + registration pass for whatever's ready. Does - not signal completion by itself. -- *Stall timer (authoritative deadline).* After - =cj/mcp-startup-timeout= (default 30 s), marks every server - still in =starting= state as =failed= with reason =timeout=, - registers tools from servers that did become ready, transitions - =cj/mcp--state= to its final value (=ready=, =partial=, or - =failed=). -- *Polling (authoritative state).* =cj/mcp--poll-status= walks - =mcp-server-connections= and maps each entry's =mcp--status= to - =cj/mcp--server-status=. Called from the hub callback and from - the stall timer. Servers that transitioned through - =:error-callback= (which the hub doesn't chain into the inited - callback) show up here. - -Properties: - -- =cj/mcp-ensure-started= returns in <100 ms regardless of - subprocess state. -- =mcp-hub-start-all-server= itself is async (third =SYNCP= arg - is =nil=). -- Servers transition independently; tools land in =gptel-tools= - as each server reports inventory. -- A server that never connects, never errors, and never reports - tools is caught by the stall timer. - -** Tool registration pipeline - -Walks =mcp-server-connections= directly, per server, after each -status poll. This gives clean per-server bookkeeping without -parsing the =mcp-SERVER= category prefix: - -#+begin_src emacs-lisp -(defun cj/mcp--register-server-tools (server-name) - "Register every tool from the connected server SERVER-NAME. -Idempotent: re-registration replaces the function pointer -without duplicating menu entries." - (let ((connection (gethash server-name mcp-server-connections))) - (when (and connection - (eq (cj/mcp--server-status connection) 'connected)) - ;; First, deregister any existing tools for this server. - (cj/mcp--deregister-server-tools server-name) - (dolist (raw-tool (cj/mcp--server-tools connection)) - (cj/mcp--register-tool server-name raw-tool)) - (cj/mcp--update-server-status server-name :state 'ready)))) - -(defun cj/mcp--register-tool (server-name raw-tool) - "Register one tool from SERVER-NAME. -RAW-TOOL is the plist from `mcp--tools' (untransformed)." - (let* ((remote-name (plist-get raw-tool :name)) - (gptel-name (format "mcp__%s__%s" server-name remote-name)) - (description (cj/mcp--normalize-description - server-name raw-tool)) - (confirm-p (cj/mcp--confirm-p gptel-name remote-name)) - ;; mcp-make-text-tool builds the closure; we async by default. - (mcp-plist (mcp-make-text-tool server-name remote-name t)) - ;; Rewrite name + description + confirm after mcp.el builds the closure. - (gptel-plist (cj/mcp--rewrite-plist - mcp-plist - :name gptel-name - :description description - :confirm confirm-p - :async t - :category (format "mcp-%s" server-name)))) - (apply #'gptel-make-tool gptel-plist) - (add-to-list 'gptel-tools - (gptel-get-tool - (list (format "mcp-%s" server-name) gptel-name))) - (push gptel-name - (gethash server-name cj/mcp--registered-tools)))) -#+end_src - -Key properties: - -- *Async by default.* All MCP tools register with =:async t=. - This avoids any sync MCP tool call blocking Emacs during - =gptel-send='s tool dispatch. Per-call timeout uses the - timer-race pattern (next subsection). -- *Closure preserves remote name.* =mcp-make-text-tool= built - the function before we rewrote =:name=, so the closure calls - =mcp-call-tool SERVER REMOTE-NAME=, not the prefixed - =mcp__SERVER__TOOL=. -- *Idempotent.* Each registration deregisters first, so - callbacks firing multiple times or restarts don't accumulate - duplicate entries. -- *Description normalization.* See next subsection. - -** Per-call timeout: explicit timer/callback race - -=with-timeout= only supervises the dynamic extent of the form it -wraps. For async tools where the function returns immediately -and the callback fires later, it does nothing. The correct -pattern is an explicit timer: - -#+begin_src emacs-lisp -(defun cj/mcp--wrap-async-with-timeout (server-name remote-name - mcp-callback) - "Return a gptel-compatible async wrapper for an MCP tool. -GPTel calls the returned function with (CALLBACK . ARGS). -The wrapper races MCP's response against `cj/mcp-tool-timeout'; -whichever fires first wins, and late callbacks are ignored." - (lambda (gptel-callback &rest args) - (let* ((done nil) - (timer (run-at-time cj/mcp-tool-timeout nil - (lambda () - (unless done - (setq done t) - (funcall gptel-callback - (format "MCP tool %s/%s \ -timed out after %ds" - server-name - remote-name - cj/mcp-tool-timeout))))))) - (mcp-async-call-tool - (gethash server-name mcp-server-connections) - remote-name - (cj/mcp--args-to-plist args) - (lambda (result) - (cancel-timer timer) - (unless done - (setq done t) - (funcall gptel-callback - (mcp--parse-tool-call-result result)))) - (lambda (code message) - (cancel-timer timer) - (unless done - (setq done t) - (funcall gptel-callback - (format "MCP error %s: %s" code - (cj/mcp--redact message))))))))) -#+end_src - -Both branches set =done= before invoking gptel's callback so a -late response (e.g., MCP responds after the timer fired but -before its sentinel cancels) doesn't deliver twice. Timer is -canceled on the success and error paths. - -The closure that =mcp-make-text-tool= built gets replaced with -this wrapper during registration (the =:function= slot of the -rewritten plist). - -** Confirmation / safety policy - -All enabled tools are registered (per the goal of full -visibility), but write/destructive tools get =:confirm t=. -Classification is name-based: - -#+begin_src emacs-lisp -(defconst cj/mcp--write-name-patterns - '("\\`create\\b" "\\`update\\b" "\\`delete\\b" "\\`remove\\b" - "\\`send\\b" "\\`post\\b" "\\`add\\b" "\\`move\\b" - "\\`invite\\b" "\\`share\\b" "\\`upload\\b" "\\`set\\b" - "\\`patch\\b" "\\`import\\b" "\\`sync\\b" "\\`merge\\b" - "\\`close\\b" "\\`reopen\\b" "\\`archive\\b" "\\`unarchive\\b" - "\\`approve\\b" "\\`reject\\b" "\\`label\\b" "\\`assign\\b" - "\\`reply\\b" "\\`comment\\b" "\\`trash\\b" "\\`restore\\b" - "\\`pin\\b" "\\`unpin\\b" "\\`copy\\b" "\\`rename\\b")) - -(defconst cj/mcp--read-name-patterns - '("\\`get\\b" "\\`list\\b" "\\`read\\b" "\\`search\\b" - "\\`find\\b" "\\`fetch\\b" "\\`view\\b" "\\`query\\b" - "\\`describe\\b" "\\`show\\b" "\\`check\\b")) - -(defcustom cj/mcp-tool-confirm-overrides nil - "Per-tool confirmation overrides. -Alist mapping fully qualified MCP tool name (e.g., -\"mcp__linear__create_issue\") to t or nil. Wins over the -pattern-based classifier." - :type '(alist :key-type string :value-type boolean) - :group 'cj) - -(defun cj/mcp--confirm-p (gptel-name remote-name) - "Return non-nil if the tool should register with `:confirm t'." - (let ((override (assoc gptel-name cj/mcp-tool-confirm-overrides))) - (cond - (override (cdr override)) - ((seq-some (lambda (p) (string-match-p p remote-name)) - cj/mcp--write-name-patterns) t) - ((seq-some (lambda (p) (string-match-p p remote-name)) - cj/mcp--read-name-patterns) nil) - (t t)))) ; unknown → confirm -#+end_src - -This is the second half of the safety story; the first half -(=gptel-confirm-tool-calls 'auto=) is enforced by ai-mcp.el at -load time. - -** Description normalization - -mcp-side descriptions are written for an arbitrary client and -vary in quality. The registration pipeline prefixes a stable -"server / write-risk" header so the agent and the user have -consistent context: - -#+begin_src emacs-lisp -(defun cj/mcp--normalize-description (server-name raw-tool) - "Return a normalized description string for RAW-TOOL. -Prefix: [SERVER] for reads, [SERVER WRITE] for writes, -[SERVER ?] for unknown classification. Then the upstream -description, unchanged." - (let* ((remote-name (plist-get raw-tool :name)) - (upstream (or (plist-get raw-tool :description) - "(no description provided by server)")) - (cls (cond - ((seq-some (lambda (p) (string-match-p p remote-name)) - cj/mcp--write-name-patterns) "WRITE") - ((seq-some (lambda (p) (string-match-p p remote-name)) - cj/mcp--read-name-patterns) "") - (t "?")))) - (format "[%s%s] %s" server-name - (if (string-empty-p cls) "" (concat " " cls)) - upstream))) -#+end_src - -Tools in =gptel-menu= show as: - -#+begin_example -[linear WRITE] Create a new Linear issue in a team. -[linear] List issues in a Linear team. -[google-keep ?] Frobnicate a note (unknown classification). -#+end_example - -** Tool deregistration - -Three triggers remove tools from =gptel-tools=: - -- =cj/mcp-stop-all= -- removes every MCP-registered tool, clears - =cj/mcp--registered-tools=, calls =mcp-stop-server= per server. - Local tools untouched. -- =cj/mcp-restart-server= -- removes tools for the named server - before re-registering. -- =cj/mcp-restart-failed= -- deregister + re-register each - =failed= server. - -#+begin_src emacs-lisp -(defun cj/mcp--deregister-server-tools (server-name) - "Remove every GPTel tool this integration registered for SERVER-NAME. -Local tools (those not in `cj/mcp--registered-tools') are -preserved." - (let ((mcp-tools (gethash server-name cj/mcp--registered-tools))) - (dolist (tool-name mcp-tools) - (setq gptel-tools - (cl-remove-if (lambda (tool) - (string= (gptel-tool-name tool) tool-name)) - gptel-tools))) - (remhash server-name cj/mcp--registered-tools))) -#+end_src - -** Secrets redaction - -=cj/mcp--redact= masks every secret-bearing field before any -string surfaces in messages, errors, status buffers, hub -displays, or test fixtures. Used by status formatting, audit -buffer, failure surfacing, and error wrappers. Tests assert -sentinel =REDACTED_TEST_SECRET= never appears in any user-facing -output. - -** Process cleanup - -=kill-emacs-hook= gets one entry: =cj/mcp-stop-all=. Stdio -process sentinels record abnormal exits into the per-server -status. - -*Local-only constraint.* MCP server processes are always spawned -under the local Emacs's =default-directory='s root. TRAMP / -remote buffers do not relocate the spawn; MCP processes are -local-only. This is enforced implicitly (mcp-hub-start-all-server -uses =make-process= which is local) and documented in the -commentary. - -** Privacy: external tool output in saved conversations - -MCP tool results land in the GPTel buffer. GPTel's autosave (when -enabled) persists those results to -=~/.emacs.d/ai-conversations/=. Concrete implications: - -- A Slack channel excerpt the agent fetched is now on disk. -- A Google Docs snippet the agent quoted is now on disk. -- A Linear issue body the agent read is now on disk. - -This is normal external-tool behavior, but users may not realize -it. Two mitigations: - -- =ai-mcp.el='s commentary explicitly documents the autosave - privacy implication. -- The audit buffer (=cj/mcp-list-tools=) includes a header note: - "Tool results land in =gptel-tools= responses; saved - conversations persist them. Use =cj/gptel-autosave-toggle= per - buffer to opt out." - -A future enhancement (V1.5+) could mark MCP-sourced tool output -with a visible delimiter in the GPTel buffer so the user sees -"this came from an external server" during the chat. Not v1. - -** Per-server auth matrix - -| =:auth= value | Servers | How it works | Recovery if failed | -|---------------+---------+--------------+--------------------| -| =in-protocol= | linear, notion | mcp.el HTTP transport handles OAuth handshake | Open URL surfaced in =cj/mcp-status= | -| =local= | slack-deepsat | Local SSE; no auth but proxy must run | Start the local proxy | -| =none= | drawio | No auth | n/a | -| =args-token= | figma | API key in process args | Update Claude config, restart | -| =oauth= | google-calendar, google-docs-* | OAuth token in env; refresh out-of-band | Re-auth via Claude Code, restart | -| =token= | google-keep | Long-lived token in env | Regenerate, update Claude config, restart | - -Recovery surfaces via pattern matching on errors: - -#+begin_src emacs-lisp -(defconst cj/mcp--recovery-patterns - '(("\\(401\\|unauthorized\\|token expired\\)" - . "Token expired -- re-auth via Claude Code, then C-; a C r SERVER") - ("\\(connection refused\\|ECONNREFUSED\\)" - . "Local endpoint unreachable -- check the upstream service is running") - ("\\(ENOENT\\|command not found\\)" - . "Missing dependency -- run `cj/mcp-doctor' to diagnose"))) -#+end_src - -** Timeouts - -| Timeout | Variable | Default | Behavior | -|---------+----------+---------+----------| -| Startup / tool discovery | =cj/mcp-startup-timeout= | 30 s | Server marked =failed/timeout=; integration continues with ready servers | -| Per-call tool execution | =cj/mcp-tool-timeout= | 60 s | Tool call returns timeout string to agent via the timer-race wrapper; server stays connected | - -Both via defcustoms. Per-tool override possible via -=cj/mcp-tool-timeout-overrides= alist. - -* Status UX - -** Echo-area summary: =cj/mcp-status= - -Single-line summary keyed off =cj/mcp--state=: - -| State | Echo | -|-------+------| -| idle | =MCP: not started. (C-; a t triggers it.)= | -| starting | =MCP: starting (3/9 ready)...= | -| partial | =MCP: 7/9 ready (failed: google-calendar, slack-deepsat).= | -| ready | =MCP: 9/9 ready, 87 tools registered.= | -| failed | =MCP: all 9 servers failed. C-; a C d to diagnose.= | - -** Wait until ready: =cj/mcp-wait-until-ready= - -For the case where the agent asks for a Calendar tool immediately -after =cj/toggle-gptel=: - -#+begin_src emacs-lisp -(defun cj/mcp-wait-until-ready (&optional timeout) - "Block until MCP startup completes or TIMEOUT seconds pass. -TIMEOUT defaults to `cj/mcp-startup-timeout'. Returns the final -state symbol (`ready', `partial', `failed', `starting')." - (interactive) - ...) -#+end_src - -Bound to =C-; a C w=. Reports progress every second via -=message= so the user sees countdown. - -** Audit buffer: =cj/mcp-list-tools= - -Tabulated-list buffer. Failed servers appear at the top with a -red face so they're visually obvious. Each row: - -#+begin_example -Server State Tools Confirm Description --------------------- -------- ----- ------- ------------------------------ -google-calendar FAILED 0 - Token expired; see status -slack-deepsat FAILED 0 - Local proxy unreachable --------------------- -------- ----- ------- ------------------------------ -linear/list_issues ready - no List issues in a Linear team -linear/create_issue ready - YES Create a new Linear issue -linear/... ready 44 total -... -#+end_example - -Sort: failed servers first, then by server name, then by tool -name. Keys: =g= refresh, =RET= jump to tool's category in -gptel-menu, =r= restart server under point, =c= toggle confirm -override for tool under point. - -* Commands & Keymap - -=C-; a C= becomes the MCP (Connect) subprefix. Existing keys are -preserved: =M= keeps =gptel-menu=, =m= keeps -=cj/gptel-change-model=. - -| Key | Command | Purpose | -|-----+---------+---------| -| =C-; a C h= | =cj/mcp-hub= | Open server-management buffer | -| =C-; a C s= | =cj/mcp-status= | Echo state summary | -| =C-; a C l= | =cj/mcp-list-tools= | Open audit buffer | -| =C-; a C r= | =cj/mcp-restart-failed= | Restart failed servers | -| =C-; a C R= | =cj/mcp-restart-server= | Restart a named server | -| =C-; a C S= | =cj/mcp-stop-all= | Stop everything | -| =C-; a C d= | =cj/mcp-doctor= | Diagnose prerequisites | -| =C-; a C w= | =cj/mcp-wait-until-ready= | Block until ready | - -which-key labels mirror the table. - -** cj/mcp-doctor - -Diagnostic command. Two modes: - -- *Static* (default) -- no side effects, no network: capability - check, =npx=/=uvx= on PATH, Claude config parseability, - per-server env-var presence, local endpoint reachability. -- *Live* (=C-u C-; a C d=) -- opt-in: invokes a single safe read - per auth class to verify OAuth tokens haven't silently expired. - For example, =gh_search= against linear is one tool call; same - for notion, google-calendar, google-docs, google-keep. Static - checks first; live probe only fires if static passes. - -Output buffer keys: - -- =c= copy diagnostic summary to kill ring (for pasting into - bug reports / notes). -- =r= rerun all failed checks. -- =q= quit. - -Each check row formats as: =PASS / FAIL / WARN CHECK RECOVERY=. - -* Implementation Plan - -Eight phases (was seven in rev 2; added Phase 1.5 for the -confirmation-contract setup). Each ends with green ERT tests + -a manual smoke test before the next. - -** Phase 1 -- Module + pure helpers - -Create =modules/ai-mcp.el=. Implement: =cj/mcp-server-specs=, -=cj/mcp-enabled-servers=, =cj/mcp-start-on-entry-points=, -=cj/mcp--read-claude-config=, =cj/mcp--get-env=, -=cj/mcp--build-server-alist= (pure transformer; filters by -=cj/mcp-enabled-servers=), =cj/mcp--redact=, -=cj/mcp--confirm-p=, =cj/mcp--normalize-description=. - -Tests cover all of the above against fixtures. - -** Phase 1.5 -- Confirmation contract - -Flip =gptel-confirm-tool-calls= to ='auto= in =ai-mcp.el='s -setup. Remove the =(setq gptel-confirm-tool-calls nil)= line -from =ai-config.el=. Audit existing local tools and add -=:confirm t= to any that should be gated (=web_fetch= -guaranteed; =write_text_file=, =update_text_file=, -=move_to_trash= per Craig's decision). - -Verification test in =tests/test-ai-mcp-confirm-contract.el= -asserts the setting, the local-tool gating behavior, and that -=git_log= (=:confirm nil=) still runs without prompting. - -** Phase 2 -- Compat layer + tool registration with fake inventory - -Add =ai-mcp-compat= helpers. Build the registration pipeline -against a stubbed =mcp-server-connections=. Verify: -- Tool name rewriting (=remote-name= stays in closure; - =gptel-name= is =mcp__SERVER__TOOL=). -- =gptel-make-tool= + explicit =add-to-list 'gptel-tools=. -- =:confirm= application per policy + overrides. -- Description normalization adds expected prefix. -- =cj/mcp--registered-tools= bookkeeping. -- Deregister removes from =gptel-tools= without disturbing local - tools (test pre-populates =gptel-tools= with a local tool and - asserts it survives). -- Re-register after deregister doesn't duplicate. -- All tools register with =:async t=. - -** Phase 3 -- Async state machine + timeout wrapper - -Implement =cj/mcp-ensure-started=, =cj/mcp--on-hub-callback=, -=cj/mcp--state= transitions, stall timer, polling. Implement -=cj/mcp--wrap-async-with-timeout=. Stub -=mcp-hub-start-all-server= with synthetic delayed callbacks and -synthetic async errors. - -Verify: -- =cj/mcp-ensure-started= returns in <100 ms regardless of stubs. -- Hub callback triggers status poll + registration. -- Stall timer fires for stuck servers. -- Async error path (=:error-callback= without inited callback) - reaches =cj/mcp--server-status= via polling. -- =cj/mcp--wrap-async-with-timeout=: timer-first ignores late MCP - response; MCP-first cancels timer; both branches deliver - exactly once. - -** Phase 4 -- First real connection (no auth) - -Wire one =drawio= or =slack-deepsat= server. Verify the stubbed -Phase 3 behavior matches real subprocesses. - -** Phase 5 -- Status UX + commands + doctor (static) - -Implement =cj/mcp-status=, =cj/mcp-list-tools= (with failed -servers at top, red face), =cj/mcp-doctor= (static mode only), -=cj/mcp-wait-until-ready=, restart commands, keymap entries, -which-key labels. - -Investigate =gptel-menu= refresh behavior: if the transient is -already open when a new tool registers, does it pick it up on -next invocation? Document and add an acceptance test: -"register new tool while gptel-menu is open; close and reopen; -new tool appears." - -** Phase 6 -- HTTP servers - -Add =linear= and =notion=. Test in-protocol OAuth handshake. -Add live-auth-check mode to doctor. - -** Phase 7 -- Env-dependent stdio servers - -Add =figma=, =google-calendar=, =google-docs-personal=, -=google-docs-work=, =google-keep=. - -** Phase 8 -- Privacy + audit polish - -Add audit buffer's privacy header, autosave commentary, audit-log -hygiene (=cj/mcp-tool-audit-log-enabled= defcustom). Update -=ai-mcp.el= commentary with the code-organization outline as a -TOC. - -* Test Plan - -** Confirmation contract (Phase 1.5) - -- =gptel-confirm-tool-calls= is ='auto= after =ai-mcp= loads. -- Write-classified MCP tool with =:confirm t= triggers confirm - prompt (stub gptel's prompt and assert it fires). -- Read-classified MCP tool with =:confirm nil= does not trigger - prompt. -- Pre-existing =git_log= (=:confirm nil=) does not trigger - prompt. -- =web_fetch= (newly gated with =:confirm t=) triggers prompt. - -** Pure helpers (Phase 1-2) - -- =cj/mcp--read-claude-config=: good fixture, missing, unreadable, - malformed JSON, missing =:mcpServers=, missing server, empty - env, non-string env values. Cache invalidation on mtime - change. -- =cj/mcp--build-server-alist=: each transport, each auth class, - env merge, args splicing for figma, no mutation of - =cj/mcp-server-specs=, filter by =cj/mcp-enabled-servers=. -- =cj/mcp--redact=: bearer tokens, OAuth credentials, - TOKEN/KEY/SECRET/CREDENTIALS-suffixed env vars, URL query - tokens, figma args slot. Sentinel never leaks. -- =cj/mcp--confirm-p=: read patterns, write patterns, unknown → - t, override map wins. -- =cj/mcp--normalize-description=: prefix shape per - classification. - -** Registration pipeline (Phase 2) - -- Single tool registers into all three structures. -- Two tools with same =:name= from different servers don't - collide. -- Re-registration replaces function pointer without duplicating. -- Deregister removes from =gptel-tools= without touching a - pre-populated local tool. -- All tools register with =:async t=. -- Confirm overrides win over patterns. - -** Compat layer (Phase 2) - -- Each =cj/mcp--*-server-*= helper returns expected value against - stub object. -- =cj/mcp--assert-capabilities= signals when a required function - is missing. - -** State machine + timeout (Phase 3) - -- =cj/mcp-ensure-started= from idle returns in <100 ms with - delayed-callback stub. -- Second call from =starting= is no-op. -- Hub callback triggers status poll. -- Stall timer marks slow servers =failed/timeout=. -- Async error path: server emits error callback only (no inited - callback); polling catches it and marks =failed/error= within - stall window. -- =cj/mcp--wrap-async-with-timeout=: - - MCP responds first, before timer: gptel callback fires once - with result; timer canceled. - - Timer fires first: gptel callback fires once with timeout - message; late MCP response is ignored (done flag). - - Error callback: cancels timer, fires once with redacted - error. - -** Async / freeze (Phase 3) - -- Stub =mcp-hub-start-all-server= to delay callbacks 5 s. - =cj/toggle-gptel= returns within 250 ms; buffer is - interactive. -- Stub a server that never responds. After - =cj/mcp-startup-timeout=, server is =failed=, - =cj/mcp--state= is =partial= or =failed=. - -** Entry-point policy (Phase 5) - -- With =cj/mcp-start-on-entry-points '(toggle-gptel)=, calling - =cj/toggle-gptel= triggers startup; calling - =cj/gptel-quick-ask= does not. -- Adding =gptel-quick-ask= to the list makes quick-ask trigger. - -** Local-tool preservation (Phase 2) - -- =cj/mcp-stop-all= removes only MCP-registered tools from - =gptel-tools=; local tools like =git_log= remain. -- =cj/mcp-restart-server= removes only that server's tools; - other MCP servers' tools and local tools both remain. - -** Process / cleanup (Phase 4+) - -- =kill-emacs-hook= calls =cj/mcp-stop-all=; subprocesses exit. -- =cj/mcp-stop-all= clears =gptel-tools= MCP entries and - =cj/mcp--registered-tools=. -- Restart doesn't leak process buffers or duplicate process - objects. -- Process sentinel records abnormal exits into status. - -** Partial availability (Phase 4+) - -- 8 of 9 servers ready, 1 failed: ready tools available; - failed-server tools absent. -- Restart-failed only retries the failed one. - -** Saved-conversation behavior (Phase 7+) - -- After a successful MCP tool call, GPTel autosave (when on) - persists the tool result. Test asserts the saved file - contains the result text and the audit buffer's privacy - header is updated. -- =cj/gptel-autosave-toggle= off → result not saved. - -** Keymap (Phase 5) - -- =C-; a C h/s/l/r/R/S/d/w= all bound after =ai-mcp= loads. -- Existing =C-; a M=, =C-; a m= still bound to - =gptel-menu=, =cj/gptel-change-model=. -- which-key labels present for every new binding. -- No duplicate labels. - -** =gptel-menu= refresh (Phase 5) - -- Register new tool while a previously-opened gptel-menu is - closed; reopen; new tool appears. Document whether mid-open - refresh works. - -** No-real-process rule - -All tests in =tests/test-ai-mcp*= use stubs for =process-file=, -=make-process=, =mcp-hub-start-all-server=, -=mcp-server-connections=, =mcp--tools=, =mcp--status=. No real -=npx=, no network, no real =~/.claude.json=. - -** Manual test matrix - -| Scenario | Expected | -|----------+----------| -| No =~/.claude.json= | Doctor warns; env-free servers still start | -| Malformed Claude config | Status shows =malformed-json=; integration =failed= cleanly | -| Network offline | HTTP servers fail; stdio servers start; status =partial= | -| =npx= not on PATH | Doctor flags it; stdio servers fail with clear message | -| One stdio server exits immediately | Sentinel records failure; others continue | -| slack-deepsat endpoint down | Server =failed=; recovery message points at local proxy | -| Google token expired | Server starts; tool calls fail; live-auth check (=C-u doctor=) surfaces it | -| All servers available | =MCP: 9/9 ready, ~N tools registered= | -| =cj/mcp-restart-failed= after fix | Only retried servers transition | -| =cj/mcp-stop-all= then call a tool | Tool absent from =gptel-tools= | -| Disable a server via defcustom | Doctor and status reflect the absence | -| TRAMP buffer open + =cj/toggle-gptel= | MCP starts locally; no remote spawn | - -* Acceptance Criteria - -1. *No freeze.* =cj/toggle-gptel= returns in <250 ms with mcp.el - wired and nine real servers starting in background. -2. *Incremental registration.* As each server reports tools, - =gptel-tools= updates; in-flight =gptel-send= calls see - newly-added tools on the next request. -3. *No MCP failure breaks ordinary GPTel chat.* With every MCP - server failing, =cj/toggle-gptel= still opens a usable chat - buffer; non-tool prompts work normally; local tools (git_log - etc.) still callable. -4. *Confirm gate works.* After =ai-mcp= loads, a write-classified - MCP tool actually prompts before invocation. Verified by a - test that fails if =mcp-async-call-tool= is invoked before - gptel's confirm-prompt stub fires. -5. *Local-tool preservation.* =cj/mcp-stop-all= and per-server - restart remove only MCP-owned tools. -6. *Partial availability.* With one failed server, status is - =partial=, ready servers' tools work, failed server's tools - absent. -7. *Idempotent restart.* Calling =cj/mcp-restart-failed= twice - with no intervening change produces identical state. -8. *No secret leakage.* Grep every user-facing output for - sentinel fixture secrets; zero matches. -9. *Doctor coverage (static).* Identifies each diagnosable - failure in the manual test matrix. -10. *Server enablement.* Setting =cj/mcp-enabled-servers= to a - subset starts only those servers; doctor reports the disabled - ones as expected-absent. - -* Risks - -** R1 -- mcp.el API drift behind compat layer - -Even with the compat layer, an upstream rename could break us if -the capability check misses it. - -*Mitigation:* compat helpers document the upstream commit and -file location. Tests cover each helper against stub objects; -when mcp.el bumps, run those tests first. - -** R2 -- Cold-start latency for nine subprocesses - -Nine =npx -y= invocations cold-start over several seconds. Time -to =ready= state may be 10-30 seconds on a cold machine. - -*Mitigation:* async model means user doesn't wait. Tools arrive -incrementally; status indicator shows progress. -=cj/mcp-wait-until-ready= for the rare case where the agent needs -a specific tool immediately. - -** R3 -- OAuth token expiry surfaces silently - -A Google server starts cleanly but every tool call fails with -auth errors. - -*Mitigation:* the OAuth recovery pattern matcher inspects every -tool-call error. Live-auth-check mode in doctor proactively -calls one safe read per auth class. - -** R4 -- Tool count balloons gptel-menu - -Up to 100+ tools. Even with category grouping, the transient -menu is large. - -*Mitigation:* =cj/mcp-enabled-servers= lets users disable -servers they don't need. Audit buffer is the alternate browser. -Per-conversation profiles deferred to v1.5. - -** R5 -- =~/.claude.json= schema change - -Parser breaks if Anthropic restructures the file. - -*Mitigation:* =cj/mcp--read-claude-config= returns structured -errors that surface in status and doctor. Integration degrades -to "env-free servers work, env-dependent servers fail" rather -than crashing. - -** R6 -- Process argument leakage (figma) - -figma's API key is in process args -- visible via =ps=, -=/proc/PID/cmdline=, =list-system-processes=. - -*Mitigation:* accepted risk (the figma package only supports -args-token). =cj/mcp--redact= ensures the key never appears in -Emacs-side output. Commentary flags this. - -** R7 -- Confirmation fatigue from unknown-classification tools - -Default for unknown is =:confirm t=. A server with many tools -matching neither read nor write pattern produces many prompts. - -*Mitigation:* audit buffer surfaces unknown classifications with -their confirm state. =cj/mcp-tool-confirm-overrides= alist lets -the user pin specific tools to =nil= once vetted. A "review -unknowns" doctor pass could enumerate them on demand (v1.5). - -** R8 -- Subprocess accumulation across sessions - -If =kill-emacs-hook= is bypassed (kill -9, crash), subprocesses -persist. - -*Mitigation:* =cj/mcp-doctor= can detect orphaned mcp processes -via =list-system-processes= (v1.5 enhancement). - -* Open Questions - -** Q1 -- Should =gptel-menu= refresh after mid-call tool registration? - -Investigation during Phase 5. If gptel-menu's transient caches -=gptel-tools= at open time, mid-call additions won't appear -until close+reopen. Document the behavior; if it's a real -pain, file a gptel upstream issue. - -** Q2 -- Should write-confirmation overrides be per-host? - -A v1.5 question: when per-conversation profiles land, the -override alist could also be scoped (e.g., "in /work/ folder, -auto-confirm google-docs-work writes"). Out of v1 scope. - -** Q3 -- Auth-source migration path for OAuth tokens - -Three candidate paths (Elisp OAuth client / Claude Code refresh -script / timer-based refresh) all have meaningful complexity. -Until one is viable, =~/.claude.json= stays the source. - -** Q4 -- Live-auth check cadence - -Doctor's live-auth mode is opt-in. Should there be a periodic -auto-check (every N hours via timer) that catches expiry between -explicit doctor runs? Adds complexity; defer until usage shows -need. - -* Considered Alternatives - -** =gptel-mcp.el= (declined; cited as prior art) - -[[https://github.com/lizqwerscott/gptel-mcp.el][lizqwerscott/gptel-mcp.el]] is a 96-line wrapper from the same -author as mcp.el that bridges mcp.el's tool inventory into gptel. -It exposes five functions: - -- =gptel-mcp-register-tool= -- walks =mcp-hub-get-all-tool= and - calls =gptel-make-tool= on each plist. -- =gptel-mcp-activate-all-tool= -- pushes tools into - =gptel-tools=. -- =gptel-mcp-deactivate-all-tool= -- removes them by category + - name lookup. -- =gptel-mcp-start-all-server-and-register= -- chains - =mcp-hub-start-all-server= with the register callback. -- =gptel-mcp-dispatch= -- a transient menu with three keys (=s= - start, =A= activate, =C= deactivate). - -*Why this matters for the spec.* The package independently -validates the integration shape this spec converged on: -=mcp-hub-start-all-server= → walk connections → =gptel-make-tool= -→ =add-to-list gptel-tools= is the canonical path. - -*Why we are not adopting it.* The package solves the trivial -"wire tools through" problem but skips every concern this spec -exists to address: - -| Concern | Spec | gptel-mcp.el | -|---------+------+--------------| -| GPTel confirmation contract (=gptel-confirm-tool-calls 'auto=) | Required precondition | Not addressed | -| Tool-name collisions | Rewrites to =mcp__SERVER__TOOL= | Silent overwrite | -| Confirm-on-write policy | Per-tool =:confirm t= for writes | All tools register with default | -| Async startup contract | =cj/mcp-ensure-started= returns in <100 ms | Synchronous-feel start | -| Async per-call timeout | Explicit timer/callback race | None | -| State machine | =cj/mcp--state= + per-server status | None | -| Server identity strategy | Walks =mcp-server-connections= directly | Uses =mcp-hub-get-all-tool= + parses =mcp-SERVER= | -| Secrets handling | Reads env from =~/.claude.json= with mtime cache | None | -| Deregistration tracking | =cj/mcp--registered-tools= hash, preserves local tools | Removes by category, no local-tool guarantee | -| Server enablement | =cj/mcp-enabled-servers= defcustom | None | -| Entry-point scoping | =cj/mcp-start-on-entry-points= defcustom | Manual via dispatch menu | -| Status UX | =cj/mcp-status=, =cj/mcp-list-tools= audit buffer | Just dispatch menu | -| OAuth recovery | Pattern matcher with per-auth-class recovery | None | -| Secrets redaction | =cj/mcp--redact= applied everywhere | None | -| mcp.el compat layer | Isolated wrappers around private API | Direct =mcp--*= access scattered | -| Tests | 10 acceptance criteria + manual matrix + no-real-process | None | -| Doctor / live-auth check | Static + live-probe diagnostic | None | - -Adopting it would force shipping safely or wrapping with -everything specified above (two layers, no code reduction). - -*What we are taking from it.* Confidence the API path is right. -The transient-dispatch UX was considered for the keymap (rev 2), -but the keymap is now pinned to discrete commands under =C-; a -C= so existing GPTel keys aren't disturbed (rev 3). - -* References - -- [[file:../../modules/ai-config.el][modules/ai-config.el]] -- =gptel-confirm-tool-calls nil= at - line 386 (removed by this integration); loader at lines 71-96. -- [[file:gptel-tools-shortlist.org][gptel-tools-shortlist.org]] -- local-tools shortlist; MCP servers - slot in as the "external" tier. -- [[file:gptel-agentic-tool-ideas.org][gptel-agentic-tool-ideas.org]] -- broader agentic-tool design. -- [[id:a124dd0f-1f40-4533-aeb8-595d93e20865][gptel-gh-tool-spec.org]] -- sibling design; same confirm-on-write - pattern. -- [[https://github.com/lizqwerscott/mcp.el][lizqwerscott/mcp.el]] -- upstream. -- [[https://github.com/lizqwerscott/gptel-mcp.el][lizqwerscott/gptel-mcp.el]] -- considered and declined; see § - Considered Alternatives. -- =~/code/mcp.el/mcp-hub.el:53-90,131-160= -- verified callback - ownership and start-all helper. -- =elpa/gptel-0.9.8.5/gptel.el:1595-1607,2244-2245= -- verified - =gptel-confirm-tool-calls= semantics and tool-confirm gate. -- =elpa/gptel-0.9.8.5/gptel.el:1729-1820= -- verified - =gptel-make-tool= registration. -- =~/.claude.json= -- Claude Code config. |
