diff options
| author | Craig Jennings <c@cjennings.net> | 2026-05-17 01:16:03 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-05-17 01:16:03 -0500 |
| commit | 2b61aa82afa39a0ff1b165fa9ff09d55d21bfabf (patch) | |
| tree | 0ff4916e842caff081bfa2a70d8753dc6fa21044 /docs/design/gptel-gh-tool.org | |
| parent | 1fef9aed248aaf7586950475aa22849e413f1c04 (diff) | |
| download | dotemacs-2b61aa82afa39a0ff1b165fa9ff09d55d21bfabf.tar.gz dotemacs-2b61aa82afa39a0ff1b165fa9ff09d55d21bfabf.zip | |
docs(design): MCP-into-gptel + gh-as-gptel-tool specs + MCP phases
Two new design docs in docs/design/ covering the next two GPTel
work items, plus matching task scaffolding in todo.org.
mcp-el-gptel-integration.org wires mcp.el into the config so GPTel
gets access to the nine MCP servers Claude Code already uses
(linear, notion, figma, slack-deepsat, drawio, google-calendar,
google-docs-personal, google-docs-work, google-keep). The design
covers async startup, the write-confirmation policy, a
server-enablement defcustom, a doctor with live-auth-check, the
audit buffer, and the mcp.el compatibility layer. The spec is at
revision 3 after two code-review passes flagged a critical
confirmation gap (gptel-confirm-tool-calls nil at ai-config.el:386
silently ignored per-tool :confirm slots) and several incorrect
mcp.el API assumptions. Both are addressed.
gptel-gh-tool.org wraps the gh CLI as a hybrid surface: 14 typed
read wrappers plus one general write tool gated by :confirm t.
Host/repo resolution is command-aware: --repo HOST/OWNER/REPO for
repo commands, --hostname only for api and auth status. The runner
enforces an irreversible-command blocklist, a 64KB in-flight output
cap, and a debug-record plus last-error-buffer story. The spec is
at revision 2 after a code-review pass corrected gh flag
assumptions and reframed the safety story around per-tool confirm.
todo.org gains a link to the MCP spec under the parent task plus
nine TODO sub-tasks (one per implementation phase), and a new
gh-tool TODO with the same spec-link shape.
Diffstat (limited to 'docs/design/gptel-gh-tool.org')
| -rw-r--r-- | docs/design/gptel-gh-tool.org | 1061 |
1 files changed, 1061 insertions, 0 deletions
diff --git a/docs/design/gptel-gh-tool.org b/docs/design/gptel-gh-tool.org new file mode 100644 index 00000000..a9ba22bb --- /dev/null +++ b/docs/design/gptel-gh-tool.org @@ -0,0 +1,1061 @@ +#+TITLE: Design: Wrap the gh CLI as a GPTel tool +#+AUTHOR: Craig Jennings +#+DATE: 2026-05-16 +#+OPTIONS: toc:nil num:nil + +* Status + +Draft (revision 2). Pre-implementation; no code shipped yet. =gh= +v2.92.0 installed and authenticated against both =github.com= (account +=cjennings=) and =deepsat.ghe.com= (account =craig-jennings=). + +Revision 2 incorporates a code-review pass that caught several +incorrect =gh= CLI assumptions and a critical safety gap: with +=gptel-confirm-tool-calls= set to =nil= in =modules/ai-config.el:386=, +the "irreversible-only blocklist" V1 from revision 1 would let the +agent run unconfirmed writes (=pr merge=, =run cancel=, =api -f +body=...=, =release create=, etc.). Revision 2 keeps Craig's intent +of a general-capability tool but applies =:confirm t= to every +classified write, drops the "scan command string for =--hostname= +and =-H=" approach in favor of argv-list builders, and expands the +wrapper set so most agent workflows don't need to fall back to the +general tool. + +* Problem + +GPTel agents can read git history locally (=git_log=, =git_diff=, +=git_status=) but have no access to the GitHub side of the workflow: +PRs, issues, reviews, CI runs, releases, gists, repo metadata. The +=gh= CLI does all of this and is already authenticated against both +hosts the user works with. Wrapping it as a GPTel tool gives the +agent the same GitHub surface the user has. + +The wrapper has to handle four complications the local git tools +don't: + +1. *Two authenticated hosts* -- personal (=github.com=) and work + GHE (=deepsat.ghe.com=) -- with different policies and different + blast radii for destructive operations. +2. *A vast subcommand surface* (~30 top-level subcommands, hundreds + of leaves) that doesn't fit cleanly into one typed schema per + leaf. +3. *Inconsistent flag semantics across subcommands.* =--hostname= + exists only on =gh api= and =gh auth status=; the rest use + =--repo [HOST/]OWNER/REPO=. =-H= means =--head= on =gh pr list= + but =--header= on =gh api=. Naive string-scanning to detect + user intent is wrong. +4. *Global GPTel setting* (=gptel-confirm-tool-calls nil=) means + the agent can call any registered tool without user confirmation + unless that specific tool is registered with =:confirm t=. V1 + safety must come from per-tool flags, not from a session-level + gate. + +* Goals + +1. The agent can invoke the non-interactive subset of =gh= against + either host, gated by a safety policy. +2. High-frequency reads have typed wrappers with sensible defaults + (host/repo resolution, JSON-by-default with minimal fields, + output cap, timeout). Wrappers auto-execute. +3. The active host is resolved from a single context object that + considers (in order): explicit =hostname=/=repo= args, branch + upstream remote, =origin= remote, =cj/gh-default-host=. Every + tool response prefixes the resolved host/repo so cross-host + mistakes are visible. +4. Writes via the general tool execute only after user + confirmation (=:confirm t= on the gptel tool registration). + Unknown classifications fail closed (=:confirm t=). + Irreversible commands hard-block. +5. Interactive commands (=auth login=, =codespace ssh=, =--web=, + =browse=, =pr checkout=, =repo clone=, =config set=, + =alias set=, =extension exec=) are hard-blocked in V1. +6. File exfiltration paths (=api --input=, =api -F key=@file=, + =release upload=, =gist create=) are hard-blocked in V1. +7. Output is capped *during capture*, not after, so a runaway + =--paginate= or =--log= can't fill memory or block Emacs. + Long-output commands have higher timeouts but the same byte + cap. +8. Every call produces a structured debug record (host, repo, cwd, + sanitized argv, classification, policy decision, exit code, + duration, bytes captured, truncation flag, error kind) inspectable + via =cj/gh-tool-last-error=. +9. =cj/gh-doctor= diagnoses missing prerequisites (=gh= + executable, version floor, auth per host, cwd repo detection, + environment overrides) before they fail at runtime. +10. V2 adds true per-host policy (currently uniform across both + hosts), profile-based tool subsets, and bridge helpers between + local git tools and the gh wrappers. + +* Non-Goals + +- *Interactive subcommands.* =gh auth login=, =gh codespace ssh=, + =gh pr checkout= (modifies working tree), =gh repo clone= + (writes outside controlled paths) are not in scope. The user + runs those in a real terminal. +- *Replicating gh extensions.* Core gh only. Extensions can be + invoked from a regular terminal if needed. +- *Streaming long-running output.* =gh run watch=, =gh + --paginate= for unbounded result sets, =gh repo clone= for large + repos. V1 blocks =--paginate= and =watch= verbs; future + versions may add streaming support. +- *Per-host write policies* (work-read-only / personal-full-write). + V1 applies the same write-confirmation policy to both hosts. + V2 makes the policy host-keyed. +- *Conversation-context injection* and *local/remote bridge + helpers* (current branch → PR, changed files → PR comments). + Future enhancements; tracked as a follow-up. +- *Artifact download* (=gh run download=). Writes files; needs + its own confirmation flow. Deferred to V2. + +* Verified gh CLI Contracts + +These were checked against =gh --version 2.92.0= help output before +the spec was revised. Implementation can rely on them. + +** Host / repo selection per subcommand + +- =gh api= and =gh auth status=: use =--hostname HOST= (long form + only; no short form). +- =gh pr {view,list,diff,checks}=, =gh issue {view,list}=, + =gh run {view,list}=, =gh release *=, =gh search *=: use + =--repo [HOST/]OWNER/REPO= (short =-R=). +- =gh repo view= and similar root-level commands: infer from cwd + remote; no =--hostname= or =--repo= flag. +- Environment variables: =GH_HOST= and =GH_REPO= work for commands + that lack explicit flags. + +** Flags with overloaded short forms + +- =gh pr list -H FOO= → =--head FOO= (branch). +- =gh search prs -H FOO= → =--head FOO= (branch). +- =gh api -H KEY:VAL= → =--header KEY:VAL=. + +Substring scanning =-H= as "hostname" is wrong on every command +that uses it. + +** =gh api= method classification + +The help text explicitly says: "adding request parameters will +automatically switch the method to POST". This means: + +- =gh api repos/x/y/issues= → GET (read). +- =gh api repos/x/y/issues -f title=Foo= → POST (write). +- =gh api repos/x/y/issues -F body=@file= → POST (write + + exfiltration). +- =gh api --method GET repos/x/y/issues -f q=x= → GET (read, + explicit override). + +Classifier must inspect for =-f=, =-F=, =--field=, =--raw-field=, +=--input=, and =--method= before defaulting to GET. + +** Wrappers without =--json= support + +- =gh pr diff= produces patch output; no =--json= flag. Wrapper + is text-only. Structured file metadata available via + =gh pr view --json files= (new wrapper =gh_pr_files=). + +** Environment variables for non-interactive use + +All confirmed via =gh help environment=: + +- =GH_PROMPT_DISABLED= -- disable interactive prompting. +- =GH_PAGER= / =PAGER= -- set to =cat= so no pager fires. +- =GH_NO_UPDATE_NOTIFIER= -- silence the "new version" banner. +- =GH_NO_EXTENSION_UPDATE_NOTIFIER= -- same for extensions. +- =GH_SPINNER_DISABLED= -- silence the progress spinner. +- =NO_COLOR= -- disable ANSI color (cleaner LLM context). + +V1 subprocess env always includes all of the above. + +** Authentication + +#+begin_example +✓ Logged in to github.com account cjennings (keyring) +✓ Logged in to deepsat.ghe.com account craig-jennings (keyring) +#+end_example + +Both stored in the system keyring. =gh= reads the keyring per +call; the wrapper does not handle tokens. Expired auth surfaces +as exit code 4 with a clear stderr message; recovery is interactive +(=gh auth login --hostname HOST= in a terminal). + +* Current State + +** =modules/ai-config.el= + +- =gptel-confirm-tool-calls nil= at line 386. Per-tool + =:confirm t= is the only confirmation mechanism in V1. +- =cj/gptel-load-local-tools= (lines 71-96) loads tools from + =gptel-tools/=. Add new gh feature names to + =cj/gptel-local-tool-features=. + +** =gptel-tools/= + +Ten tools today. =git_log.el= is the closest analogue: validates +input, runs subprocess with =process-file=, caps output, signals +clear errors. The gh wrappers follow the same shape with shared +helpers in =gh-common.el=. + +* Design + +** File layout + +Two files plus per-wrapper registrations. Wrappers are +deliberately small (argv builders + JSON-field defaults) so they +all live in one file: + +- =gptel-tools/gh-common.el= -- shared helpers: host/repo context, + argv runner, subprocess env, classifier, blocklist, redaction, + debug record, last-error buffer. No tool registrations. +- =gptel-tools/gh.el= -- the general tool + every wrapper + registration. Pulls in =gh-common= via =require=. + +This matches Craig's preference for low loader surface (per the +MCP revision discussion). If =gh.el= grows past ~600 lines we +split per-resource later (=gh-pr.el=, =gh-issue.el=, etc.). + +** Host / repo resolution + +A single helper returns a structured context object used by every +tool entry point: + +#+begin_src emacs-lisp +(defun cj/gh--resolve-context (cwd hostname repo) + "Return a context plist for a gh invocation. +Resolution order: +1. Explicit REPO argument (`[HOST/]OWNER/REPO'). If it contains + `HOST/', split off the host. +2. Explicit HOSTNAME argument. +3. CWD's branch upstream remote URL (`git rev-parse --abbrev-ref + @{u}' then `git remote get-url REMOTE'). +4. CWD's `origin' remote URL. +5. `cj/gh-default-host'. + +Returns: + (:host HOST :owner OWNER :repo REPO :source SYM) +where SOURCE is one of: explicit-repo, explicit-host, upstream, +origin, default." + ...) +#+end_src + +URL parsing handles all three forms produced by git: + +- =git@HOST:OWNER/REPO.git= +- =ssh://git@HOST/OWNER/REPO.git= +- =https://HOST/OWNER/REPO.git= + +Returns =nil host= if the URL doesn't match a known host; the +caller uses =cj/gh-default-host= in that case. + +#+begin_src emacs-lisp +(defcustom cj/gh-default-host "github.com" + "Host used when no other resolution path yields one." + :type 'string + :group 'cj) + +(defcustom cj/gh-known-hosts '("github.com" "deepsat.ghe.com") + "Hosts the gh CLI is authenticated against." + :type '(repeat string) + :group 'cj) +#+end_src + +Context appears in every tool response as a one-line prefix: + +#+begin_example +[gh github.com/cjennings/dotemacs read ok 12.4KB] ... +[gh deepsat.ghe.com/org/repo write confirmed 0.8KB] ... +[gh github.com api unknown blocked] ... +#+end_example + +This makes cross-host accidents visible. + +** Argv-list builders, never command strings + +Every tool builds an explicit argv list and hands it to a shared +runner. Wrappers expose typed parameters that map directly into +argv positions: + +#+begin_src emacs-lisp +(defun cj/gh-pr-view--build-argv (context number include-comments fields) + "Return an argv list for `gh pr view NUMBER ...'." + (let ((repo-arg (format "%s/%s/%s" + (plist-get context :host) + (plist-get context :owner) + (plist-get context :repo)))) + (append (list "pr" "view" (number-to-string number) + "--repo" repo-arg) + (when include-comments '("--comments")) + (when fields (list "--json" fields))))) +#+end_src + +The general tool takes an =args= parameter typed as array-of-string +(not a shell string) so the agent constructs argv directly: + +#+begin_src emacs-lisp +;; agent passes: ["pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo"] +;; runner runs: gh pr view 171 --repo deepsat.ghe.com/org/repo +#+end_src + +If the agent accidentally includes leading =gh= in args[0], the +runner strips it and logs the normalization. + +** Subprocess contract + +One runner for everything: + +#+begin_src emacs-lisp +(defun cj/gh--run (argv &key timeout context) + "Run gh with ARGV (a list of strings). +Returns a plist: + (:exit-code N :stdout STR :stderr STR :truncated BOOL :duration-ms N + :argv ARGV :context CONTEXT) +Enforces TIMEOUT (default `cj/gh--default-timeout'). +Captures output in-flight, killing the process at +`cj/gh--max-bytes' to prevent runaway memory use." + ...) +#+end_src + +Contract: + +- *Env vars set every call:* =GH_PROMPT_DISABLED=1=, =GH_PAGER=cat=, + =PAGER=cat=, =NO_COLOR=1=, =GH_NO_UPDATE_NOTIFIER=1=, + =GH_NO_EXTENSION_UPDATE_NOTIFIER=1=, =GH_SPINNER_DISABLED=1=. +- *Timeout:* default 20 s for reads, 60 s for diffs/logs/search. + Per-tool override allowed but capped at 120 s. Timeout kills + the process and returns =:error-kind 'timeout=. +- *Output cap during capture:* a process filter accumulates bytes + up to =cj/gh--max-bytes= (64 KB). At the cap, the filter sets + =truncated=, ignores further output, and sends SIGTERM after a + short grace. Truncation marker appended to returned stdout. +- *TRAMP rejection:* if =(file-remote-p default-directory)= is + non-nil, return =:error-kind 'remote-cwd= without invoking gh. +- *Executable check:* if =(executable-find cj/gh--executable)= is + nil, return =:error-kind 'no-executable= with install + instructions. +- *Version check:* the first call per session verifies + =gh --version= meets =cj/gh--min-version= (default "2.50.0"); + cached for the session. + +** Read / write / destructive classifier + +Used to apply =:confirm t= and the irreversible blocklist: + +#+begin_src emacs-lisp +(defconst cj/gh--read-verbs + '("view" "list" "status" "search" "diff" "checks" "describe" + "show" "logs" "auth-status")) + +(defconst cj/gh--write-verbs + '("create" "edit" "merge" "close" "reopen" "comment" "review" + "upload" "set" "add" "remove" "rerun" "cancel" "delete" + "fork" "archive" "unarchive" "lock" "unlock" "pin" "unpin" + "ready" "draft" "rename" "transfer" "approve" "label" "assign")) + +(defconst cj/gh--blocked-verbs + '(;; interactive / opens UI + "login" "logout" "checkout" "clone" "ssh" "code" "edit-prompt" + ;; modifies user config + "alias" "config" "extension" + ;; opens browser + "browse")) + +(defconst cj/gh--blocked-flags + '("--web" ; many commands + "--paginate" ; can produce unbounded output + "--input" ; gh api file upload + "--editor")) ; opens editor + +(defconst cj/gh--irreversible-patterns + '("\\`repo delete\\b" + "\\`release delete\\b" + "\\`secret delete\\b" + "\\`ssh-key delete\\b" + "\\`gpg-key delete\\b" + "\\`org delete\\b" + "\\`project delete\\b" + "\\`variable delete\\b" + "\\`ruleset delete\\b" + "\\`label delete\\b.*--yes")) +#+end_src + +Classifier: + +#+begin_src emacs-lisp +(defun cj/gh--classify (argv) + "Return one of: read, write, destructive, blocked, unknown." + (let* ((stripped (cj/gh--strip-flags argv)) + (resource (car stripped)) + (verb (cadr stripped))) + (cond + ;; Hard blocks first. + ((cj/gh--has-blocked-verb-p argv) 'blocked) + ((cj/gh--has-blocked-flag-p argv) 'blocked) + ((cj/gh--has-file-arg-p argv) 'blocked) ; -F key=@file + ((cj/gh--matches-irreversible-p argv) 'destructive) + ;; gh api is special. + ((string= resource "api") (cj/gh--classify-api argv)) + ;; Verb match. + ((member verb cj/gh--read-verbs) 'read) + ((member verb cj/gh--write-verbs) 'write) + (t 'unknown)))) + +(defun cj/gh--classify-api (argv) + "Classify a `gh api ...' invocation. +Reads: explicit method GET/HEAD with no -f/-F/--input. +Writes: any -f/-F/--field/--raw-field/--input, OR explicit +non-GET/HEAD method. +Default (no method, no field): read (matches gh's GET default)." + (let* ((explicit-method (or (cadr (member "-X" argv)) + (cadr (member "--method" argv)))) + (has-field (cl-some + (lambda (f) (cl-some (lambda (a) (string-prefix-p f a)) + argv)) + '("-f" "--raw-field" "-F" "--field"))) + (has-input (member "--input" argv))) + (cond + (has-input 'blocked) ; file exfiltration + ((and explicit-method + (not (member (upcase explicit-method) '("GET" "HEAD")))) + 'write) + (has-field 'write) ; -f/-F auto-promotes to POST + ((and explicit-method + (member (upcase explicit-method) '("GET" "HEAD"))) + 'read) + (t 'read)))) +#+end_src + +Classifier-driven policy table: + +| Classification | Policy | +|----------------+--------| +| =read= | auto-execute | +| =write= | =:confirm t= on registration; agent's call shows in confirm prompt | +| =destructive= | hard-block; return =:error-kind 'irreversible-blocked= | +| =blocked= | hard-block; return =:error-kind 'policy-blocked= with reason | +| =unknown= | =:confirm t= (fail closed) | + +** Safety policy + +V1 uniform policy applied to both hosts: + +#+begin_src emacs-lisp +(defcustom cj/gh-policy + '((read . auto) + (write . confirm) + (destructive . block) + (blocked . block) + (unknown . confirm)) + "Per-classification policy. V1 applies uniformly to both hosts." + :type '(alist :key-type symbol :value-type symbol) + :group 'cj) +#+end_src + +Since GPTel's =:confirm t= flag is per-tool-registration (not +per-call), the general tool is registered with =:confirm t= +always. The wrappers are registered per their classification: + +| Tool | Registered with | +|------+-----------------| +| Read wrappers (=gh_pr_view=, etc.) | =:confirm nil= | +| =gh= general tool | =:confirm t= | + +The general tool then applies the policy table at invocation +time: reads execute without further prompting (already past +GPTel's confirm because :confirm t fires once); writes show +detail before invocation; destructive/blocked never reach gh. + +** General tool: =gh= + +Single tool registered with =:confirm t= covering everything the +wrappers don't. Schema: + +| Arg | Type | Required | Purpose | +|-----+------+----------+---------| +| =args= | array of string | yes | argv list, e.g. =["pr", "view", "171", "--repo", "deepsat.ghe.com/org/repo"]= | +| =hostname= | string | no | Override host for commands that accept =--hostname= (=api=, =auth status=); ignored otherwise | +| =repo= | string | no | =[HOST/]OWNER/REPO= for repo-scoped commands; if HOST is present in repo, hostname arg is overridden | +| =cwd= | string | no | Working directory; defaults to current buffer; must be under =$HOME=, must not be TRAMP | +| =timeout= | integer | no | Seconds before kill; default 20, max 120 | + +Description (registered) explicitly says: + +#+begin_example +Use this only when no task-specific gh_* tool fits. Prefer +gh_pr_view, gh_pr_list, gh_pr_checks, gh_issue_view, gh_issue_list, +gh_run_view, gh_run_list, gh_run_logs_failed, gh_repo_view, +gh_search_prs, gh_search_issues, gh_api_get. + +Writes (create/edit/merge/etc.) require user confirmation. +Destructive (repo/release/secret delete) are hard-blocked. +Interactive commands (auth login, codespace ssh, --web, browse) +are hard-blocked. File uploads (api --input, -F @file, release +upload, gist create) are hard-blocked. +#+end_example + +** Wrapper inventory + +Twelve wrappers grouped by resource. Each has typed args, JSON +field defaults, output truncation, timeout override, and a +description that names its scope. + +| Tool | gh command | Defaults | Timeout | +|------+------------+----------+---------| +| =gh_repo_view= | =repo view --json= | JSON fields: =name,nameWithOwner,description,defaultBranchRef,url,visibility= | 20s | +| =gh_pr_view= | =pr view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | +| =gh_pr_list= | =pr list --json= | JSON fields: =number,title,state,author,createdAt,headRefName=; default =--limit 30= (capped at 100) | 20s | +| =gh_pr_diff= | =pr diff N --color never= | text only; capped at 64 KB | 60s | +| =gh_pr_checks= | =pr checks N --json= | JSON fields: =name,status,conclusion,startedAt,completedAt,link= | 20s | +| =gh_pr_files= | =pr view N --json files= | JSON fields: =files= (path, additions, deletions, mode) | 20s | +| =gh_pr_current= | =pr view --json= (no number — auto-detect) | Same as =gh_pr_view= | 20s | +| =gh_issue_view= | =issue view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | +| =gh_issue_list= | =issue list --json= | JSON fields: =number,title,state,author,createdAt,labels=; default =--limit 30= | 20s | +| =gh_run_view= | =run view RUN-ID --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch,event,url= | 20s | +| =gh_run_list= | =run list --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch=; default =--limit 20= | 20s | +| =gh_run_logs_failed= | =run view RUN-ID --log-failed= | text only; capped at 64 KB | 60s | +| =gh_search_prs= | =search prs --json= | JSON fields: =number,title,state,author,repository,url=; default =--limit 30= (capped at 100) | 30s | +| =gh_search_issues= | =search issues --json= | Same as PR search | 30s | +| =gh_api_get= | =api ENDPOINT --method GET= | text/JSON pass-through; rejects fields/input args | 30s | + +Common args for every wrapper (unless noted): + +| Arg | Type | Required | Purpose | +|-----+------+----------+---------| +| =repo= | string | no | =[HOST/]OWNER/REPO=; resolved from context otherwise | +| =hostname= | string | no | Override host for context resolution | +| =limit= | integer | no | =--limit= for list/search wrappers; clamped to per-wrapper max | + +The =gh_api_get= wrapper *explicitly* rejects =-f=, =-F=, +=--field=, =--raw-field=, =--input=, =--method=, and any method +override. It only accepts =ENDPOINT= and optional =-H= headers +that don't carry secrets (the runner redacts =Authorization:= and +similar regardless). Writes via API go through the general tool +with confirmation. + +** JSON field defaults + +Per-wrapper defaults are minimal -- enough for the agent to decide +whether to drill in, not so much that one call fills the context. + +For =gh_pr_view= specifically: + +- *Default* (no =fields= override): the small list above (number, + title, state, author, createdAt, url, body-truncated-to-2KB). +- *Override*: agent passes =fields= as a comma-separated string; + wrapper validates against a per-resource allowlist (so the agent + can't request =reviews,comments,files= in one call to bypass the + cap). +- *Include flags*: =include-body t/nil=, =include-comments t/nil=, + =include-reviews t/nil= as boolean args. Each adds the + corresponding JSON field; agent opts in only when needed. + +** Output truncation + +Process filter pattern, not post-hoc cap: + +#+begin_src emacs-lisp +(defun cj/gh--make-filter (state-var) + "Return a process filter that accumulates into STATE-VAR's :stdout, +stops collecting after `cj/gh--max-bytes', sets :truncated, and +sends SIGTERM to the process." + (lambda (proc output) + (let* ((state (symbol-value state-var)) + (current (plist-get state :stdout)) + (current-len (length current)) + (remaining (- cj/gh--max-bytes current-len))) + (cond + ((<= remaining 0) nil) ; already at cap + ((<= (length output) remaining) + (plist-put state :stdout (concat current output))) + (t + (plist-put state :stdout + (concat current (substring output 0 remaining))) + (plist-put state :truncated t) + (ignore-errors (delete-process proc))))))) +#+end_src + +Truncation marker appended before return: + +#+begin_example +[truncated at 64KB; use --limit, narrower fields, or a specific +wrapper to reduce output] +#+end_example + +** Error classification + debug record + +Every call returns (and =cj/gh-tool-last-error= caches) a debug +record: + +#+begin_src emacs-lisp +(defun cj/gh--debug-record (argv context exit-code stdout stderr + duration-ms truncated) + (list :host (plist-get context :host) + :repo (cj/gh--repo-arg context) + :cwd default-directory + :argv (cj/gh--redact-argv argv) + :classification (cj/gh--classify argv) + :policy (cj/gh--policy-decision argv) + :exit-code exit-code + :duration-ms duration-ms + :bytes-captured (length stdout) + :truncated truncated + :error-kind (cj/gh--error-kind exit-code stderr))) +#+end_src + +=:error-kind= mapping: + +| Condition | =:error-kind= | Returned message | +|-----------+---------------+------------------| +| Exit 4 + "authentication required" | =auth= | "Run =gh auth login --hostname HOST= in a terminal." | +| Process killed by timeout timer | =timeout= | "Command exceeded N seconds; narrow the query or use a more specific wrapper." | +| Policy block | =policy-blocked= | "Blocked by V1 policy: REASON." | +| Irreversible match | =irreversible-blocked= | "Hard-blocked irreversible command: COMMAND." | +| Truncated output | =truncated= | "Output truncated at 64KB; reduce scope." | +| TRAMP cwd | =remote-cwd= | "Cannot run gh from remote directory: CWD." | +| Missing executable | =no-executable= | "gh not found at CJ/GH--EXECUTABLE; install via 'pacman -S github-cli' (or equivalent)." | +| Other non-zero exit | =gh-exit= | (raw stderr, redacted) | + +Each error includes a sanitized reproduce line: + +#+begin_example +Reproduce: GH_PROMPT_DISABLED=1 GH_PAGER=cat gh pr view 171 --repo HOST/OWNER/REPO +#+end_example + +(Secrets, body text, file paths redacted via =cj/gh--redact-argv=.) + +** Audit log (V1, opt-out) + +Every call appends one line to +=~/.emacs.d/data/gh-tool-log/YYYY-MM-DD.log=: + +#+begin_example +2026-05-16T14:23:45-0500 host=github.com repo=cjennings/dotemacs class=read policy=auto exit=0 duration=128ms bytes=4321 +2026-05-16T14:24:02-0500 host=deepsat.ghe.com repo=org/repo class=write policy=confirm exit=0 duration=412ms bytes=88 +#+end_example + +Metadata only, not output bodies. Defcustom +=cj/gh-tool-audit-log-enabled= (default =t=). Daily rotation +implicit (one file per day). Cleanup manual. + +** Secrets redaction + +=cj/gh--redact-argv= masks: + +- Anything after =--token=, =--secret=, =--password= flags. +- Authorization headers (=-H "Authorization: ..."=). +- =--figma-api-key=KEY= (in case general tool spawns figma-mcp + somehow). +- Bearer tokens in URLs (=?token=...=). +- Values for =-f=/=-F= keys named like =body=, =text=, + =description= (private content; metadata still logged). + +Applied to: +- All stderr returned to the agent. +- All audit-log lines. +- All debug records. +- The reproduce line on error. + +* Commands & UX + +** =cj/gh-doctor= + +Diagnostic command. No side effects. Checks: + +- =gh= executable found at =cj/gh--executable=. +- =gh --version= meets =cj/gh--min-version=. +- =gh auth status= for each host in =cj/gh-known-hosts=. +- Current buffer's cwd repo detection: resolves to which host/repo? +- Environment overrides effective (=GH_PROMPT_DISABLED= etc. would + be set by the runner). +- Active account per host. +- Warnings if any block-list-relevant env is set externally + (e.g. user already has =GH_PAGER= set to something that pages). + +Output: a buffer with PASS / FAIL / WARN per check + recovery +actions for failures. + +** =cj/gh-tool-last-error= + +Opens a buffer showing the last call's debug record: + +#+begin_example +Host: deepsat.ghe.com +Repo: org/repo +Source: upstream +CWD: ~/projects/work/foo +Argv: ("pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo") +Classification: read +Policy: auto +Exit code: 0 +Duration: 412 ms +Bytes captured: 4321 +Truncated: no +Error kind: none + +Reproduce: + GH_PROMPT_DISABLED=1 GH_PAGER=cat NO_COLOR=1 \ + gh pr view 171 --repo deepsat.ghe.com/org/repo +#+end_example + +** Tool response header + +Every tool result begins with a one-line header so cross-host / +policy decisions are visible: + +#+begin_example +[gh github.com/cjennings/dotemacs read ok 4.3KB] +{ ... } +[gh deepsat.ghe.com/org/repo write confirmed 0.2KB] +{ ... } +[gh github.com/cjennings/dotemacs api blocked policy-blocked] +Error: Hard-blocked file-upload path: --input file. +Reproduce: gh api repos/cjennings/dotemacs/contents/foo --input file +#+end_example + +* Implementation Plan + +Eight phases. Each ends with green ERT tests + manual smoke +before the next. + +** Phase 1 -- Common helpers + context resolver + +=gh-common.el=: =cj/gh--executable=, =cj/gh--available-p=, +=cj/gh--version= (cached), =cj/gh--validate-cwd= (HOME + non-TRAMP), +=cj/gh--parse-remote-url=, =cj/gh--resolve-context=, +=cj/gh--redact-argv=. No subprocess execution yet. + +Tests cover all helpers against fixture remote URLs and synthetic +git directories. No real gh calls. + +** Phase 2 -- Runner with subprocess env + timeout + in-flight cap + +=cj/gh--run=, the process filter, the timer kill path, env-var +setup. Tests stub =make-process= to simulate output / exit / hang +/ truncation paths. + +Acceptance: with stub configured to produce 100 KB of output, +returned stdout is exactly 64 KB plus the truncation marker, and +the process gets SIGTERM. + +** Phase 3 -- Classifier + blocklist + policy + +=cj/gh--classify=, =cj/gh--classify-api=, blocklist constants, +=cj/gh--policy-decision=. Tests for every blocklist pattern +(verbs + flags + file-arg paths), API edge cases (=-f= promotes +to POST, =--method GET -f x=y= stays GET, etc.), and the +unknown-fails-closed contract. + +** Phase 4 -- Read wrappers (5 first) + +=gh_repo_view=, =gh_pr_view=, =gh_pr_list=, =gh_pr_diff=, +=gh_issue_view=. Each is a thin schema + argv builder + delegate +to =cj/gh--run=. Tests verify argv shape for typical args. + +Manual smoke against both hosts. First real gh calls. + +** Phase 5 -- Remaining wrappers + JSON defaults + +Eight more wrappers (=gh_pr_checks=, =gh_pr_files=, +=gh_pr_current=, =gh_issue_list=, =gh_run_view=, =gh_run_list=, +=gh_run_logs_failed=, =gh_search_prs=, =gh_search_issues=, +=gh_api_get=). JSON-field defaults per wrapper. Tests for the +=gh_api_get= flag-rejection contract. + +** Phase 6 -- General tool + +=gh.el= general tool registration with =:confirm t=, blocklist +enforcement, policy decision applied before invocation. Tests +verify confirmation gate (stub gptel's confirm flow), blocked +commands never reach =cj/gh--run=, destructive commands return +=:irreversible-blocked= without prompting. + +** Phase 7 -- UX: doctor, last-error, response header + +=cj/gh-doctor=, =cj/gh-tool-last-error=, response header +formatting. Audit log writer. Defcustom for log enable. + +** Phase 8 -- Loader wiring + integration + +Add the 16 feature names to =cj/gptel-local-tool-features= (one +per wrapper + the general tool + the helpers feature). Verify +they land in =gptel-tools=. + +* Test Plan + +Target: 55-70 ERT tests across four files. No real subprocesses, +no real network, no real =~/.claude.json= (gh tools don't use it, +but the no-real-process rule applies uniformly). + +** =tests/test-gh-common.el= -- pure helpers (~25 tests) + +- =cj/gh--parse-remote-url=: ssh-scp, ssh-url, https, with/without + =.git=, with/without trailing slash, unknown host returns =:host + nil=, =github.com.evil.example= does NOT match =github.com=. +- =cj/gh--resolve-context=: explicit repo wins; explicit hostname + wins over remote; upstream beats origin; origin beats default; + default fires when no git. +- =cj/gh--validate-cwd=: HOME-rooted ok; outside HOME errors; + TRAMP errors; non-directory errors. +- =cj/gh--redact-argv=: =--token=, =-H "Authorization: ..."=, + =--figma-api-key=, =-f body=...= → body redacted; sentinel + =REDACTED_TEST_SECRET= never appears in any output of any + helper. +- =cj/gh--available-p=: nil when =executable-find= fails; t + otherwise. +- =cj/gh--version=: caches per session; floor check rejects + too-old. + +** =tests/test-gh-runner.el= -- runner contract (~15 tests) + +Stub =make-process=: +- Normal exit 0 with short output: returned verbatim, no + truncation flag. +- Long output (100 KB stub): truncated at 64 KB exactly, + truncation marker present, =:truncated t=. +- Process hangs past timeout: timer fires, SIGTERM sent, returns + =:error-kind 'timeout=. +- Exit 4 + auth stderr: returns =:error-kind 'auth= with recovery + message. +- TRAMP cwd: never invokes =make-process=, returns =:error-kind + 'remote-cwd=. +- Missing executable: returns =:error-kind 'no-executable=. +- Environment: =process-environment= includes all six required + vars before =make-process= call. +- Argv with leading "gh" stripped + logged. + +** =tests/test-gh-classifier.el= -- policy logic (~20 tests) + +- Every read verb classifies as read. +- Every write verb classifies as write. +- Every destructive pattern matches. +- Every blocked verb (=login=, =browse=, =clone=, etc.) classifies + as blocked. +- Every blocked flag (=--web=, =--paginate=, =--input=) classifies + as blocked. +- File-upload (=-F key=@/path=) classifies as blocked. +- =gh api= GET (no fields): read. +- =gh api= GET (=-f x=y=): write (auto-POST per gh's rules). +- =gh api --method GET -f q=x=: read (explicit override). +- =gh api -X DELETE=: destructive. +- =gh api -X PATCH=: write. +- =gh api --input file=: blocked. +- Unknown verb (=gh frobnicate=): unknown → confirm. +- =-H= as branch (=gh pr list -H feature=) doesn't trigger host + treatment. +- =-H= as header (=gh api -H Accept:json=) doesn't trigger host + treatment. +- Policy decision: read → auto; write → confirm; destructive → + block; blocked → block; unknown → confirm. + +** =tests/test-gh-wrappers.el= -- per-wrapper builders + schemas (~15 tests) + +- Every wrapper's schema is valid (correct =:name=, =:type=, + =:description=, =:args= shape). +- =gh_pr_view= argv with number 171 and repo "host/o/r" produces + =("pr" "view" "171" "--repo" "host/o/r" "--json" "DEFAULTS")=. +- =gh_pr_diff= rejects =format='json=. +- =gh_api_get= rejects =-f=, =-F=, =--field=, =--raw-field=, + =--input=, =--method= other than GET. +- =gh_pr_current= invokes without a number arg (uses cwd). +- =limit= clamped to per-wrapper max. +- =fields= validated against per-resource allowlist. + +** Manual smoke (every phase) + +| Phase | Smoke | +|-------+-------| +| 4 | =gh_pr_view N= against both hosts | +| 4 | =gh_pr_list= in =~/.emacs.d= → uses github.com/cjennings/dotemacs | +| 5 | =gh_pr_checks= shows CI status without full logs | +| 5 | =gh_run_logs_failed= cap kicks in on a long-failed run | +| 5 | =gh_api_get= rejects a =-f= arg with clear error | +| 6 | =gh= general tool: agent asked to merge a PR triggers GPTel confirm prompt | +| 6 | Agent asked to =gh repo delete= gets irreversible-blocked | +| 6 | Agent asked to =gh --web=... gets policy-blocked | +| 7 | =cj/gh-doctor= correctly identifies an unauthenticated host | +| 7 | =cj/gh-tool-last-error= shows debug record after a failing call | + +** Opt-in integration suite + +A small set of real-gh tests (in =tests/test-gh-integration.el= +marked =:tag :integration=, default skipped): + +- =gh auth status --hostname github.com= ok. +- =gh auth status --hostname deepsat.ghe.com= ok. +- =gh repo view cjennings/dotemacs --json name= + returns parseable JSON. +- =gh pr list --repo cjennings/dotemacs --limit 1= returns ≤ 1 + PR. + +Run manually via =make test-name TEST=gh-integration=. + +* Acceptance Criteria + +1. *Argv contract.* No tool produces a command string for execution; + every call goes through the argv-list runner. +2. *No silent writes.* Every classified write either prompts + GPTel's confirm or hard-blocks. Verified by an end-to-end + test where the agent attempts =pr merge= and the test fails if + =make-process= is invoked before confirm. +3. *In-flight cap.* A stubbed process emitting 1 MB returns + exactly 64 KB; the runner never holds more than 65 KB in + memory. +4. *Host visibility.* Every successful tool response begins with + =[gh HOST/REPO ...]=. Verified by a test that greps the + response text. +5. *Doctor coverage.* =cj/gh-doctor= correctly identifies (a) no + gh executable, (b) too-old gh, (c) unauthenticated host, (d) + non-git cwd, (e) git cwd whose remote points to an unknown + host. +6. *No secret leakage.* Test fixtures containing + =REDACTED_TEST_SECRET= in every secret-bearing slot + (=--token=, =-H Authorization=, =-f body=, etc.) produce zero + matches when grepping audit log, debug record, and any + user-facing message. + +* Risks + +** R1 -- gh CLI evolves and verbs drift + +New =gh= versions may add subcommands the blocklist doesn't +cover. Or rename verbs. + +*Mitigation:* the blocklist works on verbs (not full subcommand +paths) so most additions are caught. Doctor includes a +=gh --version= floor. Periodic review when gh bumps a major +version. + +** R2 -- The "all reads auto-execute" default may still be too broad + +Some reads expose private content (issue bodies, PR descriptions +from private repos). An agent surfacing a confidential issue +body into a saved conversation has data-leak implications. + +*Mitigation:* response header makes the host/repo visible in +every result, so the saved conversation makes the privacy +boundary auditable. Wrappers truncate body/comments by default; +agent must explicitly opt-in to include them. Documented in +commentary. + +** R3 -- The general tool's =:confirm t= prompt may become click-fatigue + +If the agent uses the general tool heavily during a write-heavy +workflow (PR creation, label management), confirming every call +becomes tedious. + +*Mitigation:* the expanded wrapper set covers most reads, so the +general tool fires mainly for writes -- where confirmation is +exactly the right behavior. If usage shows confirm fatigue, +V2's per-host policy can add =:auto= for explicit +write-confirmed contexts. + +** R4 -- =--paginate= block conflicts with legitimate large queries + +Blocking =--paginate= globally means the agent can't get +historical CI runs (which may need pagination). + +*Mitigation:* =gh_run_list= and =gh_search_*= accept a clamped +=--limit= which usually substitutes. If a use case needs more, +the agent can request multiple non-paginated pages explicitly. + +** R5 -- Token expiry surfaces as cryptic exit 4 + +When a host's keyring entry expires, every call returns exit 4 +with "authentication required" on stderr. The agent sees the +error but may not realize the fix is interactive. + +*Mitigation:* the runner's =:error-kind 'auth= mapping prepends +the recovery message before returning to the agent. =cj/gh-doctor= +proactively checks auth status. + +** R6 -- TRAMP cwd silently runs gh remotely + +Without the explicit TRAMP rejection, =process-file= would try to +spawn =gh= on the remote host (where it may not exist, or may +authenticate against the wrong keyring). + +*Mitigation:* runner checks =(file-remote-p default-directory)= +first and returns =:error-kind 'remote-cwd= with a clear message. + +** R7 -- =gh search= behaves differently on GHE + +GHE may not support every advanced search operator +=github.com= does. Search wrappers may return inconsistent +results across hosts. + +*Mitigation:* documented in =gh_search_*= wrapper descriptions. +Result header makes the host visible so the agent can adjust. + +** R8 -- Audit log grows unbounded + +One file per day, but no automatic cleanup. + +*Mitigation:* metadata-only entries are tiny (~150 bytes); a +year of heavy use is a few MB. Manual cleanup acceptable. +Defcustom to disable for users who don't want it. + +* Open Questions + +** Q1 -- Should the general tool's confirmation prompt include the classification? + +When GPTel asks "Run gh tool? (y/n)" the prompt shows the argv but +not the classification. Showing "WRITE: gh pr merge 171" gives the +user more context. Need to investigate gptel's confirm-prompt +extensibility. + +** Q2 -- Should =gh_pr_diff= cap differently from text wrappers? + +A PR diff can legitimately be 100KB+ for a large refactor. The +64KB cap is the same as everywhere else. If diffs need a higher +cap (256KB?), that's per-wrapper config. + +** Q3 -- Should wrappers expose =include-body=, =include-comments=, etc., as separate args, or as a comma-separated list? + +The spec proposes separate boolean args (=:include-body t=, +=:include-comments t=). Alternative: one =:include= comma-list +arg. Separate args are more discoverable; comma-list is more +compact. Decide during Phase 4. + +** Q4 -- Should =cj/gh-tool-audit-log= grow into a query interface? + +V1 writes one line per call. Future: a command to query the +log (=cj/gh-audit-search REGEX=) for surfacing "what did the agent +do to this PR last week?". + +* V2 Roadmap + +Items intentionally deferred: + +- *Per-host policy.* =cj/gh-host-policy= alist keyed by hostname + (mirror of the MCP spec's structure) so work GHE can be + read-only while personal allows writes-with-confirm. +- *Conversation context injection.* After a PR view, the wrapper + inserts a "GitHub context: HOST/REPO PR #N at URL" line into the + GPTel buffer so saved conversations stay traceable without + bundling full output. +- *Local/remote bridge helpers.* current-branch → PR-number, + changed-files → matching PR file comments, etc. +- *Artifact download.* =gh_run_artifacts=, =gh_release_download= + with explicit confirm and write to a controlled directory. +- *Async general tool.* =make-process= + sentinel for the cases + where 60s timeout isn't enough (rare, but real for some + =--paginate= scenarios). +- *Audit log query interface.* =cj/gh-audit-search=, + =cj/gh-audit-by-host=. +- *Profile-based tool subsets.* e.g. read-only profile + vs. write-capable profile per buffer. + +* References + +- [[file:../../gptel-tools/git_log.el][gptel-tools/git_log.el]] -- pattern reference for new tool files. +- [[file:../../modules/ai-config.el][modules/ai-config.el]] -- =gptel-confirm-tool-calls nil= at + line 386; loader at lines 71-96. +- [[file:gptel-agentic-tool-ideas.org][gptel-agentic-tool-ideas.org]] -- broader agentic-tool design; + =gh= sits alongside the MCP integration as the + collaboration tier. +- [[file:mcp-el-gptel-integration.org][mcp-el-gptel-integration.org]] -- sibling design; same + confirm-on-write pattern for safety. +- [[https://cli.github.com/manual/][gh CLI manual]] -- subcommand reference. +- =gh --version 2.92.0= help output -- verified flag semantics + per subcommand. +- =gh help environment= -- verified env-var names for non-interactive + mode. |
