#+TITLE: Design: Wrap the gh CLI as a GPTel tool #+AUTHOR: Craig Jennings #+DATE: 2026-05-16 #+OPTIONS: toc:nil num:nil * Status Draft (revision 2). Pre-implementation; no code shipped yet. =gh= v2.92.0 installed and authenticated against both =github.com= (account =cjennings=) and =deepsat.ghe.com= (account =craig-jennings=). Revision 2 incorporates a code-review pass that caught several incorrect =gh= CLI assumptions and a critical safety gap: with =gptel-confirm-tool-calls= set to =nil= in =modules/ai-config.el:386=, the "irreversible-only blocklist" V1 from revision 1 would let the agent run unconfirmed writes (=pr merge=, =run cancel=, =api -f body=...=, =release create=, etc.). Revision 2 keeps Craig's intent of a general-capability tool but applies =:confirm t= to every classified write, drops the "scan command string for =--hostname= and =-H=" approach in favor of argv-list builders, and expands the wrapper set so most agent workflows don't need to fall back to the general tool. * Problem GPTel agents can read git history locally (=git_log=, =git_diff=, =git_status=) but have no access to the GitHub side of the workflow: PRs, issues, reviews, CI runs, releases, gists, repo metadata. The =gh= CLI does all of this and is already authenticated against both hosts the user works with. Wrapping it as a GPTel tool gives the agent the same GitHub surface the user has. The wrapper has to handle four complications the local git tools don't: 1. *Two authenticated hosts* -- personal (=github.com=) and work GHE (=deepsat.ghe.com=) -- with different policies and different blast radii for destructive operations. 2. *A vast subcommand surface* (~30 top-level subcommands, hundreds of leaves) that doesn't fit cleanly into one typed schema per leaf. 3. *Inconsistent flag semantics across subcommands.* =--hostname= exists only on =gh api= and =gh auth status=; the rest use =--repo [HOST/]OWNER/REPO=. =-H= means =--head= on =gh pr list= but =--header= on =gh api=. Naive string-scanning to detect user intent is wrong. 4. *Global GPTel setting* (=gptel-confirm-tool-calls nil=) means the agent can call any registered tool without user confirmation unless that specific tool is registered with =:confirm t=. V1 safety must come from per-tool flags, not from a session-level gate. * Goals 1. The agent can invoke the non-interactive subset of =gh= against either host, gated by a safety policy. 2. High-frequency reads have typed wrappers with sensible defaults (host/repo resolution, JSON-by-default with minimal fields, output cap, timeout). Wrappers auto-execute. 3. The active host is resolved from a single context object that considers (in order): explicit =hostname=/=repo= args, branch upstream remote, =origin= remote, =cj/gh-default-host=. Every tool response prefixes the resolved host/repo so cross-host mistakes are visible. 4. Writes via the general tool execute only after user confirmation (=:confirm t= on the gptel tool registration). Unknown classifications fail closed (=:confirm t=). Irreversible commands hard-block. 5. Interactive commands (=auth login=, =codespace ssh=, =--web=, =browse=, =pr checkout=, =repo clone=, =config set=, =alias set=, =extension exec=) are hard-blocked in V1. 6. File exfiltration paths (=api --input=, =api -F key=@file=, =release upload=, =gist create=) are hard-blocked in V1. 7. Output is capped *during capture*, not after, so a runaway =--paginate= or =--log= can't fill memory or block Emacs. Long-output commands have higher timeouts but the same byte cap. 8. Every call produces a structured debug record (host, repo, cwd, sanitized argv, classification, policy decision, exit code, duration, bytes captured, truncation flag, error kind) inspectable via =cj/gh-tool-last-error=. 9. =cj/gh-doctor= diagnoses missing prerequisites (=gh= executable, version floor, auth per host, cwd repo detection, environment overrides) before they fail at runtime. 10. V2 adds true per-host policy (currently uniform across both hosts), profile-based tool subsets, and bridge helpers between local git tools and the gh wrappers. * Non-Goals - *Interactive subcommands.* =gh auth login=, =gh codespace ssh=, =gh pr checkout= (modifies working tree), =gh repo clone= (writes outside controlled paths) are not in scope. The user runs those in a real terminal. - *Replicating gh extensions.* Core gh only. Extensions can be invoked from a regular terminal if needed. - *Streaming long-running output.* =gh run watch=, =gh --paginate= for unbounded result sets, =gh repo clone= for large repos. V1 blocks =--paginate= and =watch= verbs; future versions may add streaming support. - *Per-host write policies* (work-read-only / personal-full-write). V1 applies the same write-confirmation policy to both hosts. V2 makes the policy host-keyed. - *Conversation-context injection* and *local/remote bridge helpers* (current branch → PR, changed files → PR comments). Future enhancements; tracked as a follow-up. - *Artifact download* (=gh run download=). Writes files; needs its own confirmation flow. Deferred to V2. * Verified gh CLI Contracts These were checked against =gh --version 2.92.0= help output before the spec was revised. Implementation can rely on them. ** Host / repo selection per subcommand - =gh api= and =gh auth status=: use =--hostname HOST= (long form only; no short form). - =gh pr {view,list,diff,checks}=, =gh issue {view,list}=, =gh run {view,list}=, =gh release *=, =gh search *=: use =--repo [HOST/]OWNER/REPO= (short =-R=). - =gh repo view= and similar root-level commands: infer from cwd remote; no =--hostname= or =--repo= flag. - Environment variables: =GH_HOST= and =GH_REPO= work for commands that lack explicit flags. ** Flags with overloaded short forms - =gh pr list -H FOO= → =--head FOO= (branch). - =gh search prs -H FOO= → =--head FOO= (branch). - =gh api -H KEY:VAL= → =--header KEY:VAL=. Substring scanning =-H= as "hostname" is wrong on every command that uses it. ** =gh api= method classification The help text explicitly says: "adding request parameters will automatically switch the method to POST". This means: - =gh api repos/x/y/issues= → GET (read). - =gh api repos/x/y/issues -f title=Foo= → POST (write). - =gh api repos/x/y/issues -F body=@file= → POST (write + exfiltration). - =gh api --method GET repos/x/y/issues -f q=x= → GET (read, explicit override). Classifier must inspect for =-f=, =-F=, =--field=, =--raw-field=, =--input=, and =--method= before defaulting to GET. ** Wrappers without =--json= support - =gh pr diff= produces patch output; no =--json= flag. Wrapper is text-only. Structured file metadata available via =gh pr view --json files= (new wrapper =gh_pr_files=). ** Environment variables for non-interactive use All confirmed via =gh help environment=: - =GH_PROMPT_DISABLED= -- disable interactive prompting. - =GH_PAGER= / =PAGER= -- set to =cat= so no pager fires. - =GH_NO_UPDATE_NOTIFIER= -- silence the "new version" banner. - =GH_NO_EXTENSION_UPDATE_NOTIFIER= -- same for extensions. - =GH_SPINNER_DISABLED= -- silence the progress spinner. - =NO_COLOR= -- disable ANSI color (cleaner LLM context). V1 subprocess env always includes all of the above. ** Authentication #+begin_example ✓ Logged in to github.com account cjennings (keyring) ✓ Logged in to deepsat.ghe.com account craig-jennings (keyring) #+end_example Both stored in the system keyring. =gh= reads the keyring per call; the wrapper does not handle tokens. Expired auth surfaces as exit code 4 with a clear stderr message; recovery is interactive (=gh auth login --hostname HOST= in a terminal). * Current State ** =modules/ai-config.el= - =gptel-confirm-tool-calls nil= at line 386. Per-tool =:confirm t= is the only confirmation mechanism in V1. - =cj/gptel-load-local-tools= (lines 71-96) loads tools from =gptel-tools/=. Add new gh feature names to =cj/gptel-local-tool-features=. ** =gptel-tools/= Ten tools today. =git_log.el= is the closest analogue: validates input, runs subprocess with =process-file=, caps output, signals clear errors. The gh wrappers follow the same shape with shared helpers in =gh-common.el=. * Design ** File layout Two files plus per-wrapper registrations. Wrappers are deliberately small (argv builders + JSON-field defaults) so they all live in one file: - =gptel-tools/gh-common.el= -- shared helpers: host/repo context, argv runner, subprocess env, classifier, blocklist, redaction, debug record, last-error buffer. No tool registrations. - =gptel-tools/gh.el= -- the general tool + every wrapper registration. Pulls in =gh-common= via =require=. This matches Craig's preference for low loader surface (per the MCP revision discussion). If =gh.el= grows past ~600 lines we split per-resource later (=gh-pr.el=, =gh-issue.el=, etc.). ** Host / repo resolution A single helper returns a structured context object used by every tool entry point: #+begin_src emacs-lisp (defun cj/gh--resolve-context (cwd hostname repo) "Return a context plist for a gh invocation. Resolution order: 1. Explicit REPO argument (`[HOST/]OWNER/REPO'). If it contains `HOST/', split off the host. 2. Explicit HOSTNAME argument. 3. CWD's branch upstream remote URL (`git rev-parse --abbrev-ref @{u}' then `git remote get-url REMOTE'). 4. CWD's `origin' remote URL. 5. `cj/gh-default-host'. Returns: (:host HOST :owner OWNER :repo REPO :source SYM) where SOURCE is one of: explicit-repo, explicit-host, upstream, origin, default." ...) #+end_src URL parsing handles all three forms produced by git: - =git@HOST:OWNER/REPO.git= - =ssh://git@HOST/OWNER/REPO.git= - =https://HOST/OWNER/REPO.git= Returns =nil host= if the URL doesn't match a known host; the caller uses =cj/gh-default-host= in that case. #+begin_src emacs-lisp (defcustom cj/gh-default-host "github.com" "Host used when no other resolution path yields one." :type 'string :group 'cj) (defcustom cj/gh-known-hosts '("github.com" "deepsat.ghe.com") "Hosts the gh CLI is authenticated against." :type '(repeat string) :group 'cj) #+end_src Context appears in every tool response as a one-line prefix: #+begin_example [gh github.com/cjennings/dotemacs read ok 12.4KB] ... [gh deepsat.ghe.com/org/repo write confirmed 0.8KB] ... [gh github.com api unknown blocked] ... #+end_example This makes cross-host accidents visible. ** Argv-list builders, never command strings Every tool builds an explicit argv list and hands it to a shared runner. Wrappers expose typed parameters that map directly into argv positions: #+begin_src emacs-lisp (defun cj/gh-pr-view--build-argv (context number include-comments fields) "Return an argv list for `gh pr view NUMBER ...'." (let ((repo-arg (format "%s/%s/%s" (plist-get context :host) (plist-get context :owner) (plist-get context :repo)))) (append (list "pr" "view" (number-to-string number) "--repo" repo-arg) (when include-comments '("--comments")) (when fields (list "--json" fields))))) #+end_src The general tool takes an =args= parameter typed as array-of-string (not a shell string) so the agent constructs argv directly: #+begin_src emacs-lisp ;; agent passes: ["pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo"] ;; runner runs: gh pr view 171 --repo deepsat.ghe.com/org/repo #+end_src If the agent accidentally includes leading =gh= in args[0], the runner strips it and logs the normalization. ** Subprocess contract One runner for everything: #+begin_src emacs-lisp (defun cj/gh--run (argv &key timeout context) "Run gh with ARGV (a list of strings). Returns a plist: (:exit-code N :stdout STR :stderr STR :truncated BOOL :duration-ms N :argv ARGV :context CONTEXT) Enforces TIMEOUT (default `cj/gh--default-timeout'). Captures output in-flight, killing the process at `cj/gh--max-bytes' to prevent runaway memory use." ...) #+end_src Contract: - *Env vars set every call:* =GH_PROMPT_DISABLED=1=, =GH_PAGER=cat=, =PAGER=cat=, =NO_COLOR=1=, =GH_NO_UPDATE_NOTIFIER=1=, =GH_NO_EXTENSION_UPDATE_NOTIFIER=1=, =GH_SPINNER_DISABLED=1=. - *Timeout:* default 20 s for reads, 60 s for diffs/logs/search. Per-tool override allowed but capped at 120 s. Timeout kills the process and returns =:error-kind 'timeout=. - *Output cap during capture:* a process filter accumulates bytes up to =cj/gh--max-bytes= (64 KB). At the cap, the filter sets =truncated=, ignores further output, and sends SIGTERM after a short grace. Truncation marker appended to returned stdout. - *TRAMP rejection:* if =(file-remote-p default-directory)= is non-nil, return =:error-kind 'remote-cwd= without invoking gh. - *Executable check:* if =(executable-find cj/gh--executable)= is nil, return =:error-kind 'no-executable= with install instructions. - *Version check:* the first call per session verifies =gh --version= meets =cj/gh--min-version= (default "2.50.0"); cached for the session. ** Read / write / destructive classifier Used to apply =:confirm t= and the irreversible blocklist: #+begin_src emacs-lisp (defconst cj/gh--read-verbs '("view" "list" "status" "search" "diff" "checks" "describe" "show" "logs" "auth-status")) (defconst cj/gh--write-verbs '("create" "edit" "merge" "close" "reopen" "comment" "review" "upload" "set" "add" "remove" "rerun" "cancel" "delete" "fork" "archive" "unarchive" "lock" "unlock" "pin" "unpin" "ready" "draft" "rename" "transfer" "approve" "label" "assign")) (defconst cj/gh--blocked-verbs '(;; interactive / opens UI "login" "logout" "checkout" "clone" "ssh" "code" "edit-prompt" ;; modifies user config "alias" "config" "extension" ;; opens browser "browse")) (defconst cj/gh--blocked-flags '("--web" ; many commands "--paginate" ; can produce unbounded output "--input" ; gh api file upload "--editor")) ; opens editor (defconst cj/gh--irreversible-patterns '("\\`repo delete\\b" "\\`release delete\\b" "\\`secret delete\\b" "\\`ssh-key delete\\b" "\\`gpg-key delete\\b" "\\`org delete\\b" "\\`project delete\\b" "\\`variable delete\\b" "\\`ruleset delete\\b" "\\`label delete\\b.*--yes")) #+end_src Classifier: #+begin_src emacs-lisp (defun cj/gh--classify (argv) "Return one of: read, write, destructive, blocked, unknown." (let* ((stripped (cj/gh--strip-flags argv)) (resource (car stripped)) (verb (cadr stripped))) (cond ;; Hard blocks first. ((cj/gh--has-blocked-verb-p argv) 'blocked) ((cj/gh--has-blocked-flag-p argv) 'blocked) ((cj/gh--has-file-arg-p argv) 'blocked) ; -F key=@file ((cj/gh--matches-irreversible-p argv) 'destructive) ;; gh api is special. ((string= resource "api") (cj/gh--classify-api argv)) ;; Verb match. ((member verb cj/gh--read-verbs) 'read) ((member verb cj/gh--write-verbs) 'write) (t 'unknown)))) (defun cj/gh--classify-api (argv) "Classify a `gh api ...' invocation. Reads: explicit method GET/HEAD with no -f/-F/--input. Writes: any -f/-F/--field/--raw-field/--input, OR explicit non-GET/HEAD method. Default (no method, no field): read (matches gh's GET default)." (let* ((explicit-method (or (cadr (member "-X" argv)) (cadr (member "--method" argv)))) (has-field (cl-some (lambda (f) (cl-some (lambda (a) (string-prefix-p f a)) argv)) '("-f" "--raw-field" "-F" "--field"))) (has-input (member "--input" argv))) (cond (has-input 'blocked) ; file exfiltration ((and explicit-method (not (member (upcase explicit-method) '("GET" "HEAD")))) 'write) (has-field 'write) ; -f/-F auto-promotes to POST ((and explicit-method (member (upcase explicit-method) '("GET" "HEAD"))) 'read) (t 'read)))) #+end_src Classifier-driven policy table: | Classification | Policy | |----------------+--------| | =read= | auto-execute | | =write= | =:confirm t= on registration; agent's call shows in confirm prompt | | =destructive= | hard-block; return =:error-kind 'irreversible-blocked= | | =blocked= | hard-block; return =:error-kind 'policy-blocked= with reason | | =unknown= | =:confirm t= (fail closed) | ** Safety policy V1 uniform policy applied to both hosts: #+begin_src emacs-lisp (defcustom cj/gh-policy '((read . auto) (write . confirm) (destructive . block) (blocked . block) (unknown . confirm)) "Per-classification policy. V1 applies uniformly to both hosts." :type '(alist :key-type symbol :value-type symbol) :group 'cj) #+end_src Since GPTel's =:confirm t= flag is per-tool-registration (not per-call), the general tool is registered with =:confirm t= always. The wrappers are registered per their classification: | Tool | Registered with | |------+-----------------| | Read wrappers (=gh_pr_view=, etc.) | =:confirm nil= | | =gh= general tool | =:confirm t= | The general tool then applies the policy table at invocation time: reads execute without further prompting (already past GPTel's confirm because :confirm t fires once); writes show detail before invocation; destructive/blocked never reach gh. ** General tool: =gh= Single tool registered with =:confirm t= covering everything the wrappers don't. Schema: | Arg | Type | Required | Purpose | |-----+------+----------+---------| | =args= | array of string | yes | argv list, e.g. =["pr", "view", "171", "--repo", "deepsat.ghe.com/org/repo"]= | | =hostname= | string | no | Override host for commands that accept =--hostname= (=api=, =auth status=); ignored otherwise | | =repo= | string | no | =[HOST/]OWNER/REPO= for repo-scoped commands; if HOST is present in repo, hostname arg is overridden | | =cwd= | string | no | Working directory; defaults to current buffer; must be under =$HOME=, must not be TRAMP | | =timeout= | integer | no | Seconds before kill; default 20, max 120 | Description (registered) explicitly says: #+begin_example Use this only when no task-specific gh_* tool fits. Prefer gh_pr_view, gh_pr_list, gh_pr_checks, gh_issue_view, gh_issue_list, gh_run_view, gh_run_list, gh_run_logs_failed, gh_repo_view, gh_search_prs, gh_search_issues, gh_api_get. Writes (create/edit/merge/etc.) require user confirmation. Destructive (repo/release/secret delete) are hard-blocked. Interactive commands (auth login, codespace ssh, --web, browse) are hard-blocked. File uploads (api --input, -F @file, release upload, gist create) are hard-blocked. #+end_example ** Wrapper inventory Twelve wrappers grouped by resource. Each has typed args, JSON field defaults, output truncation, timeout override, and a description that names its scope. | Tool | gh command | Defaults | Timeout | |------+------------+----------+---------| | =gh_repo_view= | =repo view --json= | JSON fields: =name,nameWithOwner,description,defaultBranchRef,url,visibility= | 20s | | =gh_pr_view= | =pr view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | | =gh_pr_list= | =pr list --json= | JSON fields: =number,title,state,author,createdAt,headRefName=; default =--limit 30= (capped at 100) | 20s | | =gh_pr_diff= | =pr diff N --color never= | text only; capped at 64 KB | 60s | | =gh_pr_checks= | =pr checks N --json= | JSON fields: =name,status,conclusion,startedAt,completedAt,link= | 20s | | =gh_pr_files= | =pr view N --json files= | JSON fields: =files= (path, additions, deletions, mode) | 20s | | =gh_pr_current= | =pr view --json= (no number — auto-detect) | Same as =gh_pr_view= | 20s | | =gh_issue_view= | =issue view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | | =gh_issue_list= | =issue list --json= | JSON fields: =number,title,state,author,createdAt,labels=; default =--limit 30= | 20s | | =gh_run_view= | =run view RUN-ID --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch,event,url= | 20s | | =gh_run_list= | =run list --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch=; default =--limit 20= | 20s | | =gh_run_logs_failed= | =run view RUN-ID --log-failed= | text only; capped at 64 KB | 60s | | =gh_search_prs= | =search prs --json= | JSON fields: =number,title,state,author,repository,url=; default =--limit 30= (capped at 100) | 30s | | =gh_search_issues= | =search issues --json= | Same as PR search | 30s | | =gh_api_get= | =api ENDPOINT --method GET= | text/JSON pass-through; rejects fields/input args | 30s | Common args for every wrapper (unless noted): | Arg | Type | Required | Purpose | |-----+------+----------+---------| | =repo= | string | no | =[HOST/]OWNER/REPO=; resolved from context otherwise | | =hostname= | string | no | Override host for context resolution | | =limit= | integer | no | =--limit= for list/search wrappers; clamped to per-wrapper max | The =gh_api_get= wrapper *explicitly* rejects =-f=, =-F=, =--field=, =--raw-field=, =--input=, =--method=, and any method override. It only accepts =ENDPOINT= and optional =-H= headers that don't carry secrets (the runner redacts =Authorization:= and similar regardless). Writes via API go through the general tool with confirmation. ** JSON field defaults Per-wrapper defaults are minimal -- enough for the agent to decide whether to drill in, not so much that one call fills the context. For =gh_pr_view= specifically: - *Default* (no =fields= override): the small list above (number, title, state, author, createdAt, url, body-truncated-to-2KB). - *Override*: agent passes =fields= as a comma-separated string; wrapper validates against a per-resource allowlist (so the agent can't request =reviews,comments,files= in one call to bypass the cap). - *Include flags*: =include-body t/nil=, =include-comments t/nil=, =include-reviews t/nil= as boolean args. Each adds the corresponding JSON field; agent opts in only when needed. ** Output truncation Process filter pattern, not post-hoc cap: #+begin_src emacs-lisp (defun cj/gh--make-filter (state-var) "Return a process filter that accumulates into STATE-VAR's :stdout, stops collecting after `cj/gh--max-bytes', sets :truncated, and sends SIGTERM to the process." (lambda (proc output) (let* ((state (symbol-value state-var)) (current (plist-get state :stdout)) (current-len (length current)) (remaining (- cj/gh--max-bytes current-len))) (cond ((<= remaining 0) nil) ; already at cap ((<= (length output) remaining) (plist-put state :stdout (concat current output))) (t (plist-put state :stdout (concat current (substring output 0 remaining))) (plist-put state :truncated t) (ignore-errors (delete-process proc))))))) #+end_src Truncation marker appended before return: #+begin_example [truncated at 64KB; use --limit, narrower fields, or a specific wrapper to reduce output] #+end_example ** Error classification + debug record Every call returns (and =cj/gh-tool-last-error= caches) a debug record: #+begin_src emacs-lisp (defun cj/gh--debug-record (argv context exit-code stdout stderr duration-ms truncated) (list :host (plist-get context :host) :repo (cj/gh--repo-arg context) :cwd default-directory :argv (cj/gh--redact-argv argv) :classification (cj/gh--classify argv) :policy (cj/gh--policy-decision argv) :exit-code exit-code :duration-ms duration-ms :bytes-captured (length stdout) :truncated truncated :error-kind (cj/gh--error-kind exit-code stderr))) #+end_src =:error-kind= mapping: | Condition | =:error-kind= | Returned message | |-----------+---------------+------------------| | Exit 4 + "authentication required" | =auth= | "Run =gh auth login --hostname HOST= in a terminal." | | Process killed by timeout timer | =timeout= | "Command exceeded N seconds; narrow the query or use a more specific wrapper." | | Policy block | =policy-blocked= | "Blocked by V1 policy: REASON." | | Irreversible match | =irreversible-blocked= | "Hard-blocked irreversible command: COMMAND." | | Truncated output | =truncated= | "Output truncated at 64KB; reduce scope." | | TRAMP cwd | =remote-cwd= | "Cannot run gh from remote directory: CWD." | | Missing executable | =no-executable= | "gh not found at CJ/GH--EXECUTABLE; install via 'pacman -S github-cli' (or equivalent)." | | Other non-zero exit | =gh-exit= | (raw stderr, redacted) | Each error includes a sanitized reproduce line: #+begin_example Reproduce: GH_PROMPT_DISABLED=1 GH_PAGER=cat gh pr view 171 --repo HOST/OWNER/REPO #+end_example (Secrets, body text, file paths redacted via =cj/gh--redact-argv=.) ** Audit log (V1, opt-out) Every call appends one line to =~/.emacs.d/data/gh-tool-log/YYYY-MM-DD.log=: #+begin_example 2026-05-16T14:23:45-0500 host=github.com repo=cjennings/dotemacs class=read policy=auto exit=0 duration=128ms bytes=4321 2026-05-16T14:24:02-0500 host=deepsat.ghe.com repo=org/repo class=write policy=confirm exit=0 duration=412ms bytes=88 #+end_example Metadata only, not output bodies. Defcustom =cj/gh-tool-audit-log-enabled= (default =t=). Daily rotation implicit (one file per day). Cleanup manual. ** Secrets redaction =cj/gh--redact-argv= masks: - Anything after =--token=, =--secret=, =--password= flags. - Authorization headers (=-H "Authorization: ..."=). - =--figma-api-key=KEY= (in case general tool spawns figma-mcp somehow). - Bearer tokens in URLs (=?token=...=). - Values for =-f=/=-F= keys named like =body=, =text=, =description= (private content; metadata still logged). Applied to: - All stderr returned to the agent. - All audit-log lines. - All debug records. - The reproduce line on error. * Commands & UX ** =cj/gh-doctor= Diagnostic command. No side effects. Checks: - =gh= executable found at =cj/gh--executable=. - =gh --version= meets =cj/gh--min-version=. - =gh auth status= for each host in =cj/gh-known-hosts=. - Current buffer's cwd repo detection: resolves to which host/repo? - Environment overrides effective (=GH_PROMPT_DISABLED= etc. would be set by the runner). - Active account per host. - Warnings if any block-list-relevant env is set externally (e.g. user already has =GH_PAGER= set to something that pages). Output: a buffer with PASS / FAIL / WARN per check + recovery actions for failures. ** =cj/gh-tool-last-error= Opens a buffer showing the last call's debug record: #+begin_example Host: deepsat.ghe.com Repo: org/repo Source: upstream CWD: ~/projects/work/foo Argv: ("pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo") Classification: read Policy: auto Exit code: 0 Duration: 412 ms Bytes captured: 4321 Truncated: no Error kind: none Reproduce: GH_PROMPT_DISABLED=1 GH_PAGER=cat NO_COLOR=1 \ gh pr view 171 --repo deepsat.ghe.com/org/repo #+end_example ** Tool response header Every tool result begins with a one-line header so cross-host / policy decisions are visible: #+begin_example [gh github.com/cjennings/dotemacs read ok 4.3KB] { ... } [gh deepsat.ghe.com/org/repo write confirmed 0.2KB] { ... } [gh github.com/cjennings/dotemacs api blocked policy-blocked] Error: Hard-blocked file-upload path: --input file. Reproduce: gh api repos/cjennings/dotemacs/contents/foo --input file #+end_example * Implementation Plan Eight phases. Each ends with green ERT tests + manual smoke before the next. ** Phase 1 -- Common helpers + context resolver =gh-common.el=: =cj/gh--executable=, =cj/gh--available-p=, =cj/gh--version= (cached), =cj/gh--validate-cwd= (HOME + non-TRAMP), =cj/gh--parse-remote-url=, =cj/gh--resolve-context=, =cj/gh--redact-argv=. No subprocess execution yet. Tests cover all helpers against fixture remote URLs and synthetic git directories. No real gh calls. ** Phase 2 -- Runner with subprocess env + timeout + in-flight cap =cj/gh--run=, the process filter, the timer kill path, env-var setup. Tests stub =make-process= to simulate output / exit / hang / truncation paths. Acceptance: with stub configured to produce 100 KB of output, returned stdout is exactly 64 KB plus the truncation marker, and the process gets SIGTERM. ** Phase 3 -- Classifier + blocklist + policy =cj/gh--classify=, =cj/gh--classify-api=, blocklist constants, =cj/gh--policy-decision=. Tests for every blocklist pattern (verbs + flags + file-arg paths), API edge cases (=-f= promotes to POST, =--method GET -f x=y= stays GET, etc.), and the unknown-fails-closed contract. ** Phase 4 -- Read wrappers (5 first) =gh_repo_view=, =gh_pr_view=, =gh_pr_list=, =gh_pr_diff=, =gh_issue_view=. Each is a thin schema + argv builder + delegate to =cj/gh--run=. Tests verify argv shape for typical args. Manual smoke against both hosts. First real gh calls. ** Phase 5 -- Remaining wrappers + JSON defaults Eight more wrappers (=gh_pr_checks=, =gh_pr_files=, =gh_pr_current=, =gh_issue_list=, =gh_run_view=, =gh_run_list=, =gh_run_logs_failed=, =gh_search_prs=, =gh_search_issues=, =gh_api_get=). JSON-field defaults per wrapper. Tests for the =gh_api_get= flag-rejection contract. ** Phase 6 -- General tool =gh.el= general tool registration with =:confirm t=, blocklist enforcement, policy decision applied before invocation. Tests verify confirmation gate (stub gptel's confirm flow), blocked commands never reach =cj/gh--run=, destructive commands return =:irreversible-blocked= without prompting. ** Phase 7 -- UX: doctor, last-error, response header =cj/gh-doctor=, =cj/gh-tool-last-error=, response header formatting. Audit log writer. Defcustom for log enable. ** Phase 8 -- Loader wiring + integration Add the 16 feature names to =cj/gptel-local-tool-features= (one per wrapper + the general tool + the helpers feature). Verify they land in =gptel-tools=. * Test Plan Target: 55-70 ERT tests across four files. No real subprocesses, no real network, no real =~/.claude.json= (gh tools don't use it, but the no-real-process rule applies uniformly). ** =tests/test-gh-common.el= -- pure helpers (~25 tests) - =cj/gh--parse-remote-url=: ssh-scp, ssh-url, https, with/without =.git=, with/without trailing slash, unknown host returns =:host nil=, =github.com.evil.example= does NOT match =github.com=. - =cj/gh--resolve-context=: explicit repo wins; explicit hostname wins over remote; upstream beats origin; origin beats default; default fires when no git. - =cj/gh--validate-cwd=: HOME-rooted ok; outside HOME errors; TRAMP errors; non-directory errors. - =cj/gh--redact-argv=: =--token=, =-H "Authorization: ..."=, =--figma-api-key=, =-f body=...= → body redacted; sentinel =REDACTED_TEST_SECRET= never appears in any output of any helper. - =cj/gh--available-p=: nil when =executable-find= fails; t otherwise. - =cj/gh--version=: caches per session; floor check rejects too-old. ** =tests/test-gh-runner.el= -- runner contract (~15 tests) Stub =make-process=: - Normal exit 0 with short output: returned verbatim, no truncation flag. - Long output (100 KB stub): truncated at 64 KB exactly, truncation marker present, =:truncated t=. - Process hangs past timeout: timer fires, SIGTERM sent, returns =:error-kind 'timeout=. - Exit 4 + auth stderr: returns =:error-kind 'auth= with recovery message. - TRAMP cwd: never invokes =make-process=, returns =:error-kind 'remote-cwd=. - Missing executable: returns =:error-kind 'no-executable=. - Environment: =process-environment= includes all six required vars before =make-process= call. - Argv with leading "gh" stripped + logged. ** =tests/test-gh-classifier.el= -- policy logic (~20 tests) - Every read verb classifies as read. - Every write verb classifies as write. - Every destructive pattern matches. - Every blocked verb (=login=, =browse=, =clone=, etc.) classifies as blocked. - Every blocked flag (=--web=, =--paginate=, =--input=) classifies as blocked. - File-upload (=-F key=@/path=) classifies as blocked. - =gh api= GET (no fields): read. - =gh api= GET (=-f x=y=): write (auto-POST per gh's rules). - =gh api --method GET -f q=x=: read (explicit override). - =gh api -X DELETE=: destructive. - =gh api -X PATCH=: write. - =gh api --input file=: blocked. - Unknown verb (=gh frobnicate=): unknown → confirm. - =-H= as branch (=gh pr list -H feature=) doesn't trigger host treatment. - =-H= as header (=gh api -H Accept:json=) doesn't trigger host treatment. - Policy decision: read → auto; write → confirm; destructive → block; blocked → block; unknown → confirm. ** =tests/test-gh-wrappers.el= -- per-wrapper builders + schemas (~15 tests) - Every wrapper's schema is valid (correct =:name=, =:type=, =:description=, =:args= shape). - =gh_pr_view= argv with number 171 and repo "host/o/r" produces =("pr" "view" "171" "--repo" "host/o/r" "--json" "DEFAULTS")=. - =gh_pr_diff= rejects =format='json=. - =gh_api_get= rejects =-f=, =-F=, =--field=, =--raw-field=, =--input=, =--method= other than GET. - =gh_pr_current= invokes without a number arg (uses cwd). - =limit= clamped to per-wrapper max. - =fields= validated against per-resource allowlist. ** Manual smoke (every phase) | Phase | Smoke | |-------+-------| | 4 | =gh_pr_view N= against both hosts | | 4 | =gh_pr_list= in =~/.emacs.d= → uses github.com/cjennings/dotemacs | | 5 | =gh_pr_checks= shows CI status without full logs | | 5 | =gh_run_logs_failed= cap kicks in on a long-failed run | | 5 | =gh_api_get= rejects a =-f= arg with clear error | | 6 | =gh= general tool: agent asked to merge a PR triggers GPTel confirm prompt | | 6 | Agent asked to =gh repo delete= gets irreversible-blocked | | 6 | Agent asked to =gh --web=... gets policy-blocked | | 7 | =cj/gh-doctor= correctly identifies an unauthenticated host | | 7 | =cj/gh-tool-last-error= shows debug record after a failing call | ** Opt-in integration suite A small set of real-gh tests (in =tests/test-gh-integration.el= marked =:tag :integration=, default skipped): - =gh auth status --hostname github.com= ok. - =gh auth status --hostname deepsat.ghe.com= ok. - =gh repo view cjennings/dotemacs --json name= returns parseable JSON. - =gh pr list --repo cjennings/dotemacs --limit 1= returns ≤ 1 PR. Run manually via =make test-name TEST=gh-integration=. * Acceptance Criteria 1. *Argv contract.* No tool produces a command string for execution; every call goes through the argv-list runner. 2. *No silent writes.* Every classified write either prompts GPTel's confirm or hard-blocks. Verified by an end-to-end test where the agent attempts =pr merge= and the test fails if =make-process= is invoked before confirm. 3. *In-flight cap.* A stubbed process emitting 1 MB returns exactly 64 KB; the runner never holds more than 65 KB in memory. 4. *Host visibility.* Every successful tool response begins with =[gh HOST/REPO ...]=. Verified by a test that greps the response text. 5. *Doctor coverage.* =cj/gh-doctor= correctly identifies (a) no gh executable, (b) too-old gh, (c) unauthenticated host, (d) non-git cwd, (e) git cwd whose remote points to an unknown host. 6. *No secret leakage.* Test fixtures containing =REDACTED_TEST_SECRET= in every secret-bearing slot (=--token=, =-H Authorization=, =-f body=, etc.) produce zero matches when grepping audit log, debug record, and any user-facing message. * Risks ** R1 -- gh CLI evolves and verbs drift New =gh= versions may add subcommands the blocklist doesn't cover. Or rename verbs. *Mitigation:* the blocklist works on verbs (not full subcommand paths) so most additions are caught. Doctor includes a =gh --version= floor. Periodic review when gh bumps a major version. ** R2 -- The "all reads auto-execute" default may still be too broad Some reads expose private content (issue bodies, PR descriptions from private repos). An agent surfacing a confidential issue body into a saved conversation has data-leak implications. *Mitigation:* response header makes the host/repo visible in every result, so the saved conversation makes the privacy boundary auditable. Wrappers truncate body/comments by default; agent must explicitly opt-in to include them. Documented in commentary. ** R3 -- The general tool's =:confirm t= prompt may become click-fatigue If the agent uses the general tool heavily during a write-heavy workflow (PR creation, label management), confirming every call becomes tedious. *Mitigation:* the expanded wrapper set covers most reads, so the general tool fires mainly for writes -- where confirmation is exactly the right behavior. If usage shows confirm fatigue, V2's per-host policy can add =:auto= for explicit write-confirmed contexts. ** R4 -- =--paginate= block conflicts with legitimate large queries Blocking =--paginate= globally means the agent can't get historical CI runs (which may need pagination). *Mitigation:* =gh_run_list= and =gh_search_*= accept a clamped =--limit= which usually substitutes. If a use case needs more, the agent can request multiple non-paginated pages explicitly. ** R5 -- Token expiry surfaces as cryptic exit 4 When a host's keyring entry expires, every call returns exit 4 with "authentication required" on stderr. The agent sees the error but may not realize the fix is interactive. *Mitigation:* the runner's =:error-kind 'auth= mapping prepends the recovery message before returning to the agent. =cj/gh-doctor= proactively checks auth status. ** R6 -- TRAMP cwd silently runs gh remotely Without the explicit TRAMP rejection, =process-file= would try to spawn =gh= on the remote host (where it may not exist, or may authenticate against the wrong keyring). *Mitigation:* runner checks =(file-remote-p default-directory)= first and returns =:error-kind 'remote-cwd= with a clear message. ** R7 -- =gh search= behaves differently on GHE GHE may not support every advanced search operator =github.com= does. Search wrappers may return inconsistent results across hosts. *Mitigation:* documented in =gh_search_*= wrapper descriptions. Result header makes the host visible so the agent can adjust. ** R8 -- Audit log grows unbounded One file per day, but no automatic cleanup. *Mitigation:* metadata-only entries are tiny (~150 bytes); a year of heavy use is a few MB. Manual cleanup acceptable. Defcustom to disable for users who don't want it. * Open Questions ** Q1 -- Should the general tool's confirmation prompt include the classification? When GPTel asks "Run gh tool? (y/n)" the prompt shows the argv but not the classification. Showing "WRITE: gh pr merge 171" gives the user more context. Need to investigate gptel's confirm-prompt extensibility. ** Q2 -- Should =gh_pr_diff= cap differently from text wrappers? A PR diff can legitimately be 100KB+ for a large refactor. The 64KB cap is the same as everywhere else. If diffs need a higher cap (256KB?), that's per-wrapper config. ** Q3 -- Should wrappers expose =include-body=, =include-comments=, etc., as separate args, or as a comma-separated list? The spec proposes separate boolean args (=:include-body t=, =:include-comments t=). Alternative: one =:include= comma-list arg. Separate args are more discoverable; comma-list is more compact. Decide during Phase 4. ** Q4 -- Should =cj/gh-tool-audit-log= grow into a query interface? V1 writes one line per call. Future: a command to query the log (=cj/gh-audit-search REGEX=) for surfacing "what did the agent do to this PR last week?". * V2 Roadmap Items intentionally deferred: - *Per-host policy.* =cj/gh-host-policy= alist keyed by hostname (mirror of the MCP spec's structure) so work GHE can be read-only while personal allows writes-with-confirm. - *Conversation context injection.* After a PR view, the wrapper inserts a "GitHub context: HOST/REPO PR #N at URL" line into the GPTel buffer so saved conversations stay traceable without bundling full output. - *Local/remote bridge helpers.* current-branch → PR-number, changed-files → matching PR file comments, etc. - *Artifact download.* =gh_run_artifacts=, =gh_release_download= with explicit confirm and write to a controlled directory. - *Async general tool.* =make-process= + sentinel for the cases where 60s timeout isn't enough (rare, but real for some =--paginate= scenarios). - *Audit log query interface.* =cj/gh-audit-search=, =cj/gh-audit-by-host=. - *Profile-based tool subsets.* e.g. read-only profile vs. write-capable profile per buffer. * References - [[file:../../gptel-tools/git_log.el][gptel-tools/git_log.el]] -- pattern reference for new tool files. - [[file:../../modules/ai-config.el][modules/ai-config.el]] -- =gptel-confirm-tool-calls nil= at line 386; loader at lines 71-96. - [[file:gptel-agentic-tool-ideas.org][gptel-agentic-tool-ideas.org]] -- broader agentic-tool design; =gh= sits alongside the MCP integration as the collaboration tier. - [[file:mcp-el-gptel-integration.org][mcp-el-gptel-integration.org]] -- sibling design; same confirm-on-write pattern for safety. - [[https://cli.github.com/manual/][gh CLI manual]] -- subcommand reference. - =gh --version 2.92.0= help output -- verified flag semantics per subcommand. - =gh help environment= -- verified env-var names for non-interactive mode.