diff options
Diffstat (limited to 'docs/design/gptel-gh-tool.org')
| -rw-r--r-- | docs/design/gptel-gh-tool.org | 1061 |
1 files changed, 0 insertions, 1061 deletions
diff --git a/docs/design/gptel-gh-tool.org b/docs/design/gptel-gh-tool.org deleted file mode 100644 index a9ba22bb1..000000000 --- a/docs/design/gptel-gh-tool.org +++ /dev/null @@ -1,1061 +0,0 @@ -#+TITLE: Design: Wrap the gh CLI as a GPTel tool -#+AUTHOR: Craig Jennings -#+DATE: 2026-05-16 -#+OPTIONS: toc:nil num:nil - -* Status - -Draft (revision 2). Pre-implementation; no code shipped yet. =gh= -v2.92.0 installed and authenticated against both =github.com= (account -=cjennings=) and =deepsat.ghe.com= (account =craig-jennings=). - -Revision 2 incorporates a code-review pass that caught several -incorrect =gh= CLI assumptions and a critical safety gap: with -=gptel-confirm-tool-calls= set to =nil= in =modules/ai-config.el:386=, -the "irreversible-only blocklist" V1 from revision 1 would let the -agent run unconfirmed writes (=pr merge=, =run cancel=, =api -f -body=...=, =release create=, etc.). Revision 2 keeps Craig's intent -of a general-capability tool but applies =:confirm t= to every -classified write, drops the "scan command string for =--hostname= -and =-H=" approach in favor of argv-list builders, and expands the -wrapper set so most agent workflows don't need to fall back to the -general tool. - -* Problem - -GPTel agents can read git history locally (=git_log=, =git_diff=, -=git_status=) but have no access to the GitHub side of the workflow: -PRs, issues, reviews, CI runs, releases, gists, repo metadata. The -=gh= CLI does all of this and is already authenticated against both -hosts the user works with. Wrapping it as a GPTel tool gives the -agent the same GitHub surface the user has. - -The wrapper has to handle four complications the local git tools -don't: - -1. *Two authenticated hosts* -- personal (=github.com=) and work - GHE (=deepsat.ghe.com=) -- with different policies and different - blast radii for destructive operations. -2. *A vast subcommand surface* (~30 top-level subcommands, hundreds - of leaves) that doesn't fit cleanly into one typed schema per - leaf. -3. *Inconsistent flag semantics across subcommands.* =--hostname= - exists only on =gh api= and =gh auth status=; the rest use - =--repo [HOST/]OWNER/REPO=. =-H= means =--head= on =gh pr list= - but =--header= on =gh api=. Naive string-scanning to detect - user intent is wrong. -4. *Global GPTel setting* (=gptel-confirm-tool-calls nil=) means - the agent can call any registered tool without user confirmation - unless that specific tool is registered with =:confirm t=. V1 - safety must come from per-tool flags, not from a session-level - gate. - -* Goals - -1. The agent can invoke the non-interactive subset of =gh= against - either host, gated by a safety policy. -2. High-frequency reads have typed wrappers with sensible defaults - (host/repo resolution, JSON-by-default with minimal fields, - output cap, timeout). Wrappers auto-execute. -3. The active host is resolved from a single context object that - considers (in order): explicit =hostname=/=repo= args, branch - upstream remote, =origin= remote, =cj/gh-default-host=. Every - tool response prefixes the resolved host/repo so cross-host - mistakes are visible. -4. Writes via the general tool execute only after user - confirmation (=:confirm t= on the gptel tool registration). - Unknown classifications fail closed (=:confirm t=). - Irreversible commands hard-block. -5. Interactive commands (=auth login=, =codespace ssh=, =--web=, - =browse=, =pr checkout=, =repo clone=, =config set=, - =alias set=, =extension exec=) are hard-blocked in V1. -6. File exfiltration paths (=api --input=, =api -F key=@file=, - =release upload=, =gist create=) are hard-blocked in V1. -7. Output is capped *during capture*, not after, so a runaway - =--paginate= or =--log= can't fill memory or block Emacs. - Long-output commands have higher timeouts but the same byte - cap. -8. Every call produces a structured debug record (host, repo, cwd, - sanitized argv, classification, policy decision, exit code, - duration, bytes captured, truncation flag, error kind) inspectable - via =cj/gh-tool-last-error=. -9. =cj/gh-doctor= diagnoses missing prerequisites (=gh= - executable, version floor, auth per host, cwd repo detection, - environment overrides) before they fail at runtime. -10. V2 adds true per-host policy (currently uniform across both - hosts), profile-based tool subsets, and bridge helpers between - local git tools and the gh wrappers. - -* Non-Goals - -- *Interactive subcommands.* =gh auth login=, =gh codespace ssh=, - =gh pr checkout= (modifies working tree), =gh repo clone= - (writes outside controlled paths) are not in scope. The user - runs those in a real terminal. -- *Replicating gh extensions.* Core gh only. Extensions can be - invoked from a regular terminal if needed. -- *Streaming long-running output.* =gh run watch=, =gh - --paginate= for unbounded result sets, =gh repo clone= for large - repos. V1 blocks =--paginate= and =watch= verbs; future - versions may add streaming support. -- *Per-host write policies* (work-read-only / personal-full-write). - V1 applies the same write-confirmation policy to both hosts. - V2 makes the policy host-keyed. -- *Conversation-context injection* and *local/remote bridge - helpers* (current branch → PR, changed files → PR comments). - Future enhancements; tracked as a follow-up. -- *Artifact download* (=gh run download=). Writes files; needs - its own confirmation flow. Deferred to V2. - -* Verified gh CLI Contracts - -These were checked against =gh --version 2.92.0= help output before -the spec was revised. Implementation can rely on them. - -** Host / repo selection per subcommand - -- =gh api= and =gh auth status=: use =--hostname HOST= (long form - only; no short form). -- =gh pr {view,list,diff,checks}=, =gh issue {view,list}=, - =gh run {view,list}=, =gh release *=, =gh search *=: use - =--repo [HOST/]OWNER/REPO= (short =-R=). -- =gh repo view= and similar root-level commands: infer from cwd - remote; no =--hostname= or =--repo= flag. -- Environment variables: =GH_HOST= and =GH_REPO= work for commands - that lack explicit flags. - -** Flags with overloaded short forms - -- =gh pr list -H FOO= → =--head FOO= (branch). -- =gh search prs -H FOO= → =--head FOO= (branch). -- =gh api -H KEY:VAL= → =--header KEY:VAL=. - -Substring scanning =-H= as "hostname" is wrong on every command -that uses it. - -** =gh api= method classification - -The help text explicitly says: "adding request parameters will -automatically switch the method to POST". This means: - -- =gh api repos/x/y/issues= → GET (read). -- =gh api repos/x/y/issues -f title=Foo= → POST (write). -- =gh api repos/x/y/issues -F body=@file= → POST (write + - exfiltration). -- =gh api --method GET repos/x/y/issues -f q=x= → GET (read, - explicit override). - -Classifier must inspect for =-f=, =-F=, =--field=, =--raw-field=, -=--input=, and =--method= before defaulting to GET. - -** Wrappers without =--json= support - -- =gh pr diff= produces patch output; no =--json= flag. Wrapper - is text-only. Structured file metadata available via - =gh pr view --json files= (new wrapper =gh_pr_files=). - -** Environment variables for non-interactive use - -All confirmed via =gh help environment=: - -- =GH_PROMPT_DISABLED= -- disable interactive prompting. -- =GH_PAGER= / =PAGER= -- set to =cat= so no pager fires. -- =GH_NO_UPDATE_NOTIFIER= -- silence the "new version" banner. -- =GH_NO_EXTENSION_UPDATE_NOTIFIER= -- same for extensions. -- =GH_SPINNER_DISABLED= -- silence the progress spinner. -- =NO_COLOR= -- disable ANSI color (cleaner LLM context). - -V1 subprocess env always includes all of the above. - -** Authentication - -#+begin_example -✓ Logged in to github.com account cjennings (keyring) -✓ Logged in to deepsat.ghe.com account craig-jennings (keyring) -#+end_example - -Both stored in the system keyring. =gh= reads the keyring per -call; the wrapper does not handle tokens. Expired auth surfaces -as exit code 4 with a clear stderr message; recovery is interactive -(=gh auth login --hostname HOST= in a terminal). - -* Current State - -** =modules/ai-config.el= - -- =gptel-confirm-tool-calls nil= at line 386. Per-tool - =:confirm t= is the only confirmation mechanism in V1. -- =cj/gptel-load-local-tools= (lines 71-96) loads tools from - =gptel-tools/=. Add new gh feature names to - =cj/gptel-local-tool-features=. - -** =gptel-tools/= - -Ten tools today. =git_log.el= is the closest analogue: validates -input, runs subprocess with =process-file=, caps output, signals -clear errors. The gh wrappers follow the same shape with shared -helpers in =gh-common.el=. - -* Design - -** File layout - -Two files plus per-wrapper registrations. Wrappers are -deliberately small (argv builders + JSON-field defaults) so they -all live in one file: - -- =gptel-tools/gh-common.el= -- shared helpers: host/repo context, - argv runner, subprocess env, classifier, blocklist, redaction, - debug record, last-error buffer. No tool registrations. -- =gptel-tools/gh.el= -- the general tool + every wrapper - registration. Pulls in =gh-common= via =require=. - -This matches Craig's preference for low loader surface (per the -MCP revision discussion). If =gh.el= grows past ~600 lines we -split per-resource later (=gh-pr.el=, =gh-issue.el=, etc.). - -** Host / repo resolution - -A single helper returns a structured context object used by every -tool entry point: - -#+begin_src emacs-lisp -(defun cj/gh--resolve-context (cwd hostname repo) - "Return a context plist for a gh invocation. -Resolution order: -1. Explicit REPO argument (`[HOST/]OWNER/REPO'). If it contains - `HOST/', split off the host. -2. Explicit HOSTNAME argument. -3. CWD's branch upstream remote URL (`git rev-parse --abbrev-ref - @{u}' then `git remote get-url REMOTE'). -4. CWD's `origin' remote URL. -5. `cj/gh-default-host'. - -Returns: - (:host HOST :owner OWNER :repo REPO :source SYM) -where SOURCE is one of: explicit-repo, explicit-host, upstream, -origin, default." - ...) -#+end_src - -URL parsing handles all three forms produced by git: - -- =git@HOST:OWNER/REPO.git= -- =ssh://git@HOST/OWNER/REPO.git= -- =https://HOST/OWNER/REPO.git= - -Returns =nil host= if the URL doesn't match a known host; the -caller uses =cj/gh-default-host= in that case. - -#+begin_src emacs-lisp -(defcustom cj/gh-default-host "github.com" - "Host used when no other resolution path yields one." - :type 'string - :group 'cj) - -(defcustom cj/gh-known-hosts '("github.com" "deepsat.ghe.com") - "Hosts the gh CLI is authenticated against." - :type '(repeat string) - :group 'cj) -#+end_src - -Context appears in every tool response as a one-line prefix: - -#+begin_example -[gh github.com/cjennings/dotemacs read ok 12.4KB] ... -[gh deepsat.ghe.com/org/repo write confirmed 0.8KB] ... -[gh github.com api unknown blocked] ... -#+end_example - -This makes cross-host accidents visible. - -** Argv-list builders, never command strings - -Every tool builds an explicit argv list and hands it to a shared -runner. Wrappers expose typed parameters that map directly into -argv positions: - -#+begin_src emacs-lisp -(defun cj/gh-pr-view--build-argv (context number include-comments fields) - "Return an argv list for `gh pr view NUMBER ...'." - (let ((repo-arg (format "%s/%s/%s" - (plist-get context :host) - (plist-get context :owner) - (plist-get context :repo)))) - (append (list "pr" "view" (number-to-string number) - "--repo" repo-arg) - (when include-comments '("--comments")) - (when fields (list "--json" fields))))) -#+end_src - -The general tool takes an =args= parameter typed as array-of-string -(not a shell string) so the agent constructs argv directly: - -#+begin_src emacs-lisp -;; agent passes: ["pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo"] -;; runner runs: gh pr view 171 --repo deepsat.ghe.com/org/repo -#+end_src - -If the agent accidentally includes leading =gh= in args[0], the -runner strips it and logs the normalization. - -** Subprocess contract - -One runner for everything: - -#+begin_src emacs-lisp -(defun cj/gh--run (argv &key timeout context) - "Run gh with ARGV (a list of strings). -Returns a plist: - (:exit-code N :stdout STR :stderr STR :truncated BOOL :duration-ms N - :argv ARGV :context CONTEXT) -Enforces TIMEOUT (default `cj/gh--default-timeout'). -Captures output in-flight, killing the process at -`cj/gh--max-bytes' to prevent runaway memory use." - ...) -#+end_src - -Contract: - -- *Env vars set every call:* =GH_PROMPT_DISABLED=1=, =GH_PAGER=cat=, - =PAGER=cat=, =NO_COLOR=1=, =GH_NO_UPDATE_NOTIFIER=1=, - =GH_NO_EXTENSION_UPDATE_NOTIFIER=1=, =GH_SPINNER_DISABLED=1=. -- *Timeout:* default 20 s for reads, 60 s for diffs/logs/search. - Per-tool override allowed but capped at 120 s. Timeout kills - the process and returns =:error-kind 'timeout=. -- *Output cap during capture:* a process filter accumulates bytes - up to =cj/gh--max-bytes= (64 KB). At the cap, the filter sets - =truncated=, ignores further output, and sends SIGTERM after a - short grace. Truncation marker appended to returned stdout. -- *TRAMP rejection:* if =(file-remote-p default-directory)= is - non-nil, return =:error-kind 'remote-cwd= without invoking gh. -- *Executable check:* if =(executable-find cj/gh--executable)= is - nil, return =:error-kind 'no-executable= with install - instructions. -- *Version check:* the first call per session verifies - =gh --version= meets =cj/gh--min-version= (default "2.50.0"); - cached for the session. - -** Read / write / destructive classifier - -Used to apply =:confirm t= and the irreversible blocklist: - -#+begin_src emacs-lisp -(defconst cj/gh--read-verbs - '("view" "list" "status" "search" "diff" "checks" "describe" - "show" "logs" "auth-status")) - -(defconst cj/gh--write-verbs - '("create" "edit" "merge" "close" "reopen" "comment" "review" - "upload" "set" "add" "remove" "rerun" "cancel" "delete" - "fork" "archive" "unarchive" "lock" "unlock" "pin" "unpin" - "ready" "draft" "rename" "transfer" "approve" "label" "assign")) - -(defconst cj/gh--blocked-verbs - '(;; interactive / opens UI - "login" "logout" "checkout" "clone" "ssh" "code" "edit-prompt" - ;; modifies user config - "alias" "config" "extension" - ;; opens browser - "browse")) - -(defconst cj/gh--blocked-flags - '("--web" ; many commands - "--paginate" ; can produce unbounded output - "--input" ; gh api file upload - "--editor")) ; opens editor - -(defconst cj/gh--irreversible-patterns - '("\\`repo delete\\b" - "\\`release delete\\b" - "\\`secret delete\\b" - "\\`ssh-key delete\\b" - "\\`gpg-key delete\\b" - "\\`org delete\\b" - "\\`project delete\\b" - "\\`variable delete\\b" - "\\`ruleset delete\\b" - "\\`label delete\\b.*--yes")) -#+end_src - -Classifier: - -#+begin_src emacs-lisp -(defun cj/gh--classify (argv) - "Return one of: read, write, destructive, blocked, unknown." - (let* ((stripped (cj/gh--strip-flags argv)) - (resource (car stripped)) - (verb (cadr stripped))) - (cond - ;; Hard blocks first. - ((cj/gh--has-blocked-verb-p argv) 'blocked) - ((cj/gh--has-blocked-flag-p argv) 'blocked) - ((cj/gh--has-file-arg-p argv) 'blocked) ; -F key=@file - ((cj/gh--matches-irreversible-p argv) 'destructive) - ;; gh api is special. - ((string= resource "api") (cj/gh--classify-api argv)) - ;; Verb match. - ((member verb cj/gh--read-verbs) 'read) - ((member verb cj/gh--write-verbs) 'write) - (t 'unknown)))) - -(defun cj/gh--classify-api (argv) - "Classify a `gh api ...' invocation. -Reads: explicit method GET/HEAD with no -f/-F/--input. -Writes: any -f/-F/--field/--raw-field/--input, OR explicit -non-GET/HEAD method. -Default (no method, no field): read (matches gh's GET default)." - (let* ((explicit-method (or (cadr (member "-X" argv)) - (cadr (member "--method" argv)))) - (has-field (cl-some - (lambda (f) (cl-some (lambda (a) (string-prefix-p f a)) - argv)) - '("-f" "--raw-field" "-F" "--field"))) - (has-input (member "--input" argv))) - (cond - (has-input 'blocked) ; file exfiltration - ((and explicit-method - (not (member (upcase explicit-method) '("GET" "HEAD")))) - 'write) - (has-field 'write) ; -f/-F auto-promotes to POST - ((and explicit-method - (member (upcase explicit-method) '("GET" "HEAD"))) - 'read) - (t 'read)))) -#+end_src - -Classifier-driven policy table: - -| Classification | Policy | -|----------------+--------| -| =read= | auto-execute | -| =write= | =:confirm t= on registration; agent's call shows in confirm prompt | -| =destructive= | hard-block; return =:error-kind 'irreversible-blocked= | -| =blocked= | hard-block; return =:error-kind 'policy-blocked= with reason | -| =unknown= | =:confirm t= (fail closed) | - -** Safety policy - -V1 uniform policy applied to both hosts: - -#+begin_src emacs-lisp -(defcustom cj/gh-policy - '((read . auto) - (write . confirm) - (destructive . block) - (blocked . block) - (unknown . confirm)) - "Per-classification policy. V1 applies uniformly to both hosts." - :type '(alist :key-type symbol :value-type symbol) - :group 'cj) -#+end_src - -Since GPTel's =:confirm t= flag is per-tool-registration (not -per-call), the general tool is registered with =:confirm t= -always. The wrappers are registered per their classification: - -| Tool | Registered with | -|------+-----------------| -| Read wrappers (=gh_pr_view=, etc.) | =:confirm nil= | -| =gh= general tool | =:confirm t= | - -The general tool then applies the policy table at invocation -time: reads execute without further prompting (already past -GPTel's confirm because :confirm t fires once); writes show -detail before invocation; destructive/blocked never reach gh. - -** General tool: =gh= - -Single tool registered with =:confirm t= covering everything the -wrappers don't. Schema: - -| Arg | Type | Required | Purpose | -|-----+------+----------+---------| -| =args= | array of string | yes | argv list, e.g. =["pr", "view", "171", "--repo", "deepsat.ghe.com/org/repo"]= | -| =hostname= | string | no | Override host for commands that accept =--hostname= (=api=, =auth status=); ignored otherwise | -| =repo= | string | no | =[HOST/]OWNER/REPO= for repo-scoped commands; if HOST is present in repo, hostname arg is overridden | -| =cwd= | string | no | Working directory; defaults to current buffer; must be under =$HOME=, must not be TRAMP | -| =timeout= | integer | no | Seconds before kill; default 20, max 120 | - -Description (registered) explicitly says: - -#+begin_example -Use this only when no task-specific gh_* tool fits. Prefer -gh_pr_view, gh_pr_list, gh_pr_checks, gh_issue_view, gh_issue_list, -gh_run_view, gh_run_list, gh_run_logs_failed, gh_repo_view, -gh_search_prs, gh_search_issues, gh_api_get. - -Writes (create/edit/merge/etc.) require user confirmation. -Destructive (repo/release/secret delete) are hard-blocked. -Interactive commands (auth login, codespace ssh, --web, browse) -are hard-blocked. File uploads (api --input, -F @file, release -upload, gist create) are hard-blocked. -#+end_example - -** Wrapper inventory - -Twelve wrappers grouped by resource. Each has typed args, JSON -field defaults, output truncation, timeout override, and a -description that names its scope. - -| Tool | gh command | Defaults | Timeout | -|------+------------+----------+---------| -| =gh_repo_view= | =repo view --json= | JSON fields: =name,nameWithOwner,description,defaultBranchRef,url,visibility= | 20s | -| =gh_pr_view= | =pr view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | -| =gh_pr_list= | =pr list --json= | JSON fields: =number,title,state,author,createdAt,headRefName=; default =--limit 30= (capped at 100) | 20s | -| =gh_pr_diff= | =pr diff N --color never= | text only; capped at 64 KB | 60s | -| =gh_pr_checks= | =pr checks N --json= | JSON fields: =name,status,conclusion,startedAt,completedAt,link= | 20s | -| =gh_pr_files= | =pr view N --json files= | JSON fields: =files= (path, additions, deletions, mode) | 20s | -| =gh_pr_current= | =pr view --json= (no number — auto-detect) | Same as =gh_pr_view= | 20s | -| =gh_issue_view= | =issue view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s | -| =gh_issue_list= | =issue list --json= | JSON fields: =number,title,state,author,createdAt,labels=; default =--limit 30= | 20s | -| =gh_run_view= | =run view RUN-ID --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch,event,url= | 20s | -| =gh_run_list= | =run list --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch=; default =--limit 20= | 20s | -| =gh_run_logs_failed= | =run view RUN-ID --log-failed= | text only; capped at 64 KB | 60s | -| =gh_search_prs= | =search prs --json= | JSON fields: =number,title,state,author,repository,url=; default =--limit 30= (capped at 100) | 30s | -| =gh_search_issues= | =search issues --json= | Same as PR search | 30s | -| =gh_api_get= | =api ENDPOINT --method GET= | text/JSON pass-through; rejects fields/input args | 30s | - -Common args for every wrapper (unless noted): - -| Arg | Type | Required | Purpose | -|-----+------+----------+---------| -| =repo= | string | no | =[HOST/]OWNER/REPO=; resolved from context otherwise | -| =hostname= | string | no | Override host for context resolution | -| =limit= | integer | no | =--limit= for list/search wrappers; clamped to per-wrapper max | - -The =gh_api_get= wrapper *explicitly* rejects =-f=, =-F=, -=--field=, =--raw-field=, =--input=, =--method=, and any method -override. It only accepts =ENDPOINT= and optional =-H= headers -that don't carry secrets (the runner redacts =Authorization:= and -similar regardless). Writes via API go through the general tool -with confirmation. - -** JSON field defaults - -Per-wrapper defaults are minimal -- enough for the agent to decide -whether to drill in, not so much that one call fills the context. - -For =gh_pr_view= specifically: - -- *Default* (no =fields= override): the small list above (number, - title, state, author, createdAt, url, body-truncated-to-2KB). -- *Override*: agent passes =fields= as a comma-separated string; - wrapper validates against a per-resource allowlist (so the agent - can't request =reviews,comments,files= in one call to bypass the - cap). -- *Include flags*: =include-body t/nil=, =include-comments t/nil=, - =include-reviews t/nil= as boolean args. Each adds the - corresponding JSON field; agent opts in only when needed. - -** Output truncation - -Process filter pattern, not post-hoc cap: - -#+begin_src emacs-lisp -(defun cj/gh--make-filter (state-var) - "Return a process filter that accumulates into STATE-VAR's :stdout, -stops collecting after `cj/gh--max-bytes', sets :truncated, and -sends SIGTERM to the process." - (lambda (proc output) - (let* ((state (symbol-value state-var)) - (current (plist-get state :stdout)) - (current-len (length current)) - (remaining (- cj/gh--max-bytes current-len))) - (cond - ((<= remaining 0) nil) ; already at cap - ((<= (length output) remaining) - (plist-put state :stdout (concat current output))) - (t - (plist-put state :stdout - (concat current (substring output 0 remaining))) - (plist-put state :truncated t) - (ignore-errors (delete-process proc))))))) -#+end_src - -Truncation marker appended before return: - -#+begin_example -[truncated at 64KB; use --limit, narrower fields, or a specific -wrapper to reduce output] -#+end_example - -** Error classification + debug record - -Every call returns (and =cj/gh-tool-last-error= caches) a debug -record: - -#+begin_src emacs-lisp -(defun cj/gh--debug-record (argv context exit-code stdout stderr - duration-ms truncated) - (list :host (plist-get context :host) - :repo (cj/gh--repo-arg context) - :cwd default-directory - :argv (cj/gh--redact-argv argv) - :classification (cj/gh--classify argv) - :policy (cj/gh--policy-decision argv) - :exit-code exit-code - :duration-ms duration-ms - :bytes-captured (length stdout) - :truncated truncated - :error-kind (cj/gh--error-kind exit-code stderr))) -#+end_src - -=:error-kind= mapping: - -| Condition | =:error-kind= | Returned message | -|-----------+---------------+------------------| -| Exit 4 + "authentication required" | =auth= | "Run =gh auth login --hostname HOST= in a terminal." | -| Process killed by timeout timer | =timeout= | "Command exceeded N seconds; narrow the query or use a more specific wrapper." | -| Policy block | =policy-blocked= | "Blocked by V1 policy: REASON." | -| Irreversible match | =irreversible-blocked= | "Hard-blocked irreversible command: COMMAND." | -| Truncated output | =truncated= | "Output truncated at 64KB; reduce scope." | -| TRAMP cwd | =remote-cwd= | "Cannot run gh from remote directory: CWD." | -| Missing executable | =no-executable= | "gh not found at CJ/GH--EXECUTABLE; install via 'pacman -S github-cli' (or equivalent)." | -| Other non-zero exit | =gh-exit= | (raw stderr, redacted) | - -Each error includes a sanitized reproduce line: - -#+begin_example -Reproduce: GH_PROMPT_DISABLED=1 GH_PAGER=cat gh pr view 171 --repo HOST/OWNER/REPO -#+end_example - -(Secrets, body text, file paths redacted via =cj/gh--redact-argv=.) - -** Audit log (V1, opt-out) - -Every call appends one line to -=~/.emacs.d/data/gh-tool-log/YYYY-MM-DD.log=: - -#+begin_example -2026-05-16T14:23:45-0500 host=github.com repo=cjennings/dotemacs class=read policy=auto exit=0 duration=128ms bytes=4321 -2026-05-16T14:24:02-0500 host=deepsat.ghe.com repo=org/repo class=write policy=confirm exit=0 duration=412ms bytes=88 -#+end_example - -Metadata only, not output bodies. Defcustom -=cj/gh-tool-audit-log-enabled= (default =t=). Daily rotation -implicit (one file per day). Cleanup manual. - -** Secrets redaction - -=cj/gh--redact-argv= masks: - -- Anything after =--token=, =--secret=, =--password= flags. -- Authorization headers (=-H "Authorization: ..."=). -- =--figma-api-key=KEY= (in case general tool spawns figma-mcp - somehow). -- Bearer tokens in URLs (=?token=...=). -- Values for =-f=/=-F= keys named like =body=, =text=, - =description= (private content; metadata still logged). - -Applied to: -- All stderr returned to the agent. -- All audit-log lines. -- All debug records. -- The reproduce line on error. - -* Commands & UX - -** =cj/gh-doctor= - -Diagnostic command. No side effects. Checks: - -- =gh= executable found at =cj/gh--executable=. -- =gh --version= meets =cj/gh--min-version=. -- =gh auth status= for each host in =cj/gh-known-hosts=. -- Current buffer's cwd repo detection: resolves to which host/repo? -- Environment overrides effective (=GH_PROMPT_DISABLED= etc. would - be set by the runner). -- Active account per host. -- Warnings if any block-list-relevant env is set externally - (e.g. user already has =GH_PAGER= set to something that pages). - -Output: a buffer with PASS / FAIL / WARN per check + recovery -actions for failures. - -** =cj/gh-tool-last-error= - -Opens a buffer showing the last call's debug record: - -#+begin_example -Host: deepsat.ghe.com -Repo: org/repo -Source: upstream -CWD: ~/projects/work/foo -Argv: ("pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo") -Classification: read -Policy: auto -Exit code: 0 -Duration: 412 ms -Bytes captured: 4321 -Truncated: no -Error kind: none - -Reproduce: - GH_PROMPT_DISABLED=1 GH_PAGER=cat NO_COLOR=1 \ - gh pr view 171 --repo deepsat.ghe.com/org/repo -#+end_example - -** Tool response header - -Every tool result begins with a one-line header so cross-host / -policy decisions are visible: - -#+begin_example -[gh github.com/cjennings/dotemacs read ok 4.3KB] -{ ... } -[gh deepsat.ghe.com/org/repo write confirmed 0.2KB] -{ ... } -[gh github.com/cjennings/dotemacs api blocked policy-blocked] -Error: Hard-blocked file-upload path: --input file. -Reproduce: gh api repos/cjennings/dotemacs/contents/foo --input file -#+end_example - -* Implementation Plan - -Eight phases. Each ends with green ERT tests + manual smoke -before the next. - -** Phase 1 -- Common helpers + context resolver - -=gh-common.el=: =cj/gh--executable=, =cj/gh--available-p=, -=cj/gh--version= (cached), =cj/gh--validate-cwd= (HOME + non-TRAMP), -=cj/gh--parse-remote-url=, =cj/gh--resolve-context=, -=cj/gh--redact-argv=. No subprocess execution yet. - -Tests cover all helpers against fixture remote URLs and synthetic -git directories. No real gh calls. - -** Phase 2 -- Runner with subprocess env + timeout + in-flight cap - -=cj/gh--run=, the process filter, the timer kill path, env-var -setup. Tests stub =make-process= to simulate output / exit / hang -/ truncation paths. - -Acceptance: with stub configured to produce 100 KB of output, -returned stdout is exactly 64 KB plus the truncation marker, and -the process gets SIGTERM. - -** Phase 3 -- Classifier + blocklist + policy - -=cj/gh--classify=, =cj/gh--classify-api=, blocklist constants, -=cj/gh--policy-decision=. Tests for every blocklist pattern -(verbs + flags + file-arg paths), API edge cases (=-f= promotes -to POST, =--method GET -f x=y= stays GET, etc.), and the -unknown-fails-closed contract. - -** Phase 4 -- Read wrappers (5 first) - -=gh_repo_view=, =gh_pr_view=, =gh_pr_list=, =gh_pr_diff=, -=gh_issue_view=. Each is a thin schema + argv builder + delegate -to =cj/gh--run=. Tests verify argv shape for typical args. - -Manual smoke against both hosts. First real gh calls. - -** Phase 5 -- Remaining wrappers + JSON defaults - -Eight more wrappers (=gh_pr_checks=, =gh_pr_files=, -=gh_pr_current=, =gh_issue_list=, =gh_run_view=, =gh_run_list=, -=gh_run_logs_failed=, =gh_search_prs=, =gh_search_issues=, -=gh_api_get=). JSON-field defaults per wrapper. Tests for the -=gh_api_get= flag-rejection contract. - -** Phase 6 -- General tool - -=gh.el= general tool registration with =:confirm t=, blocklist -enforcement, policy decision applied before invocation. Tests -verify confirmation gate (stub gptel's confirm flow), blocked -commands never reach =cj/gh--run=, destructive commands return -=:irreversible-blocked= without prompting. - -** Phase 7 -- UX: doctor, last-error, response header - -=cj/gh-doctor=, =cj/gh-tool-last-error=, response header -formatting. Audit log writer. Defcustom for log enable. - -** Phase 8 -- Loader wiring + integration - -Add the 16 feature names to =cj/gptel-local-tool-features= (one -per wrapper + the general tool + the helpers feature). Verify -they land in =gptel-tools=. - -* Test Plan - -Target: 55-70 ERT tests across four files. No real subprocesses, -no real network, no real =~/.claude.json= (gh tools don't use it, -but the no-real-process rule applies uniformly). - -** =tests/test-gh-common.el= -- pure helpers (~25 tests) - -- =cj/gh--parse-remote-url=: ssh-scp, ssh-url, https, with/without - =.git=, with/without trailing slash, unknown host returns =:host - nil=, =github.com.evil.example= does NOT match =github.com=. -- =cj/gh--resolve-context=: explicit repo wins; explicit hostname - wins over remote; upstream beats origin; origin beats default; - default fires when no git. -- =cj/gh--validate-cwd=: HOME-rooted ok; outside HOME errors; - TRAMP errors; non-directory errors. -- =cj/gh--redact-argv=: =--token=, =-H "Authorization: ..."=, - =--figma-api-key=, =-f body=...= → body redacted; sentinel - =REDACTED_TEST_SECRET= never appears in any output of any - helper. -- =cj/gh--available-p=: nil when =executable-find= fails; t - otherwise. -- =cj/gh--version=: caches per session; floor check rejects - too-old. - -** =tests/test-gh-runner.el= -- runner contract (~15 tests) - -Stub =make-process=: -- Normal exit 0 with short output: returned verbatim, no - truncation flag. -- Long output (100 KB stub): truncated at 64 KB exactly, - truncation marker present, =:truncated t=. -- Process hangs past timeout: timer fires, SIGTERM sent, returns - =:error-kind 'timeout=. -- Exit 4 + auth stderr: returns =:error-kind 'auth= with recovery - message. -- TRAMP cwd: never invokes =make-process=, returns =:error-kind - 'remote-cwd=. -- Missing executable: returns =:error-kind 'no-executable=. -- Environment: =process-environment= includes all six required - vars before =make-process= call. -- Argv with leading "gh" stripped + logged. - -** =tests/test-gh-classifier.el= -- policy logic (~20 tests) - -- Every read verb classifies as read. -- Every write verb classifies as write. -- Every destructive pattern matches. -- Every blocked verb (=login=, =browse=, =clone=, etc.) classifies - as blocked. -- Every blocked flag (=--web=, =--paginate=, =--input=) classifies - as blocked. -- File-upload (=-F key=@/path=) classifies as blocked. -- =gh api= GET (no fields): read. -- =gh api= GET (=-f x=y=): write (auto-POST per gh's rules). -- =gh api --method GET -f q=x=: read (explicit override). -- =gh api -X DELETE=: destructive. -- =gh api -X PATCH=: write. -- =gh api --input file=: blocked. -- Unknown verb (=gh frobnicate=): unknown → confirm. -- =-H= as branch (=gh pr list -H feature=) doesn't trigger host - treatment. -- =-H= as header (=gh api -H Accept:json=) doesn't trigger host - treatment. -- Policy decision: read → auto; write → confirm; destructive → - block; blocked → block; unknown → confirm. - -** =tests/test-gh-wrappers.el= -- per-wrapper builders + schemas (~15 tests) - -- Every wrapper's schema is valid (correct =:name=, =:type=, - =:description=, =:args= shape). -- =gh_pr_view= argv with number 171 and repo "host/o/r" produces - =("pr" "view" "171" "--repo" "host/o/r" "--json" "DEFAULTS")=. -- =gh_pr_diff= rejects =format='json=. -- =gh_api_get= rejects =-f=, =-F=, =--field=, =--raw-field=, - =--input=, =--method= other than GET. -- =gh_pr_current= invokes without a number arg (uses cwd). -- =limit= clamped to per-wrapper max. -- =fields= validated against per-resource allowlist. - -** Manual smoke (every phase) - -| Phase | Smoke | -|-------+-------| -| 4 | =gh_pr_view N= against both hosts | -| 4 | =gh_pr_list= in =~/.emacs.d= → uses github.com/cjennings/dotemacs | -| 5 | =gh_pr_checks= shows CI status without full logs | -| 5 | =gh_run_logs_failed= cap kicks in on a long-failed run | -| 5 | =gh_api_get= rejects a =-f= arg with clear error | -| 6 | =gh= general tool: agent asked to merge a PR triggers GPTel confirm prompt | -| 6 | Agent asked to =gh repo delete= gets irreversible-blocked | -| 6 | Agent asked to =gh --web=... gets policy-blocked | -| 7 | =cj/gh-doctor= correctly identifies an unauthenticated host | -| 7 | =cj/gh-tool-last-error= shows debug record after a failing call | - -** Opt-in integration suite - -A small set of real-gh tests (in =tests/test-gh-integration.el= -marked =:tag :integration=, default skipped): - -- =gh auth status --hostname github.com= ok. -- =gh auth status --hostname deepsat.ghe.com= ok. -- =gh repo view cjennings/dotemacs --json name= - returns parseable JSON. -- =gh pr list --repo cjennings/dotemacs --limit 1= returns ≤ 1 - PR. - -Run manually via =make test-name TEST=gh-integration=. - -* Acceptance Criteria - -1. *Argv contract.* No tool produces a command string for execution; - every call goes through the argv-list runner. -2. *No silent writes.* Every classified write either prompts - GPTel's confirm or hard-blocks. Verified by an end-to-end - test where the agent attempts =pr merge= and the test fails if - =make-process= is invoked before confirm. -3. *In-flight cap.* A stubbed process emitting 1 MB returns - exactly 64 KB; the runner never holds more than 65 KB in - memory. -4. *Host visibility.* Every successful tool response begins with - =[gh HOST/REPO ...]=. Verified by a test that greps the - response text. -5. *Doctor coverage.* =cj/gh-doctor= correctly identifies (a) no - gh executable, (b) too-old gh, (c) unauthenticated host, (d) - non-git cwd, (e) git cwd whose remote points to an unknown - host. -6. *No secret leakage.* Test fixtures containing - =REDACTED_TEST_SECRET= in every secret-bearing slot - (=--token=, =-H Authorization=, =-f body=, etc.) produce zero - matches when grepping audit log, debug record, and any - user-facing message. - -* Risks - -** R1 -- gh CLI evolves and verbs drift - -New =gh= versions may add subcommands the blocklist doesn't -cover. Or rename verbs. - -*Mitigation:* the blocklist works on verbs (not full subcommand -paths) so most additions are caught. Doctor includes a -=gh --version= floor. Periodic review when gh bumps a major -version. - -** R2 -- The "all reads auto-execute" default may still be too broad - -Some reads expose private content (issue bodies, PR descriptions -from private repos). An agent surfacing a confidential issue -body into a saved conversation has data-leak implications. - -*Mitigation:* response header makes the host/repo visible in -every result, so the saved conversation makes the privacy -boundary auditable. Wrappers truncate body/comments by default; -agent must explicitly opt-in to include them. Documented in -commentary. - -** R3 -- The general tool's =:confirm t= prompt may become click-fatigue - -If the agent uses the general tool heavily during a write-heavy -workflow (PR creation, label management), confirming every call -becomes tedious. - -*Mitigation:* the expanded wrapper set covers most reads, so the -general tool fires mainly for writes -- where confirmation is -exactly the right behavior. If usage shows confirm fatigue, -V2's per-host policy can add =:auto= for explicit -write-confirmed contexts. - -** R4 -- =--paginate= block conflicts with legitimate large queries - -Blocking =--paginate= globally means the agent can't get -historical CI runs (which may need pagination). - -*Mitigation:* =gh_run_list= and =gh_search_*= accept a clamped -=--limit= which usually substitutes. If a use case needs more, -the agent can request multiple non-paginated pages explicitly. - -** R5 -- Token expiry surfaces as cryptic exit 4 - -When a host's keyring entry expires, every call returns exit 4 -with "authentication required" on stderr. The agent sees the -error but may not realize the fix is interactive. - -*Mitigation:* the runner's =:error-kind 'auth= mapping prepends -the recovery message before returning to the agent. =cj/gh-doctor= -proactively checks auth status. - -** R6 -- TRAMP cwd silently runs gh remotely - -Without the explicit TRAMP rejection, =process-file= would try to -spawn =gh= on the remote host (where it may not exist, or may -authenticate against the wrong keyring). - -*Mitigation:* runner checks =(file-remote-p default-directory)= -first and returns =:error-kind 'remote-cwd= with a clear message. - -** R7 -- =gh search= behaves differently on GHE - -GHE may not support every advanced search operator -=github.com= does. Search wrappers may return inconsistent -results across hosts. - -*Mitigation:* documented in =gh_search_*= wrapper descriptions. -Result header makes the host visible so the agent can adjust. - -** R8 -- Audit log grows unbounded - -One file per day, but no automatic cleanup. - -*Mitigation:* metadata-only entries are tiny (~150 bytes); a -year of heavy use is a few MB. Manual cleanup acceptable. -Defcustom to disable for users who don't want it. - -* Open Questions - -** Q1 -- Should the general tool's confirmation prompt include the classification? - -When GPTel asks "Run gh tool? (y/n)" the prompt shows the argv but -not the classification. Showing "WRITE: gh pr merge 171" gives the -user more context. Need to investigate gptel's confirm-prompt -extensibility. - -** Q2 -- Should =gh_pr_diff= cap differently from text wrappers? - -A PR diff can legitimately be 100KB+ for a large refactor. The -64KB cap is the same as everywhere else. If diffs need a higher -cap (256KB?), that's per-wrapper config. - -** Q3 -- Should wrappers expose =include-body=, =include-comments=, etc., as separate args, or as a comma-separated list? - -The spec proposes separate boolean args (=:include-body t=, -=:include-comments t=). Alternative: one =:include= comma-list -arg. Separate args are more discoverable; comma-list is more -compact. Decide during Phase 4. - -** Q4 -- Should =cj/gh-tool-audit-log= grow into a query interface? - -V1 writes one line per call. Future: a command to query the -log (=cj/gh-audit-search REGEX=) for surfacing "what did the agent -do to this PR last week?". - -* V2 Roadmap - -Items intentionally deferred: - -- *Per-host policy.* =cj/gh-host-policy= alist keyed by hostname - (mirror of the MCP spec's structure) so work GHE can be - read-only while personal allows writes-with-confirm. -- *Conversation context injection.* After a PR view, the wrapper - inserts a "GitHub context: HOST/REPO PR #N at URL" line into the - GPTel buffer so saved conversations stay traceable without - bundling full output. -- *Local/remote bridge helpers.* current-branch → PR-number, - changed-files → matching PR file comments, etc. -- *Artifact download.* =gh_run_artifacts=, =gh_release_download= - with explicit confirm and write to a controlled directory. -- *Async general tool.* =make-process= + sentinel for the cases - where 60s timeout isn't enough (rare, but real for some - =--paginate= scenarios). -- *Audit log query interface.* =cj/gh-audit-search=, - =cj/gh-audit-by-host=. -- *Profile-based tool subsets.* e.g. read-only profile - vs. write-capable profile per buffer. - -* References - -- [[file:../../gptel-tools/git_log.el][gptel-tools/git_log.el]] -- pattern reference for new tool files. -- [[file:../../modules/ai-config.el][modules/ai-config.el]] -- =gptel-confirm-tool-calls nil= at - line 386; loader at lines 71-96. -- [[file:gptel-agentic-tool-ideas.org][gptel-agentic-tool-ideas.org]] -- broader agentic-tool design; - =gh= sits alongside the MCP integration as the - collaboration tier. -- [[file:mcp-el-gptel-integration.org][mcp-el-gptel-integration.org]] -- sibling design; same - confirm-on-write pattern for safety. -- [[https://cli.github.com/manual/][gh CLI manual]] -- subcommand reference. -- =gh --version 2.92.0= help output -- verified flag semantics - per subcommand. -- =gh help environment= -- verified env-var names for non-interactive - mode. |
