aboutsummaryrefslogtreecommitdiff
path: root/docs/design/gptel-gh-tool.org
diff options
context:
space:
mode:
Diffstat (limited to 'docs/design/gptel-gh-tool.org')
-rw-r--r--docs/design/gptel-gh-tool.org1061
1 files changed, 0 insertions, 1061 deletions
diff --git a/docs/design/gptel-gh-tool.org b/docs/design/gptel-gh-tool.org
deleted file mode 100644
index a9ba22bb1..000000000
--- a/docs/design/gptel-gh-tool.org
+++ /dev/null
@@ -1,1061 +0,0 @@
-#+TITLE: Design: Wrap the gh CLI as a GPTel tool
-#+AUTHOR: Craig Jennings
-#+DATE: 2026-05-16
-#+OPTIONS: toc:nil num:nil
-
-* Status
-
-Draft (revision 2). Pre-implementation; no code shipped yet. =gh=
-v2.92.0 installed and authenticated against both =github.com= (account
-=cjennings=) and =deepsat.ghe.com= (account =craig-jennings=).
-
-Revision 2 incorporates a code-review pass that caught several
-incorrect =gh= CLI assumptions and a critical safety gap: with
-=gptel-confirm-tool-calls= set to =nil= in =modules/ai-config.el:386=,
-the "irreversible-only blocklist" V1 from revision 1 would let the
-agent run unconfirmed writes (=pr merge=, =run cancel=, =api -f
-body=...=, =release create=, etc.). Revision 2 keeps Craig's intent
-of a general-capability tool but applies =:confirm t= to every
-classified write, drops the "scan command string for =--hostname=
-and =-H=" approach in favor of argv-list builders, and expands the
-wrapper set so most agent workflows don't need to fall back to the
-general tool.
-
-* Problem
-
-GPTel agents can read git history locally (=git_log=, =git_diff=,
-=git_status=) but have no access to the GitHub side of the workflow:
-PRs, issues, reviews, CI runs, releases, gists, repo metadata. The
-=gh= CLI does all of this and is already authenticated against both
-hosts the user works with. Wrapping it as a GPTel tool gives the
-agent the same GitHub surface the user has.
-
-The wrapper has to handle four complications the local git tools
-don't:
-
-1. *Two authenticated hosts* -- personal (=github.com=) and work
- GHE (=deepsat.ghe.com=) -- with different policies and different
- blast radii for destructive operations.
-2. *A vast subcommand surface* (~30 top-level subcommands, hundreds
- of leaves) that doesn't fit cleanly into one typed schema per
- leaf.
-3. *Inconsistent flag semantics across subcommands.* =--hostname=
- exists only on =gh api= and =gh auth status=; the rest use
- =--repo [HOST/]OWNER/REPO=. =-H= means =--head= on =gh pr list=
- but =--header= on =gh api=. Naive string-scanning to detect
- user intent is wrong.
-4. *Global GPTel setting* (=gptel-confirm-tool-calls nil=) means
- the agent can call any registered tool without user confirmation
- unless that specific tool is registered with =:confirm t=. V1
- safety must come from per-tool flags, not from a session-level
- gate.
-
-* Goals
-
-1. The agent can invoke the non-interactive subset of =gh= against
- either host, gated by a safety policy.
-2. High-frequency reads have typed wrappers with sensible defaults
- (host/repo resolution, JSON-by-default with minimal fields,
- output cap, timeout). Wrappers auto-execute.
-3. The active host is resolved from a single context object that
- considers (in order): explicit =hostname=/=repo= args, branch
- upstream remote, =origin= remote, =cj/gh-default-host=. Every
- tool response prefixes the resolved host/repo so cross-host
- mistakes are visible.
-4. Writes via the general tool execute only after user
- confirmation (=:confirm t= on the gptel tool registration).
- Unknown classifications fail closed (=:confirm t=).
- Irreversible commands hard-block.
-5. Interactive commands (=auth login=, =codespace ssh=, =--web=,
- =browse=, =pr checkout=, =repo clone=, =config set=,
- =alias set=, =extension exec=) are hard-blocked in V1.
-6. File exfiltration paths (=api --input=, =api -F key=@file=,
- =release upload=, =gist create=) are hard-blocked in V1.
-7. Output is capped *during capture*, not after, so a runaway
- =--paginate= or =--log= can't fill memory or block Emacs.
- Long-output commands have higher timeouts but the same byte
- cap.
-8. Every call produces a structured debug record (host, repo, cwd,
- sanitized argv, classification, policy decision, exit code,
- duration, bytes captured, truncation flag, error kind) inspectable
- via =cj/gh-tool-last-error=.
-9. =cj/gh-doctor= diagnoses missing prerequisites (=gh=
- executable, version floor, auth per host, cwd repo detection,
- environment overrides) before they fail at runtime.
-10. V2 adds true per-host policy (currently uniform across both
- hosts), profile-based tool subsets, and bridge helpers between
- local git tools and the gh wrappers.
-
-* Non-Goals
-
-- *Interactive subcommands.* =gh auth login=, =gh codespace ssh=,
- =gh pr checkout= (modifies working tree), =gh repo clone=
- (writes outside controlled paths) are not in scope. The user
- runs those in a real terminal.
-- *Replicating gh extensions.* Core gh only. Extensions can be
- invoked from a regular terminal if needed.
-- *Streaming long-running output.* =gh run watch=, =gh
- --paginate= for unbounded result sets, =gh repo clone= for large
- repos. V1 blocks =--paginate= and =watch= verbs; future
- versions may add streaming support.
-- *Per-host write policies* (work-read-only / personal-full-write).
- V1 applies the same write-confirmation policy to both hosts.
- V2 makes the policy host-keyed.
-- *Conversation-context injection* and *local/remote bridge
- helpers* (current branch → PR, changed files → PR comments).
- Future enhancements; tracked as a follow-up.
-- *Artifact download* (=gh run download=). Writes files; needs
- its own confirmation flow. Deferred to V2.
-
-* Verified gh CLI Contracts
-
-These were checked against =gh --version 2.92.0= help output before
-the spec was revised. Implementation can rely on them.
-
-** Host / repo selection per subcommand
-
-- =gh api= and =gh auth status=: use =--hostname HOST= (long form
- only; no short form).
-- =gh pr {view,list,diff,checks}=, =gh issue {view,list}=,
- =gh run {view,list}=, =gh release *=, =gh search *=: use
- =--repo [HOST/]OWNER/REPO= (short =-R=).
-- =gh repo view= and similar root-level commands: infer from cwd
- remote; no =--hostname= or =--repo= flag.
-- Environment variables: =GH_HOST= and =GH_REPO= work for commands
- that lack explicit flags.
-
-** Flags with overloaded short forms
-
-- =gh pr list -H FOO= → =--head FOO= (branch).
-- =gh search prs -H FOO= → =--head FOO= (branch).
-- =gh api -H KEY:VAL= → =--header KEY:VAL=.
-
-Substring scanning =-H= as "hostname" is wrong on every command
-that uses it.
-
-** =gh api= method classification
-
-The help text explicitly says: "adding request parameters will
-automatically switch the method to POST". This means:
-
-- =gh api repos/x/y/issues= → GET (read).
-- =gh api repos/x/y/issues -f title=Foo= → POST (write).
-- =gh api repos/x/y/issues -F body=@file= → POST (write +
- exfiltration).
-- =gh api --method GET repos/x/y/issues -f q=x= → GET (read,
- explicit override).
-
-Classifier must inspect for =-f=, =-F=, =--field=, =--raw-field=,
-=--input=, and =--method= before defaulting to GET.
-
-** Wrappers without =--json= support
-
-- =gh pr diff= produces patch output; no =--json= flag. Wrapper
- is text-only. Structured file metadata available via
- =gh pr view --json files= (new wrapper =gh_pr_files=).
-
-** Environment variables for non-interactive use
-
-All confirmed via =gh help environment=:
-
-- =GH_PROMPT_DISABLED= -- disable interactive prompting.
-- =GH_PAGER= / =PAGER= -- set to =cat= so no pager fires.
-- =GH_NO_UPDATE_NOTIFIER= -- silence the "new version" banner.
-- =GH_NO_EXTENSION_UPDATE_NOTIFIER= -- same for extensions.
-- =GH_SPINNER_DISABLED= -- silence the progress spinner.
-- =NO_COLOR= -- disable ANSI color (cleaner LLM context).
-
-V1 subprocess env always includes all of the above.
-
-** Authentication
-
-#+begin_example
-✓ Logged in to github.com account cjennings (keyring)
-✓ Logged in to deepsat.ghe.com account craig-jennings (keyring)
-#+end_example
-
-Both stored in the system keyring. =gh= reads the keyring per
-call; the wrapper does not handle tokens. Expired auth surfaces
-as exit code 4 with a clear stderr message; recovery is interactive
-(=gh auth login --hostname HOST= in a terminal).
-
-* Current State
-
-** =modules/ai-config.el=
-
-- =gptel-confirm-tool-calls nil= at line 386. Per-tool
- =:confirm t= is the only confirmation mechanism in V1.
-- =cj/gptel-load-local-tools= (lines 71-96) loads tools from
- =gptel-tools/=. Add new gh feature names to
- =cj/gptel-local-tool-features=.
-
-** =gptel-tools/=
-
-Ten tools today. =git_log.el= is the closest analogue: validates
-input, runs subprocess with =process-file=, caps output, signals
-clear errors. The gh wrappers follow the same shape with shared
-helpers in =gh-common.el=.
-
-* Design
-
-** File layout
-
-Two files plus per-wrapper registrations. Wrappers are
-deliberately small (argv builders + JSON-field defaults) so they
-all live in one file:
-
-- =gptel-tools/gh-common.el= -- shared helpers: host/repo context,
- argv runner, subprocess env, classifier, blocklist, redaction,
- debug record, last-error buffer. No tool registrations.
-- =gptel-tools/gh.el= -- the general tool + every wrapper
- registration. Pulls in =gh-common= via =require=.
-
-This matches Craig's preference for low loader surface (per the
-MCP revision discussion). If =gh.el= grows past ~600 lines we
-split per-resource later (=gh-pr.el=, =gh-issue.el=, etc.).
-
-** Host / repo resolution
-
-A single helper returns a structured context object used by every
-tool entry point:
-
-#+begin_src emacs-lisp
-(defun cj/gh--resolve-context (cwd hostname repo)
- "Return a context plist for a gh invocation.
-Resolution order:
-1. Explicit REPO argument (`[HOST/]OWNER/REPO'). If it contains
- `HOST/', split off the host.
-2. Explicit HOSTNAME argument.
-3. CWD's branch upstream remote URL (`git rev-parse --abbrev-ref
- @{u}' then `git remote get-url REMOTE').
-4. CWD's `origin' remote URL.
-5. `cj/gh-default-host'.
-
-Returns:
- (:host HOST :owner OWNER :repo REPO :source SYM)
-where SOURCE is one of: explicit-repo, explicit-host, upstream,
-origin, default."
- ...)
-#+end_src
-
-URL parsing handles all three forms produced by git:
-
-- =git@HOST:OWNER/REPO.git=
-- =ssh://git@HOST/OWNER/REPO.git=
-- =https://HOST/OWNER/REPO.git=
-
-Returns =nil host= if the URL doesn't match a known host; the
-caller uses =cj/gh-default-host= in that case.
-
-#+begin_src emacs-lisp
-(defcustom cj/gh-default-host "github.com"
- "Host used when no other resolution path yields one."
- :type 'string
- :group 'cj)
-
-(defcustom cj/gh-known-hosts '("github.com" "deepsat.ghe.com")
- "Hosts the gh CLI is authenticated against."
- :type '(repeat string)
- :group 'cj)
-#+end_src
-
-Context appears in every tool response as a one-line prefix:
-
-#+begin_example
-[gh github.com/cjennings/dotemacs read ok 12.4KB] ...
-[gh deepsat.ghe.com/org/repo write confirmed 0.8KB] ...
-[gh github.com api unknown blocked] ...
-#+end_example
-
-This makes cross-host accidents visible.
-
-** Argv-list builders, never command strings
-
-Every tool builds an explicit argv list and hands it to a shared
-runner. Wrappers expose typed parameters that map directly into
-argv positions:
-
-#+begin_src emacs-lisp
-(defun cj/gh-pr-view--build-argv (context number include-comments fields)
- "Return an argv list for `gh pr view NUMBER ...'."
- (let ((repo-arg (format "%s/%s/%s"
- (plist-get context :host)
- (plist-get context :owner)
- (plist-get context :repo))))
- (append (list "pr" "view" (number-to-string number)
- "--repo" repo-arg)
- (when include-comments '("--comments"))
- (when fields (list "--json" fields)))))
-#+end_src
-
-The general tool takes an =args= parameter typed as array-of-string
-(not a shell string) so the agent constructs argv directly:
-
-#+begin_src emacs-lisp
-;; agent passes: ["pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo"]
-;; runner runs: gh pr view 171 --repo deepsat.ghe.com/org/repo
-#+end_src
-
-If the agent accidentally includes leading =gh= in args[0], the
-runner strips it and logs the normalization.
-
-** Subprocess contract
-
-One runner for everything:
-
-#+begin_src emacs-lisp
-(defun cj/gh--run (argv &key timeout context)
- "Run gh with ARGV (a list of strings).
-Returns a plist:
- (:exit-code N :stdout STR :stderr STR :truncated BOOL :duration-ms N
- :argv ARGV :context CONTEXT)
-Enforces TIMEOUT (default `cj/gh--default-timeout').
-Captures output in-flight, killing the process at
-`cj/gh--max-bytes' to prevent runaway memory use."
- ...)
-#+end_src
-
-Contract:
-
-- *Env vars set every call:* =GH_PROMPT_DISABLED=1=, =GH_PAGER=cat=,
- =PAGER=cat=, =NO_COLOR=1=, =GH_NO_UPDATE_NOTIFIER=1=,
- =GH_NO_EXTENSION_UPDATE_NOTIFIER=1=, =GH_SPINNER_DISABLED=1=.
-- *Timeout:* default 20 s for reads, 60 s for diffs/logs/search.
- Per-tool override allowed but capped at 120 s. Timeout kills
- the process and returns =:error-kind 'timeout=.
-- *Output cap during capture:* a process filter accumulates bytes
- up to =cj/gh--max-bytes= (64 KB). At the cap, the filter sets
- =truncated=, ignores further output, and sends SIGTERM after a
- short grace. Truncation marker appended to returned stdout.
-- *TRAMP rejection:* if =(file-remote-p default-directory)= is
- non-nil, return =:error-kind 'remote-cwd= without invoking gh.
-- *Executable check:* if =(executable-find cj/gh--executable)= is
- nil, return =:error-kind 'no-executable= with install
- instructions.
-- *Version check:* the first call per session verifies
- =gh --version= meets =cj/gh--min-version= (default "2.50.0");
- cached for the session.
-
-** Read / write / destructive classifier
-
-Used to apply =:confirm t= and the irreversible blocklist:
-
-#+begin_src emacs-lisp
-(defconst cj/gh--read-verbs
- '("view" "list" "status" "search" "diff" "checks" "describe"
- "show" "logs" "auth-status"))
-
-(defconst cj/gh--write-verbs
- '("create" "edit" "merge" "close" "reopen" "comment" "review"
- "upload" "set" "add" "remove" "rerun" "cancel" "delete"
- "fork" "archive" "unarchive" "lock" "unlock" "pin" "unpin"
- "ready" "draft" "rename" "transfer" "approve" "label" "assign"))
-
-(defconst cj/gh--blocked-verbs
- '(;; interactive / opens UI
- "login" "logout" "checkout" "clone" "ssh" "code" "edit-prompt"
- ;; modifies user config
- "alias" "config" "extension"
- ;; opens browser
- "browse"))
-
-(defconst cj/gh--blocked-flags
- '("--web" ; many commands
- "--paginate" ; can produce unbounded output
- "--input" ; gh api file upload
- "--editor")) ; opens editor
-
-(defconst cj/gh--irreversible-patterns
- '("\\`repo delete\\b"
- "\\`release delete\\b"
- "\\`secret delete\\b"
- "\\`ssh-key delete\\b"
- "\\`gpg-key delete\\b"
- "\\`org delete\\b"
- "\\`project delete\\b"
- "\\`variable delete\\b"
- "\\`ruleset delete\\b"
- "\\`label delete\\b.*--yes"))
-#+end_src
-
-Classifier:
-
-#+begin_src emacs-lisp
-(defun cj/gh--classify (argv)
- "Return one of: read, write, destructive, blocked, unknown."
- (let* ((stripped (cj/gh--strip-flags argv))
- (resource (car stripped))
- (verb (cadr stripped)))
- (cond
- ;; Hard blocks first.
- ((cj/gh--has-blocked-verb-p argv) 'blocked)
- ((cj/gh--has-blocked-flag-p argv) 'blocked)
- ((cj/gh--has-file-arg-p argv) 'blocked) ; -F key=@file
- ((cj/gh--matches-irreversible-p argv) 'destructive)
- ;; gh api is special.
- ((string= resource "api") (cj/gh--classify-api argv))
- ;; Verb match.
- ((member verb cj/gh--read-verbs) 'read)
- ((member verb cj/gh--write-verbs) 'write)
- (t 'unknown))))
-
-(defun cj/gh--classify-api (argv)
- "Classify a `gh api ...' invocation.
-Reads: explicit method GET/HEAD with no -f/-F/--input.
-Writes: any -f/-F/--field/--raw-field/--input, OR explicit
-non-GET/HEAD method.
-Default (no method, no field): read (matches gh's GET default)."
- (let* ((explicit-method (or (cadr (member "-X" argv))
- (cadr (member "--method" argv))))
- (has-field (cl-some
- (lambda (f) (cl-some (lambda (a) (string-prefix-p f a))
- argv))
- '("-f" "--raw-field" "-F" "--field")))
- (has-input (member "--input" argv)))
- (cond
- (has-input 'blocked) ; file exfiltration
- ((and explicit-method
- (not (member (upcase explicit-method) '("GET" "HEAD"))))
- 'write)
- (has-field 'write) ; -f/-F auto-promotes to POST
- ((and explicit-method
- (member (upcase explicit-method) '("GET" "HEAD")))
- 'read)
- (t 'read))))
-#+end_src
-
-Classifier-driven policy table:
-
-| Classification | Policy |
-|----------------+--------|
-| =read= | auto-execute |
-| =write= | =:confirm t= on registration; agent's call shows in confirm prompt |
-| =destructive= | hard-block; return =:error-kind 'irreversible-blocked= |
-| =blocked= | hard-block; return =:error-kind 'policy-blocked= with reason |
-| =unknown= | =:confirm t= (fail closed) |
-
-** Safety policy
-
-V1 uniform policy applied to both hosts:
-
-#+begin_src emacs-lisp
-(defcustom cj/gh-policy
- '((read . auto)
- (write . confirm)
- (destructive . block)
- (blocked . block)
- (unknown . confirm))
- "Per-classification policy. V1 applies uniformly to both hosts."
- :type '(alist :key-type symbol :value-type symbol)
- :group 'cj)
-#+end_src
-
-Since GPTel's =:confirm t= flag is per-tool-registration (not
-per-call), the general tool is registered with =:confirm t=
-always. The wrappers are registered per their classification:
-
-| Tool | Registered with |
-|------+-----------------|
-| Read wrappers (=gh_pr_view=, etc.) | =:confirm nil= |
-| =gh= general tool | =:confirm t= |
-
-The general tool then applies the policy table at invocation
-time: reads execute without further prompting (already past
-GPTel's confirm because :confirm t fires once); writes show
-detail before invocation; destructive/blocked never reach gh.
-
-** General tool: =gh=
-
-Single tool registered with =:confirm t= covering everything the
-wrappers don't. Schema:
-
-| Arg | Type | Required | Purpose |
-|-----+------+----------+---------|
-| =args= | array of string | yes | argv list, e.g. =["pr", "view", "171", "--repo", "deepsat.ghe.com/org/repo"]= |
-| =hostname= | string | no | Override host for commands that accept =--hostname= (=api=, =auth status=); ignored otherwise |
-| =repo= | string | no | =[HOST/]OWNER/REPO= for repo-scoped commands; if HOST is present in repo, hostname arg is overridden |
-| =cwd= | string | no | Working directory; defaults to current buffer; must be under =$HOME=, must not be TRAMP |
-| =timeout= | integer | no | Seconds before kill; default 20, max 120 |
-
-Description (registered) explicitly says:
-
-#+begin_example
-Use this only when no task-specific gh_* tool fits. Prefer
-gh_pr_view, gh_pr_list, gh_pr_checks, gh_issue_view, gh_issue_list,
-gh_run_view, gh_run_list, gh_run_logs_failed, gh_repo_view,
-gh_search_prs, gh_search_issues, gh_api_get.
-
-Writes (create/edit/merge/etc.) require user confirmation.
-Destructive (repo/release/secret delete) are hard-blocked.
-Interactive commands (auth login, codespace ssh, --web, browse)
-are hard-blocked. File uploads (api --input, -F @file, release
-upload, gist create) are hard-blocked.
-#+end_example
-
-** Wrapper inventory
-
-Twelve wrappers grouped by resource. Each has typed args, JSON
-field defaults, output truncation, timeout override, and a
-description that names its scope.
-
-| Tool | gh command | Defaults | Timeout |
-|------+------------+----------+---------|
-| =gh_repo_view= | =repo view --json= | JSON fields: =name,nameWithOwner,description,defaultBranchRef,url,visibility= | 20s |
-| =gh_pr_view= | =pr view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s |
-| =gh_pr_list= | =pr list --json= | JSON fields: =number,title,state,author,createdAt,headRefName=; default =--limit 30= (capped at 100) | 20s |
-| =gh_pr_diff= | =pr diff N --color never= | text only; capped at 64 KB | 60s |
-| =gh_pr_checks= | =pr checks N --json= | JSON fields: =name,status,conclusion,startedAt,completedAt,link= | 20s |
-| =gh_pr_files= | =pr view N --json files= | JSON fields: =files= (path, additions, deletions, mode) | 20s |
-| =gh_pr_current= | =pr view --json= (no number — auto-detect) | Same as =gh_pr_view= | 20s |
-| =gh_issue_view= | =issue view N --json= | JSON fields: =number,title,state,author,createdAt,url,body= (body truncated) | 20s |
-| =gh_issue_list= | =issue list --json= | JSON fields: =number,title,state,author,createdAt,labels=; default =--limit 30= | 20s |
-| =gh_run_view= | =run view RUN-ID --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch,event,url= | 20s |
-| =gh_run_list= | =run list --json= | JSON fields: =databaseId,name,status,conclusion,startedAt,headBranch=; default =--limit 20= | 20s |
-| =gh_run_logs_failed= | =run view RUN-ID --log-failed= | text only; capped at 64 KB | 60s |
-| =gh_search_prs= | =search prs --json= | JSON fields: =number,title,state,author,repository,url=; default =--limit 30= (capped at 100) | 30s |
-| =gh_search_issues= | =search issues --json= | Same as PR search | 30s |
-| =gh_api_get= | =api ENDPOINT --method GET= | text/JSON pass-through; rejects fields/input args | 30s |
-
-Common args for every wrapper (unless noted):
-
-| Arg | Type | Required | Purpose |
-|-----+------+----------+---------|
-| =repo= | string | no | =[HOST/]OWNER/REPO=; resolved from context otherwise |
-| =hostname= | string | no | Override host for context resolution |
-| =limit= | integer | no | =--limit= for list/search wrappers; clamped to per-wrapper max |
-
-The =gh_api_get= wrapper *explicitly* rejects =-f=, =-F=,
-=--field=, =--raw-field=, =--input=, =--method=, and any method
-override. It only accepts =ENDPOINT= and optional =-H= headers
-that don't carry secrets (the runner redacts =Authorization:= and
-similar regardless). Writes via API go through the general tool
-with confirmation.
-
-** JSON field defaults
-
-Per-wrapper defaults are minimal -- enough for the agent to decide
-whether to drill in, not so much that one call fills the context.
-
-For =gh_pr_view= specifically:
-
-- *Default* (no =fields= override): the small list above (number,
- title, state, author, createdAt, url, body-truncated-to-2KB).
-- *Override*: agent passes =fields= as a comma-separated string;
- wrapper validates against a per-resource allowlist (so the agent
- can't request =reviews,comments,files= in one call to bypass the
- cap).
-- *Include flags*: =include-body t/nil=, =include-comments t/nil=,
- =include-reviews t/nil= as boolean args. Each adds the
- corresponding JSON field; agent opts in only when needed.
-
-** Output truncation
-
-Process filter pattern, not post-hoc cap:
-
-#+begin_src emacs-lisp
-(defun cj/gh--make-filter (state-var)
- "Return a process filter that accumulates into STATE-VAR's :stdout,
-stops collecting after `cj/gh--max-bytes', sets :truncated, and
-sends SIGTERM to the process."
- (lambda (proc output)
- (let* ((state (symbol-value state-var))
- (current (plist-get state :stdout))
- (current-len (length current))
- (remaining (- cj/gh--max-bytes current-len)))
- (cond
- ((<= remaining 0) nil) ; already at cap
- ((<= (length output) remaining)
- (plist-put state :stdout (concat current output)))
- (t
- (plist-put state :stdout
- (concat current (substring output 0 remaining)))
- (plist-put state :truncated t)
- (ignore-errors (delete-process proc)))))))
-#+end_src
-
-Truncation marker appended before return:
-
-#+begin_example
-[truncated at 64KB; use --limit, narrower fields, or a specific
-wrapper to reduce output]
-#+end_example
-
-** Error classification + debug record
-
-Every call returns (and =cj/gh-tool-last-error= caches) a debug
-record:
-
-#+begin_src emacs-lisp
-(defun cj/gh--debug-record (argv context exit-code stdout stderr
- duration-ms truncated)
- (list :host (plist-get context :host)
- :repo (cj/gh--repo-arg context)
- :cwd default-directory
- :argv (cj/gh--redact-argv argv)
- :classification (cj/gh--classify argv)
- :policy (cj/gh--policy-decision argv)
- :exit-code exit-code
- :duration-ms duration-ms
- :bytes-captured (length stdout)
- :truncated truncated
- :error-kind (cj/gh--error-kind exit-code stderr)))
-#+end_src
-
-=:error-kind= mapping:
-
-| Condition | =:error-kind= | Returned message |
-|-----------+---------------+------------------|
-| Exit 4 + "authentication required" | =auth= | "Run =gh auth login --hostname HOST= in a terminal." |
-| Process killed by timeout timer | =timeout= | "Command exceeded N seconds; narrow the query or use a more specific wrapper." |
-| Policy block | =policy-blocked= | "Blocked by V1 policy: REASON." |
-| Irreversible match | =irreversible-blocked= | "Hard-blocked irreversible command: COMMAND." |
-| Truncated output | =truncated= | "Output truncated at 64KB; reduce scope." |
-| TRAMP cwd | =remote-cwd= | "Cannot run gh from remote directory: CWD." |
-| Missing executable | =no-executable= | "gh not found at CJ/GH--EXECUTABLE; install via 'pacman -S github-cli' (or equivalent)." |
-| Other non-zero exit | =gh-exit= | (raw stderr, redacted) |
-
-Each error includes a sanitized reproduce line:
-
-#+begin_example
-Reproduce: GH_PROMPT_DISABLED=1 GH_PAGER=cat gh pr view 171 --repo HOST/OWNER/REPO
-#+end_example
-
-(Secrets, body text, file paths redacted via =cj/gh--redact-argv=.)
-
-** Audit log (V1, opt-out)
-
-Every call appends one line to
-=~/.emacs.d/data/gh-tool-log/YYYY-MM-DD.log=:
-
-#+begin_example
-2026-05-16T14:23:45-0500 host=github.com repo=cjennings/dotemacs class=read policy=auto exit=0 duration=128ms bytes=4321
-2026-05-16T14:24:02-0500 host=deepsat.ghe.com repo=org/repo class=write policy=confirm exit=0 duration=412ms bytes=88
-#+end_example
-
-Metadata only, not output bodies. Defcustom
-=cj/gh-tool-audit-log-enabled= (default =t=). Daily rotation
-implicit (one file per day). Cleanup manual.
-
-** Secrets redaction
-
-=cj/gh--redact-argv= masks:
-
-- Anything after =--token=, =--secret=, =--password= flags.
-- Authorization headers (=-H "Authorization: ..."=).
-- =--figma-api-key=KEY= (in case general tool spawns figma-mcp
- somehow).
-- Bearer tokens in URLs (=?token=...=).
-- Values for =-f=/=-F= keys named like =body=, =text=,
- =description= (private content; metadata still logged).
-
-Applied to:
-- All stderr returned to the agent.
-- All audit-log lines.
-- All debug records.
-- The reproduce line on error.
-
-* Commands & UX
-
-** =cj/gh-doctor=
-
-Diagnostic command. No side effects. Checks:
-
-- =gh= executable found at =cj/gh--executable=.
-- =gh --version= meets =cj/gh--min-version=.
-- =gh auth status= for each host in =cj/gh-known-hosts=.
-- Current buffer's cwd repo detection: resolves to which host/repo?
-- Environment overrides effective (=GH_PROMPT_DISABLED= etc. would
- be set by the runner).
-- Active account per host.
-- Warnings if any block-list-relevant env is set externally
- (e.g. user already has =GH_PAGER= set to something that pages).
-
-Output: a buffer with PASS / FAIL / WARN per check + recovery
-actions for failures.
-
-** =cj/gh-tool-last-error=
-
-Opens a buffer showing the last call's debug record:
-
-#+begin_example
-Host: deepsat.ghe.com
-Repo: org/repo
-Source: upstream
-CWD: ~/projects/work/foo
-Argv: ("pr" "view" "171" "--repo" "deepsat.ghe.com/org/repo")
-Classification: read
-Policy: auto
-Exit code: 0
-Duration: 412 ms
-Bytes captured: 4321
-Truncated: no
-Error kind: none
-
-Reproduce:
- GH_PROMPT_DISABLED=1 GH_PAGER=cat NO_COLOR=1 \
- gh pr view 171 --repo deepsat.ghe.com/org/repo
-#+end_example
-
-** Tool response header
-
-Every tool result begins with a one-line header so cross-host /
-policy decisions are visible:
-
-#+begin_example
-[gh github.com/cjennings/dotemacs read ok 4.3KB]
-{ ... }
-[gh deepsat.ghe.com/org/repo write confirmed 0.2KB]
-{ ... }
-[gh github.com/cjennings/dotemacs api blocked policy-blocked]
-Error: Hard-blocked file-upload path: --input file.
-Reproduce: gh api repos/cjennings/dotemacs/contents/foo --input file
-#+end_example
-
-* Implementation Plan
-
-Eight phases. Each ends with green ERT tests + manual smoke
-before the next.
-
-** Phase 1 -- Common helpers + context resolver
-
-=gh-common.el=: =cj/gh--executable=, =cj/gh--available-p=,
-=cj/gh--version= (cached), =cj/gh--validate-cwd= (HOME + non-TRAMP),
-=cj/gh--parse-remote-url=, =cj/gh--resolve-context=,
-=cj/gh--redact-argv=. No subprocess execution yet.
-
-Tests cover all helpers against fixture remote URLs and synthetic
-git directories. No real gh calls.
-
-** Phase 2 -- Runner with subprocess env + timeout + in-flight cap
-
-=cj/gh--run=, the process filter, the timer kill path, env-var
-setup. Tests stub =make-process= to simulate output / exit / hang
-/ truncation paths.
-
-Acceptance: with stub configured to produce 100 KB of output,
-returned stdout is exactly 64 KB plus the truncation marker, and
-the process gets SIGTERM.
-
-** Phase 3 -- Classifier + blocklist + policy
-
-=cj/gh--classify=, =cj/gh--classify-api=, blocklist constants,
-=cj/gh--policy-decision=. Tests for every blocklist pattern
-(verbs + flags + file-arg paths), API edge cases (=-f= promotes
-to POST, =--method GET -f x=y= stays GET, etc.), and the
-unknown-fails-closed contract.
-
-** Phase 4 -- Read wrappers (5 first)
-
-=gh_repo_view=, =gh_pr_view=, =gh_pr_list=, =gh_pr_diff=,
-=gh_issue_view=. Each is a thin schema + argv builder + delegate
-to =cj/gh--run=. Tests verify argv shape for typical args.
-
-Manual smoke against both hosts. First real gh calls.
-
-** Phase 5 -- Remaining wrappers + JSON defaults
-
-Eight more wrappers (=gh_pr_checks=, =gh_pr_files=,
-=gh_pr_current=, =gh_issue_list=, =gh_run_view=, =gh_run_list=,
-=gh_run_logs_failed=, =gh_search_prs=, =gh_search_issues=,
-=gh_api_get=). JSON-field defaults per wrapper. Tests for the
-=gh_api_get= flag-rejection contract.
-
-** Phase 6 -- General tool
-
-=gh.el= general tool registration with =:confirm t=, blocklist
-enforcement, policy decision applied before invocation. Tests
-verify confirmation gate (stub gptel's confirm flow), blocked
-commands never reach =cj/gh--run=, destructive commands return
-=:irreversible-blocked= without prompting.
-
-** Phase 7 -- UX: doctor, last-error, response header
-
-=cj/gh-doctor=, =cj/gh-tool-last-error=, response header
-formatting. Audit log writer. Defcustom for log enable.
-
-** Phase 8 -- Loader wiring + integration
-
-Add the 16 feature names to =cj/gptel-local-tool-features= (one
-per wrapper + the general tool + the helpers feature). Verify
-they land in =gptel-tools=.
-
-* Test Plan
-
-Target: 55-70 ERT tests across four files. No real subprocesses,
-no real network, no real =~/.claude.json= (gh tools don't use it,
-but the no-real-process rule applies uniformly).
-
-** =tests/test-gh-common.el= -- pure helpers (~25 tests)
-
-- =cj/gh--parse-remote-url=: ssh-scp, ssh-url, https, with/without
- =.git=, with/without trailing slash, unknown host returns =:host
- nil=, =github.com.evil.example= does NOT match =github.com=.
-- =cj/gh--resolve-context=: explicit repo wins; explicit hostname
- wins over remote; upstream beats origin; origin beats default;
- default fires when no git.
-- =cj/gh--validate-cwd=: HOME-rooted ok; outside HOME errors;
- TRAMP errors; non-directory errors.
-- =cj/gh--redact-argv=: =--token=, =-H "Authorization: ..."=,
- =--figma-api-key=, =-f body=...= → body redacted; sentinel
- =REDACTED_TEST_SECRET= never appears in any output of any
- helper.
-- =cj/gh--available-p=: nil when =executable-find= fails; t
- otherwise.
-- =cj/gh--version=: caches per session; floor check rejects
- too-old.
-
-** =tests/test-gh-runner.el= -- runner contract (~15 tests)
-
-Stub =make-process=:
-- Normal exit 0 with short output: returned verbatim, no
- truncation flag.
-- Long output (100 KB stub): truncated at 64 KB exactly,
- truncation marker present, =:truncated t=.
-- Process hangs past timeout: timer fires, SIGTERM sent, returns
- =:error-kind 'timeout=.
-- Exit 4 + auth stderr: returns =:error-kind 'auth= with recovery
- message.
-- TRAMP cwd: never invokes =make-process=, returns =:error-kind
- 'remote-cwd=.
-- Missing executable: returns =:error-kind 'no-executable=.
-- Environment: =process-environment= includes all six required
- vars before =make-process= call.
-- Argv with leading "gh" stripped + logged.
-
-** =tests/test-gh-classifier.el= -- policy logic (~20 tests)
-
-- Every read verb classifies as read.
-- Every write verb classifies as write.
-- Every destructive pattern matches.
-- Every blocked verb (=login=, =browse=, =clone=, etc.) classifies
- as blocked.
-- Every blocked flag (=--web=, =--paginate=, =--input=) classifies
- as blocked.
-- File-upload (=-F key=@/path=) classifies as blocked.
-- =gh api= GET (no fields): read.
-- =gh api= GET (=-f x=y=): write (auto-POST per gh's rules).
-- =gh api --method GET -f q=x=: read (explicit override).
-- =gh api -X DELETE=: destructive.
-- =gh api -X PATCH=: write.
-- =gh api --input file=: blocked.
-- Unknown verb (=gh frobnicate=): unknown → confirm.
-- =-H= as branch (=gh pr list -H feature=) doesn't trigger host
- treatment.
-- =-H= as header (=gh api -H Accept:json=) doesn't trigger host
- treatment.
-- Policy decision: read → auto; write → confirm; destructive →
- block; blocked → block; unknown → confirm.
-
-** =tests/test-gh-wrappers.el= -- per-wrapper builders + schemas (~15 tests)
-
-- Every wrapper's schema is valid (correct =:name=, =:type=,
- =:description=, =:args= shape).
-- =gh_pr_view= argv with number 171 and repo "host/o/r" produces
- =("pr" "view" "171" "--repo" "host/o/r" "--json" "DEFAULTS")=.
-- =gh_pr_diff= rejects =format='json=.
-- =gh_api_get= rejects =-f=, =-F=, =--field=, =--raw-field=,
- =--input=, =--method= other than GET.
-- =gh_pr_current= invokes without a number arg (uses cwd).
-- =limit= clamped to per-wrapper max.
-- =fields= validated against per-resource allowlist.
-
-** Manual smoke (every phase)
-
-| Phase | Smoke |
-|-------+-------|
-| 4 | =gh_pr_view N= against both hosts |
-| 4 | =gh_pr_list= in =~/.emacs.d= → uses github.com/cjennings/dotemacs |
-| 5 | =gh_pr_checks= shows CI status without full logs |
-| 5 | =gh_run_logs_failed= cap kicks in on a long-failed run |
-| 5 | =gh_api_get= rejects a =-f= arg with clear error |
-| 6 | =gh= general tool: agent asked to merge a PR triggers GPTel confirm prompt |
-| 6 | Agent asked to =gh repo delete= gets irreversible-blocked |
-| 6 | Agent asked to =gh --web=... gets policy-blocked |
-| 7 | =cj/gh-doctor= correctly identifies an unauthenticated host |
-| 7 | =cj/gh-tool-last-error= shows debug record after a failing call |
-
-** Opt-in integration suite
-
-A small set of real-gh tests (in =tests/test-gh-integration.el=
-marked =:tag :integration=, default skipped):
-
-- =gh auth status --hostname github.com= ok.
-- =gh auth status --hostname deepsat.ghe.com= ok.
-- =gh repo view cjennings/dotemacs --json name=
- returns parseable JSON.
-- =gh pr list --repo cjennings/dotemacs --limit 1= returns ≤ 1
- PR.
-
-Run manually via =make test-name TEST=gh-integration=.
-
-* Acceptance Criteria
-
-1. *Argv contract.* No tool produces a command string for execution;
- every call goes through the argv-list runner.
-2. *No silent writes.* Every classified write either prompts
- GPTel's confirm or hard-blocks. Verified by an end-to-end
- test where the agent attempts =pr merge= and the test fails if
- =make-process= is invoked before confirm.
-3. *In-flight cap.* A stubbed process emitting 1 MB returns
- exactly 64 KB; the runner never holds more than 65 KB in
- memory.
-4. *Host visibility.* Every successful tool response begins with
- =[gh HOST/REPO ...]=. Verified by a test that greps the
- response text.
-5. *Doctor coverage.* =cj/gh-doctor= correctly identifies (a) no
- gh executable, (b) too-old gh, (c) unauthenticated host, (d)
- non-git cwd, (e) git cwd whose remote points to an unknown
- host.
-6. *No secret leakage.* Test fixtures containing
- =REDACTED_TEST_SECRET= in every secret-bearing slot
- (=--token=, =-H Authorization=, =-f body=, etc.) produce zero
- matches when grepping audit log, debug record, and any
- user-facing message.
-
-* Risks
-
-** R1 -- gh CLI evolves and verbs drift
-
-New =gh= versions may add subcommands the blocklist doesn't
-cover. Or rename verbs.
-
-*Mitigation:* the blocklist works on verbs (not full subcommand
-paths) so most additions are caught. Doctor includes a
-=gh --version= floor. Periodic review when gh bumps a major
-version.
-
-** R2 -- The "all reads auto-execute" default may still be too broad
-
-Some reads expose private content (issue bodies, PR descriptions
-from private repos). An agent surfacing a confidential issue
-body into a saved conversation has data-leak implications.
-
-*Mitigation:* response header makes the host/repo visible in
-every result, so the saved conversation makes the privacy
-boundary auditable. Wrappers truncate body/comments by default;
-agent must explicitly opt-in to include them. Documented in
-commentary.
-
-** R3 -- The general tool's =:confirm t= prompt may become click-fatigue
-
-If the agent uses the general tool heavily during a write-heavy
-workflow (PR creation, label management), confirming every call
-becomes tedious.
-
-*Mitigation:* the expanded wrapper set covers most reads, so the
-general tool fires mainly for writes -- where confirmation is
-exactly the right behavior. If usage shows confirm fatigue,
-V2's per-host policy can add =:auto= for explicit
-write-confirmed contexts.
-
-** R4 -- =--paginate= block conflicts with legitimate large queries
-
-Blocking =--paginate= globally means the agent can't get
-historical CI runs (which may need pagination).
-
-*Mitigation:* =gh_run_list= and =gh_search_*= accept a clamped
-=--limit= which usually substitutes. If a use case needs more,
-the agent can request multiple non-paginated pages explicitly.
-
-** R5 -- Token expiry surfaces as cryptic exit 4
-
-When a host's keyring entry expires, every call returns exit 4
-with "authentication required" on stderr. The agent sees the
-error but may not realize the fix is interactive.
-
-*Mitigation:* the runner's =:error-kind 'auth= mapping prepends
-the recovery message before returning to the agent. =cj/gh-doctor=
-proactively checks auth status.
-
-** R6 -- TRAMP cwd silently runs gh remotely
-
-Without the explicit TRAMP rejection, =process-file= would try to
-spawn =gh= on the remote host (where it may not exist, or may
-authenticate against the wrong keyring).
-
-*Mitigation:* runner checks =(file-remote-p default-directory)=
-first and returns =:error-kind 'remote-cwd= with a clear message.
-
-** R7 -- =gh search= behaves differently on GHE
-
-GHE may not support every advanced search operator
-=github.com= does. Search wrappers may return inconsistent
-results across hosts.
-
-*Mitigation:* documented in =gh_search_*= wrapper descriptions.
-Result header makes the host visible so the agent can adjust.
-
-** R8 -- Audit log grows unbounded
-
-One file per day, but no automatic cleanup.
-
-*Mitigation:* metadata-only entries are tiny (~150 bytes); a
-year of heavy use is a few MB. Manual cleanup acceptable.
-Defcustom to disable for users who don't want it.
-
-* Open Questions
-
-** Q1 -- Should the general tool's confirmation prompt include the classification?
-
-When GPTel asks "Run gh tool? (y/n)" the prompt shows the argv but
-not the classification. Showing "WRITE: gh pr merge 171" gives the
-user more context. Need to investigate gptel's confirm-prompt
-extensibility.
-
-** Q2 -- Should =gh_pr_diff= cap differently from text wrappers?
-
-A PR diff can legitimately be 100KB+ for a large refactor. The
-64KB cap is the same as everywhere else. If diffs need a higher
-cap (256KB?), that's per-wrapper config.
-
-** Q3 -- Should wrappers expose =include-body=, =include-comments=, etc., as separate args, or as a comma-separated list?
-
-The spec proposes separate boolean args (=:include-body t=,
-=:include-comments t=). Alternative: one =:include= comma-list
-arg. Separate args are more discoverable; comma-list is more
-compact. Decide during Phase 4.
-
-** Q4 -- Should =cj/gh-tool-audit-log= grow into a query interface?
-
-V1 writes one line per call. Future: a command to query the
-log (=cj/gh-audit-search REGEX=) for surfacing "what did the agent
-do to this PR last week?".
-
-* V2 Roadmap
-
-Items intentionally deferred:
-
-- *Per-host policy.* =cj/gh-host-policy= alist keyed by hostname
- (mirror of the MCP spec's structure) so work GHE can be
- read-only while personal allows writes-with-confirm.
-- *Conversation context injection.* After a PR view, the wrapper
- inserts a "GitHub context: HOST/REPO PR #N at URL" line into the
- GPTel buffer so saved conversations stay traceable without
- bundling full output.
-- *Local/remote bridge helpers.* current-branch → PR-number,
- changed-files → matching PR file comments, etc.
-- *Artifact download.* =gh_run_artifacts=, =gh_release_download=
- with explicit confirm and write to a controlled directory.
-- *Async general tool.* =make-process= + sentinel for the cases
- where 60s timeout isn't enough (rare, but real for some
- =--paginate= scenarios).
-- *Audit log query interface.* =cj/gh-audit-search=,
- =cj/gh-audit-by-host=.
-- *Profile-based tool subsets.* e.g. read-only profile
- vs. write-capable profile per buffer.
-
-* References
-
-- [[file:../../gptel-tools/git_log.el][gptel-tools/git_log.el]] -- pattern reference for new tool files.
-- [[file:../../modules/ai-config.el][modules/ai-config.el]] -- =gptel-confirm-tool-calls nil= at
- line 386; loader at lines 71-96.
-- [[file:gptel-agentic-tool-ideas.org][gptel-agentic-tool-ideas.org]] -- broader agentic-tool design;
- =gh= sits alongside the MCP integration as the
- collaboration tier.
-- [[file:mcp-el-gptel-integration.org][mcp-el-gptel-integration.org]] -- sibling design; same
- confirm-on-write pattern for safety.
-- [[https://cli.github.com/manual/][gh CLI manual]] -- subcommand reference.
-- =gh --version 2.92.0= help output -- verified flag semantics
- per subcommand.
-- =gh help environment= -- verified env-var names for non-interactive
- mode.