#+TITLE: Spec: Migrate Tests Off Mocking C Primitives
#+AUTHOR: Craig Jennings
#+DATE: 2026-06-30
#+STATUS: Draft — for discussion

* Status

Draft. Pulled out of =todo.org= (=** TODO [#C] Migrate tests off mocking
primitives (native-comp robustness) :test:refactor:solo:=) so the scope and
approach can be settled before any code moves. Execution is deferred; this
document is the discussion vehicle.

Companion reference: [[file:native-comp-subr-mocking.org][native-comp-subr-mocking.org]] holds the full mechanism, the
upstream research, and the 2026-06-21 decision. This spec does not restate the
mechanism; it plans the remaining work that decision deferred.

* Background — how we hit this

We re-enabled native compilation config-wide (early-init.el, commit 3fd28987,
2026-06-20). Tests that had been green for months immediately started failing
with no change to their source — the first 8 were window-primitive mocks in
=test-dirvish-config-wrappers.el= and =test-calibredb-epub-config.el=
(=window-body-width=, =window-margins=, =current-window-configuration=,
=get-buffer-window=), throwing =wrong-number-of-arguments= for a zero-arg mock
lambda called with one argument.

** What we struggled with (the consequences)

- *Intermittent, non-deterministic failure.* The same test passed, then
  crashed, then passed again within a session. Native-comp generates a
  per-primitive trampoline =.eln= lazily and caches it on disk; whether a mock
  "works" depends on whether that trampoline has been built yet. The
  non-determinism was the tell, and it made the failures hard to trust or
  reproduce.
- *Three distinct failure modes from one cause* (full detail in the companion
  doc): (1) trampoline generation failure under =--batch=; (2) silent bypass —
  natively-compiled callers ignore the mock and run the real primitive, so a
  test passes for the wrong reason; (3) arity mismatch — the trampoline calls
  the mock with the primitive's *maximum* arity, so a fixed-arity mock narrower
  than the primitive throws. Mode 3 is the one that bit us; modes 1 and 2 sit
  latent.
- *The tempting quick fix is the dangerous one.* Disabling subr trampolines
  (=native-comp-enable-subr-trampolines nil=) is the most-cited workaround, but
  in our native-comp-heavy setup it produces mode 2 — tests that pass while
  asserting against the real primitive. A quiet false pass is worse than a loud
  crash.
- *Scale of the latent surface.* The suite mocks subrs in hundreds of places.
  The variadic sweep touched 188 arity-narrow mocks.

** What is already done (the stopgap, 2026-06-21)

Not currently broken — two things shipped (commits 571da499, b62c3c88):

1. *Variadic sweep.* Every arity-narrow subr mock got =&rest _= appended, which
   tolerates the trampoline's full-arity call. Deterministic, keeps trampolines
   on, so no silent bypass. Fixes mode 3.
2. *Meta-test gate.* =tests/test-meta-subr-mock-arity.el= statically scans every
   test file for =symbol-function= / =fset= / =setf= subr redefinitions and
   fails =make test= if any mock can't accept the primitive's maximum arity. A
   new arity-narrow mock can't merge silently.

The stopgap fixes the mode we actually suffered. It leaves modes 1 and 2 latent.
The durable fix the ecosystem (and our own =elisp-testing.md=) points to is to
*not redefine primitives at all*. That is the work this spec scopes.

* The real scope — most mocks should NOT move

The raw inventory is large, but the headline number is misleading. =testing.md=
says to mock external boundaries; converting those to "drive real state" would
mean running real shells and touching the real filesystem in unit tests — the
opposite of what we want. So the actual migration target is narrow.

Current subr-mock sites across 261 test files (=cl-letf= / =fset= / =advice-add=
on the named primitive):

| Primitive               | Sites | Classification          | Disposition                       |
|-------------------------+-------+-------------------------+-----------------------------------|
| shell-command-to-string |    62 | external boundary       | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| executable-find         |    60 | external boundary       | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| shell-command           |    29 | external boundary       | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| call-process            |    17 | external boundary       | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| current-time            |    11 | time boundary           | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| save-buffer             |    10 | file I/O boundary       | keep, or real temp-file fixture   |
|-------------------------+-------+-------------------------+-----------------------------------|
| write-region            |     4 | file I/O boundary       | keep, or real temp-file fixture   |
|-------------------------+-------+-------------------------+-----------------------------------|
| message                 |    69 | output-silencing        | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| completing-read         |    25 | UI prompt               | MIGRATE — extract pure internal   |
|-------------------------+-------+-------------------------+-----------------------------------|
| read-string             |    16 | UI prompt               | MIGRATE — extract pure internal   |
|-------------------------+-------+-------------------------+-----------------------------------|
| yes-or-no-p             |    14 | UI prompt               | MIGRATE — extract pure internal   |
|-------------------------+-------+-------------------------+-----------------------------------|
| read-from-minibuffer    |     6 | UI prompt               | MIGRATE — extract pure internal   |
|-------------------------+-------+-------------------------+-----------------------------------|

The genuine migration target is the UI-prompt bucket: ~61 sites. Per
=elisp-testing.md='s Interactive-vs-Internal rule, the fix is to extract a pure
internal that takes the value as an argument and test that directly, leaving the
interactive wrapper a thin un-tested (or smoke-tested) shell. That removes the
prompt mock entirely — immune to all three failure modes — and improves the
production code's testability as a side effect.

Boundary mocks (shell, file I/O, time, =executable-find=, =call-process=) stay
mocked: that is correct unit-test practice, and the variadic form already
handles native-comp. =message= is output-silencing, not logic — keep it.

* Proposed approach

Not a single sweep. The migration touches production code (each extraction is a
small design change), so it is incremental and reviewable, not mechanical.

1. *Scoped exemplar pass first.* Pick one module with a few prompt-mocks, do the
   extract-internal conversion there, set the pattern, and measure the per-case
   effort. This calibrates the rest.
2. *Batch by module afterward.* Convert remaining UI-prompt sites a module at a
   time, each its own commit, with the suite green between.
3. *Leave boundaries alone.* No conversion of shell / file / time / process
   mocks. The meta-test keeps them arity-safe.

* Open decisions (resolve in discussion)

** TODO Confirm the scope: UI-prompt mocks only, boundaries stay
Is the migration scoped to the ~61 UI-prompt sites (completing-read, read-string,
yes-or-no-p, read-from-minibuffer), with all boundary mocks explicitly out of
scope? Or is there an appetite to also convert the file-I/O mocks
(=save-buffer=, =write-region=) to real temp-file fixtures where it reads
cleaner?

** TODO Reframe the todo.org task title to match the real scope
The current title — "Migrate tests off mocking primitives" — reads as all 300+
sites. If we agree on UI-prompt-only, retitle to something like "Extract pure
internals for UI-prompt-mocked tests" so a future session does not re-scope it
as a wholesale sweep.

** TODO Pick the exemplar module for the first pass
Which module gets the calibrating conversion? A small one with 2-4 prompt-mocks
is ideal. Candidate selection needs a per-module breakdown of the ~61 sites
(not yet collected).

** TODO Decide priority and timing
Currently =[#C]=, =:solo:=. The suite is not broken (stopgap holds), so this is
test-quality debt, not urgent. Confirm it stays low and gets done in batches
between other work, rather than as a dedicated push.

* Non-goals

- Re-deriving or re-documenting the native-comp trampoline mechanism (see the
  companion doc).
- Converting boundary mocks (shell, file I/O, time, process, executable-find).
- Removing the variadic-mock convention or the meta-test gate — both stay; they
  are the standing protection for every mock that legitimately remains.