docs: explain native-comp vs primitive-mocking, refine the insight

A reference for the native-comp + subr-mocking trap: the mechanism, the three failure modes, the research with URLs, and the decision (variadic mocks + a meta-test now, migrate off primitive-mocking long-term). Refines the CLAUDE.md codified insight, whose old 'don't mock subrs' framing was too broad, and points it at the new doc.
author: Craig Jennings <c@cjennings.net> 2026-06-21 03:19:08 -0400
committer: Craig Jennings <c@cjennings.net> 2026-06-21 03:19:08 -0400
commit: 6d7a73e616b3111ad5bd46eeb56fdb579e7799bd (patch)
tree: d733568e51521efa916ab9682aafd09e29394364
parent: 0aa85dd219f4be8dbf3383661fd2b42370945b87 (diff)
download: dotemacs-6d7a73e616b3111ad5bd46eeb56fdb579e7799bd.tar.gz
dotemacs-6d7a73e616b3111ad5bd46eeb56fdb579e7799bd.zip
2 files changed, 160 insertions, 1 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
index 2d501dc23..8a13334c7 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -92,4 +92,4 @@ Prefer Write over cumulative Edits for nontrivial new code. Small functions (und
 
 - **`make test` runs with no `package-initialize` — defuns inside a `use-package :config` are void there.** The Makefile's `EMACS_TEST` is `emacs --batch --no-site-file --no-site-lisp` with no `package-initialize`, so elpa packages never load and a `use-package` block whose package isn't found never runs its `:config`. Any `defun` nested inside that `:config` is unbound under `make test` / `make test-file`. The per-edit PostToolUse hook *does* initialize packages, so such defuns load there — a test can pass on save under the hook yet fail `make test`. To unit-test logic that lives in a `:config` block, extract it into a top-level defun outside `use-package` (the `cj/dwim-shell--empty-dirs-command` / `cj/dwim-shell--dated-backup-command` pattern) and test that; keybindings or mode-wiring that must stay in `:config` get live-daemon verification instead. (`gotcha` — 2026-06-13)
 
-- **Don't `cl-letf`-mock C primitives in tests — it triggers a native-comp trampoline rebuild that fails under `--batch`.** Mocking a primitive like `buffer-modified-p`, `file-exists-p`, or `kill-buffer` via `cl-letf`/`fset` makes native-comp try to compile and load a trampoline `.eln`, which errors under `emacs --batch` (`native-lisp-load-failed "file does not exists" .../subr--trampoline-*.eln`, often after a "Redefining 'X' might break native compilation of trampolines" warning). Don't mock the primitive: drive the real state instead (a `make-temp-file` fixture so `file-exists-p` is true for real, `insert`/`set-buffer-modified-p` for modified state, `buffer-live-p` to detect a kill), or extract the decision logic into a pure helper and test that. Mocking ordinary Lisp functions (`y-or-n-p`, `save-buffer`, `info`) is fine — the trap is specific to subrs. (`gotcha` — 2026-06-13)
+- **Mocking a C primitive (subr) in a test is fragile under native-comp; if you must, make the mock variadic — `(lambda (&rest _) ...)`.** When a test redefines a primitive (`cl-letf`/`fset`/`setf`/`advice-add`), native-comp routes natively-compiled callers through a per-primitive trampoline `.eln`, and that interaction fails three different ways depending on eln-cache state: (1) the trampoline `.eln` fails to build/load under `--batch` (`native-lisp-load-failed ... subr--trampoline-*.eln`); (2) when no trampoline is available the redefinition is *silently ignored* and native callers run the real primitive (a quiet false pass); (3) the trampoline calls the mock with the primitive's *maximum* arity, so a fixed-arity mock narrower than the primitive throws `wrong-number-of-arguments`. Mode 3 is the common one — a `(lambda (_) 200)` mock of `window-body-width` (a 0-2-arg subr) gets called with 2 args. Note many routinely-mocked functions are subrs (`message`, `completing-read`, `y-or-n-p`, `executable-find`, `save-buffer`, `byte-compile-file`), and those are fine *because* they're mocked variadically; the trap is the narrow fixed-arity ones. The rule, enforced by `tests/test-meta-subr-mock-arity.el` (fails `make test` on any arity-narrow subr mock): a subr mock must accept the primitive's max arity, so append `&rest _` (keep named args the body uses: `(lambda (cmd &rest _) ...)`). The durable fix the ecosystem and our own `elisp-testing.md` point to is *don't mock the primitive*: drive real state (a `make-temp-file` fixture, `insert`/`set-buffer-modified-p`) or extract a pure helper and test that. Full mechanism, the three modes, research, and decision: [[file:docs/native-comp-subr-mocking.org][docs/native-comp-subr-mocking.org]]. (`gotcha` — refined 2026-06-21 after re-enabling native-comp surfaced 170 latent arity-narrow mocks)
diff --git a/docs/native-comp-subr-mocking.org b/docs/native-comp-subr-mocking.org
new file mode 100644
index 000000000..f66e5d102
--- /dev/null
+++ b/docs/native-comp-subr-mocking.org
@@ -0,0 +1,159 @@
+#+TITLE: Native Compilation vs. Mocking C Primitives in Tests
+#+AUTHOR: Craig Jennings
+#+DATE: 2026-06-21
+
+* What this is
+
+A reference for a real, recurring trap: tests that redefine an Emacs C
+primitive (a "subr") with =cl-letf=, =fset=, =setf=, or =advice-add= behave
+differently once native compilation is enabled, and the failures are
+intermittent. We hit it head-on after re-enabling native-comp config-wide
+(early-init.el, commit 3fd28987, 2026-06-20). This document records the
+mechanism, the research, and the decision so we don't re-derive it.
+
+* The symptom
+
+After native-comp was re-enabled, tests that had been green for months started
+failing, with no change to their source. The errors looked like:
+
+: wrong-number-of-arguments #[nil (nil) (t)] 1
+
+That is a zero-argument mock lambda being called with one argument. The 8 tests
+that first tripped were in =test-dirvish-config-wrappers.el= and
+=test-calibredb-epub-config.el=, all mocking window primitives
+(=current-window-configuration=, =window-body-width=, =window-margins=,
+=get-buffer-window=).
+
+The failures were intermittent across the session: the same test passed, then
+crashed, then passed again. That non-determinism is the tell.
+
+* The mechanism
+
+Native-comp emits *direct* calls to primitives for speed. So when Lisp code
+redefines or advises a primitive (which is exactly what a test mock does),
+natively-compiled callers would normally bypass the redefinition entirely. To
+prevent that, Emacs generates a small per-primitive *trampoline* (a =.eln=
+under =eln-cache/=) the first time a primitive is redefined. The trampoline
+reroutes calls to the primitive through its Lisp function cell, where the mock
+lives.
+
+The trampoline is generated lazily and cached on disk, and that is the source
+of the non-determinism: whether a given mock "works" depends on whether the
+trampoline for that primitive has been compiled into the eln-cache yet. As
+native-comp compiles more of the config in the background, more calls to
+mocked primitives start routing through trampolines.
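+
+To check which side of this a given function falls on, something like the
+following can be evaluated in a scratch buffer. It is a minimal sketch, not
+part of the test suite: =my/describe-mock-target= is a hypothetical name, and
+the =fboundp= / =bound-and-true-p= guards are there on the assumption that
+=subr-primitive-p=, =native-comp-available-p= and
+=native-comp-enable-subr-trampolines= may be missing on older Emacs versions.
+
+#+begin_src emacs-lisp
+(defun my/describe-mock-target (symbol)
+  "Return a plist describing how SYMBOL's function relates to native-comp.
+Hypothetical helper, for illustration only."
+  (let ((fn (symbol-function symbol)))
+    (list :symbol symbol
+          ;; `subrp' can also be true for natively-compiled Lisp
+          ;; functions, so prefer `subr-primitive-p' where it exists.
+          :c-primitive (if (fboundp 'subr-primitive-p)
+                           (subr-primitive-p fn)
+                         (subrp fn))
+          ;; (MIN . MAX); MAX is what a full-arity call would supply.
+          :arity (func-arity symbol)
+          :native-comp (and (fboundp 'native-comp-available-p)
+                            (native-comp-available-p))
+          :trampolines-enabled
+          (bound-and-true-p native-comp-enable-subr-trampolines))))
+
+(my/describe-mock-target 'window-body-width)
+#+end_src
+
+The =:arity= entry is the one that matters for the arity-mismatch failure
+mode: its maximum is the argument count a mock has to tolerate.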
+
+** Three distinct failure modes
+
+Because behavior depends on trampoline state, the same mock can fail three
+different ways:
+
+1. *Generation failure.* The trampoline =.eln= can't be built or loaded
+   (notably under =emacs --batch=), giving
+   =native-lisp-load-failed "... subr--trampoline-*.eln"=. This is the mode our
+   older CLAUDE.md insight first documented.
+2. *Silent bypass.* When a trampoline isn't available and can't be generated,
+   the manual states that natively-compiled callers *ignore* the redefinition
+   and call the real primitive. The mock does nothing, so the test passes for
+   the wrong reason or asserts against real behavior.
+3. *Arity mismatch.* The trampoline *is* built and routes to the mock, but
+   calls it with the primitive's *maximum* arity (filling optionals with nil),
+   not the arity the source used. A fixed-arity mock narrower than the
+   primitive then throws =wrong-number-of-arguments=. This is the mode that bit
+   us this session (every one of the 8 was this).
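+
+Mode 3 can be imitated without any trampoline by calling a mock the way the
+trampoline is described as calling it, with every optional argument filled in.
+A minimal sketch, assuming only =func-arity= and =apply=;
+=my/call-at-max-arity= is a hypothetical name, not something in the suite:
+
+#+begin_src emacs-lisp
+(defun my/call-at-max-arity (primitive mock)
+  "Call MOCK with as many nil arguments as PRIMITIVE's maximum arity.
+This only imitates a full-arity call; no real trampoline is involved.
+PRIMITIVE must have a finite maximum arity."
+  (let ((n (cdr (func-arity primitive))))
+    (unless (integerp n)
+      (error "%s has no fixed maximum arity" primitive))
+    (apply mock (make-list n nil))))
+
+;; Fixed-arity mock, narrower than the 0-2 argument primitive: signals.
+(condition-case err
+    (my/call-at-max-arity 'window-body-width (lambda (_) 200))
+  (wrong-number-of-arguments (list 'failed (car err))))
+;; => (failed wrong-number-of-arguments)
+
+;; Variadic mock: tolerates the extra arguments.
+(my/call-at-max-arity 'window-body-width (lambda (&rest _) 200))
+;; => 200
+#+end_src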
+
+* Important: this is a test-only artifact
+
+Production code never redefines a C primitive, so these trampolines are never
+generated for this reason in normal use. Nothing here is a defect in the
+config. It is an incompatibility between *mocking primitives in tests* and
+native-comp, confined to the test suite.
+
+* What the wider community has found
+
+This is well known and genuinely hard. It is not us doing something wrong.
+
+- [[https://lists.gnu.org/archive/html/bug-gnu-emacs/2021-10/msg00971.html][bug#51140 (bug-gnu-emacs)]] — "cl-letf appears not to work with native-comp."
+  Redefining a built-in like =process-exit-status= via =cl-letf= breaks under
+  native compilation. Confirms the core problem.
+- [[https://github.com/jorgenschaefer/emacs-buttercup/issues/230][buttercup issue #230]] — the buttercup test framework's =spy-on= on primitives
+  (=file-exists-p=, =buffer-file-name=) fails with the
+  =native-lisp-load-failed ... subr--trampoline-*.eln= error (failure mode 1).
+  Our scenario exactly, in a mainstream test framework.
+- [[https://groups.google.com/g/linux.debian.bugs.dist/c/n9P2xhpruDE][Debian bug#1021842]] — buttercup's *own self-tests* hit the trampoline
+  compilation error. Even the test framework's maintainers run into it.
+- [[https://lists.gnu.org/archive/html/bug-gnu-emacs/2023-03/msg00076.html][bug#61880 (bug-gnu-emacs)]] — native compilation fails to generate trampolines
+  in certain sequential cases (failure mode 1, deterministic variant).
+- [[https://lists.gnu.org/archive/html/emacs-diffs/2023-03/msg00145.html][emacs-29 commit (bug-fix)]] — Emacs added a warning when you redefine a
+  primitive that the trampoline machinery itself depends on
+  ("Redefining '%s' might break native compilation of trampolines"). Shows the
+  maintainers' stance: redefining primitives is discouraged.
+- [[https://www.gnu.org/software/emacs/manual/html_node/elisp/Native_002dCompilation-Variables.html][ELisp Manual: Native-Compilation Variables]] — documents
+  =native-comp-enable-subr-trampolines=. Default on; generates trampolines on
+  the fly. When *off* and no cached trampoline exists, "calls to that primitive
+  from natively-compiled Lisp will ignore redefinitions and advices" (this is
+  failure mode 2, and the catch in the common workaround below).
+
+** The two commonly-cited workarounds, and their costs
+
+- *Disable subr trampolines for tests* (=native-comp-enable-subr-trampolines
+  nil=). The most-cited quick fix. One line. But per the manual it makes
+  natively-compiled callers *ignore* the mock (failure mode 2). It only works
+  reliably when the code under test runs interpreted, not natively compiled.
+  With native-comp aggressively compiling our modules, the code under test is
+  increasingly native, so this risks silent mock-bypass: tests that pass while
+  asserting against the real primitive. Worse than a loud failure.
+- *Don't mock primitives at all.* The maintainers' and our own
+  =elisp-testing.md='s position: inject dependencies or test pure helpers
+  instead. The only fix immune to all three failure modes. Also the most work.
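+
+As an illustration of the second option, here is a minimal sketch of
+dependency injection. The function and test names are hypothetical and do not
+come from the config; the only assumption is that the code under test can take
+the predicate as an optional argument.
+
+#+begin_src emacs-lisp
+(require 'ert)
+
+(defun my/backup-needed-p (file &optional exists-fn)
+  "Return non-nil if FILE exists and so needs a backup.
+EXISTS-FN is the predicate used to probe the filesystem.  It
+defaults to `file-exists-p' and is a parameter so that tests can
+pass a stand-in instead of redefining the primitive."
+  (funcall (or exists-fn #'file-exists-p) file))
+
+(ert-deftest my/backup-needed-p-uses-injected-predicate ()
+  ;; The stand-ins are passed as values; `file-exists-p' itself is
+  ;; never redefined, so no trampoline is involved.
+  (should (my/backup-needed-p "/no/such/file" (lambda (_file) t)))
+  (should-not (my/backup-needed-p "/no/such/file" (lambda (_file) nil))))
+#+end_src
+
+A fixed-arity stand-in is fine here because it is called directly with exactly
+the arguments the function passes.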
+
+* Our decision (2026-06-21)
+
+We chose a pragmatic middle path with a clear long-term direction.
+
+1. *Make subr mocks variadic.* The arity mode (3) is the only one we have
+   actually suffered. A mock written =(lambda (&rest _) VALUE)= tolerates the
+   trampoline's full-arity call. We swept every arity-narrow subr mock in the
+   suite to append =&rest _= to its arglist (preserving any named args the
+   body uses). This is deterministic and keeps trampolines on, so mocks still
+   route correctly (no silent bypass).
+2. *Enforce it with a meta-test.* =tests/test-meta-subr-mock-arity.el= statically
+   scans every test file for =symbol-function= / =fset= redefinitions of a
+   subr and fails =make test= if any mock can't accept the primitive's maximum
+   arity (=func-arity=). It is deterministic (a pure source read; no dependence
+   on eln-cache state), so a new arity-narrow mock can't merge silently. The
+   rule it enforces is NOT "never mock a subr" (the suite mocks subrs like
+   =message= and =completing-read= hundreds of times, all fine) but "a subr
+   mock must accept the primitive's arity."
+3. *Treat "migrate off primitive-mocking" as a long-term test-quality project.*
+   The variadic sweep fixes the mode we hit but leaves modes 1 and 2 latent
+   (we haven't hit them, but they exist). The durable fix the ecosystem points
+   to is restructuring tests to not redefine primitives at all. Filed as a
+   standalone TODO rather than forced now.
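+
+The per-mock comparison behind item 2 can be sketched as follows. This is not
+the content of =tests/test-meta-subr-mock-arity.el=, which scans test sources
+statically; it is a hypothetical helper (=my/mock-accepts-max-arity-p=) that
+only shows the =func-arity= check on a primitive and a candidate mock.
+
+#+begin_src emacs-lisp
+(defun my/mock-accepts-max-arity-p (primitive mock)
+  "Return non-nil if MOCK can take PRIMITIVE's maximum number of arguments.
+PRIMITIVE is a function symbol; MOCK is a function object."
+  (let ((prim-max (cdr (func-arity primitive)))
+        (mock-min (car (func-arity mock)))
+        (mock-max (cdr (func-arity mock))))
+    (cond
+     ;; A &rest mock accepts any count at or above its required args.
+     ((eq mock-max 'many)
+      (or (not (integerp prim-max)) (<= mock-min prim-max)))
+     ;; An unbounded primitive needs an unbounded mock.
+     ((not (integerp prim-max)) nil)
+     ;; Otherwise the primitive's maximum must fall inside the mock's range.
+     (t (<= mock-min prim-max mock-max)))))
+
+(my/mock-accepts-max-arity-p 'window-body-width (lambda (_) 200))
+;; => nil
+(my/mock-accepts-max-arity-p 'window-body-width (lambda (&rest _) 200))
+;; => t
+#+end_src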
+
+** Why not just disable trampolines for tests?
+
+Because of failure mode 2 (silent bypass) above. In our native-comp-heavy
+setup, disabling trampolines would let natively-compiled code under test ignore
+the mocks, producing tests that pass while testing nothing. A loud
+=wrong-number-of-arguments=, which the meta-test now catches up front, is
+strictly safer than a quiet false pass.
+
+* Practical rule for writing tests (today)
+
+When you mock a C primitive (subr) in a test, make the replacement variadic:
+
+: (cl-letf (((symbol-function 'window-body-width) (lambda (&rest _) 200)))
+:   ...)
+
+not
+
+: (cl-letf (((symbol-function 'window-body-width) (lambda (_) 200)))   ; breaks under native-comp
+:   ...)
+
+If the body needs the argument, keep it and append =&rest _=:
+
+: (lambda (cmd &rest _) (member cmd allowed))
+
+The meta-test will catch you if you forget. Better still, when practical, don't
+mock the primitive: pass the value in as a parameter, or test a pure helper.
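+
+A minimal sketch of both alternatives, using hypothetical names
+(=my/columns-for-width= and the two tests are illustrations, not code from the
+config). The first test exercises a pure helper that receives the width as an
+argument; the second drives real filesystem state with a =make-temp-file=
+fixture instead of mocking =file-exists-p=.
+
+#+begin_src emacs-lisp
+(require 'ert)
+
+(defun my/columns-for-width (width)
+  "Return how many 40-column panes fit in WIDTH columns, at least 1.
+Pure: the caller supplies WIDTH, typically from `window-body-width'."
+  (max 1 (/ width 40)))
+
+(ert-deftest my/columns-for-width-test ()
+  ;; No window primitive is touched; the width is just a number.
+  (should (= (my/columns-for-width 200) 5))
+  (should (= (my/columns-for-width 10) 1)))
+
+(ert-deftest my/real-file-exists-test ()
+  ;; Real state: the file exists because we created it.
+  (let ((file (make-temp-file "subr-mock-doc-")))
+    (unwind-protect
+        (should (file-exists-p file))
+      (delete-file file))))
+#+end_src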