| author | Craig Jennings <c@cjennings.net> | 2026-06-21 03:19:08 -0400 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-06-21 03:19:08 -0400 |
| commit | 6d7a73e616b3111ad5bd46eeb56fdb579e7799bd (patch) | |
| tree | d733568e51521efa916ab9682aafd09e29394364 | |
| parent | 0aa85dd219f4be8dbf3383661fd2b42370945b87 (diff) | |
docs: explain native-comp vs primitive-mocking, refine the insight
A reference for the native-comp + subr-mocking trap: the mechanism, the three failure modes, the research with URLs, and the decision (variadic mocks + a meta-test now, migrate off primitive-mocking long-term). Refines the CLAUDE.md codified insight, whose old 'don't mock subrs' framing was too broad, and points it at the new doc.
| -rw-r--r-- | CLAUDE.md | 2 |
| -rw-r--r-- | docs/native-comp-subr-mocking.org | 159 |
2 files changed, 160 insertions, 1 deletions
@@ -92,4 +92,4 @@ Prefer Write over cumulative Edits for nontrivial new code. Small functions (und
 - **`make test` runs with no `package-initialize` — defuns inside a `use-package :config` are void there.** The Makefile's `EMACS_TEST` is `emacs --batch --no-site-file --no-site-lisp` with no `package-initialize`, so elpa packages never load and a `use-package` block whose package isn't found never runs its `:config`. Any `defun` nested inside that `:config` is unbound under `make test` / `make test-file`. The per-edit PostToolUse hook *does* initialize packages, so such defuns load there — a test can pass on save under the hook yet fail `make test`. To unit-test logic that lives in a `:config` block, extract it into a top-level defun outside `use-package` (the `cj/dwim-shell--empty-dirs-command` / `cj/dwim-shell--dated-backup-command` pattern) and test that; keybindings or mode-wiring that must stay in `:config` get live-daemon verification instead. (`gotcha` — 2026-06-13)
-- **Don't `cl-letf`-mock C primitives in tests — it triggers a native-comp trampoline rebuild that fails under `--batch`.** Mocking a primitive like `buffer-modified-p`, `file-exists-p`, or `kill-buffer` via `cl-letf`/`fset` makes native-comp try to compile and load a trampoline `.eln`, which errors under `emacs --batch` (`native-lisp-load-failed "file does not exists" .../subr--trampoline-*.eln`, often after a "Redefining 'X' might break native compilation of trampolines" warning). Don't mock the primitive: drive the real state instead (a `make-temp-file` fixture so `file-exists-p` is true for real, `insert`/`set-buffer-modified-p` for modified state, `buffer-live-p` to detect a kill), or extract the decision logic into a pure helper and test that. Mocking ordinary Lisp functions (`y-or-n-p`, `save-buffer`, `info`) is fine — the trap is specific to subrs. (`gotcha` — 2026-06-13)
+- **Mocking a C primitive (subr) in a test is fragile under native-comp; if you must, make the mock variadic — `(lambda (&rest _) ...)`.** When a test redefines a primitive (`cl-letf`/`fset`/`setf`/`advice-add`), native-comp routes natively-compiled callers through a per-primitive trampoline `.eln`, and that interaction fails three different ways depending on eln-cache state: (1) the trampoline `.eln` fails to build/load under `--batch` (`native-lisp-load-failed ... subr--trampoline-*.eln`); (2) when no trampoline is available the redefinition is *silently ignored* and native callers run the real primitive (a quiet false pass); (3) the trampoline calls the mock with the primitive's *maximum* arity, so a fixed-arity mock narrower than the primitive throws `wrong-number-of-arguments`. Mode 3 is the common one — a `(lambda (_) 200)` mock of `window-body-width` (a 0-2-arg subr) gets called with 2 args. Note many routinely-mocked functions are subrs (`message`, `completing-read`, `y-or-n-p`, `executable-find`, `save-buffer`, `byte-compile-file`), and those are fine *because* they're mocked variadically; the trap is the narrow fixed-arity ones. The rule, enforced by `tests/test-meta-subr-mock-arity.el` (fails `make test` on any arity-narrow subr mock): a subr mock must accept the primitive's max arity, so append `&rest _` (keep named args the body uses: `(lambda (cmd &rest _) ...)`). The durable fix the ecosystem and our own `elisp-testing.md` point to is *don't mock the primitive*: drive real state (a `make-temp-file` fixture, `insert`/`set-buffer-modified-p`) or extract a pure helper and test that. Full mechanism, the three modes, research, and decision: [[file:docs/native-comp-subr-mocking.org][docs/native-comp-subr-mocking.org]]. (`gotcha` — refined 2026-06-21 after re-enabling native-comp surfaced 170 latent arity-narrow mocks)
diff --git a/docs/native-comp-subr-mocking.org b/docs/native-comp-subr-mocking.org
new file mode 100644
index 000000000..f66e5d102
--- /dev/null
+++ b/docs/native-comp-subr-mocking.org
@@ -0,0 +1,159 @@
+#+TITLE: Native Compilation vs. Mocking C Primitives in Tests
+#+AUTHOR: Craig Jennings
+#+DATE: 2026-06-21
+
+* What this is
+
+A reference for a real, recurring trap: tests that redefine an Emacs C
+primitive (a "subr") with =cl-letf=, =fset=, =setf=, or =advice-add= behave
+differently once native compilation is enabled, and the failures are
+intermittent. We hit it head-on after re-enabling native-comp config-wide
+(early-init.el, commit 3fd28987, 2026-06-20). This document records the
+mechanism, the research, and the decision so we don't re-derive it.
+
+* The symptom
+
+After native-comp was re-enabled, tests that had been green for months started
+failing, with no change to their source. The errors looked like:
+
+: wrong-number-of-arguments #[nil (nil) (t)] 1
+
+That is a zero-argument mock lambda being called with one argument. The 8 tests
+that first tripped were in =test-dirvish-config-wrappers.el= and
+=test-calibredb-epub-config.el=, all mocking window primitives
+(=current-window-configuration=, =window-body-width=, =window-margins=,
+=get-buffer-window=).
+
+The failures were intermittent across the session: the same test passed, then
+crashed, then passed again. That non-determinism is the tell.
+
+* The mechanism
+
+Native-comp emits *direct* calls to primitives for speed. So when Lisp code
+redefines or advises a primitive (which is exactly what a test mock does),
+natively-compiled callers would normally bypass the redefinition entirely. To
+prevent that, Emacs generates a small per-primitive *trampoline* (a =.eln=
+under =eln-cache/=) the first time a primitive is redefined. The trampoline
+reroutes calls to the primitive through its Lisp function cell, where the mock
+lives.
+
+The trampoline is generated lazily and cached on disk, and that is the source
+of the non-determinism: whether a given mock "works" depends on whether the
+trampoline for that primitive has been compiled into the eln-cache yet. As
+native-comp compiles more in the background, more mocks start routing through
+trampolines.
+
+** Three distinct failure modes
+
+Because behavior depends on trampoline state, the same mock can fail three
+different ways:
+
+1. *Generation failure.* The trampoline =.eln= can't be built or loaded
+   (notably under =emacs --batch=), giving
+   =native-lisp-load-failed "... subr--trampoline-*.eln"=. This is the mode our
+   older CLAUDE.md insight first documented.
+2. *Silent bypass.* When a trampoline isn't available and can't be generated,
+   the manual states natively-compiled callers *ignore* the redefinition and
+   call the real primitive. The mock does nothing, so the test passes for the
+   wrong reason or asserts against real behavior.
+3. *Arity mismatch.* The trampoline *is* built and routes to the mock, but
+   calls it with the primitive's *maximum* arity (filling optionals with nil),
+   not the arity the source used. A fixed-arity mock narrower than the
+   primitive then throws =wrong-number-of-arguments=. This is the mode that bit
+   us this session (every one of the 8 was this).
+
+* Important: this is a test-only artifact
+
+Production code never redefines a C primitive, so these trampolines are never
+generated for this reason in normal use. Nothing here is a defect in the
+config. It is an incompatibility between *mocking primitives in tests* and
+native-comp, confined to the test suite.
+
+* What the wider community has found
+
+This is well known and genuinely hard. It is not us doing something wrong.
+
+- [[https://lists.gnu.org/archive/html/bug-gnu-emacs/2021-10/msg00971.html][bug#51140 (emacs-devel)]] — "cl-letf appears not to work with native-comp."
+  Redefining a built-in like =process-exit-status= via =cl-letf= breaks under
+  native compilation. Confirms the core problem.
+- [[https://github.com/jorgenschaefer/emacs-buttercup/issues/230][buttercup issue #230]] — the buttercup test framework's =spy-on= on primitives
+  (=file-exists-p=, =buffer-file-name=) fails with the
+  =native-lisp-load-failed ... subr--trampoline-*.eln= error (failure mode 1).
+  Our scenario exactly, in a mainstream test framework.
+- [[https://groups.google.com/g/linux.debian.bugs.dist/c/n9P2xhpruDE][Debian bug#1021842]] — buttercup's *own self-tests* hit the trampoline
+  compilation error. Even the test framework's maintainers run into it.
+- [[https://lists.gnu.org/archive/html/bug-gnu-emacs/2023-03/msg00076.html][bug#61880 (emacs-devel)]] — native compilation fails to generate trampolines
+  in certain sequential cases (failure mode 1, deterministic variant).
+- [[https://lists.gnu.org/archive/html/emacs-diffs/2023-03/msg00145.html][emacs-29 commit (bug-fix)]] — Emacs added a warning when you redefine a
+  primitive that the trampoline machinery itself depends on
+  ("Redefining '%s' might break trampoline native compilation"). Shows the
+  maintainers' stance: redefining primitives is discouraged.
+- [[https://www.gnu.org/software/emacs/manual/html_node/elisp/Native_002dCompilation-Variables.html][ELisp Manual: Native-Compilation Variables]] — documents
+  =native-comp-enable-subr-trampolines=. Default on; generates trampolines on
+  the fly. When *off* and no cached trampoline exists, "calls to that primitive
+  from natively-compiled Lisp will ignore redefinitions and advices" (this is
+  failure mode 2, and the catch in the common workaround below).
+
+** The two commonly-cited workarounds, and their costs
+
+- *Disable subr trampolines for tests* (=native-comp-enable-subr-trampolines
+  nil=). The most-cited quick fix. One line. But per the manual it makes
+  natively-compiled callers *ignore* the mock (failure mode 2). It only works
+  reliably when the code under test runs interpreted, not natively compiled.
+  With native-comp aggressively compiling our modules, the code under test is
+  increasingly native, so this risks silent mock-bypass: tests that pass while
+  asserting against the real primitive. Worse than a loud failure.
+- *Don't mock primitives at all.* The maintainers' and our own
+  =elisp-testing.md='s position: inject dependencies or test pure helpers
+  instead. The only fix immune to all three failure modes. Also the most work.
+
+* Our decision (2026-06-21)
+
+We chose a pragmatic middle path with a clear long-term direction.
+
+1. *Make subr mocks variadic.* The arity mode (3) is the only one we have
+   actually suffered. A mock written =(lambda (&rest _) VALUE)= tolerates the
+   trampoline's full-arity call. We swept every arity-narrow subr mock in the
+   suite to append =&rest _= to its arglist (preserving any named args the
+   body uses). This is deterministic and keeps trampolines on, so mocks still
+   route correctly (no silent bypass).
+2. *Enforce it with a meta-test.* =tests/test-meta-subr-mock-arity.el= statically
+   scans every test file for =symbol-function= / =fset= redefinitions of a
+   subr and fails =make test= if any mock can't accept the primitive's maximum
+   arity (=func-arity=). It is deterministic (a pure source read; no dependence
+   on eln-cache state), so a new arity-narrow mock can't merge silently. The
+   rule it enforces is NOT "never mock a subr" (the suite mocks subrs like
+   =message= and =completing-read= hundreds of times, all fine) but "a subr
+   mock must accept the primitive's arity."
+3. *Treat "migrate off primitive-mocking" as a long-term test-quality project.*
+   The variadic sweep fixes the mode we hit but leaves modes 1 and 2 latent
+   (we haven't hit them, but they exist). The durable fix the ecosystem points
+   to is restructuring tests to not redefine primitives at all. Filed as a
+   standalone TODO rather than forced now.
+
+** Why not just disable trampolines for tests?
+
+Because of failure mode 2 (silent bypass) above. In our native-comp-heavy
+setup, disabling trampolines would let natively-compiled code under test ignore
+the mocks, producing tests that pass while testing nothing. A loud
+=wrong-number-of-arguments= that the meta-test prevents up front is strictly
+safer than a quiet false pass.
+
+* Practical rule for writing tests (today)
+
+When you mock a C primitive (subr) in a test, make the replacement variadic:
+
+: (cl-letf (((symbol-function 'window-body-width) (lambda (&rest _) 200)))
+:   ...)
+
+not
+
+: (cl-letf (((symbol-function 'window-body-width) (lambda (_) 200))) ; breaks under native-comp
+:   ...)
+
+If the body needs the argument, keep it and append =&rest _=:
+
+: (lambda (cmd &rest _) (member cmd allowed))
+
+The meta-test will catch you if you forget. Better still, when practical, don't
+mock the primitive: pass the value in as a parameter, or test a pure helper.
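The arity rule from the new document ("a subr mock must accept the primitive's max arity") can be sketched as a small predicate built on `func-arity`. This is only an illustration of the idea, not the code in `tests/test-meta-subr-mock-arity.el`: the helper name `my/mock-covers-max-arity-p` is hypothetical, and the sketch assumes that the maximum reported by `func-arity` is the arity the trampoline calls the mock with.

```elisp
;;; Sketch only: `my/mock-covers-max-arity-p' is a hypothetical helper,
;;; not the project's actual meta-test.

(defun my/mock-covers-max-arity-p (primitive mock)
  "Return non-nil if MOCK accepts as many arguments as PRIMITIVE can take.
PRIMITIVE is a function symbol and MOCK is the replacement function.
Both are measured with `func-arity', which returns (MIN . MAX); MAX is
an integer, or the symbol `many' for functions taking &rest arguments."
  (let ((prim-max (cdr (func-arity primitive)))
        (mock-max (cdr (func-arity mock))))
    (cond
     ;; A mock with &rest tolerates any call, including a full-arity one.
     ((eq mock-max 'many) t)
     ;; The primitive has no fixed maximum but the mock does: too narrow.
     ((not (integerp prim-max)) nil)
     ;; Both maxima are integers: the mock must be at least as wide.
     (t (>= mock-max prim-max)))))

;; Variadic mock: covers every arity the primitive can be called with.
(my/mock-covers-max-arity-p 'window-body-width (lambda (&rest _) 200))

;; Fixed one-argument mock: narrower than the primitive's maximum.
(my/mock-covers-max-arity-p 'window-body-width (lambda (_) 200))
```

With `window-body-width` taking 0 to 2 arguments, as the document states, the first call should return non-nil and the second nil. A mock that keeps a named argument and appends `&rest _`, such as `(lambda (cmd &rest _) ...)`, passes for the same reason as the fully variadic one.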
