docs/subr-mock-migration-spec.org


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158

#+TITLE: Spec: Migrate Tests Off Mocking C Primitives
#+AUTHOR: Craig Jennings
#+DATE: 2026-06-30
#+STATUS: Draft — for discussion

* Status

Draft. Pulled out of =todo.org= (=** TODO [#C] Migrate tests off mocking
primitives (native-comp robustness) :test:refactor:solo:=) so the scope and
approach can be settled before any code moves. Execution is deferred; this
document is the discussion vehicle.

Companion reference: [[file:native-comp-subr-mocking.org][native-comp-subr-mocking.org]] holds the full mechanism, the
upstream research, and the 2026-06-21 decision. This spec does not restate the
mechanism; it plans the remaining work that decision deferred.

* Background — how we hit this

We re-enabled native compilation config-wide (early-init.el, commit 3fd28987,
2026-06-20). Tests that had been green for months immediately started failing
with no change to their source — the first 8 were window-primitive mocks in
=test-dirvish-config-wrappers.el= and =test-calibredb-epub-config.el=
(=window-body-width=, =window-margins=, =current-window-configuration=,
=get-buffer-window=), throwing =wrong-number-of-arguments= for a zero-arg mock
lambda called with one argument.

** What we struggled with (the consequences)

- *Intermittent, non-deterministic failure.* The same test passed, then
  crashed, then passed again within a session. Native-comp generates a
  per-primitive trampoline =.eln= lazily and caches it on disk; whether a mock
  "works" depends on whether that trampoline has been built yet. The
  non-determinism was the tell, and it made the failures hard to trust or
  reproduce.
- *Three distinct failure modes from one cause* (full detail in the companion
  doc): (1) trampoline generation failure under =--batch=; (2) silent bypass —
  natively-compiled callers ignore the mock and run the real primitive, so a
  test passes for the wrong reason; (3) arity mismatch — the trampoline calls
  the mock with the primitive's *maximum* arity, so a fixed-arity mock narrower
  than the primitive throws. Mode 3 is the one that bit us; modes 1 and 2 sit
  latent.
- *The tempting quick fix is the dangerous one.* Disabling subr trampolines
  (=native-comp-enable-subr-trampolines nil=) is the most-cited workaround, but
  in our native-comp-heavy setup it produces mode 2 — tests that pass while
  asserting against the real primitive. A quiet false pass is worse than a loud
  crash.
- *Scale of the latent surface.* The suite mocks subrs in hundreds of places.
  The variadic sweep touched 188 arity-narrow mocks.

** What is already done (the stopgap, 2026-06-21)

Not currently broken — two things shipped (commits 571da499, b62c3c88):

1. *Variadic sweep.* Every arity-narrow subr mock got =&rest _= appended, which
   tolerates the trampoline's full-arity call. Deterministic, keeps trampolines
   on, so no silent bypass. Fixes mode 3.
2. *Meta-test gate.* =tests/test-meta-subr-mock-arity.el= statically scans every
   test file for =symbol-function= / =fset= / =setf= subr redefinitions and
   fails =make test= if any mock can't accept the primitive's maximum arity. A
   new arity-narrow mock can't merge silently.

The stopgap fixes the mode we actually suffered. It leaves modes 1 and 2 latent.
The durable fix the ecosystem (and our own =elisp-testing.md=) points to is to
*not redefine primitives at all*. That is the work this spec scopes.

* The real scope — most mocks should NOT move

The raw inventory is large, but the headline number is misleading. =testing.md=
says to mock external boundaries; converting those to "drive real state" would
mean running real shells and touching the real filesystem in unit tests — the
opposite of what we want. So the actual migration target is narrow.

Current subr-mock sites across 261 test files (=cl-letf= / =fset= / =advice-add=
on the named primitive):

| Primitive               | Sites | Classification          | Disposition                       |
|-------------------------+-------+-------------------------+-----------------------------------|
| shell-command-to-string |    62 | external boundary       | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| executable-find         |    60 | external boundary       | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| shell-command           |    29 | external boundary       | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| call-process            |    17 | external boundary       | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| current-time            |    11 | time boundary           | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| save-buffer             |    10 | file I/O boundary       | keep, or real temp-file fixture   |
|-------------------------+-------+-------------------------+-----------------------------------|
| write-region            |     4 | file I/O boundary       | keep, or real temp-file fixture   |
|-------------------------+-------+-------------------------+-----------------------------------|
| message                 |    69 | output-silencing        | keep mocked (variadic)            |
|-------------------------+-------+-------------------------+-----------------------------------|
| completing-read         |    25 | UI prompt               | MIGRATE — extract pure internal   |
|-------------------------+-------+-------------------------+-----------------------------------|
| read-string             |    16 | UI prompt               | MIGRATE — extract pure internal   |
|-------------------------+-------+-------------------------+-----------------------------------|
| yes-or-no-p             |    14 | UI prompt               | MIGRATE — extract pure internal   |
|-------------------------+-------+-------------------------+-----------------------------------|
| read-from-minibuffer    |     6 | UI prompt               | MIGRATE — extract pure internal   |
|-------------------------+-------+-------------------------+-----------------------------------|

The genuine migration target is the UI-prompt bucket: ~61 sites. Per
=elisp-testing.md='s Interactive-vs-Internal rule, the fix is to extract a pure
internal that takes the value as an argument and test that directly, leaving the
interactive wrapper a thin un-tested (or smoke-tested) shell. That removes the
prompt mock entirely — immune to all three failure modes — and improves the
production code's testability as a side effect.

Boundary mocks (shell, file I/O, time, =executable-find=, =call-process=) stay
mocked: that is correct unit-test practice, and the variadic form already
handles native-comp. =message= is output-silencing, not logic — keep it.

* Proposed approach

Not a single sweep. The migration touches production code (each extraction is a
small design change), so it is incremental and reviewable, not mechanical.

1. *Scoped exemplar pass first.* Pick one module with a few prompt-mocks, do the
   extract-internal conversion there, set the pattern, and measure the per-case
   effort. This calibrates the rest.
2. *Batch by module afterward.* Convert remaining UI-prompt sites a module at a
   time, each its own commit, with the suite green between.
3. *Leave boundaries alone.* No conversion of shell / file / time / process
   mocks. The meta-test keeps them arity-safe.

* Open decisions (resolve in discussion)

** TODO Confirm the scope: UI-prompt mocks only, boundaries stay
Is the migration scoped to the ~61 UI-prompt sites (completing-read, read-string,
yes-or-no-p, read-from-minibuffer), with all boundary mocks explicitly out of
scope? Or is there an appetite to also convert the file-I/O mocks
(=save-buffer=, =write-region=) to real temp-file fixtures where it reads
cleaner?

** TODO Reframe the todo.org task title to match the real scope
The current title — "Migrate tests off mocking primitives" — reads as all 300+
sites. If we agree on UI-prompt-only, retitle to something like "Extract pure
internals for UI-prompt-mocked tests" so a future session does not re-scope it
as a wholesale sweep.

** TODO Pick the exemplar module for the first pass
Which module gets the calibrating conversion? A small one with 2-4 prompt-mocks
is ideal. Candidate selection needs a per-module breakdown of the ~61 sites
(not yet collected).

** TODO Decide priority and timing
Currently =[#C]=, =:solo:=. The suite is not broken (stopgap holds), so this is
test-quality debt, not urgent. Confirm it stays low and gets done in batches
between other work, rather than as a dedicated push.

* Non-goals

- Re-deriving or re-documenting the native-comp trampoline mechanism (see the
  companion doc).
- Converting boundary mocks (shell, file I/O, time, process, executable-find).
- Removing the variadic-mock convention or the meta-test gate — both stay; they
  are the standing protection for every mock that legitimately remains.