docs/design/2026-06-25-testinfra-validation.org



#+TITLE: Design: Testinfra Post-Install Validation for archsetup
#+AUTHOR: Craig Jennings
#+DATE: 2026-06-25
#+STATUS: Accepted (2026-06-25)

* Problem

The VM integration harness (=scripts/testing/run-test.sh=) runs archsetup in a
QEMU VM, then verifies the result two ways:

1. Parses archsetup's own install log for its Error Summary and the
   =ARCHSETUP_EXECUTION_COMPLETE= marker (did the script finish, did it log
   errors).
2. Runs =run_all_validations= from =scripts/testing/lib/validation.sh= — a
   hand-rolled, shell-based post-install assertion sweep of ~26 checks over SSH.

The shell sweep works, but each check is 6-40 lines of =ssh_cmd= +
=validation_pass/fail= + =attribute_issue= boilerplate, the pass/fail counters
are hand-maintained globals, and the reporting is bespoke. Adding or reading a
check is heavier than it should be, and growing the suite (archsetup configures
far more than the 26 checks cover) compounds that weight.

This doc proposes porting the post-install validation to Testinfra (Python +
pytest) for more expressive checks and better reporting, then growing coverage.

* Decision

Port the post-install validation layer to Testinfra + pytest, reaching parity
with the existing =validation.sh= sweep, then expand coverage. Recorded
rationale: the up-front port cost (parity rewrite + a test-only dependency) is
an accepted trade — the priority is a robust, well-reported, growing validation
suite over feature speed. The framework swap alone buys ergonomics and
reporting, not coverage, so it is paired with real new coverage (below).

This replaces the shell sweep; it does not touch archsetup's own install-log
parsing (that stays as a separate signal). The full coverage expansion (P4)
lands in this task too, sequenced strictly after the parity cutover so the
parity verification stays clean.

* Current harness (what exists today)

** Flow (run-test.sh)
1. Revert VM to base snapshot, boot, wait for SSH.
2. =capture_pre_install_state=.
3. Bundle + copy archsetup + dotfiles into the VM, run archsetup in background,
   poll to completion.
4. =capture_post_install_state=.
5. =run_all_validations= (the shell sweep).
6. =analyze_log_diff= + =generate_issue_report= (issue attribution).
7. Explicit pass/fail exit code; cleanup.

** The shell sweep (validation.sh)
~26 checks under =run_all_validations=: user created / shell / groups, dotfiles,
yay, pacman working, window manager, firewall, DNS, avahi, fail2ban,
NetworkManager, emacs, git config, dev tools, zfs, boot config, autologin,
gnome-keyring, terminus font, mkinitcpio hooks, initramfs consolefont, nvme
module, archsetup log, state markers.

** Issue attribution
=attribute_issue <msg> <bucket>= sorts each failure into one of three arrays —
=ARCHSETUP_ISSUES=, =BASE_INSTALL_ISSUES=, =UNKNOWN_ISSUES= — and
=generate_issue_report= writes them out (base-install issues route to the
archzfs inbox). This is domain logic Testinfra has no equivalent for; the port
must preserve it.

** Connection
=ssh_cmd= uses =sshpass -p "$ROOT_PASSWORD" ssh ... -p "$SSH_PORT" root@$VM_IP=,
with =VM_IP=localhost=, =SSH_PORT=2222=, =ROOT_PASSWORD=archsetup=.

* Design

** Where Testinfra fits
Replace the =run_all_validations= call (step 5) with a pytest invocation against
the running VM. Steps 1-4 and 6-7 are unchanged; =analyze_log_diff= stays.
Testinfra connects over the same SSH the harness already exposes.

** Connection model
Testinfra's paramiko backend targets the live VM via its host spec (the
=ssh://= scheme would instead shell out to the =ssh= binary):

#+begin_src sh
pytest scripts/testing/tests/ \
  --hosts="paramiko://root@localhost:2222" \
  --ssh-config=<generated> \
  --json-report --json-report-file="$TEST_RESULTS_DIR/testinfra.json"
#+end_src

Password auth is the catch: at validation time the harness only has the root
password. Pointing Testinfra at an identity (=--ssh-identity-file=) works only
once a key is already in place in the VM. The simple options are a generated
ssh-config plus an sshpass wrapper, or injecting a known throwaway test key
pre-run. Resolved below (Resolved decisions, item 1): inject a throwaway key.

** Test layout
#+begin_example
scripts/testing/tests/
  conftest.py            # host fixture, markers, attribution hook, report glue
  test_users.py          # user created / shell / groups
  test_dotfiles.py       # stow symlinks, readable by user
  test_packages.py       # yay, pacman working, dev tools, key packages
  test_services.py       # firewall, dns, avahi, fail2ban, networkmanager
  test_boot.py           # zfs, mkinitcpio hooks, nvme, consolefont, terminus
  test_desktop.py        # window manager, autologin, gnome-keyring
  test_archsetup.py      # install log, state markers
  test_hardening.py      # NEW: sshd drop-in, sysctl, fstab /efi perms, .archsetup.bak backups
#+end_example

** Example tests (parity)
#+begin_src python
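import re

import pytest


def test_ufw_enabled(host):
    assert host.service("ufw").is_enabled

def test_user_cjennings_exists(host):
    u = host.user("cjennings")
    assert u.exists
    assert u.shell == "/usr/bin/zsh"

def test_zshrc_stowed_and_readable(host):
    f = host.file("/home/cjennings/.zshrc")
    assert f.is_symlink
    assert ".dotfiles/" in f.linked_to    # linked_to resolves the link target
    assert f.exists                       # test -e follows the link: not broken
    # Pass the path as an argument so Testinfra shell-quotes it.
    assert host.run("sudo -u cjennings test -r %s", f.path).rc == 0

def test_mkinitcpio_systemd_hook(host):
    # Non-ZFS systems delegate fsck from udev to systemd (systemd hook).
    if host.mount_point("/").filesystem == "zfs":
        pytest.skip("applies to non-ZFS roots only")
    conf = host.file("/etc/mkinitcpio.conf").content_string
    # Check the active HOOKS=(...) array, not a bare substring: comments can
    # mention "systemd" too. If HOOKS= is set twice, the last one wins.
    hooks = re.findall(r"^HOOKS=\(([^)]*)\)", conf, re.MULTILINE)
    assert hooks, "no active HOOKS= line"
    assert "systemd" in hooks[-1].split()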
#+end_src

Compare =test_ufw_enabled= (one assertion) to the current =validate_firewall= (8
lines of ssh_cmd + branch + counters).

** Preserving issue attribution
Map the three buckets to pytest markers and collect them in a =conftest.py=
hook:

#+begin_src python
import pytest

@pytest.mark.attribution("archsetup")   # or "base_install" / "unknown"
def test_ufw_enabled(host): ...
#+end_src

A =pytest_runtest_makereport= hook records each failure under its marker's
bucket, and a =pytest_sessionfinish= hook writes the same three-way report
=generate_issue_report= produces (base-install failures still route to the
archzfs inbox). Unmarked tests default to the archsetup bucket.
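The bucketing itself is plain data handling, so it can live outside the hook and be unit-tested without a VM. A minimal sketch, assuming the hook passes in each failed test's node id plus the first argument of its =attribution= marker (or =None= when unmarked). =bucket_failures=, =render_issue_report= and the report layout are hypothetical, not the current =generate_issue_report= format.

#+begin_src python
from typing import Dict, Iterable, List, Optional, Tuple

BUCKETS = ("archsetup", "base_install", "unknown")
DEFAULT_BUCKET = "archsetup"  # unmarked tests count against archsetup


def bucket_failures(
    failures: Iterable[Tuple[str, Optional[str]]],
) -> Dict[str, List[str]]:
    """Group (nodeid, marker bucket or None) pairs into the three buckets."""
    grouped: Dict[str, List[str]] = {b: [] for b in BUCKETS}
    for nodeid, bucket in failures:
        bucket = bucket or DEFAULT_BUCKET
        if bucket not in grouped:
            # A misspelled marker argument lands in "unknown" instead of vanishing.
            bucket = "unknown"
        grouped[bucket].append(nodeid)
    return grouped


def render_issue_report(grouped: Dict[str, List[str]]) -> str:
    """Plain-text report: one heading per bucket, failing node ids beneath."""
    out: List[str] = []
    for bucket, nodeids in grouped.items():
        out.append(f"{bucket.upper()} ISSUES ({len(nodeids)})")
        out.extend(f"  - {n}" for n in nodeids)
    return "\n".join(out) + "\n"
#+end_src

In =conftest.py=, the makereport hook would collect =(item.nodeid, marker.args[0] if marker else None)= for failed reports, with =marker = item.get_closest_marker("attribution")=; the sessionfinish hook would then call these two functions and forward the =base_install= entries to the archzfs inbox.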

** Tiered strategy
Markers =@pytest.mark.smoke= (user, key packages, dotfiles present) and
=@pytest.mark.integration= (services, configs, boot). =pytest -m smoke= for a
fast gate, full run otherwise. Drop the task's original X11/startx end-to-end
slice — the fleet is Wayland/Hyprland and headless GUI e2e is flaky and
expensive; a Wayland-session smoke check can be reconsidered later as its own
task.
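pytest warns on unregistered markers (and rejects them under =--strict-markers=), so =smoke=, =integration= and =attribution= would be declared once in =conftest.py=. A minimal sketch using pytest's =config.addinivalue_line= from a =pytest_configure= hook; the marker descriptions are placeholders drawn from this section.

#+begin_src python
# conftest.py (excerpt): register the custom markers so `pytest -m smoke`
# selects cleanly and --strict-markers does not reject them.
MARKERS = (
    "smoke: fast gate (user, key packages, dotfiles present)",
    "integration: services, configs, boot",
    "attribution(bucket): archsetup | base_install | unknown",
)


def pytest_configure(config):
    # pytest calls this hook once at startup with its Config object.
    for line in MARKERS:
        config.addinivalue_line("markers", line)
#+end_src

With the markers registered, =pytest -m smoke= runs only the fast gate and a plain run covers everything.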

** Reporting
=pytest-json-report= (or pytest's built-in =--junitxml=) writes to
=$TEST_RESULTS_DIR/=, and the result is surfaced in the test report alongside
the install-log analysis. pytest's own per-test pass/fail/skip output replaces
the hand-maintained counters.
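A sketch of folding the JSON report into the test report's summary, assuming the =pytest-json-report= layout of a top-level =tests= list whose entries carry =nodeid= and =outcome= (treating =error= as a failure outcome is also an assumption). =summarize_report= and =load_summary= are hypothetical. Counting straight from =tests= keeps the logic obvious rather than trusting a precomputed summary block.

#+begin_src python
import json
from collections import Counter
from pathlib import Path


def summarize_report(report: dict) -> dict:
    """Count outcomes and list failing node ids from a pytest-json-report dict."""
    tests = report.get("tests", [])
    counts = Counter(t["outcome"] for t in tests)
    failed = [t["nodeid"] for t in tests if t["outcome"] in ("failed", "error")]
    return {"counts": dict(counts), "failed": failed, "total": len(tests)}


def load_summary(path: Path) -> dict:
    """Read e.g. $TEST_RESULTS_DIR/testinfra.json and summarize it."""
    return summarize_report(json.loads(path.read_text()))
#+end_src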

* Coverage

** Parity (port all current checks)
All ~26 =validation.sh= checks, grouped per the layout above.

** Expansion (new — the coverage win)
archsetup configures much that isn't validated today. Candidates:
- sshd hardening drop-in (=/etc/ssh/sshd_config.d/10-hardening.conf=,
  PermitRootLogin prohibit-password).
- =backup_system_file= behavior — assert =.archsetup.bak= exists for files
  archsetup edited in place (fstab, mkinitcpio.conf, sudoers, …).
- pacman.conf (ParallelDownloads, Color, multilib) and makepkg.conf (MAKEFLAGS,
  OPTIONS) settings actually applied.
- systemd-resolved DNS-over-TLS drop-in; NetworkManager wifi-privacy.
- fail2ban jail.local present; reflector config; sysctl printk; /etc/issue
  emptied; vconsole font; fstab /efi fmask/dmask perms.
- sanoid / zfs-replicate units (ZFS hosts).
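Several of these are "setting actually applied" checks over INI-like files. A sketch for pacman.conf, assuming the test reads =host.file("/etc/pacman.conf").content_string= and hands it to a pure parser. =parse_pacman_conf= and =pacman_settings_applied= are hypothetical and handle only the subset pacman.conf uses (=[section]= headers, =Key = value=, bare flags such as =Color=, =#= comments). This doc does not state archsetup's =ParallelDownloads= value, so the sketch only checks that it is set.

#+begin_src python
from typing import Dict, Optional


def parse_pacman_conf(text: str) -> Dict[str, Dict[str, Optional[str]]]:
    """Map section -> {key: value}; bare flags (e.g. Color) map to None."""
    sections: Dict[str, Dict[str, Optional[str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # commented-out defaults don't count as applied
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None:
            continue  # stray line before any section header
        key, sep, value = line.partition("=")
        current[key.strip()] = value.strip() if sep else None
    return sections


def pacman_settings_applied(text: str) -> bool:
    """Color on, ParallelDownloads set, multilib repo enabled."""
    conf = parse_pacman_conf(text)
    options = conf.get("options", {})
    return ("Color" in options
            and bool(options.get("ParallelDownloads"))
            and "multilib" in conf)
#+end_src

Inside a Testinfra test this reduces to =assert pacman_settings_applied(host.file("/etc/pacman.conf").content_string)=; makepkg.conf is shell syntax and would need its own parser.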

* Dependencies

Add =python-pytest=, =python-pytest-testinfra=, =python-paramiko= (for Testinfra's
paramiko backend), and a JSON reporter (=pytest-json-report=) to =make deps= (test
host only; not installed by archsetup itself).
Note: the existing unit suites run under =python3 -m unittest=; the integration
layer runs under pytest. Two runners, both Python; =make test-unit= unchanged,
=make test= gains the pytest step.

* Goss comparison (the task asked)

- *Goss* — YAML-declarative health specs, a single Go binary executed *on the
  target*. Fast, no Python. But the spec must be pushed into the VM and run
  there, the assertions are less programmable, and it adds a Go binary to the
  flow.
- *Testinfra* — Python, runs *on the host* over SSH (nothing installed in the
  VM), assertions are full Python with rich built-in modules
  (File/Package/Service/User/Command), integrates with pytest's tooling.

Choose Testinfra: it runs from the host (the VM stays clean), it's far more
programmable for the conditional checks archsetup needs (DESKTOP_ENV branches,
ZFS-vs-not), and it aligns with the repo's existing Python test tooling.

* Migration plan (phased, TDD where the helper logic is ours)

- *P1 — Scaffold.* conftest.py (host fixture + connection), the attribution
  marker + report hook, and 3 parity checks (firewall, user, dotfiles). Wire a
  pytest step into run-test.sh behind a flag so the shell sweep still runs.
- *P2 — Full parity.* Port all ~26 checks; diff a real VM run's results against
  the shell sweep to confirm no check was lost.
- *P3 — Cut over.* Make pytest the primary sweep in run-test.sh; keep
  =analyze_log_diff= and the install-log signal.
- *P4 — Expand.* Add the new coverage (hardening, backups, applied settings).
- *P5 — Retire.* Remove =run_all_validations= from validation.sh (keep the
  capture/analyze helpers that pytest doesn't replace).
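For P2's diff against the shell sweep, a sketch of the comparison, assuming a hand-maintained map from each shell check name to the pytest node ids that replace it, with node ids taken from the JSON report's =tests= entries. =parity_gaps= and the map are hypothetical; =validate_firewall= is the only shell check name this doc actually gives.

#+begin_src python
from typing import Dict, List, Set


def parity_gaps(
    shell_checks: Set[str],
    parity_map: Dict[str, List[str]],
    pytest_nodeids: Set[str],
) -> Dict[str, List[str]]:
    """Find shell checks with no pytest replacement, and mapped tests that never ran."""
    mapped = set(parity_map)
    unmapped = sorted(shell_checks - mapped)
    not_collected = sorted(
        nodeid
        for check in shell_checks & mapped
        for nodeid in parity_map[check]
        if nodeid not in pytest_nodeids
    )
    return {"unmapped": unmapped, "not_collected": not_collected}
#+end_src

Both lists coming back empty on a real VM run is one concrete way to express "no check was lost".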

* Acceptance criteria

- =make test= runs archsetup in a VM, then a pytest sweep over SSH, and a real
  run reports parity with (or a superset of) the current shell checks.
- Failures still sort into archsetup / base-install / unknown, with base-install
  issues routed to the archzfs inbox as today.
- =make deps= installs the test dependencies; the VM has nothing extra installed.
- A documented =pytest -m smoke= fast path exists.

* Resolved decisions (2026-06-25)

1. *Auth at validation time — inject a throwaway test key.* Pre-run, generate
   an ephemeral keypair, push the pubkey into the VM's
   =/root/.ssh/authorized_keys= over the existing sshpass channel, and point
   Testinfra at the private key via a generated ssh-config. No password in the
   pytest invocation; paramiko key auth just works; the keypair is discarded
   after the run. (Chosen over wrapping sshpass around Testinfra, which is
   awkward since Testinfra opens its own SSH connections.)
2. *Cut over — run both through parity, then switch.* Keep the shell sweep
   running alongside pytest through P2 so a real VM run can diff pytest's
   results against the shell sweep and prove no check was dropped. pytest
   becomes primary at P3; =run_all_validations= is deleted at P5 after the
   expanded suite proves out.
3. *Expansion scope — full, in this task, after cutover.* All of P4 lands here,
   sequenced strictly after the P3 parity cutover so the parity diff is clean
   before new checks are added.