#+TITLE: Design: Testinfra Post-Install Validation for archsetup
#+AUTHOR: Craig Jennings
#+DATE: 2026-06-25
#+STATUS: Accepted (2026-06-25)

* Problem

The VM integration harness (=scripts/testing/run-test.sh=) runs archsetup in a
QEMU VM, then verifies the result two ways:

1. Parses archsetup's own install log for its Error Summary and the
   =ARCHSETUP_EXECUTION_COMPLETE= marker (did the script finish, did it log
   errors).
2. Runs =run_all_validations= from =scripts/testing/lib/validation.sh= — a
   hand-rolled, shell-based post-install assertion sweep of ~26 checks over SSH.

The shell sweep works, but each check is 6-40 lines of =ssh_cmd= +
=validation_pass/fail= + =attribute_issue= boilerplate, the pass/fail counters
are hand-maintained globals, and the reporting is bespoke. Adding or reading a
check is heavier than it should be, and growing the suite (archsetup configures
far more than the 26 checks cover) compounds that weight.

This doc proposes porting the post-install validation to Testinfra (Python +
pytest) for more expressive checks and better reporting, then growing coverage.

* Decision

Port the post-install validation layer to Testinfra + pytest, reaching parity
with the existing =validation.sh= sweep, then expand coverage. Recorded
rationale: the up-front port cost (parity rewrite + a test-only dependency) is
an accepted trade — the priority is a robust, well-reported, growing validation
suite over feature speed. The framework swap alone buys ergonomics and
reporting, not coverage, so it is paired with real new coverage (below).

This replaces the shell sweep; it does not touch archsetup's own install-log
parsing (that stays as a separate signal). The full coverage expansion (P4)
lands in this task too, sequenced strictly after the parity cutover so the
parity verification stays clean.

* Current harness (what exists today)

** Flow (run-test.sh)
1. Revert VM to base snapshot, boot, wait for SSH.
2. =capture_pre_install_state=.
3. Bundle + copy archsetup + dotfiles into the VM, run archsetup in background,
   poll to completion.
4. =capture_post_install_state=.
5. =run_all_validations= (the shell sweep).
6. =analyze_log_diff= + =generate_issue_report= (issue attribution).
7. Explicit pass/fail exit code; cleanup.

** The shell sweep (validation.sh)
~26 checks under =run_all_validations=: user created / shell / groups, dotfiles,
yay, pacman working, window manager, firewall, DNS, avahi, fail2ban,
NetworkManager, emacs, git config, dev tools, zfs, boot config, autologin,
gnome-keyring, terminus font, mkinitcpio hooks, initramfs consolefont, nvme
module, archsetup log, state markers.

** Issue attribution
=attribute_issue <msg> <bucket>= sorts each failure into one of three arrays —
=ARCHSETUP_ISSUES=, =BASE_INSTALL_ISSUES=, =UNKNOWN_ISSUES= — and
=generate_issue_report= writes them out (base-install issues route to the
archzfs inbox). This is domain logic Testinfra has no equivalent for; the port
must preserve it.

** Connection
=ssh_cmd= uses =sshpass -p "$ROOT_PASSWORD" ssh ... -p "$SSH_PORT" root@$VM_IP=,
with =VM_IP=localhost=, =SSH_PORT=2222=, =ROOT_PASSWORD=archsetup=.

* Design

** Where Testinfra fits
Replace the =run_all_validations= call (step 5) with a pytest invocation against
the running VM. Steps 1-4 and 6-7 are unchanged; =analyze_log_diff= stays.
Testinfra connects over the same SSH the harness already exposes.

** Connection model
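Testinfra targets the live VM through its host spec. The =ssh://= scheme selects
the backend that shells out to the OpenSSH client; =paramiko://= selects the
paramiko backend instead: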

#+begin_src sh
pytest scripts/testing/tests/ \
  --hosts="ssh://root@localhost:2222" \
  --ssh-config=<generated> \
  --json-report --json-report-file="$TEST_RESULTS_DIR/testinfra.json"
#+end_src

Password auth is the sticking point: at validation time the harness has only
the root password, and reusing a key through =--ssh-identity-file= only becomes
possible once archsetup drops the key. The simplest options are a tiny
generated ssh-config plus an sshpass wrapper, or switching the test VM to a
known test key injected before the run. This is settled under Resolved
decisions (item 1).

** Test layout
#+begin_example
scripts/testing/tests/
  conftest.py            # host fixture, markers, attribution hook, report glue
  test_users.py          # user created / shell / groups
  test_dotfiles.py       # stow symlinks, readable by user
  test_packages.py       # yay, pacman working, dev tools, key packages
  test_services.py       # firewall, dns, avahi, fail2ban, networkmanager
  test_boot.py           # zfs, mkinitcpio hooks, nvme, consolefont, terminus
  test_desktop.py        # window manager, autologin, gnome-keyring
  test_archsetup.py      # install log, state markers
  test_hardening.py      # NEW: sshd drop-in, sysctl, /etc/fstab perms, backups
#+end_example

** Example tests (parity)
#+begin_src python
import re

def test_ufw_enabled(host):
    assert host.service("ufw").is_enabled

def test_user_cjennings_exists(host):
    u = host.user("cjennings")
    assert u.exists
    assert u.shell == "/usr/bin/zsh"

def test_zshrc_stowed_and_readable(host):
    f = host.file("/home/cjennings/.zshrc")
    assert f.is_symlink
    assert ".dotfiles/" in f.linked_to
    assert f.exists                       # not broken
    # host.run shell-quotes the extra argument before substituting it
    assert host.run("sudo -u cjennings test -r %s", f.path).rc == 0

def test_mkinitcpio_systemd_hook(host):
    # non-ZFS systems delegate fsck from udev to systemd
    conf = host.file("/etc/mkinitcpio.conf").content_string
    # inspect only the active HOOKS line so a commented-out example cannot pass
    hooks = re.search(r"^HOOKS=\((.*)\)", conf, re.MULTILINE)
    assert hooks is not None
    assert "systemd" in hooks.group(1).split()
#+end_src

Compare =test_ufw_enabled= (1 line) to the current =validate_firewall= (8 lines
of ssh_cmd + branch + counters).

** Preserving issue attribution
Map the three buckets to pytest markers and collect them in a =conftest.py=
hook:

#+begin_src python
import pytest

@pytest.mark.attribution("archsetup")   # or "base_install" / "unknown"
def test_ufw_enabled(host): ...
#+end_src

A =pytest_runtest_makereport= hook records each failure under its marker's
bucket and writes the same three-way report =generate_issue_report= produces
(base-install failures still route to the archzfs inbox). Unmarked tests default
to the =archsetup= bucket.
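
The collection side could look like the sketch below. It is a sketch under
stated assumptions, not the final =conftest.py=: it swaps the
=pytest_runtest_makereport= hook named above for =pytest_collection_modifyitems=
plus =pytest_runtest_logreport=, which see the same failures without a hook
wrapper. The =attribution.json= file name and its JSON shape are placeholders,
not the format =generate_issue_report= writes today, and =bucket_for= is a
hypothetical helper.

#+begin_src python
# conftest.py (sketch): sort failures into the three attribution buckets.
import json
import os

BUCKETS = ("archsetup", "base_install", "unknown")
DEFAULT_BUCKET = "archsetup"          # unmarked tests land here

_bucket_by_nodeid = {}                # nodeid -> bucket, filled at collection
failures = {b: [] for b in BUCKETS}   # bucket -> nodeids of failed tests


def pytest_configure(config):
    # Register the custom marker so --strict-markers accepts it.
    config.addinivalue_line(
        "markers", "attribution(bucket): issue bucket a failure is filed under"
    )


def bucket_for(item):
    marker = item.get_closest_marker("attribution")
    if marker is None or not marker.args:
        return DEFAULT_BUCKET
    bucket = marker.args[0]
    # A misspelled bucket is filed as unknown rather than silently dropped.
    return bucket if bucket in BUCKETS else "unknown"


def pytest_collection_modifyitems(items):
    for item in items:
        _bucket_by_nodeid[item.nodeid] = bucket_for(item)


def pytest_runtest_logreport(report):
    # Setup failures (for example SSH connection errors) count too.
    if report.failed:
        bucket = _bucket_by_nodeid.get(report.nodeid, DEFAULT_BUCKET)
        if report.nodeid not in failures[bucket]:
            failures[bucket].append(report.nodeid)


def pytest_sessionfinish(session, exitstatus):
    # Hypothetical output location; TEST_RESULTS_DIR mirrors the harness variable.
    out_dir = os.environ.get("TEST_RESULTS_DIR", ".")
    with open(os.path.join(out_dir, "attribution.json"), "w") as fh:
        json.dump(failures, fh, indent=2)
#+end_src

The harness would then read the written file in place of the three shell
arrays and keep routing the =base_install= entries to the archzfs inbox.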

** Tiered strategy
Two markers split the suite: =@pytest.mark.smoke= (user, key packages, dotfiles
present) and =@pytest.mark.integration= (services, configs, boot).
=pytest -m smoke= is the fast gate; an unfiltered run executes everything. The
task's original X11/startx end-to-end slice is dropped: the fleet is
Wayland/Hyprland, and headless GUI e2e is flaky and expensive. A Wayland-session
smoke check can be reconsidered later as its own task.

** Reporting
=pytest-json-report= (or junit-xml) → =$TEST_RESULTS_DIR/=, surfaced in the
test report alongside the install-log analysis. pytest's own per-test
pass/fail/skip output replaces the hand-maintained counters.
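
To surface the result next to the install-log analysis, the harness could
reduce the JSON report to per-outcome counts. A minimal sketch, assuming the
report carries pytest-json-report's top-level =tests= list with a =nodeid= and
an =outcome= per entry; =summarize_report= and =load_report= are hypothetical
helper names.

#+begin_src python
import json
from collections import Counter


def load_report(path):
    """Read a JSON report file into a dict."""
    with open(path) as fh:
        return json.load(fh)


def summarize_report(report):
    """Return (counts by outcome, sorted nodeids of failed or errored tests)."""
    tests = report.get("tests", [])
    counts = Counter(t["outcome"] for t in tests)
    failed = sorted(
        t["nodeid"] for t in tests if t["outcome"] in ("failed", "error")
    )
    return dict(counts), failed
#+end_src

Counting from the =tests= list rather than a precomputed summary keeps the
helper independent of which reporter fields are present.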

* Coverage

** Parity (port all current checks)
All ~26 =validation.sh= checks, grouped per the layout above.

** Expansion (new — the coverage win)
archsetup configures much that isn't validated today. Candidates:
- sshd hardening drop-in (=/etc/ssh/sshd_config.d/10-hardening.conf=,
  PermitRootLogin prohibit-password).
- =backup_system_file= behavior — assert =.archsetup.bak= exists for files
  archsetup edited in place (fstab, mkinitcpio.conf, sudoers, …).
- pacman.conf (ParallelDownloads, Color, multilib) and makepkg.conf (MAKEFLAGS,
  OPTIONS) settings actually applied.
- systemd-resolved DNS-over-TLS drop-in; NetworkManager wifi-privacy.
- fail2ban jail.local present; reflector config; sysctl printk; /etc/issue
  emptied; vconsole font; fstab /efi fmask/dmask perms.
- sanoid / zfs-replicate units (ZFS hosts).
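
A few of these candidates, sketched as Testinfra tests. These are illustrative
only: the backup naming (=<path>.archsetup.bak= beside the original) is an
assumption about =backup_system_file=, =active_lines= is a hypothetical helper,
and the exact values archsetup writes should be checked against the script
before any assertion is tightened.

#+begin_src python
def active_lines(text):
    """Yield config lines with comments and blank lines removed."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            yield line


def test_sshd_hardening_dropin(host):
    conf = host.file("/etc/ssh/sshd_config.d/10-hardening.conf")
    assert conf.is_file
    settings = [line.split(None, 1) for line in active_lines(conf.content_string)]
    assert ["PermitRootLogin", "prohibit-password"] in settings


def test_pacman_conf_settings_applied(host):
    lines = list(active_lines(host.file("/etc/pacman.conf").content_string))
    assert "Color" in lines
    assert any(
        line.replace(" ", "").startswith("ParallelDownloads=") for line in lines
    )
    assert "[multilib]" in lines


def test_edited_files_have_backups(host):
    # Assumption: backup_system_file writes <path>.archsetup.bak beside the file.
    for path in ("/etc/fstab", "/etc/mkinitcpio.conf", "/etc/sudoers"):
        assert host.file(path + ".archsetup.bak").exists, path
#+end_src

Filtering comments first matters for files like =pacman.conf=, where the stock
file ships options commented out and a plain substring match would pass on an
unapplied setting.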

* Dependencies

Add =python-pytest=, =python-pytest-testinfra= (pulls paramiko), and a JSON
reporter to =make deps= (test host only — not installed by archsetup itself).
Note: the existing unit suites run under =python3 -m unittest=; the integration
layer runs under pytest. Two runners, both Python; =make test-unit= unchanged,
=make test= gains the pytest step.

* Goss comparison (requested by the task)

- *Goss* — YAML-declarative health specs, a single Go binary executed *on the
  target*. Fast, no Python. But the spec must be pushed into the VM and run
  there, the assertions are less programmable, and it adds a Go binary to the
  flow.
- *Testinfra* — Python, runs *on the host* over SSH (nothing installed in the
  VM), assertions are full Python with rich built-in modules
  (File/Package/Service/User/Command), integrates with pytest's tooling.

Choose Testinfra: it runs from the host (the VM stays clean), it's far more
programmable for the conditional checks archsetup needs (DESKTOP_ENV branches,
ZFS-vs-not), and it aligns with the repo's existing Python test tooling.

* Migration plan (phased, TDD where the helper logic is ours)

- *P1 — Scaffold.* conftest.py (host fixture + connection), the attribution
  marker + report hook, and 3 parity checks (firewall, user, dotfiles). Wire a
  pytest step into run-test.sh behind a flag so the shell sweep still runs.
- *P2 — Full parity.* Port all ~26 checks; diff a real VM run's results against
  the shell sweep to confirm no check was lost.
- *P3 — Cut over.* Make pytest the primary sweep in run-test.sh; keep
  =analyze_log_diff= and the install-log signal.
- *P4 — Expand.* Add the new coverage (hardening, backups, applied settings).
- *P5 — Retire.* Remove =run_all_validations= from validation.sh (keep the
  capture/analyze helpers that pytest doesn't replace).
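
The P2 parity diff could be as small as the sketch below. It rests on two
assumptions the current harness does not yet satisfy: the shell sweep exports
its per-check results as a name-to-pass mapping, and someone maintains a map
from each shell check to the pytest node ids that replace it. =parity_diff= and
both input shapes are hypothetical; the pytest side assumes
pytest-json-report's =tests= list.

#+begin_src python
def parity_diff(shell_results, pytest_report, check_map):
    """Compare shell-sweep outcomes with a pytest JSON report.

    shell_results: {shell check name: True if it passed}
    check_map:     {shell check name: [pytest nodeids that replace it]}
    Returns (unported, disagreements), both sorted lists of shell check names.
    """
    outcomes = {t["nodeid"]: t["outcome"] for t in pytest_report.get("tests", [])}
    unported, disagreements = [], []
    for check, shell_passed in sorted(shell_results.items()):
        nodeids = [n for n in check_map.get(check, []) if n in outcomes]
        if not nodeids:
            unported.append(check)      # no pytest test ran for this check
            continue
        pytest_passed = all(outcomes[n] == "passed" for n in nodeids)
        if pytest_passed != shell_passed:
            disagreements.append(check)
    return unported, disagreements
#+end_src

An empty =unported= list on a real VM run is the evidence that no check was
lost; any entry in =disagreements= points at a port that asserts something
different from the shell check it replaces.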

* Acceptance criteria

- =make test= runs archsetup in a VM, then a pytest sweep over SSH, and a real
  run reports parity with (or a superset of) the current shell checks.
- Failures still sort into archsetup / base-install / unknown, with base-install
  issues routed to the archzfs inbox as today.
- =make deps= installs the test dependencies; the VM has nothing extra installed.
- A documented =pytest -m smoke= fast path exists.

* Resolved decisions (2026-06-25)

1. *Auth at validation time — inject a throwaway test key.* Pre-run, generate
   an ephemeral keypair, push the pubkey into the VM's
   =/root/.ssh/authorized_keys= over the existing sshpass channel, and point
   Testinfra at the private key via a generated ssh-config. No password appears
   in the pytest invocation, key auth needs no sshpass wrapper, and the keypair
   is discarded after the run. (Chosen over wrapping sshpass around Testinfra, which is
   awkward since Testinfra spawns its own ssh connections.)
2. *Cut over — run both through parity, then switch.* Keep the shell sweep
   running alongside pytest through P2 so a real VM run can diff pytest's
   results against the shell sweep and prove no check was dropped. pytest
   becomes primary at P3; =run_all_validations= is deleted at P5 after the
   expanded suite proves out.
3. *Expansion scope — full, in this task, after cutover.* All of P4 lands here,
   sequenced strictly after the P3 parity cutover so the parity diff is clean
   before new checks are added.
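
For decision 1, the key and ssh-config generation might look like the sketch
below. The function names, the =testinfra_key= file name, and the decision to
skip host-key pinning are all assumptions for a disposable local VM; pushing
the public key into =/root/.ssh/authorized_keys= stays with the existing
sshpass channel and is not shown.

#+begin_src python
import subprocess
from pathlib import Path


def make_ephemeral_key(workdir):
    """Generate an unencrypted ed25519 keypair; return (private, public) paths."""
    key = Path(workdir) / "testinfra_key"
    subprocess.run(
        ["ssh-keygen", "-q", "-t", "ed25519", "-N", "", "-f", str(key)],
        check=True,
    )
    return key, key.with_name(key.name + ".pub")


def render_ssh_config(identity_file, host="localhost", port=2222, user="root"):
    """Return an ssh-config stanza that pins auth to the throwaway key."""
    return "\n".join([
        f"Host {host}",
        f"    Port {port}",
        f"    User {user}",
        f"    IdentityFile {identity_file}",
        "    IdentitiesOnly yes",
        # Assumption: a disposable local VM, so host-key pinning is skipped.
        "    StrictHostKeyChecking no",
        "    UserKnownHostsFile /dev/null",
        "",
    ])
#+end_src

Creating both files inside a temporary directory (for example
=tempfile.TemporaryDirectory=) discards the keypair when the run ends; the
rendered config is what =--ssh-config= would point at.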