#+TITLE: Testing Strategy
#+AUTHOR: Craig Jennings
#+DATE: 2026-01-25

* Overview

This document describes the testing strategy for the archzfs installer project, including automated VM testing and the rationale for key technical decisions.

* Running Tests

** Makefile Targets

| Target              | Description                                            |
|---------------------+--------------------------------------------------------|
| =make test-install= | Run all 12 automated install tests (builds ISO first)  |
| =make test-vm=      | Boot ISO in a single-disk VM (interactive)             |
| =make test-multi=   | Boot ISO in a 2-disk VM for mirror/RAID testing        |
| =make test-multi3=  | Boot ISO in a 3-disk VM for raidz1 testing             |
| =make test-boot=    | Boot from installed disk (after running install in VM) |
| =make test-clean=   | Remove VM disks and OVMF vars, start fresh             |
| =make lint=         | Run shellcheck on all scripts                          |
| =make test=         | Alias for =make lint=                                  |

** Running a Single Automated Test

#+begin_src bash
./scripts/test-install.sh zfs-encrypt
#+end_src

** Running Multiple Specific Tests

#+begin_src bash
./scripts/test-install.sh zfs-encrypt zfs-mirror-encrypt btrfs-luks
#+end_src

** Listing Available Test Configs

#+begin_src bash
./scripts/test-install.sh --list
#+end_src

* Test Infrastructure

** Test Scripts

- =scripts/test-install.sh= - Main test runner
- =scripts/test-configs/= - Configuration files for different test scenarios

** Test Flow

1. Build the ISO with =./build.sh=
2. Boot a QEMU VM from the ISO
3. Run an unattended installation driven by a config file
4. Verify the installation (packages, services, filesystem)
5. Reboot from the installed disk (no ISO)
6. Verify the system survives the reboot
7. Test rollback functionality (ZFS and btrfs)

* LUKS Encryption Testing

** The Challenge

LUKS-encrypted systems require TWO passphrase prompts at boot:

1. *GRUB prompt* - GRUB must decrypt /boot to read the kernel and initramfs
2. *Initramfs prompt* - the =encrypt= hook must decrypt root to mount the filesystem

This blocks automated testing because:

- SSH is unavailable until both decryptions complete
- Both prompts require interactive passphrase entry

** Options Evaluated

*** Option A: Put /boot on the EFI partition for testing

Move /boot to the unencrypted EFI partition when TESTING=yes, so GRUB doesn't need to decrypt anything.

*Rejected* - This tests a different code path than production, so bugs in the GRUB cryptodisk setup would not be caught. "Testing something different than what ships defeats the purpose."

*** Option B: Accept the limitation, enhance installation verification

Skip reboot tests for LUKS. Instead, verify configs before cleanup:

- Check that crypttab, grub.cfg, and mkinitcpio.conf are correct
- If the configs are right, boot should work

*Rejected* - We already found bugs (an empty grub.cfg from the FAT32 sync) that only manifested at boot time. Config inspection wouldn't catch everything.

*** Option C: Hybrid approach (Chosen)

Use TWO mechanisms to handle the two prompts:

1. *GRUB prompt* - QEMU monitor sendkey (timing is predictable)
2. *Initramfs prompt* - Keyfile in the initramfs (deterministic)

The GRUB countdown provides a clear timing signal:

#+begin_example
The highlighted entry will be executed automatically in 0s.
Booting 'Arch Linux'
Enter passphrase for hd0,gpt2:
#+end_example

We know exactly when the GRUB prompt appears. After sendkey handles GRUB, the keyfile handles the initramfs prompt automatically.

** Why Option C

- Tests the actual production code path (critical requirement)
- GRUB timing is predictable (countdown visible in serial output)
- The keyfile handles the harder timing problem (initramfs)
- Only one sendkey interaction is needed (GRUB prompt)

** Implementation

*** GRUB Passphrase (sendkey)

1. Switch the serial console from file-based logging to real-time monitoring (socket or pty)
2. Watch for the "Enter passphrase for" text after the GRUB countdown
3. Send the passphrase via QEMU monitor =sendkey= commands
4. Send the Enter key to submit

*** Initramfs Passphrase (keyfile)

When =TESTING=yes= is set in the config:

1. Generate a random 2 KB keyfile at =/etc/cryptroot.key=
2. Add the keyfile to LUKS key slot 1 (the passphrase remains in slot 0)
3. Set the keyfile permissions to 000
4. Add the keyfile to the mkinitcpio =FILES= array
5. Configure crypttab to use the keyfile instead of "none"
6. The initramfs unlocks automatically (no prompt)

** Security Mitigations

- Test-only flag: only activates when TESTING=yes
- Separate key slot: keyfile in slot 1, passphrase in slot 0
- Random per build: a fresh keyfile is generated for each installation
- Never shipped: the keyfile exists only in test VMs, never in the ISO
- Restricted permissions: chmod 000 on the keyfile

** Files Modified

- =custom/lib/btrfs.sh= - setup_luks_testing_keyfile(), configure_crypttab(), configure_luks_initramfs()
- =custom/archangel= - Calls keyfile setup in the LUKS flow
- =scripts/test-install.sh= - sendkey for GRUB, real-time serial monitoring
- =scripts/test-configs/btrfs-luks.conf= - TESTING=yes
- =scripts/test-configs/btrfs-mirror-luks.conf= - TESTING=yes

* Adding a New Test

** Step 1: Create a Config File

Add a =.conf= file in =scripts/test-configs/=. The first line should be a comment describing the test:

#+begin_example
# Test config: Description of what this tests
#+end_example

Required fields:

- =HOSTNAME= - unique per test
- =TIMEZONE=, =LOCALE=, =KEYMAP= - use =UTC=, =en_US.UTF-8=, =us= as defaults
- =DISKS= - =/dev/vda= for single, =/dev/vda,/dev/vdb= for 2-disk, etc.
- =ROOT_PASSWORD= - needed for SSH into the installed system
- =ENABLE_SSH= - =yes= for full verification, =no= for a console-only boot check

Filesystem choice:

- ZFS (default): no =FILESYSTEM= needed, or set =FILESYSTEM=zfs=
- Btrfs: =FILESYSTEM=btrfs=

Encryption:

- No encryption: =NO_ENCRYPT=yes=
- ZFS encryption: =ZFS_PASSPHRASE=testpass= (no =NO_ENCRYPT=)
- LUKS encryption: =LUKS_PASSPHRASE=testpassphrase=, =TESTING=yes= (no =NO_ENCRYPT=)

Multi-disk:

- Mirror: =RAID_LEVEL=mirror=
- RAIDZ1: =RAID_LEVEL=raidz1= (ZFS only, needs 3+ disks)
- Btrfs stripe: =RAID_LEVEL=stripe=

** Step 2: Run the Test

#+begin_example
./scripts/test-install.sh my-new-test
#+end_example

The test runner automatically:

- Counts disks from =DISKS== to create QCOW2 images
- Detects the encryption type from =LUKS_PASSPHRASE= / =ZFS_PASSPHRASE=
- Adds a QEMU monitor socket when encryption is detected
- Dispatches to =send_luks_passphrase()= or =send_zfs_passphrase()= at reboot
- Runs =verify_install()=, =verify_reboot_survival()=, =verify_rollback()=

** Step 3: If Encryption Needs a New Prompt Handler

The encryption dispatch in =run_test()= uses =encrypt_flag=:

#+begin_example
if [[ "$encrypt_flag" == "luks" ]]; then
    send_luks_passphrase ...
elif [[ "$encrypt_flag" == "zfs" ]]; then
    send_zfs_passphrase ...
fi
#+end_example

To add a new encryption type:

1. Add detection logic that reads the config and sets =encrypt_flag=
2. Write a =send__passphrase()= function
3. Add an =elif= branch in the dispatch

** Key Functions in test-install.sh

| Function                   | Purpose                                                           |
|----------------------------+-------------------------------------------------------------------|
| =monitor_sendkeys()=       | Sends a string as QEMU sendkey commands (char-by-char + Enter)    |
| =send_luks_passphrase()=   | Detects GRUB prompt in serial, sends passphrase per disk          |
| =send_zfs_passphrase()=    | Timed delay (no serial), sends passphrase twice (ZBM + initramfs) |
| =start_vm_from_disk()=     | Boots installed disk; adds monitor socket if encrypt mode is set  |
| =verify_install()=         | Checks filesystem, snapshots, encryption properties via SSH       |
| =verify_reboot_survival()= | Checks pool/filesystem health after reboot                        |
| =verify_rollback()=        | Creates file, snapshots, deletes, rolls back, verifies restore    |

** Debugging a Failing Test

The serial log is saved to =test-logs/-reboot-serial.log= on failure.

For VGA-only boot stages (ZFSBootMenu), take a screenshot via the QEMU monitor:

#+begin_example
echo "screendump /tmp/screen.ppm" | socat -t 2 - UNIX-CONNECT:vm/monitor-.sock
convert /tmp/screen.ppm /tmp/screen.png
#+end_example

This requires keeping the VM alive (add the debugging step before =stop_vm= in the failure path).

* Test Configurations

** Btrfs Tests

| Config            | Disks | LUKS | Status                  |
|-------------------+-------+------+-------------------------|
| btrfs-single      |     1 | No   | Pass                    |
| btrfs-luks        |     1 | Yes  | Pass (with TESTING=yes) |
| btrfs-mirror      |     2 | No   | Pass                    |
| btrfs-stripe      |     2 | No   | Pass                    |
| btrfs-mirror-luks |     2 | Yes  | Pass (with TESTING=yes) |

** ZFS Tests

| Config             | Disks | Encryption | Status |
|--------------------+-------+------------+--------|
| single-disk        |     1 | No         | Pass   |
| mirror             |     2 | No         | Pass   |
| raidz1             |     3 | No         | Pass   |
| zfs-encrypt        |     1 | Yes        | Pass   |
| zfs-mirror-encrypt |     2 | Yes        | Pass   |

* ZFS Native Encryption Testing

** The Challenge

ZFS native encryption (=keylocation=prompt=) requires TWO passphrase prompts at boot, similar to LUKS but from different components:

1. *ZFSBootMenu prompt* - ZFSBootMenu must unlock the pool to enumerate boot environments
2. *Initramfs prompt* - mkinitcpio's =zfs= hook re-imports the pool after kexec

** Key Difference from LUKS

- *LUKS*: GRUB prompts once per encrypted disk; the initramfs uses a keyfile (no prompt)
- *ZFS*: One prompt regardless of disk count (pool-level encryption), but TWO prompts from different boot stages (ZFSBootMenu + initramfs)

** Key Difference: Serial Console

GRUB outputs to the serial console, so its passphrase prompt is detectable in the serial log. ZFSBootMenu renders entirely to the VGA framebuffer, so its passphrase prompt (and the initramfs prompt) never appears in serial output. The serial log only shows the UEFI firmware loading ZFSBootMenu:

#+begin_example
BdsDxe: starting Boot0009 "ZFSBootMenu" from HD(...)
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
#+end_example

After that, nothing appears until the booted system's getty starts.

** Implementation: Timed Sendkey

Since prompt detection via serial is not possible, we use timed delays:

1. Detect the UEFI firmware log line (=starting.*ZFSBootMenu=) in serial output
2. Wait 15s for ZFSBootMenu to initialize and display the passphrase prompt
3. Send the passphrase via QEMU monitor sendkey
4. Wait 30s for ZFSBootMenu to boot the kernel and for mkinitcpio to reach the =zfs= hook
5. Send the passphrase again via sendkey
6. Wait for SSH to become available

** Why Not a Keyfile (Like LUKS)?

For LUKS, we embed a keyfile in the initramfs to avoid the second prompt. For ZFS, this would require changing =keylocation= from =prompt= to =file:///path/to/key= and embedding the key in the initramfs, which tests a different code path than production. The timed sendkey approach tests the actual production passphrase flow.

** Files

- =scripts/test-install.sh= - =send_zfs_passphrase()= function
- =scripts/test-configs/zfs-encrypt.conf= - Single disk, TESTING=yes
- =scripts/test-configs/zfs-mirror-encrypt.conf= - Mirror, TESTING=yes

* References

- Arch Wiki: dm-crypt/System configuration
- HashiCorp Discuss: LUKS Encryption Key on Initial Reboot
- GitHub: tylert/packer-build Issue #31 (LUKS unattended builds)