From c5759dd61684c15f4ca460d4f8a166825e1bf3d8 Mon Sep 17 00:00:00 2001
From: Craig Jennings
Date: Tue, 24 Feb 2026 07:54:13 -0600
Subject: docs: add testing-strategy.org with ZFS encryption notes and test recipe

---
 testing-strategy.org | 287 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 287 insertions(+)
 create mode 100644 testing-strategy.org

diff --git a/testing-strategy.org b/testing-strategy.org
new file mode 100644
index 0000000..3119917
--- /dev/null
+++ b/testing-strategy.org
@@ -0,0 +1,287 @@
+#+TITLE: Testing Strategy
+#+AUTHOR: Craig Jennings
+#+DATE: 2026-01-25
+
+* Overview
+
+This document describes the testing strategy for the archzfs installer project,
+including automated VM testing and the rationale for key technical decisions.
+
+* Test Infrastructure
+
+** Test Scripts
+
+- =scripts/test-install.sh= - Main test runner
+- =scripts/test-configs/= - Configuration files for different test scenarios
+
+** Test Flow
+
+1. Build ISO with =./build.sh=
+2. Boot QEMU VM from ISO
+3. Run unattended installation via config file
+4. Verify installation (packages, services, filesystem)
+5. Reboot from installed disk (no ISO)
+6. Verify system survives reboot
+7. Test rollback functionality (ZFS and btrfs)
+
+* LUKS Encryption Testing
+
+** The Challenge
+
+LUKS-encrypted systems require TWO passphrase prompts at boot:
+
+1. *GRUB prompt* - GRUB must decrypt /boot to read kernel/initramfs
+2. *Initramfs prompt* - the =encrypt= hook must decrypt root to mount the filesystem
+
+This blocks automated testing because:
+- SSH is unavailable until after both decryptions complete
+- Both prompts require interactive passphrase entry
+
+** Options Evaluated
+
+*** Option A: Put /boot on EFI partition for testing
+
+Move /boot to the unencrypted EFI partition when TESTING=yes, so GRUB
+doesn't need to decrypt anything.
+
+*Rejected* - Tests a different code path than production. Bugs in GRUB
+cryptodisk setup would not be caught. "Testing something different than
+what ships defeats the purpose."
+
+*** Option B: Accept limitation, enhance installation verification
+
+Skip reboot tests for LUKS. Instead, verify configs before cleanup:
+- Check that crypttab, grub.cfg, and mkinitcpio.conf are correct
+- If configs are right, boot should work
+
+*Rejected* - We already found bugs (empty grub.cfg from FAT32 sync) that
+only manifested at boot time. Config inspection wouldn't catch everything.
+
+*** Option C: Hybrid approach (Chosen)
+
+Use TWO mechanisms to handle the two prompts:
+
+1. *GRUB prompt* - QEMU monitor sendkey (timing is predictable)
+2. *Initramfs prompt* - Keyfile in initramfs (deterministic)
+
+The GRUB countdown provides a clear timing signal:
+#+begin_example
+The highlighted entry will be executed automatically in 0s.
+Booting 'Arch Linux'
+Enter passphrase for hd0,gpt2:
+#+end_example
+
+We know exactly when the GRUB prompt appears. After sendkey handles GRUB,
+the keyfile handles initramfs automatically.
+
+** Why Option C
+
+- Tests actual production code path (critical requirement)
+- GRUB timing is predictable (countdown visible in serial)
+- Keyfile handles the harder timing problem (initramfs)
+- Only the GRUB prompt needs sendkey interaction (once per encrypted disk)
+
+** Implementation
+
+*** GRUB Passphrase (sendkey)
+
+1. Change serial from file-based to real-time (socket or pty)
+2. Monitor for "Enter passphrase for" text after GRUB countdown
+3. Send passphrase via QEMU monitor =sendkey= commands
+4. Send Enter key to submit
+
+*** Initramfs Passphrase (keyfile)
+
+When =TESTING=yes= is set in config:
+
+1. Generate a random 2KB keyfile at =/etc/cryptroot.key=
+2. Add keyfile to LUKS slot 1 (passphrase remains in slot 0)
+3. Set keyfile permissions to 000
+4. Add keyfile to the mkinitcpio =FILES= array
+5. Configure crypttab to use keyfile instead of "none"
+6. Initramfs unlocks automatically (no prompt)
+
+** Security Mitigations
+
+- Test-only flag: Only activates when TESTING=yes
+- Separate key slot: Keyfile in slot 1, passphrase in slot 0
+- Random per-build: Fresh keyfile generated for each installation
+- Never shipped: Keyfile only in test VMs, not in the ISO
+- Restricted permissions: chmod 000 on keyfile
+
+** Files Modified
+
+- =custom/lib/btrfs.sh= - setup_luks_testing_keyfile(), configure_crypttab(), configure_luks_initramfs()
+- =custom/archangel= - Calls keyfile setup in LUKS flow
+- =scripts/test-install.sh= - sendkey for GRUB, real-time serial monitoring
+- =scripts/test-configs/btrfs-luks.conf= - TESTING=yes
+- =scripts/test-configs/btrfs-mirror-luks.conf= - TESTING=yes
+
+* Adding a New Test
+
+** Step 1: Create a Config File
+
+Add a =.conf= file in =scripts/test-configs/=. The first line should be a comment
+describing the test:
+
+#+begin_example
+# Test config: Description of what this tests
+#+end_example
+
+Required fields:
+- =HOSTNAME= - unique per test
+- =TIMEZONE=, =LOCALE=, =KEYMAP= - use =UTC=, =en_US.UTF-8=, =us= for defaults
+- =DISKS= - =/dev/vda= for single, =/dev/vda,/dev/vdb= for 2-disk, etc.
+- =ROOT_PASSWORD= - needed for SSH into installed system
+- =ENABLE_SSH= - =yes= for full verification, =no= for console-only boot check
+
+Filesystem choice:
+- ZFS (default): no =FILESYSTEM= needed, or set =FILESYSTEM=zfs=
+- Btrfs: =FILESYSTEM=btrfs=
+
+Encryption:
+- No encryption: =NO_ENCRYPT=yes=
+- ZFS encryption: =ZFS_PASSPHRASE=testpass= (no =NO_ENCRYPT=)
+- LUKS encryption: =LUKS_PASSPHRASE=testpassphrase=, =TESTING=yes= (no =NO_ENCRYPT=)
+
+Multi-disk:
+- Mirror: =RAID_LEVEL=mirror=
+- RAIDZ1: =RAID_LEVEL=raidz1= (ZFS only, needs 3+ disks)
+- Btrfs stripe: =RAID_LEVEL=stripe=
+
+** Step 2: Run the Test
+
+#+begin_example
+./scripts/test-install.sh my-new-test
+#+end_example
+
+The test runner automatically:
+- Counts disks from =DISKS== to create QCOW2 images
+- Detects encryption type from =LUKS_PASSPHRASE= / =ZFS_PASSPHRASE=
+- Adds a QEMU monitor socket when encryption is detected
+- Dispatches to =send_luks_passphrase()= or =send_zfs_passphrase()= at reboot
+- Runs =verify_install()=, =verify_reboot_survival()=, =verify_rollback()=
+
+** Step 3: If Encryption Needs a New Prompt Handler
+
+The encryption dispatch in =run_test()= uses =encrypt_flag=:
+
+#+begin_example
+if [[ "$encrypt_flag" == "luks" ]]; then
+    send_luks_passphrase ...
+elif [[ "$encrypt_flag" == "zfs" ]]; then
+    send_zfs_passphrase ...
+fi
+#+end_example
+
+To add a new encryption type:
+1. Add detection logic that reads the config and sets =encrypt_flag=
+2. Write a =send__passphrase()= function
+3. Add an =elif= branch in the dispatch
+
+** Key Functions in test-install.sh
+
+| Function | Purpose |
+|----------+---------|
+| =monitor_sendkeys()= | Sends a string as QEMU sendkey commands (char-by-char + Enter) |
+| =send_luks_passphrase()= | Detects GRUB prompt in serial, sends passphrase per disk |
+| =send_zfs_passphrase()= | Timed delay (no serial), sends passphrase twice (ZBM + initramfs) |
+| =start_vm_from_disk()= | Boots installed disk; adds monitor socket if encrypt mode is set |
+| =verify_install()= | Checks filesystem, snapshots, encryption properties via SSH |
+| =verify_reboot_survival()= | Checks pool/filesystem health after reboot |
+| =verify_rollback()= | Creates file, snapshots, deletes, rolls back, verifies restore |
+
+** Debugging a Failing Test
+
+The serial log is saved to =test-logs/-reboot-serial.log= on failure.
+
+For VGA-only boot stages (ZFSBootMenu), take a screenshot via QEMU monitor:
+
+#+begin_example
+echo "screendump /tmp/screen.ppm" | socat -t 2 - UNIX-CONNECT:vm/monitor-.sock
+convert /tmp/screen.ppm /tmp/screen.png
+#+end_example
+
+This requires the VM to still be running, so add the debugging step before =stop_vm= in the failure path.
+
+* Test Configurations
+
+** Btrfs Tests
+
+| Config            | Disks | LUKS | Status                  |
+|-------------------+-------+------+-------------------------|
+| btrfs-single      | 1     | No   | Pass                    |
+| btrfs-luks        | 1     | Yes  | Pass (with TESTING=yes) |
+| btrfs-mirror      | 2     | No   | Pass                    |
+| btrfs-stripe      | 2     | No   | Pass                    |
+| btrfs-mirror-luks | 2     | Yes  | Pass (with TESTING=yes) |
+
+** ZFS Tests
+
+| Config             | Disks | Encryption | Status |
+|--------------------+-------+------------+--------|
+| single-disk        | 1     | No         | Pass   |
+| mirror             | 2     | No         | Pass   |
+| raidz1             | 3     | No         | Pass   |
+| zfs-encrypt        | 1     | Yes        | Pass   |
+| zfs-mirror-encrypt | 2     | Yes        | Pass   |
+
+* ZFS Native Encryption Testing
+
+** The Challenge
+
+ZFS native encryption (=keylocation=prompt=) requires TWO passphrase prompts
+at boot, similar to LUKS but from different components:
+
+1. *ZFSBootMenu prompt* - ZFSBootMenu must unlock the pool to enumerate boot environments
+2. *Initramfs prompt* - mkinitcpio's =zfs= hook re-imports the pool after kexec
+
+** Key Difference from LUKS
+
+- *LUKS*: GRUB prompts once per encrypted disk, initramfs uses a keyfile (no prompt)
+- *ZFS*: One prompt per stage regardless of disk count (pool-level encryption), but
+  TWO boot stages each prompt (ZFSBootMenu + initramfs)
+
+** Key Difference: Serial Console
+
+GRUB outputs to the serial console, so its passphrase prompt is detectable in the serial
+log. ZFSBootMenu renders entirely to the VGA framebuffer; its passphrase prompt
+(and the initramfs prompt) never appear in serial output.
+
+The serial log only shows the UEFI firmware loading ZFSBootMenu:
+#+begin_example
+BdsDxe: starting Boot0009 "ZFSBootMenu" from HD(...)
+EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
+#+end_example
+
+After that, nothing appears in serial until the booted system's getty starts.
+
+** Implementation: Timed Sendkey
+
+Since prompt detection via serial is not possible, we use timed delays:
+
+1. Detect the UEFI firmware log line (=starting.*ZFSBootMenu=) in serial
+2. Wait 15s for ZFSBootMenu to initialize and display the passphrase prompt
+3. Send passphrase via QEMU monitor sendkey
+4. Wait 30s for ZFSBootMenu to boot the kernel and for mkinitcpio to reach the =zfs= hook
+5. Send passphrase again via sendkey
+6. Wait for SSH to become available
+
+** Why Not a Keyfile (Like LUKS)?
+
+For LUKS, we embed a keyfile in the initramfs to avoid the second prompt. For ZFS,
+this would require changing =keylocation= from =prompt= to =file:///path/to/key= and
+embedding the key in the initramfs, which would test a different code path than production.
+The timed sendkey approach tests the actual production passphrase flow.
+
+** Files
+
+- =scripts/test-install.sh= - =send_zfs_passphrase()= function
+- =scripts/test-configs/zfs-encrypt.conf= - Single disk, TESTING=yes
+- =scripts/test-configs/zfs-mirror-encrypt.conf= - Mirror, TESTING=yes
+
+* References
+
+- Arch Wiki: dm-crypt/System configuration
+- HashiCorp Discuss: LUKS Encryption Key on Initial Reboot
+- GitHub: tylert/packer-build Issue #31 (LUKS unattended builds)
-- 
cgit v1.2.3