diff --git a/testing-strategy.org b/testing-strategy.org
new file mode 100644
index 0000000..3119917
--- /dev/null
+++ b/testing-strategy.org
@@ -0,0 +1,287 @@
+#+TITLE: Testing Strategy
+#+AUTHOR: Craig Jennings
+#+DATE: 2026-01-25
+
+* Overview
+
+This document describes the testing strategy for the archzfs installer project,
+including automated VM testing and the rationale for key technical decisions.
+
+* Test Infrastructure
+
+** Test Scripts
+
+- =scripts/test-install.sh= - Main test runner
+- =scripts/test-configs/= - Configuration files for different test scenarios
+
+** Test Flow
+
+1. Build ISO with =./build.sh=
+2. Boot QEMU VM from ISO
+3. Run unattended installation via config file
+4. Verify installation (packages, services, filesystem)
+5. Reboot from installed disk (no ISO)
+6. Verify system survives reboot
+7. Test rollback functionality (ZFS and Btrfs)
+
+* LUKS Encryption Testing
+
+** The Challenge
+
+LUKS-encrypted systems require TWO passphrase prompts at boot:
+
+1. *GRUB prompt* - GRUB must decrypt /boot to read kernel/initramfs
+2. *Initramfs prompt* - encrypt hook must decrypt root to mount filesystem
+
+This blocks automated testing because:
+- SSH is unavailable until after both decryptions complete
+- Both prompts require interactive passphrase entry
+
+** Options Evaluated
+
+*** Option A: Put /boot on EFI partition for testing
+
+Move /boot to the unencrypted EFI partition when TESTING=yes, so GRUB
+doesn't need to decrypt anything.
+
+*Rejected* - This tests a different code path than production, so bugs in the
+GRUB cryptodisk setup would not be caught. "Testing something different than
+what ships defeats the purpose."
+
+*** Option B: Accept limitation, enhance installation verification
+
+Skip reboot tests for LUKS. Instead, verify configs before cleanup:
+- Check crypttab, grub.cfg, mkinitcpio.conf are correct
+- If configs are right, boot should work
+
+*Rejected* - We already found bugs (empty grub.cfg from FAT32 sync) that
+only manifested at boot time. Config inspection wouldn't catch everything.
+
+*** Option C: Hybrid approach (Chosen)
+
+Use TWO mechanisms to handle the two prompts:
+
+1. *GRUB prompt* - QEMU monitor sendkey (timing is predictable)
+2. *Initramfs prompt* - Keyfile in initramfs (deterministic)
+
+The GRUB countdown provides a clear timing signal:
+#+begin_example
+The highlighted entry will be executed automatically in 0s.
+Booting 'Arch Linux'
+Enter passphrase for hd0,gpt2:
+#+end_example
+
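+The passphrase prompt follows the countdown in the serial output, so the test
+runner can wait for that text instead of guessing at timing. After sendkey
+handles GRUB, the keyfile unlocks the root device in the initramfs automatically.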
+
+** Why Option C
+
+- Tests actual production code path (critical requirement)
+- GRUB timing is predictable (countdown visible in serial)
+- Keyfile handles the harder timing problem (initramfs)
+- Only one sendkey interaction needed (GRUB prompt)
+
+** Implementation
+
+*** GRUB Passphrase (sendkey)
+
+1. Change serial from file-based to real-time (socket or pty)
+2. Monitor for "Enter passphrase for" text after GRUB countdown
+3. Send passphrase via QEMU monitor: =sendkey= commands
+4. Send Enter key to submit
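+
+Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not
+the code in =test-install.sh=: the function names are hypothetical, and only
+ASCII letters, digits, and space are mapped. The key names (=shift-x=, =spc=,
+=ret=) are the ones the QEMU monitor accepts for =sendkey=.
+
```shell
# Hypothetical helper: map one character to a QEMU key name.
# Only ASCII letters, digits, and space are handled in this sketch.
char_to_qemu_key() {
    case "$1" in
        [abcdefghijklmnopqrstuvwxyz0123456789])
            printf '%s\n' "$1" ;;
        [ABCDEFGHIJKLMNOPQRSTUVWXYZ])
            # Uppercase letters are sent as shift plus the lowercase key.
            printf 'shift-%s\n' "$(printf '%s' "$1" | tr 'A-Z' 'a-z')" ;;
        ' ')
            printf 'spc\n' ;;
        *)
            return 1 ;;
    esac
}

# Print one "sendkey" monitor command per character, then Enter ("ret").
passphrase_to_sendkeys() {
    local rest ch key
    rest="$1"
    while [ -n "$rest" ]; do
        ch="${rest%"${rest#?}"}"   # first character of the remaining text
        rest="${rest#?}"
        key=$(char_to_qemu_key "$ch") || {
            echo "unsupported character: $ch" >&2
            return 1
        }
        printf 'sendkey %s\n' "$key"
    done
    printf 'sendkey ret\n'
}
```
+
+The output could then be piped to the monitor socket, for example with
+=passphrase_to_sendkeys "$LUKS_PASSPHRASE" | socat - UNIX-CONNECT:vm/monitor-<name>.sock=.
+A real runner may need a short pause between keys.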
+
+*** Initramfs Passphrase (keyfile)
+
+When =TESTING=yes= is set in config:
+
+1. Generate random 2KB keyfile at =/etc/cryptroot.key=
+2. Add keyfile to LUKS slot 1 (passphrase remains in slot 0)
+3. Set keyfile permissions to 000
+4. Add keyfile to mkinitcpio FILES= array
+5. Configure crypttab to use keyfile instead of "none"
+6. Initramfs unlocks automatically (no prompt)
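+
+A rough sketch of steps 1 to 4, assuming GNU coreutils and a stock
+=mkinitcpio.conf= that still contains an empty =FILES=()= line. The function
+names are hypothetical and are not the ones in =custom/lib/btrfs.sh=; the
+crypttab change (step 5) is left out.
+
```shell
# Steps 1 and 3: create a 2 KiB random keyfile, then drop all permission bits.
make_test_keyfile() {
    local keyfile="$1"
    dd if=/dev/urandom of="$keyfile" bs=1024 count=2 2>/dev/null
    chmod 000 "$keyfile"
}

# Step 2: enrol the keyfile in LUKS slot 1. The existing passphrase (slot 0)
# is read from stdin via "--key-file -". Needs root and a real LUKS device.
add_keyfile_to_luks() {
    local device="$1" keyfile="$2" passphrase="$3"
    printf '%s' "$passphrase" |
        cryptsetup luksAddKey --key-slot 1 --key-file - "$device" "$keyfile"
}

# Step 4: list the keyfile in the mkinitcpio FILES array.
# Assumption: the config has an empty "FILES=()" line to replace.
add_keyfile_to_initramfs() {
    local conf="$1" keyfile="$2"
    sed -i "s|^FILES=()|FILES=($keyfile)|" "$conf"
}
```
+
+In the installer these would run against the target system, for example
+=make_test_keyfile /mnt/etc/cryptroot.key= (the =/mnt= prefix is an assumption).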
+
+** Security Mitigations
+
+- Test-only flag: Only activates when TESTING=yes
+- Separate key slot: Keyfile in slot 1, passphrase in slot 0
+- Random per-build: Fresh keyfile generated each installation
+- Never shipped: Keyfile only in test VMs, not in ISO
+- Restricted permissions: chmod 000 on keyfile
+
+** Files Modified
+
+- =custom/lib/btrfs.sh= - setup_luks_testing_keyfile(), configure_crypttab(), configure_luks_initramfs()
+- =custom/archangel= - Calls keyfile setup in LUKS flow
+- =scripts/test-install.sh= - sendkey for GRUB, real-time serial monitoring
+- =scripts/test-configs/btrfs-luks.conf= - TESTING=yes
+- =scripts/test-configs/btrfs-mirror-luks.conf= - TESTING=yes
+
+* Adding a New Test
+
+** Step 1: Create a Config File
+
+Add a =.conf= file in =scripts/test-configs/=. The first line should be a comment
+describing the test:
+
+#+begin_example
+# Test config: Description of what this tests
+#+end_example
+
+Required fields:
+- =HOSTNAME= - unique per test
+- =TIMEZONE=, =LOCALE=, =KEYMAP= - use =UTC=, =en_US.UTF-8=, =us= for defaults
+- =DISKS= - =/dev/vda= for single, =/dev/vda,/dev/vdb= for 2-disk, etc.
+- =ROOT_PASSWORD= - needed for SSH into installed system
+- =ENABLE_SSH= - =yes= for full verification, =no= for console-only boot check
+
+Filesystem choice:
+- ZFS (default): no =FILESYSTEM= needed, or set =FILESYSTEM=zfs=
+- Btrfs: =FILESYSTEM=btrfs=
+
+Encryption:
+- No encryption: =NO_ENCRYPT=yes=
+- ZFS encryption: =ZFS_PASSPHRASE=testpass= (no =NO_ENCRYPT=)
+- LUKS encryption: =LUKS_PASSPHRASE=testpassphrase=, =TESTING=yes= (no =NO_ENCRYPT=)
+
+Multi-disk:
+- Mirror: =RAID_LEVEL=mirror=
+- RAIDZ1: =RAID_LEVEL=raidz1= (ZFS only, needs 3+ disks)
+- Btrfs stripe: =RAID_LEVEL=stripe=
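+
+Putting the fields above together, a config for a two-disk Btrfs mirror with
+LUKS might look like the following. It is assembled only from the fields listed
+in this section; the =HOSTNAME= and =ROOT_PASSWORD= values are placeholders, and
+it assumes the file uses plain =KEY=value= shell syntax.
+
```shell
# Test config: Btrfs mirror with LUKS encryption on two disks
HOSTNAME=test-btrfs-mirror-luks   # placeholder, must be unique per test
TIMEZONE=UTC
LOCALE=en_US.UTF-8
KEYMAP=us
DISKS=/dev/vda,/dev/vdb
ROOT_PASSWORD=testroot            # placeholder, used for SSH verification
ENABLE_SSH=yes
FILESYSTEM=btrfs
RAID_LEVEL=mirror
LUKS_PASSPHRASE=testpassphrase
TESTING=yes
```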
+
+** Step 2: Run the Test
+
+#+begin_example
+./scripts/test-install.sh my-new-test
+#+end_example
+
+The test runner automatically:
+- Counts disks from =DISKS== to create QCOW2 images
+- Detects encryption type from =LUKS_PASSPHRASE= / =ZFS_PASSPHRASE=
+- Adds QEMU monitor socket when encryption is detected
+- Dispatches to =send_luks_passphrase()= or =send_zfs_passphrase()= at reboot
+- Runs =verify_install()=, =verify_reboot_survival()=, =verify_rollback()=
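+
+The first two items can be pictured with the sketch below. It is illustrative
+only: the function names, the =none= value, the image path, and the 20G size
+are assumptions, not taken from =test-install.sh=.
+
```shell
# Count the comma-separated entries in a DISKS value.
count_disks() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -c .
}

# Derive the encryption mode from whichever passphrase variable is set.
# "none" is an assumed value for the unencrypted case.
detect_encrypt_flag() {
    if [ -n "${LUKS_PASSPHRASE:-}" ]; then
        echo luks
    elif [ -n "${ZFS_PASSPHRASE:-}" ]; then
        echo zfs
    else
        echo none
    fi
}

# Create one QCOW2 image per disk (hypothetical path layout and size).
create_disk_images() {
    local name="$1" count="$2" i=1
    while [ "$i" -le "$count" ]; do
        qemu-img create -f qcow2 "vm/${name}-disk${i}.qcow2" 20G
        i=$((i + 1))
    done
}
```
+
+With =DISKS=/dev/vda,/dev/vdb= and =LUKS_PASSPHRASE= set, =count_disks= prints 2
+and =detect_encrypt_flag= prints =luks=.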
+
+** Step 3: If Encryption Needs a New Prompt Handler
+
+The encryption dispatch in =run_test()= uses =encrypt_flag=:
+
+#+begin_example
+if [[ "$encrypt_flag" == "luks" ]]; then
+ send_luks_passphrase ...
+elif [[ "$encrypt_flag" == "zfs" ]]; then
+ send_zfs_passphrase ...
+fi
+#+end_example
+
+To add a new encryption type:
+1. Add detection logic that reads the config and sets =encrypt_flag=
+2. Write a =send_<type>_passphrase()= function
+3. Add an =elif= branch in the dispatch
+
+** Key Functions in test-install.sh
+
+| Function | Purpose |
+|----------+---------|
+| =monitor_sendkeys()= | Sends a string as QEMU sendkey commands (char-by-char + Enter) |
+| =send_luks_passphrase()= | Detects GRUB prompt in serial, sends passphrase per disk |
+| =send_zfs_passphrase()= | Timed delay (no serial), sends passphrase twice (ZBM + initramfs) |
+| =start_vm_from_disk()= | Boots installed disk; adds monitor socket if encrypt mode is set |
+| =verify_install()= | Checks filesystem, snapshots, encryption properties via SSH |
+| =verify_reboot_survival()= | Checks pool/filesystem health after reboot |
+| =verify_rollback()= | Creates file, snapshots, deletes, rolls back, verifies restore |
+
+** Debugging a Failing Test
+
+The serial log is saved to =test-logs/<name>-reboot-serial.log= on failure.
+
+For VGA-only boot stages (ZFSBootMenu), take a screenshot via QEMU monitor:
+
+#+begin_example
+echo "screendump /tmp/screen.ppm" | socat -t 2 - UNIX-CONNECT:vm/monitor-<name>.sock
+convert /tmp/screen.ppm /tmp/screen.png
+#+end_example
+
+This requires keeping the VM alive (add debugging before =stop_vm= in the failure path).
+
+* Test Configurations
+
+** Btrfs Tests
+
+| Config | Disks | LUKS | Status |
+|-------------------+-------+------+-------------------------|
+| btrfs-single | 1 | No | Pass |
+| btrfs-luks | 1 | Yes | Pass (with TESTING=yes) |
+| btrfs-mirror | 2 | No | Pass |
+| btrfs-stripe | 2 | No | Pass |
+| btrfs-mirror-luks | 2 | Yes | Pass (with TESTING=yes) |
+
+** ZFS Tests
+
+| Config | Disks | Encryption | Status |
+|--------------------+-------+------------+--------|
+| single-disk | 1 | No | Pass |
+| mirror | 2 | No | Pass |
+| raidz1 | 3 | No | Pass |
+| zfs-encrypt | 1 | Yes | Pass |
+| zfs-mirror-encrypt | 2 | Yes | Pass |
+
+* ZFS Native Encryption Testing
+
+** The Challenge
+
+ZFS native encryption (=keylocation=prompt=) requires TWO passphrase prompts
+at boot, similar to LUKS but from different components:
+
+1. *ZFSBootMenu prompt* - ZFSBootMenu must unlock the pool to enumerate boot environments
+2. *Initramfs prompt* - mkinitcpio's =zfs= hook re-imports the pool after kexec
+
+** Key Difference from LUKS
+
+- *LUKS*: GRUB prompts once per encrypted disk, initramfs uses a keyfile (no prompt)
+- *ZFS*: Each prompt covers the whole pool regardless of disk count (pool-level
+  encryption), but the passphrase is requested twice, by two different boot
+  stages (ZFSBootMenu, then initramfs)
+
+** Key Difference: Serial Console
+
+GRUB outputs to the serial console, so its passphrase prompt is detectable in the
+serial log. ZFSBootMenu renders entirely to the VGA framebuffer, so neither its
+passphrase prompt nor the initramfs prompt ever appears in serial output.
+
+The serial log only shows the UEFI firmware loading ZFSBootMenu:
+#+begin_example
+BdsDxe: starting Boot0009 "ZFSBootMenu" from HD(...)
+EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
+#+end_example
+
+After that, the serial log stays silent until the booted system's getty starts.
+
+** Implementation: Timed Sendkey
+
+Since prompt detection via serial is not possible, we use timed delays:
+
+1. Detect UEFI firmware log line (=starting.*ZFSBootMenu=) in serial
+2. Wait 15s for ZFSBootMenu to initialize and display passphrase prompt
+3. Send passphrase via QEMU monitor sendkey
+4. Wait 30s for ZFSBootMenu to boot kernel and mkinitcpio to reach zfs hook
+5. Send passphrase again via sendkey
+6. Wait for SSH to become available
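+
+The sequence above might be sketched like this. It is not the real
+=send_zfs_passphrase()=: =wait_for_pattern=, =zfs_unlock_sequence=, the 120s
+detection timeout, and the =ZBM_DELAY= / =INITRAMFS_DELAY= variables are
+hypothetical, and =send_passphrase= stands in for whatever sends the keys
+(=monitor_sendkeys()= in the runner).
+
```shell
# Poll a log file until an extended regex appears or the timeout (s) expires.
wait_for_pattern() {
    local logfile="$1" pattern="$2" timeout="$3" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if grep -Eq "$pattern" "$logfile" 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Timed unlock: detect the firmware line, then send the passphrase twice.
# send_passphrase must be defined by the caller (placeholder in this sketch).
zfs_unlock_sequence() {
    local serial_log="$1"
    wait_for_pattern "$serial_log" 'starting.*ZFSBootMenu' 120 || return 1
    sleep "${ZBM_DELAY:-15}"         # ZFSBootMenu initializes, shows its prompt
    send_passphrase                  # first prompt: ZFSBootMenu
    sleep "${INITRAMFS_DELAY:-30}"   # kexec, then mkinitcpio reaches the zfs hook
    send_passphrase                  # second prompt: initramfs
}
```
+
+Because the delays are fixed, a slow host can miss a prompt; making them
+overridable, as the two variables do here, is one way to tune that.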
+
+** Why Not a Keyfile (Like LUKS)?
+
+For LUKS, we embed a keyfile in the initramfs to avoid the second prompt. For ZFS,
+this would require changing =keylocation= from =prompt= to =file:///path/to/key= and
+embedding the key in the initramfs, which would test a different code path than
+production. The timed sendkey approach tests the actual production passphrase flow.
+
+** Files
+
+- =scripts/test-install.sh= - =send_zfs_passphrase()= function
+- =scripts/test-configs/zfs-encrypt.conf= - Single disk, TESTING=yes
+- =scripts/test-configs/zfs-mirror-encrypt.conf= - Mirror, TESTING=yes