Overview
This document describes the testing strategy for the archangel installer project, including automated VM testing and the rationale for key technical decisions.
Testing has two layers: fast bats unit tests for pure logic in
installer/lib/*.sh, and slower VM integration tests that exercise the
full install path against real block devices.
Running Tests
Makefile Targets
| Target | Description |
|---|---|
| make test-install | Run all 12 automated install tests (builds ISO first) |
| make test-vm | Boot ISO in a single-disk VM (interactive) |
| make test-multi | Boot ISO in a 2-disk VM for mirror/RAID testing |
| make test-multi3 | Boot ISO in a 3-disk VM for raidz1 testing |
| make test-boot | Boot from installed disk (after running install in VM) |
| make test-clean | Remove VM disks and OVMF vars, start fresh |
| make bats | Run bats unit tests only (tests/unit/) |
| make lint | Run shellcheck on all scripts |
| make test | Run lint + bats (fast; no VMs) |
Running a Single Automated Test
./scripts/test-install.sh zfs-encrypt
Running Multiple Specific Tests
./scripts/test-install.sh zfs-encrypt zfs-mirror-encrypt btrfs-luks
Listing Available Test Configs
./scripts/test-install.sh --list
Unit Tests (bats)
What bats covers
The bats test suite exercises pure logic in installer/lib/*.sh — helpers
that can run without root, without block devices, and without a chroot.
It runs in under a second and is the fast feedback loop for every commit.
Current coverage lives in tests/unit/:
| File | What it covers |
|---|---|
| test_common.bats | command_exists, require_command, info, warn, error, enable_color, log, prompt_password, pacstrap_packages |
| test_config.bats | parse_args, load_config, validate_config, check_config |
| test_raid.bats | raid_valid_levels_for_count, raid_is_valid, raid_usable_bytes, raid_fault_tolerance |
What bats does NOT cover (deliberately)
Anything that shells out to mkfs, cryptsetup, zpool create,
pacstrap, arch-chroot, grub-install, or needs root. Those behaviors
only mean anything against real partitions on real (virtual) hardware and
belong in the VM integration tests below.
Running
make bats # bats only
make test # lint + bats (pre-commit check)
bats tests/unit/ # direct invocation, same result
Installing bats
sudo pacman -S bats # Arch
brew install bats-core # macOS
apt install bats # Debian/Ubuntu
The make bats target prints an install hint if bats isn't on PATH.
The pattern for adding coverage
When a refactor extracts pure logic out of a monolithic installer
function into a lib/*.sh helper, add bats cases for the helper in the
same commit. Don't try to write bats tests against the monolith
directly — extract, then test. That's how raid.sh, pacstrap_packages,
and prompt_password got covered.
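As an illustration of the extract-then-test pattern, here is a minimal sketch of a pure helper and the kind of checks a bats case would make against it. The function name example_raid_is_valid and its disk-count rules are assumptions made for this example (the minimums follow the multi-disk notes in "Adding a New Test"); the real helpers in raid.sh may behave differently.

```shell
# Hypothetical pure helper: needs no root, no block devices, no chroot.
# Returns 0 if the RAID level is usable with the given disk count.
example_raid_is_valid() {
    local level="$1" count="$2"
    case "$level" in
        mirror|stripe) [ "$count" -ge 2 ] ;;   # assumed minimum: 2 disks
        raidz1)        [ "$count" -ge 3 ] ;;   # raidz1 needs 3+ disks
        *)             return 1 ;;             # unknown level
    esac
}

# Plain shell checks; in a bats file each would sit inside an @test block.
example_raid_is_valid mirror 2 && echo "mirror with 2 disks: valid"
example_raid_is_valid raidz1 2 || echo "raidz1 with 2 disks: rejected"
```

Because the helper only inspects its arguments, it can be exercised on any machine in milliseconds, which is what keeps the unit layer fast.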
Test Infrastructure
Test Scripts
- scripts/test-install.sh - Main test runner
- scripts/test-configs/ - Configuration files for different test scenarios
Test Flow
- Build ISO with ./build.sh
- Boot QEMU VM from ISO
- Run unattended installation via config file
- Verify installation (packages, services, filesystem)
- Reboot from installed disk (no ISO)
- Verify system survives reboot
- Test rollback functionality (ZFS and btrfs)
LUKS Encryption Testing
The Challenge
LUKS-encrypted systems require TWO passphrase prompts at boot:
- GRUB prompt - GRUB must decrypt /boot to read kernel/initramfs
- Initramfs prompt - encrypt hook must decrypt root to mount filesystem
This blocks automated testing because:
- SSH is unavailable until after both decryptions complete
- Both prompts require interactive passphrase entry
Options Evaluated
Option A: Put /boot on EFI partition for testing
Move /boot to the unencrypted EFI partition when TESTING=yes, so GRUB doesn't need to decrypt anything.
Rejected - Tests different code path than production. Bugs in GRUB cryptodisk setup would not be caught. "Testing something different than what ships defeats the purpose."
Option B: Accept limitation, enhance installation verification
Skip reboot tests for LUKS. Instead, verify configs before cleanup:
- Check crypttab, grub.cfg, mkinitcpio.conf are correct
- If configs are right, boot should work
Rejected - We already found bugs (empty grub.cfg from FAT32 sync) that only manifested at boot time. Config inspection wouldn't catch everything.
Option C: Hybrid approach (Chosen)
Use TWO mechanisms to handle the two prompts:
- GRUB prompt - QEMU monitor sendkey (timing is predictable)
- Initramfs prompt - Keyfile in initramfs (deterministic)
The GRUB countdown provides a clear timing signal in the serial output:
The highlighted entry will be executed automatically in 0s.
Booting 'Arch Linux'
Enter passphrase for hd0,gpt2:
We know exactly when the GRUB prompt appears. After sendkey handles GRUB, the keyfile handles initramfs automatically.
Why Option C
- Tests actual production code path (critical requirement)
- GRUB timing is predictable (countdown visible in serial)
- Keyfile handles the harder timing problem (initramfs)
- Only one sendkey interaction needed (GRUB prompt)
Implementation
GRUB Passphrase (sendkey)
- Change serial from file-based to real-time (socket or pty)
- Monitor for "Enter passphrase for" text after GRUB countdown
- Send passphrase via QEMU monitor sendkey commands
- Send Enter key to submit
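To make the sendkey step concrete, here is a minimal sketch of turning a passphrase into QEMU monitor commands, one key per line, followed by Enter. The function names char_to_qemu_key and passphrase_to_sendkeys are hypothetical and are not taken from test-install.sh. Only letters, digits, and spaces are mapped here; a passphrase with punctuation would need additional key names.

```shell
# Hypothetical helper: map one character to a QEMU sendkey key name.
char_to_qemu_key() {
    local ch="$1"
    case "$ch" in
        [abcdefghijklmnopqrstuvwxyz0123456789])
            printf '%s\n' "$ch" ;;
        [ABCDEFGHIJKLMNOPQRSTUVWXYZ])
            # Uppercase is sent as shift plus the lowercase key.
            printf 'shift-%s\n' "$(printf '%s' "$ch" | tr 'A-Z' 'a-z')" ;;
        ' ')
            printf 'spc\n' ;;
        *)
            return 1 ;;   # unmapped character
    esac
}

# Hypothetical helper: print one "sendkey" line per character, then Enter.
passphrase_to_sendkeys() {
    local rest="$1" ch key
    while [ -n "$rest" ]; do
        ch="${rest%"${rest#?}"}"   # first character of the remainder
        rest="${rest#?}"           # drop that character
        key="$(char_to_qemu_key "$ch")" || return 1
        printf 'sendkey %s\n' "$key"
    done
    printf 'sendkey ret\n'
}

passphrase_to_sendkeys 'testpassphrase'
```

The printed lines could then be written to the monitor socket with socat, in the same way as the screendump command in the debugging section.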
Initramfs Passphrase (keyfile)
When TESTING=yes is set in config:
- Generate random 2KB keyfile at /etc/cryptroot.key
- Add keyfile to LUKS slot 1 (passphrase remains in slot 0)
- Set keyfile permissions to 000
- Add keyfile to mkinitcpio FILES= array
- Configure crypttab to use keyfile instead of "none"
- Initramfs unlocks automatically (no prompt)
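The keyfile steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the installer: the helper names make_test_keyfile and crypttab_line, the mapping name cryptroot, and the placeholder device are all hypothetical. The steps that need root and a real LUKS device are shown only as comments.

```shell
# Hypothetical helper: write a random 2 KiB keyfile and remove all permissions.
make_test_keyfile() {
    local path="$1"
    # Two 1024-byte blocks from the kernel's random source.
    dd if=/dev/urandom of="$path" bs=1024 count=2 2>/dev/null || return 1
    chmod 000 "$path"
}

# Hypothetical helper: print a crypttab line that names the keyfile instead
# of "none" (fields: mapping name, device, key file, options).
crypttab_line() {
    local name="$1" uuid="$2" keyfile="$3"
    printf '%s UUID=%s %s luks\n' "$name" "$uuid" "$keyfile"
}

# On a real install (root required, not executed here) the remaining steps
# might look like this, with <luks-device> standing in for the partition:
#   cryptsetup luksAddKey --key-slot 1 <luks-device> /etc/cryptroot.key
#   FILES=(/etc/cryptroot.key)    # in /etc/mkinitcpio.conf
```

Keeping the passphrase in slot 0 means the GRUB prompt still exercises the production passphrase path, while the keyfile in slot 1 only removes the second, harder-to-time prompt.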
Security Mitigations
- Test-only flag: Only activates when TESTING=yes
- Separate key slot: Keyfile in slot 1, passphrase in slot 0
- Random per-build: Fresh keyfile generated each installation
- Never shipped: Keyfile only in test VMs, not in ISO
- Restricted permissions: chmod 000 on keyfile
Files Modified
- custom/lib/btrfs.sh - setup_luks_testing_keyfile(), configure_crypttab(), configure_luks_initramfs()
- custom/archangel - Calls keyfile setup in LUKS flow
- scripts/test-install.sh - sendkey for GRUB, real-time serial monitoring
- scripts/test-configs/btrfs-luks.conf - TESTING=yes
- scripts/test-configs/btrfs-mirror-luks.conf - TESTING=yes
Adding a New Test
Step 1: Create a Config File
Add a .conf file in scripts/test-configs/. First line should be a comment
describing the test:
# Test config: Description of what this tests
Required fields:
- HOSTNAME - unique per test
- TIMEZONE, LOCALE, KEYMAP - use UTC, en_US.UTF-8, us for defaults
- DISKS - /dev/vda for single, /dev/vda,/dev/vdb for 2-disk, etc.
- ROOT_PASSWORD - needed for SSH into installed system
- ENABLE_SSH - yes for full verification, no for console-only boot check
Filesystem choice:
- ZFS (default): no FILESYSTEM needed, or set FILESYSTEM=zfs
- Btrfs: FILESYSTEM=btrfs
Encryption:
- No encryption: NO_ENCRYPT=yes
- ZFS encryption: ZFS_PASSPHRASE=testpass (no NO_ENCRYPT)
- LUKS encryption: LUKS_PASSPHRASE=testpassphrase, TESTING=yes (no NO_ENCRYPT)
Multi-disk:
- Mirror: RAID_LEVEL=mirror
- RAIDZ1: RAID_LEVEL=raidz1 (ZFS only, needs 3+ disks)
- Btrfs stripe: RAID_LEVEL=stripe
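Putting the fields above together, a config for an encrypted two-disk Btrfs mirror might look like the following. The hostname and root password are placeholder values chosen for this example, and the exact set of keys the installer accepts should be checked against the existing files in scripts/test-configs/.

```shell
# Test config: Btrfs mirror on two disks with LUKS encryption
HOSTNAME=test-btrfs-mirror-luks
TIMEZONE=UTC
LOCALE=en_US.UTF-8
KEYMAP=us
DISKS=/dev/vda,/dev/vdb
ROOT_PASSWORD=testroot
ENABLE_SSH=yes
FILESYSTEM=btrfs
RAID_LEVEL=mirror
LUKS_PASSPHRASE=testpassphrase
TESTING=yes
```

NO_ENCRYPT is left out because LUKS is in use, and TESTING=yes enables the initramfs keyfile described in the LUKS section.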
Step 2: Run the Test
./scripts/test-install.sh my-new-test
The test runner automatically:
- Counts disks from DISKS= to create QCOW2 images
- Detects encryption type from LUKS_PASSPHRASE / ZFS_PASSPHRASE
- Adds QEMU monitor socket when encryption is detected
- Dispatches to send_luks_passphrase() or send_zfs_passphrase() at reboot
- Runs verify_install(), verify_reboot_survival(), verify_rollback()
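The first two items in the list above could be implemented along these lines. This is a sketch only: count_disks and detect_encrypt_flag are hypothetical names, sourcing the config file is an assumption about how the runner reads it, and giving NO_ENCRYPT=yes precedence over a passphrase is a choice made for this example.

```shell
# Hypothetical helper: count comma-separated entries in a DISKS= value.
count_disks() {
    local disks="$1" commas
    [ -n "$disks" ] || { echo 0; return 0; }
    commas="$(printf '%s' "$disks" | tr -cd ',')"   # keep only the commas
    echo $(( ${#commas} + 1 ))
}

# Hypothetical helper: print luks, zfs, or none for a given config file.
detect_encrypt_flag() {
    local conf="$1"
    (
        # Subshell: variables from the config do not leak into the caller.
        unset NO_ENCRYPT LUKS_PASSPHRASE ZFS_PASSPHRASE
        . "$conf" || exit 1
        if [ "${NO_ENCRYPT:-}" = "yes" ]; then
            echo none
        elif [ -n "${LUKS_PASSPHRASE:-}" ]; then
            echo luks
        elif [ -n "${ZFS_PASSPHRASE:-}" ]; then
            echo zfs
        else
            echo none
        fi
    )
}

count_disks "/dev/vda,/dev/vdb"
```

With helpers like these, a new config needs no runner changes as long as it uses an encryption type the runner already knows.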
Step 3: If Encryption Needs a New Prompt Handler
The encryption dispatch in run_test() uses encrypt_flag:
if [[ "$encrypt_flag" == "luks" ]]; then
send_luks_passphrase ...
elif [[ "$encrypt_flag" == "zfs" ]]; then
send_zfs_passphrase ...
fi
To add a new encryption type:
- Add detection logic that reads the config and sets encrypt_flag
- Write a send_<type>_passphrase() function
- Add an elif branch in the dispatch
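As a sketch of what the extended dispatch might look like, the example below adds a made-up encryption type called "example". The handlers here are stand-ins that only print their argument; the real send_luks_passphrase() and send_zfs_passphrase() drive the QEMU monitor, and the wrapper name dispatch_passphrase is hypothetical.

```shell
# Stand-in handlers for illustration only.
send_luks_passphrase() { echo "luks: $1"; }
send_zfs_passphrase()  { echo "zfs: $1"; }

# New handler for the hypothetical "example" encryption type.
send_example_passphrase() { echo "example: $1"; }

# Dispatch with the extra elif branch added for the new type.
dispatch_passphrase() {
    local encrypt_flag="$1" passphrase="$2"
    if [ "$encrypt_flag" = "luks" ]; then
        send_luks_passphrase "$passphrase"
    elif [ "$encrypt_flag" = "zfs" ]; then
        send_zfs_passphrase "$passphrase"
    elif [ "$encrypt_flag" = "example" ]; then
        send_example_passphrase "$passphrase"
    fi
}

dispatch_passphrase example testpass   # prints "example: testpass"
```

An unrecognized or empty encrypt_flag falls through without sending anything, which matches an unencrypted boot.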
Key Functions in test-install.sh
| Function | Purpose |
|---|---|
| monitor_sendkeys() | Sends a string as QEMU sendkey commands (char-by-char + Enter) |
| send_luks_passphrase() | Detects GRUB prompt in serial, sends passphrase per disk |
| send_zfs_passphrase() | Timed delay (no serial), sends passphrase twice (ZBM + initramfs) |
| start_vm_from_disk() | Boots installed disk; adds monitor socket if encrypt mode is set |
| verify_install() | Checks filesystem, snapshots, encryption properties via SSH |
| verify_reboot_survival() | Checks pool/filesystem health after reboot |
| verify_rollback() | Creates file, snapshots, deletes, rolls back, verifies restore |
Debugging a Failing Test
Serial log is saved to test-logs/<name>-reboot-serial.log on failure.
For VGA-only boot stages (ZFSBootMenu), take a screenshot via QEMU monitor:
echo "screendump /tmp/screen.ppm" | socat -t 2 - UNIX-CONNECT:vm/monitor-<name>.sock
convert /tmp/screen.ppm /tmp/screen.png
This requires keeping the VM alive (add debugging before stop_vm in the failure path).
Test Configurations
Btrfs Tests
| Config | Disks | LUKS | Status |
|---|---|---|---|
| btrfs-single | 1 | No | Pass |
| btrfs-luks | 1 | Yes | Pass (with TESTING=yes) |
| btrfs-mirror | 2 | No | Pass |
| btrfs-stripe | 2 | No | Pass |
| btrfs-mirror-luks | 2 | Yes | Pass (with TESTING=yes) |
ZFS Tests
| Config | Disks | Encryption | Status |
|---|---|---|---|
| single-disk | 1 | No | Pass |
| mirror | 2 | No | Pass |
| raidz1 | 3 | No | Pass |
| zfs-encrypt | 1 | Yes | Pass |
| zfs-mirror-encrypt | 2 | Yes | Pass |
ZFS Native Encryption Testing
The Challenge
ZFS native encryption (keylocation=prompt) requires TWO passphrase prompts
at boot, similar to LUKS but from different components:
- ZFSBootMenu prompt - ZFSBootMenu must unlock the pool to enumerate boot environments
- Initramfs prompt - mkinitcpio's zfs hook re-imports the pool after kexec
Key Difference from LUKS
- LUKS: GRUB prompts once per encrypted disk, initramfs uses a keyfile (no prompt)
- ZFS: One prompt regardless of disk count (pool-level encryption), but TWO prompts from different boot stages (ZFSBootMenu + initramfs)
Key Difference: Serial Console
GRUB outputs to serial console, so its passphrase prompt is detectable in the serial log. ZFSBootMenu renders entirely to the VGA framebuffer — its passphrase prompt (and the initramfs prompt) never appear in serial output.
The serial log only shows the UEFI firmware loading ZFSBootMenu:
BdsDxe: starting Boot0009 "ZFSBootMenu" from HD(...)
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
After that, nothing until the booted system's getty starts.
Implementation: Timed Sendkey
Since prompt detection via serial is not possible, we use timed delays:
- Detect UEFI firmware log line (starting.*ZFSBootMenu) in serial
- Wait 15s for ZFSBootMenu to initialize and display passphrase prompt
- Send passphrase via QEMU monitor sendkey
- Wait 30s for ZFSBootMenu to boot kernel and mkinitcpio to reach zfs hook
- Send passphrase again via sendkey
- Wait for SSH to become available
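The timed sequence above can be sketched as a pair of small functions. The names wait_for_pattern and timed_zfs_unlock are hypothetical, the 60-second firmware timeout is an assumption, and the sender is passed in as a command name so the sketch does not depend on how the QEMU monitor is actually reached. The 15s and 30s defaults come from the steps above.

```shell
# Poll a serial log until an extended regex matches or the timeout expires.
wait_for_pattern() {
    local file="$1" pattern="$2" timeout="$3" waited=0
    while :; do
        if [ -f "$file" ] && grep -Eq -e "$pattern" "$file"; then
            return 0
        fi
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

# Wait for the firmware line, then send the passphrase twice with delays.
# send_cmd is any command or function taking the passphrase as its argument.
timed_zfs_unlock() {
    local serial_log="$1" passphrase="$2" send_cmd="$3"
    local zbm_delay="${4:-15}" initramfs_delay="${5:-30}"
    wait_for_pattern "$serial_log" 'starting.*ZFSBootMenu' 60 || return 1
    sleep "$zbm_delay"
    "$send_cmd" "$passphrase"     # ZFSBootMenu prompt
    sleep "$initramfs_delay"
    "$send_cmd" "$passphrase"     # initramfs zfs hook prompt
}
```

Because neither prompt is visible in serial output, the delays are the only synchronization available; if a test host is slow, they are the first values to revisit.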
Why Not a Keyfile (Like LUKS)?
For LUKS, we embed a keyfile in the initramfs to avoid the second prompt. For ZFS,
this would require changing keylocation from prompt to file:///path/to/key and
embedding the key in the initramfs — which tests a different code path than production.
The timed sendkey approach tests the actual production passphrase flow.
Files
- scripts/test-install.sh - send_zfs_passphrase() function
- scripts/test-configs/zfs-encrypt.conf - Single disk, TESTING=yes
- scripts/test-configs/zfs-mirror-encrypt.conf - Mirror, TESTING=yes
References
- Arch Wiki: dm-crypt/System configuration
- HashiCorp Discuss: LUKS Encryption Key on Initial Reboot
- GitHub: tylert/packer-build Issue #31 (LUKS unattended builds)
