
Overview

This document describes the testing strategy for the archangel installer project, including automated VM testing and the rationale for key technical decisions.

Testing has two layers: fast bats unit tests for pure logic in installer/lib/*.sh, and slower VM integration tests that exercise the full install path against real block devices.

Running Tests

Makefile Targets

Target              Description
make test-install   Run all 12 automated install tests (builds ISO first)
make test-vm        Boot ISO in a single-disk VM (interactive)
make test-multi     Boot ISO in a 2-disk VM for mirror/RAID testing
make test-multi3    Boot ISO in a 3-disk VM for raidz1 testing
make test-boot      Boot from installed disk (after running install in VM)
make test-clean     Remove VM disks and OVMF vars, start fresh
make bats           Run bats unit tests only (tests/unit/)
make lint           Run shellcheck on all scripts
make test           Run lint + bats (fast; no VMs)

Running a Single Automated Test

./scripts/test-install.sh zfs-encrypt

Running Multiple Specific Tests

./scripts/test-install.sh zfs-encrypt zfs-mirror-encrypt btrfs-luks

Listing Available Test Configs

./scripts/test-install.sh --list
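Because every config is expected to start with a "# Test config:" comment (see Adding a New Test), a listing can show each config name next to its description. A minimal sketch of that idea, assuming configs live in scripts/test-configs/*.conf; list_test_configs is a hypothetical name, not the actual implementation of --list:

```shell
# Hypothetical sketch: print "<name>  <description>" for each test config.
# Assumes each .conf file's first line is "# Test config: <description>".
list_test_configs() {
    local dir=${1:-scripts/test-configs}
    local conf name first
    for conf in "$dir"/*.conf; do
        [[ -e "$conf" ]] || continue          # no matches: glob stays literal
        name=$(basename "$conf" .conf)
        IFS= read -r first < "$conf" || true  # first line only
        printf '%-20s %s\n' "$name" "${first#\# Test config: }"
    done
}
```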

Unit Tests (bats)

What bats covers

The bats test suite exercises pure logic in installer/lib/*.sh — helpers that can run without root, without block devices, and without a chroot. It runs in under a second and is the fast feedback loop for every commit.

Current coverage lives in tests/unit/:

File               What it covers
test_common.bats   command_exists, require_command, info/warn/error, enable_color, log, prompt_password, pacstrap_packages
test_config.bats   parse_args, load_config, validate_config, check_config
test_raid.bats     raid_valid_levels_for_count, raid_is_valid, raid_usable_bytes, raid_fault_tolerance

What bats does NOT cover (deliberately)

Anything that shells out to mkfs, cryptsetup, zpool create, pacstrap, arch-chroot, grub-install, or needs root. Those behaviors only mean anything against real partitions on real (virtual) hardware and belong in the VM integration tests below.

Running

make bats             # bats only
make test             # lint + bats (pre-commit check)
bats tests/unit/      # direct invocation, same result

Installing bats

sudo pacman -S bats   # Arch
brew install bats-core # macOS
sudo apt install bats # Debian/Ubuntu

The make bats target prints an install hint if bats isn't on PATH.

The pattern for adding coverage

When a refactor extracts pure logic out of a monolithic installer function into a lib/*.sh helper, add bats cases for the helper in the same commit. Don't try to write bats tests against the monolith directly — extract, then test. That's how raid.sh, pacstrap_packages, and prompt_password got covered.
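As an illustration of extract-then-test, here is a hypothetical pure helper of the kind that could be pulled out of an installer function (raid_min_disks is not part of raid.sh). It touches no devices, so a bats case can call it directly. The minimums only reflect what this document states (mirror tests use 2 disks, raidz1 needs 3+), and the bats case is shown as a comment because @test syntax is not plain bash:

```shell
# Hypothetical helper: minimum disk count for a RAID level.
# Pure logic: no root, no block devices, no chroot.
raid_min_disks() {
    case "$1" in
        mirror) echo 2 ;;
        raidz1) echo 3 ;;
        *)
            echo "unknown RAID level: $1" >&2
            return 1
            ;;
    esac
}

# Matching bats case (e.g. in tests/unit/test_raid.bats):
#   @test "raid_min_disks: raidz1 needs three disks" {
#       run raid_min_disks raidz1
#       [ "$status" -eq 0 ]
#       [ "$output" = "3" ]
#   }
```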

Test Infrastructure

Test Scripts

  • scripts/test-install.sh - Main test runner
  • scripts/test-configs/ - Configuration files for different test scenarios

Test Flow

  1. Build ISO with ./build.sh
  2. Boot QEMU VM from ISO
  3. Run unattended installation via config file
  4. Verify installation (packages, services, filesystem)
  5. Reboot from installed disk (no ISO)
  6. Verify system survives reboot
  7. Test rollback functionality (ZFS and btrfs)

LUKS Encryption Testing

The Challenge

LUKS-encrypted systems require TWO passphrase prompts at boot:

  1. GRUB prompt - GRUB must decrypt /boot to read kernel/initramfs
  2. Initramfs prompt - encrypt hook must decrypt root to mount filesystem

This blocks automated testing because:

  • SSH is unavailable until after both decryptions complete
  • Both prompts require interactive passphrase entry

Options Evaluated

Option A: Put /boot on EFI partition for testing

Move /boot to the unencrypted EFI partition when TESTING=yes, so GRUB doesn't need to decrypt anything.

Rejected - Tests different code path than production. Bugs in GRUB cryptodisk setup would not be caught. "Testing something different than what ships defeats the purpose."

Option B: Accept limitation, enhance installation verification

Skip reboot tests for LUKS. Instead, verify configs before cleanup:

  • Check crypttab, grub.cfg, mkinitcpio.conf are correct
  • If configs are right, boot should work

Rejected - We already found bugs (empty grub.cfg from FAT32 sync) that only manifested at boot time. Config inspection wouldn't catch everything.

Option C: Hybrid approach (Chosen)

Use TWO mechanisms to handle the two prompts:

  1. GRUB prompt - QEMU monitor sendkey (timing is predictable)
  2. Initramfs prompt - Keyfile in initramfs (deterministic)

The GRUB countdown in the serial output provides a clear timing signal:

The highlighted entry will be executed automatically in 0s.
Booting 'Arch Linux'
Enter passphrase for hd0,gpt2:

We know exactly when the GRUB prompt appears. After sendkey handles GRUB, the keyfile handles initramfs automatically.

Why Option C

  • Tests actual production code path (critical requirement)
  • GRUB timing is predictable (countdown visible in serial)
  • Keyfile handles the harder timing problem (initramfs)
  • Only one sendkey interaction needed (GRUB prompt)

Implementation

GRUB Passphrase (sendkey)

  1. Change serial from file-based to real-time (socket or pty)
  2. Monitor for "Enter passphrase for" text after GRUB countdown
  3. Send passphrase via QEMU monitor: sendkey commands
  4. Send Enter key to submit
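Step 2 amounts to polling the real-time serial output for the GRUB prompt with a deadline. A minimal sketch, assuming the serial stream is captured to a log file as it arrives; wait_for_serial is a hypothetical name, not a function from test-install.sh:

```shell
# Hypothetical sketch: block until PATTERN (an extended regex) appears in the
# serial log, or give up after TIMEOUT seconds. Returns 0 on match, 1 on timeout.
wait_for_serial() {
    local log=$1 pattern=$2 timeout=${3:-120}
    local deadline=$(( SECONDS + timeout ))
    while (( SECONDS < deadline )); do
        if [[ -f "$log" ]] && grep -qE "$pattern" "$log"; then
            return 0
        fi
        sleep 1
    done
    return 1
}

# Usage: wait for the GRUB cryptodisk prompt before sending the passphrase.
#   wait_for_serial "$serial_log" 'Enter passphrase for' 180 || exit 1
```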

Initramfs Passphrase (keyfile)

When TESTING=yes is set in config:

  1. Generate random 2KB keyfile at /etc/cryptroot.key
  2. Add keyfile to LUKS slot 1 (passphrase remains in slot 0)
  3. Set keyfile permissions to 000
  4. Add keyfile to mkinitcpio FILES= array
  5. Configure crypttab to use keyfile instead of "none"
  6. Initramfs unlocks automatically (no prompt)
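The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in custom/lib/btrfs.sh: the function name setup_testing_keyfile_sketch and its arguments are hypothetical, the target root (e.g. /mnt during install) is passed in explicitly, and the existing slot-0 passphrase is piped to cryptsetup with --key-file=- so no prompt appears:

```shell
# Hypothetical sketch of the TESTING=yes keyfile setup (not the project's code).
# Args: target root (e.g. /mnt), LUKS device, existing passphrase.
setup_testing_keyfile_sketch() {
    local root=$1 dev=$2 pass=$3
    local key=/etc/cryptroot.key

    [[ "${TESTING:-no}" == "yes" ]] || return 0   # test-only: no-op otherwise

    # 1. Random 2 KiB keyfile, created with restrictive permissions.
    ( umask 077 && head -c 2048 /dev/urandom > "$root$key" ) || return 1

    # 2. Add it to key slot 1; the existing passphrase (slot 0) is read from
    #    stdin. printf '%s' keeps a trailing newline out of the passphrase.
    printf '%s' "$pass" |
        cryptsetup luksAddKey --key-file=- --key-slot 1 "$dev" "$root$key" || return 1

    # 3. chmod 000 (root can still read it when building the initramfs).
    chmod 000 "$root$key"

    # 4. Embed the keyfile in the initramfs via mkinitcpio's FILES=() array.
    sed -i "s|^FILES=(|FILES=($key |" "$root/etc/mkinitcpio.conf"

    # 5. Point crypttab's key field (3rd column) at the keyfile instead of "none".
    sed -i -E "s|^([^#[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+)none|\1$key|" \
        "$root/etc/crypttab"
}
```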

Security Mitigations

  • Test-only flag: Only activates when TESTING=yes
  • Separate key slot: Keyfile in slot 1, passphrase in slot 0
  • Random per-build: Fresh keyfile generated each installation
  • Never shipped: Keyfile only in test VMs, not in ISO
  • Restricted permissions: chmod 000 on keyfile

Files Modified

  • custom/lib/btrfs.sh - setup_luks_testing_keyfile(), configure_crypttab(), configure_luks_initramfs()
  • custom/archangel - Calls keyfile setup in LUKS flow
  • scripts/test-install.sh - sendkey for GRUB, real-time serial monitoring
  • scripts/test-configs/btrfs-luks.conf - TESTING=yes
  • scripts/test-configs/btrfs-mirror-luks.conf - TESTING=yes

Adding a New Test

Step 1: Create a Config File

Add a .conf file in scripts/test-configs/. First line should be a comment describing the test:

# Test config: Description of what this tests

Required fields:

  • HOSTNAME - unique per test
  • TIMEZONE, LOCALE, KEYMAP - use UTC, en_US.UTF-8 and us respectively unless the test needs something else
  • DISKS - /dev/vda for single, /dev/vda,/dev/vdb for 2-disk, etc.
  • ROOT_PASSWORD - needed for SSH into installed system
  • ENABLE_SSH - yes for full verification, no for console-only boot check

Filesystem choice:

  • ZFS (default): no FILESYSTEM needed, or set FILESYSTEM=zfs
  • Btrfs: FILESYSTEM=btrfs

Encryption:

  • No encryption: NO_ENCRYPT=yes
  • ZFS encryption: ZFS_PASSPHRASE=testpass (no NO_ENCRYPT)
  • LUKS encryption: LUKS_PASSPHRASE=testpassphrase, TESTING=yes (no NO_ENCRYPT)

Multi-disk:

  • Mirror: RAID_LEVEL=mirror
  • RAIDZ1: RAID_LEVEL=raidz1 (ZFS only, needs 3+ disks)
  • Btrfs stripe: RAID_LEVEL=stripe
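Putting those fields together, a config for a hypothetical new scenario (a two-disk Btrfs stripe with LUKS, which is not in the current test matrix) might look like this. It assumes the runner reads plain KEY=value lines; the file name and values are illustrative only:

```shell
# Test config: Btrfs stripe across 2 disks with LUKS (hypothetical example)
HOSTNAME=test-btrfs-stripe-luks
TIMEZONE=UTC
LOCALE=en_US.UTF-8
KEYMAP=us
DISKS=/dev/vda,/dev/vdb
ROOT_PASSWORD=testpass
ENABLE_SSH=yes
FILESYSTEM=btrfs
RAID_LEVEL=stripe
LUKS_PASSPHRASE=testpassphrase
TESTING=yes
```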

Step 2: Run the Test

./scripts/test-install.sh my-new-test

The test runner automatically:

  • Counts the entries in DISKS= and creates one QCOW2 image per disk
  • Detects encryption type from LUKS_PASSPHRASE / ZFS_PASSPHRASE
  • Adds QEMU monitor socket when encryption is detected
  • Dispatches to send_luks_passphrase() or send_zfs_passphrase() at reboot
  • Runs verify_install(), verify_reboot_survival(), verify_rollback()
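The first of those steps, deriving VM disk images from DISKS=, could look something like this. A sketch, assuming DISKS= is a single comma-separated line in the .conf; the function names, the image directory and the 20G size are hypothetical (qemu-img create is the standard QEMU tool):

```shell
# Hypothetical sketch: read DISKS= from a config (last assignment wins),
# stripping any surrounding quotes.
config_disks() {
    sed -n 's/^DISKS=//p' "$1" | tail -n 1 | tr -d "\"'"
}

# Count comma-separated entries: /dev/vda,/dev/vdb -> 2
disk_count() {
    local disks commas
    disks=$(config_disks "$1")
    [[ -n "$disks" ]] || { echo 0; return; }
    commas=${disks//[!,]/}          # keep only the commas
    echo $(( ${#commas} + 1 ))
}

# Create one QCOW2 image per configured disk (size is an assumption).
create_disk_images() {
    local conf=$1 dir=${2:-vm} size=${3:-20G} n i
    n=$(disk_count "$conf")
    mkdir -p "$dir"
    for (( i = 0; i < n; i++ )); do
        qemu-img create -f qcow2 "$dir/disk$i.qcow2" "$size" >/dev/null
    done
}
```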

Step 3: If Encryption Needs a New Prompt Handler

The encryption dispatch in run_test() uses encrypt_flag:

if [[ "$encrypt_flag" == "luks" ]]; then
    send_luks_passphrase ...
elif [[ "$encrypt_flag" == "zfs" ]]; then
    send_zfs_passphrase ...
fi

To add a new encryption type:

  1. Add detection logic that reads the config and sets encrypt_flag
  2. Write a send_<type>_passphrase() function
  3. Add an elif branch in the dispatch
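Step 1 could be a small function that maps config keys to a flag value. A sketch, assuming configs can be sourced as shell; detect_encrypt_flag is a hypothetical name, and the precedence shown (NO_ENCRYPT first, then LUKS, then ZFS) is an assumption rather than what run_test() necessarily does:

```shell
# Hypothetical sketch: derive encrypt_flag from a test config.
# Prints "luks", "zfs", or "none".
detect_encrypt_flag() {
    local conf=$1
    # Source in a subshell so config variables don't leak into the caller.
    (
        # shellcheck disable=SC1090
        source "$conf"
        if [[ "${NO_ENCRYPT:-no}" == "yes" ]]; then
            echo none
        elif [[ -n "${LUKS_PASSPHRASE:-}" ]]; then
            echo luks
        elif [[ -n "${ZFS_PASSPHRASE:-}" ]]; then
            echo zfs
        # A new encryption type would add its own elif here, keyed on its
        # own config variable.
        else
            echo none
        fi
    )
}
```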

Key Functions in test-install.sh

Function                  Purpose
monitor_sendkeys()        Sends a string as QEMU sendkey commands (char-by-char + Enter)
send_luks_passphrase()    Detects GRUB prompt in serial, sends passphrase per disk
send_zfs_passphrase()     Timed delay (no serial), sends passphrase twice (ZBM + initramfs)
start_vm_from_disk()      Boots installed disk; adds monitor socket if encrypt mode is set
verify_install()          Checks filesystem, snapshots, encryption properties via SSH
verify_reboot_survival()  Checks pool/filesystem health after reboot
verify_rollback()         Creates file, snapshots, deletes, rolls back, verifies restore
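monitor_sendkeys() has to translate each character into a QEMU key name, because the monitor's sendkey command takes key names (a, shift-a, ret, spc, ...) rather than raw text. A minimal sketch of that translation, assuming a US keyboard layout and covering only letters, digits, space and a few punctuation keys; sendkey_commands is a hypothetical name, and unsupported characters are rejected rather than guessed:

```shell
# Hypothetical sketch: turn a string into QEMU monitor "sendkey" commands,
# one per character, followed by Enter ("ret"). US keyboard layout assumed.
sendkey_commands() {
    local s=$1 i c key
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case "$c" in
            [[:lower:][:digit:]]) key=$c ;;
            [[:upper:]])          key="shift-${c,,}" ;;
            ' ')                  key=spc ;;
            '-')                  key=minus ;;
            '_')                  key=shift-minus ;;
            '.')                  key=dot ;;
            ',')                  key=comma ;;
            '/')                  key=slash ;;
            '=')                  key=equal ;;
            *)
                echo "sendkey_commands: unsupported character '$c'" >&2
                return 1
                ;;
        esac
        echo "sendkey $key"
    done
    echo "sendkey ret"
}

# Usage, with the same monitor socket used for screendump when debugging
# (slow guests may need a short delay between keys):
#   sendkey_commands "$passphrase" | socat -t 2 - UNIX-CONNECT:vm/monitor-<name>.sock
```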

Debugging a Failing Test

The serial log is saved to test-logs/<name>-reboot-serial.log on failure.

For VGA-only boot stages (ZFSBootMenu), take a screenshot via QEMU monitor:

echo "screendump /tmp/screen.ppm" | socat -t 2 - UNIX-CONNECT:vm/monitor-<name>.sock
convert /tmp/screen.ppm /tmp/screen.png

This only works while the VM is still running, so add the debugging step before stop_vm in the failure path.

Test Configurations

Btrfs Tests

Config              Disks  LUKS  Status
btrfs-single        1      No    Pass
btrfs-luks          1      Yes   Pass (with TESTING=yes)
btrfs-mirror        2      No    Pass
btrfs-stripe        2      No    Pass
btrfs-mirror-luks   2      Yes   Pass (with TESTING=yes)

ZFS Tests

Config              Disks  Encryption  Status
single-disk         1      No          Pass
mirror              2      No          Pass
raidz1              3      No          Pass
zfs-encrypt         1      Yes         Pass
zfs-mirror-encrypt  2      Yes         Pass

ZFS Native Encryption Testing

The Challenge

ZFS native encryption (keylocation=prompt) requires TWO passphrase prompts at boot, similar to LUKS but from different components:

  1. ZFSBootMenu prompt - ZFSBootMenu must unlock the pool to enumerate boot environments
  2. Initramfs prompt - mkinitcpio's zfs hook re-imports the pool after kexec

Key Difference from LUKS

  • LUKS: GRUB prompts once per encrypted disk, initramfs uses a keyfile (no prompt)
  • ZFS: One prompt regardless of disk count (pool-level encryption), but TWO prompts from different boot stages (ZFSBootMenu + initramfs)

Key Difference: Serial Console

GRUB outputs to serial console, so its passphrase prompt is detectable in the serial log. ZFSBootMenu renders entirely to the VGA framebuffer — its passphrase prompt (and the initramfs prompt) never appear in serial output.

The serial log only shows the UEFI firmware loading ZFSBootMenu:

BdsDxe: starting Boot0009 "ZFSBootMenu" from HD(...)
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path

After that, nothing appears on serial until the booted system's getty starts.

Implementation: Timed Sendkey

Since prompt detection via serial is not possible, we use timed delays:

  1. Detect UEFI firmware log line (starting.*ZFSBootMenu) in serial
  2. Wait 15s for ZFSBootMenu to initialize and display passphrase prompt
  3. Send passphrase via QEMU monitor sendkey
  4. Wait 30s for ZFSBootMenu to boot kernel and mkinitcpio to reach zfs hook
  5. Send passphrase again via sendkey
  6. Wait for SSH to become available

Why Not a Keyfile (Like LUKS)?

For LUKS, we embed a keyfile in the initramfs to avoid the second prompt. For ZFS, this would require changing keylocation from prompt to file:///path/to/key and embedding the key in the initramfs — which tests a different code path than production. The timed sendkey approach tests the actual production passphrase flow.

Files

  • scripts/test-install.sh - send_zfs_passphrase() function
  • scripts/test-configs/zfs-encrypt.conf - Single disk, TESTING=yes
  • scripts/test-configs/zfs-mirror-encrypt.conf - Mirror, TESTING=yes
