archangel - Arch Linux installer ISO — ZFS-on-root or BTRFS, doubles as rescue disk

	Commit message (Collapse)	Author	Age	Files	Lines
*	test: cover disk_in_use and network_available failure paths	Craig Jennings	2026-05-23	1	-0/+29
\| \| \| \| \| \|	These two boundary functions backed the pre-flight guards from #215 but had no unit coverage of their own. The VM harness exercised them instead. I added 7 bats tests that mock the system commands they query, so the real branching logic runs. test_disk.bats covers disk_in_use across mountpoint, active swap, imported-zpool member, and idle — that's the gate that refuses to wipe an already-mounted disk. test_archangel.bats covers network_available for DNS failure, TCP-connect failure, and success, the check that fails the install before pacstrap. The /proc/mdstat-positive branch and the live probes stay in the VM harness, since neither drives cleanly without writing to /proc or hitting the network. Suite 238 to 245, lint clean.
*	feat(install): add pre-flight environment and disk-target validation	Craig Jennings	2026-05-22	1	-0/+133
\| \| \| \| \| \| \| \| \| \|	archangel went straight from filesystem selection into a destructive install behind only a root check and a ZFS module load. A missing tool, a BIOS boot, a too-small or in-use disk, or a dead network surfaced as a confusing abort partway through, sometimes after partitioning had already run. Two gates now fail fast. validate_environment runs after filesystem selection, before any disk is touched: it confirms UEFI boot mode and that every required command is present, with the list coming from a new required_commands helper built like pacstrap_packages. validate_install_targets runs after disk selection, before the first wipe: it refuses a target that's mounted, holds active swap, or belongs to an imported pool or md array, rejects disks under 20 GB, and confirms a mirror is reachable via DNS plus a TCP probe (no ICMP, since some networks drop it). I folded the install_failure_cleanup hardening into the same change. It now falls back to lazy unmounts, so a pacstrap-interrupted target with busy bind mounts still releases the pool and unmounts the EFI partition. Without that, the disk-in-use guard would block the very retry the cleanup exists to enable. "Re-run to retry" only holds if the disk is genuinely freed first. The 20 GB floor is decimal on purpose. It reads as the natural minimum and clears a 20 GiB disk image with headroom instead of sitting on the boundary.
*	refactor: wire validate_config into the unattended install path	Craig Jennings	2026-05-19	1	-54/+9
\| \| \| \| \| \| \| \| \| \|	validate_config in lib/config.sh was unreachable from main(). Its empty-field checks duplicated four lines in gather_input's unattended branch. validate_config also has two checks gather_input doesn't: that every entry in SELECTED_DISKS is a real block device, and that TIMEZONE exists under /usr/share/zoneinfo. Neither check ever ran. A config with a typo'd disk path slipped past gather_input and surfaced as an obscure sgdisk error inside the destructive partitioning step. I wired validate_config into main() after validate_filesystem, gated on UNATTENDED so it only runs against an already-loaded config. I dropped the four duplicate empty-field checks from gather_input's unattended branch. The filesystem-specific passphrase checks stay there because they're coupled to the FILESYSTEM branch logic. validate_config reports every missing field at once instead of dying on the first. A config with five missing fields tells you all five in one pass. I removed the four corresponding gather_input bats tests. validate_config's existing unit tests in test_config.bats already cover their assertions. Bats: 178 → 174.
*	refactor: consolidate installer defaults and FILESYSTEM validation into ↵	Craig Jennings	2026-04-27	1	-39/+6
\| \| \| \| \| \| \| \| \| \| \| \|	config.sh The installer had three sites touching FILESYSTEM: a top-level default in the monolith, a re-default block in gather_input, and a runtime validation block also in gather_input. The same scattering existed for LOCALE, KEYMAP, ENABLE_SSH, and NO_ENCRYPT. A future contributor changing one site wouldn't have known the other two existed. Move all five defaults into the lib/config.sh declarations so config.sh is the single source of truth. Add validate_filesystem() in lib/config.sh and call it from main() between check_config and gather_input, so a typo in a config file's FILESYSTEM= fails fast before any install action runs. The behavior change is stricter. An empty FILESYSTEM in a config file used to be silently defaulted to zfs, now it errors. Interactive mode is unaffected. select_filesystem still controls the value and already errored on cancellation. Bats: 140 → 142. Five tests added in test_config.bats for the defaults pinning and validate_filesystem coverage. Three removed from test_archangel.bats for behavior that moved out of gather_input. Lint clean.
*	fix: clean up partial installs via ERR/INT/TERM trap	Craig Jennings	2026-04-26	1	-0/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Failed installs left the system in an inconsistent state: /mnt mounted, the zpool imported, possibly LUKS containers open. The user had to manually unmount, export the pool, and clean up partitions before re-running the installer. The existing trap at the top of archangel was `trap 'error "Installation interrupted!"' INT TERM`. It just printed a message and exited. There was no ERR trap, so `set -e`-aborted commands ran no cleanup either. I added `install_failure_cleanup` next to the existing `cleanup`. It captures `$?` first, disarms the trap to prevent recursion, clears sensitive variables (ROOT_PASSWORD, ZFS_PASSPHRASE, LUKS_PASSPHRASE), and dispatches on FILESYSTEM: - ZFS path: unmount /mnt/efi, recursive umount /mnt, export the pool (with `-f` fallback) if it's still imported. - Btrfs path: unmount /mnt/efi, call the existing btrfs_cleanup and btrfs_close_encryption helpers. All cleanup steps swallow their own errors with `\|\| true` since partial state is expected when this fires mid-install. `install_zfs` and `install_btrfs` now both arm the trap as their first action and disarm it just before the success-path cleanup. The disarm matters because the success cleanup calls `zpool export` (or btrfs_close_encryption) directly, and those can produce non-zero exit codes that we don't want to interpret as "installation failed". The note in notes.org described this as "install_zfs has no mid-step recovery." The framing was off. Both paths were exposed: install_btrfs's `btrfs_cleanup` only runs on the success path, same as `cleanup` for ZFS. Both paths now have the same recovery shape. Added 4 bats tests for `install_failure_cleanup` that mock the system tools (umount, zpool, btrfs_cleanup, btrfs_close_encryption) via function override and track invocations through a CALLS array. Array assignment isn't affected by the production code's `>/dev/null 2>&1` redirects on `zpool list`, so we capture the call regardless of where the mock's stdout would have gone. Verified end-to-end on the dev box: sourced archangel, set FILESYSTEM=zfs, armed the trap, ran `false` to trigger `set -e`. The trap fired with exit code 1, dispatched to the ZFS cleanup path, called `umount /mnt/efi` and `umount -R /mnt`, checked `zpool list` (returned non-zero since no pool exists on the dev box), skipped the export, and exited via `error`. No behavior change on the success path. The existing `cleanup` and `btrfs_cleanup` stay unchanged.
*	test: expand bats coverage across installer modules	Craig Jennings	2026-04-26	1	-0/+214
	Added unit tests for `disk.sh`, `btrfs.sh`, the archangel monolith's `gather_input` unattended branch, and filled gap cases in `config.sh`. The suite grew from 71 to 110 tests. `installer/lib/disk.sh` was completely uncovered. New `tests/unit/test_disk.bats` covers the four pure partition-path helpers (`get_efi_partition`, `get_root_partition`, `get_efi_partitions`, `get_root_partitions`) across SATA, virtio, and NVMe inputs, mixed arrays, and the empty-input behavior. Side-effecting functions in the same file (sgdisk, mkfs.fat, partprobe, and fzf wrappers) stay deliberately VM-tested. `installer/lib/btrfs.sh` had no bats coverage. New `tests/unit/test_btrfs.bats` covers `get_luks_devices`, the only pure helper in the file. It pins the asymmetric naming convention where the first device gets the bare `LUKS_MAPPER_NAME` and subsequent devices append the index. The archangel monolith was un-source-able for tests because its top-level code created a /tmp log file and redirected stdout via `exec > >(tee...)`, plus called `main "$@"` unconditionally at the bottom. I extracted the logging setup into an `init_logging` function called from `main`, and wrapped the main call in a `[[ "${BASH_SOURCE[0]}" == "${0}" ]]` guard. Sourcing the script now loads function definitions silently, with no log file and no banner. Running it directly works exactly as before. Verified both paths. That refactor unlocks `tests/unit/test_archangel.bats`, which covers `gather_input` in unattended mode. Required-field validation for HOSTNAME, TIMEZONE, ROOT_PASSWORD, and DISKS. Optional-field defaulting (FILESYSTEM to zfs, LOCALE to en_US.UTF-8, KEYMAP to us, ENABLE_SSH to yes). Filesystem-specific encryption checks (ZFS_PASSPHRASE required when not NO_ENCRYPT, same for LUKS_PASSPHRASE on Btrfs). Filesystem validity. RAID_LEVEL defaulting for multi-disk installs. The interactive branch stays out of scope per the testing-strategy policy. `tests/unit/test_config.bats` got five gap tests: `check_config` when CONFIG_FILE is set, `validate_config` against a non-block-device entry (e.g. /dev/null) and a missing path, and `parse_args` accepting `--color` and `--config-file` together in either order. `testing-strategy.org` got an expanded "What bats does NOT cover" section. The doc previously named six tools (mkfs, cryptsetup, zpool create, pacstrap, arch-chroot, grub-install). The new list adds sgdisk, partprobe, blkid, mkfs.fat, mkfs.btrfs, snapper, efibootmgr, mount, umount, findmnt, mountpoint, and fzf. It also names the conditions (root needed, real /dev or /sys state) that make a function VM-only. The coverage table at the top now lists the three new test files. No behavior change in production code. The init_logging extraction preserves the existing log path and banner format byte-for-byte.