archangel - Arch Linux installer ISO — ZFS-on-root or BTRFS, doubles as rescue disk

	Commit message	Author	Age	Files	Lines
*	fix(installer): scope AUR list to filesystem, keep pacman.conf 0644	Craig Jennings	2026-06-27	1	-0/+70
\| \| \| \| \| \| \| \|	The baked AUR set installed unconditionally, so zfs-auto-snapshot reached every target. On a btrfs install there's no zfs to satisfy its dependency, and pacstrap aborted the whole transaction. The ISO still bakes the full set. install_base now filters the manifest names through filter_aur_for_fs, dropping zfs-only tooling (zfs-auto-snapshot, zrepl) on a non-zfs target. strip_repo_stanza mv'd a 0600 mktemp file onto the target, so a clean install shipped /etc/pacman.conf root-only and every user-level makepkg/yay failed to read it. It now truncate-writes through the existing file, preserving the pristine 0644. Tested in test_common.bats.
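The permission fix above can be sketched as follows. This is a minimal illustration, not the repo's strip_repo_stanza: the helper name write_through is hypothetical. It shows why writing through the existing file keeps its mode while mv'ing a mktemp file over it does not.

```shell
# Hypothetical helper, not the installer's code: replace a file's contents
# without replacing the file itself, so its existing mode (e.g. 0644) survives.
write_through() {
    local target=$1 tmp
    tmp=$(mktemp) || return 1     # mktemp creates this file with mode 0600
    cat > "$tmp"                  # buffer the new contents from stdin
    cat "$tmp" > "$target"        # truncate-write: target keeps its own mode
    rm -f "$tmp"
}
```

A caller could filter a file into itself with `sed '/pattern/d' "$conf" | write_through "$conf"`, since the new contents are fully buffered before the target is truncated. A plain `mv "$tmp" "$target"` would instead carry the temp file's 0600 mode onto the target.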
*	refactor(installer): extract parse_btrfs_subvol_opts helper	Craig Jennings	2026-06-23	1	-0/+35
\| \| \| \|	mount_btrfs_subvolumes and generate_btrfs_fstab each carried an identical block that composed a subvolume's mount options from BTRFS_OPTS plus the per-subvol extra flags. The two could drift out of sync. Extracted the logic into parse_btrfs_subvol_opts (pure string transform), preserving the exact behavior, and called it from both. Added bats cases covering the default, compress=no, nodatacow, nosuid, and combined paths.
*	fix(installer): RAID validation, set -e fix, drop dead shadow branch	Craig Jennings	2026-06-23	1	-0/+58
\| \| \| \| \| \| \| \|	Two installer cleanups from the todo backlog. validate_config now rejects a RAID_LEVEL the selected disk count can't support, guarding the unattended path (the interactive path already constrains the choice). While adding it I found a latent bug: the error loop's ((errors++)) evaluated to 0 on the first error, so the command returned a non-zero status and tripped set -e in the monolith's `[[ UNATTENDED == true ]] && validate_config` call, aborting after one warning instead of listing every problem. Switched to pre-increment so the count accumulates as designed. Added four bats cases, including one that runs validate_config under set -e outside bats' run shield. build.sh dropped the dead shadow-file rebuild else-branch. The profile is always copied fresh from releng (which ships /etc/shadow), so the branch never ran, and its hardcoded account list had drifted from what releng provides. Replaced with an assertion that fails the build loudly if the file is ever missing.
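The increment bug described above comes from how bash maps an arithmetic command to an exit status: `(( expr ))` returns 1 when the expression evaluates to 0. A small sketch makes the difference visible without relying on set -e itself. The function name first_increment_status is hypothetical.

```shell
# Hypothetical demo, not the installer's code: print the exit status of the
# first increment from zero, in post- and pre-increment style.
first_increment_status() {
    local style=$1 errors=0 status=0
    case "$style" in
        post) ((errors++)) || status=$? ;;   # expression value is 0, so status is 1
        pre)  ((++errors)) || status=$? ;;   # expression value is 1, so status is 0
    esac
    printf '%s\n' "$status"
}
```

`first_increment_status post` prints 1 and `first_increment_status pre` prints 0. Under set -e, the non-zero status from the post-increment form is what ends the script on the first counted error.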
*	fix(build): drop sanoid from the baked AUR set	Craig Jennings	2026-06-17	1	-4/+7
\| \| \| \| \| \|	sanoid depends on perl-config-inifiles, which is AUR-only. makepkg -s can't resolve it from the official repos, so the build aborts before it produces an ISO. The 2026-06-09 dependency gate checked AUR-RPC existence rather than the official sync dbs, so it wrongly classified perl-config-inifiles as official. A full build caught it. sanoid joins paru and mkinitcpio-firmware as AUR-of-AUR packages deferred to the vNext dependency-resolution work. The v1 baked set is now eight packages. Updated the tests and README to match.
*	feat(install): install baked AUR packages and clean the target config	Craig Jennings	2026-06-09	1	-0/+152
\| \| \| \| \| \| \| \| \| \|	Wire the baked AUR repo into the installer. Before pacstrap, install_base checks whether the ISO shipped the repo and, if so, exposes [aur] in the live /etc/pacman.conf and reads the package names from the manifest, adding them to the pacstrap set so they install into the target offline. This mirrors the existing [archzfs] handling. pacstrap resolves repos from the live system, not $MNTPOINT. The live config already carries [aur] from the shipped ISO config, so the append is idempotent by design. A --skip-aur ISO ships no repo, and aur_repo_available gates the whole path, so the installer still works there. configure_system strips any [aur] stanza from the target /etc/pacman.conf. pacstrap installs a stock target config with no [aur], so this is defensive, but it guarantees the installed system never references /usr/share/aur-packages, which exists only on the live ISO. Four new common.sh helpers carry the logic: aur_repo_available, append_aur_repo (idempotent), aur_manifest_names (the manifest is the source of what to install, so the list never drifts), and strip_repo_stanza. All four covered across Normal, Boundary, and Error.
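An idempotent stanza append in the spirit of append_aur_repo could look like the sketch below. The function name, its argument order, and the SigLevel line are assumptions for illustration. The commit message does not show the real stanza contents.

```shell
# Hypothetical sketch, not the installer's append_aur_repo: add a repo stanza
# to a pacman.conf-style file only if that stanza is not already present.
append_repo_stanza() {
    local conf=$1 name=$2 server=$3
    # Already present: do nothing, so repeated calls leave one stanza.
    grep -q "^\[${name}\]" "$conf" && return 0
    # SigLevel here is an assumed value for a local, unsigned repo.
    printf '\n[%s]\nSigLevel = Optional TrustAll\nServer = %s\n' \
        "$name" "$server" >> "$conf"
}
```

Calling it twice with the same name leaves a single `[aur]` stanza, which is the property the live-config path relies on.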
*	feat(build): inject the AUR repo into the profile and live ISO	Craig Jennings	2026-06-09	1	-0/+20
\| \| \| \| \| \| \| \| \| \|	Wire build-aur.sh into build.sh. After the pacoloco block, build the AUR repo and append a build-host [aur] stanza to profile/pacman.conf with an absolute file:// Server, so mkarchiso installs the baked packages into airootfs. The stanza lands after the pacoloco rewrite so its file:// path isn't redirected to localhost. Add the audited official extra packages and the baked AUR names to packages.x86_64, both sourced from build-aur.sh so the list never drifts from the build array. Ship the repo into airootfs and write a complete live /etc/pacman.conf: the pristine releng config with [aur] appended, not an [aur]-only file, since this replaces the live system's stock config and an AUR-only one would strip the official repos. Copy the manifest beside the ISO in out/. --skip-aur skips the build, the stanza, the AUR names, and the live config. The three injection points also guard on the repo dir existing, so the documented empty-set path can't point mkarchiso at a missing repo. Moved BUILD_LOG creation ahead of the AUR build so its output is captured too. A unit test reproduces the live-config construction and asserts core, extra, the mirrorlist, and [aur] all survive. The end-to-end proof that mkarchiso installs from the build-host repo needs a real root build and is tracked as manual verification.
*	feat(build): add AUR local-repo build helpers	Craig Jennings	2026-06-09	1	-0/+206
\| \| \| \| \| \| \| \|	Add build-aur.sh, sourced by build.sh, that builds the v1 genuine-AUR set into a local pacman repo and emits an auditable manifest. The pure helpers carry the testable surface: the package sets (one source of truth for the build array and the package-list append), the [aur] stanza renderer, the TSV manifest header/row, the package-file locator, the staged repo replacement, and the build-environment preflight. makepkg refuses to run as root, so the orchestrator drops to $SUDO_USER for the clone and build. It stages on the same filesystem and swaps in with mv -T on full success, so a failure ships no repo and leaves no stale one. On any failure error() names the package, the phase, and the log path. The orchestrator and manifest-append need root, network, and makepkg, so they stay out of bats and are covered by the build integration test and the manual checklist instead. Eighteen unit tests cover the pure helpers across Normal, Boundary, and Error.
*	test: cover disk_in_use and network_available failure paths	Craig Jennings	2026-05-23	2	-0/+74
\| \| \| \| \| \|	These two boundary functions backed the pre-flight guards from #215 but had no unit coverage of their own. The VM harness exercised them instead. I added 7 bats tests that mock the system commands they query, so the real branching logic runs. test_disk.bats covers disk_in_use across mountpoint, active swap, imported-zpool member, and idle — that's the gate that refuses to wipe an already-mounted disk. test_archangel.bats covers network_available for DNS failure, TCP-connect failure, and success, the check that fails the install before pacstrap. The /proc/mdstat-positive branch and the live probes stay in the VM harness, since neither drives cleanly without writing to /proc or hitting the network. Suite 238 to 245, lint clean.
*	fix(build): clear stale archzfs from the pacoloco cache too	Craig Jennings	2026-05-22	1	-0/+34
\| \| \| \| \| \|	archzfs re-uploads its GitHub release assets under the same filename, so pacoloco keeps serving a zfs-dkms/zfs-utils it cached earlier while pacman fetches a fresh archzfs.db with a new checksum. The two mismatch and pacstrap aborts with "invalid or corrupted package." build.sh already drops the stale packages from the host pacman cache, but it never cleared the pacoloco layer, which the VM test installs route through too, so test-install.sh kept hitting the corruption (four times in one session). build.sh runs as root, so it now clears /var/cache/pacoloco/pkgs/archzfs/zfs-* alongside the host cache, which makes the build-then-test flow self-healing. The pacoloco cache is root-owned and test-install.sh runs as the user, so it can't clear it unattended. Instead, test-install.sh now recognizes the corruption (is_archzfs_cache_corruption) and prints how to clear it, the way it already names the SSH_PORT override on a port collision. A retry alone won't help since it hits the same cached file, so this fails fast with the hint rather than retrying.
*	fix(test): fail clearly when the VM forward port is taken	Craig Jennings	2026-05-22	1	-0/+31
\| \| \| \| \| \|	A test run launched qemu without first checking the SSH forward port, so a collision with another VM already holding it surfaced only as an opaque "Failed to start VM," with qemu unable to bind and no hint why. I added a port_in_use check in run_test before the launch: it errors with the port number and the SSH_PORT override to set, records the failure, and moves on. The check lives in run_test, not start_vm, because start_vm runs in a command substitution (vm_pid=$(start_vm ...)) where this harness's non-exiting error() would be captured as the PID instead of failing the run. The pure half, port_listening_in, takes an `ss -tln` snapshot as a string so it's unit-testable.
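A pure port check over an `ss -tln` snapshot might be sketched like this. The argument order and the awk parsing are assumptions, since the commit message only says that port_listening_in takes the snapshot as a string.

```shell
# Hypothetical sketch, not the harness's port_listening_in: succeed if the
# given ss -tln snapshot shows a listener on the given port.
port_listening_in() {
    local snapshot=$1 port=$2
    awk -v p="$port" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")      # local address:port; port is last
            if (parts[n] == p) found = 1
        }
        END { exit !found }
    ' <<< "$snapshot"
}
```

Because the snapshot is plain text, a test can feed it canned output, and the caller stays a one-liner such as `port_listening_in "$(ss -tln)" "$SSH_PORT"`.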
*	test: make SSH_PORT overridable in test-install.sh	Craig Jennings	2026-05-22	1	-0/+21
\| \| \| \|	The port was hardcoded, so a test run collided with any other VM already forwarding 2222. It is now read from SSH_PORT and defaults to 2222, so existing invocations are unchanged. Running SSH_PORT=2223 scripts/test-install.sh selects a different port so the test can run alongside another VM.
*	feat(install): add pre-flight environment and disk-target validation	Craig Jennings	2026-05-22	3	-0/+224
\| \| \| \| \| \| \| \| \| \|	archangel went straight from filesystem selection into a destructive install behind only a root check and a ZFS module load. A missing tool, a BIOS boot, a too-small or in-use disk, or a dead network surfaced as a confusing abort partway through, sometimes after partitioning had already run. Two gates now fail fast. validate_environment runs after filesystem selection, before any disk is touched: it confirms UEFI boot mode and that every required command is present, with the list coming from a new required_commands helper built like pacstrap_packages. validate_install_targets runs after disk selection, before the first wipe: it refuses a target that's mounted, holds active swap, or belongs to an imported pool or md array, rejects disks under 20 GB, and confirms a mirror is reachable via DNS plus a TCP probe (no ICMP, since some networks drop it). I folded the install_failure_cleanup hardening into the same change. It now falls back to lazy unmounts, so a pacstrap-interrupted target with busy bind mounts still releases the pool and unmounts the EFI partition. Without that, the disk-in-use guard would block the very retry the cleanup exists to enable. "Re-run to retry" only holds if the disk is genuinely freed first. The 20 GB floor is decimal on purpose. It reads as the natural minimum and clears a 20 GiB disk image with headroom instead of sitting on the boundary.
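The arithmetic behind the decimal floor: 20 GB is 20,000,000,000 bytes, while a 20 GiB disk image is 21,474,836,480 bytes, so the image clears the floor by about 1.47 GB. A minimal sketch, with a hypothetical constant and function name and sizes assumed to be in bytes:

```shell
# Hypothetical sketch, not the installer's check: a decimal 20 GB floor.
MIN_DISK_BYTES=$((20 * 1000 * 1000 * 1000))    # 20,000,000,000 bytes

disk_meets_floor() {
    local size_bytes=$1
    (( size_bytes >= MIN_DISK_BYTES ))
}
```

`disk_meets_floor $((20 * 1024 * 1024 * 1024))` succeeds with headroom, while a floor of 20 GiB would put the same image exactly on the boundary.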
*	feat(test): retry pacstrap through transient mirror flakes	Craig Jennings	2026-05-20	1	-0/+216
\| \| \| \| \| \| \| \|	test-install.sh aborts a whole 5-minute VM run when pacstrap hits a transient mirror blip, and the suite reports a failure indistinguishable from a real install regression. run_test now retries the install up to twice, but only when the in-VM log shows both pacstrap's "Failed to install packages to new root" marker and a download/network indicator. A deterministic failure like "target not found" carries the marker without a network indicator, so it still fails fast. archangel's failure trap exports the pool and unmounts on abort, so each retry re-partitions and re-pacstraps from a clean state. Wiring the predicate up needed a source-guard so bats can source the harness, which had none. With that in place I unit-covered the pure helpers — is_transient_install_failure, char_to_qemu_key, get_disk_count, get_disk_args — and lifted char_to_qemu_key out of monitor_sendkeys so the QEMU keymap is testable on its own. The keymap test found a dead branch. The backslash case pattern was '\\', which never matches a lone backslash because bash matches one against '\', so a passphrase containing a backslash would have sent an invalid QEMU keyname instead of "backslash". No test passphrase uses one, so it never bit. I fixed the pattern.
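The dead branch is easy to reproduce in isolation. Inside single quotes a backslash is literal, so the case pattern '\\' matches a string of two backslashes and '\' matches one. The function below is a hypothetical stand-in, not the harness's char_to_qemu_key.

```shell
# Hypothetical demo of the case-pattern behavior described above.
backslash_pattern_demo() {
    case "$1" in
        '\')  echo "backslash" ;;          # matches a lone backslash
        '\\') echo "two-backslashes" ;;    # matches two backslashes only
        *)    echo "other" ;;
    esac
}
```

With only the '\\' arm present, a lone backslash would fall through to the default arm, which is the failure the keymap test exposed.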
*	refactor: extract validate_encryption_passphrase from gather_input	Craig Jennings	2026-05-19	1	-0/+39
\| \| \| \| \| \| \| \| \| \|	gather_input's unattended branch had two parallel if-blocks, one for ZFS and one for Btrfs, each doing the same encryption-passphrase empty check against a filesystem-specific variable (ZFS_PASSPHRASE or LUKS_PASSPHRASE). The two blocks shared the condition surface and error template. Only the variable name differed. I lifted the check into validate_encryption_passphrase in lib/config.sh next to validate_filesystem. The helper takes the variable name and uses indirect expansion (${!var_name}) so one function covers both filesystems. gather_input now dispatches via if/elif on FILESYSTEM and calls the helper with the right variable, collapsing 14 lines to 6. The original tests in test_archangel.bats (gather_input errors when ZFS without ZFS_PASSPHRASE / when Btrfs without LUKS_PASSPHRASE / accepts ZFS with NO_ENCRYPT=yes) still pass, exercising the helper through the dispatch. Added 4 direct unit tests in test_config.bats covering the four cases: NO_ENCRYPT=yes passes regardless, NO_ENCRYPT=no with empty fails, NO_ENCRYPT=no with value passes, and the error message names the offending variable. Bats: 177 → 181. No behavior change. The helper preserves the original error message format and exit conditions.
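Indirect expansion is what lets one helper serve both filesystems. The sketch below is an approximation under stated assumptions: the function name and the error wording are hypothetical, and the real helper's message format is not shown in the commit message.

```shell
# Hypothetical sketch, not the installer's validate_encryption_passphrase:
# check a passphrase variable by name using ${!var_name}.
validate_passphrase_sketch() {
    local var_name=$1
    # Encryption disabled: nothing to check.
    [[ ${NO_ENCRYPT:-no} == yes ]] && return 0
    if [[ -z ${!var_name:-} ]]; then
        echo "error: $var_name is required when encryption is enabled" >&2
        return 1
    fi
}
```

A dispatcher would then call `validate_passphrase_sketch ZFS_PASSPHRASE` or `validate_passphrase_sketch LUKS_PASSPHRASE` depending on FILESYSTEM.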
*	refactor: lift FILES= keyfile sed to ensure_initramfs_files helper	Craig Jennings	2026-05-19	1	-0/+47
\| \| \| \| \| \| \| \|	btrfs.sh's configure_btrfs_initramfs had a six-line inline that ensured mkinitcpio.conf's FILES= line listed the LUKS keyfile: sed-replace the existing FILES= line, then grep + append as a fallback when no FILES= line existed. The pattern is mkinitcpio-specific and self-healing rather than error-on-miss (FILES= is optional in mkinitcpio.conf, so missing means "no extra files," not a broken config). I lifted the block into ensure_initramfs_files in lib/common.sh next to prepend_grub_cmdline_linux, then collapsed the btrfs.sh call site to a single ensure_initramfs_files line. Added three bats tests for the three cases (FILES= present and empty, FILES= present with a different value, FILES= absent entirely). Bats: 174 → 177. No behavior change. The helper's logic matches the inline byte-for-byte: same sed pattern, same grep fallback, same final state.
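The sed-then-grep-then-append shape can be sketched as below. The function name, the exact sed expression, and the keyfile path in the usage note are assumptions. The commit message describes the pattern but does not quote it.

```shell
# Hypothetical sketch, not the installer's ensure_initramfs_files: make sure
# a mkinitcpio.conf-style file has a FILES= line listing the keyfile.
# Assumes the keyfile path contains no '|' or '&' (both special to sed here).
ensure_files_line_sketch() {
    local conf=$1 keyfile=$2
    # Replace an existing FILES= line, whatever its current value.
    sed -i "s|^FILES=.*|FILES=(${keyfile})|" "$conf"
    # Self-heal: FILES= is optional, so append it when no line existed.
    grep -q '^FILES=' "$conf" || echo "FILES=(${keyfile})" >> "$conf"
}
```

For example, `ensure_files_line_sketch /tmp/mkinitcpio.conf /etc/example.key` ends in the same state whether FILES= was empty, set to something else, or absent.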
*	refactor: wire validate_config into the unattended install path	Craig Jennings	2026-05-19	1	-54/+9
\| \| \| \| \| \| \| \| \| \|	validate_config in lib/config.sh was unreachable from main(). Its empty-field checks duplicated four lines in gather_input's unattended branch. validate_config also has two checks gather_input doesn't: that every entry in SELECTED_DISKS is a real block device, and that TIMEZONE exists under /usr/share/zoneinfo. Neither check ever ran. A config with a typo'd disk path slipped past gather_input and surfaced as an obscure sgdisk error inside the destructive partitioning step. I wired validate_config into main() after validate_filesystem, gated on UNATTENDED so it only runs against an already-loaded config. I dropped the four duplicate empty-field checks from gather_input's unattended branch. The filesystem-specific passphrase checks stay there because they're coupled to the FILESYSTEM branch logic. validate_config reports every missing field at once instead of dying on the first. A config with five missing fields tells you all five in one pass. I removed the four corresponding gather_input bats tests. validate_config's existing unit tests in test_config.bats already cover their assertions. Bats: 178 → 174.
*	feat: add --name flag to zfssnapshot rollback and delete	Craig Jennings	2026-05-14	1	-0/+133
\| \| \| \| \| \|	I added --name NAME to rollback (single name) and --name NAME[,NAME...] to delete (comma-separated for multi-select) so scripted callers can drive the wrapper without fzf. The upcoming VM verification step in scripts/test-install.sh needs this. fzf is now conditional, required only when --name is omitted. The 10 new bats tests cover help-text mentions, parse success and failure modes (missing value, mutex with -s, unknown flag), fzf-bypass on both subcommands, and multi-name expansion on delete.
*	feat: consolidate zfssnapshot and zfsrollback into one subcommand-driven script	Craig Jennings	2026-04-27	1	-0/+153
\| \| \| \| \| \| \| \| \| \| \| \|	Problem: zfssnapshot and zfsrollback were two separate scripts with overlapping pre-flight checks (zfs / fzf / root) and parallel UX patterns (description sanitization in one, fzf selection in the other). Users had to remember which script was for which operation, and a "list" view meant typing the raw `zfs list -t snapshot` command. There was no path to destroy individual snapshots short of `zfs destroy` directly, which is dangerous without a confirmation flow. Solution: rewrite zfssnapshot as a single multi-subcommand script (list, create, rollback, delete). Drop installer/zfsrollback. The new script uses a source-guard at the bottom (`if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then main "$@"; fi`) so bats can source it without triggering the install-time pre-flight checks, matching the pattern in installer/archangel. Pure helpers (sanitize_description, validate_description, format_snapshot_name) get extracted as named functions so they're testable in isolation. The destructive flows (rollback, delete) keep the explicit "yes" confirmation prompt, the genesis-snapshot warning, and the recursive-rollback-destroys-newer-snapshots warning. Delete uses fzf --multi so the user can pick several snapshot names at once. Updated build.sh to copy only the consolidated script. Dropped the zfsrollback profiledef permission line. Updated Makefile, README, scripts/sanity-test.sh, and testing-strategy.org to reflect the single-script layout. Bats: 147 → 168 (+21). Coverage spans sanitize_description (normal / boundary / error), validate_description (alphanumerics, hyphens, underscores accepted; spaces, slashes, shell metacharacters, empty rejected), format_snapshot_name (timestamp + description composition), and main subcommand dispatch (list / create / rollback / delete / help / unknown). Lint clean. The zfs-, fzf-, and arch-chroot-shelling subcommand bodies stay VM-tested per testing-strategy.org.
*	refactor: extract MNTPOINT constant for the install chroot mount point	Craig Jennings	2026-04-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Last on the tech-debt drain. The installer hardcoded /mnt at 50+ sites: pacstrap, arch-chroot, mount/umount, fstab writes, and every host-side write into the chroot's /etc, /usr, /var, /boot, /tmp. Same magic-string smell as /mnt/efi but at much larger scale. Add MNTPOINT="/mnt" to lib/common.sh next to EFI_DIR. Replace literal /mnt/... with $MNTPOINT/... across installer/archangel, installer/lib/btrfs.sh, and installer/lib/common.sh. Replace bare /mnt (mount target, arch-chroot root, umount target, install_dropin parameter) with $MNTPOINT. EFI_DIR's own definition becomes EFI_DIR="$MNTPOINT/efi" for the natural composition. Folded in the related ticket: /mnt${chroot_efi_dir} in btrfs.sh:install_grub_all_efi becomes ${MNTPOINT}${chroot_efi_dir}. Was filed as a separate item but the ticket said it should ship with the MNTPOINT extraction, since the composition pattern is unusual and easy to miss in a global sed. Three /mnt references kept literal in comments where the comment describes the string concept rather than the mount point ("Remove /mnt prefix - config is used inside chroot where root is /", etc.). Substituting to $MNTPOINT in those comments would obscure the documentation. Bats: 146 → 147. One new test in test_common.bats pins MNTPOINT="/mnt". Lint clean (one shellcheck SC2295 warning fixed by quoting the parameter expansion: ${isp_firmware#"$MNTPOINT"}). VM verification deferred to a single full make test-install run after all three tech-debt commits land.
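The SC2295 fix matters because an unquoted expansion inside `${var#...}` is treated as a glob pattern. Quoting the inner expansion makes the prefix literal. A small sketch with a hypothetical function name:

```shell
# Hypothetical demo of the quoting fix: strip a mount-point prefix literally.
strip_mount_prefix() {
    local path=$1 mntpoint=${2:-/mnt}
    # The inner quotes make $mntpoint a literal prefix, not a glob pattern.
    printf '%s\n' "${path#"$mntpoint"}"
}
```

`strip_mount_prefix /mnt/usr/lib/firmware` prints `/usr/lib/firmware`, and a prefix containing glob characters such as `/m*` is only removed when it appears literally in the path.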
*	refactor: verify GRUB_CMDLINE_LINUX seds via prepend_grub_cmdline_linux helper	Craig Jennings	2026-04-27	1	-0/+64
\| \| \| \| \| \| \| \| \| \| \| \|	Audited the ~10 silent sed -i sites in the installer against the verification-after pattern that landed for sshd_config last session. Triaged each by failure mode. The two GRUB_CMDLINE_LINUX seds in lib/btrfs.sh have a real silent-failure risk. If /etc/default/grub is missing or malformed and the sed pattern doesn't match, nothing happens. The kernel boots without cryptdevice=. The system can't unlock LUKS at boot. Added prepend_grub_cmdline_linux to lib/common.sh. Same shape as enable_sshd_root_login (sed, then grep, then error if the line wasn't modified). Replaced the two inline seds with helper calls. The HOOKS= seds in installer/archangel and lib/btrfs.sh (six total) don't need verification. A missing HOOKS= line makes mkinitcpio -P fail loudly downstream, so silent-replace failure can't reach a booted system. Added a one-line audit-rationale comment at each of the three locations so the next reader doesn't re-litigate the decision. The FILES= sed at lib/btrfs.sh:213 already self-heals via a sed-then-grep-then-append pattern, so no behavior change there. Filed a separate follow-up to lift that pattern into a named helper for clarity. Bats: 142 → 146. Four new tests in test_common.bats cover normal (empty cmdline, existing cmdline preserved, other lines preserved) and error (missing GRUB_CMDLINE_LINUX line). Lint clean.
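The verification-after pattern (sed, then grep, then error) can be sketched as follows. This is an illustration under assumptions: the function name, the sed expression, and the use of `return 1` in place of the installer's error() are all hypothetical.

```shell
# Hypothetical sketch, not the installer's prepend_grub_cmdline_linux.
# Assumes params contains no '|' or '&' (both special to sed here).
prepend_cmdline_sketch() {
    local conf=$1 params=$2
    # Insert params right after the opening quote of GRUB_CMDLINE_LINUX.
    sed -i "s|^GRUB_CMDLINE_LINUX=\"|GRUB_CMDLINE_LINUX=\"${params} |" "$conf"
    # Verify the edit landed; a sed that matches nothing is otherwise silent.
    if ! grep -qF "GRUB_CMDLINE_LINUX=\"${params} " "$conf"; then
        echo "error: GRUB_CMDLINE_LINUX was not updated in $conf" >&2
        return 1
    fi
}
```

The grep step is what turns a missing or malformed GRUB_CMDLINE_LINUX line into an install-time failure instead of a system that cannot unlock LUKS at boot.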
*	refactor: consolidate installer defaults and FILESYSTEM validation into config.sh	Craig Jennings	2026-04-27	2	-39/+55
\| \| \| \| \| \| \| \| \| \| \| \|	The installer had three sites touching FILESYSTEM: a top-level default in the monolith, a re-default block in gather_input, and a runtime validation block also in gather_input. The same scattering existed for LOCALE, KEYMAP, ENABLE_SSH, and NO_ENCRYPT. A future contributor changing one site wouldn't have known the other two existed. Move all five defaults into the lib/config.sh declarations so config.sh is the single source of truth. Add validate_filesystem() in lib/config.sh and call it from main() between check_config and gather_input, so a typo in a config file's FILESYSTEM= fails fast before any install action runs. The behavior change is stricter. An empty FILESYSTEM in a config file used to be silently defaulted to zfs; now it errors. Interactive mode is unaffected. select_filesystem still controls the value and already errored on cancellation. Bats: 140 → 142. Five tests added in test_config.bats for the defaults pinning and validate_filesystem coverage. Three removed from test_archangel.bats for behavior that moved out of gather_input. Lint clean.
*	refactor: collapse sshd_config seds into enable_sshd_root_login	Craig Jennings	2026-04-26	1	-0/+73
\| \| \| \| \| \| \| \| \| \| \| \|	The two sed -i invocations in configure_ssh worked on stock Arch sshd_config but had a real silent-failure mode. If neither the commented (#PermitRootLogin) nor the uncommented form was present, both seds did nothing and the install shipped without root SSH. The user discovered it at first ssh attempt, not at install time. The second sed was also redundant. By the time it ran, the first sed had produced a line matching the second sed's pattern. The new enable_sshd_root_login helper in lib/common.sh combines both substitutions into one sed -i -e ..., then verifies PermitRootLogin yes is present in the file. If the verification fails, it calls error rather than silently appending. Silent appending would mask a corrupted starting file, which is exactly the failure mode worth flagging loudly. The helper takes the config path as an argument so the bats tests in commit 7486abb can run unprivileged against tempfiles. configure_ssh passes /mnt/etc/ssh/sshd_config and is now a single call instead of two seds. Verified: bats 135 → 140 (+5 covering normal/boundary/error). Lint clean. Helper smoke-tested against current Arch sshd_config. The loud-error path can't be exercised against the live default but is covered by the bats error case. Filed as a follow-up :techdebt: item: ~10 other sed -i sites in installer/archangel and lib/btrfs.sh follow the same silent-replace pattern. The FILES= site for LUKS is the worst (silent failure means LUKS prompts on every boot). Triage each per this same recipe in a future session.
*	refactor: extract EFI_DIR constant for the install-time EFI mount point	Craig Jennings	2026-04-26	1	-0/+8
\| \| \| \| \| \| \| \| \| \|	The literal /mnt/efi appeared at 17 sites across installer/archangel and installer/lib/btrfs.sh. Renaming it (or pointing tests at a different mount) meant touching every site and risking incomplete sweeps. One canonical name in installer/lib/common.sh now backs every reference. EFI_DIR has no trailing slash so the three expansion patterns in the codebase compose cleanly: bare ($EFI_DIR), sub-path ($EFI_DIR/EFI/ZBM), and the index suffix used by install_grub_all_efi for secondary EFI mounts (${EFI_DIR}${i}). The sync_efi_partitions staging path also moves from the literal /mnt/efi_sync to ${EFI_DIR}_sync, so it follows EFI_DIR if anyone ever changes the base. Two follow-ups filed as separate :techdebt: items: the MNTPOINT=/mnt extraction across the 50+ /mnt/... sites (pacstrap, arch-chroot, fstab writes), and the related /mnt${chroot_efi_dir} composition pattern at btrfs.sh:681-682. Both ship together when MNTPOINT lands. Verified: bats 134 → 135 (+1 pinning EFI_DIR=/mnt/efi). Lint clean. All four expansion patterns (the three above plus the _sync suffix) smoke-tested at runtime and produce the original literal byte-for-byte. VM run skipped: pure constant substitution with zero behavior change.
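The composition patterns named above can be checked in a few lines. The wrapper function is hypothetical. The braces matter for the suffix forms, because `$EFI_DIR_sync` without them would expand a different, unset variable.

```shell
# Hypothetical demo of how EFI_DIR composes when it has no trailing slash.
EFI_DIR="/mnt/efi"

efi_paths_sketch() {
    local i=$1
    printf '%s\n' \
        "$EFI_DIR" \
        "$EFI_DIR/EFI/ZBM" \
        "${EFI_DIR}${i}" \
        "${EFI_DIR}_sync"
}
```

`efi_paths_sketch 1` prints /mnt/efi, /mnt/efi/EFI/ZBM, /mnt/efi1, and /mnt/efi_sync, one per line.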
*	refactor: unify partition_disks across ZFS and Btrfs install paths	Craig Jennings	2026-04-26	1	-44/+146
\| \| \| \| \| \| \| \| \| \| \| \|	The monolith's partition_disks() at installer/archangel was ZFS-only and silently shadowed lib/disk.sh:partition_disks(), which had been dead code since the Btrfs install path was added. install_btrfs was assembling partitioning manually via partition_disk (singular) plus a separate format_efi_partitions call. Two parallel implementations meant fixes had to land in two places and the lib version drifted with no visible warning. The unified partition_disks now lives in lib/disk.sh. It reads SELECTED_DISKS, dispatches the per-disk layout on FILESYSTEM (BF00 for ZFS, 8300 for Btrfs), populates EFI_PARTS + ROOT_PARTS, and formats each EFI partition with EFI0, EFI1, ... labels. Folded in two pre-existing divergences while consolidating. wipefs -af now runs on every disk, not just the ZFS path. The Btrfs path was missing this defense against non-GPT signatures (LVM, mdadm, ext) that sgdisk --zap-all alone won't touch. EFI labels standardized on EFI0, EFI1, ... across both paths. The lib version was producing asymmetric EFI / EFI2 labels. No consumer in the repo references the labels after format, so that change is cosmetic. ZFS_PARTS renamed to ROOT_PARTS for symmetry with get_root_partition. Deleted four orphaned helpers: format_efi, format_efi_partitions (only caller was the collapsed install_btrfs section), get_efi_partitions, get_root_partitions (test-only callers after install_btrfs simplified). Verified end to end: bats 134/134, make test-install passing 12/12 configs across both install paths.
*	fix: clean up partial installs via ERR/INT/TERM trap	Craig Jennings	2026-04-26	1	-0/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Failed installs left the system in an inconsistent state: /mnt mounted, the zpool imported, possibly LUKS containers open. The user had to manually unmount, export the pool, and clean up partitions before re-running the installer. The existing trap at the top of archangel was `trap 'error "Installation interrupted!"' INT TERM`. It just printed a message and exited. There was no ERR trap, so `set -e`-aborted commands ran no cleanup either. I added `install_failure_cleanup` next to the existing `cleanup`. It captures `$?` first, disarms the trap to prevent recursion, clears sensitive variables (ROOT_PASSWORD, ZFS_PASSPHRASE, LUKS_PASSPHRASE), and dispatches on FILESYSTEM: - ZFS path: unmount /mnt/efi, recursive umount /mnt, export the pool (with `-f` fallback) if it's still imported. - Btrfs path: unmount /mnt/efi, call the existing btrfs_cleanup and btrfs_close_encryption helpers. All cleanup steps swallow their own errors with `\|\| true` since partial state is expected when this fires mid-install. `install_zfs` and `install_btrfs` now both arm the trap as their first action and disarm it just before the success-path cleanup. The disarm matters because the success cleanup calls `zpool export` (or btrfs_close_encryption) directly, and those can produce non-zero exit codes that we don't want to interpret as "installation failed". The note in notes.org described this as "install_zfs has no mid-step recovery." The framing was off. Both paths were exposed: install_btrfs's `btrfs_cleanup` only runs on the success path, same as `cleanup` for ZFS. Both paths now have the same recovery shape. Added 4 bats tests for `install_failure_cleanup` that mock the system tools (umount, zpool, btrfs_cleanup, btrfs_close_encryption) via function override and track invocations through a CALLS array. 
Array assignment isn't affected by the production code's `>/dev/null 2>&1` redirects on `zpool list`, so we capture the call regardless of where the mock's stdout would have gone. Verified end-to-end on the dev box: sourced archangel, set FILESYSTEM=zfs, armed the trap, ran `false` to trigger `set -e`. The trap fired with exit code 1, dispatched to the ZFS cleanup path, called `umount /mnt/efi` and `umount -R /mnt`, checked `zpool list` (returned non-zero since no pool exists on the dev box), skipped the export, and exited via `error`. No behavior change on the success path. The existing `cleanup` and `btrfs_cleanup` stay unchanged.
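The mocking approach used by those tests can be sketched like this. The cleanup body and the pool name rpool are hypothetical stand-ins, not the installer's install_failure_cleanup. The mocks shadow the real commands only inside the shell that defines them.

```shell
# Hypothetical sketch of function-override mocks that record calls in an array.
CALLS=()
umount() { CALLS+=("umount $*"); }
zpool()  { CALLS+=("zpool $*"); return 1; }   # pretend no pool is imported

cleanup_sketch() {
    umount /mnt/efi || true
    umount -R /mnt || true
    # The redirect hides the mock's output but not its array assignment.
    if zpool list rpool >/dev/null 2>&1; then
        zpool export rpool || true
    fi
}
```

After `cleanup_sketch` runs, CALLS holds the two umount calls and the `zpool list` probe, and no export is recorded because the mocked probe returned non-zero.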
*	test: expand bats coverage across installer modules	Craig Jennings	2026-04-26	4	-0/+449
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added unit tests for `disk.sh`, `btrfs.sh`, the archangel monolith's `gather_input` unattended branch, and filled gap cases in `config.sh`. The suite grew from 71 to 110 tests. `installer/lib/disk.sh` was completely uncovered. New `tests/unit/test_disk.bats` covers the four pure partition-path helpers (`get_efi_partition`, `get_root_partition`, `get_efi_partitions`, `get_root_partitions`) across SATA, virtio, and NVMe inputs, mixed arrays, and the empty-input behavior. Side-effecting functions in the same file (sgdisk, mkfs.fat, partprobe, and fzf wrappers) stay deliberately VM-tested. `installer/lib/btrfs.sh` had no bats coverage. New `tests/unit/test_btrfs.bats` covers `get_luks_devices`, the only pure helper in the file. It pins the asymmetric naming convention where the first device gets the bare `LUKS_MAPPER_NAME` and subsequent devices append the index. The archangel monolith was un-source-able for tests because its top-level code created a /tmp log file and redirected stdout via `exec > >(tee...)`, plus called `main "$@"` unconditionally at the bottom. I extracted the logging setup into an `init_logging` function called from `main`, and wrapped the main call in a `[[ "${BASH_SOURCE[0]}" == "${0}" ]]` guard. Sourcing the script now loads function definitions silently, with no log file and no banner. Running it directly works exactly as before. Verified both paths. That refactor unlocks `tests/unit/test_archangel.bats`, which covers `gather_input` in unattended mode. Required-field validation for HOSTNAME, TIMEZONE, ROOT_PASSWORD, and DISKS. Optional-field defaulting (FILESYSTEM to zfs, LOCALE to en_US.UTF-8, KEYMAP to us, ENABLE_SSH to yes). Filesystem-specific encryption checks (ZFS_PASSPHRASE required when not NO_ENCRYPT, same for LUKS_PASSPHRASE on Btrfs). Filesystem validity. RAID_LEVEL defaulting for multi-disk installs. The interactive branch stays out of scope per the testing-strategy policy. 
`tests/unit/test_config.bats` got five gap tests: `check_config` when CONFIG_FILE is set, `validate_config` against a non-block-device entry (e.g. /dev/null) and a missing path, and `parse_args` accepting `--color` and `--config-file` together in either order. `testing-strategy.org` got an expanded "What bats does NOT cover" section. The doc previously named six tools (mkfs, cryptsetup, zpool create, pacstrap, arch-chroot, grub-install). The new list adds sgdisk, partprobe, blkid, mkfs.fat, mkfs.btrfs, snapper, efibootmgr, mount, umount, findmnt, mountpoint, and fzf. It also names the conditions (root needed, real /dev or /sys state) that make a function VM-only. The coverage table at the top now lists the three new test files. No behavior change in production code. The init_logging extraction preserves the existing log path and banner format byte-for-byte.
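The source-guard pattern described in this commit can be sketched roughly as follows. Only `init_logging`, `main`, and the `BASH_SOURCE` comparison come from the message; the `LOG_FILE` variable, the `mktemp` call, and the banner text are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Minimal sketch of a script that is safe to source from a test file.

init_logging() {
    # Side effects live inside a function, so sourcing never triggers them.
    LOG_FILE=$(mktemp)                 # assumed log location
    exec > >(tee -a "$LOG_FILE") 2>&1  # mirror output into the log
}

main() {
    init_logging
    echo "installer starting"          # placeholder banner
}

# Run main only when executed directly. A test that sources this file
# gets the function definitions and nothing else.
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    main "$@"
fi
```

A bats file can then `source` the script in `setup` and call individual functions without a log file or banner appearing.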
*	refactor: extract get_raid_level fzf preview into raid.sh helper	Craig Jennings	2026-04-26	1	-0/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	`get_raid_level()` carried a 98-line inline fzf `--preview` shell snippet that contained five nearly-parallel `case` branches emitting per-level preview text, plus three exported variables (`RAID_DISK_COUNT`, `RAID_TOTAL_GB`, `RAID_SMALLEST_GB`) just to pass values into the fzf preview subshell. The inline shell-in-shell had no syntax highlighting, no shellcheck on the inner snippet, and any edit to preview copy meant editing inside a single-quoted argument. I extracted the per-level text into a new `raid_preview(level, disk_count, total_gb, smallest_gb)` helper in `lib/raid.sh`. It reuses the existing `raid_fault_tolerance` and `raid_usable_bytes` primitives for the data lines instead of redoing the arithmetic inline. That keeps the math in one place. The fzf `--preview` argument is now a one-liner that calls `raid_preview` with the sizing values, and the env-var exports are gone. `export -f raid_preview raid_fault_tolerance raid_usable_bytes` makes the functions visible in fzf's preview subshell. I verified this against a fresh `bash -c` subshell, which is what fzf spawns internally. `get_raid_level()` shrinks from 144 to 49 lines. Preview text is now bats-tested. Added 8 unit tests across the 5 RAID levels (headline, fault-tolerance line, computed usable space), mixed-size handling for mirror (smallest disk, not average), unknown level returning 1 with empty output, and a sanity loop confirming every valid level produces non-empty output. No behavior change. The preview pane shows the same text, the same level options, the same selected output. The pure logic in `lib/raid.sh` is unchanged.
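As a rough illustration of why `export -f` matters here, the sketch below calls an exported function from a child `bash -c`, which is how the commit says the behavior was verified. `raid_preview_demo` and its output format are hypothetical stand-ins, not the project's `raid_preview`, and the sketch assumes the child shell is bash.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real helper; only the export mechanism
# is the point of this sketch.
raid_preview_demo() {
    local level=$1 disk_count=$2
    printf '%s across %d disks\n' "$level" "$disk_count"
}

# Without this line the child shell reports "command not found".
export -f raid_preview_demo

# A child bash imports exported functions from the environment.
preview_output=$(bash -c 'raid_preview_demo mirror 2')
echo "$preview_output"
```

Passing the sizing values as arguments, as the one-line `--preview` does, is what removes the need for exported variables.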
*	fix: fail fast on missing ZFSBootMenu efibootmgr entry	Craig Jennings	2026-04-26	1	-0/+78
\| \| \| \| \| \| \| \| \| \|	The post-bootloader boot-order step in `configure_zfsbootmenu` parsed `efibootmgr` output through a `grep \| head \| grep -oP` chain with no null guards. If any link returned empty (the entry wasn't created, the label was different, or efibootmgr itself failed), the surrounding `if [[ -n "$bootnum" ]]` silently skipped, the install reported success, and the user rebooted into a machine that wouldn't boot ZFSBootMenu by default. I replaced the chain with two pure helpers in `lib/common.sh`, `parse_efibootmgr_entry` and `parse_efibootmgr_bootorder`. The caller in `archangel` invokes them with explicit `\|\| error` guards on each parse stage. The helpers capture `efibootmgr` output once and reuse it (it was called twice before). The same hardening covers the BootOrder lookup at the adjacent line. It used to rely on the now-removed `bootnum` guard for safety. The helpers are stdin-driven and use bash regex, so they're easy to test in bats without exercising the real efibootmgr binary. Added 9 unit tests across normal cases, hex-character boot numbers, multi-match selection, missing label, missing BootOrder line, empty input, and an empty label argument. The empty-label case would otherwise falsely match `BootCurrent` via the hex regex, capturing "C". The helper now guards it explicitly. Verified manually against real efibootmgr output (GRUB entry at Boot0001, BootOrder 0006,0001,2001,2002,2003). Both helpers parsed correctly. VM integration not re-run for this small post-bootloader change. The next scheduled `make test-install` exercises the green path.
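A stdin-driven parser of the kind described above might look like the sketch below. `parse_entry_sketch`, its regex, and the sample text are assumptions for illustration and may differ from the real `parse_efibootmgr_entry`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: read efibootmgr-style text on stdin and print the
# boot number of the first entry whose label starts with $1.
parse_entry_sketch() {
    local label=$1 line
    local re='^Boot([0-9A-Fa-f]{4})\*?[[:space:]]+(.*)$'
    # Reject an empty label outright; in this sketch it would otherwise
    # match every entry.
    [[ -n "$label" ]] || return 1
    while IFS= read -r line; do
        if [[ "$line" =~ $re && "${BASH_REMATCH[2]}" == "$label"* ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1
}

# Canned input in the general shape efibootmgr prints (illustrative values).
sample=$'BootCurrent: 0001\nBootOrder: 0006,0001\nBoot0001* GRUB\nBoot0006* ZFSBootMenu'

# The parse stage gets an explicit failure path instead of a silent skip.
bootnum=$(parse_entry_sketch "ZFSBootMenu" <<<"$sample") || {
    echo "ZFSBootMenu entry not found" >&2
    exit 1
}
echo "boot number: $bootnum"
```

Because the function only reads stdin, a test can feed it canned text and never touch the real efibootmgr binary.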
*	feat: PrivateTmp=yes drop-in for systemd-tmpfiles on ZFS-root	Craig Jennings	2026-04-21	1	-0/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On ZFS-on-root, statx() across sibling services' /var/tmp/systemd-private-*/tmp mounts returns errno 132 (ENOTNAM). This produces 10-30 journal errors per boot and causes systemd-tmpfiles-clean.service to fail every periodic run (exit 73 / CANTCREAT). Running tmpfiles inside its own mount namespace avoids traversing sibling private-tmp paths. install_zfs() now calls configure_tmpfiles_private_tmp() between configure_zfs_tools and sync_efi_partitions, so the genesis snapshot captures the drop-ins. Btrfs path is untouched — errno 132 is ZFS-specific. The drop-in file-writing is factored into install_dropin() in lib/common.sh (service, name, root; body from stdin). Six bats tests exercise path, content, directory permissions, idempotent overwrite, empty content, and special-character preservation. Full root-cause write-up and verification steps in docs/zfs-tmpfiles-private-tmp-fix.md.
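The drop-in writer could be sketched as below, following the (service, name, root; body from stdin) shape given in the message. The function name, the `/etc/systemd/system/<service>.d/<name>.conf` layout, and the choice of `systemd-tmpfiles-clean.service` are assumptions; the commit does not list the exact units or file names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a drop-in writer: service, drop-in name, and
# target root as arguments; the unit body arrives on stdin.
install_dropin_sketch() {
    local service=$1 name=$2 root=$3
    local dir="${root}/etc/systemd/system/${service}.d"
    mkdir -p "$dir"
    # A plain redirect truncates any earlier copy, so re-running simply
    # overwrites the file.
    cat > "${dir}/${name}.conf"
}

# Demo against a throwaway root instead of a real install target.
demo_root=$(mktemp -d)
install_dropin_sketch systemd-tmpfiles-clean.service private-tmp "$demo_root" <<'EOF'
[Service]
PrivateTmp=yes
EOF
cat "${demo_root}/etc/systemd/system/systemd-tmpfiles-clean.service.d/private-tmp.conf"
```

Taking the root as a parameter is what lets tests point the helper at a temporary directory.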
*	refactor: merge install_base and install_base_btrfs	Craig Jennings	2026-04-13	1	-0/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Extract the pacstrap package list into pacstrap_packages(filesystem) in lib/common.sh (common + filesystem-specific). install_base() now dispatches on FILESYSTEM for both the archzfs-repo-append and the package list. install_base_btrfs() deleted; install_btrfs() call site updated to invoke install_base. Old: 49 + 38 lines of ~95% copy-paste. New: 32 lines + a 20-line pure helper. 7 bats tests cover: zfs has zfs-dkms/zfs-utils, btrfs has btrfs-progs + grub + grub-btrfs + snapper + snap-pac, each flavor excludes the other's specifics, common packages are in both, unknown filesystem returns status 1, output is one-per-line. make test: 65/65.
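A package-list helper with the behavior listed above (common plus filesystem-specific, one name per line, status 1 for an unknown filesystem) might be sketched like this. The filesystem-specific names come from the message; the `common` array contents and the function name are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a pure package-list helper.
pacstrap_packages_sketch() {
    local filesystem=$1
    local common=(base linux linux-firmware)   # assumed common set
    local extra
    case "$filesystem" in
        zfs)   extra=(zfs-dkms zfs-utils) ;;
        btrfs) extra=(btrfs-progs grub grub-btrfs snapper snap-pac) ;;
        *)     return 1 ;;                     # unknown filesystem
    esac
    # One package per line keeps the output easy to read into an array.
    printf '%s\n' "${common[@]}" "${extra[@]}"
}

pacstrap_packages_sketch btrfs
```

A caller could collect the list with `mapfile -t pkgs < <(pacstrap_packages_sketch "$FILESYSTEM")` before handing it to pacstrap.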
*	refactor: unify get_{luks,zfs}_passphrase and get_root_password	Craig Jennings	2026-04-13	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Extract the prompt/confirm/min-length loop into prompt_password() in lib/common.sh using a nameref for the output variable, so UI output stays on the terminal (no command-substitution capture) and the three callers collapse from ~30 lines each to a single helper call: get_luks_passphrase() (min 8 chars), get_zfs_passphrase() (min 8 chars), and get_root_password() (no min; it was unchecked before, and that is preserved). 5 bats tests added: match+min-ok path, length-retry loop, mismatch-retry loop, min_len=0 disables check, empty passphrase when min_len=0. make test: 58/58.
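The nameref approach can be sketched as follows. This is an illustrative helper, not the project's `prompt_password()`: the name, argument order, and messages are assumptions, and `local -n` needs bash 4.3 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a nameref-based prompt helper.
prompt_password_sketch() {
    local -n _pp_out=$1          # nameref: writes land in the caller's variable
    local label=$2 min_len=$3
    local _pp_first _pp_second
    while true; do
        # Prompts and messages go to stderr, so nothing needs capturing.
        read -rs -p "Enter ${label}: " _pp_first || return 1
        echo >&2
        read -rs -p "Confirm ${label}: " _pp_second || return 1
        echo >&2
        if [[ "$_pp_first" != "$_pp_second" ]]; then
            echo "Entries do not match, try again." >&2
            continue
        fi
        if (( min_len > 0 && ${#_pp_first} < min_len )); then
            echo "Must be at least ${min_len} characters." >&2
            continue
        fi
        _pp_out=$_pp_first
        return 0
    done
}

# Scripted stdin stands in for a user: one too-short attempt, then a good one.
prompt_password_sketch demo_pw "passphrase" 8 \
    <<<$'short\nshort\nlong-enough\nlong-enough' 2>/dev/null
echo "captured ${#demo_pw} characters"
```

One caveat with namerefs: the caller's variable must not share a name with any of the helper's locals, which is why the sketch prefixes most of them.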
*	refactor: extract pure RAID logic to lib/raid.sh with bats coverage	Craig Jennings	2026-04-12	1	-0/+199
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Peel the testable pieces of get_raid_level() out of the 1600-line installer monolith into installer/lib/raid.sh: raid_valid_levels_for_count(count), which replaces the inline option-list builder in get_raid_level(); raid_is_valid(level, count), useful for unattended-config validation; raid_usable_bytes(level, count, smallest, total), the usable-space math; and raid_fault_tolerance(level, count), the max tolerable disk failures. archangel now sources lib/raid.sh and uses raid_valid_levels_for_count for the fzf option list. Fzf preview subshell still inlines its own usable-bytes arithmetic (calling exported lib functions across preview subshells is fragile; left for a later pass). 30 bats tests in tests/unit/test_raid.bats cover the full enumeration table, every valid/invalid level-vs-count combo from 2 to 5 disks, mixed-size mirror, and unknown-level error paths. make test: 53/53.
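For a sense of what the usable-space helper computes, here is a rough sketch using the (level, count, smallest, total) signature from the message. The level names and the formulas are assumptions based on common ZFS layouts, and they ignore metadata overhead, so the real `raid_usable_bytes` may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of usable-space arithmetic; level names are assumed.
raid_usable_sketch() {
    local level=$1 count=$2 smallest=$3 total=$4
    case "$level" in
        stripe) echo "$total" ;;                       # every byte is usable
        mirror) echo "$smallest" ;;                    # limited by smallest disk
        raidz1) echo $(( (count - 1) * smallest )) ;;  # one disk of parity
        raidz2) echo $(( (count - 2) * smallest )) ;;  # two disks of parity
        raidz3) echo $(( (count - 3) * smallest )) ;;  # three disks of parity
        *)      return 1 ;;                            # unknown level
    esac
}

# Three disks, smallest 100 units, 350 units in total.
raid_usable_sketch raidz1 3 100 350
```

Keeping the function free of side effects is what makes a full level-by-count table cheap to enumerate in bats.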
*	test: add bats unit tests for common.sh and config.sh	Craig Jennings	2026-04-12	2	-0/+199
	23 bats tests covering the pure logic in installer/lib/common.sh (command_exists, require_command, info/warn/error, enable_color, require_root, log) and installer/lib/config.sh (parse_args, load_config, validate_config, check_config). Makefile adds a 'bats' target; 'test' now runs lint + bats (VM integration tests remain under test-install).