<feed xmlns='http://www.w3.org/2005/Atom'>
<title>archangel/tests/unit/test_archangel.bats, branch main</title>
<subtitle>Arch Linux installer ISO — ZFS-on-root or BTRFS, doubles as rescue disk
</subtitle>
<id>https://git.cjennings.net/archangel/atom?h=main</id>
<link rel='self' href='https://git.cjennings.net/archangel/atom?h=main'/>
<link rel='alternate' type='text/html' href='https://git.cjennings.net/archangel/'/>
<updated>2026-05-23T09:08:53+00:00</updated>
<entry>
<title>test: cover disk_in_use and network_available failure paths</title>
<updated>2026-05-23T09:08:53+00:00</updated>
<author>
<name>Craig Jennings</name>
<email>c@cjennings.net</email>
</author>
<published>2026-05-23T09:08:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.cjennings.net/archangel/commit/?id=98fc424f7edb26314ffe124d3bc24549146a06d5'/>
<id>urn:sha1:98fc424f7edb26314ffe124d3bc24549146a06d5</id>
<content type='text'>
These two boundary functions backed the pre-flight guards from #215 but had no unit coverage of their own. The VM harness exercised them instead. I added 7 bats tests that mock the system commands they query, so the real branching logic runs.

test_disk.bats covers disk_in_use across mountpoint, active swap, imported-zpool member, and idle — that's the gate that refuses to wipe an already-mounted disk. test_archangel.bats covers network_available for DNS failure, TCP-connect failure, and success, the check that fails the install before pacstrap. The /proc/mdstat-positive branch and the live probes stay in the VM harness, since neither drives cleanly without writing to /proc or hitting the network. Suite 238 to 245, lint clean.
</content>
</entry>
<entry>
<title>feat(install): add pre-flight environment and disk-target validation</title>
<updated>2026-05-22T23:03:40+00:00</updated>
<author>
<name>Craig Jennings</name>
<email>c@cjennings.net</email>
</author>
<published>2026-05-22T23:03:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.cjennings.net/archangel/commit/?id=b6525a50fabf3aedf41eee70c164519b00d27704'/>
<id>urn:sha1:b6525a50fabf3aedf41eee70c164519b00d27704</id>
<content type='text'>
archangel went straight from filesystem selection into a destructive install behind only a root check and a ZFS module load. A missing tool, a BIOS boot, a too-small or in-use disk, or a dead network surfaced as a confusing abort partway through, sometimes after partitioning had already run.

Two gates now fail fast. validate_environment runs after filesystem selection, before any disk is touched: it confirms UEFI boot mode and that every required command is present, with the list coming from a new required_commands helper built like pacstrap_packages. validate_install_targets runs after disk selection, before the first wipe: it refuses a target that's mounted, holds active swap, or belongs to an imported pool or md array, rejects disks under 20 GB, and confirms a mirror is reachable via DNS plus a TCP probe (no ICMP, since some networks drop it).

I folded the install_failure_cleanup hardening into the same change. It now falls back to lazy unmounts, so a pacstrap-interrupted target with busy bind mounts still releases the pool and unmounts the EFI partition. Without that, the disk-in-use guard would block the very retry the cleanup exists to enable. "Re-run to retry" only holds if the disk is genuinely freed first.

The 20 GB floor is decimal on purpose. It reads as the natural minimum and clears a 20 GiB disk image with headroom instead of sitting on the boundary.
</content>
</entry>
<entry>
<title>refactor: wire validate_config into the unattended install path</title>
<updated>2026-05-19T07:19:04+00:00</updated>
<author>
<name>Craig Jennings</name>
<email>c@cjennings.net</email>
</author>
<published>2026-05-19T07:19:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.cjennings.net/archangel/commit/?id=b6d737fc9e3cee641afe7b855c5356122537850b'/>
<id>urn:sha1:b6d737fc9e3cee641afe7b855c5356122537850b</id>
<content type='text'>
validate_config in lib/config.sh was unreachable from main(). Its empty-field checks duplicated four lines in gather_input's unattended branch. validate_config also has two checks gather_input doesn't: that every entry in SELECTED_DISKS is a real block device, and that TIMEZONE exists under /usr/share/zoneinfo. Neither check ever ran. A config with a typo'd disk path slipped past gather_input and surfaced as an obscure sgdisk error inside the destructive partitioning step.

I wired validate_config into main() after validate_filesystem, gated on UNATTENDED so it only runs against an already-loaded config. I dropped the four duplicate empty-field checks from gather_input's unattended branch. The filesystem-specific passphrase checks stay there because they're coupled to the FILESYSTEM branch logic.

validate_config reports every missing field at once instead of dying on the first. A config with five missing fields tells you all five in one pass.

I removed the four corresponding gather_input bats tests. validate_config's existing unit tests in test_config.bats already cover their assertions. Bats: 178 → 174.
</content>
</entry>
<entry>
<title>refactor: consolidate installer defaults and FILESYSTEM validation into config.sh</title>
<updated>2026-04-27T17:45:48+00:00</updated>
<author>
<name>Craig Jennings</name>
<email>c@cjennings.net</email>
</author>
<published>2026-04-27T17:45:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.cjennings.net/archangel/commit/?id=8c69aaaff13da3b7d1d24ed34975e8c5b30409e6'/>
<id>urn:sha1:8c69aaaff13da3b7d1d24ed34975e8c5b30409e6</id>
<content type='text'>
The installer had three sites touching FILESYSTEM: a top-level default in the monolith, a re-default block in gather_input, and a runtime validation block also in gather_input. The same scattering existed for LOCALE, KEYMAP, ENABLE_SSH, and NO_ENCRYPT. A future contributor changing one site wouldn't have known the other two existed.

Move all five defaults into the lib/config.sh declarations so config.sh is the single source of truth. Add validate_filesystem() in lib/config.sh and call it from main() between check_config and gather_input, so a typo in a config file's FILESYSTEM= fails fast before any install action runs.

The behavior change is stricter. An empty FILESYSTEM in a config file used to be silently defaulted to zfs, now it errors. Interactive mode is unaffected. select_filesystem still controls the value and already errored on cancellation.

Bats: 140 → 142. Five tests added in test_config.bats for the defaults pinning and validate_filesystem coverage. Three removed from test_archangel.bats for behavior that moved out of gather_input. Lint clean.
</content>
</entry>
<entry>
<title>fix: clean up partial installs via ERR/INT/TERM trap</title>
<updated>2026-04-26T06:31:38+00:00</updated>
<author>
<name>Craig Jennings</name>
<email>c@cjennings.net</email>
</author>
<published>2026-04-26T06:31:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.cjennings.net/archangel/commit/?id=6de9f378cf52b9e9b0e89b396a12b978700241ff'/>
<id>urn:sha1:6de9f378cf52b9e9b0e89b396a12b978700241ff</id>
<content type='text'>
Failed installs left the system in an inconsistent state: /mnt mounted, the zpool imported, possibly LUKS containers open. The user had to manually unmount, export the pool, and clean up partitions before re-running the installer. The existing trap at the top of archangel was `trap 'error "Installation interrupted!"' INT TERM`. It just printed a message and exited. There was no ERR trap, so `set -e`-aborted commands ran no cleanup either.

I added `install_failure_cleanup` next to the existing `cleanup`. It captures `$?` first, disarms the trap to prevent recursion, clears sensitive variables (ROOT_PASSWORD, ZFS_PASSPHRASE, LUKS_PASSPHRASE), and dispatches on FILESYSTEM:

- ZFS path: unmount /mnt/efi, recursive umount /mnt, export the pool (with `-f` fallback) if it's still imported.
- Btrfs path: unmount /mnt/efi, call the existing btrfs_cleanup and btrfs_close_encryption helpers.

All cleanup steps swallow their own errors with `|| true` since partial state is expected when this fires mid-install.

`install_zfs` and `install_btrfs` now both arm the trap as their first action and disarm it just before the success-path cleanup. The disarm matters because the success cleanup calls `zpool export` (or btrfs_close_encryption) directly, and those can produce non-zero exit codes that we don't want to interpret as "installation failed".

The note in notes.org described this as "install_zfs has no mid-step recovery." The framing was off. Both paths were exposed: install_btrfs's `btrfs_cleanup` only runs on the success path, same as `cleanup` for ZFS. Both paths now have the same recovery shape.

Added 4 bats tests for `install_failure_cleanup` that mock the system tools (umount, zpool, btrfs_cleanup, btrfs_close_encryption) via function override and track invocations through a CALLS array. Array assignment isn't affected by the production code's `&gt;/dev/null 2&gt;&amp;1` redirects on `zpool list`, so we capture the call regardless of where the mock's stdout would have gone.

Verified end-to-end on the dev box: sourced archangel, set FILESYSTEM=zfs, armed the trap, ran `false` to trigger `set -e`. The trap fired with exit code 1, dispatched to the ZFS cleanup path, called `umount /mnt/efi` and `umount -R /mnt`, checked `zpool list` (returned non-zero since no pool exists on the dev box), skipped the export, and exited via `error`.

No behavior change on the success path. The existing `cleanup` and `btrfs_cleanup` stay unchanged.
</content>
</entry>
<entry>
<title>test: expand bats coverage across installer modules</title>
<updated>2026-04-26T06:09:01+00:00</updated>
<author>
<name>Craig Jennings</name>
<email>c@cjennings.net</email>
</author>
<published>2026-04-26T06:09:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.cjennings.net/archangel/commit/?id=1a261b0c220903c8bb628e7f2b94cf75a843f688'/>
<id>urn:sha1:1a261b0c220903c8bb628e7f2b94cf75a843f688</id>
<content type='text'>
Added unit tests for `disk.sh`, `btrfs.sh`, the archangel monolith's `gather_input` unattended branch, and filled gap cases in `config.sh`. The suite grew from 71 to 110 tests.

`installer/lib/disk.sh` was completely uncovered. New `tests/unit/test_disk.bats` covers the four pure partition-path helpers (`get_efi_partition`, `get_root_partition`, `get_efi_partitions`, `get_root_partitions`) across SATA, virtio, and NVMe inputs, mixed arrays, and the empty-input behavior. Side-effecting functions in the same file (sgdisk, mkfs.fat, partprobe, and fzf wrappers) stay deliberately VM-tested.

`installer/lib/btrfs.sh` had no bats coverage. New `tests/unit/test_btrfs.bats` covers `get_luks_devices`, the only pure helper in the file. It pins the asymmetric naming convention where the first device gets the bare `LUKS_MAPPER_NAME` and subsequent devices append the index.

The archangel monolith was un-source-able for tests because its top-level code created a /tmp log file and redirected stdout via `exec &gt; &gt;(tee...)`, plus called `main "$@"` unconditionally at the bottom. I extracted the logging setup into an `init_logging` function called from `main`, and wrapped the main call in a `[[ "${BASH_SOURCE[0]}" == "${0}" ]]` guard. Sourcing the script now loads function definitions silently, with no log file and no banner. Running it directly works exactly as before. Verified both paths.

That refactor unlocks `tests/unit/test_archangel.bats`, which covers `gather_input` in unattended mode. Required-field validation for HOSTNAME, TIMEZONE, ROOT_PASSWORD, and DISKS. Optional-field defaulting (FILESYSTEM to zfs, LOCALE to en_US.UTF-8, KEYMAP to us, ENABLE_SSH to yes). Filesystem-specific encryption checks (ZFS_PASSPHRASE required when not NO_ENCRYPT, same for LUKS_PASSPHRASE on Btrfs). Filesystem validity. RAID_LEVEL defaulting for multi-disk installs. The interactive branch stays out of scope per the testing-strategy policy.

`tests/unit/test_config.bats` got five gap tests: `check_config` when CONFIG_FILE is set, `validate_config` against a non-block-device entry (e.g. /dev/null) and a missing path, and `parse_args` accepting `--color` and `--config-file` together in either order.

`testing-strategy.org` got an expanded "What bats does NOT cover" section. The doc previously named six tools (mkfs, cryptsetup, zpool create, pacstrap, arch-chroot, grub-install). The new list adds sgdisk, partprobe, blkid, mkfs.fat, mkfs.btrfs, snapper, efibootmgr, mount, umount, findmnt, mountpoint, and fzf. It also names the conditions (root needed, real /dev or /sys state) that make a function VM-only. The coverage table at the top now lists the three new test files.

No behavior change in production code. The init_logging extraction preserves the existing log path and banner format byte-for-byte.
</content>
</entry>
</feed>
