path: root/installer
Commit message (Author, Age, Files, Lines)
* feat(install): install baked AUR packages and clean the target config (Craig Jennings, 4 days, 2 files, -0/+84)

  Wire the baked AUR repo into the installer. Before pacstrap, install_base checks whether the ISO shipped the repo and, if so, exposes [aur] in the live /etc/pacman.conf and reads the package names from the manifest, adding them to the pacstrap set so they install into the target offline. This mirrors the existing [archzfs] handling.

  pacstrap resolves repos from the live system, not $MNTPOINT. The live config already carries [aur] from the shipped ISO config, so the append is idempotent by design. A --skip-aur ISO ships no repo, and aur_repo_available gates the whole path, so the installer still works there.

  configure_system strips any [aur] stanza from the target /etc/pacman.conf. pacstrap installs a stock target config with no [aur], so this is defensive, but it guarantees the installed system never references /usr/share/aur-packages, which exists only on the live ISO.

  Four new common.sh helpers carry the logic: aur_repo_available, append_aur_repo (idempotent), aur_manifest_names (the manifest is the source of what to install, so the list never drifts), and strip_repo_stanza. All four covered across Normal, Boundary, and Error.
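A rough sketch of how the idempotent append and the stanza strip might look. Only the helper names and the /usr/share/aur-packages path come from the commit message; the argument order, the function bodies, and the SigLevel line are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketches; the real helpers in common.sh may differ.

# Append an [aur] stanza only when the config does not already carry one.
append_aur_repo() {
    local conf="$1" repo_dir="$2"
    grep -q '^\[aur\]' "$conf" && return 0
    # SigLevel value is an assumption, not taken from the project.
    printf '\n[aur]\nSigLevel = Optional TrustAll\nServer = file://%s\n' "$repo_dir" >> "$conf"
}

# Drop a [name] stanza: skip from its header up to the next section header.
strip_repo_stanza() {
    local conf="$1" name="$2" tmp
    tmp="$(mktemp)"
    awk -v hdr="[$name]" '
        $0 == hdr  { skip = 1; next }   # start of the stanza to remove
        /^\[.*\]/  { skip = 0 }         # any other section header ends it
        !skip      { print }
    ' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Calling append_aur_repo twice leaves one stanza, and strip_repo_stanza on a config without the stanza rewrites the file unchanged, so both are safe to run unconditionally.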
* refactor: drop the dead duplicate disk_in_use from common.sh (Craig Jennings, 2026-05-23, 1 file, -8/+0)

  common.sh and disk.sh both defined disk_in_use. archangel sources common.sh first, then disk.sh, so disk.sh's thorough version (mount, active swap, imported zpool, md array) won at runtime everywhere, including list_available_disks, the common.sh function that calls it. common.sh's older mount-and-holders version was dead.

  I deleted it. list_available_disks now resolves disk_in_use to disk.sh's, which is what already happened at runtime. The disk.sh unit tests cover the surviving version. Suite stays at 245, lint clean.
* feat(install): add pre-flight environment and disk-target validation (Craig Jennings, 2026-05-22, 3 files, -3/+145)

  archangel went straight from filesystem selection into a destructive install behind only a root check and a ZFS module load. A missing tool, a BIOS boot, a too-small or in-use disk, or a dead network surfaced as a confusing abort partway through, sometimes after partitioning had already run.

  Two gates now fail fast. validate_environment runs after filesystem selection, before any disk is touched: it confirms UEFI boot mode and that every required command is present, with the list coming from a new required_commands helper built like pacstrap_packages. validate_install_targets runs after disk selection, before the first wipe: it refuses a target that's mounted, holds active swap, or belongs to an imported pool or md array, rejects disks under 20 GB, and confirms a mirror is reachable via DNS plus a TCP probe (no ICMP, since some networks drop it).

  I folded the install_failure_cleanup hardening into the same change. It now falls back to lazy unmounts, so a pacstrap-interrupted target with busy bind mounts still releases the pool and unmounts the EFI partition. Without that, the disk-in-use guard would block the very retry the cleanup exists to enable. "Re-run to retry" only holds if the disk is genuinely freed first.

  The 20 GB floor is decimal on purpose. It reads as the natural minimum and clears a 20 GiB disk image with headroom instead of sitting on the boundary.
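The size floor and the ICMP-free reachability check can be sketched as below. The function names, the 5-second timeout, and the default port are hypothetical; the arithmetic is the part worth seeing: 20 GB decimal is 20,000,000,000 bytes, while a 20 GiB image is 21,474,836,480 bytes, so the image clears the floor by about 1.47 GB.

```shell
#!/usr/bin/env bash
# Hypothetical names; a sketch of the checks described above, not the project's code.

MIN_DISK_BYTES=$((20 * 1000 * 1000 * 1000))   # 20 GB, decimal on purpose

# Succeeds when the given byte count meets the floor; returns 2 on non-numeric input.
disk_size_ok() {
    local bytes="$1"
    [[ "$bytes" =~ ^[0-9]+$ ]] || return 2
    (( bytes >= MIN_DISK_BYTES ))
}

# DNS lookup plus a TCP connect via bash's /dev/tcp, with no ICMP involved.
mirror_reachable() {
    local host="$1" port="${2:-443}"
    getent hosts "$host" >/dev/null || return 1
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

A caller would typically feed disk_size_ok from `blockdev --getsize64 "$disk"`, which reports the device size in bytes.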
* refactor: extract validate_encryption_passphrase from gather_input (Craig Jennings, 2026-05-19, 2 files, -11/+17)

  gather_input's unattended branch had two parallel if-blocks, one for ZFS and one for Btrfs, each doing the same encryption-passphrase empty check against a filesystem-specific variable (ZFS_PASSPHRASE or LUKS_PASSPHRASE). The two blocks shared the condition surface and error template. Only the variable name differed.

  I lifted the check into validate_encryption_passphrase in lib/config.sh next to validate_filesystem. The helper takes the variable name and uses indirect expansion (${!var_name}) so one function covers both filesystems. gather_input now dispatches via if/elif on FILESYSTEM and calls the helper with the right variable, collapsing 14 lines to 6.

  The original tests in test_archangel.bats (gather_input errors when ZFS without ZFS_PASSPHRASE / when Btrfs without LUKS_PASSPHRASE / accepts ZFS with NO_ENCRYPT=yes) still pass, exercising the helper through the dispatch. Added 4 direct unit tests in test_config.bats covering the four cases: NO_ENCRYPT=yes passes regardless, NO_ENCRYPT=no with empty fails, NO_ENCRYPT=no with value passes, and the error message names the offending variable. Bats: 177 → 181.

  No behavior change. The helper preserves the original error message format and exit conditions.
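A minimal sketch of the indirect-expansion idea. The error wording and the return-instead-of-exit behavior are assumptions; the commit only says the helper takes a variable name and reads it through ${!var_name}.

```shell
#!/usr/bin/env bash
# Sketch only: message text and return code are illustrative.

validate_encryption_passphrase() {
    local var_name="$1"
    # Encryption disabled: nothing to check.
    [[ "${NO_ENCRYPT:-no}" == "yes" ]] && return 0
    # ${!var_name} reads the variable whose NAME is stored in var_name.
    if [[ -z "${!var_name:-}" ]]; then
        echo "ERROR: $var_name is required when encryption is enabled" >&2
        return 1
    fi
}

# Dispatch in the caller, one helper for both filesystems:
#   if   [[ "$FILESYSTEM" == "zfs"   ]]; then validate_encryption_passphrase ZFS_PASSPHRASE
#   elif [[ "$FILESYSTEM" == "btrfs" ]]; then validate_encryption_passphrase LUKS_PASSPHRASE
#   fi
```

Passing the name and not the value is what lets the error message name the offending variable.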
* refactor: lift FILES= keyfile sed to ensure_initramfs_files helper (Craig Jennings, 2026-05-19, 2 files, -5/+18)

  btrfs.sh's configure_btrfs_initramfs had a six-line inline that ensured mkinitcpio.conf's FILES= line listed the LUKS keyfile: sed-replace the existing FILES= line, then grep + append as a fallback when no FILES= line existed. The pattern is mkinitcpio-specific and self-healing rather than error-on-miss (FILES= is optional in mkinitcpio.conf, so missing means "no extra files," not a broken config).

  I lifted the block into ensure_initramfs_files in lib/common.sh next to prepend_grub_cmdline_linux, then collapsed the btrfs.sh call site to a single ensure_initramfs_files line. Added three bats tests for the three cases (FILES= present and empty, FILES= present with a different value, FILES= absent entirely). Bats: 174 → 177.

  No behavior change. The helper's logic matches the inline byte-for-byte: same sed pattern, same grep fallback, same final state.
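The sed-then-grep-then-append shape can be sketched like this. The two-argument signature and the keyfile path in the usage comment are assumptions; the real helper's sed pattern is not shown in the commit message.

```shell
#!/usr/bin/env bash
# Sketch of a self-healing FILES= update (assumed signature: config path, keyfile path).
# Requires GNU sed for -i.

ensure_initramfs_files() {
    local conf="$1" keyfile="$2"
    # Replace an existing FILES= line, whatever it currently lists.
    sed -i "s|^FILES=.*|FILES=($keyfile)|" "$conf"
    # FILES= is optional, so a missing line is healed by appending, not an error.
    grep -qxF "FILES=($keyfile)" "$conf" || echo "FILES=($keyfile)" >> "$conf"
}

# Example (hypothetical path): ensure_initramfs_files /mnt/etc/mkinitcpio.conf /crypto_keyfile.bin
```

All three cases named above end in the same final state: exactly one FILES= line listing the keyfile.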
* refactor: wire validate_config into the unattended install path (Craig Jennings, 2026-05-19, 1 file, -5/+3)

  validate_config in lib/config.sh was unreachable from main(). Its empty-field checks duplicated four lines in gather_input's unattended branch. validate_config also has two checks gather_input doesn't: that every entry in SELECTED_DISKS is a real block device, and that TIMEZONE exists under /usr/share/zoneinfo. Neither check ever ran. A config with a typo'd disk path slipped past gather_input and surfaced as an obscure sgdisk error inside the destructive partitioning step.

  I wired validate_config into main() after validate_filesystem, gated on UNATTENDED so it only runs against an already-loaded config. I dropped the four duplicate empty-field checks from gather_input's unattended branch. The filesystem-specific passphrase checks stay there because they're coupled to the FILESYSTEM branch logic.

  validate_config reports every missing field at once instead of dying on the first. A config with five missing fields tells you all five in one pass.

  I removed the four corresponding gather_input bats tests. validate_config's existing unit tests in test_config.bats already cover their assertions. Bats: 178 → 174.
* fix(install): drop dead zfsrollback copy from configure_zfs_tools (Craig Jennings, 2026-05-14, 1 file, -4/+1)

  The 2026-04-27 consolidation (422d109) deleted installer/zfsrollback in favor of the unified zfssnapshot wrapper, but installer/archangel:configure_zfs_tools still tried to cp /usr/local/bin/zfsrollback to the installed system. With set -euo pipefail in effect, a fresh ISO + ZFS install would abort here with cp: cannot stat 'zfsrollback'.

  The bug shipped on main without anyone noticing because the 04-27 VM tests ran against an ISO built earlier that day, before the consolidation merged. No fresh ISO has been built since.

  I also dropped the leftover profile/airootfs/usr/local/bin/zfsrollback file that build.sh no longer regenerates.
* feat: add --name flag to zfssnapshot rollback and delete (Craig Jennings, 2026-05-14, 1 file, -32/+79)

  I added --name NAME to rollback (single name) and --name NAME[,NAME...] to delete (comma-separated for multi-select) so scripted callers can drive the wrapper without fzf. The upcoming VM verification step in scripts/test-install.sh needs this. fzf is now conditional, required only when --name is omitted.

  The 10 new bats tests cover help-text mentions, parse success and failure modes (missing value, mutex with -s, unknown flag), fzf-bypass on both subcommands, and multi-name expansion on delete.
* refactor: drop dead variables from lib/config.sh (Craig Jennings, 2026-05-14, 1 file, -3/+0)

  I dropped ENCRYPTION_ENABLED, SSH_ENABLED, and SSH_KEY: declarations with no readers anywhere in the project. The live names (NO_ENCRYPT, ENABLE_SSH) handle these settings instead. The example config files already reference those.
* feat: consolidate zfssnapshot and zfsrollback into one subcommand-driven script (Craig Jennings, 2026-04-27, 2 files, -254/+405)

  Problem: zfssnapshot and zfsrollback were two separate scripts with overlapping pre-flight checks (zfs / fzf / root) and parallel UX patterns (description sanitization in one, fzf selection in the other). Users had to remember which script was for which operation, and a "list" view meant typing the raw `zfs list -t snapshot` command. There was no path to destroy individual snapshots short of `zfs destroy` directly, which is dangerous without a confirmation flow.

  Solution: rewrite zfssnapshot as a single multi-subcommand script (list, create, rollback, delete). Drop installer/zfsrollback.

  The new script uses a source-guard at the bottom (`if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then main "$@"; fi`) so bats can source it without triggering the install-time pre-flight checks, matching the pattern in installer/archangel. Pure helpers (sanitize_description, validate_description, format_snapshot_name) get extracted as named functions so they're testable in isolation.

  The destructive flows (rollback, delete) keep the explicit "yes" confirmation prompt, the genesis-snapshot warning, and the recursive-rollback-destroys-newer-snapshots warning. Delete uses fzf --multi so the user can pick several snapshot names at once.

  Updated build.sh to copy only the consolidated script. Dropped the zfsrollback profiledef permission line. Updated Makefile, README, scripts/sanity-test.sh, and testing-strategy.org to reflect the single-script layout.

  Bats: 147 → 168 (+21). Coverage spans sanitize_description (normal / boundary / error), validate_description (alphanumerics, hyphens, underscores accepted; spaces, slashes, shell metacharacters, empty rejected), format_snapshot_name (timestamp + description composition), and main subcommand dispatch (list / create / rollback / delete / help / unknown). Lint clean. The zfs-, fzf-, and arch-chroot-shelling subcommand bodies stay VM-tested per testing-strategy.org.
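To see the source-guard in action, the sketch below writes a tiny stand-in script to a temp file and then both executes and sources it. The stand-in's greet and main functions are hypothetical; only the guard line itself is taken from the commit message.

```shell
#!/usr/bin/env bash
# Demonstration of the source-guard using a throwaway stand-in script.

demo="$(mktemp)"
cat > "$demo" <<'EOF'
greet() { echo "hello from helper"; }
main()  { echo "main ran"; }
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then main "$@"; fi
EOF

# Executed: $0 and BASH_SOURCE[0] are the same path, so main runs.
run_directly() { bash "$demo"; }

# Sourced (in a subshell here): BASH_SOURCE[0] is the file, $0 is the caller,
# so main is skipped and only the function definitions are loaded.
source_only() { ( source "$demo"; greet ); }
```

This is the property the bats tests rely on: sourcing loads helpers without side effects, while direct execution behaves as before.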
* refactor: drop dead configure_luks_grub from Btrfs install path (Craig Jennings, 2026-04-27, 1 file, -28/+0)

  Problem: configure_luks_grub appended GRUB_ENABLE_CRYPTODISK=y and prepended cryptdevice= to /etc/default/grub during the LUKS-target setup phase, but configure_grub at lib/btrfs.sh:578 does `cat > /etc/default/grub` later in the same install, with a single redirect that overwrites the file. Between the two, only generate_btrfs_fstab and configure_btrfs_initramfs run, neither of which touches /etc/default/grub. So configure_luks_grub's writes never reach the installed system. The live LUKS-cmdline work is configure_grub's own LUKS-enabled block at lib/btrfs.sh:597-627.

  Solution: drop configure_luks_grub from btrfs_configure_luks_target and delete the function (no other callers). configure_luks_initramfs stays since it writes to mkinitcpio, not /etc/default/grub.

  VM tests on the btrfs-luks path have always been passing because they exercise configure_grub's live block. prepend_grub_cmdline_linux already has bats coverage for the live cmdline path.

  Bats: 147, 0 fail. Lint clean.
* refactor: extract MNTPOINT constant for the install chroot mount point (Craig Jennings, 2026-04-27, 3 files, -130/+138)

  Last on the tech-debt drain. The installer hardcoded /mnt at 50+ sites: pacstrap, arch-chroot, mount/umount, fstab writes, and every host-side write into the chroot's /etc, /usr, /var, /boot, /tmp. Same magic-string smell as /mnt/efi but at much larger scale.

  Add MNTPOINT="/mnt" to lib/common.sh next to EFI_DIR. Replace literal /mnt/... with $MNTPOINT/... across installer/archangel, installer/lib/btrfs.sh, and installer/lib/common.sh. Replace bare /mnt (mount target, arch-chroot root, umount target, install_dropin parameter) with $MNTPOINT. EFI_DIR's own definition becomes EFI_DIR="$MNTPOINT/efi" for the natural composition.

  Folded in the related ticket: /mnt${chroot_efi_dir} in btrfs.sh:install_grub_all_efi becomes ${MNTPOINT}${chroot_efi_dir}. Was filed as a separate item but the ticket said it should ship with the MNTPOINT extraction, since the composition pattern is unusual and easy to miss in a global sed.

  Three /mnt references kept literal in comments where the comment describes the string concept rather than the mount point ("Remove /mnt prefix - config is used inside chroot where root is /", etc.). Substituting to $MNTPOINT in those comments would obscure the documentation.

  Bats: 146 → 147. One new test in test_common.bats pins MNTPOINT="/mnt". Lint clean (one shellcheck SC2295 warning fixed by quoting the parameter expansion: ${isp_firmware#"$MNTPOINT"}). VM verification deferred to a single full make test-install run after all three tech-debt commits land.
* refactor: verify GRUB_CMDLINE_LINUX seds via prepend_grub_cmdline_linux helper (Craig Jennings, 2026-04-27, 3 files, -2/+30)

  Audited the ~10 silent sed -i sites in the installer against the verification-after pattern that landed for sshd_config last session. Triaged each by failure mode.

  The two GRUB_CMDLINE_LINUX seds in lib/btrfs.sh have a real silent-failure risk. If /etc/default/grub is missing or malformed and the sed pattern doesn't match, nothing happens. The kernel boots without cryptdevice=. The system can't unlock LUKS at boot. Added prepend_grub_cmdline_linux to lib/common.sh. Same shape as enable_sshd_root_login (sed, then grep, then error if the line wasn't modified). Replaced the two inline seds with helper calls.

  The HOOKS= seds in installer/archangel and lib/btrfs.sh (six total) don't need verification. A missing HOOKS= line makes mkinitcpio -P fail loudly downstream, so silent-replace failure can't reach a booted system. Added a one-line audit-rationale comment at each of the three locations so the next reader doesn't re-litigate the decision.

  The FILES= sed at lib/btrfs.sh:213 already self-heals via a sed-then-grep-then-append pattern, so no behavior change there. Filed a separate follow-up to lift that pattern into a named helper for clarity.

  Bats: 142 → 146. Four new tests in test_common.bats cover normal (empty cmdline, existing cmdline preserved, other lines preserved) and error (missing GRUB_CMDLINE_LINUX line). Lint clean.
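The sed, then grep, then error shape can be sketched as follows. The argument order (config path, then the arguments to prepend) and the error text are assumptions, and the sketch returns 1 where the real helper presumably calls the installer's error function.

```shell
#!/usr/bin/env bash
# Sketch of verify-after-sed (assumed signature). Requires GNU sed for -i.
# Note: args containing '|' or '&' would need escaping before use in sed.

prepend_grub_cmdline_linux() {
    local conf="$1" args="$2"
    # Insert args right after the opening quote, keeping whatever was already there.
    sed -i "s|^GRUB_CMDLINE_LINUX=\"|GRUB_CMDLINE_LINUX=\"$args |" "$conf" 2>/dev/null
    # Verify the edit landed; a silent no-match is the failure being guarded against.
    if ! grep -qF "GRUB_CMDLINE_LINUX=\"$args " "$conf" 2>/dev/null; then
        echo "ERROR: could not set GRUB_CMDLINE_LINUX in $conf" >&2
        return 1
    fi
}
```

The grep is what turns a missing or malformed line into a loud install-time failure and not an unbootable system discovered later.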
* refactor: consolidate installer defaults and FILESYSTEM validation into config.sh (Craig Jennings, 2026-04-27, 2 files, -35/+28)

  The installer had three sites touching FILESYSTEM: a top-level default in the monolith, a re-default block in gather_input, and a runtime validation block also in gather_input. The same scattering existed for LOCALE, KEYMAP, ENABLE_SSH, and NO_ENCRYPT. A future contributor changing one site wouldn't have known the other two existed.

  Move all five defaults into the lib/config.sh declarations so config.sh is the single source of truth. Add validate_filesystem() in lib/config.sh and call it from main() between check_config and gather_input, so a typo in a config file's FILESYSTEM= fails fast before any install action runs.

  The behavior change is stricter. An empty FILESYSTEM in a config file used to be silently defaulted to zfs, now it errors. Interactive mode is unaffected. select_filesystem still controls the value and already errored on cancellation.

  Bats: 140 → 142. Five tests added in test_config.bats for the defaults pinning and validate_filesystem coverage. Three removed from test_archangel.bats for behavior that moved out of gather_input. Lint clean.
* refactor: collapse sshd_config seds into enable_sshd_root_login (Craig Jennings, 2026-04-26, 2 files, -3/+19)

  The two sed -i invocations in configure_ssh worked on stock Arch sshd_config but had a real silent-failure mode. If neither the commented (#PermitRootLogin) nor the uncommented form was present, both seds did nothing and the install shipped without root SSH. The user discovered it at first ssh attempt, not at install time. The second sed was also redundant. By the time it ran, the first sed had produced a line matching the second sed's pattern.

  The new enable_sshd_root_login helper in lib/common.sh combines both substitutions into one sed -i -e ..., then verifies PermitRootLogin yes is present in the file. If the verification fails, it calls error rather than silently appending. Silent appending would mask a corrupted starting file, which is exactly the failure mode worth flagging loudly.

  The helper takes the config path as an argument so the bats tests in commit 7486abb can run unprivileged against tempfiles. configure_ssh passes /mnt/etc/ssh/sshd_config and is now a single call instead of two seds.

  Verified: bats 135 → 140 (+5 covering normal/boundary/error). Lint clean. Helper smoke-tested against current Arch sshd_config. The loud-error path can't be exercised against the live default but is covered by the bats error case.

  Filed as a follow-up :techdebt: item: ~10 other sed -i sites in installer/archangel and lib/btrfs.sh follow the same silent-replace pattern. The FILES= site for LUKS is the worst (silent failure means LUKS prompts on every boot). Triage each per this same recipe in a future session.
* refactor: extract EFI_DIR constant for the install-time EFI mount point (Craig Jennings, 2026-04-26, 3 files, -18/+28)

  The literal /mnt/efi appeared at 17 sites across installer/archangel and installer/lib/btrfs.sh. Renaming it (or pointing tests at a different mount) meant touching every site and risking incomplete sweeps. One canonical name in installer/lib/common.sh now backs every reference.

  EFI_DIR has no trailing slash so the three expansion patterns in the codebase compose cleanly: bare ($EFI_DIR), sub-path ($EFI_DIR/EFI/ZBM), and the index-suffix used by install_grub_all_efi for secondary EFI mounts (${EFI_DIR}${i}). The sync_efi_partitions staging path also moves from the literal /mnt/efi_sync to ${EFI_DIR}_sync, so it follows EFI_DIR if anyone ever changes the base.

  Two follow-ups filed as separate :techdebt: items: MNTPOINT=/mnt extraction across the 50+ /mnt/... sites (pacstrap, arch-chroot, fstab writes), and the related /mnt${chroot_efi_dir} composition pattern at btrfs.sh:681-682. Both ship together when MNTPOINT lands.

  Verified: bats 134 → 135 (+1 pinning EFI_DIR=/mnt/efi). Lint clean. All four expansion patterns smoke-tested at runtime and produce the original literal byte-for-byte. VM run skipped, pure constant substitution with zero behavior change.
* refactor: unify partition_disks across ZFS and Btrfs install paths (Craig Jennings, 2026-04-26, 2 files, -140/+53)

  The monolith's partition_disks() at installer/archangel was ZFS-only and silently shadowed lib/disk.sh:partition_disks(), which had been dead code since the Btrfs install path was added. install_btrfs was assembling partitioning manually via partition_disk (singular) plus a separate format_efi_partitions call. Two parallel implementations meant fixes had to land in two places and the lib version drifted with no visible warning.

  The unified partition_disks now lives in lib/disk.sh. It reads SELECTED_DISKS, dispatches the per-disk layout on FILESYSTEM (BF00 for ZFS, 8300 for Btrfs), populates EFI_PARTS + ROOT_PARTS, and formats each EFI partition with EFI0, EFI1, ... labels.

  Folded in two pre-existing divergences while consolidating.

  - wipefs -af now runs on every disk, not just the ZFS path. The Btrfs path was missing this defense against non-GPT signatures (LVM, mdadm, ext) that sgdisk --zap-all alone won't touch.
  - EFI labels standardized on EFI0, EFI1, ... across both paths. The lib version was producing asymmetric EFI / EFI2 labels. No consumer in the repo references the labels after format, so that change is cosmetic.

  ZFS_PARTS renamed to ROOT_PARTS for symmetry with get_root_partition.

  Deleted four orphaned helpers: format_efi, format_efi_partitions (only caller was the collapsed install_btrfs section), get_efi_partitions, get_root_partitions (test-only callers after install_btrfs simplified).

  Verified end to end: bats 134/134, make test-install passing 12/12 configs across both install paths.
* fix: clean up partial installs via ERR/INT/TERM trap (Craig Jennings, 2026-04-26, 1 file, -0/+58)

  Failed installs left the system in an inconsistent state: /mnt mounted, the zpool imported, possibly LUKS containers open. The user had to manually unmount, export the pool, and clean up partitions before re-running the installer.

  The existing trap at the top of archangel was `trap 'error "Installation interrupted!"' INT TERM`. It just printed a message and exited. There was no ERR trap, so `set -e`-aborted commands ran no cleanup either.

  I added `install_failure_cleanup` next to the existing `cleanup`. It captures `$?` first, disarms the trap to prevent recursion, clears sensitive variables (ROOT_PASSWORD, ZFS_PASSPHRASE, LUKS_PASSPHRASE), and dispatches on FILESYSTEM:

  - ZFS path: unmount /mnt/efi, recursive umount /mnt, export the pool (with `-f` fallback) if it's still imported.
  - Btrfs path: unmount /mnt/efi, call the existing btrfs_cleanup and btrfs_close_encryption helpers.

  All cleanup steps swallow their own errors with `|| true` since partial state is expected when this fires mid-install.

  `install_zfs` and `install_btrfs` now both arm the trap as their first action and disarm it just before the success-path cleanup. The disarm matters because the success cleanup calls `zpool export` (or btrfs_close_encryption) directly, and those can produce non-zero exit codes that we don't want to interpret as "installation failed".

  The note in notes.org described this as "install_zfs has no mid-step recovery." The framing was off. Both paths were exposed: install_btrfs's `btrfs_cleanup` only runs on the success path, same as `cleanup` for ZFS. Both paths now have the same recovery shape.

  Added 4 bats tests for `install_failure_cleanup` that mock the system tools (umount, zpool, btrfs_cleanup, btrfs_close_encryption) via function override and track invocations through a CALLS array. Array assignment isn't affected by the production code's `>/dev/null 2>&1` redirects on `zpool list`, so we capture the call regardless of where the mock's stdout would have gone.

  Verified end-to-end on the dev box: sourced archangel, set FILESYSTEM=zfs, armed the trap, ran `false` to trigger `set -e`. The trap fired with exit code 1, dispatched to the ZFS cleanup path, called `umount /mnt/efi` and `umount -R /mnt`, checked `zpool list` (returned non-zero since no pool exists on the dev box), skipped the export, and exited via `error`.

  No behavior change on the success path. The existing `cleanup` and `btrfs_cleanup` stay unchanged.
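A condensed sketch of the handler's shape, following the steps listed above. The POOL_NAME variable with its zroot default and the final message are assumptions, and the sketch returns the captured status where the real function exits via `error`.

```shell
#!/usr/bin/env bash
# Sketch only; POOL_NAME and the message text are hypothetical.

install_failure_cleanup() {
    local rc=$?                  # capture the failing status before anything resets it
    trap - ERR INT TERM          # disarm so a failing cleanup step cannot re-enter
    unset ROOT_PASSWORD ZFS_PASSPHRASE LUKS_PASSPHRASE

    case "${FILESYSTEM:-}" in
        zfs)
            umount /mnt/efi 2>/dev/null || true
            umount -R /mnt 2>/dev/null || true
            # Export only if the pool is still imported; force as a fallback.
            if zpool list "${POOL_NAME:-zroot}" >/dev/null 2>&1; then
                zpool export "${POOL_NAME:-zroot}" 2>/dev/null \
                    || zpool export -f "${POOL_NAME:-zroot}" 2>/dev/null || true
            fi
            ;;
        btrfs)
            umount /mnt/efi 2>/dev/null || true
            btrfs_cleanup 2>/dev/null || true
            btrfs_close_encryption 2>/dev/null || true
            ;;
    esac
    echo "Installation failed (exit $rc); partial state cleaned up." >&2
    return "$rc"
}

# Arm as the first action of an install path, disarm before success-path cleanup:
#   trap install_failure_cleanup ERR INT TERM
#   ...
#   trap - ERR INT TERM
```

Because every tool is invoked by plain name, a test can override umount and zpool with shell functions and record calls in an array, which is the mocking approach the commit describes.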
* test: expand bats coverage across installer modules (Craig Jennings, 2026-04-26, 1 file, -10/+21)

  Added unit tests for `disk.sh`, `btrfs.sh`, the archangel monolith's `gather_input` unattended branch, and filled gap cases in `config.sh`. The suite grew from 71 to 110 tests.

  `installer/lib/disk.sh` was completely uncovered. New `tests/unit/test_disk.bats` covers the four pure partition-path helpers (`get_efi_partition`, `get_root_partition`, `get_efi_partitions`, `get_root_partitions`) across SATA, virtio, and NVMe inputs, mixed arrays, and the empty-input behavior. Side-effecting functions in the same file (sgdisk, mkfs.fat, partprobe, and fzf wrappers) stay deliberately VM-tested.

  `installer/lib/btrfs.sh` had no bats coverage. New `tests/unit/test_btrfs.bats` covers `get_luks_devices`, the only pure helper in the file. It pins the asymmetric naming convention where the first device gets the bare `LUKS_MAPPER_NAME` and subsequent devices append the index.

  The archangel monolith was un-source-able for tests because its top-level code created a /tmp log file and redirected stdout via `exec > >(tee...)`, plus called `main "$@"` unconditionally at the bottom. I extracted the logging setup into an `init_logging` function called from `main`, and wrapped the main call in a `[[ "${BASH_SOURCE[0]}" == "${0}" ]]` guard. Sourcing the script now loads function definitions silently, with no log file and no banner. Running it directly works exactly as before. Verified both paths.

  That refactor unlocks `tests/unit/test_archangel.bats`, which covers `gather_input` in unattended mode:

  - Required-field validation for HOSTNAME, TIMEZONE, ROOT_PASSWORD, and DISKS.
  - Optional-field defaulting (FILESYSTEM to zfs, LOCALE to en_US.UTF-8, KEYMAP to us, ENABLE_SSH to yes).
  - Filesystem-specific encryption checks (ZFS_PASSPHRASE required when not NO_ENCRYPT, same for LUKS_PASSPHRASE on Btrfs).
  - Filesystem validity.
  - RAID_LEVEL defaulting for multi-disk installs.

  The interactive branch stays out of scope per the testing-strategy policy.

  `tests/unit/test_config.bats` got five gap tests: `check_config` when CONFIG_FILE is set, `validate_config` against a non-block-device entry (e.g. /dev/null) and a missing path, and `parse_args` accepting `--color` and `--config-file` together in either order.

  `testing-strategy.org` got an expanded "What bats does NOT cover" section. The doc previously named six tools (mkfs, cryptsetup, zpool create, pacstrap, arch-chroot, grub-install). The new list adds sgdisk, partprobe, blkid, mkfs.fat, mkfs.btrfs, snapper, efibootmgr, mount, umount, findmnt, mountpoint, and fzf. It also names the conditions (root needed, real /dev or /sys state) that make a function VM-only. The coverage table at the top now lists the three new test files.

  No behavior change in production code. The init_logging extraction preserves the existing log path and banner format byte-for-byte.
* refactor: extract get_raid_level fzf preview into raid.sh helper (Craig Jennings, 2026-04-26, 2 files, -102/+125)

  `get_raid_level()` carried a 98-line inline fzf `--preview` shell snippet that contained five nearly-parallel `case` branches emitting per-level preview text, plus three exported variables (`RAID_DISK_COUNT`, `RAID_TOTAL_GB`, `RAID_SMALLEST_GB`) just to pass values into the fzf preview subshell. The inline shell-in-shell had no syntax highlighting, no shellcheck on the inner snippet, and any edit to preview copy meant editing inside a single-quoted argument.

  I extracted the per-level text into a new `raid_preview(level, disk_count, total_gb, smallest_gb)` helper in `lib/raid.sh`. It reuses the existing `raid_fault_tolerance` and `raid_usable_bytes` primitives for the data lines instead of redoing the arithmetic inline. That keeps the math in one place.

  The fzf `--preview` argument is now a one-liner that calls `raid_preview` with the sizing values, and the env-var exports are gone. `export -f raid_preview raid_fault_tolerance raid_usable_bytes` makes the functions visible in fzf's preview subshell. I verified this against a fresh `bash -c` subshell, which is what fzf spawns internally.

  `get_raid_level()` shrinks from 144 to 49 lines.

  Preview text is now bats-tested. Added 8 unit tests across the 5 RAID levels (headline, fault-tolerance line, computed usable space), mixed-size handling for mirror (smallest disk, not average), unknown level returning 1 with empty output, and a sanity loop confirming every valid level produces non-empty output.

  No behavior change. The preview pane shows the same text, the same level options, the same selected output. The pure logic in `lib/raid.sh` is unchanged.
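The `export -f` mechanism can be checked in isolation with a stand-in function. The body of raid_preview below is a placeholder (the real preview text is not shown in the commit), and preview_in_child is a hypothetical wrapper that mimics a fresh `bash -c` child.

```shell
#!/usr/bin/env bash
# Placeholder body; only the four-argument signature comes from the commit message.
raid_preview() {
    local level="$1" disk_count="$2" total_gb="$3" smallest_gb="$4"
    echo "Level: $level"
    echo "Disks: $disk_count (${total_gb} GB total, smallest ${smallest_gb} GB)"
}

# Export the function so child bash processes inherit its definition.
export -f raid_preview

# Hypothetical wrapper: run the function in a brand-new bash, as a preview subshell would.
preview_in_child() { bash -c 'raid_preview "$@"' _ "$@"; }
```

Without the `export -f` line the child shell would report raid_preview as a command not found, which is why the old code had to inline everything inside the single-quoted preview string.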
* fix: fail fast on missing ZFSBootMenu efibootmgr entry (Craig Jennings, 2026-04-26, 2 files, -10/+38)

  The post-bootloader boot-order step in `configure_zfsbootmenu` parsed `efibootmgr` output through a `grep | head | grep -oP` chain with no null guards. If any link returned empty (the entry wasn't created, the label was different, or efibootmgr itself failed), the surrounding `if [[ -n "$bootnum" ]]` silently skipped, the install reported success, and the user rebooted into a machine that wouldn't boot ZFSBootMenu by default.

  I replaced the chain with two pure helpers in `lib/common.sh`, `parse_efibootmgr_entry` and `parse_efibootmgr_bootorder`. The caller in `archangel` invokes them with explicit `|| error` guards on each parse stage. The helpers capture `efibootmgr` output once and reuse it (it was called twice before). The same hardening covers the BootOrder lookup at the adjacent line. It used to rely on the now-removed `bootnum` guard for safety.

  The helpers are stdin-driven and use bash regex, so they're easy to test in bats without exercising the real efibootmgr binary. Added 9 unit tests across normal cases, hex-character boot numbers, multi-match selection, missing label, missing BootOrder line, empty input, and an empty label argument. The empty-label case would otherwise falsely match `BootCurrent` via the hex regex, capturing "C". The helper now guards it explicitly.

  Verified manually against real efibootmgr output (GRUB entry at Boot0001, BootOrder 0006,0001,2001,2002,2003). Both helpers parsed correctly. VM integration not re-run for this small post-bootloader change. The next scheduled `make test-install` exercises the green path.
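One way the two stdin-driven parsers could be written with bash regex. The regexes, the prefix match on the label, and the first-match-wins rule are assumptions; the commit only fixes the helper names, the stdin-driven design, and the empty-label guard.

```shell
#!/usr/bin/env bash
# Sketch only; the real regexes in lib/common.sh may differ.

# Print the 4-hex-digit boot number of the first entry whose label starts with $1.
parse_efibootmgr_entry() {
    local label="$1" line
    local re='^Boot([0-9A-Fa-f]{4})\*?[[:space:]]+(.*)$'
    [[ -n "$label" ]] || return 1          # explicit guard against an empty label
    while IFS= read -r line; do
        if [[ "$line" =~ $re && "${BASH_REMATCH[2]}" == "$label"* ]]; then
            echo "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1
}

# Print the comma-separated BootOrder list.
parse_efibootmgr_bootorder() {
    local line re='^BootOrder:[[:space:]]*([0-9A-Fa-f,]+)'
    while IFS= read -r line; do
        if [[ "$line" =~ $re ]]; then
            echo "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1
}
```

A caller would capture `efibootmgr` output once and feed it to both, for example `bootnum="$(parse_efibootmgr_entry "$label" <<< "$output")" || error "..."`, so each parse stage fails loudly on its own.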
* feat: PrivateTmp=yes drop-in for systemd-tmpfiles on ZFS-root (Craig Jennings, 2026-04-21, 2 files, -0/+34)

  On ZFS-on-root, statx() across sibling services' /var/tmp/systemd-private-*/tmp mounts returns errno 132 (ENOTNAM). This produces 10-30 journal errors per boot and causes systemd-tmpfiles-clean.service to fail every periodic run (exit 73 / CANTCREAT). Running tmpfiles inside its own mount namespace avoids traversing sibling private-tmp paths.

  install_zfs() now calls configure_tmpfiles_private_tmp() between configure_zfs_tools and sync_efi_partitions, so the genesis snapshot captures the drop-ins. Btrfs path is untouched because errno 132 is ZFS-specific.

  The drop-in file-writing is factored into install_dropin() in lib/common.sh (service, name, root; body from stdin). Six bats tests exercise path, content, directory permissions, idempotent overwrite, empty content, and special-character preservation.

  Full root-cause write-up and verification steps in docs/zfs-tmpfiles-private-tmp-fix.md.
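A sketch of a stdin-fed drop-in writer with the (service, name, root) parameters named above. The target directory layout follows systemd's standard `<unit>.d/*.conf` drop-in convention; whether the real helper expects the `.service` suffix in its first argument, appends `.conf` itself, or uses mode 755 is assumed here.

```shell
#!/usr/bin/env bash
# Sketch only; suffix handling and directory mode are assumptions.

install_dropin() {
    local service="$1" name="$2" root="$3"
    local dir="$root/etc/systemd/system/${service}.d"
    install -d -m 755 "$dir"       # create the drop-in directory if needed
    cat > "$dir/${name}.conf"      # body comes from stdin; rewriting is idempotent
}

# Example with a hypothetical drop-in name:
#   install_dropin systemd-tmpfiles-clean.service private-tmp /mnt <<'EOF'
#   [Service]
#   PrivateTmp=yes
#   EOF
```

Taking the root as a parameter is what lets the tests point the helper at a temp directory and not a real chroot.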
* refactor: decompose install_btrfs into named orchestration stages (Craig Jennings, 2026-04-13, 2 files, -61/+94)

  Pull the single-vs-multi-disk and LUKS-vs-no-encryption branching out of install_btrfs() into five helpers in lib/btrfs.sh:

  - btrfs_open_encryption: LUKS open + fill devices array
  - btrfs_make_filesystem: create_btrfs_volume dispatch
  - btrfs_configure_luks_target: in-chroot LUKS config
  - btrfs_install_grub: GRUB primary + multi-disk mirror
  - btrfs_close_encryption: LUKS close (cleanup)

  Helpers use namerefs (local -n) to take the caller's arrays as locals instead of promoting them to globals. install_btrfs() drops from ~99 lines of nested if-then-else to a ~45-line flat sequence of named stages, matching the style of install_zfs().

  Behavior preserved: this is pure code movement, no new disk/LUKS operations. No unit tests added for the new helpers: they all wrap real LUKS/mkfs.btrfs calls that need block devices and root; VM integration tests in scripts/test-install.sh remain the source of truth.

  .shellcheckrc: disable SC2178 (nameref array heuristic) and SC2153 (globals from sourced files), both recurring false positives.

  make test: 65/65. shellcheck clean.
* refactor: merge install_base and install_base_btrfs (Craig Jennings, 2026-04-13, 2 files, -69/+43)
| Extract the pacstrap package list into pacstrap_packages(filesystem) in
| lib/common.sh (common + filesystem-specific). install_base() now
| dispatches on FILESYSTEM for both the archzfs-repo-append and the package
| list. install_base_btrfs() deleted; install_btrfs() call site updated to
| invoke install_base.
|
| Old: 49 + 38 lines of ~95% copy-paste. New: 32 lines + a 20-line pure
| helper.
|
| 7 bats tests cover: zfs has zfs-dkms/zfs-utils, btrfs has btrfs-progs +
| grub + grub-btrfs + snapper + snap-pac, each flavor excludes the other's
| specifics, common packages are in both, unknown filesystem returns status
| 1, output is one-per-line.
|
| make test: 65/65.
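A sketch of a pure package-list helper with the behaviour the tests describe (one name per line, status 1 for an unknown filesystem). The filesystem-specific names come from the message above; the two common entries are an assumed, illustrative subset, not the installer's real list.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of pacstrap_packages; the real list is longer.
pacstrap_packages() {
    local fs=$1
    local common=(base inetutils)   # assumed subset of the common packages
    local extra
    case "$fs" in
        zfs)   extra=(zfs-dkms zfs-utils) ;;
        btrfs) extra=(btrfs-progs grub grub-btrfs snapper snap-pac) ;;
        *)     echo "unknown filesystem: $fs" >&2; return 1 ;;
    esac
    # One package per line, so callers can read the output into an array.
    printf '%s\n' "${common[@]}" "${extra[@]}"
}

pacstrap_packages btrfs
```

Because the helper only prints, a caller could collect it with `mapfile -t pkgs < <(pacstrap_packages "$FILESYSTEM")` and hand the array to pacstrap; that call shape is likewise an assumption.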
* refactor: unify get_{luks,zfs}_passphrase and get_root_password (Craig Jennings, 2026-04-13, 2 files, -54/+41)
| Extract the prompt/confirm/min-length loop into prompt_password() in
| lib/common.sh using a nameref for the output variable, so UI output stays
| on the terminal (no command-substitution capture) and the three callers
| collapse from ~30 lines each to a single helper call.
|
| - get_luks_passphrase(): min 8 chars
| - get_zfs_passphrase(): min 8 chars
| - get_root_password(): no min (was unchecked before; preserved)
|
| 5 bats tests added: match+min-ok path, length-retry loop, mismatch-retry
| loop, min_len=0 disables check, empty passphrase when min_len=0.
|
| make test: 58/58.
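The prompt/confirm/min-length loop might look roughly like this. The argument order, the prompt wording, and the bail-out on end of input are assumptions; only the nameref output and the min_len=0 behaviour are taken from the message above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of prompt_password; the argument order is assumed.
# Usage: prompt_password <output-var-name> <label> [min_len]
prompt_password() {
    local -n _result=$1
    local label=$2 min_len=${3:-0}
    local first second
    while true; do
        # Prompts and messages go to stderr, so nothing is captured by a
        # command substitution; the value is returned through the nameref.
        read -rs -p "Enter ${label}: " first || return 1
        echo >&2
        read -rs -p "Confirm ${label}: " second || return 1
        echo >&2
        if [[ $first != "$second" ]]; then
            echo "Entries do not match, try again." >&2
            continue
        fi
        if (( min_len > 0 && ${#first} < min_len )); then
            echo "Must be at least ${min_len} characters." >&2
            continue
        fi
        _result=$first
        return 0
    done
}
```

A caller such as get_luks_passphrase() could then reduce to a single line like `prompt_password LUKS_PASSPHRASE "LUKS passphrase" 8`, where the variable name is again only illustrative.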
* refactor: drop dead mount_efi and select_raid_level from lib/disk.sh (Craig Jennings, 2026-04-13, 1 file, -41/+0)
| | | | | | | lib/disk.sh:mount_efi() was shadowed by installer/archangel:mount_efi() (different signature, no-arg ZFS-specific) and had zero callers. lib/disk.sh:select_raid_level() was superseded by get_raid_level() in archangel and also had zero callers. Both removed.
* refactor: extract pure RAID logic to lib/raid.sh with bats coverage (Craig Jennings, 2026-04-12, 2 files, -6/+75)
| Peel the testable pieces of get_raid_level() out of the 1600-line
| installer monolith into installer/lib/raid.sh:
|
| - raid_valid_levels_for_count(count): replaces the inline option-list
|   builder in get_raid_level()
| - raid_is_valid(level, count): useful for unattended-config validation
| - raid_usable_bytes(level, count, smallest, total): usable-space math
| - raid_fault_tolerance(level, count): max tolerable disk failures
|
| archangel now sources lib/raid.sh and uses raid_valid_levels_for_count
| for the fzf option list. Fzf preview subshell still inlines its own
| usable-bytes arithmetic (calling exported lib functions across preview
| subshells is fragile; left for a later pass).
|
| 30 bats tests in tests/unit/test_raid.bats cover the full enumeration
| table, every valid/invalid level-vs-count combo from 2 to 5 disks,
| mixed-size mirror, and unknown-level error paths.
|
| make test: 53/53.
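To illustrate the kind of pure arithmetic these helpers isolate, here is a sketch of two of them with the signatures listed above. The level names (stripe, mirror, raidz1, raidz2) and the idealized formulas, which ignore metadata and padding overhead, are assumptions; the log does not show the real tables.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; level names and formulas are assumed, not taken
# from installer/lib/raid.sh.
raid_usable_bytes() {
    local level=$1 count=$2 smallest=$3 total=$4
    case "$level" in
        stripe) echo "$total" ;;                        # all capacity usable
        mirror) echo "$smallest" ;;                     # limited by smallest disk
        raidz1) echo $(( (count - 1) * smallest )) ;;   # one disk of parity
        raidz2) echo $(( (count - 2) * smallest )) ;;   # two disks of parity
        *) return 1 ;;                                  # unknown level
    esac
}

raid_fault_tolerance() {
    local level=$1 count=$2
    case "$level" in
        stripe) echo 0 ;;
        mirror) echo $(( count - 1 )) ;;   # survives loss of all but one disk
        raidz1) echo 1 ;;
        raidz2) echo 2 ;;
        *) return 1 ;;
    esac
}

raid_usable_bytes mirror 2 500 1500   # mixed-size mirror: prints 500
```

Keeping the functions free of fzf, disks, and globals is what lets them be checked by plain unit tests.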
* security: gitignore host configs, add .example templates (Craig Jennings, 2026-04-12, 2 files, -0/+0)
| | | | | | velox-{zfs,btrfs}.conf contain LUKS/ZFS passphrases and root passwords. Untrack them and add velox-*.conf to .gitignore. Committed .example templates show the expected structure with 'welcome' placeholders.
* refactor: remove dead installer/lib/zfs.sh (Craig Jennings, 2026-04-12, 2 files, -378/+8)
| | | | | | | | | | | | | The library was sourced but only zfs_preflight was reachable from install_zfs(); the other ten functions either had names that were never called (create_zfs_datasets, configure_zfs_pacman_hook, etc.) or were shadowed by same-named definitions in the monolithic installer/archangel (create_zfs_pool, configure_zfsbootmenu, configure_zfs_services). Inlined zfs_preflight into archangel and dropped the source line. Removes a trap where fixes appear to be "mirrored" but only one copy actually runs.
* fix: drop zroot/tmp dataset and dedup pacman snapshot hook (Craig Jennings, 2026-04-12, 2 files, -8/+46)
| - /tmp on ZFS breaks systemd-tmpfiles-clean (statx ENOLINK on PrivateTmp
|   paths). Use tmpfs via fstab instead; keep zroot/var/tmp.
| - zfs-pre-snapshot gains a 60s lockfile in /run so burst transactions
|   (archsetup produced 357 snapshots in one run) collapse to one.
|
| Both fixes mirrored in installer/archangel and installer/lib/zfs.sh.
| Already applied and verified on velox.
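One way a time-window lockfile can collapse a burst of hook invocations into a single snapshot is sketched below. should_snapshot is a hypothetical name, the mtime comparison is an assumed mechanism, and `stat -c %Y` assumes GNU coreutils; the real zfs-pre-snapshot script is not shown in the log.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: succeed only if no snapshot was taken within the
# last <window> seconds, judged by the lockfile's modification time.
should_snapshot() {
    local lock=$1 window=${2:-60}
    local now mtime
    now=$(date +%s)
    if [[ -e $lock ]]; then
        mtime=$(stat -c %Y "$lock") || return 1
        # A lockfile younger than the window means a recent run already
        # took the snapshot, so this invocation is skipped.
        if (( now - mtime < window )); then
            return 1
        fi
    fi
    touch "$lock"   # start a new window
}

# A hook would call it like this (lock path under /run, as in the log;
# the file name and the snapshot step itself are left out as unknowns):
#   should_snapshot /run/<lockfile> 60 && <take the snapshot>
```

Placing the lockfile under /run means it disappears on reboot, so the first transaction after boot is never suppressed.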
* session: first bare metal install on velox, multiple fixes (Craig Jennings, 2026-04-10, 2 files, -0/+30)
| | | | | ZFS and Btrfs tested on bare metal. Fixed archzfs repo URL, LUKS pbkdf2 for GRUB, no-color default, and missing inetutils. Tagged v0.8.
* fix: add inetutils to installed system packages (Craig Jennings, 2026-04-09, 1 file, -0/+2)
| | | | Provides hostname, ping, and other networking basics on the target system.
* fix: use pbkdf2 for LUKS2 containers instead of argon2id (Craig Jennings, 2026-04-09, 1 file, -2/+5)
| | | | | GRUB's LUKS2 support only handles pbkdf2. When /boot is inside the encrypted volume, argon2id causes GRUB to reject the correct password.
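For reference, selecting the key-derivation function at format time uses cryptsetup's `--pbkdf` option. The sketch below only prints the command rather than running it, since luksFormat is destructive; the function name and /dev/sdX2 are placeholders, and the installer's actual invocation may carry more options.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints a luksFormat command that selects
# PBKDF2 instead of the LUKS2 default (argon2id). Nothing is executed.
luks_format_cmd() {
    local device=$1
    printf '%s\n' "cryptsetup luksFormat --type luks2 --pbkdf pbkdf2 ${device}"
}

luks_format_cmd /dev/sdX2
```

On an existing container, `cryptsetup luksDump` shows the PBKDF recorded for each keyslot, which is one way to confirm what GRUB will have to handle.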
* feat: default to no-color output, add --color flag to enable (Craig Jennings, 2026-04-09, 2 files, -11/+16)
| | | | Keeps logs and SSH output clean. Use archangel --color for colored output.
* fix: migrate archzfs repo from stale archzfs.com to GitHub Releases (Craig Jennings, 2026-04-09, 1 file, -2/+2)
| | | | | archzfs.com was abandoned mid-2025; latest packages were ZFS 2.3.2 for kernel 6.12.29. The new GitHub-hosted repo has ZFS 2.4.1 for 6.18.21.
* refactor: rename custom/ to installer/ for clarity (Craig Jennings, 2026-02-23, 11 files, -0/+6477)
The custom/ directory name was an archiso implementation detail. Renamed to installer/ which clearly communicates that this directory contains the installer scripts and utilities that ship on the ISO. Updated all references in build.sh, Makefile, test-install.sh, and README.