archangel - Arch Linux installer ISO — ZFS-on-root or BTRFS, doubles as rescue disk

	Commit message	Author	Age	Files	Lines
*	docs: fix broken README anchors and normalize table formatting [HEAD, main]	Craig Jennings	8 days	1	-7/+46
*	chore: extend gitignore for local task tooling	Craig Jennings	8 days	1	-0/+3
*	fix(installer): scope AUR list to filesystem, keep pacman.conf 0644	Craig Jennings	2026-06-27	3	-1/+115
\| \| \| \| \| \| \| \|	The baked AUR set installed unconditionally, so zfs-auto-snapshot reached every target. On a btrfs install there's no zfs to satisfy its dependency, and pacstrap aborted the whole transaction. The ISO still bakes the full set. install_base now filters the manifest names through filter_aur_for_fs, dropping zfs-only tooling (zfs-auto-snapshot, zrepl) on a non-zfs target. strip_repo_stanza mv'd a 0600 mktemp file onto the target, so a clean install shipped /etc/pacman.conf root-only and every user-level makepkg/yay failed to read it. It now truncate-writes through the existing file, preserving the pristine 0644. Tested in test_common.bats.
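The permission half of this fix can be sketched as follows. This is a minimal illustration, not the project's strip_repo_stanza: `rewrite_in_place` and the demo paths are hypothetical names. It shows why writing through the existing file keeps its mode, whereas moving a mktemp file over it would carry the temp file's 0600.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the write step; not the real strip_repo_stanza.

# Replace a file's contents without replacing the file itself, so the
# existing inode (and therefore its mode and owner) is kept.
rewrite_in_place() {
    local target="$1" new_content="$2" tmp
    tmp="$(mktemp)"                      # mktemp creates the file as 0600
    printf '%s\n' "$new_content" > "$tmp"
    cat "$tmp" > "$target"               # truncate-write: target keeps its mode
    rm -f "$tmp"
}

# Demo against a scratch copy, never the real /etc/pacman.conf.
demo_dir="$(mktemp -d)"
printf '[core]\n[aur]\n' > "$demo_dir/pacman.conf"
chmod 0644 "$demo_dir/pacman.conf"
rewrite_in_place "$demo_dir/pacman.conf" "[core]"
stat -c '%a' "$demo_dir/pacman.conf"     # GNU stat; prints the octal mode
```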
*	fix(installer): refresh package databases before pacstrap	Craig Jennings	2026-06-25	1	-0/+10
\| \| \| \| \| \|	A ZFS-root install from an aged ISO died in install_base: dkms couldn't build zfs/2.3.3 against a current linux-lts. The ISO bakes the archzfs sync db at build time, so as the ISO ages that db keeps pinning an older zfs-dkms while linux-lts is pulled current from the live mirror. The DKMS module then fails to compile against the newer kernel. Run pacman -Syy after the archzfs and baked-AUR repos are added to the live pacman.conf and before pacstrap -K, so pacstrap resolves the current zfs-dkms instead of the stale baked one. A plain -y can be skipped by pacman's freshness check against the GitHub-served archzfs db (no reliable timestamps), so -yy forces the refresh.
*	refactor(installer): extract parse_btrfs_subvol_opts helper	Craig Jennings	2026-06-23	2	-35/+69
	mount_btrfs_subvolumes and generate_btrfs_fstab each carried an identical block that composed a subvolume's mount options from BTRFS_OPTS plus the per-subvol extra flags. The two could drift out of sync. Extracted the logic into parse_btrfs_subvol_opts (pure string transform), preserving the exact behavior, and called it from both. Added bats cases covering the default, compress=no, nodatacow, nosuid, and combined paths.
*	fix(installer): RAID validation, set -e fix, drop dead shadow branch	Craig Jennings	2026-06-23	3	-33/+82
	Two installer cleanups from the todo backlog. validate_config now rejects a RAID_LEVEL the selected disk count can't support, guarding the unattended path (the interactive path already constrains the choice). While adding it I found a latent bug: the error loop's ((errors++)) evaluated to 0 on the first error, so it returned a non-zero exit status and tripped set -e in the monolith's `[[ UNATTENDED == true ]] && validate_config` call, aborting after one warning instead of listing every problem. Switched to pre-increment so the count accumulates as designed. Added four bats cases, including one that runs validate_config under set -e outside bats' run shield. build.sh dropped the dead shadow-file rebuild else-branch. The profile is always copied fresh from releng (which ships /etc/shadow), so the branch never ran, and its hardcoded account list had drifted from what releng provides. Replaced with an assertion that fails the build loudly if the file is ever missing.
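The set -e interaction can be reproduced in isolation. The snippet below is a standalone demonstration of the bash behavior, not code from the installer: an arithmetic command returns status 1 when its expression evaluates to 0, which is exactly what post-increment from 0 does.

```shell
#!/usr/bin/env bash
# Post-increment yields the old value (0), so (( )) returns status 1.
post_status=$(bash -c 'errors=0; ((errors++)); echo $?')
# Pre-increment yields the new value (1), so (( )) returns status 0.
pre_status=$(bash -c 'errors=0; ((++errors)); echo $?')

echo "post-increment status: $post_status"
echo "pre-increment status:  $pre_status"

# Under set -e the post-increment form exits before the echo runs.
bash -c 'set -e; errors=0; ((errors++)); echo "reached"' \
    || echo "aborted by set -e"
```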
*	chore: ignore elisp and python build artifacts	Craig Jennings	2026-06-23	1	-0/+14
*	fix(build): drop sanoid from the baked AUR set	Craig Jennings	2026-06-17	3	-13/+17
\| \| \| \| \| \|	sanoid depends on perl-config-inifiles, which is AUR-only. makepkg -s can't resolve it from the official repos, so the build aborts before it produces an ISO. The 2026-06-09 dependency gate checked AUR-RPC existence rather than the official sync dbs, so it wrongly classified perl-config-inifiles as official. A full build caught it. sanoid joins paru and mkinitcpio-firmware as AUR-of-AUR packages deferred to the vNext dependency-resolution work. The v1 baked set is now eight packages. Updated the tests and README to match.
*	chore: gitignore .claude/ and AGENTS.md	Craig Jennings	2026-06-16	1	-0/+2
*	chore: gitignore the personal project-level CLAUDE.md	Craig Jennings	2026-06-10	1	-0/+1
*	docs: document baked AUR packages in the README	Craig Jennings	2026-06-09	1	-2/+40
	Add a Build Host Requirements subsection covering the baked AUR repo: the v1 package set and audit date, the base-devel/git/sudo build-host needs, the makepkg -s host build-dep mutation, the manifest, and the installed-system disposition (packages installed, repo not retained). Document the --skip-aur toggle and add the AUR build to the build-steps list.
*	feat(install): install baked AUR packages and clean the target config	Craig Jennings	2026-06-09	3	-0/+236
\| \| \| \| \| \| \| \| \| \|	Wire the baked AUR repo into the installer. Before pacstrap, install_base checks whether the ISO shipped the repo and, if so, exposes [aur] in the live /etc/pacman.conf and reads the package names from the manifest, adding them to the pacstrap set so they install into the target offline. This mirrors the existing [archzfs] handling. pacstrap resolves repos from the live system, not $MNTPOINT. The live config already carries [aur] from the shipped ISO config, so the append is idempotent by design. A --skip-aur ISO ships no repo, and aur_repo_available gates the whole path, so the installer still works there. configure_system strips any [aur] stanza from the target /etc/pacman.conf. pacstrap installs a stock target config with no [aur], so this is defensive, but it guarantees the installed system never references /usr/share/aur-packages, which exists only on the live ISO. Four new common.sh helpers carry the logic: aur_repo_available, append_aur_repo (idempotent), aur_manifest_names (the manifest is the source of what to install, so the list never drifts), and strip_repo_stanza. All four covered across Normal, Boundary, and Error.
*	feat(build): inject the AUR repo into the profile and live ISO	Craig Jennings	2026-06-09	2	-11/+117
\| \| \| \| \| \| \| \| \| \|	Wire build-aur.sh into build.sh. After the pacoloco block, build the AUR repo and append a build-host [aur] stanza to profile/pacman.conf with an absolute file:// Server, so mkarchiso installs the baked packages into airootfs. The stanza lands after the pacoloco rewrite so its file:// path isn't redirected to localhost. Add the audited official extra packages and the baked AUR names to packages.x86_64, both sourced from build-aur.sh so the list never drifts from the build array. Ship the repo into airootfs and write a complete live /etc/pacman.conf: the pristine releng config with [aur] appended, not an [aur]-only file, since this replaces the live system's stock config and an AUR-only one would strip the official repos. Copy the manifest beside the ISO in out/. --skip-aur skips the build, the stanza, the AUR names, and the live config. The three injection points also guard on the repo dir existing, so the documented empty-set path can't point mkarchiso at a missing repo. Moved BUILD_LOG creation ahead of the AUR build so its output is captured too. A unit test reproduces the live-config construction and asserts core, extra, the mirrorlist, and [aur] all survive. The end-to-end proof that mkarchiso installs from the build-host repo needs a real root build and is tracked as manual verification.
*	feat(build): add AUR local-repo build helpers	Craig Jennings	2026-06-09	4	-1/+442
\| \| \| \| \| \| \| \|	Add build-aur.sh, sourced by build.sh, that builds the v1 genuine-AUR set into a local pacman repo and emits an auditable manifest. The pure helpers carry the testable surface: the package sets (one source of truth for the build array and the package-list append), the [aur] stanza renderer, the TSV manifest header/row, the package-file locator, the staged repo replacement, and the build-environment preflight. makepkg refuses to run as root, so the orchestrator drops to $SUDO_USER for the clone and build. It stages on the same filesystem and swaps in with mv -T on full success, so a failure ships no repo and leaves no stale one. On any failure error() names the package, the phase, and the log path. The orchestrator and manifest-append need root, network, and makepkg, so they stay out of bats and are covered by the build integration test and the manual checklist instead. Eighteen unit tests cover the pure helpers across Normal, Boundary, and Error.
*	docs: add AUR local repository build spec	Craig Jennings	2026-06-09	1	-0/+343
\| \| \| \| \| \|	Specifies building a fixed set of genuine-AUR packages at ISO-build time and baking them into the ISO as a local pacman repo, so AUR tools work in the live environment and install onto the target offline. Covers the build/live/pacstrap repo-visibility namespace split, the v1 dependency gate, the auditable manifest, staged-replacement failure semantics, and a --skip-aur toggle. The spec went through two review rounds to Ready with caveats. The one remaining caveat is a Phase 2 build proof that mkarchiso installs from the build-host repo.
*	chore: gitignore emacs backup, autosave, and lockfiles	Craig Jennings	2026-05-23	1	-0/+5
	Batch and interactive emacs edits of the gitignored todo.org leave todo.org~ (and would leave #todo.org# / .#todo.org) in the working tree. Ignore the patterns so they stop showing up in git status.
*	refactor: drop the dead duplicate disk_in_use from common.sh	Craig Jennings	2026-05-23	1	-8/+0
\| \| \| \| \| \|	common.sh and disk.sh both defined disk_in_use. archangel sources common.sh first, then disk.sh, so disk.sh's thorough version (mount, active swap, imported zpool, md array) won at runtime everywhere — including list_available_disks, the common.sh function that calls it. common.sh's older mount-and-holders version was dead. I deleted it. list_available_disks now resolves disk_in_use to disk.sh's, which is what already happened at runtime. The disk.sh unit tests cover the surviving version. Suite stays at 245, lint clean.
*	test: cover disk_in_use and network_available failure paths	Craig Jennings	2026-05-23	2	-0/+74
\| \| \| \| \| \|	These two boundary functions backed the pre-flight guards from #215 but had no unit coverage of their own. The VM harness exercised them instead. I added 7 bats tests that mock the system commands they query, so the real branching logic runs. test_disk.bats covers disk_in_use across mountpoint, active swap, imported-zpool member, and idle — that's the gate that refuses to wipe an already-mounted disk. test_archangel.bats covers network_available for DNS failure, TCP-connect failure, and success, the check that fails the install before pacstrap. The /proc/mdstat-positive branch and the live probes stay in the VM harness, since neither drives cleanly without writing to /proc or hitting the network. Suite 238 to 245, lint clean.
*	fix(test): run the ZFS-encryption check on the booted system	Craig Jennings	2026-05-22	1	-12/+16
\| \| \| \| \| \|	The ZFS native-encryption assertion lived in verify_install, which runs in the live ISO before reboot. But archangel exports zroot at the end of the install, so verify_install bails at "ZFS pool not found" and never reaches the check. It was dead code: the encrypted-config tests passed on the reboot path (entering the passphrase at ZFSBootMenu and booting is itself proof), while the explicit aes-256-gcm assertion gave false confidence by never running. I moved it into verify_reboot_survival, which ssh's into the booted system where zroot is imported, so zfs get encryption zroot/ROOT actually returns aes-256-gcm and the assertion fires. Confirmed on a zfs-encrypt VM run: "ZFS encryption (aes-256-gcm) verified on running system."
*	fix(build): clear stale archzfs from the pacoloco cache too	Craig Jennings	2026-05-22	3	-0/+60
\| \| \| \| \| \|	archzfs re-uploads its GitHub release assets under the same filename, so pacoloco keeps serving a zfs-dkms/zfs-utils it cached earlier while pacman fetches a fresh archzfs.db with a new checksum. The two mismatch and pacstrap aborts with "invalid or corrupted package." build.sh already drops the stale packages from the host pacman cache, but it never cleared the pacoloco layer, which the VM test installs route through too, so test-install.sh kept hitting the corruption (four times in one session). build.sh runs as root, so it now clears /var/cache/pacoloco/pkgs/archzfs/zfs-* alongside the host cache, which makes the build-then-test flow self-healing. The pacoloco cache is root-owned and test-install.sh runs as the user, so it can't clear it unattended. Instead, test-install.sh now recognizes the corruption (is_archzfs_cache_corruption) and prints how to clear it, the way it already names the SSH_PORT override on a port collision. A retry alone won't help since it hits the same cached file, so this fails fast with the hint rather than retrying.
*	fix(test): fail clearly when the VM forward port is taken	Craig Jennings	2026-05-22	2	-0/+58
\| \| \| \| \| \|	A test run launched qemu without first checking the SSH forward port, so a collision with another VM already holding it surfaced only as an opaque "Failed to start VM," with qemu unable to bind and no hint why. I added a port_in_use check in run_test before the launch: it errors with the port number and the SSH_PORT override to set, records the failure, and moves on. The check lives in run_test, not start_vm, because start_vm runs in a command substitution (vm_pid=$(start_vm ...)) where this harness's non-exiting error() would be captured as the PID instead of failing the run. The pure half, port_listening_in, takes an `ss -tln` snapshot as a string so it's unit-testable.
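A pure helper of the kind described might look like the sketch below. The function name matches the commit message, but the body is an assumption rather than the harness's actual code: it takes the `ss -tln` text as an argument and checks the Local Address:Port column, so a test can feed it a canned snapshot.

```shell
#!/usr/bin/env bash
# Assumed implementation; only the name port_listening_in comes from the log.
# Returns 0 if the snapshot shows a listener on the given port, else 1.
port_listening_in() {
    local snapshot="$1" port="$2"
    # Column 4 of `ss -tln` is Local Address:Port; the port follows the last ':'.
    awk -v p="$port" '
        NR > 1 { n = split($4, a, ":"); if (a[n] == p) found = 1 }
        END    { exit !found }
    ' <<< "$snapshot"
}

# Canned snapshot standing in for real `ss -tln` output.
snapshot='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      128    0.0.0.0:2222       0.0.0.0:*
LISTEN 0      128    [::]:22            [::]:*'

if port_listening_in "$snapshot" 2222; then
    echo "port 2222 is taken; set SSH_PORT to a free port"
fi
```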
*	test: make SSH_PORT overridable in test-install.sh	Craig Jennings	2026-05-22	2	-1/+24
	The port was hardcoded, so a test run collided with any other VM already forwarding 2222. SSH_PORT is now overridable and defaults to 2222, so existing invocations are unchanged. Running SSH_PORT=2223 scripts/test-install.sh selects a different port so the test can run alongside another VM.
*	feat(install): add pre-flight environment and disk-target validation	Craig Jennings	2026-05-22	6	-3/+369
\| \| \| \| \| \| \| \| \| \|	archangel went straight from filesystem selection into a destructive install behind only a root check and a ZFS module load. A missing tool, a BIOS boot, a too-small or in-use disk, or a dead network surfaced as a confusing abort partway through, sometimes after partitioning had already run. Two gates now fail fast. validate_environment runs after filesystem selection, before any disk is touched: it confirms UEFI boot mode and that every required command is present, with the list coming from a new required_commands helper built like pacstrap_packages. validate_install_targets runs after disk selection, before the first wipe: it refuses a target that's mounted, holds active swap, or belongs to an imported pool or md array, rejects disks under 20 GB, and confirms a mirror is reachable via DNS plus a TCP probe (no ICMP, since some networks drop it). I folded the install_failure_cleanup hardening into the same change. It now falls back to lazy unmounts, so a pacstrap-interrupted target with busy bind mounts still releases the pool and unmounts the EFI partition. Without that, the disk-in-use guard would block the very retry the cleanup exists to enable. "Re-run to retry" only holds if the disk is genuinely freed first. The 20 GB floor is decimal on purpose. It reads as the natural minimum and clears a 20 GiB disk image with headroom instead of sitting on the boundary.
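The arithmetic behind the decimal floor is easy to check. In this sketch `MIN_DISK_BYTES` and `disk_large_enough` are hypothetical names, not the installer's: the point is only that 20 GB (decimal) sits below 20 GiB, so a 20 GiB disk image passes with headroom.

```shell
#!/usr/bin/env bash
# Hypothetical names; the installer's real constant and check may differ.
MIN_DISK_BYTES=$((20 * 1000 * 1000 * 1000))   # 20 GB, decimal

disk_large_enough() {
    local size_bytes="$1"
    (( size_bytes >= MIN_DISK_BYTES ))
}

gib20=$((20 * 1024 * 1024 * 1024))            # a 20 GiB disk image
echo "floor:  $MIN_DISK_BYTES bytes"
echo "20 GiB: $gib20 bytes"
disk_large_enough "$gib20" && echo "20 GiB image clears the floor"
```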
*	feat(test): retry pacstrap through transient mirror flakes	Craig Jennings	2026-05-20	2	-26/+298
\| \| \| \| \| \| \| \|	test-install.sh aborts a whole 5-minute VM run when pacstrap hits a transient mirror blip, and the suite reports a failure indistinguishable from a real install regression. run_test now retries the install up to twice, but only when the in-VM log shows both pacstrap's "Failed to install packages to new root" marker and a download/network indicator. A deterministic failure like "target not found" carries the marker without a network indicator, so it still fails fast. archangel's failure trap exports the pool and unmounts on abort, so each retry re-partitions and re-pacstraps from a clean state. Wiring the predicate up needed a source-guard so bats can source the harness, which had none. With that in place I unit-covered the pure helpers — is_transient_install_failure, char_to_qemu_key, get_disk_count, get_disk_args — and lifted char_to_qemu_key out of monitor_sendkeys so the QEMU keymap is testable on its own. The keymap test found a dead branch. The backslash case pattern was '\\', which never matches a lone backslash because bash matches one against '\', so a passphrase containing a backslash would have sent an invalid QEMU keyname instead of "backslash". No test passphrase uses one, so it never bit. I fixed the pattern.
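The dead branch comes from how bash treats quoted case patterns. The two functions below are illustrative stand-ins (the real char_to_qemu_key maps many more characters): a single-quoted `'\\'` is the literal two-character string, so it can never match a one-character input.

```shell
#!/usr/bin/env bash
# Illustrative stand-ins, not the harness's char_to_qemu_key.

keyname_buggy() {
    case "$1" in
        '\\') echo "backslash" ;;   # quoted: matches only the two-char string \\
        *)    echo "unmapped" ;;
    esac
}

keyname_fixed() {
    case "$1" in
        '\') echo "backslash" ;;    # quoted: matches exactly one backslash
        *)   echo "unmapped" ;;
    esac
}

keyname_buggy '\'
keyname_fixed '\'
```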
*	feat(build): route VM-internal pacstrap through host pacoloco	Craig Jennings	2026-05-19	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The build-host pacoloco routing from e2eb958 only covered mkarchiso's pacstrap. VMs spawned by scripts/test-install.sh ran their own pacstrap inside the guest, fetching ~600 packages per config from upstream and re-hitting the same archzfs corruption that bites the build host. A full 12-config test-install run exposed 7200+ package downloads to upstream flake. I added a routing step to run_install() in test-install.sh, after the config file gets SCP'd to the VM and before archangel runs. It detects pacoloco on the host (port 9129, same probe as build.sh's) and rewrites the live system's /etc/pacman.conf over SSH. [core] and [extra] swap their Include lines for Server lines pointing at 10.0.2.2:9129/repo/archlinux/$repo/os/$arch. A preempt [archzfs] block lands ahead of archangel's default insertion. 10.0.2.2 is QEMU's SLIRP default gateway as seen from the guest, so the host's localhost:9129 maps to that address inside the VM. Pacoloco binds 0.0.0.0:9129, reachable from there without firewall changes. The preempt matters because archangel's install_base checks for an existing [archzfs] block in /etc/pacman.conf and skips its own insertion when one is already there. Writing the pacoloco-routed [archzfs] up front means archangel keeps the routed version. The installed system's $MNTPOINT/etc/pacman.conf isn't touched: it gets upstream URLs like before, since the installed system shouldn't depend on the test host's proxy. The status message uses a plain echo rather than test-install.sh's info() function. run_install() runs inside a bash -c subshell at line 864 that only exports ssh_cmd and run_install via declare -f. A bare info call there resolves to /usr/bin/info (the GNU info reader) and prints a confusing "No menu item" error. An inline comment in the code records the pitfall. 
Verified end-to-end with scripts/test-install.sh single-disk: pacoloco's cache grew from 77MB (post-build) to 953MB (post-VM-install), the VM's pacstrap completed cleanly, and the install verified. Bats: still 181.
*	refactor: extract validate_encryption_passphrase from gather_input	Craig Jennings	2026-05-19	3	-11/+56
\| \| \| \| \| \| \| \| \| \|	gather_input's unattended branch had two parallel if-blocks, one for ZFS and one for Btrfs, each doing the same encryption-passphrase empty check against a filesystem-specific variable (ZFS_PASSPHRASE or LUKS_PASSPHRASE). The two blocks shared the condition surface and error template. Only the variable name differed. I lifted the check into validate_encryption_passphrase in lib/config.sh next to validate_filesystem. The helper takes the variable name and uses indirect expansion (${!var_name}) so one function covers both filesystems. gather_input now dispatches via if/elif on FILESYSTEM and calls the helper with the right variable, collapsing 14 lines to 6. The original tests in test_archangel.bats (gather_input errors when ZFS without ZFS_PASSPHRASE / when Btrfs without LUKS_PASSPHRASE / accepts ZFS with NO_ENCRYPT=yes) still pass, exercising the helper through the dispatch. Added 4 direct unit tests in test_config.bats covering the four cases: NO_ENCRYPT=yes passes regardless, NO_ENCRYPT=no with empty fails, NO_ENCRYPT=no with value passes, and the error message names the offending variable. Bats: 177 → 181. No behavior change. The helper preserves the original error message format and exit conditions.
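Indirect expansion is what lets one helper serve both filesystems. The sketch below is an approximation under assumptions: the function name and error wording are made up, and only the `${!var_name}` technique and the NO_ENCRYPT / ZFS_PASSPHRASE / LUKS_PASSPHRASE variable names come from the commit message.

```shell
#!/usr/bin/env bash
# Approximation; the real validate_encryption_passphrase may differ in
# message format and exit behavior.
validate_passphrase_sketch() {
    local var_name="$1"
    # Encryption disabled: nothing to validate.
    [[ "${NO_ENCRYPT:-no}" == "yes" ]] && return 0
    # ${!var_name} reads the variable whose NAME is stored in var_name.
    if [[ -z "${!var_name:-}" ]]; then
        echo "error: $var_name is required when encryption is enabled" >&2
        return 1
    fi
}

# Dispatch on the filesystem, passing the variable name rather than its value.
FILESYSTEM=zfs NO_ENCRYPT=no ZFS_PASSPHRASE=example
if [[ "$FILESYSTEM" == "zfs" ]]; then
    validate_passphrase_sketch ZFS_PASSPHRASE && echo "ZFS passphrase present"
elif [[ "$FILESYSTEM" == "btrfs" ]]; then
    validate_passphrase_sketch LUKS_PASSPHRASE && echo "LUKS passphrase present"
fi
```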
*	refactor: lift FILES= keyfile sed to ensure_initramfs_files helper	Craig Jennings	2026-05-19	3	-5/+65
\| \| \| \| \| \| \| \|	btrfs.sh's configure_btrfs_initramfs had a six-line inline that ensured mkinitcpio.conf's FILES= line listed the LUKS keyfile: sed-replace the existing FILES= line, then grep + append as a fallback when no FILES= line existed. The pattern is mkinitcpio-specific and self-healing rather than error-on-miss (FILES= is optional in mkinitcpio.conf, so missing means "no extra files," not a broken config). I lifted the block into ensure_initramfs_files in lib/common.sh next to prepend_grub_cmdline_linux, then collapsed the btrfs.sh call site to a single ensure_initramfs_files line. Added three bats tests for the three cases (FILES= present and empty, FILES= present with a different value, FILES= absent entirely). Bats: 174 → 177. No behavior change. The helper's logic matches the inline byte-for-byte: same sed pattern, same grep fallback, same final state.
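The self-healing shape of the helper can be sketched as below. This is not the project's ensure_initramfs_files: the function name, the keyfile path, and the choice to overwrite an existing FILES= value are all assumptions made for illustration. It only shows the replace-or-append pattern the message describes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real helper's sed pattern and semantics may differ.
ensure_files_sketch() {
    local conf="$1" keyfile="$2"
    if grep -q '^FILES=' "$conf"; then
        # A FILES= line exists: rewrite it to list the keyfile (GNU sed -i).
        sed -i "s|^FILES=.*|FILES=($keyfile)|" "$conf"
    else
        # FILES= is optional in mkinitcpio.conf, so append instead of failing.
        printf 'FILES=(%s)\n' "$keyfile" >> "$conf"
    fi
}

# Demo on a scratch file; /crypto_keyfile.bin is an assumed example path.
conf="$(mktemp)"
printf 'MODULES=()\n' > "$conf"
ensure_files_sketch "$conf" /crypto_keyfile.bin
cat "$conf"
```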
*	feat(build): route pacstrap through pacoloco when available	Craig Jennings	2026-05-19	2	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \|	build.sh now checks for a pacoloco caching proxy on localhost:9129 before mkarchiso runs. When pacoloco is up, build.sh rewrites the build profile's pacman.conf to point [core], [extra], and [archzfs] at the proxy. When it isn't, the build falls back to the upstream mirrors and the GitHub-releases URL it always used. The motivation is the recurring archzfs corruption that's hit ~2 of 3 builds. The earlier cache-hygiene step (db851ff) clears the host's pacman cache so pacstrap can't reuse a corrupted package between builds. Pacoloco is the next layer. It caches successful fetches, so once a known-good copy of zfs-dkms or zfs-utils lands, future builds skip the GitHub roundtrip entirely. Pacoloco doesn't validate checksums itself, so a corrupted upstream fetch still fails the build at pacstrap. Once a clean copy lands, the cache stays clean. Detection uses bash /dev/tcp (no external dependency on nc or netcat). Two sed lines rewrite the URLs in the freshly-copied profile pacman.conf. The fallback prints an info message so the operator knows which mode the build ran in. README's "Build Host Requirements" section now lists pacoloco as an optional dependency with install + enable steps. Pacoloco is from AUR (~yay -S pacoloco~). The config in /etc/pacoloco.yaml needs an archzfs repo entry, documented in the README block. I verified end-to-end: installed pacoloco, configured /etc/pacoloco.yaml for archlinux (host mirrorlist) + archzfs (GitHub releases URL), enabled the systemd service, smoke-tested with curl (archzfs.db and core.db both served 200), then ran a full build. The build completed in ~4 minutes. The info message confirmed pacoloco routing. Pacoloco's cache filled with the freshly-fetched archzfs packages (zfs-dkms-2.4.2-1, zfs-utils-2.4.2-2) plus the Arch packages that weren't already in the host pacman cache. Build log and ISO landed at the expected paired names.
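A /dev/tcp probe of the kind mentioned needs no external tools. The sketch below assumes a hypothetical `proxy_reachable` helper (build.sh's actual detection code is not shown in the log) and relies on bash's built-in /dev/tcp redirection, which requires a bash built with network redirections enabled.

```shell
#!/usr/bin/env bash
# Hypothetical helper; only the /dev/tcp technique and port 9129 come from the log.
proxy_reachable() {
    local host="$1" port="$2"
    # The subshell tries to open fd 3 on the socket. A refused connection makes
    # the redirection fail, so the subshell exits non-zero. The fd is closed
    # automatically when the subshell ends.
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

if proxy_reachable 127.0.0.1 9129; then
    echo "pacoloco detected: routing through the proxy"
else
    echo "pacoloco not detected: using upstream mirrors"
fi
```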
*	fix(build): drop cached archzfs packages before mkarchiso runs	Craig Jennings	2026-05-19	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	The upstream archzfs mirror has handed us corrupted .pkg.tar.zst files several times now. Past builds aborted at pacstrap with "File X is corrupted (invalid or corrupted package (checksum))". The cached package sits in /var/cache/pacman/pkg/ between builds, so once it's bad every subsequent build fails until someone manually deletes it. I added a step right before mkarchiso that rm -f's the archzfs entries (zfs-dkms-, zfs-utils-) from the host pacman cache. Pacstrap fetches fresh copies on the next build. Costs ~30s of bandwidth on a warm mirror. Saves debugging a baffling pacstrap abort. The proper permanent fix is a local caching proxy (pacoloco or flexo). That lives on the Medium Lifts backlog. This commit is the cheap interim that addresses the recurring failure mode without new infrastructure. Shellcheck reports clean. The rm -f globs handle empty-match cleanly (silent return). I skipped a full build re-run since the parent commit (b4c0f72) already verified the surrounding mkarchiso path. This is a defensive prepend with no interaction surface.
*	feat(build): save build logs to out/ alongside the ISO	Craig Jennings	2026-05-19	2	-2/+24
\| \| \| \| \| \| \| \| \| \|	Before this commit, mkarchiso's verbose output streamed to the terminal and disappeared. The cleanup trap removed work/ after each build, taking any captured output with it. Failed builds left no log to compare against past failures — including the recurring archzfs cache corruption that hits ~2 of 3 builds. I added a tee on the mkarchiso pipeline that writes to a log file in out/. The log starts as out/build-YYYY-MM-DD-HHMM.log, pre-created and chowned to SUDO_USER. It stays user-readable even when a failed build leaves it under that name. On success, the script renames it to match the ISO's basename so finding the log for a given ISO is a one-step lookup. set -o pipefail was already on at the top of build.sh, so mkarchiso failures still propagate through the pipe and abort the script the way they used to. No exit-code masking. I verified the change end-to-end against a real build that hit archzfs cache corruption (log captured, kept as build-2026-05-19-0325.log, user-readable) followed by a clean retry (log renamed to archangel-2026-05-19-vmlinuz-6.18.29-lts-x86_64.log, paired with the ISO).
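The reason pipefail matters for a tee'd build can be shown with a failing stand-in command. This is a standalone demonstration, not an excerpt from build.sh: `false` plays the role of a failing mkarchiso.

```shell
#!/usr/bin/env bash
log="$(mktemp)"

# Without pipefail the pipeline reports tee's status, masking the failure.
without=$(bash -c 'false | tee "$1" >/dev/null; echo $?' _ "$log")

# With pipefail the pipeline reports the failing command's status.
with=$(bash -c 'set -o pipefail; false | tee "$1" >/dev/null; echo $?' _ "$log")

echo "status without pipefail: $without"
echo "status with pipefail:    $with"
```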
*	refactor: wire validate_config into the unattended install path	Craig Jennings	2026-05-19	2	-59/+12
\| \| \| \| \| \| \| \| \| \|	validate_config in lib/config.sh was unreachable from main(). Its empty-field checks duplicated four lines in gather_input's unattended branch. validate_config also has two checks gather_input doesn't: that every entry in SELECTED_DISKS is a real block device, and that TIMEZONE exists under /usr/share/zoneinfo. Neither check ever ran. A config with a typo'd disk path slipped past gather_input and surfaced as an obscure sgdisk error inside the destructive partitioning step. I wired validate_config into main() after validate_filesystem, gated on UNATTENDED so it only runs against an already-loaded config. I dropped the four duplicate empty-field checks from gather_input's unattended branch. The filesystem-specific passphrase checks stay there because they're coupled to the FILESYSTEM branch logic. validate_config reports every missing field at once instead of dying on the first. A config with five missing fields tells you all five in one pass. I removed the four corresponding gather_input bats tests. validate_config's existing unit tests in test_config.bats already cover their assertions. Bats: 178 → 174.
*	test(install): exercise zfssnapshot wrapper in VM verification	Craig Jennings	2026-05-14	1	-0/+154
	The wrapper had no runtime coverage: bats tests pin pure helpers and arg parsing only, and verify_rollback bypassed it by calling zfs snapshot / zfs rollback directly via SSH. A regression in cmd_create, cmd_rollback, or cmd_delete would only have surfaced in production. verify_zfssnapshot_wrapper runs after verify_rollback for ZFS configs (no-op for Btrfs) and exercises:
	- list confirms @genesis baseline
	- create runtime-test: recursive snapshot across all datasets
	- echo no | delete --name: confirms the gate aborts (catches the -n vs = regression class)
	- echo yes | delete --name: destroys across all datasets, list confirms gone
	- create wrapper-rollback + drop sentinel + rollback --name: round-trip restores the sentinel
	The function scps the working-tree wrapper to the VM before testing so the run reflects current source rather than what the ISO froze at build time. A regression here fails the test (no warn-only path), since it's the wrapper's only runtime check.
*	fix(install): drop dead zfsrollback copy from configure_zfs_tools	Craig Jennings	2026-05-14	1	-4/+1
\| \| \| \| \| \|	The 2026-04-27 consolidation (422d109) deleted installer/zfsrollback in favor of the unified zfssnapshot wrapper, but installer/archangel:configure_zfs_tools still tried to cp /usr/local/bin/zfsrollback to the installed system. With set -euo pipefail in effect, a fresh ISO + ZFS install would abort here with cp: cannot stat 'zfsrollback'. The bug shipped on main without anyone noticing because the 04-27 VM tests ran against an ISO built earlier that day, before the consolidation merged. No fresh ISO has been built since. I also dropped the leftover profile/airootfs/usr/local/bin/zfsrollback file that build.sh no longer regenerates.
*	feat: add --name flag to zfssnapshot rollback and delete	Craig Jennings	2026-05-14	2	-32/+212
\| \| \| \| \| \|	I added --name NAME to rollback (single name) and --name NAME[,NAME...] to delete (comma-separated for multi-select) so scripted callers can drive the wrapper without fzf. The upcoming VM verification step in scripts/test-install.sh needs this. fzf is now conditional, required only when --name is omitted. The 10 new bats tests cover help-text mentions, parse success and failure modes (missing value, mutex with -s, unknown flag), fzf-bypass on both subcommands, and multi-name expansion on delete.
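Splitting a comma-separated --name value is a small piece of the delete path that can be sketched on its own. `split_names` is a hypothetical helper, not a function from zfssnapshot, and the snapshot names in the demo are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the wrapper's real parsing is not shown in the log.
# Prints one snapshot name per line from a NAME[,NAME...] argument.
split_names() {
    local -a names
    # IFS is set only for this read, so the caller's IFS is untouched.
    IFS=',' read -ra names <<< "$1"
    printf '%s\n' "${names[@]}"
}

split_names 'pre-upgrade,runtime-test,wrapper-rollback'
```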
*	refactor: drop dead variables from lib/config.sh	Craig Jennings	2026-05-14	1	-3/+0
	I dropped ENCRYPTION_ENABLED, SSH_ENABLED, and SSH_KEY, declarations with no readers anywhere in the project. The live names (NO_ENCRYPT, ENABLE_SSH) handle these settings instead. The example config files already reference those.
*	feat: consolidate zfssnapshot and zfsrollback into one subcommand-driven script	Craig Jennings	2026-04-27	8	-274/+573
\| \| \| \| \| \| \| \| \| \| \| \|	Problem: zfssnapshot and zfsrollback were two separate scripts with overlapping pre-flight checks (zfs / fzf / root) and parallel UX patterns (description sanitization in one, fzf selection in the other). Users had to remember which script was for which operation, and a "list" view meant typing the raw `zfs list -t snapshot` command. There was no path to destroy individual snapshots short of `zfs destroy` directly, which is dangerous without a confirmation flow. Solution: rewrite zfssnapshot as a single multi-subcommand script (list, create, rollback, delete). Drop installer/zfsrollback. The new script uses a source-guard at the bottom (`if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then main "$@"; fi`) so bats can source it without triggering the install-time pre-flight checks, matching the pattern in installer/archangel. Pure helpers (sanitize_description, validate_description, format_snapshot_name) get extracted as named functions so they're testable in isolation. The destructive flows (rollback, delete) keep the explicit "yes" confirmation prompt, the genesis-snapshot warning, and the recursive-rollback-destroys-newer-snapshots warning. Delete uses fzf --multi so the user can pick several snapshot names at once. Updated build.sh to copy only the consolidated script. Dropped the zfsrollback profiledef permission line. Updated Makefile, README, scripts/sanity-test.sh, and testing-strategy.org to reflect the single-script layout. Bats: 147 → 168 (+21). Coverage spans sanitize_description (normal / boundary / error), validate_description (alphanumerics, hyphens, underscores accepted; spaces, slashes, shell metacharacters, empty rejected), format_snapshot_name (timestamp + description composition), and main subcommand dispatch (list / create / rollback / delete / help / unknown). Lint clean. The zfs-, fzf-, and arch-chroot-shelling subcommand bodies stay VM-tested per testing-strategy.org.
*	refactor: drop dead configure_luks_grub from Btrfs install path	Craig Jennings	2026-04-27	1	-28/+0
\| \| \| \| \| \| \| \| \| \|	Problem: configure_luks_grub appended GRUB_ENABLE_CRYPTODISK=y and prepended cryptdevice= to /etc/default/grub during the LUKS-target setup phase, but configure_grub at lib/btrfs.sh:578 does `cat > /etc/default/grub` later in the same install, with a single redirect that overwrites the file. Between the two, only generate_btrfs_fstab and configure_btrfs_initramfs run, neither of which touches /etc/default/grub. So configure_luks_grub's writes never reach the installed system. The live LUKS-cmdline work is configure_grub's own LUKS-enabled block at lib/btrfs.sh:597-627. Solution: drop configure_luks_grub from btrfs_configure_luks_target and delete the function (no other callers). configure_luks_initramfs stays since it writes to mkinitcpio, not /etc/default/grub. VM tests on the btrfs-luks path have always been passing because they exercise configure_grub's live block. prepend_grub_cmdline_linux already has bats coverage for the live cmdline path. Bats: 147, 0 fail. Lint clean.
*	refactor: extract MNTPOINT constant for the install chroot mount point	Craig Jennings	2026-04-27	5	-131/+143
\| \| \| \| \| \| \| \| \| \| \| \|	Last on the tech-debt drain. The installer hardcoded /mnt at 50+ sites: pacstrap, arch-chroot, mount/umount, fstab writes, and every host-side write into the chroot's /etc, /usr, /var, /boot, /tmp. Same magic-string smell as /mnt/efi but at much larger scale. Add MNTPOINT="/mnt" to lib/common.sh next to EFI_DIR. Replace literal /mnt/... with $MNTPOINT/... across installer/archangel, installer/lib/btrfs.sh, and installer/lib/common.sh. Replace bare /mnt (mount target, arch-chroot root, umount target, install_dropin parameter) with $MNTPOINT. EFI_DIR's own definition becomes EFI_DIR="$MNTPOINT/efi" for the natural composition. Folded in the related ticket: /mnt${chroot_efi_dir} in btrfs.sh:install_grub_all_efi becomes ${MNTPOINT}${chroot_efi_dir}. Was filed as a separate item but the ticket said it should ship with the MNTPOINT extraction, since the composition pattern is unusual and easy to miss in a global sed. Three /mnt references kept literal in comments where the comment describes the string concept rather than the mount point ("Remove /mnt prefix - config is used inside chroot where root is /", etc.). Substituting to $MNTPOINT in those comments would obscure the documentation. Bats: 146 → 147. One new test in test_common.bats pins MNTPOINT="/mnt". Lint clean (one shellcheck SC2295 warning fixed by quoting the parameter expansion: ${isp_firmware#"$MNTPOINT"}). VM verification deferred to a single full make test-install run after all three tech-debt commits land.
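The SC2295 fix mentioned above can be sketched as follows. ShellCheck flags an unquoted expansion inside `${var#...}` because the inner value would be matched as a glob pattern; quoting it makes the prefix literal. The helper names `to_chroot_path` and `to_host_path` are hypothetical and exist only for this example.

```bash
#!/usr/bin/env bash
# Sketch under stated assumptions; helper names are invented for illustration.

MNTPOINT="/mnt"
EFI_DIR="$MNTPOINT/efi"

# Convert a host-side path under the mount point to the path seen inside the
# chroot. The inner quotes make $MNTPOINT a literal prefix, not a pattern.
to_chroot_path() {
    local host_path="$1"
    printf '%s\n' "${host_path#"$MNTPOINT"}"
}

# Convert a chroot-side absolute path back to the host-side path, the same
# composition as ${MNTPOINT}${chroot_efi_dir} in the message above.
to_host_path() {
    local chroot_path="$1"
    printf '%s\n' "${MNTPOINT}${chroot_path}"
}
```

For example, `to_chroot_path /mnt/usr/lib/firmware` prints `/usr/lib/firmware`, and `to_host_path /efi` prints `/mnt/efi`.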
*	refactor: verify GRUB_CMDLINE_LINUX seds via prepend_grub_cmdline_linux helper	Craig Jennings	2026-04-27	5	-3/+95
\| \| \| \| \| \| \| \| \| \| \| \|	Audited the ~10 silent sed -i sites in the installer against the verification-after pattern that landed for sshd_config last session. Triaged each by failure mode. The two GRUB_CMDLINE_LINUX seds in lib/btrfs.sh have a real silent-failure risk. If /etc/default/grub is missing or malformed and the sed pattern doesn't match, nothing happens. The kernel boots without cryptdevice=. The system can't unlock LUKS at boot. Added prepend_grub_cmdline_linux to lib/common.sh. Same shape as enable_sshd_root_login (sed, then grep, then error if the line wasn't modified). Replaced the two inline seds with helper calls. The HOOKS= seds in installer/archangel and lib/btrfs.sh (six total) don't need verification. A missing HOOKS= line makes mkinitcpio -P fail loudly downstream, so silent-replace failure can't reach a booted system. Added a one-line audit-rationale comment at each of the three locations so the next reader doesn't re-litigate the decision. The FILES= sed at lib/btrfs.sh:213 already self-heals via a sed-then-grep-then-append pattern, so no behavior change there. Filed a separate follow-up to lift that pattern into a named helper for clarity. Bats: 142 → 146. Four new tests in test_common.bats cover normal (empty cmdline, existing cmdline preserved, other lines preserved) and error (missing GRUB_CMDLINE_LINUX line). Lint clean.
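The sed, then grep, then error shape described above might look roughly like this. It is a sketch, not the helper from lib/common.sh: the argument order, the messages, and the use of `return 1` in place of the installer's `error` function are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the verify-after-sed pattern for GRUB_CMDLINE_LINUX.

prepend_grub_cmdline_linux() {
    local params="$1" grub_default="$2"

    if [[ ! -f "$grub_default" ]]; then
        echo "ERROR: $grub_default not found" >&2
        return 1
    fi

    # Insert the new parameters right after the opening quote. "&" re-emits
    # the matched text. Assumes params contain no "|", "&", or backslash.
    sed -i "s|^GRUB_CMDLINE_LINUX=\"|&${params} |" "$grub_default"

    # Verify the edit landed. If the line was missing, sed silently did
    # nothing, and this check turns that into a loud failure.
    if ! grep -qF -- "GRUB_CMDLINE_LINUX=\"${params} " "$grub_default"; then
        echo "ERROR: GRUB_CMDLINE_LINUX not updated in $grub_default" >&2
        return 1
    fi
}
```

Called as `prepend_grub_cmdline_linux "cryptdevice=UUID=abcd:cryptroot" /path/to/grub`, the sketch keeps any existing cmdline content after the new parameters and leaves other lines untouched.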
*	refactor: consolidate installer defaults and FILESYSTEM validation into config.sh	Craig Jennings	2026-04-27	5	-75/+84
\| \| \| \| \| \| \| \| \| \| \| \|	The installer had three sites touching FILESYSTEM: a top-level default in the monolith, a re-default block in gather_input, and a runtime validation block also in gather_input. The same scattering existed for LOCALE, KEYMAP, ENABLE_SSH, and NO_ENCRYPT. A future contributor changing one site wouldn't have known the other two existed. Move all five defaults into the lib/config.sh declarations so config.sh is the single source of truth. Add validate_filesystem() in lib/config.sh and call it from main() between check_config and gather_input, so a typo in a config file's FILESYSTEM= fails fast before any install action runs. The behavior change is stricter. An empty FILESYSTEM in a config file used to be silently defaulted to zfs, now it errors. Interactive mode is unaffected. select_filesystem still controls the value and already errored on cancellation. Bats: 140 → 142. Five tests added in test_config.bats for the defaults pinning and validate_filesystem coverage. Three removed from test_archangel.bats for behavior that moved out of gather_input. Lint clean.
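A fail-fast check of this kind can be sketched in a few lines. The accepted values follow the two install paths named in this log (zfs and btrfs); the messages and the use of `return 1` instead of the installer's `error` function are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a validate_filesystem-style check.

validate_filesystem() {
    case "${FILESYSTEM:-}" in
        zfs|btrfs)
            return 0
            ;;
        "")
            # An explicitly empty value is an error, not a silent default.
            echo "ERROR: FILESYSTEM is empty (expected zfs or btrfs)" >&2
            return 1
            ;;
        *)
            echo "ERROR: unknown FILESYSTEM '$FILESYSTEM' (expected zfs or btrfs)" >&2
            return 1
            ;;
    esac
}
```

Running the check before any input gathering means a typo such as `FILESYSTEM=btfrs` stops the run before a disk is touched.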
*	refactor: collapse sshd_config seds into enable_sshd_root_login	Craig Jennings	2026-04-26	3	-3/+92
\| \| \| \| \| \| \| \| \| \| \| \|	The two sed -i invocations in configure_ssh worked on stock Arch sshd_config but had a real silent-failure mode. If neither the commented (#PermitRootLogin) nor the uncommented form was present, both seds did nothing and the install shipped without root SSH. The user discovered it at first ssh attempt, not at install time. The second sed was also redundant. By the time it ran, the first sed had produced a line matching the second sed's pattern. The new enable_sshd_root_login helper in lib/common.sh combines both substitutions into one sed -i -e ..., then verifies PermitRootLogin yes is present in the file. If the verification fails, it calls error rather than silently appending. Silent appending would mask a corrupted starting file, which is exactly the failure mode worth flagging loudly. The helper takes the config path as an argument so the bats tests in commit 7486abb can run unprivileged against tempfiles. configure_ssh passes /mnt/etc/ssh/sshd_config and is now a single call instead of two seds. Verified: bats 135 → 140 (+5 covering normal/boundary/error). Lint clean. Helper smoke-tested against current Arch sshd_config. The loud-error path can't be exercised against the live default but is covered by the bats error case. Filed as a follow-up :techdebt: item: ~10 other sed -i sites in installer/archangel and lib/btrfs.sh follow the same silent-replace pattern. The FILES= site for LUKS is the worst (silent failure means LUKS prompts on every boot). Triage each per this same recipe in a future session.
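The helper described above combines both substitutions into one sed call and then verifies the result. The sketch below follows that outline under stated assumptions: the two sed patterns are guesses at the commented and uncommented forms, and `return 1` stands in for the installer's `error` function.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; not the helper from lib/common.sh.

enable_sshd_root_login() {
    local config="$1"

    if [[ ! -f "$config" ]]; then
        echo "ERROR: $config not found" >&2
        return 1
    fi

    # One sed pass, two expressions: rewrite the commented form and the
    # uncommented form. The exact patterns are assumptions for this example.
    sed -i \
        -e 's/^#[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' \
        -e 's/^PermitRootLogin[[:space:]].*/PermitRootLogin yes/' \
        "$config"

    # Verify instead of appending: a file with no PermitRootLogin line at all
    # is treated as unexpected input and reported loudly.
    if ! grep -q '^PermitRootLogin yes$' "$config"; then
        echo "ERROR: PermitRootLogin yes not present in $config" >&2
        return 1
    fi
}
```

Because the config path is an argument, the function can be exercised against a temporary file without root privileges.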
*	refactor: extract EFI_DIR constant for the install-time EFI mount point	Craig Jennings	2026-04-26	4	-18/+36
\| \| \| \| \| \| \| \| \| \|	The literal /mnt/efi appeared at 17 sites across installer/archangel and installer/lib/btrfs.sh. Renaming it (or pointing tests at a different mount) meant touching every site and risking incomplete sweeps. One canonical name in installer/lib/common.sh now backs every reference. EFI_DIR has no trailing slash so the three expansion patterns in the codebase compose cleanly. Bare ($EFI_DIR), sub-path ($EFI_DIR/EFI/ZBM), and the index-suffix used by install_grub_all_efi for secondary EFI mounts (${EFI_DIR}${i}). The sync_efi_partitions staging path also moves from the literal /mnt/efi_sync to ${EFI_DIR}_sync, so it follows EFI_DIR if anyone ever changes the base. Two follow-ups filed as separate :techdebt: items. MNTPOINT=/mnt extraction across the 50+ /mnt/... sites (pacstrap, arch-chroot, fstab writes), and the related /mnt${chroot_efi_dir} composition pattern at btrfs.sh:681-682. Both ship together when MNTPOINT lands. Verified: bats 134 → 135 (+1 pinning EFI_DIR=/mnt/efi). Lint clean. All four expansion patterns smoke-tested at runtime and produce the original literal byte-for-byte. VM run skipped, pure constant substitution with zero behavior change.
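The expansion patterns named above compose as shown in this short sketch. The function name `show_efi_paths` and the index value are invented for the example. The braces matter in the last two forms: without them, `$EFI_DIR_sync` would be read as a different variable named `EFI_DIR_sync`.

```bash
#!/usr/bin/env bash
# Illustration only: prints each composition of a slash-free EFI_DIR.

EFI_DIR="/mnt/efi"

show_efi_paths() {
    local i=2                   # hypothetical index of a secondary EFI mount
    echo "$EFI_DIR"             # bare
    echo "$EFI_DIR/EFI/ZBM"     # sub-path
    echo "${EFI_DIR}${i}"       # index suffix
    echo "${EFI_DIR}_sync"      # staging path
}

show_efi_paths
# prints:
# /mnt/efi
# /mnt/efi/EFI/ZBM
# /mnt/efi2
# /mnt/efi_sync
```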
*	refactor: unify partition_disks across ZFS and Btrfs install paths	Craig Jennings	2026-04-26	4	-187/+202
\| \| \| \| \| \| \| \| \| \| \| \|	The monolith's partition_disks() at installer/archangel was ZFS-only and silently shadowed lib/disk.sh:partition_disks(), which had been dead code since the Btrfs install path was added. install_btrfs was assembling partitioning manually via partition_disk (singular) plus a separate format_efi_partitions call. Two parallel implementations meant fixes had to land in two places and the lib version drifted with no visible warning. The unified partition_disks now lives in lib/disk.sh. It reads SELECTED_DISKS, dispatches the per-disk layout on FILESYSTEM (BF00 for ZFS, 8300 for Btrfs), populates EFI_PARTS + ROOT_PARTS, and formats each EFI partition with EFI0, EFI1, ... labels. Folded in two pre-existing divergences while consolidating. wipefs -af now runs on every disk, not just the ZFS path. The Btrfs path was missing this defense against non-GPT signatures (LVM, mdadm, ext) that sgdisk --zap-all alone won't touch. EFI labels standardized on EFI0, EFI1, ... across both paths. The lib version was producing asymmetric EFI / EFI2 labels. No consumer in the repo references the labels after format, so that change is cosmetic. ZFS_PARTS renamed to ROOT_PARTS for symmetry with get_root_partition. Deleted four orphaned helpers: format_efi, format_efi_partitions (only caller was the collapsed install_btrfs section), get_efi_partitions, get_root_partitions (test-only callers after install_btrfs simplified). Verified end to end: bats 134/134, make test-install passing 12/12 configs across both install paths.
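The per-disk dispatch described above can be sketched as a dry run that prints a plan instead of calling wipefs, sgdisk, or mkfs.fat. The type codes and the EFI0, EFI1, ... labels come from the message; the function names `root_type_code` and `plan_partitions` and the output wording are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: reports what would happen, touches no disks.

# Map the target filesystem to the GPT type code for the root partition.
root_type_code() {
    case "$1" in
        zfs)   echo "BF00" ;;
        btrfs) echo "8300" ;;
        *)     return 1 ;;
    esac
}

# Walk SELECTED_DISKS and report the layout each disk would receive.
plan_partitions() {
    local code disk i=0
    code="$(root_type_code "$FILESYSTEM")" || return 1
    for disk in "${SELECTED_DISKS[@]}"; do
        echo "$disk: wipe signatures, root type $code, EFI label EFI$i"
        i=$((i + 1))
    done
}
```

With `SELECTED_DISKS=(/dev/sda /dev/sdb)` and `FILESYSTEM=zfs`, the plan lists both disks with root type BF00 and labels EFI0 and EFI1.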
*	fix: clean up partial installs via ERR/INT/TERM trap	Craig Jennings	2026-04-26	2	-0/+163
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Failed installs left the system in an inconsistent state: /mnt mounted, the zpool imported, possibly LUKS containers open. The user had to manually unmount, export the pool, and clean up partitions before re-running the installer.
The existing trap at the top of archangel was `trap 'error "Installation interrupted!"' INT TERM`. It just printed a message and exited. There was no ERR trap, so `set -e`-aborted commands ran no cleanup either.
I added `install_failure_cleanup` next to the existing `cleanup`. It captures `$?` first, disarms the trap to prevent recursion, clears sensitive variables (ROOT_PASSWORD, ZFS_PASSPHRASE, LUKS_PASSPHRASE), and dispatches on FILESYSTEM:
- ZFS path: unmount /mnt/efi, recursive umount /mnt, export the pool (with `-f` fallback) if it's still imported.
- Btrfs path: unmount /mnt/efi, call the existing btrfs_cleanup and btrfs_close_encryption helpers.
All cleanup steps swallow their own errors with `|| true` since partial state is expected when this fires mid-install.
`install_zfs` and `install_btrfs` now both arm the trap as their first action and disarm it just before the success-path cleanup. The disarm matters because the success cleanup calls `zpool export` (or btrfs_close_encryption) directly, and those can produce non-zero exit codes that we don't want to interpret as "installation failed".
The note in notes.org described this as "install_zfs has no mid-step recovery." The framing was off. Both paths were exposed: install_btrfs's `btrfs_cleanup` only runs on the success path, same as `cleanup` for ZFS. Both paths now have the same recovery shape.
Added 4 bats tests for `install_failure_cleanup` that mock the system tools (umount, zpool, btrfs_cleanup, btrfs_close_encryption) via function override and track invocations through a CALLS array.
Array assignment isn't affected by the production code's `>/dev/null 2>&1` redirects on `zpool list`, so we capture the call regardless of where the mock's stdout would have gone. Verified end-to-end on the dev box: sourced archangel, set FILESYSTEM=zfs, armed the trap, ran `false` to trigger `set -e`. The trap fired with exit code 1, dispatched to the ZFS cleanup path, called `umount /mnt/efi` and `umount -R /mnt`, checked `zpool list` (returned non-zero since no pool exists on the dev box), skipped the export, and exited via `error`. No behavior change on the success path. The existing `cleanup` and `btrfs_cleanup` stay unchanged.
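The cleanup dispatch and the function-override mocking described in this message can be sketched together. This is an illustration, not the code from archangel: `POOL_NAME` is a hypothetical variable, the sketch returns the captured status instead of calling `error`, and the trap is armed only in a comment so the example has no side effects when loaded.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of install_failure_cleanup plus CALLS-array mocks.

# Mocks: shell functions shadow the real binaries and record each call.
CALLS=()
umount() { CALLS+=("umount $*"); }
zpool() { CALLS+=("zpool $*"); return 1; }   # pretend no pool is imported
btrfs_cleanup() { CALLS+=("btrfs_cleanup"); }
btrfs_close_encryption() { CALLS+=("btrfs_close_encryption"); }

install_failure_cleanup() {
    local rc=$?                      # capture the failing status first
    local pool="${POOL_NAME:-zroot}" # assumption: pool name variable
    trap - ERR INT TERM              # disarm to prevent recursion
    unset ROOT_PASSWORD ZFS_PASSPHRASE LUKS_PASSPHRASE

    case "${FILESYSTEM:-}" in
        zfs)
            umount /mnt/efi || true
            umount -R /mnt || true
            # Export only if the pool is still imported, forcing if needed.
            if zpool list "$pool" >/dev/null 2>&1; then
                zpool export "$pool" || zpool export -f "$pool" || true
            fi
            ;;
        btrfs)
            umount /mnt/efi || true
            btrfs_cleanup || true
            btrfs_close_encryption || true
            ;;
    esac
    return "$rc"
}

# An installer would arm it as its first action, for example:
#   trap install_failure_cleanup ERR INT TERM
```

The mocks run in the current shell, so the `CALLS+=(...)` assignment survives even though the `zpool list` call has its output redirected to /dev/null.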
*	test: expand bats coverage across installer modules	Craig Jennings	2026-04-26	6	-15/+501
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added unit tests for `disk.sh`, `btrfs.sh`, the archangel monolith's `gather_input` unattended branch, and filled gap cases in `config.sh`. The suite grew from 71 to 110 tests. `installer/lib/disk.sh` was completely uncovered. New `tests/unit/test_disk.bats` covers the four pure partition-path helpers (`get_efi_partition`, `get_root_partition`, `get_efi_partitions`, `get_root_partitions`) across SATA, virtio, and NVMe inputs, mixed arrays, and the empty-input behavior. Side-effecting functions in the same file (sgdisk, mkfs.fat, partprobe, and fzf wrappers) stay deliberately VM-tested. `installer/lib/btrfs.sh` had no bats coverage. New `tests/unit/test_btrfs.bats` covers `get_luks_devices`, the only pure helper in the file. It pins the asymmetric naming convention where the first device gets the bare `LUKS_MAPPER_NAME` and subsequent devices append the index. The archangel monolith was un-source-able for tests because its top-level code created a /tmp log file and redirected stdout via `exec > >(tee...)`, plus called `main "$@"` unconditionally at the bottom. I extracted the logging setup into an `init_logging` function called from `main`, and wrapped the main call in a `[[ "${BASH_SOURCE[0]}" == "${0}" ]]` guard. Sourcing the script now loads function definitions silently, with no log file and no banner. Running it directly works exactly as before. Verified both paths. That refactor unlocks `tests/unit/test_archangel.bats`, which covers `gather_input` in unattended mode. Required-field validation for HOSTNAME, TIMEZONE, ROOT_PASSWORD, and DISKS. Optional-field defaulting (FILESYSTEM to zfs, LOCALE to en_US.UTF-8, KEYMAP to us, ENABLE_SSH to yes). Filesystem-specific encryption checks (ZFS_PASSPHRASE required when not NO_ENCRYPT, same for LUKS_PASSPHRASE on Btrfs). Filesystem validity. RAID_LEVEL defaulting for multi-disk installs. The interactive branch stays out of scope per the testing-strategy policy. 
`tests/unit/test_config.bats` got five gap tests: `check_config` when CONFIG_FILE is set, `validate_config` against a non-block-device entry (e.g. /dev/null) and a missing path, and `parse_args` accepting `--color` and `--config-file` together in either order. `testing-strategy.org` got an expanded "What bats does NOT cover" section. The doc previously named six tools (mkfs, cryptsetup, zpool create, pacstrap, arch-chroot, grub-install). The new list adds sgdisk, partprobe, blkid, mkfs.fat, mkfs.btrfs, snapper, efibootmgr, mount, umount, findmnt, mountpoint, and fzf. It also names the conditions (root needed, real /dev or /sys state) that make a function VM-only. The coverage table at the top now lists the three new test files. No behavior change in production code. The init_logging extraction preserves the existing log path and banner format byte-for-byte.
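The pure partition-path helpers are a good fit for unit tests because they are plain string transforms. The sketch below shows one plausible shape. The Linux naming rule (a `p` separator when the disk name ends in a digit, as with NVMe) is standard, but the partition numbers (1 for EFI, 2 for root) and the shared `partition_path` helper are assumptions, not taken from lib/disk.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of partition-path helpers.

# Build a partition path. Disks whose name ends in a digit (nvme0n1,
# mmcblk0) need a "p" before the partition number; sda and vda do not.
partition_path() {
    local disk="$1" num="$2"
    local re='[0-9]$'
    if [[ "$disk" =~ $re ]]; then
        printf '%sp%s\n' "$disk" "$num"
    else
        printf '%s%s\n' "$disk" "$num"
    fi
}

# Assumed layout for this example: partition 1 is EFI, partition 2 is root.
get_efi_partition()  { partition_path "$1" 1; }
get_root_partition() { partition_path "$1" 2; }
```

For example, `get_efi_partition /dev/sda` prints `/dev/sda1`, while `get_root_partition /dev/nvme0n1` prints `/dev/nvme0n1p2`.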
*	refactor: extract get_raid_level fzf preview into raid.sh helper	Craig Jennings	2026-04-26	3	-102/+194
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	`get_raid_level()` carried a 98-line inline fzf `--preview` shell snippet that contained five nearly-parallel `case` branches emitting per-level preview text, plus three exported variables (`RAID_DISK_COUNT`, `RAID_TOTAL_GB`, `RAID_SMALLEST_GB`) just to pass values into the fzf preview subshell. The inline shell-in-shell had no syntax highlighting, no shellcheck on the inner snippet, and any edit to preview copy meant editing inside a single-quoted argument. I extracted the per-level text into a new `raid_preview(level, disk_count, total_gb, smallest_gb)` helper in `lib/raid.sh`. It reuses the existing `raid_fault_tolerance` and `raid_usable_bytes` primitives for the data lines instead of redoing the arithmetic inline. That keeps the math in one place. The fzf `--preview` argument is now a one-liner that calls `raid_preview` with the sizing values, and the env-var exports are gone. `export -f raid_preview raid_fault_tolerance raid_usable_bytes` makes the functions visible in fzf's preview subshell. I verified this against a fresh `bash -c` subshell, which is what fzf spawns internally. `get_raid_level()` shrinks from 144 to 49 lines. Preview text is now bats-tested. Added 8 unit tests across the 5 RAID levels (headline, fault-tolerance line, computed usable space), mixed-size handling for mirror (smallest disk, not average), unknown level returning 1 with empty output, and a sanity loop confirming every valid level produces non-empty output. No behavior change. The preview pane shows the same text, the same level options, the same selected output. The pure logic in `lib/raid.sh` is unchanged.
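The `export -f` mechanism described above can be demonstrated without fzf by spawning a fresh `bash -c` child, the same check the message mentions. The function bodies below are simplified stand-ins: the level names, the fault-tolerance values, and the two-argument signature are assumptions and do not reproduce lib/raid.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: exported functions are visible in a bash -c child.

raid_fault_tolerance() {
    local level="$1" disk_count="$2"
    case "$level" in
        stripe) echo 0 ;;
        mirror) echo $((disk_count - 1)) ;;
        raidz1) echo 1 ;;
        raidz2) echo 2 ;;
        raidz3) echo 3 ;;
        *) return 1 ;;               # unknown level: fail with no output
    esac
}

raid_preview() {
    local level="$1" disk_count="$2" tolerance
    tolerance="$(raid_fault_tolerance "$level" "$disk_count")" || return 1
    printf '%s across %s disks\n' "$level" "$disk_count"
    printf 'Survives %s disk failure(s)\n' "$tolerance"
}

# Without this line the child shell below would report "command not found".
export -f raid_preview raid_fault_tolerance

bash -c 'raid_preview mirror 3'
```

Exporting the functions replaces the earlier approach of exporting variables just to carry values into the preview subshell; the values travel as ordinary arguments instead.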
*	fix: fail fast on missing ZFSBootMenu efibootmgr entry	Craig Jennings	2026-04-26	3	-10/+116
\| \| \| \| \| \| \| \| \| \|	The post-bootloader boot-order step in `configure_zfsbootmenu` parsed `efibootmgr` output through a `grep | head | grep -oP` chain with no null guards. If any link returned empty (the entry wasn't created, the label was different, or efibootmgr itself failed), the surrounding `if [[ -n "$bootnum" ]]` silently skipped, the install reported success, and the user rebooted into a machine that wouldn't boot ZFSBootMenu by default. I replaced the chain with two pure helpers in `lib/common.sh`, `parse_efibootmgr_entry` and `parse_efibootmgr_bootorder`. The caller in `archangel` invokes them with explicit `|| error` guards on each parse stage. The helpers capture `efibootmgr` output once and reuse it (it was called twice before). The same hardening covers the BootOrder lookup at the adjacent line. It used to rely on the now-removed `bootnum` guard for safety. The helpers are stdin-driven and use bash regex, so they're easy to test in bats without exercising the real efibootmgr binary. Added 9 unit tests across normal cases, hex-character boot numbers, multi-match selection, missing label, missing BootOrder line, empty input, and an empty label argument. The empty-label case would otherwise falsely match `BootCurrent` via the hex regex, capturing "C". The helper now guards it explicitly. Verified manually against real efibootmgr output (GRUB entry at Boot0001, BootOrder 0006,0001,2001,2002,2003). Both helpers parsed correctly. VM integration not re-run for this small post-bootloader change. The next scheduled `make test-install` exercises the green path.
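Stdin-driven parsers of this kind might be written as below. This is a sketch under assumptions: the regexes, the first-match-wins rule, and the prefix match on the label are illustrative choices, not the code in lib/common.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of stdin-driven efibootmgr parsers using bash regex.

# Print the 4-hex-digit boot number of the first entry whose label starts
# with the given text. Fails on an empty label or when nothing matches.
parse_efibootmgr_entry() {
    local label="$1" line
    local re='^Boot([0-9A-Fa-f]{4})\*?[[:space:]]+(.*)$'
    [[ -n "$label" ]] || return 1    # explicit guard for the empty label
    while IFS= read -r line; do
        if [[ "$line" =~ $re && "${BASH_REMATCH[2]}" == "$label"* ]]; then
            echo "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1
}

# Print the comma-separated BootOrder list, or fail if the line is absent.
parse_efibootmgr_bootorder() {
    local line
    local re='^BootOrder:[[:space:]]*([0-9A-Fa-f,]+)$'
    while IFS= read -r line; do
        if [[ "$line" =~ $re ]]; then
            echo "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1
}
```

A caller would capture the tool's output once and feed it to both helpers, for example `bootnum="$(printf '%s\n' "$output" | parse_efibootmgr_entry "ZFSBootMenu")" || error "..."`, where `output` and the label are placeholders.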
*	fix: don't mask test-install exit codes through tee	Craig Jennings	2026-04-21	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Makefile let failures slip through silently when the caller piped output: `make test-install 2>&1 | tee log` returned 0 even when the inner build or VM run failed, because tee's zero exit masked the pipeline. This happened today during an ISO rebuild: the first attempt failed on corrupted pacman cache but the task notification reported exit 0.
Two changes:
1. Set the recipe shell flags to enable pipefail (`SHELL := /bin/bash` + `.SHELLFLAGS := -o pipefail -c`). Any intra-recipe pipeline now returns the rightmost non-zero exit status instead of always returning the last command's status. Safe to add: no existing recipe uses intra-recipe pipes today.
2. Bake tee into the test-install recipe itself. Output writes to `test-logs/make-test-install-YYYY-MM-DD-HHMM.log` so callers never need to pipe through tee externally. With SHELLFLAGS pipefail in place, the test script's exit code propagates through the baked-in tee back to the caller cleanly.
Verified with a repro: a recipe shaped like `failing-cmd | tee log` returns non-zero; the 71-test bats suite still passes.
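The masking effect is easy to reproduce in plain bash, outside of make. The sketch below runs the same `false | tee` pipeline with and without pipefail; the function name `pipeline_status` is invented for the example.

```bash
#!/usr/bin/env bash
# Repro sketch: a failing command piped into tee, with and without pipefail.

# Run the pipeline in a subshell so the option does not leak to the caller.
# The function's exit status is the pipeline's exit status.
pipeline_status() {
    local mode="$1"
    (
        if [[ "$mode" == "pipefail" ]]; then
            set -o pipefail
        fi
        false | tee /dev/null
    )
}

# Without pipefail, tee's 0 wins; with it, false's 1 comes through.
pipeline_status default  && echo "default: failure masked by tee"
pipeline_status pipefail || echo "pipefail: failure propagated"
```

This is the behavior the `.SHELLFLAGS := -o pipefail -c` line turns on for every recipe line that make hands to bash.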
*	fix: cleanup on empty findmnt + stale profiledef entries	Craig Jennings	2026-04-21	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two pre-existing build.sh bugs surfaced while rebuilding the ISO for the first time since the April lib/ refactor.
1. safe_cleanup_work_dir runs findmnt | grep | sort in a command substitution. When no mounts match (the common case; this is a defensive sweep on top of the explicit known-mount loop), grep exits 1, pipefail propagates it, set -e exits the script. The trap then reports "Build interrupted or failed" and the build aborts before doing any work. Swallow the grep status with || true; no-match is not an error here.
2. The profiledef.sh file_permissions seds drifted out of sync with the lib/ refactor:
   - lib/zfs.sh was removed in 3321f0f but its sed block stayed, producing a warning every build ("Cannot change permissions of '.../lib/zfs.sh': no such file").
   - lib/raid.sh was added in 19f4624 without a corresponding sed, so the file shipped with whatever permissions cp gave it.
   Drop the zfs.sh block and add the raid.sh block.
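The first bug can be sketched with a stand-in for the mount list. Here the functions read candidate mount points on stdin instead of calling findmnt, and the names `find_mounts` and `find_mounts_safe` are hypothetical. Under pipefail, a grep with no match makes the whole pipeline return 1, which `set -e` would then treat as fatal; the `|| true` variant returns 0.

```bash
#!/usr/bin/env bash
# Sketch: grep's "no match" status under pipefail, with and without a guard.

# Unguarded: returns 1 when nothing under the work dir is mounted.
find_mounts() {
    (
        set -o pipefail
        grep -F -- "$1" | LC_ALL=C sort -r
    )
}

# Guarded: no match is not an error, so the status is swallowed.
find_mounts_safe() {
    (
        set -o pipefail
        grep -F -- "$1" | LC_ALL=C sort -r || true
    )
}
```

Feeding both functions a list with no matching entry, such as `printf '/proc\n/sys\n' | find_mounts /tmp/work`, shows the difference in exit status while the printed output stays empty in both cases.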
*	fix: verify_rollback sentinel must live on the rolled-back dataset	Craig Jennings	2026-04-21	1	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	/root is mounted on a separate dataset (zroot/home/root, created by archangel:create_datasets), but verify_rollback was snapshotting zroot/ROOT/default. The rollback was a no-op for the sentinel file, so the post-rollback existence check failed — the visible symptom was a PASSED test with a soft-failure warning ("Rollback failed - test file not restored" → "Rollback verification had issues") that persisted across ZFS configs for weeks. Move the sentinel to /etc/archangel-rollback-test. /etc has no child dataset mounted there, so the file lives on zroot/ROOT/default — the dataset actually being snapshotted and rolled back. Defensively single-quote $test_file at the five ssh_cmd call-sites so future path changes (whitespace, special chars) stay correct without touching each call again. The 2026-04-21 VM run logged "Rollback verified - test file restored" on zfs-mirror-encrypt, confirming the fix.