| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
| |
The ZFS native-encryption assertion lived in verify_install, which runs in the live ISO before reboot. But archangel exports zroot at the end of the install, so verify_install bails at "ZFS pool not found" and never reaches the check. It was dead code: the encrypted-config tests passed on the reboot path (entering the passphrase at ZFSBootMenu and booting is itself proof), while the explicit aes-256-gcm assertion gave false confidence by never running.
I moved it into verify_reboot_survival, which SSHes into the booted system where zroot is imported, so zfs get encryption zroot/ROOT actually returns aes-256-gcm and the assertion fires. Confirmed on a zfs-encrypt VM run: "ZFS encryption (aes-256-gcm) verified on running system."
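A minimal sketch of how such an assertion can be split into a testable comparison and a thin SSH call. The function names here are hypothetical, and ssh_cmd is assumed to be the harness's existing SSH wrapper; this is not the code in verify_reboot_survival.

```shell
#!/usr/bin/env bash
# Pure half (hypothetical name): compare the value printed by
# `zfs get -H -o value encryption <dataset>` against the expected cipher.
encryption_matches() {
    local actual="$1" expected="${2:-aes-256-gcm}"
    [[ "$actual" == "$expected" ]]
}

# Impure half (hypothetical wiring): query the booted system, where the
# pool is imported. ssh_cmd is assumed to exist in the harness.
check_encryption_on_running_system() {
    local value
    value=$(ssh_cmd "zfs get -H -o value encryption zroot/ROOT") || return 1
    encryption_matches "$value"
}
```

Keeping the comparison free of ZFS calls means it can be covered by bats without a VM.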
|
| |
|
|
|
|
| |
archzfs re-uploads its GitHub release assets under the same filename, so pacoloco keeps serving a zfs-dkms/zfs-utils it cached earlier while pacman fetches a fresh archzfs.db with a new checksum. The two mismatch and pacstrap aborts with "invalid or corrupted package." build.sh already drops the stale packages from the host pacman cache, but it never cleared the pacoloco layer, which the VM test installs route through too, so test-install.sh kept hitting the corruption (four times in one session).
build.sh runs as root, so it now clears /var/cache/pacoloco/pkgs/archzfs/zfs-* alongside the host cache, which makes the build-then-test flow self-healing. The pacoloco cache is root-owned and test-install.sh runs as the user, so it can't clear it unattended. Instead, test-install.sh now recognizes the corruption (is_archzfs_cache_corruption) and prints how to clear it, the way it already names the SSH_PORT override on a port collision. A retry alone won't help since it hits the same cached file, so this fails fast with the hint rather than retrying.
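For illustration, a predicate along these lines could recognize the corruption from the in-VM log. This is a sketch under assumptions: the real is_archzfs_cache_corruption may match different strings, and print_cache_hint is a hypothetical helper. Only the quoted pacman error and the cache path come from the description above.

```shell
#!/usr/bin/env bash
# Sketch: treat the failure as stale-cache corruption only when the pacman
# error and a zfs package name both appear in the log text.
is_archzfs_cache_corruption() {
    local log="$1"
    grep -q 'invalid or corrupted package' <<<"$log" &&
        grep -Eq 'zfs-(dkms|utils)' <<<"$log"
}

# Hypothetical helper: name the fix instead of retrying against the same
# cached file.
print_cache_hint() {
    echo "Stale archzfs package in the pacoloco cache. Clear it with:" >&2
    echo "  sudo rm -f /var/cache/pacoloco/pkgs/archzfs/zfs-*" >&2
}
```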
|
| |
|
|
|
|
| |
A test run launched qemu without first checking the SSH forward port, so a collision with another VM already holding it surfaced only as an opaque "Failed to start VM," with qemu unable to bind and no hint why. I added a port_in_use check in run_test before the launch: it errors with the port number and the SSH_PORT override to set, records the failure, and moves on.
The check lives in run_test, not start_vm, because start_vm runs in a command substitution (vm_pid=$(start_vm ...)) where this harness's non-exiting error() would be captured as the PID instead of failing the run. The pure half, port_listening_in, takes an `ss -tln` snapshot as a string so it's unit-testable.
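The split described above can be sketched as follows. The body of port_listening_in is an assumption about how the snapshot might be parsed, not the harness's actual code; it relies on the local address being the fourth column of `ss -tln` output.

```shell
#!/usr/bin/env bash
# Pure half: takes an `ss -tln` snapshot as a string plus a port number.
# Succeeds when some local address in column 4 ends in ":<port>".
port_listening_in() {
    local snapshot="$1" port="$2"
    awk -v port="$port" '
        $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    ' <<<"$snapshot"
}

# Impure half: feeds the live snapshot to the pure check.
port_in_use() {
    port_listening_in "$(ss -tln)" "$1"
}
```

Because the parser only sees a string, a test can hand it a canned snapshot and never needs a listening socket.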
|
| |
|
|
| |
The port was hardcoded, so a test run collided with any other VM already forwarding 2222. It is now read from SSH_PORT and defaults to 2222, so existing invocations are unchanged. Running SSH_PORT=2223 scripts/test-install.sh moves the forward to a free port so the run can sit alongside another VM.
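A small sketch of an environment-overridable default. resolve_ssh_port is a hypothetical name, and the range validation is an extra that the change described above may not include.

```shell
#!/usr/bin/env bash
# Sketch: fall back to 2222 unless SSH_PORT is set, and reject values that
# are not a valid TCP port (the validation is an assumption, not from the
# original change).
resolve_ssh_port() {
    local port="${SSH_PORT:-2222}"
    if [[ ! "$port" =~ ^[0-9]+$ ]] || (( 10#$port < 1 || 10#$port > 65535 )); then
        echo "invalid SSH_PORT: $port" >&2
        return 1
    fi
    printf '%s\n' "$port"
}
```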
|
| |
|
|
|
|
|
|
| |
test-install.sh aborts a whole 5-minute VM run when pacstrap hits a transient mirror blip, and the suite reports a failure indistinguishable from a real install regression. run_test now retries the install up to twice, but only when the in-VM log shows both pacstrap's "Failed to install packages to new root" marker and a download/network indicator. A deterministic failure like "target not found" carries the marker without a network indicator, so it still fails fast. archangel's failure trap exports the pool and unmounts on abort, so each retry re-partitions and re-pacstraps from a clean state.
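The two-condition predicate can be sketched like this. The pacstrap marker is quoted from the description above; the list of network indicators is illustrative only, and the real is_transient_install_failure may look for different strings.

```shell
#!/usr/bin/env bash
# Sketch: a failure counts as transient only when the pacstrap marker AND
# some download/network indicator both appear in the log text.
is_transient_install_failure() {
    local log="$1"
    grep -q 'Failed to install packages to new root' <<<"$log" || return 1
    # Illustrative indicators; the actual set is an assumption.
    grep -Eiq 'failed retrieving file|could not resolve host|connection timed out' <<<"$log"
}
```

A log with the marker but only "target not found" fails the second grep, so the caller fails fast instead of retrying.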
Wiring the predicate up needed a source-guard so bats can source the harness, which had none. With that in place I unit-covered the pure helpers — is_transient_install_failure, char_to_qemu_key, get_disk_count, get_disk_args — and lifted char_to_qemu_key out of monitor_sendkeys so the QEMU keymap is testable on its own.
The keymap test found a dead branch. The backslash case pattern was '\\', which matches two literal backslashes and therefore never a lone one; inside single quotes, bash needs '\' to match a single backslash. A passphrase containing a backslash would have sent an invalid QEMU keyname instead of "backslash". No test passphrase uses one, so it never bit. I fixed the pattern.
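The quoting pitfall is easy to reproduce in isolation. The sketch below is not the harness's char_to_qemu_key: it maps only a handful of characters, and the function names are hypothetical. The key names spc and backslash and the shift-<key> combination are standard QEMU sendkey names.

```shell
#!/usr/bin/env bash
# The dead branch: a single-quoted '\\' is two literal backslashes.
matches_old_pattern() {
    case "$1" in
        '\\') return 0 ;;
        *)    return 1 ;;
    esac
}

# Reduced keymap sketch: the '\' pattern matches exactly one backslash.
char_to_qemu_key_sketch() {
    local c="$1"
    case "$c" in
        [[:lower:][:digit:]]) printf '%s\n' "$c" ;;
        [[:upper:]])          printf 'shift-%s\n' "${c,,}" ;;
        ' ')                  printf 'spc\n' ;;
        '\')                  printf 'backslash\n' ;;
        *)                    return 1 ;;
    esac
}
```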
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The build-host pacoloco routing from e2eb958 only covered mkarchiso's pacstrap. VMs spawned by scripts/test-install.sh ran their own pacstrap inside the guest, fetching ~600 packages per config from upstream and re-hitting the same archzfs corruption that bites the build host. A full 12-config test-install run exposed 7200+ package downloads to upstream flake.
I added a routing step to run_install() in test-install.sh, after the config file gets SCP'd to the VM and before archangel runs. It detects pacoloco on the host (port 9129, same probe as build.sh's) and rewrites the live system's /etc/pacman.conf over SSH. [core] and [extra] swap their Include lines for Server lines pointing at 10.0.2.2:9129/repo/archlinux/$repo/os/$arch. A preempt [archzfs] block lands ahead of archangel's default insertion.
10.0.2.2 is QEMU's SLIRP default gateway as seen from the guest, so the host's localhost:9129 maps to that address inside the VM. Pacoloco binds 0.0.0.0:9129, reachable from there without firewall changes.
The preempt matters because archangel's install_base checks for an existing [archzfs] block in /etc/pacman.conf and skips its own insertion when one is already there. Writing the pacoloco-routed [archzfs] up front means archangel keeps the routed version. The installed system's $MNTPOINT/etc/pacman.conf isn't touched: it gets upstream URLs like before, since the installed system shouldn't depend on the test host's proxy.
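As a rough sketch of the [core]/[extra] rewrite, the filter below reads a pacman.conf on stdin and swaps the Include lines in those two sections for a Server line. The function name is hypothetical, the http:// scheme is an assumption, and the [archzfs] preempt block is left out because its exact contents are not given here.

```shell
#!/usr/bin/env bash
# Sketch: rewrite Include lines under [core] and [extra] only; every other
# section passes through untouched. $repo and $arch stay literal so pacman
# expands them itself.
route_pacman_conf() {
    local proxy="${1:-http://10.0.2.2:9129}"
    awk -v proxy="$proxy" '
        /^\[.*\]$/ { section = $0 }
        (section == "[core]" || section == "[extra]") && /^Include[[:space:]]*=/ {
            print "Server = " proxy "/repo/archlinux/$repo/os/$arch"
            next
        }
        { print }
    '
}
```

In the harness the rewrite happens over SSH against the live system's /etc/pacman.conf; a stdin-to-stdout filter like this one keeps the transformation checkable on the host.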
The status message uses a plain echo rather than test-install.sh's info() function. run_install() runs inside a bash -c subshell at line 864 that only exports ssh_cmd and run_install via declare -f. A bare info call there resolves to /usr/bin/info (the GNU info reader) and prints a confusing "No menu item" error. An inline comment in the code records the pitfall.
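The pitfall can be reproduced with a toy example: a child bash started with bash -c sees only the functions whose definitions are passed in via declare -f. The names below are hypothetical stand-ins for the harness's functions.

```shell
#!/usr/bin/env bash
info()  { echo "[INFO] $*"; }
greet() { echo "hello from greet"; }

# Only greet's definition is handed to the child shell. Reports whether the
# child knows the named function.
child_has_function() {
    local name="$1"
    bash -c "$(declare -f greet); declare -F \"$name\" >/dev/null"
}
```

Here child_has_function greet succeeds and child_has_function info fails, so a bare info call in the child would fall through to whatever info is on PATH.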
Verified end-to-end with scripts/test-install.sh single-disk: pacoloco's cache grew from 77MB (post-build) to 953MB (post-VM-install), the VM's pacstrap completed cleanly, and the install verified. Bats: still 181.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The wrapper had no runtime coverage — bats tests pin pure helpers and arg parsing only, and verify_rollback bypassed it by calling zfs snapshot / zfs rollback directly via SSH. A regression in cmd_create, cmd_rollback, or cmd_delete would only have surfaced in production.
verify_zfssnapshot_wrapper runs after verify_rollback for ZFS configs (no-op for Btrfs) and exercises:
- list confirms @genesis baseline
- create runtime-test — recursive snapshot across all datasets
- echo no | delete --name — confirms the gate aborts (catches the -n vs = regression class)
- echo yes | delete --name — destroys across all datasets, list confirms gone
- create wrapper-rollback + drop sentinel + rollback --name — round-trip restores the sentinel
The function scps the working-tree wrapper to the VM before testing so the run reflects current source rather than what the ISO froze at build time. A regression here fails the test (no warn-only path) — it's the wrapper's only runtime check.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Problem: zfssnapshot and zfsrollback were two separate scripts with overlapping pre-flight checks (zfs / fzf / root) and parallel UX patterns (description sanitization in one, fzf selection in the other). Users had to remember which script was for which operation, and a "list" view meant typing the raw `zfs list -t snapshot` command. There was no path to destroy individual snapshots short of `zfs destroy` directly, which is dangerous without a confirmation flow.
Solution: rewrite zfssnapshot as a single multi-subcommand script (list, create, rollback, delete). Drop installer/zfsrollback. The new script uses a source-guard at the bottom (`if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then main "$@"; fi`) so bats can source it without triggering the install-time pre-flight checks, matching the pattern in installer/archangel.
Pure helpers (sanitize_description, validate_description, format_snapshot_name) get extracted as named functions so they're testable in isolation. The destructive flows (rollback, delete) keep the explicit "yes" confirmation prompt, the genesis-snapshot warning, and the recursive-rollback-destroys-newer-snapshots warning. Delete uses fzf --multi so the user can pick several snapshot names at once.
Updated build.sh to copy only the consolidated script. Dropped the zfsrollback profiledef permission line. Updated Makefile, README, scripts/sanity-test.sh, and testing-strategy.org to reflect the single-script layout.
Bats: 147 → 168 (+21). Coverage spans sanitize_description (normal / boundary / error), validate_description (alphanumerics, hyphens, underscores accepted; spaces, slashes, shell metacharacters, empty rejected), format_snapshot_name (timestamp + description composition), and main subcommand dispatch (list / create / rollback / delete / help / unknown). Lint clean. The zfs-, fzf-, and arch-chroot-shelling subcommand bodies stay VM-tested per testing-strategy.org.
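The accept/reject rules listed for validate_description fit in a single pattern. This is a sketch of one way to express them, not necessarily how the script implements it.

```shell
#!/usr/bin/env bash
# Sketch: accept only non-empty strings of ASCII alphanumerics, hyphens,
# and underscores. LC_ALL=C keeps the bracket ranges ASCII-only.
validate_description() {
    local LC_ALL=C
    [[ "$1" =~ ^[A-Za-z0-9_-]+$ ]]
}
```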
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
/root is mounted on a separate dataset (zroot/home/root, created by
archangel:create_datasets), but verify_rollback was snapshotting
zroot/ROOT/default. The rollback was a no-op for the sentinel file,
so the post-rollback existence check failed — the visible symptom
was a PASSED test with a soft-failure warning
("Rollback failed - test file not restored" →
"Rollback verification had issues") that persisted across ZFS
configs for weeks.
Move the sentinel to /etc/archangel-rollback-test. /etc has no child
dataset mounted there, so the file lives on zroot/ROOT/default —
the dataset actually being snapshotted and rolled back.
Defensively single-quote $test_file at the five ssh_cmd call-sites
so future path changes (whitespace, special chars) stay correct
without touching each call again.
The 2026-04-21 VM run logged "Rollback verified - test file restored"
on zfs-mirror-encrypt, confirming the fix.
|
| |
|
|
|
|
|
|
|
|
|
| |
ZFS DKMS compile + depmod against kernel 6.18.22 in a 4-CPU VM under
host load exceeds 10 minutes. With INSTALL_TIMEOUT=600, all 6 ZFS test
configs timed out during the DKMS install step after pacstrap. The one
ZFS config that passed ('custom-locale', first ZFS config alphabetically)
squeaked in just under the deadline.
Bumped to 1800s (30 min). Session notes from 2026-02-12 mention this
bump but the change never made it into git.
|
| |
|
|
|
|
|
| |
Add automated tests for ZFS native encryption, matching existing Btrfs
LUKS test coverage. ZFS encrypted boot requires two passphrase entries
(ZFSBootMenu + mkinitcpio zfs hook), both sent via QEMU monitor sendkey
with timed delays since ZFSBootMenu renders to VGA, not serial.
|
| |
|
|
|
|
|
|
| |
The custom/ directory name was an archiso implementation detail. Renamed
to installer/, which clearly communicates that this directory contains the
installer scripts and utilities that ship on the ISO.
Updated all references in build.sh, Makefile, test-install.sh, and README.
|
| |
|
|
|
|
|
|
|
|
| |
- Add monitor_sendkeys() to type strings into QEMU via monitor socket
- Add send_luks_passphrase() that detects GRUB passphrase prompt in
serial log and sends passphrase via sendkey, supporting multi-disk
LUKS (one passphrase per encrypted disk)
- Add QEMU monitor socket to start_vm_from_disk() for LUKS configs
- Auto-detect LUKS configs and handle passphrase entry during reboot test
- Add socat dependency check
|
| |
|
|
|
|
| |
Declare and assign local variables separately in custom/archangel,
scripts/full-test.sh, scripts/test-install.sh, and remove an unused
variable in custom/lib/zfs.sh.
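The reason for splitting declaration from assignment is that `local` returns its own exit status and hides the command's (the ShellCheck SC2155 warning). A small demonstration with hypothetical function names:

```shell
#!/usr/bin/env bash
# Combined form: `local` itself succeeds, so the failure of $(false) is lost.
combined_status() {
    local rc=0
    local out=$(false) || rc=$?
    printf '%s\n' "$rc"
}

# Split form: the assignment's status is the command substitution's status.
split_status() {
    local rc=0 out
    out=$(false) || rc=$?
    printf '%s\n' "$rc"
}
```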
|
| |
|
|
|
|
| |
Allow OVMF_CODE and OVMF_VARS_ORIG to be overridden via environment
variables for portability across distros (Fedora, Ubuntu, etc. use
different paths for UEFI firmware).
|
| |
|
|
|
|
| |
Enable undefined variable checking (set -u) and pipefail across
standalone scripts. Guard SUDO_USER references with ${SUDO_USER:-}
for set -u compatibility.
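A sketch of the guard in use. resolve_real_user is a hypothetical helper; only the ${SUDO_USER:-} expansion comes from the change described above.

```shell
#!/usr/bin/env bash
# Runs in a subshell so `set -u` stays local to this function. Without the
# :- default, referencing an unset SUDO_USER would abort under set -u.
resolve_real_user() (
    set -u
    sudo_user="${SUDO_USER:-}"
    if [[ -n "$sudo_user" ]]; then
        printf '%s\n' "$sudo_user"
    else
        id -un
    fi
)
```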
|
| |
|
|
|
|
|
| |
Remove personal email addresses, hardcoded paths, and infrastructure
references to prepare for open-source release. Distribution targets
in build-release are now configurable via environment variables,
and archsetup inclusion is opt-in.
|
| |
|
|
|
|
|
| |
- Change all script shebangs to #!/usr/bin/env bash for portability
(heredocs writing to installed systems keep #!/bin/bash)
- Remove || true from Makefile lint target so shellcheck errors fail the build
- Add .editorconfig for consistent formatting across editors
|
| |
|
|
|
|
|
|
| |
The no-ssh test failed because reboot verification unconditionally
used wait_for_ssh, which timed out on systems without SSH. Add
wait_for_boot_console() that checks serial log for ZFSBootMenu boot
markers, and branch run_test() on ENABLE_SSH to use the appropriate
verification path.
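A polling loop of this shape would do the job. This is a sketch, not the real wait_for_boot_console: the marker string and log path are supplied by the caller, and the argument order is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: poll a serial log once per second until a fixed-string marker
# appears or the timeout (in seconds) runs out.
wait_for_boot_console_sketch() {
    local log="$1" marker="$2" timeout="${3:-120}" waited=0
    while (( waited < timeout )); do
        if grep -qF -- "$marker" "$log" 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 1
}
```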
|
| |
|
|
|
|
|
|
|
|
| |
- Remove personal hardware specs, machine-specific troubleshooting docs,
and video transcript from assets/
- Remove stale PLAN-zfsbootmenu-implementation.org (feature complete)
- Remove .stignore (Syncthing config, not project-relevant)
- Untrack todo.org (personal task tracker with private infra details)
- Make archsetup path configurable via ARCHSETUP_DIR env var in build.sh
- Use $REAL_USER instead of hardcoded username in build-release scp
|
| |
|
|
|
|
|
|
|
| |
- Change archzfs SigLevel to Never (pacstrap -K empty keyring caused
interactive GPG prompt blocking unattended installs)
- Fix pgrep matching avahi-daemon's [archangel.local] in full-test.sh
- Bump install timeout to 30min for DKMS builds
- Add ~/downloads/isos and archsetup inbox to build-release distribution
- Sync templates
|
| |
|
|
|
|
|
|
| |
- Standardize naming: VM names, hostname, passwords, ISO naming
- Remove USB, Ventoy, and local deployment from build-release
- Add snapper package and Btrfs validation tests to sanity-test
- Update README for dual ZFS/Btrfs architecture
- Delete obsolete SESSION-CONTEXT.md and download-archzfs-iso.sh
|
| |
|
|
|
|
| |
- Remove test-zfs-snap-prune.sh (tested deleted script)
- Update Makefile to reference existing custom/ scripts
- Remove test-unit target (no unit tests remain)
|
| |
|
|
|
|
|
|
|
|
|
|
| |
- Add setup_luks_testing_keyfile() for automated LUKS testing
- Modify configure_crypttab() and configure_luks_initramfs() for keyfile support
- Fix configure_btrfs_initramfs() to preserve encrypt hook when LUKS enabled
- Add TESTING=yes to LUKS test configs
- Create docs/TESTING-STRATEGY.org documenting testing approach
LUKS automated reboot testing remains a work in progress due to the
complexity of sending a passphrase to the initramfs encrypt hook.
Non-LUKS tests all pass: btrfs-single, btrfs-mirror, btrfs-stripe.
|
| |
|
|
|
|
| |
- Add NO_ENCRYPT=yes to btrfs-single.conf for unattended testing
- Add offline Arch Wiki documentation section to RESCUE-GUIDE.txt
- Update todo.org with completed tasks and new items
|
| |
|
|
|
|
|
|
|
| |
- Add start_vm_from_disk() to boot installed system without ISO
- Add stop_vm keep_vars parameter to preserve EFI boot entries
- Add verify_reboot_survival() to check system boots from disk
- Add verify_rollback() to test snapshot/rollback functionality
- Support different SSH passwords for live ISO vs installed system
- Integrate reboot/rollback checks into test flow
|
| |
|
|
|
|
|
| |
- Use -d - flag for cryptsetup stdin key input (matches easy-arch)
- Change ((i++)) to ((++i)) to avoid set -e exit on 0 increment
- Add btrfs-mirror-luks test config
- Update status protocol with sound notifications
|
| |
|
|
|
|
|
|
| |
- RAID1 (mirror) and RAID0 (stripe) for 2+ disks
- Multi-disk LUKS with single passphrase prompt
- EFI redundancy: GRUB installed on all disks
- Pacman hook syncs GRUB updates across EFI partitions
- btrfs initramfs hook for multi-device assembly at boot
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
- Add LUKS functions to btrfs.sh (create/open/close container)
- Add crypttab configuration for boot
- Add encrypt hook to mkinitcpio HOOKS
- Add cryptdevice parameter to GRUB cmdline
- Add get_btrfs_encryption_choice and get_luks_passphrase prompts
- Add LUKS_PASSPHRASE to config variables
- Update show_summary and print_btrfs_summary for encryption status
- Add btrfs-luks.conf test config
VM test pending.
|
| |
|
|
|
|
|
|
|
|
|
| |
- Fix GRUB config path (remove GRUB_BTRFS_GRUB_DIRNAME, use default)
- Create snapper config manually (D-Bus not available in chroot)
- Create genesis snapshot with btrfs command (not snapper)
- Add btrfs-single.conf test config
- Update test-install.sh to copy lib/ directory
- Update test-install.sh to handle btrfs verification
VM test now passes for btrfs single-disk installation.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Switch to the recommended pool import method that uses blkid to scan
for pools instead of relying on zpool.cache. This eliminates the
complexity of managing cachefile paths with altroot during installation.
Changes:
- Remove cachefile setup from create_zfs_pool() and configure_zfs_services()
- Enable zfs-import-scan.service instead of zfs-import-cache.service
- Set cachefile=none on the pool since it's not needed
- Update full-test.sh to verify zfs-import-scan is enabled
This approach is recommended per the Arch Wiki and doesn't require
the cachefile to be present in the initramfs.
|
| |
|
|
|
|
|
| |
- Check zpool.cache is present in initramfs (catches cachefile bugs)
- Add reboot test: issue reboot, wait for system to come back
- Verify ZFS pool healthy after reboot
- Ensures the installed system can survive a reboot cycle
|
| |
|
|
|
|
|
| |
- Add zpool set cachefile=/etc/zfs/zpool.cache after pool creation
- Without this, initramfs ZFS hook can't import pool at boot
- Causes "cannot import '(null)': no such pool available" error
- Add cachefile property test to full-test.sh
|
| | |
|
| |
|
|
|
|
|
|
| |
- Add safe_cleanup_work_dir() to prevent /dev corruption on interrupted builds
- Fix shadow file: use sed to modify root entry instead of replacing file
- Add /etc/hosts and /etc/nsswitch.conf for proper hostname/mDNS resolution
- Add inetutils package for hostname command
- Add sanity tests for password, avahi, mdns, and hostname
|
| |
|
|
|
|
|
|
| |
Bug: ((TESTS_PASSED++)) returns exit code 1 when TESTS_PASSED is 0,
because post-increment evaluates the old value (0) which is falsy.
With set -e, this caused the script to exit after the first test passed.
Fix: Use pre-increment ((++TESTS_PASSED)) which returns the new value.
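The difference is visible in the exit status alone. The functions below are illustrative and capture the status with `||` so they are themselves safe under set -e.

```shell
#!/usr/bin/env bash
# Post-increment on 0: the expression evaluates to the old value (0),
# so (( )) returns status 1.
post_increment_status() {
    local n=0 rc=0
    ((n++)) || rc=$?
    printf '%s\n' "$rc"
}

# Pre-increment on 0: the expression evaluates to the new value (1),
# so (( )) returns status 0.
pre_increment_status() {
    local n=0 rc=0
    ((++n)) || rc=$?
    printf '%s\n' "$rc"
}
```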
|
| |
|
|
|
|
|
|
|
|
| |
sanity-test.sh (live ISO):
- Check avahi-daemon is enabled
- Check avahi-daemon is running
test-install.sh (installed system):
- Check avahi and nss-mdns packages installed
- Check avahi-daemon service enabled
|
| |
|
|
|
|
|
| |
- Add scripts/full-test.sh for automated install testing (single, mirror, raidz1)
- Add --full-test option to build-release workflow
- Install zfssnapshot and zfsrollback to target system during install
- Simplify .gitignore to exclude entire vm/ directory
|
| |
|
|
|
|
|
|
| |
New --yes/-y flag skips the dd confirmation prompt, allowing
build-release to run completely unattended for CI/CD workflows.
The ARCHZFS label on the drive is sufficient safety - if it has
that label, it was created by this process and is the intended target.
|
| |
|
|
|
|
|
| |
- Use cjennings@truenas.local instead of root (has SSH keys)
- Remove removable check for ARCHZFS drives (Framework expansion
cards show as internal but are hot-swappable)
- Still requires 'yes' confirmation before dd for safety
|
| |
|
|
|
|
|
| |
- Use SUDO_USER to get real user's home directory
- Run SSH/SCP as real user to use their SSH keys
- Handle TrueNAS SSH failure gracefully (warn and continue)
- Track actual TrueNAS success status for summary
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
New scripts/sanity-test.sh:
- Boots ISO in headless QEMU
- Waits for SSH availability
- Runs 13 automated verification tests:
- ZFS module loaded and working
- Custom scripts present (zfsrollback, zfssnapshot, etc.)
- fzf installed
- LTS kernel running
- archsetup directory present
- Reports pass/fail with summary
- Fully automated - no human input required
Updated build-release to use automated sanity test instead of
manual verification prompt.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Automates the full release workflow:
1. Build ISO (via build.sh)
2. Sanity test (boot in QEMU, manual verification)
3. Distribute to multiple targets:
- ~/Downloads/isos (always)
- truenas.local:/mnt/vault/isos (if reachable)
- ARCHZFS labeled USB drive (detected via blkid, writes via dd)
- Ventoy USB drive (detected by label or ventoy/ directory)
Options:
--skip-build Distribute existing ISO without rebuilding
--skip-test Skip the QEMU sanity test
|
| | |
|
| |
|
|
|
|
|
|
|
| |
Add get_encryption_choice() to ask user whether to enable encryption
during interactive install. Remove --no-encrypt CLI flag in favor of
config file NO_ENCRYPT option for unattended installs. Update tests
to rely on config file setting instead of flag.
Also: fix ISO label to ARCHZFS for stable GRUB entries, add TODO items.
|
| |
|
|
|
|
|
|
|
|
| |
- Add Makefile with targets: all, test, test-unit, test-install, build, release, clean, lint
- Add test-install.sh for automated VM installation testing
- Add test configs: single-disk, mirror, raidz1, no-ssh, custom-locale
- Add test-logs/ to .gitignore
- Uses sshpass for SSH authentication to live ISO
- Copies latest install-archzfs to VM before testing (allows testing without rebuild)
- Supports --list to show available configs
|
| |
|
|
|
| |
Simple wrapper that boots from disk if installed, otherwise from ISO.
Saves having to remember the --boot-disk flag.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implements hybrid retention policy:
- Always keep 20 most recent snapshots
- Delete snapshots beyond #20 only if older than 180 days
- Genesis snapshot is always protected
Features:
- zfs-snap-prune script with --dry-run, --test, --verbose modes
- Comprehensive test suite (22 tests)
- Runs automatically after pacman operations
- Daily systemd timer for cleanup
- Regenerates GRUB menu after pruning
This prevents unbounded snapshot growth while preserving
recent history and the genesis snapshot.
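The retention rule can be sketched as a pure filter. This is not the zfs-snap-prune implementation: the input format, the function name, and the *@genesis naming test are assumptions made for illustration, and genesis is assumed to count toward the position index like any other snapshot.

```shell
#!/usr/bin/env bash
# Input on stdin: one "name creation_epoch" pair per line, newest first.
# Output: names that the hybrid policy would delete.
select_prunable() {
    local now="$1" keep="${2:-20}" max_age_days="${3:-180}"
    local cutoff=$(( now - max_age_days * 86400 ))
    local index=0 name created
    while read -r name created; do
        index=$(( index + 1 ))
        if (( index <= keep )); then continue; fi           # N most recent always stay
        if [[ "$name" == *@genesis ]]; then continue; fi    # genesis is protected
        if (( created < cutoff )); then
            printf '%s\n' "$name"                            # beyond N and too old
        fi
    done
    return 0
}
```

A snapshot past position 20 that is younger than 180 days is kept, which is what makes the policy hybrid rather than a plain count or age limit.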
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bug fixes:
- Fix depmod using wrong kernel version during initramfs generation
The script now explicitly specifies the installed kernel version
instead of relying on uname -r (which returns the live ISO kernel)
- Add kernel module verification before mkinitcpio
- Add hostid 0x prefix to spl.spl_hostid kernel parameter
ISO naming:
- Changed format to: archzfs-vmlinuz-{version}-lts-{date}-{arch}.iso
- Example: archzfs-vmlinuz-6.12.65-lts-2026-01-18-x86_64.iso
test-vm.sh:
- Add QEMU monitor socket for automation support
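For the depmod fix described above, one way to find the installed kernel version is to read it from the target's module directory rather than from uname -r. This is a sketch under assumptions: the function name is hypothetical, and it assumes a single LTS kernel whose module directory name ends in -lts.

```shell
#!/usr/bin/env bash
# Sketch: print the first kernel version directory ending in -lts under
# <root>/usr/lib/modules. On the live ISO, uname -r reports the ISO's
# kernel, not this one.
installed_kernel_version() {
    local root="$1" dir
    for dir in "$root"/usr/lib/modules/*-lts; do
        [[ -d "$dir" ]] || continue
        basename "$dir"
        return 0
    done
    return 1
}
```

The result could then be passed explicitly to depmod inside the chroot instead of letting it default to the running kernel.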
|
| |
|
|
|
|
|
|
|
|
|
| |
- Multi-disk RAID support: mirror, stripe, raidz1/2/3
- EFI partitions on all disks for boot redundancy
- SSH configuration prompt (default yes) with sshd enabled
- Stripe option for max capacity without redundancy
- Genesis snapshot with rollback-to-genesis script
- NetworkManager added to ISO for WiFi config
- Remove color codes for better terminal compatibility
- archsetup launcher via curl
|