path: root/scripts/test-install.sh
Commit message | Author | Age | Files | Lines
* fix(test): run the ZFS-encryption check on the booted system | Craig Jennings | 2026-05-22 | 1 | -12/+16
  The ZFS native-encryption assertion lived in verify_install, which runs in the live ISO before reboot. But archangel exports zroot at the end of the install, so verify_install bails at "ZFS pool not found" and never reaches the check. It was dead code: the encrypted-config tests passed on the reboot path (entering the passphrase at ZFSBootMenu and booting is itself proof), while the explicit aes-256-gcm assertion gave false confidence by never running.

  I moved it into verify_reboot_survival, which ssh's into the booted system where zroot is imported, so zfs get encryption zroot/ROOT actually returns aes-256-gcm and the assertion fires.

  Confirmed on a zfs-encrypt VM run: "ZFS encryption (aes-256-gcm) verified on running system."
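The moved assertion boils down to comparing one property value, which can be sketched as a pure check. This is a minimal illustration, not the harness's code: the helper name encryption_is_aes256gcm is hypothetical, and the call shape in the comment assumes ssh_cmd runs a command on the booted system.

```bash
# Hypothetical helper: succeeds only when the reported encryption property
# is aes-256-gcm. It takes the value as an argument so it needs no ZFS to test.
encryption_is_aes256gcm() {
    [ "$1" = "aes-256-gcm" ]
}

# Assumed call shape on the booted system (not executed here):
#   value=$(ssh_cmd "zfs get -H -o value encryption zroot/ROOT")
#   encryption_is_aes256gcm "$value" || echo "unexpected encryption: $value"
```

Keeping the comparison separate from the SSH call means the check can only pass when a value was actually read, which is the property the dead-code version lacked.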
* fix(build): clear stale archzfs from the pacoloco cache too | Craig Jennings | 2026-05-22 | 1 | -0/+17
  archzfs re-uploads its GitHub release assets under the same filename, so pacoloco keeps serving a zfs-dkms/zfs-utils it cached earlier while pacman fetches a fresh archzfs.db with a new checksum. The two mismatch and pacstrap aborts with "invalid or corrupted package."

  build.sh already drops the stale packages from the host pacman cache, but it never cleared the pacoloco layer, which the VM test installs route through too, so test-install.sh kept hitting the corruption (four times in one session). build.sh runs as root, so it now clears /var/cache/pacoloco/pkgs/archzfs/zfs-* alongside the host cache, which makes the build-then-test flow self-healing.

  The pacoloco cache is root-owned and test-install.sh runs as the user, so it can't clear it unattended. Instead, test-install.sh now recognizes the corruption (is_archzfs_cache_corruption) and prints how to clear it, the way it already names the SSH_PORT override on a port collision. A retry alone won't help since it hits the same cached file, so this fails fast with the hint rather than retrying.
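A recognizer of this kind can be sketched as a pure function over the install log text. The function name comes from the commit message, but the body below is an assumption: it treats the log as corrupt-cache evidence when it contains pacman's "invalid or corrupted package" phrase and also mentions zfs-dkms or zfs-utils. The hint helper and its wording are hypothetical.

```bash
# Sketch only: the real predicate may look at different lines or patterns.
is_archzfs_cache_corruption() {
    local log="$1"
    # Require pacman's corruption phrase somewhere in the log.
    case "$log" in
        *'invalid or corrupted package'*) ;;
        *) return 1 ;;
    esac
    # Require that an archzfs package is involved.
    case "$log" in
        *zfs-dkms*|*zfs-utils*) return 0 ;;
    esac
    return 1
}

# Hypothetical hint text; the cache path is the one named in the commit message.
print_cache_hint() {
    echo "Stale archzfs packages in the pacoloco cache. Clear them as root:"
    echo "  rm /var/cache/pacoloco/pkgs/archzfs/zfs-*"
}
```

Because the function takes a string rather than reading a file or the network, it can be covered by bats alongside the other pure helpers.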
* fix(test): fail clearly when the VM forward port is taken | Craig Jennings | 2026-05-22 | 1 | -0/+27
  A test run launched qemu without first checking the SSH forward port, so a collision with another VM already holding it surfaced only as an opaque "Failed to start VM," with qemu unable to bind and no hint why. I added a port_in_use check in run_test before the launch: it errors with the port number and the SSH_PORT override to set, records the failure, and moves on.

  The check lives in run_test, not start_vm, because start_vm runs in a command substitution (vm_pid=$(start_vm ...)) where this harness's non-exiting error() would be captured as the PID instead of failing the run. The pure half, port_listening_in, takes an `ss -tln` snapshot as a string so it's unit-testable.
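The split between the pure half and the live probe can be sketched as follows. The function names are the ones in the commit message, but the bodies are assumptions: the parser expects the usual `ss -tln` column layout, where the fourth column is the local address and port.

```bash
# Pure half: decide from a captured `ss -tln` snapshot whether a port is listening.
port_listening_in() {
    local snapshot="$1" port="$2"
    printf '%s\n' "$snapshot" | awk -v port="$port" '
        # Column 4 is "Local Address:Port"; anchor on ":" so 222 does not match 2222.
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Impure half (assumed shape): take the snapshot, then delegate.
port_in_use() {
    port_listening_in "$(ss -tln 2>/dev/null)" "$1"
}
```

A caller in run_test would test port_in_use "$SSH_PORT" before launching qemu, outside any command substitution, so that an error message reaches the terminal instead of being captured as output.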
* test: make SSH_PORT overridable in test-install.sh | Craig Jennings | 2026-05-22 | 1 | -1/+3
  The port was hardcoded, so a test run collided with any other VM already forwarding 2222. It is now read from the environment and defaults to 2222, so existing invocations are unchanged. SSH_PORT=2223 scripts/test-install.sh selects a different port so the run can proceed alongside another VM.
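The override follows the standard shell default-expansion idiom. The accessor below is a hypothetical wrapper added only to make the behavior easy to check; the harness may simply assign the variable once near the top of the script.

```bash
# Hypothetical accessor: returns the environment's SSH_PORT, or 2222 when unset or empty.
ssh_forward_port() {
    printf '%s\n' "${SSH_PORT:-2222}"
}

# Equivalent one-line form an actual script would more likely use:
#   SSH_PORT="${SSH_PORT:-2222}"
```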
* feat(test): retry pacstrap through transient mirror flakes | Craig Jennings | 2026-05-20 | 1 | -26/+82
  test-install.sh aborts a whole 5-minute VM run when pacstrap hits a transient mirror blip, and the suite reports a failure indistinguishable from a real install regression. run_test now retries the install up to twice, but only when the in-VM log shows both pacstrap's "Failed to install packages to new root" marker and a download/network indicator. A deterministic failure like "target not found" carries the marker without a network indicator, so it still fails fast. archangel's failure trap exports the pool and unmounts on abort, so each retry re-partitions and re-pacstraps from a clean state.

  Wiring the predicate up needed a source-guard so bats can source the harness, which had none. With that in place I unit-covered the pure helpers (is_transient_install_failure, char_to_qemu_key, get_disk_count, get_disk_args) and lifted char_to_qemu_key out of monitor_sendkeys so the QEMU keymap is testable on its own.

  The keymap test found a dead branch. The backslash case pattern was '\\', which never matches a lone backslash because bash matches one against '\', so a passphrase containing a backslash would have sent an invalid QEMU keyname instead of "backslash". No test passphrase uses one, so it never bit. I fixed the pattern.
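Two of the points above can be sketched in a few lines: the two-condition retry predicate and the backslash case pattern. The predicate name is from the commit message, but its list of network indicators is assumed, and the two backslash functions are hypothetical stand-ins that isolate just the one case arm.

```bash
# Retry only when the pacstrap marker AND a network indicator both appear.
is_transient_install_failure() {
    local log="$1"
    case "$log" in
        *'Failed to install packages to new root'*) ;;
        *) return 1 ;;
    esac
    # Assumed indicators; the harness's real list may differ.
    case "$log" in
        *'failed retrieving file'*|*'Could not resolve host'*|*'Connection timed out'*) return 0 ;;
    esac
    return 1
}

# Old pattern: two quoted backslashes, so a single backslash never matches.
backslash_key_old() {
    case "$1" in
        '\\') echo backslash ;;
        *) echo unmapped ;;
    esac
}

# Fixed pattern: one quoted backslash.
backslash_key_fixed() {
    case "$1" in
        '\') echo backslash ;;
        *) echo unmapped ;;
    esac
}
```

Inside single quotes a backslash is literal, so '\\' is a two-character pattern; that is why the branch was unreachable for one-character input.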
* feat(build): route VM-internal pacstrap through host pacoloco | Craig Jennings | 2026-05-19 | 1 | -0/+21
  The build-host pacoloco routing from e2eb958 only covered mkarchiso's pacstrap. VMs spawned by scripts/test-install.sh ran their own pacstrap inside the guest, fetching ~600 packages per config from upstream and re-hitting the same archzfs corruption that bites the build host. A full 12-config test-install run exposed 7200+ package downloads to upstream flake.

  I added a routing step to run_install() in test-install.sh, after the config file gets SCP'd to the VM and before archangel runs. It detects pacoloco on the host (port 9129, same probe as build.sh's) and rewrites the live system's /etc/pacman.conf over SSH. [core] and [extra] swap their Include lines for Server lines pointing at 10.0.2.2:9129/repo/archlinux/$repo/os/$arch. A preemptive [archzfs] block lands ahead of archangel's default insertion.

  10.0.2.2 is QEMU's SLIRP default gateway as seen from the guest, so the host's localhost:9129 maps to that address inside the VM. Pacoloco binds 0.0.0.0:9129, reachable from there without firewall changes.

  The preemptive block matters because archangel's install_base checks for an existing [archzfs] block in /etc/pacman.conf and skips its own insertion when one is already there. Writing the pacoloco-routed [archzfs] up front means archangel keeps the routed version. The installed system's $MNTPOINT/etc/pacman.conf isn't touched: it gets upstream URLs like before, since the installed system shouldn't depend on the test host's proxy.

  The status message uses a plain echo rather than test-install.sh's info() function. run_install() runs inside a bash -c subshell at line 864 that only exports ssh_cmd and run_install via declare -f. A bare info call there resolves to /usr/bin/info (the GNU info reader) and prints a confusing "No menu item" error. An inline comment in the code records the pitfall.

  Verified end-to-end with scripts/test-install.sh single-disk: pacoloco's cache grew from 77MB (post-build) to 953MB (post-VM-install), the VM's pacstrap completed cleanly, and the install verified. Bats: still 181.
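The [core]/[extra] rewrite described above can be sketched as a stream filter. This is one possible approach using awk, not the commit's actual implementation; the function name is hypothetical, and the http:// scheme in the usage comment is an assumption since the commit message gives only host, port, and path.

```bash
# Reads a pacman.conf on stdin and writes the routed version to stdout.
# Only Include lines inside [core] and [extra] are replaced; every other
# section and line passes through unchanged.
route_core_extra_through_proxy() {
    local proxy="$1"
    awk -v proxy="$proxy" '
        /^\[/ { section = $0 }      # remember the current [section] header
        (section == "[core]" || section == "[extra]") && /^Include[[:space:]]*=/ {
            # $repo and $arch stay literal here; pacman expands them itself.
            print "Server = " proxy "/repo/archlinux/$repo/os/$arch"
            next
        }
        { print }
    '
}

# Assumed usage against a copy of the live system's config (not executed here):
#   route_core_extra_through_proxy "http://10.0.2.2:9129" < pacman.conf > pacman.conf.routed
```

Limiting the substitution to the two named sections leaves any other repository blocks, including a separately written [archzfs] block, untouched.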
* test(install): exercise zfssnapshot wrapper in VM verification | Craig Jennings | 2026-05-14 | 1 | -0/+154
  The wrapper had no runtime coverage: bats tests pin pure helpers and arg parsing only, and verify_rollback bypassed it by calling zfs snapshot / zfs rollback directly via SSH. A regression in cmd_create, cmd_rollback, or cmd_delete would only have surfaced in production.

  verify_zfssnapshot_wrapper runs after verify_rollback for ZFS configs (no-op for Btrfs) and exercises:
  - list confirms @genesis baseline
  - create runtime-test: recursive snapshot across all datasets
  - echo no | delete --name: confirms the gate aborts (catches the -n vs = regression class)
  - echo yes | delete --name: destroys across all datasets, list confirms gone
  - create wrapper-rollback + drop sentinel + rollback --name: round-trip restores the sentinel

  The function scps the working-tree wrapper to the VM before testing so the run reflects current source rather than what the ISO froze at build time. A regression here fails the test (no warn-only path); it's the wrapper's only runtime check.
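The "echo no" and "echo yes" steps work because a confirmation gate reads its answer from standard input. The sketch below is illustrative only: both function names are hypothetical, and it assumes that the "-n vs =" regression class refers to testing the answer for non-emptiness instead of for equality with "yes", which the commit message does not spell out.

```bash
# Assumed buggy form: any non-empty answer, including "no", passes the gate.
gate_by_nonempty() {
    local answer=""
    IFS= read -r answer || true
    [ -n "$answer" ]
}

# Strict form: only the literal answer "yes" passes.
gate_by_equality() {
    local answer=""
    IFS= read -r answer || true
    [ "$answer" = "yes" ]
}
```

Piping "no" into the gate is what separates the two forms: the strict gate refuses, while the non-empty test would let the destructive step proceed.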
* fix: verify_rollback sentinel must live on the rolled-back dataset | Craig Jennings | 2026-04-21 | 1 | -5/+11
  /root is mounted on a separate dataset (zroot/home/root, created by archangel:create_datasets), but verify_rollback was snapshotting zroot/ROOT/default. The rollback was a no-op for the sentinel file, so the post-rollback existence check failed. The visible symptom was a PASSED test with a soft-failure warning ("Rollback failed - test file not restored" → "Rollback verification had issues") that persisted across ZFS configs for weeks.

  Move the sentinel to /etc/archangel-rollback-test. /etc has no child dataset mounted there, so the file lives on zroot/ROOT/default, the dataset actually being snapshotted and rolled back.

  Defensively single-quote $test_file at the five ssh_cmd call-sites so future path changes (whitespace, special chars) stay correct without touching each call again.

  The 2026-04-21 VM run logged "Rollback verified - test file restored" on zfs-mirror-encrypt, confirming the fix.
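The reason for single-quoting at the call-sites is that a command sent over SSH is parsed a second time by the remote shell. The sketch below imitates that with a local stand-in; fake_ssh_cmd and the two counting helpers are hypothetical and exist only for the demonstration.

```bash
# Stand-in for ssh_cmd: like ssh, it passes one command string to a shell,
# which splits it into words again.
fake_ssh_cmd() {
    sh -c "$*"
}

# Unquoted: the remote shell sees the path as however many words it contains.
remote_argc_unquoted() {
    local test_file="$1"
    fake_ssh_cmd "set -- $test_file; echo \$#"
}

# Single-quoted: the remote shell sees exactly one word.
remote_argc_quoted() {
    local test_file="$1"
    fake_ssh_cmd "set -- '$test_file'; echo \$#"
}
```

Single quotes cover whitespace and most special characters, though a path that itself contains a single quote would still need separate escaping.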
* fix: bump INSTALL_TIMEOUT from 600 to 1800 for kernel 6.18+ DKMS builds | Craig Jennings | 2026-04-13 | 1 | -1/+4
  ZFS DKMS compile + depmod against kernel 6.18.22 in a 4-CPU VM under host load exceeds 10 minutes. With INSTALL_TIMEOUT=600, all 6 ZFS test configs timed out during the DKMS install step after pacstrap. The one ZFS config that passed ('custom-locale', first ZFS config alphabetically) squeaked in just under the deadline.

  Bumped to 1800s (30 min). Session notes from 2026-02-12 mention this bump but the change never made it into git.
* feat: add ZFS encrypted volume tests (single disk + mirror) | Craig Jennings | 2026-02-24 | 1 | -10/+77
  Add automated tests for ZFS native encryption, matching existing Btrfs LUKS test coverage. ZFS encrypted boot requires two passphrase entries (ZFSBootMenu + mkinitcpio zfs hook), both sent via QEMU monitor sendkey with timed delays since ZFSBootMenu renders to VGA, not serial.
* refactor: rename custom/ to installer/ for clarity | Craig Jennings | 2026-02-23 | 1 | -2/+2
  The custom/ directory name was an archiso implementation detail. Renamed to installer/ which clearly communicates that this directory contains the installer scripts and utilities that ship on the ISO. Updated all references in build.sh, Makefile, test-install.sh, and README.
* feat: add LUKS passphrase automation to VM test framework | Craig Jennings | 2026-02-23 | 1 | -1/+117
  - Add monitor_sendkeys() to type strings into QEMU via monitor socket
  - Add send_luks_passphrase() that detects GRUB passphrase prompt in serial log and sends passphrase via sendkey, supporting multi-disk LUKS (one passphrase per encrypted disk)
  - Add QEMU monitor socket to start_vm_from_disk() for LUKS configs
  - Auto-detect LUKS configs and handle passphrase entry during reboot test
  - Add socat dependency check
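Typing a string through the QEMU monitor amounts to translating each character into a sendkey command. The sketch below generates those commands without touching a socket, so it can be inspected or tested offline. Both function names are hypothetical, the key map covers only letters, digits, space, and hyphen, and the socat invocation in the comment is an assumed way to deliver the commands.

```bash
# Map one character to a QEMU key name; fails for characters outside this small map.
char_to_key() {
    case "$1" in
        [[:lower:]]|[[:digit:]]) printf '%s\n' "$1" ;;
        [[:upper:]]) printf 'shift-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" ;;
        ' ') echo spc ;;
        '-') echo minus ;;
        *) return 1 ;;
    esac
}

# Print one "sendkey" monitor command per character, then Enter.
sendkey_commands() {
    local text="$1" i ch key
    for ((i = 0; i < ${#text}; i++)); do
        ch="${text:i:1}"
        key=$(char_to_key "$ch") || { echo "unmapped character: $ch" >&2; return 1; }
        printf 'sendkey %s\n' "$key"
    done
    printf 'sendkey ret\n'
}

# Assumed delivery loop (not executed here); $sock is the monitor socket path:
#   sendkey_commands "$passphrase" | while IFS= read -r cmd; do
#       printf '%s\n' "$cmd" | socat - "UNIX-CONNECT:$sock"
#       sleep 0.1
#   done
```

Failing on an unmapped character, rather than sending a guessed key name, keeps a bad passphrase from turning into a silent boot hang.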
* fix: resolve remaining SC2155 warnings across all scripts | Craig Jennings | 2026-02-23 | 1 | -1/+2
  Declare and assign local variables separately in custom/archangel, scripts/full-test.sh, scripts/test-install.sh, and remove unused variable in custom/lib/zfs.sh.
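SC2155 exists because combining declaration and assignment hides the exit status of the command substitution. A minimal demonstration, with hypothetical function names:

```bash
# Combined form: $? reflects `local`, which succeeds, not the failed command.
status_when_combined() {
    local out=$(false)
    echo "$?"
}

# Separate form: the assignment's status is that of the command substitution.
status_when_separate() {
    local out status=0
    out=$(false) || status=$?
    echo "$status"
}
```

With the separate form, a failing command can be detected and handled, which also matters once set -e is in effect.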
* chore: make OVMF firmware paths configurable via environment | Craig Jennings | 2026-02-23 | 1 | -3/+3
  Allow OVMF_CODE and OVMF_VARS_ORIG to be overridden via environment variables for portability across distros (Fedora, Ubuntu, etc. use different paths for UEFI firmware).
* chore: add set -euo pipefail to scripts for safety | Craig Jennings | 2026-02-23 | 1 | -1/+1
  Enable undefined variable checking (set -u) and pipefail across standalone scripts. Guard SUDO_USER references with ${SUDO_USER:-} for set -u compatibility.
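The guard matters because SUDO_USER is only set when a script runs under sudo, and set -u turns a bare reference to an unset variable into a fatal error. A small sketch, with a hypothetical helper name and an assumed fallback to the current user:

```bash
# Returns the invoking user: SUDO_USER when present, else the current user.
# "${SUDO_USER:-}" expands to an empty string instead of aborting under set -u.
invoking_user() {
    local sudo_user="${SUDO_USER:-}"
    if [ -n "$sudo_user" ]; then
        printf '%s\n' "$sudo_user"
    else
        id -un
    fi
}
```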
* chore: standardize shebangs, fix lint target, add .editorconfig | Craig Jennings | 2026-02-23 | 1 | -1/+1
  - Change all script shebangs to #!/usr/bin/env bash for portability (heredocs writing to installed systems keep #!/bin/bash)
  - Remove || true from Makefile lint target so shellcheck errors fail the build
  - Add .editorconfig for consistent formatting across editors
* fix: support no-ssh test by adding console boot verification | Craig Jennings | 2026-02-23 | 1 | -30/+73
  The no-ssh test failed because reboot verification unconditionally used wait_for_ssh, which timed out on systems without SSH. Add wait_for_boot_console() that checks serial log for ZFSBootMenu boot markers, and branch run_test() on ENABLE_SSH to use the appropriate verification path.
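Console-based verification is essentially polling a log file for a marker until a deadline. The sketch below shows that loop in generic form; the function name, the one-second poll interval, and the pattern argument are assumptions, and the actual boot markers the harness looks for are not reproduced here.

```bash
# Poll a serial log until an extended-regex pattern appears or the timeout
# (in seconds) runs out. Returns 0 on match, 1 on timeout.
wait_for_marker() {
    local log_file="$1" pattern="$2" timeout="$3" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -f "$log_file" ] && grep -Eq -- "$pattern" "$log_file"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Assumed branching in the caller (not executed here):
#   if [ "$ENABLE_SSH" = "true" ]; then wait_for_ssh ...; else wait_for_marker "$serial_log" "$marker" 120; fi
```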
* refactor: rename archzfs to archangel, simplify build-release | Craig Jennings | 2026-01-31 | 1 | -4/+4
  - Standardize naming: VM names, hostname, passwords, ISO naming
  - Remove USB, Ventoy, and local deployment from build-release
  - Add snapper package and Btrfs validation tests to sanity-test
  - Update README for dual ZFS/Btrfs architecture
  - Delete obsolete SESSION-CONTEXT.md and download-archzfs-iso.sh
* Add reboot survival and rollback verification tests | Craig Jennings | 2026-01-25 | 1 | -6/+223
  - Add start_vm_from_disk() to boot installed system without ISO
  - Add stop_vm keep_vars parameter to preserve EFI boot entries
  - Add verify_reboot_survival() to check system boots from disk
  - Add verify_rollback() to test snapshot/rollback functionality
  - Support different SSH passwords for live ISO vs installed system
  - Integrate reboot/rollback checks into test flow
* Fix btrfs bugs from VM testing | Craig Jennings | 2026-01-23 | 1 | -24/+42
  - Fix GRUB config path (remove GRUB_BTRFS_GRUB_DIRNAME, use default)
  - Create snapper config manually (D-Bus not available in chroot)
  - Create genesis snapshot with btrfs command (not snapper)
  - Add btrfs-single.conf test config
  - Update test-install.sh to copy lib/ directory
  - Update test-install.sh to handle btrfs verification

  VM test now passes for btrfs single-disk installation.
* Add Avahi mDNS validation to test scripts | Craig Jennings | 2026-01-20 | 1 | -0/+14
  sanity-test.sh (live ISO):
  - Check avahi-daemon is enabled
  - Check avahi-daemon is running

  test-install.sh (installed system):
  - Check avahi and nss-mdns packages installed
  - Check avahi-daemon service enabled
* Include timestamp in install-archzfs log filename | Craig Jennings | 2026-01-18 | 1 | -2/+2
* Make ZFS encryption optional with interactive prompt | Craig Jennings | 2026-01-18 | 1 | -2/+2
  Add get_encryption_choice() to ask user whether to enable encryption during interactive install. Remove --no-encrypt CLI flag in favor of config file NO_ENCRYPT option for unattended installs. Update tests to rely on config file setting instead of flag.

  Also: fix ISO label to ARCHZFS for stable GRUB entries, add TODO items.
* Add CI/CD test infrastructure | Craig Jennings | 2026-01-18 | 1 | -0/+459
  - Add Makefile with targets: all, test, test-unit, test-install, build, release, clean, lint
  - Add test-install.sh for automated VM installation testing
  - Add test configs: single-disk, mirror, raidz1, no-ssh, custom-locale
  - Add test-logs/ to .gitignore
  - Uses sshpass for SSH authentication to live ISO
  - Copies latest install-archzfs to VM before testing (allows testing without rebuild)
  - Supports --list to show available configs