aboutsummaryrefslogtreecommitdiff
path: root/scripts/testing/lib/vm-utils.sh
Commit message (Collapse)AuthorAgeFilesLines
* fix(test): bump default VM RAM to 8 GiB to stop AUR-build OOM killsCraig Jennings17 hours1-1/+3
| | | | | | | | The zfs green run OOM-killed cc1plus three times during AUR C++ builds: makepkg runs -j$VM_CPUS (4), and parallel compiles at ~700 MB each overran the 4 GiB default. The install still passed (yay retries), but the kills showed up as attributed issues. 8 GiB gives the four jobs headroom. Overridable via VM_RAM as before.
* fix(test): give each filesystem profile its own OVMF NVRAM fileCraig Jennings22 hours1-1/+5
| | | | | | | | | | | | | init_vm_paths suffixed the disk image per profile but shared one OVMF_VARS.fd across btrfs and zfs. NVRAM holds the UEFI boot entries and lives outside the qcow2, so a disk-snapshot revert can't restore it. A zfs run's ZFSBootMenu entries clobbered the btrfs GRUB entry, and with no removable ESP fallback the btrfs base then booted to "no bootable device" and timed out before archsetup ran. NVRAM now carries the same per-profile suffix as the disk image, so the two profiles keep separate boot state. Validated by a full green zfs run (ArchSetup exit 0, Testinfra 96 passed / 0 failed).
* test(archsetup): migrate bare-metal runner to key auth + TestinfraCraig Jennings3 days1-1/+3
| | | | | | | | | | run-test-baremetal.sh SSHed to the target as root by password throughout, which archsetup's sshd hardening (PermitRootLogin prohibit-password) kills mid-install, the same break the VM runner already fixed. It also still called the validation.sh shell sweep (run_all_validations, validate_all_services, validate_zfs_services), the last caller keeping those functions alive. It now mirrors the VM runner. After the first SSH, and after any genesis rollback so the key survives it, inject_root_key authorizes a throwaway root key, and every later ssh_cmd plus the raw scp transfers and log-copies thread SSH_KEY_OPT to survive the hardening. The shell sweep is replaced with run_testinfra_validation, now the authoritative validator on both runners. A --port option, threaded through every SSH and scp, lets the runner target a test VM on 2222 instead of only real hardware on 22. inject_root_key now authorizes root@$VM_IP instead of root@localhost, so one helper serves both runners (the VM runner sets VM_IP=localhost). Validated against the ZFS VM (--validate-only, localhost:2222): connectivity, the ZFS check, key authorization, and the Testinfra sweep all connect and run over the key-based ssh-config. A green bare-metal install still needs real ZFS hardware.
* test(archsetup): add FS_PROFILE selector for ZFS VM coverageCraig Jennings3 days1-1/+16
| | | | | | | | | | The VM harness only built one btrfs base image, so every ZFS-conditional check in the Testinfra suite skipped and the ZFS install path went untested in automation. I added an FS_PROFILE selector (btrfs default, zfs) so `make test FS_PROFILE=zfs` can target a ZFS root. init_vm_paths derives the image name from FS_PROFILE and validates it. btrfs keeps the legacy unsuffixed archsetup-base.qcow2 so existing images and invocations are untouched. The zfs profile gets archsetup-base-zfs.qcow2. create-base-vm.sh picks archsetup-test.conf vs the new archsetup-test-zfs.conf (FILESYSTEM=zfs, NO_ENCRYPT=yes for an unattended install), and the Makefile resolves the matching image for its base-VM check. The archsetup run config stays shared. archsetup reads no filesystem key. It detects ZFS from the live root via is_zfs_root, so the ZFS branch fires on its own once the base image is ZFS. The design doc is reconciled to that: no separate archsetup-vm-zfs.conf, and the non-ZFS profile is btrfs, not ext4. Building the ZFS base image and running the ZFS sweep green is next.
* fix(testing): authorize a root key so make test survives sshd hardeningCraig Jennings4 days1-4/+35
| | | | | | The VM test SSHes into the guest as root with a password for the whole run. archsetup hardens sshd to PermitRootLogin prohibit-password and reloads it partway through the install, so every SSH after that step failed with "Permission denied" and the run aborted before any validation — make test had been silently broken since the hardening landed. inject_root_key authorizes a throwaway root key right after the first SSH (before archsetup runs) and the ssh/scp helpers now add -i <key> via SSH_KEY_OPT. prohibit-password still allows root key auth, so the harness survives the very hardening it validates. Password stays as the fallback, so the change is additive.
* chore: open-source release-prep (udev flag, SPDX headers, boolean style)Craig Jennings4 days1-0/+1
| | | | | | | | Three release cleanups, all behavior-preserving for my machines: - Gated the Logitech BRIO udev rule behind INSTALL_DEVICE_UDEV_RULES (default yes, opt-out), so the device-specific rule is off for anyone without that hardware. Added the config read, validation, and a conf.example entry. - Added a GPL-3.0-or-later SPDX-License-Identifier header after the shebang of all 24 shell scripts in the repo. - Standardized boolean conditionals on the explicit [ "$var" = "true" ] form, replacing the bare `if $var` idiom. The STEPS function-dispatch is left alone, since it runs a function name rather than testing a boolean.
* fix(testing): cleanup traps, arg validation, and two real bugsCraig Jennings2026-05-171-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two real bugs and a sweep of hygiene across the harness. `make test` passed cleanly on this branch with the same 52/0/5 profile as the 2026-05-11 run, so the wiring is verified end-to-end. Real bugs: - `lib/vm-utils.sh` `snapshot_exists` was running `qemu-img snapshot -l | grep -q "$snapshot_name"`, which matches the name as a substring anywhere in the output — including inside dates or filenames in other fields. Replaced with an awk field extraction on the TAG column plus `grep -Fxq` for a whole-line literal match. - `run-test-baremetal.sh` was setting `VALIDATION_PASSED=true|false` after validation, but `validation.sh` already uses `VALIDATION_PASSED` as a pass counter. The test report then referenced `$VALIDATION_PASSED_COUNT`, which is defined nowhere. Renamed the boolean to `TEST_PASSED` (matching run-test.sh's pattern) and report the actual counter. Cleanup traps and arg validation: - `run-test.sh` now installs a top-level EXIT trap that, on abort, kills QEMU and restores the clean-install snapshot. A `CLEANUP_DONE=1` sentinel keeps the existing normal-path cleanup from double-firing. This is the recurring pain from 2026-05-11 where two failed runs left orphaned QEMU processes and dirty base disks behind. - `create-base-vm.sh` and `debug-vm.sh` got the same kind of trap, plus `debug-vm.sh` now rejects non-`.qcow2` paths up front instead of letting QEMU fail later. - `run-test.sh`, `run-test-baremetal.sh`, and `cleanup-tests.sh` now validate that options with required values actually receive one (`${var:?msg}` for `--script`/`--snapshot`/`--host`/`--password`, numeric check for `--keep`). - `run-test-baremetal.sh` traps the temp git bundle for cleanup if the script aborts before its explicit `rm`. The ZFS rollback loop now uses `while IFS= read -r ds` and quotes `$ds` inside the ssh_cmd so dataset names with whitespace wouldn't break it. Smaller hygiene: - `vm-utils.sh` `check_ovmf` also checks `OVMF_VARS_TEMPLATE`; `start_qemu` validates disk and ISO paths before building the QEMU command; numeric tests quoted. - `cleanup-tests.sh` find expression for temp disks wrapped in `\( ... -o ... \)`, all `while read` loops use `IFS= read -r`, orphaned QEMU cleanup tries SIGTERM with a 2s sleep before SIGKILL. - `create-base-vm.sh` moved the "Copy an archangel-*.iso" info line before its `fatal` instead of after (unreachable), and added the serial-log path to the final summary. - `lib/logging.sh` `stop_timer` no longer produces `$((end - ))` when the named timer was never started. - `lib/network-diagnostics.sh` `read` → `IFS= read -r`. - `setup-testing-env.sh` now installs all missing pacman packages in one transaction instead of one-at-a-time (avoids half-installed state if package N fails). KVM check also verifies the user has read+write on `/dev/kvm` and prints the `gpasswd -a $(id -un) kvm` fix if not. A few items from the review I deliberately skipped: replacing the codebase-wide unquoted `$SSH_OPTS` string with an array (cosmetic, would need to be done everywhere at once), `set -e` adds where the existing fall-through-on-failure is intentional, and a `--force` gate on `create-base-vm.sh` (would break the expected workflow).
* feat(testing): rewrite test infrastructure from libvirt to direct QEMUCraig Jennings2026-01-271-202/+269
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the never-fully-operational libvirt-based VM test infrastructure with direct QEMU management and archangel ISO for fully automated, unattended base VM creation. Key changes: - vm-utils.sh: complete rewrite — QEMU process mgmt via PID file, monitor socket for graceful shutdown, qemu-img snapshots, SSH port forwarding (localhost:2222) - create-base-vm.sh: boots archangel ISO, SSHs in, runs unattended install via config file, verifies, creates clean-install snapshot - run-test.sh: snapshot revert, git bundle transfer, detached archsetup execution with setsid, polling, validation, and report generation - debug-vm.sh: CoW overlay disk, GTK display, auto-cleanup on close - setup-testing-env.sh: reduced deps to qemu-full/sshpass/edk2-ovmf/socat - cleanup-tests.sh: PID-based process management, orphan detection - validation.sh: port-based SSH (backward compatible), fuzzel/foot for Hyprland, corrected package list paths - network-diagnostics.sh: getent/curl instead of nslookup/ping (SLIRP) New files: - archsetup-test.conf: archangel config for base VM (btrfs, no encrypt) - archsetup-vm.conf: archsetup config for unattended test execution - assets/archangel.conf.example: reference archangel config Deleted: - finalize-base-vm.sh: merged into create-base-vm.sh - archinstall-config.json: replaced by archangel .conf format Tested: full end-to-end run — 51 validations passed, 0 failures. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(testing): remove obsolete --skip-slow-packages optionCraig Jennings2026-01-241-0/+321
This flag was removed from archsetup but remained in test scripts.