From fe84b71ce4f0476d485257410e90f3621d6c5dbe Mon Sep 17 00:00:00 2001 From: Craig Jennings Date: Thu, 25 Jun 2026 08:35:28 -0400 Subject: fix(testing): raise the install monitor timeout to 150 minutes A full archsetup install with heavy AUR builds (vagrant and its git-cloned installers) can run past the old 90-minute monitor cap on a slow mirror. When that happened the run stopped monitoring mid-install and validated a half-installed system, producing spurious late-step failures. Raise MAX_POLLS from 180 to 300 (90 -> 150 minutes) so a slow-but-healthy install completes. --- scripts/testing/run-test.sh | 5 +++-- todo.org | 2 ++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/scripts/testing/run-test.sh b/scripts/testing/run-test.sh index 6e51fc2..a22ae98 100755 --- a/scripts/testing/run-test.sh +++ b/scripts/testing/run-test.sh @@ -249,7 +249,8 @@ fi # Poll for completion step "Monitoring archsetup progress (polling every 30 seconds)..." POLL_COUNT=0 -MAX_POLLS=180 # 90 minutes max (180 * 30 seconds) +MAX_POLLS=300 # 150 minutes max (300 * 30 seconds); a full install with heavy + # AUR builds (e.g. vagrant) can exceed 90 min on a slow mirror while [ $POLL_COUNT -lt $MAX_POLLS ]; do # Check if archsetup process is still running @@ -270,7 +271,7 @@ while [ $POLL_COUNT -lt $MAX_POLLS ]; do done if [ $POLL_COUNT -ge $MAX_POLLS ]; then - error "ArchSetup timed out after 90 minutes" + error "ArchSetup timed out after 150 minutes" ARCHSETUP_EXIT_CODE=124 else # Get exit code from the remote log diff --git a/todo.org b/todo.org index ea3734e..9e836a8 100644 --- a/todo.org +++ b/todo.org @@ -551,6 +551,8 @@ VM run #1 aborted ~6 min in (Error 5), before any validation ran. Root cause (pr VM run #3 (=make test-keep=, kept VM up): pytest parity = 78 passed / 10 skipped / 0 fail / 0 err — matches & exceeds the shell sweep (53/0/0). Then built P4 expansion against the live VM (iterating in ~30s, no rebuild): test_hardening (sshd prohibit-password, sysctl printk, /etc/issue emptied, vconsole font, /efi fmask), test_config_applied (pacman ParallelDownloads/Color/multilib, makepkg MAKEFLAGS/OPTIONS, NM dns+wifi-privacy drop-ins, fail2ban jail, reflector), test_backups (=.archsetup.bak= present for pacman.conf/makepkg.conf/sudoers/mkinitcpio.conf — end-to-end proof of the backup feature). Full suite vs live VM: 95 passed / 10 skipped / 1 fail. The 1 fail = a REAL archsetup bug the tests caught: =ParallelDownloads= stayed at the Arch default 5 because the sed only matched a commented =#ParallelDownloads=, but current Arch ships it uncommented — fixed the sed to match both (=^#\?ParallelDownloads=). Also fixed a test bug (=grep -qx '[multilib]'= → =grep -Fxq=, the brackets were a regex char class). Remaining: P3 cutover (pytest authoritative) + P5 retire shell sweep, then a final fresh =make test=. *** 2026-06-25 Thu @ 03:38:28 -0400 P3 cutover: Testinfra is now the authoritative validator run-test.sh dropped the =run_all_validations= + =validate_all_services= shell-sweep calls; =run_testinfra_validation= now drives =TEST_PASSED= (returns pytest's rc; "couldn't run" = fail, not a silent pass). It surfaces pytest's pass/skip/fail counts through the shared =VALIDATION_*= counters and parses =testinfra-attribution.txt= into the issue arrays so =generate_issue_report= still buckets failures archsetup/base/unknown. Validated the failure path against the still-up VM: pytest rc=1, failure correctly bucketed to [archsetup]. P5 (physically delete the dead shell-sweep functions) is NOT done here — =run-test-baremetal.sh= still calls =run_all_validations=/=validate_all_services=, so deletion must wait until the bare-metal runner is migrated too (filed below). Final step: fresh =make test= to confirm the pass path (ParallelDownloads now 10) with pytest as the gate. +*** 2026-06-25 Thu @ 08:35:26 -0400 Final run hit the harness 90-min install cap (not a regression) +The fresh =make test= timed out at 9/12 steps while building =vagrant= from AUR (=ARCHSETUP timed out after 90 minutes=, exit 124), so validation ran against a half-installed system → 10 pytest failures, all late-step (issue/sysctl/vconsole/mkinitcpio/docker/state-markers). The suite worked correctly — it caught an incomplete install. Verified my ParallelDownloads sed is clean (no pacman corruption) and archsetup logged 0 errors. Root cause: =MAX_POLLS=180= (90 min) is too tight for a full install with heavy AUR builds; bumped to 300 (150 min). Re-running. Create comprehensive integration tests using Testinfra (Python + pytest) to validate archsetup installations Tests should cover: -- cgit v1.2.3