From fe84b71ce4f0476d485257410e90f3621d6c5dbe Mon Sep 17 00:00:00 2001
From: Craig Jennings <c@cjennings.net>
Date: Thu, 25 Jun 2026 08:35:28 -0400
Subject: fix(testing): raise the install monitor timeout to 150 minutes

A full archsetup install with heavy AUR builds (vagrant and its git-cloned
installers) can run past the old 90-minute monitor cap on a slow mirror. When
that happened the run stopped monitoring mid-install and validated a
half-installed system, producing spurious late-step failures. Raise MAX_POLLS
from 180 to 300 (90 -> 150 minutes) so a slow-but-healthy install completes.
---
 scripts/testing/run-test.sh | 5 +++--
 todo.org                    | 2 ++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/scripts/testing/run-test.sh b/scripts/testing/run-test.sh
index 6e51fc2..a22ae98 100755
--- a/scripts/testing/run-test.sh
+++ b/scripts/testing/run-test.sh
@@ -249,7 +249,8 @@ fi
 # Poll for completion
 step "Monitoring archsetup progress (polling every 30 seconds)..."
 POLL_COUNT=0
-MAX_POLLS=180  # 90 minutes max (180 * 30 seconds)
+MAX_POLLS=300  # 150 minutes max (300 * 30 seconds); a full install with heavy
+               # AUR builds (e.g. vagrant) can exceed 90 min on a slow mirror
 
 while [ $POLL_COUNT -lt $MAX_POLLS ]; do
     # Check if archsetup process is still running
@@ -270,7 +271,7 @@ while [ $POLL_COUNT -lt $MAX_POLLS ]; do
 done
 
 if [ $POLL_COUNT -ge $MAX_POLLS ]; then
-    error "ArchSetup timed out after 90 minutes"
+    error "ArchSetup timed out after 150 minutes"
     ARCHSETUP_EXIT_CODE=124
 else
     # Get exit code from the remote log
diff --git a/todo.org b/todo.org
index ea3734e..9e836a8 100644
--- a/todo.org
+++ b/todo.org
@@ -551,6 +551,8 @@ VM run #1 aborted ~6 min in (Error 5), before any validation ran. Root cause (pr
 VM run #3 (=make test-keep=, kept VM up): pytest parity = 78 passed / 10 skipped / 0 fail / 0 err — matches & exceeds the shell sweep (53/0/0). Then built P4 expansion against the live VM (iterating in ~30s, no rebuild): test_hardening (sshd prohibit-password, sysctl printk, /etc/issue emptied, vconsole font, /efi fmask), test_config_applied (pacman ParallelDownloads/Color/multilib, makepkg MAKEFLAGS/OPTIONS, NM dns+wifi-privacy drop-ins, fail2ban jail, reflector), test_backups (=.archsetup.bak= present for pacman.conf/makepkg.conf/sudoers/mkinitcpio.conf — end-to-end proof of the backup feature). Full suite vs live VM: 95 passed / 10 skipped / 1 fail. The 1 fail = a REAL archsetup bug the tests caught: =ParallelDownloads= stayed at the Arch default 5 because the sed only matched a commented =#ParallelDownloads=, but current Arch ships it uncommented — fixed the sed to match both (=^#\?ParallelDownloads=). Also fixed a test bug (=grep -qx '[multilib]'= → =grep -Fxq=, the brackets were a regex char class). Remaining: P3 cutover (pytest authoritative) + P5 retire shell sweep, then a final fresh =make test=.
 *** 2026-06-25 Thu @ 03:38:28 -0400 P3 cutover: Testinfra is now the authoritative validator
 run-test.sh dropped the =run_all_validations= + =validate_all_services= shell-sweep calls; =run_testinfra_validation= now drives =TEST_PASSED= (returns pytest's rc; "couldn't run" = fail, not a silent pass). It surfaces pytest's pass/skip/fail counts through the shared =VALIDATION_*= counters and parses =testinfra-attribution.txt= into the issue arrays so =generate_issue_report= still buckets failures archsetup/base/unknown. Validated the failure path against the still-up VM: pytest rc=1, failure correctly bucketed to [archsetup]. P5 (physically delete the dead shell-sweep functions) is NOT done here — =run-test-baremetal.sh= still calls =run_all_validations=/=validate_all_services=, so deletion must wait until the bare-metal runner is migrated too (filed below). Final step: fresh =make test= to confirm the pass path (ParallelDownloads now 10) with pytest as the gate.
+*** 2026-06-25 Thu @ 08:35:26 -0400 Final run hit the harness 90-min install cap (not a regression)
+The fresh =make test= timed out at 9/12 steps while building =vagrant= from AUR (=ARCHSETUP timed out after 90 minutes=, exit 124), so validation ran against a half-installed system → 10 pytest failures, all late-step (issue/sysctl/vconsole/mkinitcpio/docker/state-markers). The suite worked correctly — it caught an incomplete install. Verified my ParallelDownloads sed is clean (no pacman corruption) and archsetup logged 0 errors. Root cause: =MAX_POLLS=180= (90 min) is too tight for a full install with heavy AUR builds; bumped to 300 (150 min). Re-running.
 Create comprehensive integration tests using Testinfra (Python + pytest) to validate archsetup installations
 
 Tests should cover:
-- 
cgit v1.2.3