aboutsummaryrefslogtreecommitdiff
path: root/todo.org
diff options
context:
space:
mode:
authorCraig Jennings <c@cjennings.net>2026-05-12 05:15:23 -0500
committerCraig Jennings <c@cjennings.net>2026-05-12 05:15:23 -0500
commite7c15344d3364f5e9bdfd0758e562ac52690e0f0 (patch)
tree1ded05ee7158c9110bd39b24f47e7caeed53b239 /todo.org
parent63cd9e3934b9ed783c4a84e645468b2f89150cd5 (diff)
downloadarchsetup-e7c15344d3364f5e9bdfd0758e562ac52690e0f0.tar.gz
archsetup-e7c15344d3364f5e9bdfd0758e562ac52690e0f0.zip
chore: add tasks for the 2026-05-11 VM-test validation warnings
I added a [#C] task with one child per validation warning from the 18:36 VM test, each with the check it comes from and a recommendation. Most are headless-VM or QEMU-slirp false positives the test harness should skip. The lingering and Docker ones have a small real angle: logind health in the VM, and "systemctl enable docker" vs "enable --now".
Diffstat (limited to 'todo.org')
-rw-r--r--todo.org25
1 files changed, 25 insertions, 0 deletions
diff --git a/todo.org b/todo.org
index 46f52d1..87738d6 100644
--- a/todo.org
+++ b/todo.org
@@ -220,6 +220,31 @@ Errors logged during the VM install. Status as of the 2026-05-11 18:36 run (=tes
- tidaler (AUR) — logged in the error summary with exit code 0 (odd; logging quirk or transient AUR build noise?).
Also seen in the 18:36 run's log-diff (post-install systemd noise, probably VM-environment): =pam_systemd … CreateSession failed= / =logind: Failed to start session scope … Permission denied=, and =Failed to start Proton VPN Daemon= (no VPN config in the test VM).
+** TODO [#C] Investigate the 2026-05-11 VM-test warnings
+The 18:36 =make test= run passed (52/0/5) but raised 5 validation warnings. Each is investigated below with a recommendation. Most look like headless-VM / QEMU-slirp false positives the test harness should skip rather than archsetup bugs — but a couple have a real archsetup angle worth checking. Source: =test-results/20260511-183643/test.log= (WARN lines) and =scripts/testing/lib/validation.sh=.
+
+*** TODO [#C] Warning: Hyprland socket not found (Hyprland may not be running)
+=validate_hyprland_socket()= (=validation.sh:495=) looks for the Hyprland IPC socket and warns when it's absent. In the test VM that's expected — it's headless, nobody logs in graphically, Hyprland never starts, so there's no socket. Not an archsetup bug.
+Recommendation: harness fix in =validate_hyprland_socket()= — detect "no graphical session" (no =$WAYLAND_DISPLAY=/=$DISPLAY=, no graphical user in =loginctl=) and emit =validation_skip=/=validation_info= instead of =validation_warn=. Spinning up a real Hyprland session in the VM just to validate the socket is not worth it.
+
+*** TODO [#C] Warning: Could not query Settings portal (portal may not be running)
+=validate_portal_dark_mode()= (=validation.sh:506=) queries =org.freedesktop.portal.Desktop= for =color-scheme= and warns when it can't (portal not running). Two causes: (1) headless VM → =xdg-desktop-portal= isn't running; (2) the GNOME interface settings (=color-scheme=prefer-dark= etc.) didn't get written during install because =dconf write= failed (exit 1 — see the =[#B]= item above), so even with a portal there'd be nothing to read.
+Recommendation: (a) harness — skip the portal check in headless mode, same as the Hyprland-socket one; (b) the real fix is the =dconf write= exit-1 failure tracked in =[#B] Fix install errors= — once those writes succeed during install, the settings persist and the portal returns them on a real login (this is the "without these, portal-gtk waits ~50s for a settings proxy timeout" comment in =archsetup=).
+
+*** TODO [#C] Warning: mDNS ping failed (avahi may need time to propagate)
+=validate_avahi()= (=validation.sh:577=) checks =systemctl is-enabled avahi-daemon= (passes), then pings =<hostname>.local= as a full-stack mDNS test (warns on failure). The VM uses QEMU user-mode networking (slirp, =10.0.2.x=), which doesn't pass multicast — so mDNS / =.local= resolution genuinely can't work there regardless of timing. The propagation-delay message is misleading.
+Recommendation: harness fix in =validate_avahi()= — when the VM is on slirp networking (detect the =10.0.2.x= address or the absence of multicast), skip the =.local= ping and keep only the =is-enabled= check; or downgrade the ping result to =validation_info=. Optionally also bump the ping timeout/retry for the bridged-networking case. Not an archsetup bug.
+
+*** TODO [#C] Warning: User lingering not enabled (syncthing may not autostart)
+=validation.sh:661= runs =loginctl show-user <user> -p Linger= and warns if it isn't =yes=. archsetup *does* call =loginctl enable-linger "$username"= (=archsetup:1438= and =:1741=), and the install log shows =...enabling user-services lingering for cjennings @ 18:40:28= ran with no error — yet the check still says lingering is off. Likely the =logind=-unhappy-in-the-VM issue (the log-diff shows =logind: Failed to start session scope … Permission denied=) — =loginctl enable-linger= may return 0 but not actually create =/var/lib/systemd/linger/<user>=, or the =show-user= query itself may be wrong while logind is degraded. Possibly a real concern *if* the same happens on bare metal; almost certainly a VM artifact otherwise.
+Recommendation: investigate with =make test-keep= — after a run, check =ls -l /var/lib/systemd/linger/cjennings= and =loginctl show-user cjennings -p Linger= on the VM. If the file exists but =loginctl= disagrees → logind/dbus health issue (cross-ref the logind errors in =[#B]=; the fix may be to ensure =systemd-logind= is healthy before =enable-linger=). If the file doesn't exist → =enable-linger= is silently no-op'ing in the VM; consider a fallback (=install -d /var/lib/systemd/linger && touch /var/lib/systemd/linger/$username=) or running it later in the install. Either way the =enable-linger= call in =archsetup= is wired correctly.
+
+*** TODO [#C] Warning: Docker enabled but not responding
+=validation.sh:791= does =systemctl is-enabled docker= (passes) then =docker info= (warns on failure). Root cause: =archsetup:1981= does =systemctl enable docker.service= — *enable*, not =enable --now= — so docker is enabled-on-boot but not started during install. The test runs validations immediately after archsetup with no reboot, so =docker info= fails because the daemon isn't running yet. That's expected, not a bug. (The =daemon.json= with =storage-driver: zfs= at =archsetup:1971-1979= is correctly gated on =is_zfs_root=, so on the btrfs test VM Docker auto-picks the =btrfs= driver — not the cause here.)
+Recommendation: harness fix in the docker check — treat "enabled, not running pre-reboot" as a pass/info, or =systemctl start docker= first and then run =docker info=. Separately, decide whether =archsetup= should =enable --now docker= for immediate use or keep enable-on-boot (most steps use =enable --now=; docker being the exception may be intentional given the daemon's weight — if so, leave it and just fix the validation).
+
+Note: the run also logged two log-diff meta-warnings — "Found 4 new error lines after archsetup" and "New failed services detected (before: 1, after: 2)". Those correspond to the post-install systemd noise (pam_systemd / logind / Proton VPN) already captured under =[#B] Fix install errors= above; not duplicated here.
+
** TODO [#A] Generate recovery scripts from test failures
Auto-create post-install fix scripts for failed packages - makes failures actionable