16 files changed, 953 insertions, 199 deletions
@@ -133,3 +133,7 @@ Full palette reference: `assets/color-themes/dupre/dupre-palette.org` - **VM tests run committed code, not your working tree.** `scripts/testing/run-test.sh` provisions the VM from `git bundle create <file> HEAD` (it simulates `git clone`), so an uncommitted edit to `archsetup` or the pytest suite silently runs the old code. Commit (even a throwaway WIP commit) before `make test FS_PROFILE=...`, or the change isn't exercised. (`gotcha` — 2026-06-25) - **Iterate the pytest sweep against a kept VM, not a reinstall.** `make test-keep FS_PROFILE=...` leaves the VM up after the install and writes `testinfra_ssh_config` + `root_key` into `test-results/<timestamp>/`. Point pytest at that ssh-config to re-run only the Testinfra checks in ~30s instead of a ~70-minute full reinstall. Use it when iterating test assertions, not installer logic. (`pattern` — 2026-06-25) +- **VM UEFI NVRAM lives outside the qcow2 and must be per-profile.** OVMF boot entries live in the `OVMF_VARS` file, not the disk image, so reverting the `clean-install` snapshot does NOT restore them. The base ESPs have no removable `\EFI\BOOT\BOOTX64.EFI` fallback, so a base boots only via its NVRAM entry — lose or overwrite it and the VM dies in UEFI ("No bootable option") and SSH-times-out before archsetup runs. `init_vm_paths` now suffixes `OVMF_VARS` per `FS_PROFILE` (matching the disk image); never share one NVRAM file across btrfs/zfs. (`gotcha` — 2026-06-28) +- **sed/awk function extraction breaks on column-0 `}` inside heredocs.** The `tests/` harness and any `/^name() {/,/^}/` extraction stop at the first line beginning with `}` — but a JSON heredoc body (e.g. the docker `daemon.json` in `developer_workstation`) has a column-0 `}` that is NOT the function's close. Find the real closing brace before slicing, or the bounds are silently wrong. 
(`gotcha` — 2026-06-28) +- **AUR builds need ≥8 GiB VM RAM.** `makepkg` runs `-j$VM_CPUS`, and parallel `cc1plus` (~700 MB each on heavy C++ AUR packages) OOM-killed under the old 4 GiB `VM_RAM` default; the install still passed (yay retries) but the kills showed as attributed issues. Default is now 8192 MB. If you raise `VM_CPUS`, raise `VM_RAM` with it. (`threshold` — 2026-06-28) +- **Guard live upgrades with a PreTransaction hook, not a wrapper.** `hypr-live-update-guard` is a pacman `PreTransaction` hook (`AbortOnFail` + `NeedsTargets`) so it fires no matter how the upgrade launches (pacman, yay, topgrade) and aborts before any package is swapped — the safe point, since nothing is replaced yet. A shell wrapper around `pacman` would be bypassed by the other front-ends. (`pattern` — 2026-06-28) @@ -595,6 +595,29 @@ display() { ### Installation Helpers +# Describe-run-warn primitive. Announces a task, runs the command with +# stdout+stderr appended to $logfile, and on failure logs a non-fatal +# warning carrying the command's real exit code. Replaces the recurring +# action="desc" && display "task" "$action" +# cmd >> "$logfile" 2>&1 || error_warn "$action" "$?" +# idiom with a single call: +# run_task "desc" cmd arg... +run_task() { + local desc="$1" + shift + display "task" "$desc" + "$@" >> "$logfile" 2>&1 || error_warn "$desc" "$?" +} + +# Enable one or more systemd units with the conventional wording. +# Each unit is announced and warned independently via run_task. +enable_service() { + local unit + for unit in "$@"; do + run_task "enabling $unit service" systemctl enable "$unit" + done +} + MAX_INSTALL_RETRIES=3 retry_install() { local pkg="$1" @@ -874,6 +897,13 @@ prerequisites() { display "title" "Prerequisites" + bootstrap_pacman_keyring + install_required_software + configure_build_environment + configure_package_mirrors +} + +bootstrap_pacman_keyring() { display "subtitle" "Bootstrapping" # If the base ships informant (e.g. 
an archangel-installed system), it @@ -912,6 +942,9 @@ prerequisites() { done $refresh_ok || error_fatal "$action" "$?" +} + +install_required_software() { display "subtitle" "Required Software" for software in linux-firmware wireless-regdb base-devel ca-certificates \ @@ -920,6 +953,9 @@ prerequisites() { pacman_install "$software" done +} + +configure_build_environment() { display "subtitle" "Environment Configuration" # configure locale (must happen before package installs that depend on locale) @@ -974,6 +1010,9 @@ prerequisites() { pacman -Sy >> "$logfile" 2>&1 +} + +configure_package_mirrors() { action="Package Mirrors" && display "subtitle" "$action" pacman_install reflector @@ -1045,7 +1084,7 @@ create_user() { mkdir -p "/home/$username/.cache/zsh/" # give $username sudo nopasswd rights (required for aur installs) - display "task" "granting permissions" + action="granting permissions" && display "task" "$action" backup_system_file /etc/sudoers (echo "%$username ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers) \ || error_warn "$action" "$?" @@ -1076,6 +1115,16 @@ create_user() { user_customizations() { action="User Customizations" && display "title" "$action" + clone_user_repos + stow_dotfiles + prune_waybar_battery + refresh_desktop_caches + configure_dconf_defaults + finalize_dotfiles + create_user_directories +} + +clone_user_repos() { # Clone archsetup to user's home directory so dotfile symlinks are accessible. # This ensures symlinks point to a user-readable location regardless of how # archsetup was invoked (curl|bash, from /root, etc.) @@ -1108,6 +1157,9 @@ user_customizations() { # root runs stow/restore against the user-owned clone; mark it safe. git config --global --add safe.directory "$dotfiles_dir" >> "$logfile" 2>&1 || true +} + +stow_dotfiles() { # Stow the universal layer plus the per-environment layer. Headless installs # (none) get the standalone minimal/ tree instead of common/. 
case "$desktop_env" in @@ -1138,6 +1190,9 @@ user_customizations() { ;; esac +} + +prune_waybar_battery() { # Remove battery module from waybar config on desktops with no battery # (hyprland only — waybar isn't part of the dwm or minimal trees). if [[ "$desktop_env" == "hyprland" ]] && ! ls /sys/class/power_supply/BAT* &>/dev/null; then @@ -1150,12 +1205,14 @@ user_customizations() { sed -i '/"battery": {/,/^ },$/d' "$waybar_config" fi +} + +refresh_desktop_caches() { # install fontconfig before refreshing cache (provides fc-cache) pacman_install fontconfig # Refresh font cache for any fonts in dotfiles - action="refreshing font cache" && display "task" "$action" - fc-cache -f >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "refreshing font cache" fc-cache -f # install desktop-file-utils before updating database (provides update-desktop-database) pacman_install desktop-file-utils @@ -1165,6 +1222,9 @@ user_customizations() { (sudo -u "$username" update-desktop-database "/home/$username/.local/share/applications" \ >> "$logfile" 2>&1 ) || true +} + +configure_dconf_defaults() { # GTK and GNOME desktop interface settings — read by GTK apps and # xdg-desktop-portal-gtk. Written as a system-wide dconf db rather than # per-user dbus-launch dconf writes: the system path needs no session @@ -1193,6 +1253,9 @@ EOF dconf update ) >> "$logfile" 2>&1 || error_warn "$action" "$?" +} + +finalize_dotfiles() { action="marking archsetup dir as safe.directory" && display "task" "$action" git config --global --add safe.directory "$user_archsetup_dir" >> "$logfile" 2>&1 \ || error_warn "$action" "$?" @@ -1202,9 +1265,11 @@ EOF # (e.g. the /etc/skel .bashrc/.bash_profile a fresh user starts with). Runs # for every desktop_env, including none — minimal/ ships those skel-colliding # files too, so its --adopt needs the same restore. - action="restoring dotfile versions" && display "task" "$action" - git -C "$dotfiles_dir" restore . 
>> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "restoring dotfile versions" git -C "$dotfiles_dir" restore . +} + +create_user_directories() { action="creating common directories" && display "task" "$action" # Create default directories and grant permissions { @@ -1254,22 +1319,36 @@ aur_installer() { ### Essential Services essential_services() { display "title" "Essential Services" + configure_randomness + configure_networking + configure_power + configure_ssh_server + configure_fail2ban + configure_firewall + configure_service_discovery + configure_job_scheduling + configure_package_cache + configure_snapshots + configure_user_lingering +} + +configure_randomness() { # Randomness display "subtitle" "Randomness" pacman_install rng-tools - action="enabling rngd service" && display "task" "$action" - systemctl enable rngd >> "$logfile" 2>&1 || error_warn "$action" "$?" - action="starting rngd service" && display "task" "$action" - systemctl start rngd >> "$logfile" 2>&1 || error_warn "$action" "$?" + enable_service rngd + run_task "starting rngd service" systemctl start rngd +} + +configure_networking() { # Networking display "subtitle" "Networking" pacman_install networkmanager - action="enabling NetworkManager" && display "task" "$action" - systemctl enable NetworkManager.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling NetworkManager" systemctl enable NetworkManager.service action="configuring MAC address randomization" && display "task" "$action" mkdir -p /etc/NetworkManager/conf.d @@ -1319,28 +1398,29 @@ EOF # Note: If Docker containers have DNS issues, systemd-resolved's stub resolver # (127.0.0.53) may be the cause. Fix: configure Docker to use direct DNS, or # disable systemd-resolved and use /etc/resolv.conf directly. (2026-01-18) - action="enabling systemd-resolved" && display "task" "$action" - systemctl enable systemd-resolved >> "$logfile" 2>&1 || error_warn "$action" "$?" 
+ run_task "enabling systemd-resolved" systemctl enable systemd-resolved # Create resolv.conf symlink to systemd-resolved - action="linking resolv.conf to systemd-resolved" && display "task" "$action" - ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "linking resolv.conf to systemd-resolved" ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf +} + +configure_power() { # Power display "subtitle" "Power" pacman_install upower - action="enabling upower service" && display "task" "$action" - systemctl enable upower >> "$logfile" 2>&1 || error_warn "$action" "$?" + enable_service upower +} + +configure_ssh_server() { # Secure Shell display "subtitle" "Secure Shell" pacman_install openssh - action="enabling the openssh service to run at boot" && display "task" "$action" - systemctl enable sshd >> "$logfile" 2>&1 || error_warn "$action" "$?" - action="starting the openssh service" && display "task" "$action" - systemctl start sshd >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling the openssh service to run at boot" systemctl enable sshd + run_task "starting the openssh service" systemctl start sshd action="hardening sshd (root login by key only)" && display "task" "$action" cat << 'EOF' > /etc/ssh/sshd_config.d/10-hardening.conf @@ -1349,6 +1429,9 @@ EOF PermitRootLogin prohibit-password EOF systemctl reload sshd >> "$logfile" 2>&1 || error_warn "$action" "$?" +} + +configure_fail2ban() { # SSH Brute Force Protection @@ -1373,16 +1456,16 @@ maxretry = 3 bantime = 1h EOF - action="enabling fail2ban service" && display "task" "$action" - systemctl enable fail2ban >> "$logfile" 2>&1 || error_warn "$action" "$?" - action="starting fail2ban service" && display "task" "$action" - systemctl start fail2ban >> "$logfile" 2>&1 || error_warn "$action" "$?" 
+ enable_service fail2ban + run_task "starting fail2ban service" systemctl start fail2ban +} + +configure_firewall() { display "subtitle" "Firewall" pacman_install ufw - action="configuring ufw to deny by default" && display "task" "$action" - ufw default deny incoming >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "configuring ufw to deny by default" ufw default deny incoming # Firewall rules - only open ports for services we actually run for protocol in \ @@ -1410,11 +1493,9 @@ EOF action="rate-limiting SSH to protect from brute force attacks" && display "task" "$action" (ufw limit 22/tcp >> "$logfile" 2>&1) || error_warn "$action" "$?" - action="enabling firewall" && display "task" "$action" - ufw --force enable >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling firewall" ufw --force enable - action="enabling firewall service to launch on boot" && display "task" "$action" - systemctl enable ufw.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling firewall service to launch on boot" systemctl enable ufw.service # Verify firewall is actually active # Note: In VM environments, UFW may show inactive due to missing kernel @@ -1425,6 +1506,9 @@ EOF error_messages=("FIREWALL NOT ACTIVE - run: sudo ufw enable" "${error_messages[@]}") error_warn "$action" "1" fi +} + +configure_service_discovery() { # Service Discovery @@ -1436,17 +1520,14 @@ EOF display "task" "skipping avahi (already running)" else pacman_install avahi # service discovery on a local network using mdns - action="enabling avahi for mDNS discovery" && display "task" "$action" - systemctl enable avahi-daemon.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling avahi for mDNS discovery" systemctl enable avahi-daemon.service fi pacman_install wsdd - action="enabling wsdd for Windows network discovery" && display "task" "$action" - systemctl enable wsdd.service >> "$logfile" 2>&1 || error_warn "$action" "$?" 
+ run_task "enabling wsdd for Windows network discovery" systemctl enable wsdd.service pacman_install geoclue # geolocation service for location-aware apps - action="enabling geoclue geolocation service" && display "task" "$action" - systemctl enable geoclue.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling geoclue geolocation service" systemctl enable geoclue.service # Enable BeaconDB as geoclue wifi location provider (default MLS/Ichnaea API is defunct) action="configuring geoclue to use BeaconDB location service" && display "task" "$action" @@ -1476,37 +1557,52 @@ EOF After=systemd-sysusers.service EOF +} + +configure_job_scheduling() { # Job Scheduling display "subtitle" "Job Scheduling" pacman_install cronie - action="enabling cronie to launch at boot" && display "task" "$action" - systemctl enable cronie >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling cronie to launch at boot" systemctl enable cronie pacman_install at - action="enabling the batch delayed command scheduler" && display "task" "$action" - systemctl enable atd >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling the batch delayed command scheduler" systemctl enable atd action="installing log cleanup cron job" && display "task" "$action" (sudo -u "$username" crontab -l 2>/dev/null; \ echo "0 12 * * * \$HOME/.local/bin/cron/log-cleanup") \ | sudo -u "$username" crontab - \ >> "$logfile" 2>&1 || error_warn "$action" "$?" +} + +configure_package_cache() { # Package Repository Cache Maintenance display "subtitle" "Package Repository Cache Maintenance" pacman_install pacman-contrib - action="enabling the package cache cleanup timer" && display "task" "$action" - systemctl enable --now paccache.timer >> "$logfile" 2>&1 || error_warn "$action" "$?" 
+ run_task "enabling the package cache cleanup timer" systemctl enable --now paccache.timer action="configuring paccache to keep 3 versions" && display "task" "$action" backup_system_file /etc/conf.d/pacman-contrib sed -i 's/^PACCACHE_ARGS=.*/PACCACHE_ARGS=-k3/' /etc/conf.d/pacman-contrib - # Snapshot Service - filesystem-aware +} + +configure_snapshots() { + display "subtitle" "Snapshot Service" if is_zfs_root; then + configure_zfs_snapshots + elif is_btrfs_root; then + configure_btrfs_snapshots + else + display "task" "ext4/other filesystem detected" + fi +} + +configure_zfs_snapshots() { # ZFS: Install sanoid for snapshot management display "task" "ZFS detected - installing sanoid" aur_install sanoid @@ -1610,8 +1706,7 @@ Persistent=true WantedBy=timers.target EOF - action="enabling sanoid timer" && display "task" "$action" - systemctl enable sanoid.timer >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling sanoid timer" systemctl enable sanoid.timer action="enabling weekly ZFS scrub" && display "task" "$action" # Get pool name dynamically (usually zroot) @@ -1623,7 +1718,9 @@ EOF # systemctl enable --now zfs-replicate.timer display "task" "zfs-replicate timer created (enable after SSH key setup to TrueNAS)" - elif is_btrfs_root; then +} + +configure_btrfs_snapshots() { # Btrfs: Install snapper for snapshot management display "task" "btrfs detected - installing snapper and grub-btrfs" pacman_install snapper @@ -1665,16 +1762,13 @@ EOF snapper -c root set-config "TIMELINE_LIMIT_MONTHLY=1" >> "$logfile" 2>&1 snapper -c root set-config "TIMELINE_LIMIT_YEARLY=0" >> "$logfile" 2>&1 - action="enabling snapper timeline timer" && display "task" "$action" - systemctl enable snapper-timeline.timer >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling snapper timeline timer" systemctl enable snapper-timeline.timer systemctl enable snapper-cleanup.timer >> "$logfile" 2>&1 || error_warn "$action" "$?" 
- action="enabling grub-btrfsd for boot menu snapshots" && display "task" "$action" - systemctl enable grub-btrfsd >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling grub-btrfsd for boot menu snapshots" systemctl enable grub-btrfsd # Allow user to use snapper without root (required for snapper-gui) - action="allowing wheel group to use snapper" && display "task" "$action" - snapper -c root set-config "ALLOW_GROUPS=wheel" >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "allowing wheel group to use snapper" snapper -c root set-config "ALLOW_GROUPS=wheel" snapper -c root set-config "SYNC_ACL=yes" >> "$logfile" 2>&1 || error_warn "$action" "$?" # Set ACL on .snapshots directory for wheel group access setfacl -m g:wheel:rx /.snapshots >> "$logfile" 2>&1 || error_warn "$action" "$?" @@ -1682,9 +1776,9 @@ EOF # Install snapper GUI (AUR) aur_install snapper-gui-git - else - display "task" "ext4/other filesystem detected" - fi +} + +configure_user_lingering() { # User Services Lingering # Keeps user-level systemd services (e.g., protonmail-bridge) running without @@ -1692,8 +1786,7 @@ EOF # user-level IMAP/SMTP daemons over SSH or from remote agents. display "subtitle" "User Services" - action="enabling user-services lingering for $username" && display "task" "$action" - loginctl enable-linger "$username" >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling user-services lingering for $username" loginctl enable-linger "$username" } ### Xorg Display Manager @@ -1718,8 +1811,7 @@ Section "ServerFlags" Option "DontZap" "True" EndSection EOF - action="configuring xorg server" && display "task" "$action" - chmod 644 /etc/X11/xorg.conf.d/00-no-vt-or-zap.conf >> "$logfile" 2>&1 || error_warn "$action" "$?" 
+ run_task "configuring xorg server" chmod 644 /etc/X11/xorg.conf.d/00-no-vt-or-zap.conf # Install GPU-specific drivers install_gpu_drivers @@ -1811,6 +1903,48 @@ UDEVEOF sed -i "s/ARCHSETUP_USERNAME/${username}/" /etc/udev/rules.d/99-logitech-brio.rules chmod 644 /etc/udev/rules.d/99-logitech-brio.rules fi + + # Live-update guard: a pacman PreTransaction hook that aborts an upgrade of + # GPU/compositor runtime libraries while a Hyprland session is running, so + # the live compositor doesn't SIGABRT when a library is swapped underneath + # it (hit ratio 2026-06-07: live mesa + hyprland upgrade crashed Hyprland and + # its clients). Re-run the upgrade from a TTY with Hyprland stopped and the + # guard stays quiet. + action="Live-Update Guard" && display "subtitle" "$action" + run_task "installing the live GPU/compositor update guard" \ + cp "$user_archsetup_dir/scripts/hypr-live-update-guard" /usr/local/bin/hypr-live-update-guard + chmod 755 /usr/local/bin/hypr-live-update-guard + + action="installing the live-update guard pacman hook" && display "task" "$action" + mkdir -p /etc/pacman.d/hooks + cat > /etc/pacman.d/hooks/hypr-live-update-guard.hook << 'HOOKEOF' +[Trigger] +Operation = Upgrade +Type = Package +Target = mesa +Target = mesa-* +Target = wayland +Target = libdrm +Target = libglvnd +Target = hyprland +Target = aquamarine +Target = hyprutils +Target = hyprgraphics +Target = vulkan-radeon +Target = vulkan-intel +Target = vulkan-mesa-layers +Target = nvidia-utils +Target = lib32-nvidia-utils +Target = xorg-xwayland + +[Action] +Description = Checking for a live Hyprland session before swapping GPU/compositor libs... 
+When = PreTransaction +Exec = /usr/local/bin/hypr-live-update-guard +AbortOnFail +NeedsTargets +HOOKEOF + chmod 644 /etc/pacman.d/hooks/hypr-live-update-guard.hook } ### Display Server (conditional) @@ -1968,8 +2102,7 @@ desktop_environment() { pacman_install "$software" done pacman_install solaar # Logitech device manager - action="enabling bluetooth to launch at boot" && display "task" "$action" - systemctl enable bluetooth.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling bluetooth to launch at boot" systemctl enable bluetooth.service # Command Line Utilities @@ -2085,8 +2218,7 @@ gaming() { pacman_install steam # Enable gamemode service for user - action="enabling gamemode for user" && display "task" "$action" - sudo -u "$username" systemctl --user enable gamemoded.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling gamemode for user" sudo -u "$username" systemctl --user enable gamemoded.service } ### Zig Toolchain Pin @@ -2192,6 +2324,14 @@ developer_workstation() { action="Developer Workstation" && display "title" "$action" + install_programming_languages + install_editors + install_android_utilities + install_vpn_tools + install_devops_utilities +} + +install_programming_languages() { action="Programming Languages and Utilities" && display "subtitle" "$action" # Rust (via rustup — must precede AUR packages that compile with rust) pacman_install rustup # Rust toolchain manager @@ -2259,6 +2399,9 @@ developer_workstation() { pacman_install hyperfine # statistical command-line benchmarking pacman_install doggo # modern dig: readable DNS client, DoH/DoT/DoQ +} + +install_editors() { action="Programming Editors" && display "subtitle" "$action" pacman_install mg # mini emacs @@ -2317,19 +2460,27 @@ developer_workstation() { >> "$logfile" 2>&1 || error_warn "$action" "$?" 
fi +} + +install_android_utilities() { action="Android Utilities" && display "subtitle" "$action" pacman_install android-file-transfer pacman_install android-tools +} + +install_vpn_tools() { action="VPN Tools" && display "subtitle" "$action" pacman_install wireguard-tools # VPN - add configs to /etc/wireguard/ pacman_install systemd-resolvconf # resolvconf for wg-quick DNS integration pacman_install proton-vpn-gtk-app # Proton VPN GUI client with system tray pacman_install tailscale # mesh VPN - run 'tailscale up' to authenticate - action="enabling tailscale service" && display "task" "$action" - systemctl enable tailscaled >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling tailscale service" systemctl enable tailscaled +} + +install_devops_utilities() { action="DevOps Utilities" && display "subtitle" "$action" action="installing devops virtualization and automation tools" && display "task" "$action" @@ -2357,8 +2508,7 @@ developer_workstation() { } EOF fi - action="enabling docker service to launch on boot" && display "task" "$action" - systemctl enable docker.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling docker service to launch on boot" systemctl enable docker.service # podman (rootless containers for winvm) pacman_install podman @@ -2496,8 +2646,7 @@ supplemental_software() { # makepkg's integrity check fails on that file even though the package tarball # itself verifies. Rechecked 2026-06-24 — the original expired-PGP-signature # cause is gone, but this LICENSE-drift keeps the workaround necessary. - action="installing python-lyricsgenius (integrity workaround)" && display "task" "$action" - yay -S --noconfirm --mflags --skipinteg python-lyricsgenius >> "$logfile" 2>&1 || error_warn "$action" "$?" 
+ run_task "installing python-lyricsgenius (integrity workaround)" yay -S --noconfirm --mflags --skipinteg python-lyricsgenius aur_install tidal-dl # tidal-dl:tidal as yt-dlp:youtube aur_install tidaler # tidal downloader (tidal-dl-ng fork) aur_install freetube # privacy-focused YouTube desktop client @@ -2507,6 +2656,16 @@ supplemental_software() { boot_ux() { action="Boot UX" && display "title" "$action" + tighten_efi_permissions + add_nvme_early_module + configure_initramfs_hook + configure_encrypted_autologin + configure_tlp_power + trim_firmware + configure_grub +} + +tighten_efi_permissions() { # Tighten /efi mount permissions so kernel images, initramfs, and # bootloader config aren't world-readable. archinstall's defaults leave # them at 0755; fmask/dmask below makes files 0600 and dirs 0700. @@ -2519,6 +2678,9 @@ boot_ux() { || error_warn "$action" "$?" fi +} + +add_nvme_early_module() { # Add nvme module for early loading on NVMe systems # Ensures NVMe devices are available when ZFS/other hooks try to access them if has_nvme_drives; then @@ -2546,6 +2708,9 @@ boot_ux() { echo "FONT=ter-132n" >> /etc/vconsole.conf fi +} + +configure_initramfs_hook() { # Only switch to systemd hook for non-ZFS systems # ZFS initramfs hook is busybox-based and incompatible with systemd hook if ! is_zfs_root; then @@ -2570,6 +2735,9 @@ StandardOutput=null StandardError=journal+console EOF +} + +configure_encrypted_autologin() { # Automatic login for encrypted systems (prompts if no CLI flag and root is encrypted) configure_autologin @@ -2592,6 +2760,9 @@ HandleLidSwitchExternalPower=ignore HandleLidSwitchDocked=ignore EOF +} + +configure_tlp_power() { # TLP power management — laptops only (battery present). Manages wifi, # USB, PCIe, and CPU power policy on AC/battery transitions. systemd-rfkill # is masked per TLP's docs (it fights TLP's radio-state handling). 
@@ -2610,12 +2781,14 @@ PLATFORM_PROFILE_ON_BAT=low-power # Off by default — uncomment (and match the BAT name) to enable. #STOP_CHARGE_THRESH_BAT1=80 EOF - action="enabling TLP service" && display "task" "$action" - systemctl enable tlp.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling TLP service" systemctl enable tlp.service systemctl mask systemd-rfkill.service systemd-rfkill.socket >> "$logfile" 2>&1 || \ error_warn "masking systemd-rfkill for TLP" "$?" fi +} + +trim_firmware() { # Firmware trim — Framework 13 Intel only (matched by DMI), where the # hardware set is known: i915 graphics (linux-firmware-intel), ath9k wifi # (linux-firmware-atheros, firmware-free driver but kept for safety), and @@ -2633,10 +2806,12 @@ EOF linux-firmware-mellanox linux-firmware-nfp linux-firmware-nvidia \ linux-firmware-other linux-firmware-qlogic linux-firmware-radeon \ >> "$logfile" 2>&1 || error_warn "$action" "$?" - action="rebuilding initramfs after firmware trim" && display "task" "$action" - mkinitcpio -P >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "rebuilding initramfs after firmware trim" mkinitcpio -P fi +} + +configure_grub() { # GRUB: reset timeouts, adjust log levels, larger menu for HiDPI screens, and show splashscreen # Note: nvme.noacpi=1 disables NVMe ACPI power management to prevent freezes on some drives. # Safe to keep on newer drives (minor power cost), remove if battery life is critical. @@ -2654,8 +2829,7 @@ EOF # Regenerate GRUB config after all modifications if [ -f /etc/default/grub ]; then - action="generating grub configuration" && display "task" "$action" - grub-mkconfig -o /boot/grub/grub.cfg >> "$logfile" 2>&1 || error_warn "$action" "$?" 
+ run_task "generating grub configuration" grub-mkconfig -o /boot/grub/grub.cfg fi } diff --git a/working/collapsible-waybar-sides/spike-findings.org b/assets/2026-06-18-collapsible-waybar-sides-spike-findings.org index 4d45ed1..4d45ed1 100644 --- a/working/collapsible-waybar-sides/spike-findings.org +++ b/assets/2026-06-18-collapsible-waybar-sides-spike-findings.org diff --git a/working/collapsible-waybar-sides/collapsible-waybar-sides-spec.org b/assets/2026-06-19-collapsible-waybar-sides-spec.org index b9ddc0d..b9ddc0d 100644 --- a/working/collapsible-waybar-sides/collapsible-waybar-sides-spec.org +++ b/assets/2026-06-19-collapsible-waybar-sides-spec.org diff --git a/assets/outbox/2026-06-25-1248-from-archangel-accepted-the-stale-baked-archzfs-db-zfs.org b/assets/outbox/2026-06-25-1248-from-archangel-accepted-the-stale-baked-archzfs-db-zfs.org new file mode 100644 index 0000000..1e0ebf4 --- /dev/null +++ b/assets/outbox/2026-06-25-1248-from-archangel-accepted-the-stale-baked-archzfs-db-zfs.org @@ -0,0 +1,5 @@ +#+TITLE: Accepted: the stale-baked-archzfs-db ZFS install bug. Diagno +#+SOURCE: from archangel +#+DATE: 2026-06-25 12:48:41 -0400 + +Accepted: the stale-baked-archzfs-db ZFS install bug. Diagnosis confirmed against source — install_base (installer/archangel:759) appends [archzfs] at ~771 and runs pacstrap -K at 805 with no db refresh between, so pacstrap resolves zfs-dkms from the baked ISO db (2.3.3) instead of the current channel (2.4.2). Filed as archangel [#A] :bug:blocker: 'Refresh archzfs db before pacstrap in install_base', tagged :blocker: so it surfaces as priority work. Fix direction: pacman -Sy after the repo appends and before pacstrap -K. A fresh ISO rebuild still needed for aged ISOs, but the refresh is the durable fix. Will notify you when a fixed ISO is available so you can retry make test FS_PROFILE=zfs. 
diff --git a/assets/outbox/2026-06-25-1347-from-archangel-delivered-you-re-unblocked-the-stale.org b/assets/outbox/2026-06-25-1347-from-archangel-delivered-you-re-unblocked-the-stale.org new file mode 100644 index 0000000..296e37c --- /dev/null +++ b/assets/outbox/2026-06-25-1347-from-archangel-delivered-you-re-unblocked-the-stale.org @@ -0,0 +1,5 @@ +#+TITLE: Delivered — you're unblocked. The stale-baked-archzfs-db ZFS +#+SOURCE: from archangel +#+DATE: 2026-06-25 13:47:29 -0400 + +Delivered — you're unblocked. The stale-baked-archzfs-db ZFS install bug is fixed and verified. installer/archangel now runs pacman -Syy before pacstrap (commit 89691a0, pushed to main). I rebuilt the ISO and ran the single-disk ZFS install in QEMU (scripts/test-install.sh single-disk = FILESYSTEM=zfs, NO_ENCRYPT=yes, the same scenario you reproduced): pacstrap resolved zfs-dkms-2.4.2-1 / zfs-utils-2.4.2-2 (not the stale 2.3.3), DKMS built zfs/2.4.2 against kernel 6.18.36-1-lts, the system rebooted, and the ZFS pool came up healthy. TEST PASSED. Fixed ISO: out/archangel-2026-06-25-vmlinuz-6.18.36-lts-x86_64.iso (kernel 6.18.36-lts). Retry make test FS_PROFILE=zfs against this ISO. Note: build.sh clears the host pacoloco archzfs cache on each build, so a fresh build is what guarantees current packages — if you ever see 2.3.3 again, rebuild rather than reusing an aged ISO. diff --git a/assets/outbox/2026-06-25-1359-from-archangel-got-the-heads-up-everything-it-flags-is.org b/assets/outbox/2026-06-25-1359-from-archangel-got-the-heads-up-everything-it-flags-is.org new file mode 100644 index 0000000..e437d7d --- /dev/null +++ b/assets/outbox/2026-06-25-1359-from-archangel-got-the-heads-up-everything-it-flags-is.org @@ -0,0 +1,5 @@ +#+TITLE: Got the heads-up — everything it flags is already handled. I +#+SOURCE: from archangel +#+DATE: 2026-06-25 13:59:17 -0400 + +Got the heads-up — everything it flags is already handled. 
I adopted the staged pacman -Syy edit as-is and committed it (89691a0, pushed to main), so no mystery edit remains. My own ISO rebuild ran clean (your work/ teardown left out/ intact; the new build created and tore down its own work/). Verified the ZFS single-disk install end-to-end before the unblock note I sent at 13:47 — that note supersedes this thread. You're clear to retry create-base-vm FS_PROFILE=zfs against out/archangel-2026-06-25-vmlinuz-6.18.36-lts-x86_64.iso. diff --git a/assets/outbox/2026-06-27-2148-from-archangel-accepted-both-archangel-bug-handoffs.org b/assets/outbox/2026-06-27-2148-from-archangel-accepted-both-archangel-bug-handoffs.org new file mode 100644 index 0000000..213017d --- /dev/null +++ b/assets/outbox/2026-06-27-2148-from-archangel-accepted-both-archangel-bug-handoffs.org @@ -0,0 +1,5 @@ +#+TITLE: Accepted both archangel bug handoffs (pacman.conf 0600, 2026 +#+SOURCE: from archangel +#+DATE: 2026-06-27 21:48:03 -0400 + +Accepted both archangel bug handoffs (pacman.conf 0600, 2026-06-25; baked AUR list breaks btrfs, 2026-06-27). Both diagnoses confirmed against archangel source. Filed as tasks; the baked-AUR-list one is tagged :blocker: since it blocks your btrfs base creation. Starting the fixes now — will ping you with a fixed ISO once both land and verify so you can rebuild the btrfs base and re-add btrfs to the green-test matrix. diff --git a/assets/outbox/2026-06-27-2301-from-archangel-delivered-both-archangel-bugs-fixed-and.org b/assets/outbox/2026-06-27-2301-from-archangel-delivered-both-archangel-bugs-fixed-and.org new file mode 100644 index 0000000..9384f88 --- /dev/null +++ b/assets/outbox/2026-06-27-2301-from-archangel-delivered-both-archangel-bugs-fixed-and.org @@ -0,0 +1,15 @@ +#+TITLE: Delivered — both archangel bugs fixed and verified, fixed IS +#+SOURCE: from archangel +#+DATE: 2026-06-27 23:01:40 -0400 + +Delivered — both archangel bugs fixed and verified, fixed ISO ready. 
+
+ISO: /home/cjennings/code/archangel/out/archangel-2026-06-27-vmlinuz-6.18.36-lts-x86_64.iso (2.1G, built 2026-06-27 22:36)
+
+Bug #2 (the blocker — baked AUR list broke btrfs): fixed. install_base now filters the baked AUR set by target filesystem, dropping zfs-only tooling (zfs-auto-snapshot, zrepl) on a non-zfs install. The ISO still bakes the full set; only the install selection is filtered. Verified end-to-end in a VM: a btrfs-single unattended install now completes cleanly (the 266-pkg pacstrap transaction no longer includes zfs-auto-snapshot, so the 'unable to satisfy dependency zfs' abort is gone). A zfs single-disk install still passes, so the zfs path is unaffected.
+
+Bug #1 (installed /etc/pacman.conf landing 0600): fixed. strip_repo_stanza now writes through the existing config instead of mv-ing a 0600 mktemp over it, so the installed pacman.conf keeps its 0644. Unit-tested for mode preservation; the btrfs/zfs installs both completed past the strip step.
+
+You're unblocked: rebuild the btrfs base from this ISO and re-add btrfs to the green-test matrix. Fix is committed on archangel main (2ead674) and pushed.
+
+Note: the companion EFI \EFI\BOOT\BOOTX64.EFI removable-fallback hardening you mentioned is filed on the archangel side but not done in this pass — separate, optional, not part of this unblock.
diff --git a/scripts/hypr-live-update-guard b/scripts/hypr-live-update-guard
new file mode 100755
index 0000000..4f561ae
--- /dev/null
+++ b/scripts/hypr-live-update-guard
@@ -0,0 +1,70 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-3.0-or-later
+# hypr-live-update-guard - abort a live GPU/compositor library upgrade.
+#
+# Installed as a pacman PreTransaction hook. When an upgrade transaction
+# includes GPU/compositor runtime libraries (mesa, hyprland, wayland, GPU
+# drivers, ...) AND a Hyprland session is running, this aborts the
+# transaction BEFORE any package is swapped. Replacing those libraries out
+# from under a live compositor makes the next GPU-lib call hit a now
+# "(deleted)" file and SIGABRT, taking the Wayland clients down with it
+# (hit on ratio 2026-06-07: mesa + hyprland upgraded live, Hyprland crashed
+# and took awww/insync/emacs with it). Aborting at PreTransaction is the
+# safe point: nothing has been replaced yet, so the running session is
+# untouched and the user can re-run the upgrade from a TTY.
+#
+# Pacman feeds the matched package names on stdin (NeedsTargets).
+#
+# Test seams / overrides (env):
+#   HYPR_GUARD_RUNNING      1/0 forces the running check (default: pgrep Hyprland)
+#   HYPR_ALLOW_LIVE_UPDATE  1 proceeds anyway (skip the guard)
+#   HYPR_GUARD_SENTINEL     path whose existence also proceeds anyway
+#                           (default /run/archsetup-allow-live-gpu-update,
+#                           cleared on reboot since /run is tmpfs)
+
+set -u
+
+sentinel="${HYPR_GUARD_SENTINEL:-/run/archsetup-allow-live-gpu-update}"
+
+# Explicit override: the user knows what they're doing.
+if [ "${HYPR_ALLOW_LIVE_UPDATE:-0}" = "1" ] || [ -e "$sentinel" ]; then
+    exit 0
+fi
+
+hyprland_running() {
+    if [ -n "${HYPR_GUARD_RUNNING:-}" ]; then
+        [ "$HYPR_GUARD_RUNNING" = "1" ]
+        return
+    fi
+    pgrep -x Hyprland >/dev/null 2>&1
+}
+
+# No live session means no live swap to worry about. Let the upgrade run --
+# this is exactly the from-a-TTY-after-logout path the warning points to.
+hyprland_running || exit 0
+
+# Collect the triggering packages (stdin from NeedsTargets) for the message.
+pkgs=$(cat 2>/dev/null | sort -u | tr '\n' ' ')
+
+cat >&2 <<EOF
+
+==========================================================================
+  BLOCKED: live GPU/compositor library upgrade while Hyprland is running
+==========================================================================
+  Packages in this upgrade can crash the running compositor if swapped now:
+      ${pkgs:-(GPU/compositor runtime libraries)}
+
+  Replacing these out from under a live Hyprland session makes the next
+  GPU-lib call hit a deleted library and SIGABRT, taking your Wayland apps
+  down with it (and risking an unclean shutdown).
+
+  Do it safely instead -- from a TTY with Hyprland stopped:
+    1. Log out of Hyprland, or switch to a console (Ctrl+Alt+F2) and log in.
+    2. Re-run the upgrade there:  sudo pacman -Syu
+
+  To override and proceed anyway (not recommended while Hyprland runs):
+      sudo touch $sentinel && sudo pacman -Syu
+==========================================================================
+
+EOF
+exit 1
diff --git a/scripts/testing/lib/vm-utils.sh b/scripts/testing/lib/vm-utils.sh
index 10c0ca5..b85e773 100755
--- a/scripts/testing/lib/vm-utils.sh
+++ b/scripts/testing/lib/vm-utils.sh
@@ -11,7 +11,9 @@
 
 # VM configuration defaults
 VM_CPUS="${VM_CPUS:-4}"
-VM_RAM="${VM_RAM:-4096}" # MB
+# 8 GiB headroom for AUR builds: makepkg runs -j$VM_CPUS, and parallel cc1plus
+# (~700 MB each on heavy C++ packages) OOM-killed under the old 4 GiB default.
+VM_RAM="${VM_RAM:-8192}" # MB
 VM_DISK_SIZE="${VM_DISK_SIZE:-50}" # GB
 
 # Filesystem profile: selects which base image + archangel config the harness
@@ -59,7 +61,11 @@ init_vm_paths() {
     local img_suffix=""
     [ "$FS_PROFILE" != "btrfs" ] && img_suffix="-$FS_PROFILE"
     DISK_PATH="$VM_IMAGES_DIR/archsetup-base${img_suffix}.qcow2"
-    OVMF_VARS="$VM_IMAGES_DIR/OVMF_VARS.fd"
+    # Per-profile NVRAM: UEFI boot entries live here, outside the qcow2, so a
+    # disk-snapshot revert can't restore them. Sharing one file across profiles
+    # let a zfs run's ZFSBootMenu entries clobber the btrfs GRUB entry, leaving
+    # the btrfs base unbootable (no removable ESP fallback to recover from).
+    OVMF_VARS="$VM_IMAGES_DIR/OVMF_VARS${img_suffix}.fd"
     PID_FILE="$VM_IMAGES_DIR/qemu.pid"
     MONITOR_SOCK="$VM_IMAGES_DIR/qemu-monitor.sock"
     SERIAL_LOG="$VM_IMAGES_DIR/qemu-serial.log"
diff --git a/scripts/testing/tests/test_desktop.py b/scripts/testing/tests/test_desktop.py
index 53e54e1..c02d2b6 100644
--- a/scripts/testing/tests/test_desktop.py
+++ b/scripts/testing/tests/test_desktop.py
@@ -50,6 +50,19 @@ def test_hyprland_config_present(host, hyprland_installed, home, rel):
 
 
 @pytest.mark.attribution("archsetup")
+def test_live_update_guard_installed(host, hyprland_installed):
+    if not hyprland_installed:
+        pytest.skip("Hyprland not installed (DESKTOP_ENV != hyprland)")
+    guard = host.file("/usr/local/bin/hypr-live-update-guard")
+    assert guard.exists, "live-update guard script missing"
+    assert guard.mode & 0o111, "live-update guard not executable"
+    hook = host.file("/etc/pacman.d/hooks/hypr-live-update-guard.hook")
+    assert hook.exists, "live-update guard pacman hook missing"
+    assert "hypr-live-update-guard" in hook.content_string, \
+        "hook does not invoke the guard script"
+
+
+@pytest.mark.attribution("archsetup")
 def test_portal_settings_backend_not_disabled(host, hyprland_installed, home):
     if not hyprland_installed:
         pytest.skip("Hyprland not installed")
diff --git a/tests/hypr-live-update-guard/test_hypr_live_update_guard.py b/tests/hypr-live-update-guard/test_hypr_live_update_guard.py
new file mode 100644
index 0000000..5ec5ce8
--- /dev/null
+++ b/tests/hypr-live-update-guard/test_hypr_live_update_guard.py
@@ -0,0 +1,95 @@
+"""Tests for the hypr-live-update-guard pacman PreTransaction hook script.
+ +The guard aborts a live pacman upgrade of GPU/compositor runtime libraries +(mesa, hyprland, wayland, GPU drivers) while a Hyprland session is running, +so the compositor doesn't SIGABRT when a now-"(deleted)" library is next +called. It reads the triggering package names on stdin (pacman NeedsTargets) +and exits non-zero to abort the transaction (AbortOnFail) before any package +is swapped. When Hyprland isn't running, or an override is set, it exits 0 +and the upgrade proceeds. + +Test seams (env vars the production script honors): + HYPR_GUARD_RUNNING 1/0 forces the Hyprland-running check (default: pgrep) + HYPR_ALLOW_LIVE_UPDATE 1 overrides the guard (proceed anyway) + HYPR_GUARD_SENTINEL path whose existence also overrides the guard + +Run from repo root: + python3 -m unittest tests.hypr-live-update-guard.test_hypr_live_update_guard +""" + +import os +import subprocess +import tempfile +import unittest + + +REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) +GUARD = os.path.join(REPO_ROOT, "scripts", "hypr-live-update-guard") + + +def run_guard(stdin="mesa\n", running="1", allow=None, sentinel=None): + env = dict(os.environ) + env["HYPR_GUARD_RUNNING"] = running + if allow is not None: + env["HYPR_ALLOW_LIVE_UPDATE"] = allow + # Point the sentinel at a path that does not exist unless a test sets one, + # so the host's real /run state can't leak into the result. 
+ env["HYPR_GUARD_SENTINEL"] = sentinel if sentinel else "/nonexistent/guard-sentinel" + return subprocess.run( + ["sh", GUARD], + input=stdin, capture_output=True, text=True, timeout=10, env=env, + ) + + +class HyprLiveUpdateGuard(unittest.TestCase): + # --- Normal cases --------------------------------------------------- + + def test_running_with_dangerous_pkg_aborts(self): + r = run_guard(stdin="mesa\n", running="1") + self.assertEqual(r.returncode, 1, r.stderr) + + def test_abort_message_names_the_package_and_tty_remedy(self): + r = run_guard(stdin="mesa\n", running="1") + self.assertIn("mesa", r.stderr) + self.assertIn("TTY", r.stderr) + + def test_not_running_allows(self): + r = run_guard(stdin="mesa\n", running="0") + self.assertEqual(r.returncode, 0, r.stderr) + + def test_not_running_is_silent(self): + r = run_guard(stdin="mesa\nhyprland\n", running="0") + self.assertEqual(r.stderr.strip(), "") + + # --- Boundary cases ------------------------------------------------- + + def test_multiple_packages_all_listed(self): + r = run_guard(stdin="mesa\nhyprland\nvulkan-radeon\n", running="1") + self.assertEqual(r.returncode, 1) + for pkg in ("mesa", "hyprland", "vulkan-radeon"): + self.assertIn(pkg, r.stderr) + + def test_running_with_empty_stdin_still_guards(self): + # The hook only fires when dangerous targets exist, so an empty target + # list shouldn't normally happen; if Hyprland is up, stay safe (abort). 
+ r = run_guard(stdin="", running="1") + self.assertEqual(r.returncode, 1) + + # --- Override / error cases ----------------------------------------- + + def test_env_override_proceeds_even_when_running(self): + r = run_guard(stdin="mesa\n", running="1", allow="1") + self.assertEqual(r.returncode, 0, r.stderr) + + def test_sentinel_file_override_proceeds(self): + with tempfile.NamedTemporaryFile(prefix="guard-allow-") as f: + r = run_guard(stdin="mesa\n", running="1", sentinel=f.name) + self.assertEqual(r.returncode, 0, r.stderr) + + def test_override_env_zero_does_not_bypass(self): + r = run_guard(stdin="mesa\n", running="1", allow="0") + self.assertEqual(r.returncode, 1, r.stderr) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/installer-steps/test_orchestrators.py b/tests/installer-steps/test_orchestrators.py new file mode 100644 index 0000000..e62c198 --- /dev/null +++ b/tests/installer-steps/test_orchestrators.py @@ -0,0 +1,117 @@ +"""Characterization tests for the decomposed installer step orchestrators. + +The 2026 decomposition turned the giant step functions into thin +orchestrators that call one named sub-function per concern. These tests pin +the call SEQUENCE of each orchestrator: a dropped, added, or reordered +sub-step call fails the test. They guard the wiring, not the sub-functions' +own behavior (those mutate the system and are exercised by the VM harness). + +Method: sed-extract the orchestrator from the real `archsetup` (its body is +now just `display` + sub-function calls), source it with `display` silenced +and every sub-function replaced by a recorder that echoes its own name, run +it, and assert stdout is the expected ordered list. 
+ +Run from repo root: + python3 -m unittest tests.installer-steps.test_orchestrators +""" + +import os +import subprocess +import textwrap +import unittest + + +REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) +ARCHSETUP = os.path.join(REPO_ROOT, "archsetup") + +# orchestrator -> exact ordered sub-step calls +ORCHESTRATORS = { + "essential_services": [ + "configure_randomness", "configure_networking", "configure_power", + "configure_ssh_server", "configure_fail2ban", "configure_firewall", + "configure_service_discovery", "configure_job_scheduling", + "configure_package_cache", "configure_snapshots", + "configure_user_lingering", + ], + "prerequisites": [ + "bootstrap_pacman_keyring", "install_required_software", + "configure_build_environment", "configure_package_mirrors", + ], + "developer_workstation": [ + "install_programming_languages", "install_editors", + "install_android_utilities", "install_vpn_tools", + "install_devops_utilities", + ], + "boot_ux": [ + "tighten_efi_permissions", "add_nvme_early_module", + "configure_initramfs_hook", "configure_encrypted_autologin", + "configure_tlp_power", "trim_firmware", "configure_grub", + ], + "user_customizations": [ + "clone_user_repos", "stow_dotfiles", "prune_waybar_battery", + "refresh_desktop_caches", "configure_dconf_defaults", + "finalize_dotfiles", "create_user_directories", + ], +} + + +def run_orchestrator(func, stubs, extra_defs=""): + """Source `func` from archsetup with `stubs` recording their names.""" + stub_defs = "\n".join(f"{s}() {{ echo {s}; }}" for s in stubs) + script = textwrap.dedent(f"""\ + display() {{ :; }} + {stub_defs} + {extra_defs} + source <(sed -n '/^{func}() {{/,/^}}/p' "{ARCHSETUP}") + {func} + """) + result = subprocess.run( + ["bash", "-c", script], + capture_output=True, text=True, timeout=10, + ) + return result + + +class OrchestratorSequence(unittest.TestCase): + def test_each_orchestrator_calls_substeps_in_order(self): + for func, expected in 
ORCHESTRATORS.items(): + with self.subTest(orchestrator=func): + result = run_orchestrator(func, expected) + self.assertEqual(result.returncode, 0, result.stderr) + got = result.stdout.split() + self.assertEqual(got, expected, + f"{func} call sequence drifted") + + +class SnapshotDispatch(unittest.TestCase): + """configure_snapshots branches on filesystem; pin each branch.""" + + SUBS = ["configure_zfs_snapshots", "configure_btrfs_snapshots"] + + def test_zfs_root_runs_zfs_snapshots(self): + result = run_orchestrator( + "configure_snapshots", self.SUBS, + extra_defs="is_zfs_root() { return 0; }\nis_btrfs_root() { return 1; }", + ) + self.assertEqual(result.returncode, 0, result.stderr) + self.assertEqual(result.stdout.split(), ["configure_zfs_snapshots"]) + + def test_btrfs_root_runs_btrfs_snapshots(self): + result = run_orchestrator( + "configure_snapshots", self.SUBS, + extra_defs="is_zfs_root() { return 1; }\nis_btrfs_root() { return 0; }", + ) + self.assertEqual(result.returncode, 0, result.stderr) + self.assertEqual(result.stdout.split(), ["configure_btrfs_snapshots"]) + + def test_other_filesystem_runs_neither(self): + result = run_orchestrator( + "configure_snapshots", self.SUBS, + extra_defs="is_zfs_root() { return 1; }\nis_btrfs_root() { return 1; }", + ) + self.assertEqual(result.returncode, 0, result.stderr) + self.assertEqual(result.stdout.split(), []) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/run-task/test_run_task.py b/tests/run-task/test_run_task.py new file mode 100644 index 0000000..35036dd --- /dev/null +++ b/tests/run-task/test_run_task.py @@ -0,0 +1,172 @@ +"""Tests for the run_task / enable_service helpers in the archsetup installer. + +run_task is the installer's describe-run-warn primitive. It replaces the +hand-written idiom that recurs ~100 times across the script: + + action="enabling rngd service" && display "task" "$action" + systemctl enable rngd >> "$logfile" 2>&1 || error_warn "$action" "$?" 
+ +as a single call: + + run_task "enabling rngd service" systemctl enable rngd + +It announces the task via display, runs the command with stdout+stderr +appended to $logfile, and on failure calls error_warn with the command's +real exit code (non-fatal). enable_service is a thin wrapper that enables +one or more systemd units with the conventional "enabling <unit> service" +wording. + +These tests exercise the REAL function bodies, extracted from the +`archsetup` script at run time (not a copy), with recording stubs standing +in for display, error_warn, and systemctl. The command run by run_task is +genuinely executed. + +Run from repo root: + python3 -m unittest tests.run-task.test_run_task +""" + +import os +import shutil +import subprocess +import tempfile +import unittest + + +REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) +ARCHSETUP = os.path.join(REPO_ROOT, "archsetup") + +# A bash harness that sources the real run_task + enable_service out of the +# installer, with recording stubs for their dependencies. Each stub appends a +# tab-separated record to a file named by an env var, so the Python side can +# assert what was called. The real command passed to run_task still runs. 
+WRAPPER = r"""#!/bin/bash +ARCHSETUP="$1"; shift +logfile="$LOGFILE" + +display() { printf '%s\t%s\n' "$1" "$2" >> "$DISPLAY_LOG"; } +error_warn() { printf '%s\t%s\n' "$1" "$2" >> "$ERRWARN_LOG"; return 1; } +systemctl() { printf 'systemctl %s\n' "$*"; } + +source <(sed -n '/^run_task() {/,/^}/p' "$ARCHSETUP") +source <(sed -n '/^enable_service() {/,/^}/p' "$ARCHSETUP") + +"$@" +""" + + +class RunTaskHarness(unittest.TestCase): + def setUp(self): + self.tmp = tempfile.mkdtemp(prefix="run-task-test-") + self.wrapper = os.path.join(self.tmp, "run.sh") + with open(self.wrapper, "w") as f: + f.write(WRAPPER) + os.chmod(self.wrapper, 0o755) + self.logfile = os.path.join(self.tmp, "install.log") + self.display_log = os.path.join(self.tmp, "display.log") + self.errwarn_log = os.path.join(self.tmp, "errwarn.log") + + def tearDown(self): + shutil.rmtree(self.tmp, ignore_errors=True) + + def call(self, *args): + env = dict(os.environ) + env["LOGFILE"] = self.logfile + env["DISPLAY_LOG"] = self.display_log + env["ERRWARN_LOG"] = self.errwarn_log + return subprocess.run( + ["bash", self.wrapper, ARCHSETUP, *args], + capture_output=True, text=True, timeout=10, env=env, + ) + + def read(self, path): + if not os.path.exists(path): + return "" + with open(path) as f: + return f.read() + + # --- Normal cases ----------------------------------------------------- + + def test_run_task_success_announces_and_runs(self): + result = self.call("run_task", "doing a thing", "true") + self.assertEqual(result.returncode, 0, result.stderr) + # Announced as a "task" with the exact description. + self.assertEqual(self.read(self.display_log), "task\tdoing a thing\n") + # No warning on success. 
+ self.assertEqual(self.read(self.errwarn_log), "") + + def test_run_task_captures_command_output_to_logfile(self): + result = self.call("run_task", "echo something", "echo", "hello-from-cmd") + self.assertEqual(result.returncode, 0, result.stderr) + self.assertIn("hello-from-cmd", self.read(self.logfile)) + # Command output is logged, not printed to the terminal. + self.assertNotIn("hello-from-cmd", result.stdout) + + def test_run_task_captures_stderr_to_logfile(self): + # `ls` of a missing path writes to stderr; it must land in the logfile. + missing = os.path.join(self.tmp, "no-such-path") + self.call("run_task", "listing", "ls", missing) + self.assertIn("no-such-path", self.read(self.logfile)) + + def test_run_task_preserves_multiple_arguments(self): + self.call("run_task", "multi-arg", "printf", "%s|%s|%s", "a", "b", "c") + self.assertIn("a|b|c", self.read(self.logfile)) + + def test_run_task_preserves_arguments_with_spaces(self): + self.call("run_task", "spacey", "printf", "[%s]", "two words") + self.assertIn("[two words]", self.read(self.logfile)) + + # --- enable_service --------------------------------------------------- + + def test_enable_service_single_unit(self): + self.call("enable_service", "rngd") + self.assertEqual(self.read(self.display_log), "task\tenabling rngd service\n") + self.assertIn("systemctl enable rngd", self.read(self.logfile)) + + def test_enable_service_multiple_units(self): + self.call("enable_service", "foo", "bar", "baz") + disp = self.read(self.display_log) + self.assertIn("task\tenabling foo service\n", disp) + self.assertIn("task\tenabling bar service\n", disp) + self.assertIn("task\tenabling baz service\n", disp) + log = self.read(self.logfile) + self.assertIn("systemctl enable foo", log) + self.assertIn("systemctl enable bar", log) + self.assertIn("systemctl enable baz", log) + + # --- Error cases ------------------------------------------------------ + + def test_run_task_failure_warns_with_description(self): + result = 
self.call("run_task", "failing thing", "false") + self.assertNotEqual(result.returncode, 0) + self.assertEqual(self.read(self.errwarn_log), "failing thing\t1\n") + + def test_run_task_failure_propagates_real_exit_code(self): + # `bash -c 'exit 42'` must surface 42 to error_warn, not a clobbered 0. + self.call("run_task", "exit-42", "bash", "-c", "exit 42") + self.assertEqual(self.read(self.errwarn_log), "exit-42\t42\n") + + def test_enable_service_failure_warns_per_unit(self): + # Override systemctl to fail; each unit should produce a warning. + env = dict(os.environ) + env["LOGFILE"] = self.logfile + env["DISPLAY_LOG"] = self.display_log + env["ERRWARN_LOG"] = self.errwarn_log + # Re-create wrapper with a failing systemctl stub for this case. + failing = os.path.join(self.tmp, "run-fail.sh") + with open(failing, "w") as f: + f.write(WRAPPER.replace( + "systemctl() { printf 'systemctl %s\\n' \"$*\"; }", + "systemctl() { printf 'systemctl %s\\n' \"$*\"; return 1; }", + )) + os.chmod(failing, 0o755) + subprocess.run( + ["bash", failing, ARCHSETUP, "enable_service", "alpha", "beta"], + capture_output=True, text=True, timeout=10, env=env, + ) + warns = self.read(self.errwarn_log) + self.assertIn("enabling alpha service\t1\n", warns) + self.assertIn("enabling beta service\t1\n", warns) + + +if __name__ == "__main__": + unittest.main() @@ -21,6 +21,26 @@ The vocabulary is open — topic tags are coined as needed — so these are conv - *Effort / autonomy*: =:quick:= a spare-moment fix (minutes, not a sitting); =:solo:= Claude can carry it end to end — there's a build path, a test path, and no upfront decision needed (a leftover manual spot-check doesn't disqualify it). - *Topic / area* (open): the subsystem a task touches — e.g. =:hyprland:= =:waybar:= =:mpd:= =:music:= =:network:= =:tooling:= =:llm:= =:eask:= =:pocketbook:= =:cmail:=. Coin a new one when it aids filtering. 
* Archsetup Open Work +** TODO [#B] btrfs base VM unbuildable — archangel ISO bakes zfs-auto-snapshot :bug:test: +=make test-vm-base= (btrfs) fails in archangel's installer: the ISO bakes a fixed +AUR list ("downgrade yay informant zrepl pacman-cleanup-hook zfs-auto-snapshot +topgrade ventoy-bin") into every install regardless of =FILESYSTEM=. On a btrfs +install =zfs= isn't present, so =zfs-auto-snapshot='s =zfs= dependency can't +resolve and the unattended pacstrap aborts ("unable to satisfy dependency 'zfs' +required by zfs-auto-snapshot"). This is an archangel ISO bug (the baked list isn't +controllable from =archsetup-test.conf=), so it blocks btrfs-profile VM testing +until archangel ships an ISO that conditions the AUR list on the filesystem (or +drops zfs tooling from non-zfs installs). The 2026-06-27 btrfs base regen attempt +also wiped the prior (unbootable) btrfs base, so there's no btrfs base image until +this is fixed. zfs-profile testing works (=make test FS_PROFILE=zfs=). + +Companion hardening (defense-in-depth, archangel-side): install the bootloader +with a removable =\EFI\BOOT\BOOTX64.EFI= fallback so a base boots even from +fresh/empty NVRAM, and real installs survive firmware that drops boot entries. + +** TODO [#C] Scratchpad launch turns on focus-follows-mouse :bug:hyprland: +Imported from roam inbox 2026-06-25. Repro: with two tiled windows, moving the mouse over the other tile does nothing (focus-follows-mouse off, as expected). Then launch a terminal (scratchpad), move the mouse over a tile, and focus now switches to the window under the pointer. Something about the scratchpad/terminal launch flips focus-follows-mouse on. Find what re-enables it (likely a Hyprland focus/input setting or a pyprland scratchpad side effect) and keep it off. + ** TODO [#B] Scrolling layout: frame fit + wrap-around :hyprland: :PROPERTIES: :LAST_REVIEWED: 2026-06-13 @@ -53,14 +73,6 @@ From the roam inbox: Zoom opens at a tiny size. 
Needs diagnosis (HiDPI scaling v :END: From the roam inbox: hiding a window (e.g. the org-capture popup) then unhiding it should leave the unhidden window focused, but another window typically takes focus. Also =ctrl+j/k= (layout-navigate) can't reach the unhidden window afterward — it should always reach any visible window except the waybar. Involves stash-restore + layout-navigate; needs interactive reproduction with Craig. -** TODO [#B] Guard against live mesa/hyprland/wayland-runtime updates :hyprland: -:PROPERTIES: -:LAST_REVIEWED: 2026-06-09 -:END: -A live =pacman -Syu= that swaps mesa/hyprland/wayland runtime libs out from under a running Hyprland session can crash the compositor: the next GPU-lib call hits a now-"(deleted)" library and SIGABRTs, taking the Wayland clients down with it. Hit ratio 2026-06-07 (mesa 26.0.6 -> 26.1.2 + hyprland upgraded live; Hyprland SIGABRT took down awww/insync/emacs). Likely the driver behind ratio's high lifetime unsafe-shutdown ratio — a crashed compositor forces a hard reset. - -Ship a guard: an update wrapper, or a documented practice, that when a pending =-Syu= set includes mesa/hyprland/wayland runtime libs advises running it from a TTY (or after logging out of Hyprland) rather than live. Returned to archsetup from archangel 2026-06-09 — hyprland/mesa are installed and managed by archsetup, not the ISO installer. - ** TODO [#C] Pocketbook development backlog :pocketbook: :PROPERTIES: :LAST_REVIEWED: 2026-05-26 @@ -136,15 +148,10 @@ A roam-inbox capture asked for the same widget and expands the scope, so folding - *Multiple simultaneous* — several timers/alarms/stopwatches set and displayed at once, in one panel. - Deliverable includes proposing a few panel designs and recommending one before building. -** TODO [#B] Collapsible waybar sides :waybar: -:PROPERTIES: -:LAST_REVIEWED: 2026-06-09 -:END: -Let either side of the waybar collapse horizontally to a minimal base set, toggled by a click. 
Each collapsible side carries a small triangle / arrowhead pointing toward the screen edge it collapses into (away from center). Clicking it collapses that side to its base set and flips the arrow to point back toward center; clicking again restores the full side. Same shape-changes-with-state idea as the auto-dim indicator. - -Spec ready (2026-06-19): [[file:working/collapsible-waybar-sides/collapsible-waybar-sides-spec.org]]. Spike settled the mechanism: [[file:working/collapsible-waybar-sides/spike-findings.org]]. +** TODO [#B] Sysmon module right-click cycles the visible metric :feature:waybar: +Builds on the just-shipped =custom/sysmon= collapse (dotfiles be7469b). Right-clicking the module rotates which metric is the visible one, in a fixed order: battery → cpu → temp → mem → disk → back to battery. Each click advances one step and wraps around. The host default (battery on a laptop, disk on a desktop) is the starting/reset metric; the tooltip keeps showing all metrics regardless. Left-click stays =pypr toggle monitor= (the btop popup) — the cycle lives on =on-click-right=. -Decisions locked: right base set = date + worldclock + tray; left base set = menu + workspaces; per-side independent; host-agnostic (base set constant, full set is each host's existing config). Mechanism = config-swap + SIGUSR2 reload via an active-config copy in =$XDG_RUNTIME_DIR= (the CSS/state-file approach was disproven — GTK3 can't reflow-hide native modules). Lives in =~/.dotfiles/hyprland/=. Next: implement per the spec (TDD the toggle + arrow scripts). +Implementation notes: =waybar-sysmon= needs a persisted selection (a state file in =$XDG_RUNTIME_DIR/waybar/=, absent = host default) that it reads to pick the visible metric. A new =sysmon-cycle= helper bumps the index and signals the module to refresh (add a =signal= to =custom/sysmon=, like the other custom modules; wire =sysmon-cycle= to =on-click-right=). 
TDD both — extend =tests/waybar-sysmon= for selection-driven output, add a =tests/sysmon-cycle= for the index advance/wrap and the signal. ** TODO [#B] Network-manager dropdown, nmcli-backed with GPG-stored secrets :waybar:network: :PROPERTIES: @@ -521,81 +528,6 @@ Some operations log to ~$logfile~, others don't - standardize logging All package installs should log, all system modifications should log, all errors should log with context Makes debugging failed installations easier -** DONE [#B] Add backup before system file modifications :solo: -CLOSED: [2026-06-25 Thu] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-24 -:END: -Safety net for /etc/X11/xorg.conf.d and other system file edits -Files like ~/etc/sudoers~, ~/etc/pacman.conf~, ~/etc/default/grub~ modified without backup -If modifications fail or are incorrect, difficult to recover - should backup files to ~.backup~ before modifying - -Done 2026-06-25: added a =backup_system_file <path>= helper next to =safe_rm_rf= — it snapshots a pre-existing file to =<path>.archsetup.bak= before an in-place edit, idempotent (never clobbers an existing backup, so the pristine original survives repeated edits and re-runs), =cp -p= to preserve mode/ownership, no-op when the file is absent. Took the narrow scope (Craig's call): route only the in-place =sed -i= / append edits to *pre-existing* files through it — locale.gen, makepkg.conf, pacman.conf, sudoers, conf.d/wireless-regdom, geoclue.conf, conf.d/pacman-contrib, fstab, mkinitcpio.conf, vconsole.conf — and skip the brand-new drop-in files archsetup fully owns (nothing to back up; recovery is just deleting them). Tests: =tests/backup-system-file/= (7 Normal/Boundary/Error, incl. mode-preserved, existing-backup-not-overwritten, missing-target no-op, cp-failure). =make test-unit= green across all 5 suites; =bash -n= clean; only shellcheck note is the known SC2329 false positive (indirect STEPS dispatch). Integration verification is the next VM run. 
- -** DONE [#B] Migrate bare-metal test runner to Testinfra, then delete the shell sweep :test: -CLOSED: [2026-06-25 Thu] -Plan + ZFS-coverage expansion: [[file:docs/design/2026-06-25-zfs-vm-test-coverage.org]] (build a ZFS base VM via archangel + a =FS_PROFILE= selector so =make test= covers the ZFS path, then migrate this runner to key auth + Testinfra against it, then delete the dead =validation.sh= functions = phase E here). -=run-test.sh= (VM) now uses the Testinfra/pytest sweep as its authoritative validator, but =run-test-baremetal.sh= (lines ~243-244) still calls the old =run_all_validations= / =validate_all_services= from =scripts/testing/lib/validation.sh=. Migrate the bare-metal runner to =run_testinfra_validation= too (same key + ssh-config approach, adapted for a real host), then delete the now-dead shell-sweep functions from =validation.sh=. Keep the live helpers: =ssh_cmd=, =attribute_issue=, =capture_pre/post_install_state=, =analyze_log_diff=, =categorize_errors=, =generate_issue_report=, and the =VALIDATION_*= counters/arrays. Deferred from the Testinfra cutover because it needs a bare-metal test loop to validate, out of scope for the VM-only autonomous run. -*** 2026-06-25 Thu @ 12:37:02 -0400 P-A/P-B shipped (FS_PROFILE selector); P-C blocked on archangel ZFS-install bug -P-A + P-B landed in =353b179=: =archsetup-test-zfs.conf= (archangel ZFS config) + an =FS_PROFILE= (btrfs default / zfs) selector across =vm-utils.sh= (=init_vm_paths= derives a per-profile image + validates the profile), =create-base-vm.sh= (selects the archangel config), =run-test.sh= (--help + profile display), and the Makefile (=make test FS_PROFILE=zfs=). Design simplification recorded: no =archsetup-vm-zfs.conf= needed — archsetup auto-detects ZFS from the live root via =is_zfs_root()=, so the archsetup run config is shared; only the archangel base config + base image differ. Open Q1 resolved: archangel supports ZFS root natively (it's the default FS). 
- -P-C (build the ZFS base image) is BLOCKED on archangel. =create-base-vm.sh FS_PROFILE=zfs= built the disk + booted the archangel ISO fine, but the archangel install died: =dkms install zfs/2.3.3 -k 6.18.36-1-lts= exited 1, ZFS module not built. Root cause is in archangel, not archsetup: it appends the [archzfs] experimental repo then runs =pacstrap -K= with no =pacman -Sy= refresh, so it uses the archzfs sync db baked into the Feb-2026 ISO (zfs-dkms 2.3.3) while linux-lts is pulled fresh (6.18.36). 2.3.3 doesn't build against 6.18. velox runs zfs-dkms 2.4.2 on the same kernel from the same channel, so the fix exists upstream — archangel just needs to refresh the db before pacstrap (+ a fresh ISO). Bug + dependency handoff sent to archangel inbox (=2026-06-25-1236-from-archsetup-bug-zfs-install-fails-stale-baked.org=). Retry P-C once a fixed archangel ISO is available. P-D (bare-metal migration code) is still workable in the meantime against the btrfs VM / velox. - -*** 2026-06-25 Thu @ 16:05:07 -0400 archangel unblocked; ZFS base built; 3 archsetup bugs fixed (local); re-run paused -archangel shipped the fix (archangel =89691a0=: =pacman -Syy= before pacstrap) + rebuilt the ISO. With it, =create-base-vm.sh FS_PROFILE=zfs= built a verified ZFS-root base (=archsetup-base-zfs.qcow2=, clean-install snapshot, kernel 6.18.36). =make test FS_PROFILE=zfs= then surfaced three real archsetup bugs against the current archangel base, each fixed in a LOCAL (unpushed) commit: -- =8ed42b9= informant: the base ships informant; its pacman PreTransaction hook (AbortOnFail) blocked archsetup's first transaction. Fix: =informant read --all= up front (guarded). PROVEN. -- =66caeb5= pacman.conf perms: the base ships =/etc/pacman.conf= 0600 (archangel =strip_repo_stanza= mktemp+mv clobbers perms), breaking user =makepkg=/=yay=. Fix: =chmod 644= after archsetup's edits. PROVEN (run reached 75 min deep). 
-- =05ec096= reflector: archsetup configured reflector's timer but never ran it, so installs used the base's 425-mirror worldwide list and pacman stalled ~15 min on a slow/unresponsive mirror. Fix: run reflector once before the heavy installs (=timeout=-bounded, non-fatal). NOT yet integration-proven — the next re-run validates it. -Second archangel handoff sent for the pacman.conf-0600 root cause (=2026-06-25-1440-...=); archsetup's chmod is defensive, archangel should ship 0644. Paused before the re-run at Craig's request (he starts =sudo make test FS_PROFILE=zfs= from the laptop). Possible harness-side factor on the stall: slirp IPv6 blackholing (one stalled conn was IPv6) — watch if it recurs despite reflector. - -*** 2026-06-25 Thu @ 21:56:12 -0400 P-C GREEN — ZFS VM test path passes end to end -=make test FS_PROFILE=zfs= PASSED: archsetup exit 0 (full ~68-min ZFS install, reflector held — no stall), pytest =95 passed, 0 failed, 11 skipped=. The ZFS-conditional checks now run the ZFS branch instead of skipping: =test_bootloader_installed= (ZFSBootMenu EFI binary at /efi/EFI/ZBM), =test_mkinitcpio_hooks= (zfs udev hook), =test_console_font_configured= (vconsole.conf), =test_zfs_has_sanoid= all PASS; =test_backup_created_for_mkinitcpio= correctly SKIPs (ZFS+virtio edits nothing). The 3 archsetup issues (gamemode, mu, signal-cli AUR) are the known non-critical residuals, same as on btrfs. Four commits pushed to main: =8ed42b9= informant news-hook, =66caeb5= pacman.conf 0644, =05ec096= reflector-during-install, =eb379c3= ZFS-aware boot/backup tests. P-C (ZFS coverage, design phases A-C) is DONE. Remaining on this task: P-D (migrate run-test-baremetal.sh to inject_root_key + run_testinfra_validation) and P-E (delete the dead validation.sh shell sweep). -*** 2026-06-25 Thu @ 23:26:02 -0400 P-D + P-E done — whole epic closed -P-D (=771b92e=): migrated =run-test-baremetal.sh= to key auth + Testinfra. 
=inject_root_key= generalized to =root@$VM_IP= (vm-utils) so it serves both runners; the bare-metal runner now injects the key after the genesis rollback, threads =SSH_KEY_OPT= + a new =--port= through every ssh/scp, and validates via =run_testinfra_validation= instead of the shell sweep. Follow-up fix =fb495d4=: =set +e= around the validator (it returns pytest's rc, which under =set -e= aborted before the report) — caught by the smoke test. Validated against the ZFS VM (=--validate-only=, localhost:2222): connectivity, ZFS check, key auth, Testinfra connect+run, report all work; a green bare-metal install still needs real ZFS hardware. - -P-E (=a4a339b=): deleted the dead shell sweep from =validation.sh= now both runners use Testinfra — run_all_validations, validate_all_services, run_full_validation, the ~35 validate_* checks, validation_pass/fail/warn/skip. Kept the live helpers (ssh_cmd, attribute_issue, capture_pre/post_install_state, analyze_log_diff, categorize_errors, generate_issue_report, VALIDATION_* counters + arrays). 1156 → 314 lines. Verified: no dangling refs, both runners parse + smoke-run clean, unit suite green. - -Known follow-ups (not blockers): (1) archangel still owes the pacman.conf-0600 root-cause fix (handoff in its inbox; archsetup's chmod is the defensive layer). (2) The bare-metal runner runs =bash archsetup= with no --config-file — pre-existing, would prompt on real hardware; out of this epic's scope. (3) A true green bare-metal run needs real ZFS hardware (ratio). - -** DONE [#B] Implement Testinfra test suite for archsetup -CLOSED: [2026-06-25 Thu] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-24 -:END: -*** 2026-06-25 Thu @ Final fresh make test GREEN — Testinfra is the validator -=make test= (fresh build, 150-min cap) PASSED: =TEST PASSED=, =Validation: PASSED=, pytest =96 passed, 10 skipped, 0 failed, 0 errors=, pytest as the authoritative gate. ParallelDownloads now =10= on the fixed build. 
End-state: the VM test runner validates post-install via the Testinfra/pytest sweep (=scripts/testing/tests/=, 88 tests + conftest fixtures) — full parity with the old shell sweep plus expansion coverage (sshd hardening, =backup_system_file= .bak files, applied pacman/makepkg/NM/fail2ban/reflector config). Three real bugs surfaced + fixed by this work: (1) the 2026-06-24 sshd hardening had silently broken =make test= (root password SSH died mid-run → key auth, f50fc1d); (2) =ParallelDownloads= stuck at Arch's default 5 (sed only matched the commented form → fixed, 2d63802); (3) install monitor cap too tight at 90 min (→ 150, fe84b71). Follow-up filed: migrate =run-test-baremetal.sh= off the shell sweep, then delete the dead =validation.sh= functions (P5). -*** 2026-06-25 Thu @ Decision: port to Testinfra + expand coverage, design doc first -Reviewed against the existing harness: =scripts/testing/lib/validation.sh= already runs ~14 post-install checks (=run_all_validations=), so this isn't net-new capability — it's porting that shell validation to Testinfra/pytest for better expressiveness + reporting, then growing coverage. Craig's call (prioritizes test investment over feature speed): do the port and expand. Starting with a design doc in =docs/design/= per the task's own "design doc not yet written" note. Stale slice to drop/rescope: the X11/startx end-to-end tests (fleet is Wayland/Hyprland now). 
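The =ParallelDownloads= bug listed above (the sed matched only the commented form) is easy to reproduce on a scratch file. The sketch below assumes GNU sed and uses the =^#\?ParallelDownloads= pattern quoted in the P4 log entry; the replacement text and the scratch file are illustrative assumptions, not the exact line in archsetup.

```sh
#!/usr/bin/env bash
# Sketch only: the replacement text and scratch file are assumptions.
conf="$(mktemp)"
printf '[options]\nParallelDownloads = 5\n' > "$conf"

# Old pattern: only the commented form matches, so an uncommented line stays at 5.
sed 's/^#ParallelDownloads.*/ParallelDownloads = 10/' "$conf" | grep ParallelDownloads

# Fixed pattern: \? (a GNU sed extension) makes the leading # optional,
# so both the commented and the uncommented form are rewritten.
sed -i 's/^#\?ParallelDownloads.*/ParallelDownloads = 10/' "$conf"
grep ParallelDownloads "$conf"
```

The first =grep= still prints the value 5 and the second prints 10, which is the difference the pytest check caught.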
-*** 2026-06-25 Thu @ 00:54:22 -0400 P1 scaffold landed (advisory, alongside shell sweep) -Built the Testinfra harness skeleton: =scripts/testing/tests/= (conftest.py with the attribution marker + report hook + =target_user= fixture; 3 parity checks — user exists/shell, ufw enabled, dotfiles stowed+readable), =scripts/testing/lib/testinfra.sh= (=run_testinfra_validation=: ephemeral-key injection, ssh-config, pytest-over-SSH; advisory + non-fatal, =RUN_TESTINFRA= toggle), wired into run-test.sh after the shell sweep, and added =python-pytest python-pytest-testinfra= to =make deps=. Verified on host: py_compile clean, =pytest --collect-only= green in a throwaway venv (4 tests, fixtures resolve), =bash -n= + shellcheck clean, unit suite still green. Integration (the pytest sweep actually running against a VM) is unverified here — needs a =make test= run. Decisions locked: inject test key; run both through parity; full expansion (P4) in this task after the P3 cutover. -*** 2026-06-25 Thu @ 01:12:09 -0400 P2 full parity port (88 tests) -Ported the whole shell sweep to pytest: test_users (exists/shell/15 groups parametrized), test_packages (yay+functional, pacman, terminus-font, emacs+config readable, git, 5 dev tools), test_services (required enabled/active, enabled-only, timers, optional skip-if-absent, DoT drop-in, fail2ban/nmcli responds, log-cleanup cron, syncthing lingering, DNS/mDNS/docker skips), test_desktop (Hyprland tools+configs+portal+socket gated on install/compositor, DWM suckless, autologin), test_boot (grub, mkinitcpio hooks branched on zfs_root, console-font-in-initramfs, nvme gated, zfs/sanoid), test_keyring (dir 700/owner/default=login), test_archsetup (log no Error:, ≥12 state markers). conftest fixtures: target_user/home/zfs_root/has_nvme/hyprland_installed/dwm_installed/compositor_running/on_slirp. 88 tests collected, py_compile clean. Correctness fix vs the shell sweep: check =awww= not the stale =swww=. 
Installed python-pytest-testinfra on velox so the harness gate passes. Next: VM run to diff pytest vs shell sweep for parity. -*** 2026-06-25 Thu @ 01:24:11 -0400 Fixed: sshd hardening had silently broken =make test= -VM run #1 aborted ~6 min in (Error 5), before any validation ran. Root cause (pre-existing, not the Testinfra work): the 2026-06-24 sshd hardening sets =PermitRootLogin prohibit-password= + reloads sshd mid-install, and the harness SSHes as root by *password* throughout — so every op after that step got "Permission denied" and run-test.sh fataled before validations. Fix: =inject_root_key= authorizes a throwaway root key right after first SSH (before archsetup runs) and all helpers (=wait_for_ssh=/=vm_exec=/=copy_to_vm=/=copy_from_vm=/=ssh_cmd=) gained =$SSH_KEY_OPT= so they use key auth, which =prohibit-password= still allows. testinfra.sh reuses that key. Additive (password stays as fallback). bash -n + shellcheck clean. Re-running the VM suite to confirm it now reaches the validation + pytest phases. -*** 2026-06-25 Thu @ 03:33:33 -0400 Parity proven + P4 expansion validated on a live VM -VM run #3 (=make test-keep=, kept VM up): pytest parity = 78 passed / 10 skipped / 0 fail / 0 err — matches & exceeds the shell sweep (53/0/0). Then built P4 expansion against the live VM (iterating in ~30s, no rebuild): test_hardening (sshd prohibit-password, sysctl printk, /etc/issue emptied, vconsole font, /efi fmask), test_config_applied (pacman ParallelDownloads/Color/multilib, makepkg MAKEFLAGS/OPTIONS, NM dns+wifi-privacy drop-ins, fail2ban jail, reflector), test_backups (=.archsetup.bak= present for pacman.conf/makepkg.conf/sudoers/mkinitcpio.conf — end-to-end proof of the backup feature). Full suite vs live VM: 95 passed / 10 skipped / 1 fail. 
The 1 fail = a REAL archsetup bug the tests caught: =ParallelDownloads= stayed at the Arch default 5 because the sed only matched a commented =#ParallelDownloads=, but current Arch ships it uncommented — fixed the sed to match both (=^#\?ParallelDownloads=). Also fixed a test bug (=grep -qx '[multilib]'= → =grep -Fxq=, the brackets were a regex char class). Remaining: P3 cutover (pytest authoritative) + P5 retire shell sweep, then a final fresh =make test=. -*** 2026-06-25 Thu @ 03:38:28 -0400 P3 cutover: Testinfra is now the authoritative validator -run-test.sh dropped the =run_all_validations= + =validate_all_services= shell-sweep calls; =run_testinfra_validation= now drives =TEST_PASSED= (returns pytest's rc; "couldn't run" = fail, not a silent pass). It surfaces pytest's pass/skip/fail counts through the shared =VALIDATION_*= counters and parses =testinfra-attribution.txt= into the issue arrays so =generate_issue_report= still buckets failures archsetup/base/unknown. Validated the failure path against the still-up VM: pytest rc=1, failure correctly bucketed to [archsetup]. P5 (physically delete the dead shell-sweep functions) is NOT done here — =run-test-baremetal.sh= still calls =run_all_validations=/=validate_all_services=, so deletion must wait until the bare-metal runner is migrated too (filed below). Final step: fresh =make test= to confirm the pass path (ParallelDownloads now 10) with pytest as the gate. -*** 2026-06-25 Thu @ 08:35:26 -0400 Final run hit the harness 90-min install cap (not a regression) -The fresh =make test= timed out at 9/12 steps while building =vagrant= from AUR (=ARCHSETUP timed out after 90 minutes=, exit 124), so validation ran against a half-installed system → 10 pytest failures, all late-step (issue/sysctl/vconsole/mkinitcpio/docker/state-markers). The suite worked correctly — it caught an incomplete install. Verified my ParallelDownloads sed is clean (no pacman corruption) and archsetup logged 0 errors. 
Root cause: =MAX_POLLS=180= (90 min) is too tight for a full install with heavy AUR builds; bumped to 300 (150 min). Re-running. -Create comprehensive integration tests using Testinfra (Python + pytest) to validate archsetup installations - -Tests should cover: -- Smoke tests: user created, key packages installed, dotfiles present -- Integration tests: services running, configs valid, X11 starts, apps launch -- End-to-end tests: login as user, startx, open terminal, run emacs, verify workflows - -Framework: Testinfra with pytest (SSH-native, built-in modules for files/packages/services/commands) -Location: scripts/testing/tests/ directory -Integration: Run via pytest against test VMs after archsetup completes -Benefits: Expressive Python tests, excellent reporting, can test interactive scenarios - -A design doc (not yet written) should cover: -- Complete example test suite (test_integration.py) -- Tiered testing strategy (smoke/integration/end-to-end) -- How to run tests and integrate with run-test.sh -- Comparison with alternatives (Goss) - ** TODO [#B] Set up automated test schedule :PROPERTIES: :LAST_REVIEWED: 2026-05-21 @@ -788,6 +720,33 @@ Parse yay errors and provide specific, actionable fixes instead of generic error Enhance existing indicators to show what's happening in real-time ** TODO Manual testing and validation +*** Live-update guard aborts a GPU/compositor upgrade while Hyprland runs +What we're verifying: the pacman PreTransaction hook =hypr-live-update-guard= aborts a =-Syu= that swaps GPU/compositor libs while Hyprland is live, and stays quiet once the session is stopped. Unit tests cover the script's decision logic; this confirms pacman parses the hook, feeds the matched targets on stdin (=NeedsTargets=), and =AbortOnFail= actually stops the transaction. Run on a Hyprland box (ratio/velox). +- Prereq on machines installed before this shipped: place the guard if missing (a fresh archsetup install does this in the hyprland step). 
+#+begin_src sh :results output +if [ ! -e /usr/local/bin/hypr-live-update-guard ]; then + sudo cp ~/code/archsetup/scripts/hypr-live-update-guard /usr/local/bin/ && sudo chmod 755 /usr/local/bin/hypr-live-update-guard +fi +sudo cp ~/code/archsetup/scripts/hypr-live-update-guard /usr/local/bin/ # refresh +ls -l /usr/local/bin/hypr-live-update-guard /etc/pacman.d/hooks/hypr-live-update-guard.hook 2>&1 +#+end_src +- Quick contract check (no pending upgrade needed): feed the script the hook's stdin contract with Hyprland running. +#+begin_src sh :results output +printf 'mesa\nhyprland\n' | /usr/local/bin/hypr-live-update-guard; echo "exit=$?" +#+end_src +Expected: exit=1, plus the BLOCKED banner naming mesa/hyprland and the from-a-TTY remedy. +- Real firing inside pacman: with a mesa/hyprland/wayland/GPU-driver upgrade actually pending AND Hyprland running, run the upgrade. +#+begin_src sh :results output +sudo pacman -Syu +#+end_src +Expected: pacman runs the "Checking for a live Hyprland session..." hook and aborts; no packages upgraded. +- The from-a-TTY path: the guard keys off the Hyprland *process*, so switching VTs while Hyprland still runs does NOT clear it (correct -- the session is still vulnerable). Fully log out of Hyprland (or =hyprctl dispatch exit=) so no Hyprland process remains, then from the console/display-manager run the upgrade again. +Expected: the guard stays quiet and the upgrade completes. +- Override while running (escape hatch): +#+begin_src sh :results output +sudo touch /run/archsetup-allow-live-gpu-update && echo "sentinel set" +#+end_src +Expected: with the sentinel present, =sudo pacman -Syu= proceeds despite Hyprland running. (The sentinel clears on reboot -- /run is tmpfs.) *** Wallpaper survives relogin (waypaper --restore) What we're verifying: the hyprland =exec-once= now runs =waypaper --restore= instead of a hardcoded =awww img=, so a wallpaper chosen via =set-wallpaper= / waypaper / dirvish persists across a relogin. 
The exec-once only fires at Hyprland startup, so this can't be confirmed without a real relogin. (Mechanism already verified: =waypaper --restore= applied the persisted wallpaper via the awww backend, exit 0.) - Set a wallpaper different from the current one (or pick one in waypaper, Super+Shift+P): @@ -851,39 +810,9 @@ A 2026-06-22 roam capture expands the scope past a passive indicator: the wifi m :END: From the roam inbox (2026-06-22): with Emacs integrated into the system as file manager and instant note-taker, make bouncing it trivial. A waybar component showing the emacs service status, with detail on hover, that turns the server on / off / bounce via right-click. Pairs with running the Emacs daemon as a managed systemd user service. -** TODO [#C] Collapse waybar sysmonitor to a single icon + hover :feature:waybar: -:PROPERTIES: -:LAST_REVIEWED: 2026-06-24 -:END: -From the roam inbox (2026-06-22): replace the spread-out sysmonitor readouts (temp, cpu, mem, storage) with one visible icon showing a single chosen metric, the rest in the hover tooltip. Open question: fold it into the battery component instead of a standalone module. Implementation lives in the waybar config under ~/.dotfiles. - -** DONE [#C] Proton Mail Bridge font size :chore:quick: -CLOSED: [2026-06-24 Wed] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-24 -:END: -From the roam inbox (2026-06-22): adjust the Proton Mail Bridge UI font to a comfortable size. The bridge is a Qt app, so it likely keys off Qt scaling or the qt5ct/qt6ct config like the other Qt apps (QT_SCALE_FACTOR or a font setting). - -Done 2026-06-24 (dotfiles =hyprland.conf:47=): the bridge is a Qt6 *QML* app, so it ignores the qt6ct General font — bumped the UI font via =QT_FONT_DPI= on the autostart instead. Changed the exec-once to =env QT_FONT_DPI=108 protonmail-bridge --no-window= (default DPI is 96; 108 = 1.125x). Iterated live with Craig: 120 too big, 108 comfortable. 
hyprland.conf is a stow symlink so the change is already live; applies at every login. The =~/.config/autostart/Proton Mail Bridge.desktop= entry is dormant under Hyprland (no XDG-autostart), so it was left as-is. - -** TODO [#C] Rename idle inhibitor to something more intuitive :chore:waybar: -:PROPERTIES: -:LAST_REVIEWED: 2026-06-24 -:END: -From the roam inbox (2026-06-24): the "idle inhibitor" name doesn't work as a mnemonic — something like "sleep" (i.e. "keep awake" / "no-sleep") would land better. Decide the new name, then rename across the touchpoints: the =custom/idle= waybar module, the keybind mnemonic, and the backing script names (=hypridle-toggle= / =waybar-idle= from the 2026-06-24 idle-inhibitor work). Needs Craig's call on the name first, so not solo. - ** TODO [#C] set-wallpaper detaches waypaper config from its stow symlink :bug:hyprland:quick: =set-wallpaper= persists with =mv "$tmp" "$CONFIG"=, which replaces the =~/.config/waypaper/config.ini= stow symlink with a real file. After the first run the live config is detached from =~/.dotfiles/hyprland/.config/waypaper/config.ini=, so a later =git pull= + restow won't update it and set-wallpaper changes never flow back to the repo. Fix: write in place rather than =mv= over the symlink — e.g. =cp "$tmp" "$CONFIG"= (follows the symlink to the real dotfiles file), or resolve the link target and write there. Lives in =~/.dotfiles/hyprland/.local/bin/set-wallpaper=; it has a test suite, so add a Boundary case for "CONFIG is a symlink". -** DONE [#C] Wallpaper login-restore is hardcoded, not waypaper --restore :hyprland:quick:solo: -CLOSED: [2026-06-24 Wed] -:PROPERTIES: -:LAST_REVIEWED: 2026-06-24 -:END: -The Hyprland =exec-once= (=hyprland.conf:26=) restores the wallpaper with a hardcoded =awww img ~/pictures/wallpaper/trondheim-norway.jpg=, so any wallpaper set later (via =set-wallpaper=, waypaper, or the dirvish =bg=) reverts on relogin. 
=set-wallpaper= now persists the choice to =waypaper/config.ini=, so switch the exec-once to =waypaper --restore= (after =awww-daemon= is up) to make set wallpapers survive a relogin. Small, dotfiles-only; verify by setting a different wallpaper, relogging, and confirming it sticks. - -Done 2026-06-24 (dotfiles): swapped the line-26 exec-once from the hardcoded =awww img …/trondheim-norway.jpg= to =awww-daemon & sleep 1 && waypaper --restore=. waypaper has a real =awww= backend (in its =--backend= list), the stowed =waypaper/config.ini= carries =backend = awww= plus a default =wallpaper == line, so =--restore= works on a fresh install too. Mechanism verified live: =waypaper --restore= reapplied the persisted wallpaper via awww, exit 0. Relogin confirmation filed under "Manual testing and validation". Follow-up filed: =set-wallpaper='s =mv= detached the live =waypaper/config.ini= from its stow symlink, so set-wallpaper changes no longer flow back to dotfiles. - * Archsetup Resolved ** DONE [#B] Full install logs should contain timestamps @@ -1465,3 +1394,142 @@ Findings (2026-06-24): the Wayland wallpaper utility on this setup is =awww= (wa Done 2026-06-24 (dotfiles 8be2484): added =set-wallpaper <image>= to the hyprland tier — sets live via =awww img= and persists the choice into =waypaper/config.ini=, the single Wayland-correct entry point. Resolves relative paths, validates the file, exits non-zero without persisting if awww fails. 8 Normal/Boundary/Error tests green; live-verified (awww set it, config rewrote). Notified =.emacs.d= to point the dirvish =bg= command at =set-wallpaper <file>= — that wiring is its piece (dependency cleared, =:blocker:= dropped). Follow-up (separate, small): the login restore =exec-once= in =hyprland.conf= is hardcoded to =trondheim-norway.jpg=, so a wallpaper set via =set-wallpaper= shows live but won't survive a relogin until the exec-once becomes =waypaper --restore= (which reads the now-persisted config). Filed below. 
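The symlink detachment noted for =set-wallpaper= can be reproduced in a scratch directory. The sketch below is an illustration, not the script itself: the directory layout stands in for the stow setup, and the file contents are made up. It shows that =cp= writes through a symlink while =mv= replaces it.

```sh
#!/usr/bin/env bash
# Scratch layout standing in for ~/.dotfiles (real file) and ~/.config (stow symlink).
work="$(mktemp -d)"
mkdir -p "$work/dotfiles" "$work/config"
printf 'wallpaper = old.jpg\n' > "$work/dotfiles/config.ini"
ln -s "$work/dotfiles/config.ini" "$work/config/config.ini"

# cp writes through the symlink: the link survives and the repo file is updated.
tmp="$work/tmp.ini"
printf 'wallpaper = new.jpg\n' > "$tmp"
cp "$tmp" "$work/config/config.ini"
[ -L "$work/config/config.ini" ] && echo "after cp: still a symlink"

# mv renames over the symlink: the link is replaced by a regular file,
# and the repo copy stops receiving updates.
printf 'wallpaper = newer.jpg\n' > "$tmp"
mv "$tmp" "$work/config/config.ini"
[ -L "$work/config/config.ini" ] || echo "after mv: detached regular file"
```

The temp file is created inside the same scratch directory so that =mv= is a plain rename, which is the case that silently swaps the symlink for a regular file.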
+** DONE [#B] Add backup before system file modifications :solo: +CLOSED: [2026-06-25 Thu] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-24 +:END: +Safety net for /etc/X11/xorg.conf.d and other system file edits +Files like ~/etc/sudoers~, ~/etc/pacman.conf~, ~/etc/default/grub~ modified without backup +If modifications fail or are incorrect, difficult to recover - should backup files to ~.backup~ before modifying + +Done 2026-06-25: added a =backup_system_file <path>= helper next to =safe_rm_rf= — it snapshots a pre-existing file to =<path>.archsetup.bak= before an in-place edit, idempotent (never clobbers an existing backup, so the pristine original survives repeated edits and re-runs), =cp -p= to preserve mode/ownership, no-op when the file is absent. Took the narrow scope (Craig's call): route only the in-place =sed -i= / append edits to *pre-existing* files through it — locale.gen, makepkg.conf, pacman.conf, sudoers, conf.d/wireless-regdom, geoclue.conf, conf.d/pacman-contrib, fstab, mkinitcpio.conf, vconsole.conf — and skip the brand-new drop-in files archsetup fully owns (nothing to back up; recovery is just deleting them). Tests: =tests/backup-system-file/= (7 Normal/Boundary/Error, incl. mode-preserved, existing-backup-not-overwritten, missing-target no-op, cp-failure). =make test-unit= green across all 5 suites; =bash -n= clean; only shellcheck note is the known SC2329 false positive (indirect STEPS dispatch). Integration verification is the next VM run. +** DONE [#B] Migrate bare-metal test runner to Testinfra, then delete the shell sweep :test: +CLOSED: [2026-06-25 Thu] +Plan + ZFS-coverage expansion: [[file:docs/design/2026-06-25-zfs-vm-test-coverage.org]] (build a ZFS base VM via archangel + a =FS_PROFILE= selector so =make test= covers the ZFS path, then migrate this runner to key auth + Testinfra against it, then delete the dead =validation.sh= functions = phase E here). 
+=run-test.sh= (VM) now uses the Testinfra/pytest sweep as its authoritative validator, but =run-test-baremetal.sh= (lines ~243-244) still calls the old =run_all_validations= / =validate_all_services= from =scripts/testing/lib/validation.sh=. Migrate the bare-metal runner to =run_testinfra_validation= too (same key + ssh-config approach, adapted for a real host), then delete the now-dead shell-sweep functions from =validation.sh=. Keep the live helpers: =ssh_cmd=, =attribute_issue=, =capture_pre/post_install_state=, =analyze_log_diff=, =categorize_errors=, =generate_issue_report=, and the =VALIDATION_*= counters/arrays. Deferred from the Testinfra cutover because it needs a bare-metal test loop to validate, out of scope for the VM-only autonomous run. +*** 2026-06-25 Thu @ 12:37:02 -0400 P-A/P-B shipped (FS_PROFILE selector); P-C blocked on archangel ZFS-install bug +P-A + P-B landed in =353b179=: =archsetup-test-zfs.conf= (archangel ZFS config) + an =FS_PROFILE= (btrfs default / zfs) selector across =vm-utils.sh= (=init_vm_paths= derives a per-profile image + validates the profile), =create-base-vm.sh= (selects the archangel config), =run-test.sh= (--help + profile display), and the Makefile (=make test FS_PROFILE=zfs=). Design simplification recorded: no =archsetup-vm-zfs.conf= needed — archsetup auto-detects ZFS from the live root via =is_zfs_root()=, so the archsetup run config is shared; only the archangel base config + base image differ. Open Q1 resolved: archangel supports ZFS root natively (it's the default FS). + +P-C (build the ZFS base image) is BLOCKED on archangel. =create-base-vm.sh FS_PROFILE=zfs= built the disk + booted the archangel ISO fine, but the archangel install died: =dkms install zfs/2.3.3 -k 6.18.36-1-lts= exited 1, ZFS module not built. 
Root cause is in archangel, not archsetup: it appends the [archzfs] experimental repo then runs =pacstrap -K= with no =pacman -Sy= refresh, so it uses the archzfs sync db baked into the Feb-2026 ISO (zfs-dkms 2.3.3) while linux-lts is pulled fresh (6.18.36). 2.3.3 doesn't build against 6.18. velox runs zfs-dkms 2.4.2 on the same kernel from the same channel, so the fix exists upstream — archangel just needs to refresh the db before pacstrap (+ a fresh ISO). Bug + dependency handoff sent to archangel inbox (=2026-06-25-1236-from-archsetup-bug-zfs-install-fails-stale-baked.org=). Retry P-C once a fixed archangel ISO is available. P-D (bare-metal migration code) is still workable in the meantime against the btrfs VM / velox. + +*** 2026-06-25 Thu @ 16:05:07 -0400 archangel unblocked; ZFS base built; 3 archsetup bugs fixed (local); re-run paused +archangel shipped the fix (archangel =89691a0=: =pacman -Syy= before pacstrap) + rebuilt the ISO. With it, =create-base-vm.sh FS_PROFILE=zfs= built a verified ZFS-root base (=archsetup-base-zfs.qcow2=, clean-install snapshot, kernel 6.18.36). =make test FS_PROFILE=zfs= then surfaced three real archsetup bugs against the current archangel base, each fixed in a LOCAL (unpushed) commit: +- =8ed42b9= informant: the base ships informant; its pacman PreTransaction hook (AbortOnFail) blocked archsetup's first transaction. Fix: =informant read --all= up front (guarded). PROVEN. +- =66caeb5= pacman.conf perms: the base ships =/etc/pacman.conf= 0600 (archangel =strip_repo_stanza= mktemp+mv clobbers perms), breaking user =makepkg=/=yay=. Fix: =chmod 644= after archsetup's edits. PROVEN (run reached 75 min deep). +- =05ec096= reflector: archsetup configured reflector's timer but never ran it, so installs used the base's 425-mirror worldwide list and pacman stalled ~15 min on a slow/unresponsive mirror. Fix: run reflector once before the heavy installs (=timeout=-bounded, non-fatal). 
NOT yet integration-proven — the next re-run validates it.
+Second archangel handoff sent for the pacman.conf-0600 root cause (=2026-06-25-1440-...=); archsetup's chmod is defensive, archangel should ship 0644. Paused before the re-run at Craig's request (he starts =sudo make test FS_PROFILE=zfs= from the laptop). Possible harness-side factor on the stall: slirp IPv6 blackholing (one stalled conn was IPv6) — watch if it recurs despite reflector.
+
+*** 2026-06-25 Thu @ 21:56:12 -0400 P-C GREEN — ZFS VM test path passes end to end
+=make test FS_PROFILE=zfs= PASSED: archsetup exit 0 (full ~68-min ZFS install, reflector held — no stall), pytest =95 passed, 0 failed, 11 skipped=. The ZFS-conditional checks now run the ZFS branch instead of skipping: =test_bootloader_installed= (ZFSBootMenu EFI binary at /efi/EFI/ZBM), =test_mkinitcpio_hooks= (zfs udev hook), =test_console_font_configured= (vconsole.conf), =test_zfs_has_sanoid= all PASS; =test_backup_created_for_mkinitcpio= correctly SKIPs (ZFS+virtio edits nothing). The 3 archsetup issues (gamemode, mu, signal-cli AUR) are the known non-critical residuals, same as on btrfs. Four commits pushed to main: =8ed42b9= informant news-hook, =66caeb5= pacman.conf 0644, =05ec096= reflector-during-install, =eb379c3= ZFS-aware boot/backup tests. P-C (ZFS coverage, design phases A-C) is DONE. Remaining on this task: P-D (migrate run-test-baremetal.sh to inject_root_key + run_testinfra_validation) and P-E (delete the dead validation.sh shell sweep).
+*** 2026-06-25 Thu @ 23:26:02 -0400 P-D + P-E done — whole epic closed
+P-D (=771b92e=): migrated =run-test-baremetal.sh= to key auth + Testinfra. =inject_root_key= generalized to =root@$VM_IP= (vm-utils) so it serves both runners; the bare-metal runner now injects the key after the genesis rollback, threads =SSH_KEY_OPT= + a new =--port= through every ssh/scp, and validates via =run_testinfra_validation= instead of the shell sweep. Follow-up fix =fb495d4=: =set +e= around the validator (it returns pytest's rc, which under =set -e= aborted before the report) — caught by the smoke test. Validated against the ZFS VM (=--validate-only=, localhost:2222): connectivity, ZFS check, key auth, Testinfra connect+run, report all work; a green bare-metal install still needs real ZFS hardware.
+
+P-E (=a4a339b=): deleted the dead shell sweep from =validation.sh= now that both runners use Testinfra — run_all_validations, validate_all_services, run_full_validation, the ~35 validate_* checks, validation_pass/fail/warn/skip. Kept the live helpers (ssh_cmd, attribute_issue, capture_pre/post_install_state, analyze_log_diff, categorize_errors, generate_issue_report, VALIDATION_* counters + arrays). 1156 → 314 lines. Verified: no dangling refs, both runners parse + smoke-run clean, unit suite green.
+
+Known follow-ups (not blockers): (1) archangel still owes the pacman.conf-0600 root-cause fix (handoff in its inbox; archsetup's chmod is the defensive layer). (2) The bare-metal runner runs =bash archsetup= with no --config-file — pre-existing, would prompt on real hardware; out of this epic's scope. (3) A true green bare-metal run needs real ZFS hardware (ratio).
+** DONE [#B] Implement Testinfra test suite for archsetup
+CLOSED: [2026-06-25 Thu]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+*** 2026-06-25 Thu @ Final fresh make test GREEN — Testinfra is the validator
+=make test= (fresh build, 150-min cap) PASSED: =TEST PASSED=, =Validation: PASSED=, pytest =96 passed, 10 skipped, 0 failed, 0 errors=, pytest as the authoritative gate. ParallelDownloads now =10= on the fixed build. End-state: the VM test runner validates post-install via the Testinfra/pytest sweep (=scripts/testing/tests/=, 88 tests + conftest fixtures) — full parity with the old shell sweep plus expansion coverage (sshd hardening, =backup_system_file= .bak files, applied pacman/makepkg/NM/fail2ban/reflector config). Three real bugs surfaced + fixed by this work: (1) the 2026-06-24 sshd hardening had silently broken =make test= (root password SSH died mid-run → key auth, f50fc1d); (2) =ParallelDownloads= stuck at Arch's default 5 (sed only matched the commented form → fixed, 2d63802); (3) install monitor cap too tight at 90 min (→ 150, fe84b71). Follow-up filed: migrate =run-test-baremetal.sh= off the shell sweep, then delete the dead =validation.sh= functions (P5).
+*** 2026-06-25 Thu @ Decision: port to Testinfra + expand coverage, design doc first
+Reviewed against the existing harness: =scripts/testing/lib/validation.sh= already runs ~14 post-install checks (=run_all_validations=), so this isn't net-new capability — it's porting that shell validation to Testinfra/pytest for better expressiveness + reporting, then growing coverage. Craig's call (prioritizes test investment over feature speed): do the port and expand. Starting with a design doc in =docs/design/= per the task's own "design doc not yet written" note. Stale slice to drop/rescope: the X11/startx end-to-end tests (fleet is Wayland/Hyprland now).
+*** 2026-06-25 Thu @ 00:54:22 -0400 P1 scaffold landed (advisory, alongside shell sweep)
+Built the Testinfra harness skeleton: =scripts/testing/tests/= (conftest.py with the attribution marker + report hook + =target_user= fixture; 3 parity checks — user exists/shell, ufw enabled, dotfiles stowed+readable), =scripts/testing/lib/testinfra.sh= (=run_testinfra_validation=: ephemeral-key injection, ssh-config, pytest-over-SSH; advisory + non-fatal, =RUN_TESTINFRA= toggle), wired into run-test.sh after the shell sweep, and added =python-pytest python-pytest-testinfra= to =make deps=. Verified on host: py_compile clean, =pytest --collect-only= green in a throwaway venv (4 tests, fixtures resolve), =bash -n= + shellcheck clean, unit suite still green. Integration (the pytest sweep actually running against a VM) is unverified here — needs a =make test= run. Decisions locked: inject test key; run both through parity; full expansion (P4) in this task after the P3 cutover.
+*** 2026-06-25 Thu @ 01:12:09 -0400 P2 full parity port (88 tests)
+Ported the whole shell sweep to pytest: test_users (exists/shell/15 groups parametrized), test_packages (yay+functional, pacman, terminus-font, emacs+config readable, git, 5 dev tools), test_services (required enabled/active, enabled-only, timers, optional skip-if-absent, DoT drop-in, fail2ban/nmcli responds, log-cleanup cron, syncthing lingering, DNS/mDNS/docker skips), test_desktop (Hyprland tools+configs+portal+socket gated on install/compositor, DWM suckless, autologin), test_boot (grub, mkinitcpio hooks branched on zfs_root, console-font-in-initramfs, nvme gated, zfs/sanoid), test_keyring (dir 700/owner/default=login), test_archsetup (log no Error:, ≥12 state markers). conftest fixtures: target_user/home/zfs_root/has_nvme/hyprland_installed/dwm_installed/compositor_running/on_slirp. 88 tests collected, py_compile clean. Correctness fix vs the shell sweep: check =awww= not the stale =swww=. Installed python-pytest-testinfra on velox so the harness gate passes. Next: VM run to diff pytest vs shell sweep for parity.
+*** 2026-06-25 Thu @ 01:24:11 -0400 Fixed: sshd hardening had silently broken =make test=
+VM run #1 aborted ~6 min in (Error 5), before any validation ran. Root cause (pre-existing, not the Testinfra work): the 2026-06-24 sshd hardening sets =PermitRootLogin prohibit-password= + reloads sshd mid-install, and the harness SSHes as root by *password* throughout — so every op after that step got "Permission denied" and run-test.sh fataled before validations. Fix: =inject_root_key= authorizes a throwaway root key right after first SSH (before archsetup runs) and all helpers (=wait_for_ssh=/=vm_exec=/=copy_to_vm=/=copy_from_vm=/=ssh_cmd=) gained =$SSH_KEY_OPT= so they use key auth, which =prohibit-password= still allows. testinfra.sh reuses that key. Additive (password stays as fallback). bash -n + shellcheck clean. Re-running the VM suite to confirm it now reaches the validation + pytest phases.
+*** 2026-06-25 Thu @ 03:33:33 -0400 Parity proven + P4 expansion validated on a live VM
+VM run #3 (=make test-keep=, kept VM up): pytest parity = 78 passed / 10 skipped / 0 fail / 0 err — matches & exceeds the shell sweep (53/0/0). Then built P4 expansion against the live VM (iterating in ~30s, no rebuild): test_hardening (sshd prohibit-password, sysctl printk, /etc/issue emptied, vconsole font, /efi fmask), test_config_applied (pacman ParallelDownloads/Color/multilib, makepkg MAKEFLAGS/OPTIONS, NM dns+wifi-privacy drop-ins, fail2ban jail, reflector), test_backups (=.archsetup.bak= present for pacman.conf/makepkg.conf/sudoers/mkinitcpio.conf — end-to-end proof of the backup feature). Full suite vs live VM: 95 passed / 10 skipped / 1 fail. The 1 fail = a REAL archsetup bug the tests caught: =ParallelDownloads= stayed at the Arch default 5 because the sed only matched a commented =#ParallelDownloads=, but current Arch ships it uncommented — fixed the sed to match both (=^#\?ParallelDownloads=). Also fixed a test bug (=grep -qx '[multilib]'= → =grep -Fxq=, the brackets were a regex char class). Remaining: P3 cutover (pytest authoritative) + P5 retire shell sweep, then a final fresh =make test=.
+*** 2026-06-25 Thu @ 03:38:28 -0400 P3 cutover: Testinfra is now the authoritative validator
+run-test.sh dropped the =run_all_validations= + =validate_all_services= shell-sweep calls; =run_testinfra_validation= now drives =TEST_PASSED= (returns pytest's rc; "couldn't run" = fail, not a silent pass). It surfaces pytest's pass/skip/fail counts through the shared =VALIDATION_*= counters and parses =testinfra-attribution.txt= into the issue arrays so =generate_issue_report= still buckets failures archsetup/base/unknown. Validated the failure path against the still-up VM: pytest rc=1, failure correctly bucketed to [archsetup]. P5 (physically delete the dead shell-sweep functions) is NOT done here — =run-test-baremetal.sh= still calls =run_all_validations=/=validate_all_services=, so deletion must wait until the bare-metal runner is migrated too (filed below). Final step: fresh =make test= to confirm the pass path (ParallelDownloads now 10) with pytest as the gate.
+*** 2026-06-25 Thu @ 08:35:26 -0400 Final run hit the harness 90-min install cap (not a regression)
+The fresh =make test= timed out at 9/12 steps while building =vagrant= from AUR (=ARCHSETUP timed out after 90 minutes=, exit 124), so validation ran against a half-installed system → 10 pytest failures, all late-step (issue/sysctl/vconsole/mkinitcpio/docker/state-markers). The suite worked correctly — it caught an incomplete install. Verified my ParallelDownloads sed is clean (no pacman corruption) and archsetup logged 0 errors. Root cause: =MAX_POLLS=180= (90 min) is too tight for a full install with heavy AUR builds; bumped to 300 (150 min). Re-running.
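Editor's sketch of the two pattern pitfalls recorded in the 03:33:33 entry above (the =ParallelDownloads= sed and the =[multilib]= grep), reproduced with Python's =re= module. The real fixes are shell one-liners in archsetup and the test suite; the helper names and sample config lines below are hypothetical and exist only for illustration.

```python
import re

# Hypothetical pacman.conf fragments: the option commented out vs. shipped
# uncommented (the form the log says current Arch uses).
COMMENTED = "#ParallelDownloads = 5\n"
UNCOMMENTED = "ParallelDownloads = 5\n"

# A pattern that only matches the commented form (the bug) and one that
# matches both. sed's BRE `#\?` corresponds to `#?` in Python's syntax.
COMMENTED_ONLY = re.compile(r"^#ParallelDownloads.*$", re.MULTILINE)
EITHER_FORM = re.compile(r"^#?ParallelDownloads.*$", re.MULTILINE)


def set_parallel_downloads(conf: str, pattern: re.Pattern, value: int = 10) -> str:
    """Rewrite the ParallelDownloads line; conf is returned unchanged on no match."""
    return pattern.sub(f"ParallelDownloads = {value}", conf)


def has_exact_line(text: str, needle: str, fixed: bool) -> bool:
    """Whole-line match: fixed=True mimics grep -Fxq, fixed=False mimics grep -qx."""
    pattern = re.escape(needle) if fixed else needle
    return any(re.fullmatch(pattern, line) for line in text.splitlines())
```

With the commented-only pattern an uncommented line is silently left alone, which is how the value stayed at 5. Treated as a regex, =[multilib]= is a character class that matches a single character, so the whole-line check only succeeds once the needle is taken literally.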
+Create comprehensive integration tests using Testinfra (Python + pytest) to validate archsetup installations
+
+Tests should cover:
+- Smoke tests: user created, key packages installed, dotfiles present
+- Integration tests: services running, configs valid, X11 starts, apps launch
+- End-to-end tests: login as user, startx, open terminal, run emacs, verify workflows
+
+Framework: Testinfra with pytest (SSH-native, built-in modules for files/packages/services/commands)
+Location: scripts/testing/tests/ directory
+Integration: Run via pytest against test VMs after archsetup completes
+Benefits: Expressive Python tests, excellent reporting, can test interactive scenarios
+
+A design doc (not yet written) should cover:
+- Complete example test suite (test_integration.py)
+- Tiered testing strategy (smoke/integration/end-to-end)
+- How to run tests and integrate with run-test.sh
+- Comparison with alternatives (Goss)
+** DONE [#C] Proton Mail Bridge font size :chore:quick:
+CLOSED: [2026-06-24 Wed]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+From the roam inbox (2026-06-22): adjust the Proton Mail Bridge UI font to a comfortable size. The bridge is a Qt app, so it likely keys off Qt scaling or the qt5ct/qt6ct config like the other Qt apps (QT_SCALE_FACTOR or a font setting).
+
+Done 2026-06-24 (dotfiles =hyprland.conf:47=): the bridge is a Qt6 *QML* app, so it ignores the qt6ct General font — bumped the UI font via =QT_FONT_DPI= on the autostart instead. Changed the exec-once to =env QT_FONT_DPI=108 protonmail-bridge --no-window= (default DPI is 96; 108 = 1.125x). Iterated live with Craig: 120 too big, 108 comfortable. hyprland.conf is a stow symlink so the change is already live; applies at every login. The =~/.config/autostart/Proton Mail Bridge.desktop= entry is dormant under Hyprland (no XDG-autostart), so it was left as-is.
+** DONE [#C] Wallpaper login-restore is hardcoded, not waypaper --restore :hyprland:quick:solo:
+CLOSED: [2026-06-24 Wed]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+The Hyprland =exec-once= (=hyprland.conf:26=) restores the wallpaper with a hardcoded =awww img ~/pictures/wallpaper/trondheim-norway.jpg=, so any wallpaper set later (via =set-wallpaper=, waypaper, or the dirvish =bg=) reverts on relogin. =set-wallpaper= now persists the choice to =waypaper/config.ini=, so switch the exec-once to =waypaper --restore= (after =awww-daemon= is up) to make set wallpapers survive a relogin. Small, dotfiles-only; verify by setting a different wallpaper, relogging, and confirming it sticks.
+
+Done 2026-06-24 (dotfiles): swapped the line-26 exec-once from the hardcoded =awww img …/trondheim-norway.jpg= to =awww-daemon & sleep 1 && waypaper --restore=. waypaper has a real =awww= backend (in its =--backend= list), the stowed =waypaper/config.ini= carries =backend = awww= plus a default =wallpaper == line, so =--restore= works on a fresh install too. Mechanism verified live: =waypaper --restore= reapplied the persisted wallpaper via awww, exit 0. Relogin confirmation filed under "Manual testing and validation". Follow-up filed: =set-wallpaper='s =mv= detached the live =waypaper/config.ini= from its stow symlink, so set-wallpaper changes no longer flow back to dotfiles.
+** DONE [#B] VM test harness shared one NVRAM file across filesystem profiles :bug:test:
+CLOSED: [2026-06-27 Sat]
+The harness shared one OVMF NVRAM file (=vm-images/OVMF_VARS.fd=) across the btrfs
+and zfs profiles (=init_vm_paths= suffixed the disk image per profile but not the
+NVRAM). NVRAM lives outside the qcow2, so a disk-snapshot revert can't restore it,
+and a zfs run's ZFSBootMenu boot entries clobbered the btrfs GRUB entry. With no
+removable =\EFI\BOOT\BOOTX64.EFI= fallback on the base ESP, the next btrfs run
+booted into UEFI with no bootable device ("BdsDxe: No bootable option or device
+was found", then PXE/HTTP, then SSH timeout before archsetup ran). Found
+2026-06-27 trying to VM-validate the installer refactor.
+
+Fixed: =OVMF_VARS= now carries the same per-profile suffix as the disk image
+(=OVMF_VARS${img_suffix}.fd=) in =vm-utils.sh init_vm_paths=, so btrfs and zfs keep
+separate NVRAM. Validated by a full green zfs run 2026-06-27 (ArchSetup exit 0,
+Testinfra 96 passed / 0 failed). Remaining hardening tracked below.
+** DONE [#B] Guard against live mesa/hyprland/wayland-runtime updates :hyprland:
+CLOSED: [2026-06-28 Sun]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-09
+:END:
+A live =pacman -Syu= that swaps mesa/hyprland/wayland runtime libs out from under a running Hyprland session can crash the compositor: the next GPU-lib call hits a now-"(deleted)" library and SIGABRTs, taking the Wayland clients down with it. Hit on ratio 2026-06-07 (mesa 26.0.6 -> 26.1.2 + hyprland upgraded live; Hyprland SIGABRT took down awww/insync/emacs). Likely the driver behind ratio's high lifetime unsafe-shutdown ratio — a crashed compositor forces a hard reset.
+
+Shipped as a pacman PreTransaction hook rather than a wrapper, so it fires no matter how the upgrade is launched (pacman, yay, topgrade). =scripts/hypr-live-update-guard= aborts the transaction before any package is swapped when the GPU/compositor runtime set is being upgraded AND Hyprland is running, pointing the user to re-run from a TTY with the session stopped; it stays quiet when Hyprland isn't running (the safe from-a-TTY path). Override via =HYPR_ALLOW_LIVE_UPDATE=1= or by touching the sentinel file named in the abort message. archsetup installs the script to =/usr/local/bin= and the hook to =/etc/pacman.d/hooks/= in the hyprland path. Decision logic unit-tested (=tests/hypr-live-update-guard=, 9 cases). Live firing test filed under Manual testing and validation. Commits: archsetup (this session).
+** DONE [#B] Collapsible waybar sides :waybar:
+CLOSED: [2026-06-27 Sat]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-09
+:END:
+Let either side of the waybar collapse horizontally to a minimal base set, toggled by a click. Each collapsible side carries a small triangle / arrowhead pointing toward the screen edge it collapses into (away from center). Clicking it collapses that side to its base set and flips the arrow to point back toward center; clicking again restores the full side. Same shape-changes-with-state idea as the auto-dim indicator.
+
+Spec (2026-06-19): [[file:assets/2026-06-19-collapsible-waybar-sides-spec.org]]. Spike that settled the mechanism: [[file:assets/2026-06-18-collapsible-waybar-sides-spike-findings.org]].
+
+Decisions locked: right base set = date + worldclock + tray; left base set = menu + workspaces; per-side independent; host-agnostic (base set constant, full set is each host's existing config). Mechanism = config-swap + SIGUSR2 reload via an active-config copy in =$XDG_RUNTIME_DIR= (the CSS/state-file approach was disproven — GTK3 can't reflow-hide native modules). Lives in =~/.dotfiles/hyprland/=.
+
+Shipped per spec (dotfiles 804bef6): 3 TDD'd scripts (=waybar-active-config=, =waybar-collapse=, =waybar-arrow=; 22 cases), arrow modules wired into the config (left arrow innermost-left, right arrow innermost-right), CSS ×3, =$mod+[= / =$mod+]= keybinds, and =waybar-toggle= relaunch updated to load the active config so a crash preserves collapse state. Verified live: click, keybind, and per-side independence all work; expand round-trips exactly to canonical.
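Editor's sketch of the config-swap step described in the "Decisions locked" note: reduce one side of a waybar config to its base set and write the result as an active-config copy under =$XDG_RUNTIME_DIR=. This is not the shipped =waybar-collapse= script. The module identifiers, the output file name, and both function names are assumptions, and the arrow modules are left out for brevity.

```python
import copy
import json
import os
from pathlib import Path

# Base sets from the "Decisions locked" note. The module identifiers are
# hypothetical stand-ins, not the real names used in the dotfiles config.
BASE_SETS = {
    "left": ["custom/menu", "hyprland/workspaces"],
    "right": ["custom/date", "custom/worldclock", "tray"],
}


def collapse_side(config: dict, side: str) -> dict:
    """Return a copy of config with one side reduced to its base set."""
    key = f"modules-{side}"
    keep = set(BASE_SETS[side])
    collapsed = copy.deepcopy(config)
    # Filter rather than overwrite, so the host's own module order is kept.
    collapsed[key] = [m for m in config.get(key, []) if m in keep]
    return collapsed


def write_active_config(config: dict, name: str = "waybar-active.json") -> Path:
    """Write the active-config copy; the file name here is an assumption."""
    runtime_dir = Path(os.environ.get("XDG_RUNTIME_DIR", "/tmp"))
    path = runtime_dir / name
    path.write_text(json.dumps(config, indent=2))
    return path
```

Because each call touches only one =modules-<side>= key, the two sides stay independent, and the canonical config is never modified, so expanding amounts to writing the canonical config back out. The SIGUSR2 reload that follows the write is omitted here.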
+** DONE [#C] Collapse waybar sysmonitor to a single icon + hover :feature:waybar:
+CLOSED: [2026-06-27 Sat]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+From the roam inbox (2026-06-22): replace the spread-out sysmonitor readouts (temp, cpu, mem, storage) with one visible icon showing a single chosen metric, the rest in the hover tooltip. Open question: fold it into the battery component instead of a standalone module. Implementation lives in the waybar config under ~/.dotfiles.
+
+Shipped as a standalone =custom/sysmon= module (Craig's call: host-dependent primary — battery on laptop, disk on desktop — rather than fold into battery, which is laptop-only). Backing script =waybar-sysmon= gathers cpu/temp/mem/disk/battery, shows the host-appropriate metric, rest in tooltip; 13-case TDD suite; removed the 5 native modules + their CSS across all 3 themes. Dotfiles be7469b.
+** DONE [#C] Rename idle inhibitor to something more intuitive :chore:waybar:
+CLOSED: [2026-06-27 Sat]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+From the roam inbox (2026-06-24): the "idle inhibitor" name doesn't work as a mnemonic — something like "sleep" (i.e. "keep awake" / "no-sleep") would land better. Decide the new name, then rename across the touchpoints: the =custom/idle= waybar module, the keybind mnemonic, and the backing script names (=hypridle-toggle= / =waybar-idle= from the 2026-06-24 idle-inhibitor work). Needs Craig's call on the name first, so not solo.
+
+Renamed to "caffeine" (Craig's call, 2026-06-27): =custom/caffeine= module, =waybar-caffeine= + =caffeine-toggle= scripts, tooltip "Caffeine: ON/OFF", CSS + test suites updated. Keybind stays =$mod+I= (=$mod+C= is hyprpicker). Shipped in dotfiles 8b45b51.
