16 files changed, 953 insertions, 199 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
index 3264692..b6b3f04 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -133,3 +133,7 @@ Full palette reference: `assets/color-themes/dupre/dupre-palette.org`
 
 - **VM tests run committed code, not your working tree.** `scripts/testing/run-test.sh` provisions the VM from `git bundle create <file> HEAD` (it simulates `git clone`), so an uncommitted edit to `archsetup` or the pytest suite silently runs the old code. Commit (even a throwaway WIP commit) before `make test FS_PROFILE=...`, or the change isn't exercised. (`gotcha` — 2026-06-25)
 - **Iterate the pytest sweep against a kept VM, not a reinstall.** `make test-keep FS_PROFILE=...` leaves the VM up after the install and writes `testinfra_ssh_config` + `root_key` into `test-results/<timestamp>/`. Point pytest at that ssh-config to re-run only the Testinfra checks in ~30s instead of a ~70-minute full reinstall. Use it when iterating test assertions, not installer logic. (`pattern` — 2026-06-25)
+- **VM UEFI NVRAM lives outside the qcow2 and must be per-profile.** OVMF boot entries live in the `OVMF_VARS` file, not the disk image, so reverting the `clean-install` snapshot does NOT restore them. The base ESPs have no removable `\EFI\BOOT\BOOTX64.EFI` fallback, so a base boots only via its NVRAM entry; lose or overwrite it and the VM dies in UEFI ("No bootable option") and SSH times out before archsetup runs. `init_vm_paths` now suffixes `OVMF_VARS` per `FS_PROFILE` (matching the disk image); never share one NVRAM file across btrfs/zfs. (`gotcha` — 2026-06-28)
+- **sed/awk function extraction breaks on column-0 `}` inside heredocs.** The `tests/` harness and any `/^name() {/,/^}/` extraction stop at the first line beginning with `}` — but a JSON heredoc body (e.g. the docker `daemon.json` in `developer_workstation`) has a column-0 `}` that is NOT the function's close. Find the real closing brace before slicing, or the bounds are silently wrong. (`gotcha` — 2026-06-28)
+- **AUR builds need ≥8 GiB VM RAM.** `makepkg` runs `-j$VM_CPUS`, and parallel `cc1plus` (~700 MB each on heavy C++ AUR packages) OOM-killed under the old 4 GiB `VM_RAM` default; the install still passed (yay retries) but the kills showed as attributed issues. Default is now 8192 MB. If you raise `VM_CPUS`, raise `VM_RAM` with it. (`threshold` — 2026-06-28)
+- **Guard live upgrades with a PreTransaction hook, not a wrapper.** `hypr-live-update-guard` is a pacman `PreTransaction` hook (`AbortOnFail` + `NeedsTargets`) so it fires no matter how the upgrade launches (pacman, yay, topgrade) and aborts before any package is swapped — the safe point, since nothing is replaced yet. A shell wrapper around `pacman` would be bypassed by the other front-ends. (`pattern` — 2026-06-28)
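The column-0 `}` gotcha noted in the list above can be reproduced in isolation. What follows is a minimal sketch, not the repo's actual `tests/` harness: `sample`, `demo`, and `extract_fn` are hypothetical names, and the awk helper assumes simple `<<TAG`, `<<'TAG'`, or `<<"TAG"` heredocs whose terminator sits at column 0. It deliberately ignores tab-stripped `<<-` heredocs and here-strings.

```shell
#!/usr/bin/env bash
# Sketch only: "demo", "sample", and extract_fn are illustrative names,
# not part of archsetup or its tests/ harness.

sample=$(mktemp)
cat > "$sample" <<'OUTER'
demo() {
    cat > /tmp/daemon.json <<EOF
{
  "log-driver": "journald"
}
EOF
    echo "after heredoc"
}
OUTER

# Naive range: ends at the first column-0 "}", which is the JSON close,
# so the heredoc terminator and the rest of the function are cut off.
naive=$(sed -n '/^demo() {/,/^}/p' "$sample")

# Heredoc-aware variant: ignore "}" lines while inside a heredoc body.
# Assumes simple <<TAG / <<'TAG' / <<"TAG" forms with a column-0 terminator.
extract_fn() {
    awk -v name="$1" '
        !infn && $0 ~ ("^" name "\\(\\) \\{") { infn = 1 }
        !infn { next }
        { print }
        tag != "" { if ($0 == tag) tag = ""; next }
        match($0, /<<["\047]?[A-Za-z_][A-Za-z_0-9]*/) {
            tag = substr($0, RSTART, RLENGTH)
            gsub(/[^A-Za-z_0-9]/, "", tag)
            next
        }
        /^}/ { exit }
    ' "$2"
}

printf 'naive: %s lines\n' "$(printf '%s\n' "$naive" | grep -c '')"
printf 'aware: %s lines\n' "$(extract_fn demo "$sample" | grep -c '')"
```

The naive range returns a truncated body without any error, which is why the wrong bounds go unnoticed. Tracking the heredoc tag is one way to find the real closing brace; counting from a known following function header would be another.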
diff --git a/archsetup b/archsetup
index a7cce2c..7531821 100755
--- a/archsetup
+++ b/archsetup
@@ -595,6 +595,29 @@ display() {
 
 ### Installation Helpers
 
+# Describe-run-warn primitive. Announces a task, runs the command with
+# stdout+stderr appended to $logfile, and on failure logs a non-fatal
+# warning carrying the command's real exit code. Replaces the recurring
+#   action="desc" && display "task" "$action"
+#   cmd >> "$logfile" 2>&1 || error_warn "$action" "$?"
+# idiom with a single call:
+#   run_task "desc" cmd arg...
+run_task() {
+    local desc="$1"
+    shift
+    display "task" "$desc"
+    "$@" >> "$logfile" 2>&1 || error_warn "$desc" "$?"
+}
+
+# Enable one or more systemd units with the conventional wording.
+# Each unit is announced and warned independently via run_task.
+enable_service() {
+    local unit
+    for unit in "$@"; do
+        run_task "enabling $unit service" systemctl enable "$unit"
+    done
+}
+
 MAX_INSTALL_RETRIES=3
 retry_install() {
     local pkg="$1"
@@ -874,6 +897,13 @@ prerequisites() {
 
     display "title" "Prerequisites"
 
+    bootstrap_pacman_keyring
+    install_required_software
+    configure_build_environment
+    configure_package_mirrors
+}
+
+bootstrap_pacman_keyring() {
     display "subtitle" "Bootstrapping"
 
     # If the base ships informant (e.g. an archangel-installed system), it
@@ -912,6 +942,9 @@ prerequisites() {
     done
     $refresh_ok || error_fatal "$action" "$?"
 
+}
+
+install_required_software() {
     display "subtitle" "Required Software"
 
     for software in linux-firmware wireless-regdb base-devel ca-certificates \
@@ -920,6 +953,9 @@ prerequisites() {
         pacman_install "$software"
     done
 
+}
+
+configure_build_environment() {
     display "subtitle" "Environment Configuration"
 
     # configure locale (must happen before package installs that depend on locale)
@@ -974,6 +1010,9 @@ prerequisites() {
 
     pacman -Sy >> "$logfile" 2>&1
 
+}
+
+configure_package_mirrors() {
     action="Package Mirrors" && display "subtitle" "$action"
     pacman_install reflector
 
@@ -1045,7 +1084,7 @@ create_user() {
     mkdir -p "/home/$username/.cache/zsh/"
 
     # give $username sudo nopasswd rights (required for aur installs)
-    display "task" "granting permissions"
+    action="granting permissions" && display "task" "$action"
     backup_system_file /etc/sudoers
     (echo "%$username ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers) \
         || error_warn "$action" "$?"
@@ -1076,6 +1115,16 @@ create_user() {
 user_customizations() {
     action="User Customizations" && display "title" "$action"
 
+    clone_user_repos
+    stow_dotfiles
+    prune_waybar_battery
+    refresh_desktop_caches
+    configure_dconf_defaults
+    finalize_dotfiles
+    create_user_directories
+}
+
+clone_user_repos() {
     # Clone archsetup to user's home directory so dotfile symlinks are accessible.
     # This ensures symlinks point to a user-readable location regardless of how
     # archsetup was invoked (curl|bash, from /root, etc.)
@@ -1108,6 +1157,9 @@ user_customizations() {
     # root runs stow/restore against the user-owned clone; mark it safe.
     git config --global --add safe.directory "$dotfiles_dir" >> "$logfile" 2>&1 || true
 
+}
+
+stow_dotfiles() {
     # Stow the universal layer plus the per-environment layer. Headless installs
     # (none) get the standalone minimal/ tree instead of common/.
     case "$desktop_env" in
@@ -1138,6 +1190,9 @@ user_customizations() {
             ;;
     esac
 
+}
+
+prune_waybar_battery() {
     # Remove battery module from waybar config on desktops with no battery
     # (hyprland only — waybar isn't part of the dwm or minimal trees).
     if [[ "$desktop_env" == "hyprland" ]] && ! ls /sys/class/power_supply/BAT* &>/dev/null; then
@@ -1150,12 +1205,14 @@ user_customizations() {
         sed -i '/"battery": {/,/^    },$/d' "$waybar_config"
     fi
 
+}
+
+refresh_desktop_caches() {
     # install fontconfig before refreshing cache (provides fc-cache)
     pacman_install fontconfig
 
     # Refresh font cache for any fonts in dotfiles
-    action="refreshing font cache" && display "task" "$action"
-    fc-cache -f >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "refreshing font cache" fc-cache -f
 
     # install desktop-file-utils before updating database (provides update-desktop-database)
     pacman_install desktop-file-utils
@@ -1165,6 +1222,9 @@ user_customizations() {
     (sudo -u "$username" update-desktop-database "/home/$username/.local/share/applications" \
                                 >> "$logfile" 2>&1 ) || true
 
+}
+
+configure_dconf_defaults() {
     # GTK and GNOME desktop interface settings — read by GTK apps and
     # xdg-desktop-portal-gtk. Written as a system-wide dconf db rather than
     # per-user dbus-launch dconf writes: the system path needs no session
@@ -1193,6 +1253,9 @@ EOF
         dconf update
     ) >> "$logfile" 2>&1 || error_warn "$action" "$?"
 
+}
+
+finalize_dotfiles() {
     action="marking archsetup dir as safe.directory" && display "task" "$action"
     git config --global --add safe.directory "$user_archsetup_dir" >> "$logfile" 2>&1 \
         || error_warn "$action" "$?"
@@ -1202,9 +1265,11 @@ EOF
     # (e.g. the /etc/skel .bashrc/.bash_profile a fresh user starts with). Runs
     # for every desktop_env, including none — minimal/ ships those skel-colliding
     # files too, so its --adopt needs the same restore.
-    action="restoring dotfile versions" && display "task" "$action"
-    git -C "$dotfiles_dir" restore . >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "restoring dotfile versions" git -C "$dotfiles_dir" restore .
 
+}
+
+create_user_directories() {
     action="creating common directories" && display "task" "$action"
     # Create default directories and grant permissions
     {
@@ -1254,22 +1319,36 @@ aur_installer() {
 ### Essential Services
 essential_services() {
     display "title" "Essential Services"
+    configure_randomness
+    configure_networking
+    configure_power
+    configure_ssh_server
+    configure_fail2ban
+    configure_firewall
+    configure_service_discovery
+    configure_job_scheduling
+    configure_package_cache
+    configure_snapshots
+    configure_user_lingering
+}
+
+configure_randomness() {
 
     # Randomness
 
     display "subtitle" "Randomness"
     pacman_install rng-tools
-    action="enabling rngd service" && display "task" "$action"
-    systemctl enable rngd >> "$logfile" 2>&1  || error_warn "$action" "$?"
-    action="starting rngd service" && display "task" "$action"
-    systemctl start rngd >> "$logfile" 2>&1  || error_warn "$action" "$?"
+    enable_service rngd
+    run_task "starting rngd service" systemctl start rngd
+}
+
+configure_networking() {
 
     # Networking
 
     display "subtitle" "Networking"
     pacman_install networkmanager
-    action="enabling NetworkManager" && display "task" "$action"
-    systemctl enable NetworkManager.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling NetworkManager" systemctl enable NetworkManager.service
 
     action="configuring MAC address randomization" && display "task" "$action"
     mkdir -p /etc/NetworkManager/conf.d
@@ -1319,28 +1398,29 @@ EOF
     # Note: If Docker containers have DNS issues, systemd-resolved's stub resolver
     # (127.0.0.53) may be the cause. Fix: configure Docker to use direct DNS, or
     # disable systemd-resolved and use /etc/resolv.conf directly. (2026-01-18)
-    action="enabling systemd-resolved" && display "task" "$action"
-    systemctl enable systemd-resolved >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling systemd-resolved" systemctl enable systemd-resolved
 
     # Create resolv.conf symlink to systemd-resolved
-    action="linking resolv.conf to systemd-resolved" && display "task" "$action"
-    ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "linking resolv.conf to systemd-resolved" ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
+}
+
+configure_power() {
 
     # Power
 
     display "subtitle" "Power"
     pacman_install upower
-    action="enabling upower service" && display "task" "$action"
-    systemctl enable upower >> "$logfile" 2>&1  || error_warn "$action" "$?"
+    enable_service upower
+}
+
+configure_ssh_server() {
 
     # Secure Shell
 
     display "subtitle" "Secure Shell"
     pacman_install openssh
-    action="enabling the openssh service to run at boot" && display "task" "$action"
-    systemctl enable sshd >> "$logfile" 2>&1  || error_warn "$action" "$?"
-    action="starting the openssh service" && display "task" "$action"
-    systemctl start sshd >> "$logfile" 2>&1   || error_warn "$action" "$?"
+    run_task "enabling the openssh service to run at boot" systemctl enable sshd
+    run_task "starting the openssh service" systemctl start sshd
 
     action="hardening sshd (root login by key only)" && display "task" "$action"
     cat << 'EOF' > /etc/ssh/sshd_config.d/10-hardening.conf
@@ -1349,6 +1429,9 @@ EOF
 PermitRootLogin prohibit-password
 EOF
     systemctl reload sshd >> "$logfile" 2>&1 || error_warn "$action" "$?"
+}
+
+configure_fail2ban() {
 
     # SSH Brute Force Protection
 
@@ -1373,16 +1456,16 @@ maxretry = 3
 bantime = 1h
 EOF
 
-    action="enabling fail2ban service" && display "task" "$action"
-    systemctl enable fail2ban >> "$logfile" 2>&1 || error_warn "$action" "$?"
-    action="starting fail2ban service" && display "task" "$action"
-    systemctl start fail2ban >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    enable_service fail2ban
+    run_task "starting fail2ban service" systemctl start fail2ban
+}
+
+configure_firewall() {
 
     display "subtitle" "Firewall"
     pacman_install ufw
 
-    action="configuring ufw to deny by default" && display "task" "$action"
-    ufw default deny incoming >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "configuring ufw to deny by default" ufw default deny incoming
 
     # Firewall rules - only open ports for services we actually run
     for protocol in \
@@ -1410,11 +1493,9 @@ EOF
     action="rate-limiting SSH to protect from brute force attacks" && display "task" "$action"
     (ufw limit 22/tcp >> "$logfile" 2>&1) || error_warn "$action" "$?"
 
-    action="enabling firewall" && display "task" "$action"
-    ufw --force enable >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling firewall" ufw --force enable
 
-    action="enabling firewall service to launch on boot" && display "task" "$action"
-    systemctl enable ufw.service >> "$logfile" 2>&1  || error_warn "$action" "$?"
+    run_task "enabling firewall service to launch on boot" systemctl enable ufw.service
 
     # Verify firewall is actually active
     # Note: In VM environments, UFW may show inactive due to missing kernel
@@ -1425,6 +1506,9 @@ EOF
         error_messages=("FIREWALL NOT ACTIVE - run: sudo ufw enable" "${error_messages[@]}")
         error_warn "$action" "1"
     fi
+}
+
+configure_service_discovery() {
 
     # Service Discovery
 
@@ -1436,17 +1520,14 @@ EOF
         display "task" "skipping avahi (already running)"
     else
         pacman_install avahi # service discovery on a local network using mdns
-        action="enabling avahi for mDNS discovery" && display "task" "$action"
-        systemctl enable avahi-daemon.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        run_task "enabling avahi for mDNS discovery" systemctl enable avahi-daemon.service
     fi
 
     pacman_install wsdd
-    action="enabling wsdd for Windows network discovery" && display "task" "$action"
-    systemctl enable wsdd.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling wsdd for Windows network discovery" systemctl enable wsdd.service
 
     pacman_install geoclue # geolocation service for location-aware apps
-    action="enabling geoclue geolocation service" && display "task" "$action"
-    systemctl enable geoclue.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling geoclue geolocation service" systemctl enable geoclue.service
 
     # Enable BeaconDB as geoclue wifi location provider (default MLS/Ichnaea API is defunct)
     action="configuring geoclue to use BeaconDB location service" && display "task" "$action"
@@ -1476,37 +1557,52 @@ EOF
 After=systemd-sysusers.service
 EOF
 
+}
+
+configure_job_scheduling() {
     # Job Scheduling
 
     display "subtitle" "Job Scheduling"
     pacman_install cronie
-    action="enabling cronie to launch at boot" && display "task" "$action"
-    systemctl enable cronie  >> "$logfile" 2>&1  || error_warn "$action" "$?"
+    run_task "enabling cronie to launch at boot" systemctl enable cronie
     pacman_install at
-    action="enabling the batch delayed command scheduler" && display "task" "$action"
-    systemctl enable atd  >> "$logfile" 2>&1  || error_warn "$action" "$?"
+    run_task "enabling the batch delayed command scheduler" systemctl enable atd
 
     action="installing log cleanup cron job" && display "task" "$action"
     (sudo -u "$username" crontab -l 2>/dev/null; \
         echo "0 12 * * * \$HOME/.local/bin/cron/log-cleanup") \
         | sudo -u "$username" crontab - \
         >> "$logfile" 2>&1 || error_warn "$action" "$?"
+}
+
+configure_package_cache() {
 
     # Package Repository Cache Maintenance
 
     display "subtitle" "Package Repository Cache Maintenance"
     pacman_install pacman-contrib
-    action="enabling the package cache cleanup timer" && display "task" "$action"
-    systemctl enable --now paccache.timer  >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling the package cache cleanup timer" systemctl enable --now paccache.timer
 
     action="configuring paccache to keep 3 versions" && display "task" "$action"
     backup_system_file /etc/conf.d/pacman-contrib
     sed -i 's/^PACCACHE_ARGS=.*/PACCACHE_ARGS=-k3/' /etc/conf.d/pacman-contrib
 
-    # Snapshot Service - filesystem-aware
+}
+
+configure_snapshots() {
+
     display "subtitle" "Snapshot Service"
 
     if is_zfs_root; then
+        configure_zfs_snapshots
+    elif is_btrfs_root; then
+        configure_btrfs_snapshots
+    else
+        display "task" "ext4/other filesystem detected"
+    fi
+}
+
+configure_zfs_snapshots() {
         # ZFS: Install sanoid for snapshot management
         display "task" "ZFS detected - installing sanoid"
         aur_install sanoid
@@ -1610,8 +1706,7 @@ Persistent=true
 WantedBy=timers.target
 EOF
 
-        action="enabling sanoid timer" && display "task" "$action"
-        systemctl enable sanoid.timer >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        run_task "enabling sanoid timer" systemctl enable sanoid.timer
 
         action="enabling weekly ZFS scrub" && display "task" "$action"
         # Get pool name dynamically (usually zroot)
@@ -1623,7 +1718,9 @@ EOF
         #   systemctl enable --now zfs-replicate.timer
         display "task" "zfs-replicate timer created (enable after SSH key setup to TrueNAS)"
 
-    elif is_btrfs_root; then
+}
+
+configure_btrfs_snapshots() {
         # Btrfs: Install snapper for snapshot management
         display "task" "btrfs detected - installing snapper and grub-btrfs"
         pacman_install snapper
@@ -1665,16 +1762,13 @@ EOF
         snapper -c root set-config "TIMELINE_LIMIT_MONTHLY=1" >> "$logfile" 2>&1
         snapper -c root set-config "TIMELINE_LIMIT_YEARLY=0" >> "$logfile" 2>&1
 
-        action="enabling snapper timeline timer" && display "task" "$action"
-        systemctl enable snapper-timeline.timer >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        action="enabling snapper timeline timer" && run_task "$action" systemctl enable snapper-timeline.timer
         systemctl enable snapper-cleanup.timer >> "$logfile" 2>&1 || error_warn "$action" "$?"
 
-        action="enabling grub-btrfsd for boot menu snapshots" && display "task" "$action"
-        systemctl enable grub-btrfsd >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        run_task "enabling grub-btrfsd for boot menu snapshots" systemctl enable grub-btrfsd
 
         # Allow user to use snapper without root (required for snapper-gui)
-        action="allowing wheel group to use snapper" && display "task" "$action"
-        snapper -c root set-config "ALLOW_GROUPS=wheel" >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        action="allowing wheel group to use snapper" && run_task "$action" snapper -c root set-config "ALLOW_GROUPS=wheel"
         snapper -c root set-config "SYNC_ACL=yes" >> "$logfile" 2>&1 || error_warn "$action" "$?"
         # Set ACL on .snapshots directory for wheel group access
         setfacl -m g:wheel:rx /.snapshots >> "$logfile" 2>&1 || error_warn "$action" "$?"
@@ -1682,9 +1776,9 @@ EOF
         # Install snapper GUI (AUR)
         aur_install snapper-gui-git
 
-    else
-        display "task" "ext4/other filesystem detected"
-    fi
+}
+
+configure_user_lingering() {
 
     # User Services Lingering
     # Keeps user-level systemd services (e.g., protonmail-bridge) running without
@@ -1692,8 +1786,7 @@ EOF
     # user-level IMAP/SMTP daemons over SSH or from remote agents.
 
     display "subtitle" "User Services"
-    action="enabling user-services lingering for $username" && display "task" "$action"
-    loginctl enable-linger "$username" >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling user-services lingering for $username" loginctl enable-linger "$username"
 }
 
 ### Xorg Display Manager
@@ -1718,8 +1811,7 @@ Section "ServerFlags"
     Option "DontZap"      "True"
 EndSection
 EOF
-    action="configuring xorg server" && display "task" "$action"
-    chmod 644 /etc/X11/xorg.conf.d/00-no-vt-or-zap.conf >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "configuring xorg server" chmod 644 /etc/X11/xorg.conf.d/00-no-vt-or-zap.conf
 
     # Install GPU-specific drivers
     install_gpu_drivers
@@ -1811,6 +1903,48 @@ UDEVEOF
     sed -i "s/ARCHSETUP_USERNAME/${username}/" /etc/udev/rules.d/99-logitech-brio.rules
     chmod 644 /etc/udev/rules.d/99-logitech-brio.rules
     fi
+
+    # Live-update guard: a pacman PreTransaction hook that aborts an upgrade of
+    # GPU/compositor runtime libraries while a Hyprland session is running, so
+    # the live compositor doesn't SIGABRT when a library is swapped underneath
+    # it (hit ratio 2026-06-07: live mesa + hyprland upgrade crashed Hyprland and
+    # its clients). Re-run the upgrade from a TTY with Hyprland stopped and the
+    # guard stays quiet.
+    action="Live-Update Guard" && display "subtitle" "$action"
+    run_task "installing the live GPU/compositor update guard" \
+        cp "$user_archsetup_dir/scripts/hypr-live-update-guard" /usr/local/bin/hypr-live-update-guard
+    chmod 755 /usr/local/bin/hypr-live-update-guard
+
+    action="installing the live-update guard pacman hook" && display "task" "$action"
+    mkdir -p /etc/pacman.d/hooks
+    cat > /etc/pacman.d/hooks/hypr-live-update-guard.hook << 'HOOKEOF'
+[Trigger]
+Operation = Upgrade
+Type = Package
+Target = mesa
+Target = mesa-*
+Target = wayland
+Target = libdrm
+Target = libglvnd
+Target = hyprland
+Target = aquamarine
+Target = hyprutils
+Target = hyprgraphics
+Target = vulkan-radeon
+Target = vulkan-intel
+Target = vulkan-mesa-layers
+Target = nvidia-utils
+Target = lib32-nvidia-utils
+Target = xorg-xwayland
+
+[Action]
+Description = Checking for a live Hyprland session before swapping GPU/compositor libs...
+When = PreTransaction
+Exec = /usr/local/bin/hypr-live-update-guard
+AbortOnFail
+NeedsTargets
+HOOKEOF
+    chmod 644 /etc/pacman.d/hooks/hypr-live-update-guard.hook
 }
 
 ### Display Server (conditional)
@@ -1968,8 +2102,7 @@ desktop_environment() {
         pacman_install "$software"
     done
     pacman_install solaar             # Logitech device manager
-    action="enabling bluetooth to launch at boot" && display "task" "$action"
-    systemctl enable bluetooth.service  >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling bluetooth to launch at boot" systemctl enable bluetooth.service
 
     # Command Line Utilities
 
@@ -2085,8 +2218,7 @@ gaming() {
     pacman_install steam
 
     # Enable gamemode service for user
-    action="enabling gamemode for user" && display "task" "$action"
-    sudo -u "$username" systemctl --user enable gamemoded.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling gamemode for user" sudo -u "$username" systemctl --user enable gamemoded.service
 }
 
 ### Zig Toolchain Pin
@@ -2192,6 +2324,14 @@ developer_workstation() {
 
     action="Developer Workstation" && display "title" "$action"
 
+    install_programming_languages
+    install_editors
+    install_android_utilities
+    install_vpn_tools
+    install_devops_utilities
+}
+
+install_programming_languages() {
     action="Programming Languages and Utilities" && display "subtitle" "$action"
     # Rust (via rustup — must precede AUR packages that compile with rust)
     pacman_install rustup      # Rust toolchain manager
@@ -2259,6 +2399,9 @@ developer_workstation() {
     pacman_install hyperfine        # statistical command-line benchmarking
     pacman_install doggo            # modern dig: readable DNS client, DoH/DoT/DoQ
 
+}
+
+install_editors() {
     action="Programming Editors" && display "subtitle" "$action"
     pacman_install mg                        # mini emacs
 
@@ -2317,19 +2460,27 @@ developer_workstation() {
             >> "$logfile" 2>&1 || error_warn "$action" "$?"
     fi
 
+}
+
+install_android_utilities() {
     action="Android Utilities" && display "subtitle" "$action"
     pacman_install android-file-transfer
     pacman_install android-tools
 
+}
+
+install_vpn_tools() {
     action="VPN Tools" && display "subtitle" "$action"
     pacman_install wireguard-tools               # VPN - add configs to /etc/wireguard/
     pacman_install systemd-resolvconf            # resolvconf for wg-quick DNS integration
     pacman_install proton-vpn-gtk-app            # Proton VPN GUI client with system tray
     pacman_install tailscale                     # mesh VPN - run 'tailscale up' to authenticate
 
-    action="enabling tailscale service" && display "task" "$action"
-    systemctl enable tailscaled >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling tailscale service" systemctl enable tailscaled
 
+}
+
+install_devops_utilities() {
     action="DevOps Utilities" && display "subtitle" "$action"
 
     action="installing devops virtualization and automation tools" && display "task" "$action"
@@ -2357,8 +2508,7 @@ developer_workstation() {
 }
 EOF
     fi
-    action="enabling docker service to launch on boot" && display "task" "$action"
-    systemctl enable docker.service >> "$logfile" 2>&1  || error_warn "$action" "$?"
+    run_task "enabling docker service to launch on boot" systemctl enable docker.service
 
     # podman (rootless containers for winvm)
     pacman_install podman
@@ -2496,8 +2646,7 @@ supplemental_software() {
     # makepkg's integrity check fails on that file even though the package tarball
     # itself verifies. Rechecked 2026-06-24 — the original expired-PGP-signature
     # cause is gone, but this LICENSE-drift keeps the workaround necessary.
-    action="installing python-lyricsgenius (integrity workaround)" && display "task" "$action"
-    yay -S --noconfirm --mflags --skipinteg python-lyricsgenius >> "$logfile" 2>&1  || error_warn "$action" "$?"
+    run_task "installing python-lyricsgenius (integrity workaround)" yay -S --noconfirm --mflags --skipinteg python-lyricsgenius
     aur_install tidal-dl                             # tidal-dl:tidal as yt-dlp:youtube
     aur_install tidaler                              # tidal downloader (tidal-dl-ng fork)
     aur_install freetube                             # privacy-focused YouTube desktop client
@@ -2507,6 +2656,16 @@ supplemental_software() {
 boot_ux() {
     action="Boot UX" && display "title" "$action"
 
+    tighten_efi_permissions
+    add_nvme_early_module
+    configure_initramfs_hook
+    configure_encrypted_autologin
+    configure_tlp_power
+    trim_firmware
+    configure_grub
+}
+
+tighten_efi_permissions() {
     # Tighten /efi mount permissions so kernel images, initramfs, and
     # bootloader config aren't world-readable. archinstall's defaults leave
     # them at 0755; fmask/dmask below makes files 0600 and dirs 0700.
@@ -2519,6 +2678,9 @@ boot_ux() {
             || error_warn "$action" "$?"
     fi
 
+}
+
+add_nvme_early_module() {
     # Add nvme module for early loading on NVMe systems
     # Ensures NVMe devices are available when ZFS/other hooks try to access them
     if has_nvme_drives; then
@@ -2546,6 +2708,9 @@ boot_ux() {
         echo "FONT=ter-132n" >> /etc/vconsole.conf
     fi
 
+}
+
+configure_initramfs_hook() {
     # Only switch to systemd hook for non-ZFS systems
     # ZFS initramfs hook is busybox-based and incompatible with systemd hook
     if ! is_zfs_root; then
@@ -2570,6 +2735,9 @@ StandardOutput=null
 StandardError=journal+console
 EOF
 
+}
+
+configure_encrypted_autologin() {
     # Automatic login for encrypted systems (prompts if no CLI flag and root is encrypted)
     configure_autologin
 
@@ -2592,6 +2760,9 @@ HandleLidSwitchExternalPower=ignore
 HandleLidSwitchDocked=ignore
 EOF
 
+}
+
+configure_tlp_power() {
     # TLP power management — laptops only (battery present). Manages wifi,
     # USB, PCIe, and CPU power policy on AC/battery transitions. systemd-rfkill
     # is masked per TLP's docs (it fights TLP's radio-state handling).
@@ -2610,12 +2781,14 @@ PLATFORM_PROFILE_ON_BAT=low-power
 # Off by default — uncomment (and match the BAT name) to enable.
 #STOP_CHARGE_THRESH_BAT1=80
 EOF
-        action="enabling TLP service" && display "task" "$action"
-        systemctl enable tlp.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        run_task "enabling TLP service" systemctl enable tlp.service
         systemctl mask systemd-rfkill.service systemd-rfkill.socket >> "$logfile" 2>&1 || \
             error_warn "masking systemd-rfkill for TLP" "$?"
     fi
 
+}
+
+trim_firmware() {
     # Firmware trim — Framework 13 Intel only (matched by DMI), where the
     # hardware set is known: i915 graphics (linux-firmware-intel), ath9k wifi
     # (linux-firmware-atheros, firmware-free driver but kept for safety), and
@@ -2633,10 +2806,12 @@ EOF
             linux-firmware-mellanox linux-firmware-nfp linux-firmware-nvidia \
             linux-firmware-other linux-firmware-qlogic linux-firmware-radeon \
             >> "$logfile" 2>&1 || error_warn "$action" "$?"
-        action="rebuilding initramfs after firmware trim" && display "task" "$action"
-        mkinitcpio -P >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        run_task "rebuilding initramfs after firmware trim" mkinitcpio -P
     fi
 
+}
+
+configure_grub() {
     # GRUB: reset timeouts, adjust log levels, larger menu for HiDPI screens, and show splashscreen
     # Note: nvme.noacpi=1 disables NVMe ACPI power management to prevent freezes on some drives.
     # Safe to keep on newer drives (minor power cost), remove if battery life is critical.
@@ -2654,8 +2829,7 @@ EOF
 
     # Regenerate GRUB config after all modifications
     if [ -f /etc/default/grub ]; then
-        action="generating grub configuration" && display "task" "$action"
-        grub-mkconfig -o /boot/grub/grub.cfg  >> "$logfile" 2>&1  || error_warn "$action" "$?"
+        run_task "generating grub configuration" grub-mkconfig -o /boot/grub/grub.cfg
     fi
 }
 
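The describe-run-warn behaviour of `run_task` can be exercised outside the installer. This is a minimal sketch under stated assumptions: `display` and `error_warn` here are simplified stand-ins (the real helpers are not shown in this diff), and the two sample commands are arbitrary. The `run_task` body is the same as the one this diff adds.

```shell
#!/usr/bin/env bash
# Sketch only: display and error_warn below are simplified stand-ins for
# the real archsetup helpers, which this diff does not show.

logfile=$(mktemp)
display()    { printf '[%s] %s\n' "$1" "$2"; }
error_warn() { printf 'WARN: %s (exit %s)\n' "$1" "$2"; }

# Same body as the run_task helper added in this diff.
run_task() {
    local desc="$1"
    shift
    display "task" "$desc"
    "$@" >> "$logfile" 2>&1 || error_warn "$desc" "$?"
}

# A succeeding command: only the task announcement reaches the terminal.
ok_out=$(run_task "printing a greeting" echo hello)

# A failing command: the warning carries the command's own exit code.
fail_out=$(run_task "failing step" sh -c 'echo oops >&2; exit 7')

printf '%s\n' "$ok_out" "$fail_out"
echo "--- log ---"
cat "$logfile"
```

Because `run_task` executes `"$@"` as a single simple command, a step that needs a pipeline, a heredoc, or its own redirection cannot be passed to it directly. That is presumably why steps such as the crontab pipeline keep the older `action=... || error_warn "$action" "$?"` idiom, and why any line that still reads `$action` needs it set explicitly.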
diff --git a/working/collapsible-waybar-sides/spike-findings.org b/assets/2026-06-18-collapsible-waybar-sides-spike-findings.org
index 4d45ed1..4d45ed1 100644
--- a/working/collapsible-waybar-sides/spike-findings.org
+++ b/assets/2026-06-18-collapsible-waybar-sides-spike-findings.org
diff --git a/working/collapsible-waybar-sides/collapsible-waybar-sides-spec.org b/assets/2026-06-19-collapsible-waybar-sides-spec.org
index b9ddc0d..b9ddc0d 100644
--- a/working/collapsible-waybar-sides/collapsible-waybar-sides-spec.org
+++ b/assets/2026-06-19-collapsible-waybar-sides-spec.org
diff --git a/assets/outbox/2026-06-25-1248-from-archangel-accepted-the-stale-baked-archzfs-db-zfs.org b/assets/outbox/2026-06-25-1248-from-archangel-accepted-the-stale-baked-archzfs-db-zfs.org
new file mode 100644
index 0000000..1e0ebf4
--- /dev/null
+++ b/assets/outbox/2026-06-25-1248-from-archangel-accepted-the-stale-baked-archzfs-db-zfs.org
@@ -0,0 +1,5 @@
+#+TITLE: Accepted: the stale-baked-archzfs-db ZFS install bug. Diagno
+#+SOURCE: from archangel
+#+DATE: 2026-06-25 12:48:41 -0400
+
+Accepted: the stale-baked-archzfs-db ZFS install bug. Diagnosis confirmed against source — install_base (installer/archangel:759) appends [archzfs] at ~771 and runs pacstrap -K at 805 with no db refresh between, so pacstrap resolves zfs-dkms from the baked ISO db (2.3.3) instead of the current channel (2.4.2). Filed as archangel [#A] :bug:blocker: 'Refresh archzfs db before pacstrap in install_base', tagged :blocker: so it surfaces as priority work. Fix direction: pacman -Sy after the repo appends and before pacstrap -K. A fresh ISO rebuild still needed for aged ISOs, but the refresh is the durable fix. Will notify you when a fixed ISO is available so you can retry make test FS_PROFILE=zfs.
diff --git a/assets/outbox/2026-06-25-1347-from-archangel-delivered-you-re-unblocked-the-stale.org b/assets/outbox/2026-06-25-1347-from-archangel-delivered-you-re-unblocked-the-stale.org
new file mode 100644
index 0000000..296e37c
--- /dev/null
+++ b/assets/outbox/2026-06-25-1347-from-archangel-delivered-you-re-unblocked-the-stale.org
@@ -0,0 +1,5 @@
+#+TITLE: Delivered — you're unblocked. The stale-baked-archzfs-db ZFS
+#+SOURCE: from archangel
+#+DATE: 2026-06-25 13:47:29 -0400
+
+Delivered — you're unblocked. The stale-baked-archzfs-db ZFS install bug is fixed and verified. installer/archangel now runs pacman -Syy before pacstrap (commit 89691a0, pushed to main). I rebuilt the ISO and ran the single-disk ZFS install in QEMU (scripts/test-install.sh single-disk = FILESYSTEM=zfs, NO_ENCRYPT=yes, the same scenario you reproduced): pacstrap resolved zfs-dkms-2.4.2-1 / zfs-utils-2.4.2-2 (not the stale 2.3.3), DKMS built zfs/2.4.2 against kernel 6.18.36-1-lts, the system rebooted, and the ZFS pool came up healthy. TEST PASSED. Fixed ISO: out/archangel-2026-06-25-vmlinuz-6.18.36-lts-x86_64.iso (kernel 6.18.36-lts). Retry make test FS_PROFILE=zfs against this ISO. Note: build.sh clears the host pacoloco archzfs cache on each build, so a fresh build is what guarantees current packages — if you ever see 2.3.3 again, rebuild rather than reusing an aged ISO.
diff --git a/assets/outbox/2026-06-25-1359-from-archangel-got-the-heads-up-everything-it-flags-is.org b/assets/outbox/2026-06-25-1359-from-archangel-got-the-heads-up-everything-it-flags-is.org
new file mode 100644
index 0000000..e437d7d
--- /dev/null
+++ b/assets/outbox/2026-06-25-1359-from-archangel-got-the-heads-up-everything-it-flags-is.org
@@ -0,0 +1,5 @@
+#+TITLE: Got the heads-up — everything it flags is already handled. I
+#+SOURCE: from archangel
+#+DATE: 2026-06-25 13:59:17 -0400
+
+Got the heads-up — everything it flags is already handled. I adopted the staged pacman -Syy edit as-is and committed it (89691a0, pushed to main), so no mystery edit remains. My own ISO rebuild ran clean (your work/ teardown left out/ intact; the new build created and tore down its own work/). Verified the ZFS single-disk install end-to-end before the unblock note I sent at 13:47 — that note supersedes this thread. You're clear to retry create-base-vm FS_PROFILE=zfs against out/archangel-2026-06-25-vmlinuz-6.18.36-lts-x86_64.iso.
diff --git a/assets/outbox/2026-06-27-2148-from-archangel-accepted-both-archangel-bug-handoffs.org b/assets/outbox/2026-06-27-2148-from-archangel-accepted-both-archangel-bug-handoffs.org
new file mode 100644
index 0000000..213017d
--- /dev/null
+++ b/assets/outbox/2026-06-27-2148-from-archangel-accepted-both-archangel-bug-handoffs.org
@@ -0,0 +1,5 @@
+#+TITLE: Accepted both archangel bug handoffs (pacman.conf 0600, 2026
+#+SOURCE: from archangel
+#+DATE: 2026-06-27 21:48:03 -0400
+
+Accepted both archangel bug handoffs (pacman.conf 0600, 2026-06-25; baked AUR list breaks btrfs, 2026-06-27). Both diagnoses confirmed against archangel source. Filed as tasks; the baked-AUR-list one is tagged :blocker: since it blocks your btrfs base creation. Starting the fixes now — will ping you with a fixed ISO once both land and are verified, so you can rebuild the btrfs base and re-add btrfs to the green-test matrix.
diff --git a/assets/outbox/2026-06-27-2301-from-archangel-delivered-both-archangel-bugs-fixed-and.org b/assets/outbox/2026-06-27-2301-from-archangel-delivered-both-archangel-bugs-fixed-and.org
new file mode 100644
index 0000000..9384f88
--- /dev/null
+++ b/assets/outbox/2026-06-27-2301-from-archangel-delivered-both-archangel-bugs-fixed-and.org
@@ -0,0 +1,15 @@
+#+TITLE: Delivered — both archangel bugs fixed and verified, fixed IS
+#+SOURCE: from archangel
+#+DATE: 2026-06-27 23:01:40 -0400
+
+Delivered — both archangel bugs fixed and verified, fixed ISO ready.
+
+ISO: /home/cjennings/code/archangel/out/archangel-2026-06-27-vmlinuz-6.18.36-lts-x86_64.iso (2.1G, built 2026-06-27 22:36)
+
+Bug #2 (the blocker — baked AUR list broke btrfs): fixed. install_base now filters the baked AUR set by target filesystem, dropping zfs-only tooling (zfs-auto-snapshot, zrepl) on a non-zfs install. The ISO still bakes the full set; only the install selection is filtered. Verified end-to-end in a VM: a btrfs-single unattended install now completes cleanly (the 266-pkg pacstrap transaction no longer includes zfs-auto-snapshot, so the 'unable to satisfy dependency zfs' abort is gone). A zfs single-disk install still passes, so the zfs path is unaffected.
+
+Bug #1 (installed /etc/pacman.conf landing 0600): fixed. strip_repo_stanza now writes through the existing config instead of mv-ing a 0600 mktemp over it, so the installed pacman.conf keeps its 0644. Unit-tested for mode preservation; the btrfs/zfs installs both completed past the strip step.
+
+You're unblocked: rebuild the btrfs base from this ISO and re-add btrfs to the green-test matrix. Fix is committed on archangel main (2ead674) and pushed.
+
+Note: the companion EFI \EFI\BOOT\BOOTX64.EFI removable-fallback hardening you mentioned is filed on the archangel side but not done in this pass — separate, optional, not part of this unblock.
diff --git a/scripts/hypr-live-update-guard b/scripts/hypr-live-update-guard
new file mode 100755
index 0000000..4f561ae
--- /dev/null
+++ b/scripts/hypr-live-update-guard
@@ -0,0 +1,70 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-3.0-or-later
+# hypr-live-update-guard - abort a live GPU/compositor library upgrade.
+#
+# Installed as a pacman PreTransaction hook. When an upgrade transaction
+# includes GPU/compositor runtime libraries (mesa, hyprland, wayland, GPU
+# drivers, ...) AND a Hyprland session is running, this aborts the
+# transaction BEFORE any package is swapped. Replacing those libraries out
+# from under a live compositor makes the next GPU-lib call hit a now
+# "(deleted)" file and SIGABRT, taking the Wayland clients down with it
+# (hit on host "ratio" 2026-06-07: mesa + hyprland upgraded live, Hyprland crashed
+# and took awww/insync/emacs with it). Aborting at PreTransaction is the
+# safe point: nothing has been replaced yet, so the running session is
+# untouched and the user can re-run the upgrade from a TTY.
+#
+# Pacman feeds the matched package names on stdin (NeedsTargets).
+#
+# Test seams / overrides (env):
+#   HYPR_GUARD_RUNNING     1/0 forces the running check (default: pgrep Hyprland)
+#   HYPR_ALLOW_LIVE_UPDATE 1 proceeds anyway (skip the guard)
+#   HYPR_GUARD_SENTINEL    path whose existence also proceeds anyway
+#                          (default /run/archsetup-allow-live-gpu-update,
+#                          cleared on reboot since /run is tmpfs)
+
+set -u
+
+sentinel="${HYPR_GUARD_SENTINEL:-/run/archsetup-allow-live-gpu-update}"
+
+# Explicit override: the user knows what they're doing.
+if [ "${HYPR_ALLOW_LIVE_UPDATE:-0}" = "1" ] || [ -e "$sentinel" ]; then
+    exit 0
+fi
+
+hyprland_running() {
+    if [ -n "${HYPR_GUARD_RUNNING:-}" ]; then
+        [ "$HYPR_GUARD_RUNNING" = "1" ]
+        return
+    fi
+    pgrep -x Hyprland >/dev/null 2>&1
+}
+
+# No live session means no live swap to worry about. Let the upgrade run --
+# this is exactly the from-a-TTY-after-logout path the warning points to.
+hyprland_running || exit 0
+
+# Collect the triggering packages (stdin from NeedsTargets) for the message.
+pkgs=$(cat 2>/dev/null | sort -u | tr '\n' ' ')
+
+cat >&2 <<EOF
+
+==========================================================================
+ BLOCKED: live GPU/compositor library upgrade while Hyprland is running
+==========================================================================
+ Packages in this upgrade can crash the running compositor if swapped now:
+   ${pkgs:-(GPU/compositor runtime libraries)}
+
+ Replacing these out from under a live Hyprland session makes the next
+ GPU-lib call hit a deleted library and SIGABRT, taking your Wayland apps
+ down with it (and risking an unclean shutdown).
+
+ Do it safely instead -- from a TTY with Hyprland stopped:
+   1. Log out of Hyprland, or switch to a console (Ctrl+Alt+F2) and log in.
+   2. Re-run the upgrade there:  sudo pacman -Syu
+
+ To override and proceed anyway (not recommended while Hyprland runs):
+   sudo touch $sentinel && sudo pacman -Syu
+==========================================================================
+
+EOF
+exit 1
diff --git a/scripts/testing/lib/vm-utils.sh b/scripts/testing/lib/vm-utils.sh
index 10c0ca5..b85e773 100755
--- a/scripts/testing/lib/vm-utils.sh
+++ b/scripts/testing/lib/vm-utils.sh
@@ -11,7 +11,9 @@
 
 # VM configuration defaults
 VM_CPUS="${VM_CPUS:-4}"
-VM_RAM="${VM_RAM:-4096}"  # MB
+# 8 GiB headroom for AUR builds: makepkg runs -j$VM_CPUS, and parallel cc1plus
+# (~700 MB each on heavy C++ packages) OOM-killed under the old 4 GiB default.
+VM_RAM="${VM_RAM:-8192}"  # MB
 VM_DISK_SIZE="${VM_DISK_SIZE:-50}"  # GB
 
 # Filesystem profile: selects which base image + archangel config the harness
@@ -59,7 +61,11 @@ init_vm_paths() {
     local img_suffix=""
     [ "$FS_PROFILE" != "btrfs" ] && img_suffix="-$FS_PROFILE"
     DISK_PATH="$VM_IMAGES_DIR/archsetup-base${img_suffix}.qcow2"
-    OVMF_VARS="$VM_IMAGES_DIR/OVMF_VARS.fd"
+    # Per-profile NVRAM: UEFI boot entries live here, outside the qcow2, so a
+    # disk-snapshot revert can't restore them. Sharing one file across profiles
+    # let a zfs run's ZFSBootMenu entries clobber the btrfs GRUB entry, leaving
+    # the btrfs base unbootable (no removable ESP fallback to recover from).
+    OVMF_VARS="$VM_IMAGES_DIR/OVMF_VARS${img_suffix}.fd"
     PID_FILE="$VM_IMAGES_DIR/qemu.pid"
     MONITOR_SOCK="$VM_IMAGES_DIR/qemu-monitor.sock"
     SERIAL_LOG="$VM_IMAGES_DIR/qemu-serial.log"
diff --git a/scripts/testing/tests/test_desktop.py b/scripts/testing/tests/test_desktop.py
index 53e54e1..c02d2b6 100644
--- a/scripts/testing/tests/test_desktop.py
+++ b/scripts/testing/tests/test_desktop.py
@@ -50,6 +50,19 @@ def test_hyprland_config_present(host, hyprland_installed, home, rel):
 
 
 @pytest.mark.attribution("archsetup")
+def test_live_update_guard_installed(host, hyprland_installed):
+    if not hyprland_installed:
+        pytest.skip("Hyprland not installed (DESKTOP_ENV != hyprland)")
+    guard = host.file("/usr/local/bin/hypr-live-update-guard")
+    assert guard.exists, "live-update guard script missing"
+    assert guard.mode & 0o111, "live-update guard not executable"
+    hook = host.file("/etc/pacman.d/hooks/hypr-live-update-guard.hook")
+    assert hook.exists, "live-update guard pacman hook missing"
+    assert "hypr-live-update-guard" in hook.content_string, \
+        "hook does not invoke the guard script"
+
+
+@pytest.mark.attribution("archsetup")
 def test_portal_settings_backend_not_disabled(host, hyprland_installed, home):
     if not hyprland_installed:
         pytest.skip("Hyprland not installed")
diff --git a/tests/hypr-live-update-guard/test_hypr_live_update_guard.py b/tests/hypr-live-update-guard/test_hypr_live_update_guard.py
new file mode 100644
index 0000000..5ec5ce8
--- /dev/null
+++ b/tests/hypr-live-update-guard/test_hypr_live_update_guard.py
@@ -0,0 +1,95 @@
+"""Tests for the hypr-live-update-guard pacman PreTransaction hook script.
+
+The guard aborts a live pacman upgrade of GPU/compositor runtime libraries
+(mesa, hyprland, wayland, GPU drivers) while a Hyprland session is running,
+so the compositor doesn't SIGABRT when a now-"(deleted)" library is next
+called. It reads the triggering package names on stdin (pacman NeedsTargets)
+and exits non-zero to abort the transaction (AbortOnFail) before any package
+is swapped. When Hyprland isn't running, or an override is set, it exits 0
+and the upgrade proceeds.
+
+Test seams (env vars the production script honors):
+  HYPR_GUARD_RUNNING       1/0 forces the Hyprland-running check (default: pgrep)
+  HYPR_ALLOW_LIVE_UPDATE   1 overrides the guard (proceed anyway)
+  HYPR_GUARD_SENTINEL      path whose existence also overrides the guard
+
+Run from repo root:
+    python3 -m unittest tests.hypr-live-update-guard.test_hypr_live_update_guard
+"""
+
+import os
+import subprocess
+import tempfile
+import unittest
+
+
+REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
+GUARD = os.path.join(REPO_ROOT, "scripts", "hypr-live-update-guard")
+
+
+def run_guard(stdin="mesa\n", running="1", allow=None, sentinel=None):
+    env = dict(os.environ)
+    env["HYPR_GUARD_RUNNING"] = running
+    if allow is not None:
+        env["HYPR_ALLOW_LIVE_UPDATE"] = allow
+    # Point the sentinel at a path that does not exist unless a test sets one,
+    # so the host's real /run state can't leak into the result.
+    env["HYPR_GUARD_SENTINEL"] = sentinel if sentinel else "/nonexistent/guard-sentinel"
+    return subprocess.run(
+        ["sh", GUARD],
+        input=stdin, capture_output=True, text=True, timeout=10, env=env,
+    )
+
+
+class HyprLiveUpdateGuard(unittest.TestCase):
+    # --- Normal cases ---------------------------------------------------
+
+    def test_running_with_dangerous_pkg_aborts(self):
+        r = run_guard(stdin="mesa\n", running="1")
+        self.assertEqual(r.returncode, 1, r.stderr)
+
+    def test_abort_message_names_the_package_and_tty_remedy(self):
+        r = run_guard(stdin="mesa\n", running="1")
+        self.assertIn("mesa", r.stderr)
+        self.assertIn("TTY", r.stderr)
+
+    def test_not_running_allows(self):
+        r = run_guard(stdin="mesa\n", running="0")
+        self.assertEqual(r.returncode, 0, r.stderr)
+
+    def test_not_running_is_silent(self):
+        r = run_guard(stdin="mesa\nhyprland\n", running="0")
+        self.assertEqual(r.stderr.strip(), "")
+
+    # --- Boundary cases -------------------------------------------------
+
+    def test_multiple_packages_all_listed(self):
+        r = run_guard(stdin="mesa\nhyprland\nvulkan-radeon\n", running="1")
+        self.assertEqual(r.returncode, 1)
+        for pkg in ("mesa", "hyprland", "vulkan-radeon"):
+            self.assertIn(pkg, r.stderr)
+
+    def test_running_with_empty_stdin_still_guards(self):
+        # The hook only fires when dangerous targets exist, so an empty target
+        # list shouldn't normally happen; if Hyprland is up, stay safe (abort).
+        r = run_guard(stdin="", running="1")
+        self.assertEqual(r.returncode, 1)
+
+    # --- Override / error cases -----------------------------------------
+
+    def test_env_override_proceeds_even_when_running(self):
+        r = run_guard(stdin="mesa\n", running="1", allow="1")
+        self.assertEqual(r.returncode, 0, r.stderr)
+
+    def test_sentinel_file_override_proceeds(self):
+        with tempfile.NamedTemporaryFile(prefix="guard-allow-") as f:
+            r = run_guard(stdin="mesa\n", running="1", sentinel=f.name)
+            self.assertEqual(r.returncode, 0, r.stderr)
+
+    def test_override_env_zero_does_not_bypass(self):
+        r = run_guard(stdin="mesa\n", running="1", allow="0")
+        self.assertEqual(r.returncode, 1, r.stderr)
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/tests/installer-steps/test_orchestrators.py b/tests/installer-steps/test_orchestrators.py
new file mode 100644
index 0000000..e62c198
--- /dev/null
+++ b/tests/installer-steps/test_orchestrators.py
@@ -0,0 +1,117 @@
+"""Characterization tests for the decomposed installer step orchestrators.
+
+The 2026 decomposition turned the giant step functions into thin
+orchestrators that call one named sub-function per concern. These tests pin
+the call SEQUENCE of each orchestrator: a dropped, added, or reordered
+sub-step call fails the test. They guard the wiring, not the sub-functions'
+own behavior (those mutate the system and are exercised by the VM harness).
+
+Method: sed-extract the orchestrator from the real `archsetup` (its body is
+now just `display` + sub-function calls), source it with `display` silenced
+and every sub-function replaced by a recorder that echoes its own name, run
+it, and assert stdout is the expected ordered list.
+
+Run from repo root:
+    python3 -m unittest tests.installer-steps.test_orchestrators
+"""
+
+import os
+import subprocess
+import textwrap
+import unittest
+
+
+REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
+ARCHSETUP = os.path.join(REPO_ROOT, "archsetup")
+
+# orchestrator -> exact ordered sub-step calls
+ORCHESTRATORS = {
+    "essential_services": [
+        "configure_randomness", "configure_networking", "configure_power",
+        "configure_ssh_server", "configure_fail2ban", "configure_firewall",
+        "configure_service_discovery", "configure_job_scheduling",
+        "configure_package_cache", "configure_snapshots",
+        "configure_user_lingering",
+    ],
+    "prerequisites": [
+        "bootstrap_pacman_keyring", "install_required_software",
+        "configure_build_environment", "configure_package_mirrors",
+    ],
+    "developer_workstation": [
+        "install_programming_languages", "install_editors",
+        "install_android_utilities", "install_vpn_tools",
+        "install_devops_utilities",
+    ],
+    "boot_ux": [
+        "tighten_efi_permissions", "add_nvme_early_module",
+        "configure_initramfs_hook", "configure_encrypted_autologin",
+        "configure_tlp_power", "trim_firmware", "configure_grub",
+    ],
+    "user_customizations": [
+        "clone_user_repos", "stow_dotfiles", "prune_waybar_battery",
+        "refresh_desktop_caches", "configure_dconf_defaults",
+        "finalize_dotfiles", "create_user_directories",
+    ],
+}
+
+
+def run_orchestrator(func, stubs, extra_defs=""):
+    """Source `func` from archsetup with `stubs` recording their names."""
+    stub_defs = "\n".join(f"{s}() {{ echo {s}; }}" for s in stubs)
+    script = textwrap.dedent(f"""\
+        display() {{ :; }}
+        {stub_defs}
+        {extra_defs}
+        source <(sed -n '/^{func}() {{/,/^}}/p' "{ARCHSETUP}")
+        {func}
+    """)
+    result = subprocess.run(
+        ["bash", "-c", script],
+        capture_output=True, text=True, timeout=10,
+    )
+    return result
+
+
+class OrchestratorSequence(unittest.TestCase):
+    def test_each_orchestrator_calls_substeps_in_order(self):
+        for func, expected in ORCHESTRATORS.items():
+            with self.subTest(orchestrator=func):
+                result = run_orchestrator(func, expected)
+                self.assertEqual(result.returncode, 0, result.stderr)
+                got = result.stdout.split()
+                self.assertEqual(got, expected,
+                                 f"{func} call sequence drifted")
+
+
+class SnapshotDispatch(unittest.TestCase):
+    """configure_snapshots branches on filesystem; pin each branch."""
+
+    SUBS = ["configure_zfs_snapshots", "configure_btrfs_snapshots"]
+
+    def test_zfs_root_runs_zfs_snapshots(self):
+        result = run_orchestrator(
+            "configure_snapshots", self.SUBS,
+            extra_defs="is_zfs_root() { return 0; }\nis_btrfs_root() { return 1; }",
+        )
+        self.assertEqual(result.returncode, 0, result.stderr)
+        self.assertEqual(result.stdout.split(), ["configure_zfs_snapshots"])
+
+    def test_btrfs_root_runs_btrfs_snapshots(self):
+        result = run_orchestrator(
+            "configure_snapshots", self.SUBS,
+            extra_defs="is_zfs_root() { return 1; }\nis_btrfs_root() { return 0; }",
+        )
+        self.assertEqual(result.returncode, 0, result.stderr)
+        self.assertEqual(result.stdout.split(), ["configure_btrfs_snapshots"])
+
+    def test_other_filesystem_runs_neither(self):
+        result = run_orchestrator(
+            "configure_snapshots", self.SUBS,
+            extra_defs="is_zfs_root() { return 1; }\nis_btrfs_root() { return 1; }",
+        )
+        self.assertEqual(result.returncode, 0, result.stderr)
+        self.assertEqual(result.stdout.split(), [])
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/tests/run-task/test_run_task.py b/tests/run-task/test_run_task.py
new file mode 100644
index 0000000..35036dd
--- /dev/null
+++ b/tests/run-task/test_run_task.py
@@ -0,0 +1,172 @@
+"""Tests for the run_task / enable_service helpers in the archsetup installer.
+
+run_task is the installer's describe-run-warn primitive. It replaces the
+hand-written idiom that recurs ~100 times across the script:
+
+    action="enabling rngd service" && display "task" "$action"
+    systemctl enable rngd >> "$logfile" 2>&1 || error_warn "$action" "$?"
+
+as a single call:
+
+    run_task "enabling rngd service" systemctl enable rngd
+
+It announces the task via display, runs the command with stdout+stderr
+appended to $logfile, and on failure calls error_warn with the command's
+real exit code (non-fatal). enable_service is a thin wrapper that enables
+one or more systemd units with the conventional "enabling <unit> service"
+wording.
+
+These tests exercise the REAL function bodies, extracted from the
+`archsetup` script at run time (not a copy), with recording stubs standing
+in for display, error_warn, and systemctl. The command run by run_task is
+genuinely executed.
+
+Run from repo root:
+    python3 -m unittest tests.run-task.test_run_task
+"""
+
+import os
+import shutil
+import subprocess
+import tempfile
+import unittest
+
+
+REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
+ARCHSETUP = os.path.join(REPO_ROOT, "archsetup")
+
+# A bash harness that sources the real run_task + enable_service out of the
+# installer, with recording stubs for their dependencies. Each stub appends a
+# tab-separated record to a file named by an env var, so the Python side can
+# assert what was called. The real command passed to run_task still runs.
+WRAPPER = r"""#!/bin/bash
+ARCHSETUP="$1"; shift
+logfile="$LOGFILE"
+
+display()    { printf '%s\t%s\n' "$1" "$2" >> "$DISPLAY_LOG"; }
+error_warn() { printf '%s\t%s\n' "$1" "$2" >> "$ERRWARN_LOG"; return 1; }
+systemctl()  { printf 'systemctl %s\n' "$*"; }
+
+source <(sed -n '/^run_task() {/,/^}/p' "$ARCHSETUP")
+source <(sed -n '/^enable_service() {/,/^}/p' "$ARCHSETUP")
+
+"$@"
+"""
+
+
+class RunTaskHarness(unittest.TestCase):
+    def setUp(self):
+        self.tmp = tempfile.mkdtemp(prefix="run-task-test-")
+        self.wrapper = os.path.join(self.tmp, "run.sh")
+        with open(self.wrapper, "w") as f:
+            f.write(WRAPPER)
+        os.chmod(self.wrapper, 0o755)
+        self.logfile = os.path.join(self.tmp, "install.log")
+        self.display_log = os.path.join(self.tmp, "display.log")
+        self.errwarn_log = os.path.join(self.tmp, "errwarn.log")
+
+    def tearDown(self):
+        shutil.rmtree(self.tmp, ignore_errors=True)
+
+    def call(self, *args):
+        env = dict(os.environ)
+        env["LOGFILE"] = self.logfile
+        env["DISPLAY_LOG"] = self.display_log
+        env["ERRWARN_LOG"] = self.errwarn_log
+        return subprocess.run(
+            ["bash", self.wrapper, ARCHSETUP, *args],
+            capture_output=True, text=True, timeout=10, env=env,
+        )
+
+    def read(self, path):
+        if not os.path.exists(path):
+            return ""
+        with open(path) as f:
+            return f.read()
+
+    # --- Normal cases -----------------------------------------------------
+
+    def test_run_task_success_announces_and_runs(self):
+        result = self.call("run_task", "doing a thing", "true")
+        self.assertEqual(result.returncode, 0, result.stderr)
+        # Announced as a "task" with the exact description.
+        self.assertEqual(self.read(self.display_log), "task\tdoing a thing\n")
+        # No warning on success.
+        self.assertEqual(self.read(self.errwarn_log), "")
+
+    def test_run_task_captures_command_output_to_logfile(self):
+        result = self.call("run_task", "echo something", "echo", "hello-from-cmd")
+        self.assertEqual(result.returncode, 0, result.stderr)
+        self.assertIn("hello-from-cmd", self.read(self.logfile))
+        # Command output is logged, not printed to the terminal.
+        self.assertNotIn("hello-from-cmd", result.stdout)
+
+    def test_run_task_captures_stderr_to_logfile(self):
+        # `ls` of a missing path writes to stderr; it must land in the logfile.
+        missing = os.path.join(self.tmp, "no-such-path")
+        self.call("run_task", "listing", "ls", missing)
+        self.assertIn("no-such-path", self.read(self.logfile))
+
+    def test_run_task_preserves_multiple_arguments(self):
+        self.call("run_task", "multi-arg", "printf", "%s|%s|%s", "a", "b", "c")
+        self.assertIn("a|b|c", self.read(self.logfile))
+
+    def test_run_task_preserves_arguments_with_spaces(self):
+        self.call("run_task", "spacey", "printf", "[%s]", "two words")
+        self.assertIn("[two words]", self.read(self.logfile))
+
+    # --- enable_service ---------------------------------------------------
+
+    def test_enable_service_single_unit(self):
+        self.call("enable_service", "rngd")
+        self.assertEqual(self.read(self.display_log), "task\tenabling rngd service\n")
+        self.assertIn("systemctl enable rngd", self.read(self.logfile))
+
+    def test_enable_service_multiple_units(self):
+        self.call("enable_service", "foo", "bar", "baz")
+        disp = self.read(self.display_log)
+        self.assertIn("task\tenabling foo service\n", disp)
+        self.assertIn("task\tenabling bar service\n", disp)
+        self.assertIn("task\tenabling baz service\n", disp)
+        log = self.read(self.logfile)
+        self.assertIn("systemctl enable foo", log)
+        self.assertIn("systemctl enable bar", log)
+        self.assertIn("systemctl enable baz", log)
+
+    # --- Error cases ------------------------------------------------------
+
+    def test_run_task_failure_warns_with_description(self):
+        result = self.call("run_task", "failing thing", "false")
+        self.assertNotEqual(result.returncode, 0)
+        self.assertEqual(self.read(self.errwarn_log), "failing thing\t1\n")
+
+    def test_run_task_failure_propagates_real_exit_code(self):
+        # `bash -c 'exit 42'` must surface 42 to error_warn, not a clobbered 0.
+        self.call("run_task", "exit-42", "bash", "-c", "exit 42")
+        self.assertEqual(self.read(self.errwarn_log), "exit-42\t42\n")
+
+    def test_enable_service_failure_warns_per_unit(self):
+        # Override systemctl to fail; each unit should produce a warning.
+        env = dict(os.environ)
+        env["LOGFILE"] = self.logfile
+        env["DISPLAY_LOG"] = self.display_log
+        env["ERRWARN_LOG"] = self.errwarn_log
+        # Re-create wrapper with a failing systemctl stub for this case.
+        failing = os.path.join(self.tmp, "run-fail.sh")
+        with open(failing, "w") as f:
+            f.write(WRAPPER.replace(
+                "systemctl()  { printf 'systemctl %s\\n' \"$*\"; }",
+                "systemctl()  { printf 'systemctl %s\\n' \"$*\"; return 1; }",
+            ))
+        os.chmod(failing, 0o755)
+        subprocess.run(
+            ["bash", failing, ARCHSETUP, "enable_service", "alpha", "beta"],
+            capture_output=True, text=True, timeout=10, env=env,
+        )
+        warns = self.read(self.errwarn_log)
+        self.assertIn("enabling alpha service\t1\n", warns)
+        self.assertIn("enabling beta service\t1\n", warns)
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/todo.org b/todo.org
index 7f58559..6b4901c 100644
--- a/todo.org
+++ b/todo.org
@@ -21,6 +21,26 @@ The vocabulary is open — topic tags are coined as needed — so these are conv
 - *Effort / autonomy*: =:quick:= a spare-moment fix (minutes, not a sitting); =:solo:= Claude can carry it end to end — there's a build path, a test path, and no upfront decision needed (a leftover manual spot-check doesn't disqualify it).
 - *Topic / area* (open): the subsystem a task touches — e.g. =:hyprland:= =:waybar:= =:mpd:= =:music:= =:network:= =:tooling:= =:llm:= =:eask:= =:pocketbook:= =:cmail:=. Coin a new one when it aids filtering.
 * Archsetup Open Work
+** TODO [#B] btrfs base VM unbuildable — archangel ISO bakes zfs-auto-snapshot :bug:test:
+=make test-vm-base= (btrfs) fails in archangel's installer: the ISO bakes a fixed
+AUR list ("downgrade yay informant zrepl pacman-cleanup-hook zfs-auto-snapshot
+topgrade ventoy-bin") into every install regardless of =FILESYSTEM=. On a btrfs
+install =zfs= isn't present, so =zfs-auto-snapshot='s =zfs= dependency can't
+resolve and the unattended pacstrap aborts ("unable to satisfy dependency 'zfs'
+required by zfs-auto-snapshot"). This is an archangel ISO bug (the baked list isn't
+controllable from =archsetup-test.conf=), so it blocks btrfs-profile VM testing
+until archangel ships an ISO that conditions the AUR list on the filesystem (or
+drops zfs tooling from non-zfs installs). The 2026-06-27 btrfs base regen attempt
+also wiped the prior (unbootable) btrfs base, so there's no btrfs base image until
+this is fixed. zfs-profile testing works (=make test FS_PROFILE=zfs=).
+
+Companion hardening (defense-in-depth, archangel-side): install the bootloader
+with a removable =\EFI\BOOT\BOOTX64.EFI= fallback so a base boots even from
+fresh/empty NVRAM, and real installs survive firmware that drops boot entries.
+
+** TODO [#C] Scratchpad launch turns on focus-follows-mouse :bug:hyprland:
+Imported from roam inbox 2026-06-25. Repro: with two tiled windows, moving the mouse over the other tile does nothing (focus-follows-mouse off, as expected). Then launch a terminal (scratchpad), move the mouse over a tile, and focus now switches to the window under the pointer. Something about the scratchpad/terminal launch flips focus-follows-mouse on. Find what re-enables it (likely a Hyprland focus/input setting or a pyprland scratchpad side effect) and keep it off.
+
 ** TODO [#B] Scrolling layout: frame fit + wrap-around :hyprland:
 :PROPERTIES:
 :LAST_REVIEWED: 2026-06-13
@@ -53,14 +73,6 @@ From the roam inbox: Zoom opens at a tiny size. Needs diagnosis (HiDPI scaling v
 :END:
 From the roam inbox: hiding a window (e.g. the org-capture popup) then unhiding it should leave the unhidden window focused, but another window typically takes focus. Also =ctrl+j/k= (layout-navigate) can't reach the unhidden window afterward — it should always reach any visible window except the waybar. Involves stash-restore + layout-navigate; needs interactive reproduction with Craig.
 
-** TODO [#B] Guard against live mesa/hyprland/wayland-runtime updates :hyprland:
-:PROPERTIES:
-:LAST_REVIEWED: 2026-06-09
-:END:
-A live =pacman -Syu= that swaps mesa/hyprland/wayland runtime libs out from under a running Hyprland session can crash the compositor: the next GPU-lib call hits a now-"(deleted)" library and SIGABRTs, taking the Wayland clients down with it. Hit ratio 2026-06-07 (mesa 26.0.6 -> 26.1.2 + hyprland upgraded live; Hyprland SIGABRT took down awww/insync/emacs). Likely the driver behind ratio's high lifetime unsafe-shutdown ratio — a crashed compositor forces a hard reset.
-
-Ship a guard: an update wrapper, or a documented practice, that when a pending =-Syu= set includes mesa/hyprland/wayland runtime libs advises running it from a TTY (or after logging out of Hyprland) rather than live. Returned to archsetup from archangel 2026-06-09 — hyprland/mesa are installed and managed by archsetup, not the ISO installer.
-
 ** TODO [#C] Pocketbook development backlog :pocketbook:
 :PROPERTIES:
 :LAST_REVIEWED: 2026-05-26
@@ -136,15 +148,10 @@ A roam-inbox capture asked for the same widget and expands the scope, so folding
 - *Multiple simultaneous* — several timers/alarms/stopwatches set and displayed at once, in one panel.
 - Deliverable includes proposing a few panel designs and recommending one before building.
 
-** TODO [#B] Collapsible waybar sides :waybar:
-:PROPERTIES:
-:LAST_REVIEWED: 2026-06-09
-:END:
-Let either side of the waybar collapse horizontally to a minimal base set, toggled by a click. Each collapsible side carries a small triangle / arrowhead pointing toward the screen edge it collapses into (away from center). Clicking it collapses that side to its base set and flips the arrow to point back toward center; clicking again restores the full side. Same shape-changes-with-state idea as the auto-dim indicator.
-
-Spec ready (2026-06-19): [[file:working/collapsible-waybar-sides/collapsible-waybar-sides-spec.org]]. Spike settled the mechanism: [[file:working/collapsible-waybar-sides/spike-findings.org]].
+** TODO [#B] Sysmon module right-click cycles the visible metric :feature:waybar:
+Builds on the just-shipped =custom/sysmon= collapse (dotfiles be7469b). Right-clicking the module rotates which metric is the visible one, in a fixed order: battery → cpu → temp → mem → disk → back to battery. Each click advances one step and wraps around. The host default (battery on a laptop, disk on a desktop) is the starting/reset metric; the tooltip keeps showing all metrics regardless. Left-click stays =pypr toggle monitor= (the btop popup) — the cycle lives on =on-click-right=.
 
-Decisions locked: right base set = date + worldclock + tray; left base set = menu + workspaces; per-side independent; host-agnostic (base set constant, full set is each host's existing config). Mechanism = config-swap + SIGUSR2 reload via an active-config copy in =$XDG_RUNTIME_DIR= (the CSS/state-file approach was disproven — GTK3 can't reflow-hide native modules). Lives in =~/.dotfiles/hyprland/=. Next: implement per the spec (TDD the toggle + arrow scripts).
+Implementation notes: =waybar-sysmon= needs a persisted selection (a state file in =$XDG_RUNTIME_DIR/waybar/=, absent = host default) that it reads to pick the visible metric. A new =sysmon-cycle= helper bumps the index and signals the module to refresh (add a =signal= to =custom/sysmon=, like the other custom modules; wire =sysmon-cycle= to =on-click-right=). TDD both — extend =tests/waybar-sysmon= for selection-driven output, add a =tests/sysmon-cycle= for the index advance/wrap and the signal.
 
 ** TODO [#B] Network-manager dropdown, nmcli-backed with GPG-stored secrets :waybar:network:
 :PROPERTIES:
@@ -521,81 +528,6 @@ Some operations log to ~$logfile~, others don't - standardize logging
 All package installs should log, all system modifications should log, all errors should log with context
 Makes debugging failed installations easier
 
-** DONE [#B] Add backup before system file modifications :solo:
-CLOSED: [2026-06-25 Thu]
-:PROPERTIES:
-:LAST_REVIEWED: 2026-06-24
-:END:
-Safety net for /etc/X11/xorg.conf.d and other system file edits
-Files like ~/etc/sudoers~, ~/etc/pacman.conf~, ~/etc/default/grub~ modified without backup
-If modifications fail or are incorrect, difficult to recover - should backup files to ~.backup~ before modifying
-
-Done 2026-06-25: added a =backup_system_file <path>= helper next to =safe_rm_rf= — it snapshots a pre-existing file to =<path>.archsetup.bak= before an in-place edit, idempotent (never clobbers an existing backup, so the pristine original survives repeated edits and re-runs), =cp -p= to preserve mode/ownership, no-op when the file is absent. Took the narrow scope (Craig's call): route only the in-place =sed -i= / append edits to *pre-existing* files through it — locale.gen, makepkg.conf, pacman.conf, sudoers, conf.d/wireless-regdom, geoclue.conf, conf.d/pacman-contrib, fstab, mkinitcpio.conf, vconsole.conf — and skip the brand-new drop-in files archsetup fully owns (nothing to back up; recovery is just deleting them). Tests: =tests/backup-system-file/= (7 Normal/Boundary/Error, incl. mode-preserved, existing-backup-not-overwritten, missing-target no-op, cp-failure). =make test-unit= green across all 5 suites; =bash -n= clean; only shellcheck note is the known SC2329 false positive (indirect STEPS dispatch). Integration verification is the next VM run.
-
-** DONE [#B] Migrate bare-metal test runner to Testinfra, then delete the shell sweep :test:
-CLOSED: [2026-06-25 Thu]
-Plan + ZFS-coverage expansion: [[file:docs/design/2026-06-25-zfs-vm-test-coverage.org]] (build a ZFS base VM via archangel + a =FS_PROFILE= selector so =make test= covers the ZFS path, then migrate this runner to key auth + Testinfra against it, then delete the dead =validation.sh= functions = phase E here).
-=run-test.sh= (VM) now uses the Testinfra/pytest sweep as its authoritative validator, but =run-test-baremetal.sh= (lines ~243-244) still calls the old =run_all_validations= / =validate_all_services= from =scripts/testing/lib/validation.sh=. Migrate the bare-metal runner to =run_testinfra_validation= too (same key + ssh-config approach, adapted for a real host), then delete the now-dead shell-sweep functions from =validation.sh=. Keep the live helpers: =ssh_cmd=, =attribute_issue=, =capture_pre/post_install_state=, =analyze_log_diff=, =categorize_errors=, =generate_issue_report=, and the =VALIDATION_*= counters/arrays. Deferred from the Testinfra cutover because it needs a bare-metal test loop to validate, out of scope for the VM-only autonomous run.
-*** 2026-06-25 Thu @ 12:37:02 -0400 P-A/P-B shipped (FS_PROFILE selector); P-C blocked on archangel ZFS-install bug
-P-A + P-B landed in =353b179=: =archsetup-test-zfs.conf= (archangel ZFS config) + an =FS_PROFILE= (btrfs default / zfs) selector across =vm-utils.sh= (=init_vm_paths= derives a per-profile image + validates the profile), =create-base-vm.sh= (selects the archangel config), =run-test.sh= (--help + profile display), and the Makefile (=make test FS_PROFILE=zfs=). Design simplification recorded: no =archsetup-vm-zfs.conf= needed — archsetup auto-detects ZFS from the live root via =is_zfs_root()=, so the archsetup run config is shared; only the archangel base config + base image differ. Open Q1 resolved: archangel supports ZFS root natively (it's the default FS).
-
-P-C (build the ZFS base image) is BLOCKED on archangel. =create-base-vm.sh FS_PROFILE=zfs= built the disk + booted the archangel ISO fine, but the archangel install died: =dkms install zfs/2.3.3 -k 6.18.36-1-lts= exited 1, ZFS module not built. Root cause is in archangel, not archsetup: it appends the [archzfs] experimental repo then runs =pacstrap -K= with no =pacman -Sy= refresh, so it uses the archzfs sync db baked into the Feb-2026 ISO (zfs-dkms 2.3.3) while linux-lts is pulled fresh (6.18.36). 2.3.3 doesn't build against 6.18. velox runs zfs-dkms 2.4.2 on the same kernel from the same channel, so the fix exists upstream — archangel just needs to refresh the db before pacstrap (+ a fresh ISO). Bug + dependency handoff sent to archangel inbox (=2026-06-25-1236-from-archsetup-bug-zfs-install-fails-stale-baked.org=). Retry P-C once a fixed archangel ISO is available. P-D (bare-metal migration code) is still workable in the meantime against the btrfs VM / velox.
-
-*** 2026-06-25 Thu @ 16:05:07 -0400 archangel unblocked; ZFS base built; 3 archsetup bugs fixed (local); re-run paused
-archangel shipped the fix (archangel =89691a0=: =pacman -Syy= before pacstrap) + rebuilt the ISO. With it, =create-base-vm.sh FS_PROFILE=zfs= built a verified ZFS-root base (=archsetup-base-zfs.qcow2=, clean-install snapshot, kernel 6.18.36). =make test FS_PROFILE=zfs= then surfaced three real archsetup bugs against the current archangel base, each fixed in a LOCAL (unpushed) commit:
-- =8ed42b9= informant: the base ships informant; its pacman PreTransaction hook (AbortOnFail) blocked archsetup's first transaction. Fix: =informant read --all= up front (guarded). PROVEN.
-- =66caeb5= pacman.conf perms: the base ships =/etc/pacman.conf= 0600 (archangel =strip_repo_stanza= mktemp+mv clobbers perms), breaking user =makepkg=/=yay=. Fix: =chmod 644= after archsetup's edits. PROVEN (run reached 75 min deep).
-- =05ec096= reflector: archsetup configured reflector's timer but never ran it, so installs used the base's 425-mirror worldwide list and pacman stalled ~15 min on a slow/unresponsive mirror. Fix: run reflector once before the heavy installs (=timeout=-bounded, non-fatal). NOT yet integration-proven — the next re-run validates it.
-Second archangel handoff sent for the pacman.conf-0600 root cause (=2026-06-25-1440-...=); archsetup's chmod is defensive, archangel should ship 0644. Paused before the re-run at Craig's request (he starts =sudo make test FS_PROFILE=zfs= from the laptop). Possible harness-side factor on the stall: slirp IPv6 blackholing (one stalled conn was IPv6) — watch if it recurs despite reflector.
-
-*** 2026-06-25 Thu @ 21:56:12 -0400 P-C GREEN — ZFS VM test path passes end to end
-=make test FS_PROFILE=zfs= PASSED: archsetup exit 0 (full ~68-min ZFS install, reflector held — no stall), pytest =95 passed, 0 failed, 11 skipped=. The ZFS-conditional checks now run the ZFS branch instead of skipping: =test_bootloader_installed= (ZFSBootMenu EFI binary at /efi/EFI/ZBM), =test_mkinitcpio_hooks= (zfs udev hook), =test_console_font_configured= (vconsole.conf), =test_zfs_has_sanoid= all PASS; =test_backup_created_for_mkinitcpio= correctly SKIPs (ZFS+virtio edits nothing). The 3 archsetup issues (gamemode, mu, signal-cli AUR) are the known non-critical residuals, same as on btrfs. Four commits pushed to main: =8ed42b9= informant news-hook, =66caeb5= pacman.conf 0644, =05ec096= reflector-during-install, =eb379c3= ZFS-aware boot/backup tests. P-C (ZFS coverage, design phases A-C) is DONE. Remaining on this task: P-D (migrate run-test-baremetal.sh to inject_root_key + run_testinfra_validation) and P-E (delete the dead validation.sh shell sweep).
-*** 2026-06-25 Thu @ 23:26:02 -0400 P-D + P-E done — whole epic closed
-P-D (=771b92e=): migrated =run-test-baremetal.sh= to key auth + Testinfra. =inject_root_key= generalized to =root@$VM_IP= (vm-utils) so it serves both runners; the bare-metal runner now injects the key after the genesis rollback, threads =SSH_KEY_OPT= + a new =--port= through every ssh/scp, and validates via =run_testinfra_validation= instead of the shell sweep. Follow-up fix =fb495d4=: =set +e= around the validator (it returns pytest's rc, which under =set -e= aborted before the report) — caught by the smoke test. Validated against the ZFS VM (=--validate-only=, localhost:2222): connectivity, ZFS check, key auth, Testinfra connect+run, report all work; a green bare-metal install still needs real ZFS hardware.
-
-P-E (=a4a339b=): deleted the dead shell sweep from =validation.sh= now both runners use Testinfra — run_all_validations, validate_all_services, run_full_validation, the ~35 validate_* checks, validation_pass/fail/warn/skip. Kept the live helpers (ssh_cmd, attribute_issue, capture_pre/post_install_state, analyze_log_diff, categorize_errors, generate_issue_report, VALIDATION_* counters + arrays). 1156 → 314 lines. Verified: no dangling refs, both runners parse + smoke-run clean, unit suite green.
-
-Known follow-ups (not blockers): (1) archangel still owes the pacman.conf-0600 root-cause fix (handoff in its inbox; archsetup's chmod is the defensive layer). (2) The bare-metal runner runs =bash archsetup= with no --config-file — pre-existing, would prompt on real hardware; out of this epic's scope. (3) A true green bare-metal run needs real ZFS hardware (ratio).
-
-** DONE [#B] Implement Testinfra test suite for archsetup
-CLOSED: [2026-06-25 Thu]
-:PROPERTIES:
-:LAST_REVIEWED: 2026-06-24
-:END:
-*** 2026-06-25 Thu @ Final fresh make test GREEN — Testinfra is the validator
-=make test= (fresh build, 150-min cap) PASSED: =TEST PASSED=, =Validation: PASSED=, pytest =96 passed, 10 skipped, 0 failed, 0 errors=, pytest as the authoritative gate. ParallelDownloads now =10= on the fixed build. End-state: the VM test runner validates post-install via the Testinfra/pytest sweep (=scripts/testing/tests/=, 88 tests + conftest fixtures) — full parity with the old shell sweep plus expansion coverage (sshd hardening, =backup_system_file= .bak files, applied pacman/makepkg/NM/fail2ban/reflector config). Three real bugs surfaced + fixed by this work: (1) the 2026-06-24 sshd hardening had silently broken =make test= (root password SSH died mid-run → key auth, f50fc1d); (2) =ParallelDownloads= stuck at Arch's default 5 (sed only matched the commented form → fixed, 2d63802); (3) install monitor cap too tight at 90 min (→ 150, fe84b71). Follow-up filed: migrate =run-test-baremetal.sh= off the shell sweep, then delete the dead =validation.sh= functions (P5).
-*** 2026-06-25 Thu @ Decision: port to Testinfra + expand coverage, design doc first
-Reviewed against the existing harness: =scripts/testing/lib/validation.sh= already runs ~14 post-install checks (=run_all_validations=), so this isn't net-new capability — it's porting that shell validation to Testinfra/pytest for better expressiveness + reporting, then growing coverage. Craig's call (prioritizes test investment over feature speed): do the port and expand. Starting with a design doc in =docs/design/= per the task's own "design doc not yet written" note. Stale slice to drop/rescope: the X11/startx end-to-end tests (fleet is Wayland/Hyprland now).
-*** 2026-06-25 Thu @ 00:54:22 -0400 P1 scaffold landed (advisory, alongside shell sweep)
-Built the Testinfra harness skeleton: =scripts/testing/tests/= (conftest.py with the attribution marker + report hook + =target_user= fixture; 3 parity checks — user exists/shell, ufw enabled, dotfiles stowed+readable), =scripts/testing/lib/testinfra.sh= (=run_testinfra_validation=: ephemeral-key injection, ssh-config, pytest-over-SSH; advisory + non-fatal, =RUN_TESTINFRA= toggle), wired into run-test.sh after the shell sweep, and added =python-pytest python-pytest-testinfra= to =make deps=. Verified on host: py_compile clean, =pytest --collect-only= green in a throwaway venv (4 tests, fixtures resolve), =bash -n= + shellcheck clean, unit suite still green. Integration (the pytest sweep actually running against a VM) is unverified here — needs a =make test= run. Decisions locked: inject test key; run both through parity; full expansion (P4) in this task after the P3 cutover.
-*** 2026-06-25 Thu @ 01:12:09 -0400 P2 full parity port (88 tests)
-Ported the whole shell sweep to pytest: test_users (exists/shell/15 groups parametrized), test_packages (yay+functional, pacman, terminus-font, emacs+config readable, git, 5 dev tools), test_services (required enabled/active, enabled-only, timers, optional skip-if-absent, DoT drop-in, fail2ban/nmcli responds, log-cleanup cron, syncthing lingering, DNS/mDNS/docker skips), test_desktop (Hyprland tools+configs+portal+socket gated on install/compositor, DWM suckless, autologin), test_boot (grub, mkinitcpio hooks branched on zfs_root, console-font-in-initramfs, nvme gated, zfs/sanoid), test_keyring (dir 700/owner/default=login), test_archsetup (log no Error:, ≥12 state markers). conftest fixtures: target_user/home/zfs_root/has_nvme/hyprland_installed/dwm_installed/compositor_running/on_slirp. 88 tests collected, py_compile clean. Correctness fix vs the shell sweep: check =awww= not the stale =swww=. Installed python-pytest-testinfra on velox so the harness gate passes. Next: VM run to diff pytest vs shell sweep for parity.
-*** 2026-06-25 Thu @ 01:24:11 -0400 Fixed: sshd hardening had silently broken =make test=
-VM run #1 aborted ~6 min in (Error 5), before any validation ran. Root cause (pre-existing, not the Testinfra work): the 2026-06-24 sshd hardening sets =PermitRootLogin prohibit-password= + reloads sshd mid-install, and the harness SSHes as root by *password* throughout — so every op after that step got "Permission denied" and run-test.sh fataled before validations. Fix: =inject_root_key= authorizes a throwaway root key right after first SSH (before archsetup runs) and all helpers (=wait_for_ssh=/=vm_exec=/=copy_to_vm=/=copy_from_vm=/=ssh_cmd=) gained =$SSH_KEY_OPT= so they use key auth, which =prohibit-password= still allows. testinfra.sh reuses that key. Additive (password stays as fallback). bash -n + shellcheck clean. Re-running the VM suite to confirm it now reaches the validation + pytest phases.
-*** 2026-06-25 Thu @ 03:33:33 -0400 Parity proven + P4 expansion validated on a live VM
-VM run #3 (=make test-keep=, kept VM up): pytest parity = 78 passed / 10 skipped / 0 fail / 0 err — matches & exceeds the shell sweep (53/0/0). Then built P4 expansion against the live VM (iterating in ~30s, no rebuild): test_hardening (sshd prohibit-password, sysctl printk, /etc/issue emptied, vconsole font, /efi fmask), test_config_applied (pacman ParallelDownloads/Color/multilib, makepkg MAKEFLAGS/OPTIONS, NM dns+wifi-privacy drop-ins, fail2ban jail, reflector), test_backups (=.archsetup.bak= present for pacman.conf/makepkg.conf/sudoers/mkinitcpio.conf — end-to-end proof of the backup feature). Full suite vs live VM: 95 passed / 10 skipped / 1 fail. The 1 fail = a REAL archsetup bug the tests caught: =ParallelDownloads= stayed at the Arch default 5 because the sed only matched a commented =#ParallelDownloads=, but current Arch ships it uncommented — fixed the sed to match both (=^#\?ParallelDownloads=). Also fixed a test bug (=grep -qx '[multilib]'= → =grep -Fxq=, the brackets were a regex char class). Remaining: P3 cutover (pytest authoritative) + P5 retire shell sweep, then a final fresh =make test=.
-*** 2026-06-25 Thu @ 03:38:28 -0400 P3 cutover: Testinfra is now the authoritative validator
-run-test.sh dropped the =run_all_validations= + =validate_all_services= shell-sweep calls; =run_testinfra_validation= now drives =TEST_PASSED= (returns pytest's rc; "couldn't run" = fail, not a silent pass). It surfaces pytest's pass/skip/fail counts through the shared =VALIDATION_*= counters and parses =testinfra-attribution.txt= into the issue arrays so =generate_issue_report= still buckets failures archsetup/base/unknown. Validated the failure path against the still-up VM: pytest rc=1, failure correctly bucketed to [archsetup]. P5 (physically delete the dead shell-sweep functions) is NOT done here — =run-test-baremetal.sh= still calls =run_all_validations=/=validate_all_services=, so deletion must wait until the bare-metal runner is migrated too (filed below). Final step: fresh =make test= to confirm the pass path (ParallelDownloads now 10) with pytest as the gate.
-*** 2026-06-25 Thu @ 08:35:26 -0400 Final run hit the harness 90-min install cap (not a regression)
-The fresh =make test= timed out at 9/12 steps while building =vagrant= from AUR (=ARCHSETUP timed out after 90 minutes=, exit 124), so validation ran against a half-installed system → 10 pytest failures, all late-step (issue/sysctl/vconsole/mkinitcpio/docker/state-markers). The suite worked correctly — it caught an incomplete install. Verified my ParallelDownloads sed is clean (no pacman corruption) and archsetup logged 0 errors. Root cause: =MAX_POLLS=180= (90 min) is too tight for a full install with heavy AUR builds; bumped to 300 (150 min). Re-running.
-Create comprehensive integration tests using Testinfra (Python + pytest) to validate archsetup installations
-
-Tests should cover:
-- Smoke tests: user created, key packages installed, dotfiles present
-- Integration tests: services running, configs valid, X11 starts, apps launch
-- End-to-end tests: login as user, startx, open terminal, run emacs, verify workflows
-
-Framework: Testinfra with pytest (SSH-native, built-in modules for files/packages/services/commands)
-Location: scripts/testing/tests/ directory
-Integration: Run via pytest against test VMs after archsetup completes
-Benefits: Expressive Python tests, excellent reporting, can test interactive scenarios
-
-A design doc (not yet written) should cover:
-- Complete example test suite (test_integration.py)
-- Tiered testing strategy (smoke/integration/end-to-end)
-- How to run tests and integrate with run-test.sh
-- Comparison with alternatives (Goss)
-
 ** TODO [#B] Set up automated test schedule
 :PROPERTIES:
 :LAST_REVIEWED: 2026-05-21
@@ -788,6 +720,33 @@ Parse yay errors and provide specific, actionable fixes instead of generic error
 Enhance existing indicators to show what's happening in real-time
 
 ** TODO Manual testing and validation
+*** Live-update guard aborts a GPU/compositor upgrade while Hyprland runs
+What we're verifying: the pacman PreTransaction hook =hypr-live-update-guard= aborts a =-Syu= that swaps GPU/compositor libs while Hyprland is live, and stays quiet once the session is stopped. Unit tests cover the script's decision logic; this confirms pacman parses the hook, feeds the matched targets on stdin (=NeedsTargets=), and =AbortOnFail= actually stops the transaction. Run on a Hyprland box (ratio/velox).
+- Prereq on machines installed before this shipped: install the guard if it is missing and refresh it if it is already there (a fresh archsetup install places it in the hyprland step).
+#+begin_src sh :results output
+if [ ! -e /usr/local/bin/hypr-live-update-guard ]; then
+  sudo cp ~/code/archsetup/scripts/hypr-live-update-guard /usr/local/bin/ && sudo chmod 755 /usr/local/bin/hypr-live-update-guard
+fi
+sudo cp ~/code/archsetup/scripts/hypr-live-update-guard /usr/local/bin/  # refresh
+ls -l /usr/local/bin/hypr-live-update-guard /etc/pacman.d/hooks/hypr-live-update-guard.hook 2>&1
+#+end_src
+- Quick contract check (no pending upgrade needed): with Hyprland running, pipe the script a target list on stdin, one package name per line, the way pacman does for a =NeedsTargets= hook.
+#+begin_src sh :results output
+printf 'mesa\nhyprland\n' | /usr/local/bin/hypr-live-update-guard; echo "exit=$?"
+#+end_src
+Expected: exit=1, plus the BLOCKED banner naming mesa/hyprland and the from-a-TTY remedy.
+- Real firing inside pacman: with a mesa/hyprland/wayland/GPU-driver upgrade actually pending AND Hyprland running, run the upgrade.
+#+begin_src sh :results output
+sudo pacman -Syu
+#+end_src
+Expected: pacman runs the "Checking for a live Hyprland session..." hook and aborts; no packages upgraded.
+- The from-a-TTY path: the guard keys off the Hyprland *process*, so switching VTs while Hyprland still runs does NOT clear it (correct -- the session is still vulnerable). Fully log out of Hyprland (or =hyprctl dispatch exit=) so no Hyprland process remains, then from the console/display-manager run the upgrade again.
+Expected: the guard stays quiet and the upgrade completes.
+- Override while running (escape hatch):
+#+begin_src sh :results output
+sudo touch /run/archsetup-allow-live-gpu-update && echo "sentinel set"
+#+end_src
+Expected: with the sentinel present, =sudo pacman -Syu= proceeds despite Hyprland running. (The sentinel clears on reboot -- /run is tmpfs.)
 *** Wallpaper survives relogin (waypaper --restore)
 What we're verifying: the hyprland =exec-once= now runs =waypaper --restore= instead of a hardcoded =awww img=, so a wallpaper chosen via =set-wallpaper= / waypaper / dirvish persists across a relogin. The exec-once only fires at Hyprland startup, so this can't be confirmed without a real relogin. (Mechanism already verified: =waypaper --restore= applied the persisted wallpaper via the awww backend, exit 0.)
 - Set a wallpaper different from the current one (or pick one in waypaper, Super+Shift+P):
@@ -851,39 +810,9 @@ A 2026-06-22 roam capture expands the scope past a passive indicator: the wifi m
 :END:
 From the roam inbox (2026-06-22): with Emacs integrated into the system as file manager and instant note-taker, make bouncing it trivial. A waybar component showing the emacs service status, with detail on hover, that turns the server on / off / bounce via right-click. Pairs with running the Emacs daemon as a managed systemd user service.
 
-** TODO [#C] Collapse waybar sysmonitor to a single icon + hover :feature:waybar:
-:PROPERTIES:
-:LAST_REVIEWED: 2026-06-24
-:END:
-From the roam inbox (2026-06-22): replace the spread-out sysmonitor readouts (temp, cpu, mem, storage) with one visible icon showing a single chosen metric, the rest in the hover tooltip. Open question: fold it into the battery component instead of a standalone module. Implementation lives in the waybar config under ~/.dotfiles.
-
-** DONE [#C] Proton Mail Bridge font size :chore:quick:
-CLOSED: [2026-06-24 Wed]
-:PROPERTIES:
-:LAST_REVIEWED: 2026-06-24
-:END:
-From the roam inbox (2026-06-22): adjust the Proton Mail Bridge UI font to a comfortable size. The bridge is a Qt app, so it likely keys off Qt scaling or the qt5ct/qt6ct config like the other Qt apps (QT_SCALE_FACTOR or a font setting).
-
-Done 2026-06-24 (dotfiles =hyprland.conf:47=): the bridge is a Qt6 *QML* app, so it ignores the qt6ct General font — bumped the UI font via =QT_FONT_DPI= on the autostart instead. Changed the exec-once to =env QT_FONT_DPI=108 protonmail-bridge --no-window= (default DPI is 96; 108 = 1.125x). Iterated live with Craig: 120 too big, 108 comfortable. hyprland.conf is a stow symlink so the change is already live; applies at every login. The =~/.config/autostart/Proton Mail Bridge.desktop= entry is dormant under Hyprland (no XDG-autostart), so it was left as-is.
-
-** TODO [#C] Rename idle inhibitor to something more intuitive :chore:waybar:
-:PROPERTIES:
-:LAST_REVIEWED: 2026-06-24
-:END:
-From the roam inbox (2026-06-24): the "idle inhibitor" name doesn't work as a mnemonic — something like "sleep" (i.e. "keep awake" / "no-sleep") would land better. Decide the new name, then rename across the touchpoints: the =custom/idle= waybar module, the keybind mnemonic, and the backing script names (=hypridle-toggle= / =waybar-idle= from the 2026-06-24 idle-inhibitor work). Needs Craig's call on the name first, so not solo.
-
 ** TODO [#C] set-wallpaper detaches waypaper config from its stow symlink :bug:hyprland:quick:
 =set-wallpaper= persists with =mv "$tmp" "$CONFIG"=, which replaces the =~/.config/waypaper/config.ini= stow symlink with a real file. After the first run the live config is detached from =~/.dotfiles/hyprland/.config/waypaper/config.ini=, so a later =git pull= + restow won't update it and set-wallpaper changes never flow back to the repo. Fix: write in place rather than =mv= over the symlink — e.g. =cp "$tmp" "$CONFIG"= (follows the symlink to the real dotfiles file), or resolve the link target and write there. Lives in =~/.dotfiles/hyprland/.local/bin/set-wallpaper=; it has a test suite, so add a Boundary case for "CONFIG is a symlink".
 
-** DONE [#C] Wallpaper login-restore is hardcoded, not waypaper --restore :hyprland:quick:solo:
-CLOSED: [2026-06-24 Wed]
-:PROPERTIES:
-:LAST_REVIEWED: 2026-06-24
-:END:
-The Hyprland =exec-once= (=hyprland.conf:26=) restores the wallpaper with a hardcoded =awww img ~/pictures/wallpaper/trondheim-norway.jpg=, so any wallpaper set later (via =set-wallpaper=, waypaper, or the dirvish =bg=) reverts on relogin. =set-wallpaper= now persists the choice to =waypaper/config.ini=, so switch the exec-once to =waypaper --restore= (after =awww-daemon= is up) to make set wallpapers survive a relogin. Small, dotfiles-only; verify by setting a different wallpaper, relogging, and confirming it sticks.
-
-Done 2026-06-24 (dotfiles): swapped the line-26 exec-once from the hardcoded =awww img …/trondheim-norway.jpg= to =awww-daemon & sleep 1 && waypaper --restore=. waypaper has a real =awww= backend (in its =--backend= list), the stowed =waypaper/config.ini= carries =backend = awww= plus a default =wallpaper == line, so =--restore= works on a fresh install too. Mechanism verified live: =waypaper --restore= reapplied the persisted wallpaper via awww, exit 0. Relogin confirmation filed under "Manual testing and validation". Follow-up filed: =set-wallpaper='s =mv= detached the live =waypaper/config.ini= from its stow symlink, so set-wallpaper changes no longer flow back to dotfiles.
-
 * Archsetup Resolved
 
 ** DONE [#B] Full install logs should contain timestamps
@@ -1465,3 +1394,142 @@ Findings (2026-06-24): the Wayland wallpaper utility on this setup is =awww= (wa
 Done 2026-06-24 (dotfiles 8be2484): added =set-wallpaper <image>= to the hyprland tier — sets live via =awww img= and persists the choice into =waypaper/config.ini=, the single Wayland-correct entry point. Resolves relative paths, validates the file, exits non-zero without persisting if awww fails. 8 Normal/Boundary/Error tests green; live-verified (awww set it, config rewrote). Notified =.emacs.d= to point the dirvish =bg= command at =set-wallpaper <file>= — that wiring is its piece (dependency cleared, =:blocker:= dropped).
 
 Follow-up (separate, small): the login restore =exec-once= in =hyprland.conf= is hardcoded to =trondheim-norway.jpg=, so a wallpaper set via =set-wallpaper= shows live but won't survive a relogin until the exec-once becomes =waypaper --restore= (which reads the now-persisted config). Filed below.
+** DONE [#B] Add backup before system file modifications :solo:
+CLOSED: [2026-06-25 Thu]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+Safety net for /etc/X11/xorg.conf.d and other system file edits
+Files like ~/etc/sudoers~, ~/etc/pacman.conf~, ~/etc/default/grub~ modified without backup
+If a modification fails or is incorrect, recovery is difficult; files should be backed up to ~.backup~ before modifying
+
+Done 2026-06-25: added a =backup_system_file <path>= helper next to =safe_rm_rf= — it snapshots a pre-existing file to =<path>.archsetup.bak= before an in-place edit, idempotent (never clobbers an existing backup, so the pristine original survives repeated edits and re-runs), =cp -p= to preserve mode/ownership, no-op when the file is absent. Took the narrow scope (Craig's call): route only the in-place =sed -i= / append edits to *pre-existing* files through it — locale.gen, makepkg.conf, pacman.conf, sudoers, conf.d/wireless-regdom, geoclue.conf, conf.d/pacman-contrib, fstab, mkinitcpio.conf, vconsole.conf — and skip the brand-new drop-in files archsetup fully owns (nothing to back up; recovery is just deleting them). Tests: =tests/backup-system-file/= (7 Normal/Boundary/Error, incl. mode-preserved, existing-backup-not-overwritten, missing-target no-op, cp-failure). =make test-unit= green across all 5 suites; =bash -n= clean; only shellcheck note is the known SC2329 false positive (indirect STEPS dispatch). Integration verification is the next VM run.
+** DONE [#B] Migrate bare-metal test runner to Testinfra, then delete the shell sweep :test:
+CLOSED: [2026-06-25 Thu]
+Plan + ZFS-coverage expansion: [[file:docs/design/2026-06-25-zfs-vm-test-coverage.org]] (build a ZFS base VM via archangel + a =FS_PROFILE= selector so =make test= covers the ZFS path, then migrate this runner to key auth + Testinfra against it, then delete the dead =validation.sh= functions = phase E here).
+=run-test.sh= (VM) now uses the Testinfra/pytest sweep as its authoritative validator, but =run-test-baremetal.sh= (lines ~243-244) still calls the old =run_all_validations= / =validate_all_services= from =scripts/testing/lib/validation.sh=. Migrate the bare-metal runner to =run_testinfra_validation= too (same key + ssh-config approach, adapted for a real host), then delete the now-dead shell-sweep functions from =validation.sh=. Keep the live helpers: =ssh_cmd=, =attribute_issue=, =capture_pre/post_install_state=, =analyze_log_diff=, =categorize_errors=, =generate_issue_report=, and the =VALIDATION_*= counters/arrays. Deferred from the Testinfra cutover because it needs a bare-metal test loop to validate, out of scope for the VM-only autonomous run.
+*** 2026-06-25 Thu @ 12:37:02 -0400 P-A/P-B shipped (FS_PROFILE selector); P-C blocked on archangel ZFS-install bug
+P-A + P-B landed in =353b179=: =archsetup-test-zfs.conf= (archangel ZFS config) + an =FS_PROFILE= (btrfs default / zfs) selector across =vm-utils.sh= (=init_vm_paths= derives a per-profile image + validates the profile), =create-base-vm.sh= (selects the archangel config), =run-test.sh= (--help + profile display), and the Makefile (=make test FS_PROFILE=zfs=). Design simplification recorded: no =archsetup-vm-zfs.conf= needed — archsetup auto-detects ZFS from the live root via =is_zfs_root()=, so the archsetup run config is shared; only the archangel base config + base image differ. Open Q1 resolved: archangel supports ZFS root natively (it's the default FS).
+
+P-C (build the ZFS base image) is BLOCKED on archangel. =create-base-vm.sh FS_PROFILE=zfs= built the disk + booted the archangel ISO fine, but the archangel install died: =dkms install zfs/2.3.3 -k 6.18.36-1-lts= exited 1, ZFS module not built. Root cause is in archangel, not archsetup: it appends the [archzfs] experimental repo then runs =pacstrap -K= with no =pacman -Sy= refresh, so it uses the archzfs sync db baked into the Feb-2026 ISO (zfs-dkms 2.3.3) while linux-lts is pulled fresh (6.18.36). 2.3.3 doesn't build against 6.18. velox runs zfs-dkms 2.4.2 on the same kernel from the same channel, so the fix exists upstream — archangel just needs to refresh the db before pacstrap (+ a fresh ISO). Bug + dependency handoff sent to archangel inbox (=2026-06-25-1236-from-archsetup-bug-zfs-install-fails-stale-baked.org=). Retry P-C once a fixed archangel ISO is available. P-D (bare-metal migration code) is still workable in the meantime against the btrfs VM / velox.
+
+*** 2026-06-25 Thu @ 16:05:07 -0400 archangel unblocked; ZFS base built; 3 archsetup bugs fixed (local); re-run paused
+archangel shipped the fix (archangel =89691a0=: =pacman -Syy= before pacstrap) + rebuilt the ISO. With it, =create-base-vm.sh FS_PROFILE=zfs= built a verified ZFS-root base (=archsetup-base-zfs.qcow2=, clean-install snapshot, kernel 6.18.36). =make test FS_PROFILE=zfs= then surfaced three real archsetup bugs against the current archangel base, each fixed in a LOCAL (unpushed) commit:
+- =8ed42b9= informant: the base ships informant; its pacman PreTransaction hook (AbortOnFail) blocked archsetup's first transaction. Fix: =informant read --all= up front (guarded). PROVEN.
+- =66caeb5= pacman.conf perms: the base ships =/etc/pacman.conf= 0600 (archangel =strip_repo_stanza= mktemp+mv clobbers perms), breaking user =makepkg=/=yay=. Fix: =chmod 644= after archsetup's edits. PROVEN (run reached 75 min deep).
+- =05ec096= reflector: archsetup configured reflector's timer but never ran it, so installs used the base's 425-mirror worldwide list and pacman stalled ~15 min on a slow/unresponsive mirror. Fix: run reflector once before the heavy installs (=timeout=-bounded, non-fatal). NOT yet integration-proven — the next re-run validates it.
+Second archangel handoff sent for the pacman.conf-0600 root cause (=2026-06-25-1440-...=); archsetup's chmod is defensive, archangel should ship 0644. Paused before the re-run at Craig's request (he starts =sudo make test FS_PROFILE=zfs= from the laptop). Possible harness-side factor on the stall: slirp IPv6 blackholing (one stalled conn was IPv6) — watch if it recurs despite reflector.
+
+*** 2026-06-25 Thu @ 21:56:12 -0400 P-C GREEN — ZFS VM test path passes end to end
+=make test FS_PROFILE=zfs= PASSED: archsetup exit 0 (full ~68-min ZFS install, reflector held — no stall), pytest =95 passed, 0 failed, 11 skipped=. The ZFS-conditional checks now run the ZFS branch instead of skipping: =test_bootloader_installed= (ZFSBootMenu EFI binary at /efi/EFI/ZBM), =test_mkinitcpio_hooks= (zfs udev hook), =test_console_font_configured= (vconsole.conf), =test_zfs_has_sanoid= all PASS; =test_backup_created_for_mkinitcpio= correctly SKIPs (ZFS+virtio edits nothing). The 3 archsetup issues (gamemode, mu, signal-cli AUR) are the known non-critical residuals, same as on btrfs. Four commits pushed to main: =8ed42b9= informant news-hook, =66caeb5= pacman.conf 0644, =05ec096= reflector-during-install, =eb379c3= ZFS-aware boot/backup tests. P-C (ZFS coverage, design phases A-C) is DONE. Remaining on this task: P-D (migrate run-test-baremetal.sh to inject_root_key + run_testinfra_validation) and P-E (delete the dead validation.sh shell sweep).
+*** 2026-06-25 Thu @ 23:26:02 -0400 P-D + P-E done — whole epic closed
+P-D (=771b92e=): migrated =run-test-baremetal.sh= to key auth + Testinfra. =inject_root_key= generalized to =root@$VM_IP= (vm-utils) so it serves both runners; the bare-metal runner now injects the key after the genesis rollback, threads =SSH_KEY_OPT= + a new =--port= through every ssh/scp, and validates via =run_testinfra_validation= instead of the shell sweep. Follow-up fix =fb495d4=: =set +e= around the validator (it returns pytest's rc, which under =set -e= aborted before the report) — caught by the smoke test. Validated against the ZFS VM (=--validate-only=, localhost:2222): connectivity, ZFS check, key auth, Testinfra connect+run, report all work; a green bare-metal install still needs real ZFS hardware.
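The `set +e` follow-up fix can be illustrated with a small sketch. `fake_validator` is a hypothetical stand-in for `run_testinfra_validation` (it only mimics a nonzero pytest return code), and `run_validation_step` is an invented wrapper name; the point is the pattern of suspending fail-fast around a call whose nonzero status is data, not a fatal error.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for run_testinfra_validation: returns nonzero the way
# pytest does when at least one test fails.
fake_validator() { return 1; }

run_validation_step() {
    local rc
    set +e              # a failing validator must not abort the script here
    fake_validator
    rc=$?               # capture the validator's status before anything else runs
    set -e              # restore fail-fast for the rest of the run
    echo "validator rc=$rc"
    echo "report generated"
    return 0
}

run_validation_step
```

Without the `set +e` / `set -e` bracket, the script would exit at the `fake_validator` call and the report line would never be reached, which matches the symptom described above.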
+
+P-E (=a4a339b=): deleted the dead shell sweep from =validation.sh= now that both runners use Testinfra: run_all_validations, validate_all_services, run_full_validation, the ~35 validate_* checks, validation_pass/fail/warn/skip. Kept the live helpers (ssh_cmd, attribute_issue, capture_pre/post_install_state, analyze_log_diff, categorize_errors, generate_issue_report, VALIDATION_* counters + arrays). =validation.sh= went from 1156 → 314 lines. Verified: no dangling refs, both runners parse + smoke-run clean, unit suite green.
+
+Known follow-ups (not blockers): (1) archangel still owes the pacman.conf-0600 root-cause fix (handoff in its inbox; archsetup's chmod is the defensive layer). (2) The bare-metal runner runs =bash archsetup= with no --config-file — pre-existing, would prompt on real hardware; out of this epic's scope. (3) A true green bare-metal run needs real ZFS hardware (ratio).
+** DONE [#B] Implement Testinfra test suite for archsetup
+CLOSED: [2026-06-25 Thu]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+*** 2026-06-25 Thu @ Final fresh make test GREEN — Testinfra is the validator
+=make test= (fresh build, 150-min cap) PASSED: =TEST PASSED=, =Validation: PASSED=, pytest =96 passed, 10 skipped, 0 failed, 0 errors=, pytest as the authoritative gate. ParallelDownloads now =10= on the fixed build. End-state: the VM test runner validates post-install via the Testinfra/pytest sweep (=scripts/testing/tests/=, 88 tests + conftest fixtures) — full parity with the old shell sweep plus expansion coverage (sshd hardening, =backup_system_file= .bak files, applied pacman/makepkg/NM/fail2ban/reflector config). Three real bugs surfaced + fixed by this work: (1) the 2026-06-24 sshd hardening had silently broken =make test= (root password SSH died mid-run → key auth, f50fc1d); (2) =ParallelDownloads= stuck at Arch's default 5 (sed only matched the commented form → fixed, 2d63802); (3) install monitor cap too tight at 90 min (→ 150, fe84b71). Follow-up filed: migrate =run-test-baremetal.sh= off the shell sweep, then delete the dead =validation.sh= functions (P5).
+*** 2026-06-25 Thu @ Decision: port to Testinfra + expand coverage, design doc first
+Reviewed against the existing harness: =scripts/testing/lib/validation.sh= already runs ~14 post-install checks (=run_all_validations=), so this isn't net-new capability — it's porting that shell validation to Testinfra/pytest for better expressiveness + reporting, then growing coverage. Craig's call (prioritizes test investment over feature speed): do the port and expand. Starting with a design doc in =docs/design/= per the task's own "design doc not yet written" note. Stale slice to drop/rescope: the X11/startx end-to-end tests (fleet is Wayland/Hyprland now).
+*** 2026-06-25 Thu @ 00:54:22 -0400 P1 scaffold landed (advisory, alongside shell sweep)
+Built the Testinfra harness skeleton: =scripts/testing/tests/= (conftest.py with the attribution marker + report hook + =target_user= fixture; 3 parity checks — user exists/shell, ufw enabled, dotfiles stowed+readable), =scripts/testing/lib/testinfra.sh= (=run_testinfra_validation=: ephemeral-key injection, ssh-config, pytest-over-SSH; advisory + non-fatal, =RUN_TESTINFRA= toggle), wired into run-test.sh after the shell sweep, and added =python-pytest python-pytest-testinfra= to =make deps=. Verified on host: py_compile clean, =pytest --collect-only= green in a throwaway venv (4 tests, fixtures resolve), =bash -n= + shellcheck clean, unit suite still green. Integration (the pytest sweep actually running against a VM) is unverified here — needs a =make test= run. Decisions locked: inject test key; run both through parity; full expansion (P4) in this task after the P3 cutover.
+*** 2026-06-25 Thu @ 01:12:09 -0400 P2 full parity port (88 tests)
+Ported the whole shell sweep to pytest: test_users (exists/shell/15 groups parametrized), test_packages (yay+functional, pacman, terminus-font, emacs+config readable, git, 5 dev tools), test_services (required enabled/active, enabled-only, timers, optional skip-if-absent, DoT drop-in, fail2ban/nmcli responds, log-cleanup cron, syncthing lingering, DNS/mDNS/docker skips), test_desktop (Hyprland tools+configs+portal+socket gated on install/compositor, DWM suckless, autologin), test_boot (grub, mkinitcpio hooks branched on zfs_root, console-font-in-initramfs, nvme gated, zfs/sanoid), test_keyring (dir 700/owner/default=login), test_archsetup (log no Error:, ≥12 state markers). conftest fixtures: target_user/home/zfs_root/has_nvme/hyprland_installed/dwm_installed/compositor_running/on_slirp. 88 tests collected, py_compile clean. Correctness fix vs the shell sweep: check =awww= not the stale =swww=. Installed python-pytest-testinfra on velox so the harness gate passes. Next: VM run to diff pytest vs shell sweep for parity.
+*** 2026-06-25 Thu @ 01:24:11 -0400 Fixed: sshd hardening had silently broken =make test=
+VM run #1 aborted ~6 min in (Error 5), before any validation ran. Root cause (pre-existing, not the Testinfra work): the 2026-06-24 sshd hardening sets =PermitRootLogin prohibit-password= + reloads sshd mid-install, and the harness SSHes as root by *password* throughout — so every op after that step got "Permission denied" and run-test.sh fataled before validations. Fix: =inject_root_key= authorizes a throwaway root key right after first SSH (before archsetup runs) and all helpers (=wait_for_ssh=/=vm_exec=/=copy_to_vm=/=copy_from_vm=/=ssh_cmd=) gained =$SSH_KEY_OPT= so they use key auth, which =prohibit-password= still allows. testinfra.sh reuses that key. Additive (password stays as fallback). bash -n + shellcheck clean. Re-running the VM suite to confirm it now reaches the validation + pytest phases.
+*** 2026-06-25 Thu @ 03:33:33 -0400 Parity proven + P4 expansion validated on a live VM
+VM run #3 (=make test-keep=, kept VM up): pytest parity = 78 passed / 10 skipped / 0 fail / 0 err — matches & exceeds the shell sweep (53/0/0). Then built P4 expansion against the live VM (iterating in ~30s, no rebuild): test_hardening (sshd prohibit-password, sysctl printk, /etc/issue emptied, vconsole font, /efi fmask), test_config_applied (pacman ParallelDownloads/Color/multilib, makepkg MAKEFLAGS/OPTIONS, NM dns+wifi-privacy drop-ins, fail2ban jail, reflector), test_backups (=.archsetup.bak= present for pacman.conf/makepkg.conf/sudoers/mkinitcpio.conf — end-to-end proof of the backup feature). Full suite vs live VM: 95 passed / 10 skipped / 1 fail. The 1 fail = a REAL archsetup bug the tests caught: =ParallelDownloads= stayed at the Arch default 5 because the sed only matched a commented =#ParallelDownloads=, but current Arch ships it uncommented — fixed the sed to match both (=^#\?ParallelDownloads=). Also fixed a test bug (=grep -qx '[multilib]'= → =grep -Fxq=, the brackets were a regex char class). Remaining: P3 cutover (pytest authoritative) + P5 retire shell sweep, then a final fresh =make test=.
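Both fixes from this entry can be reproduced on a scratch file. This is a sketch under assumptions: `set_parallel_downloads` is an invented helper name, the `ParallelDownloads = 10` output formatting is assumed, and GNU sed is assumed (for `-i` without a suffix and for `\?` in a basic regex).

```shell
#!/usr/bin/env bash
# Illustrative helper name; assumes GNU sed (-i without a suffix, \? in BRE).
set_parallel_downloads() {
    local conf="$1" value="$2"
    # ^#\? matches the setting whether or not it is commented out.
    sed -i "s/^#\?ParallelDownloads.*/ParallelDownloads = ${value}/" "$conf"
}

conf="$(mktemp)"
printf '%s\n' '[options]' 'ParallelDownloads = 5' '[multilib]' > "$conf"
set_parallel_downloads "$conf" 10
grep '^ParallelDownloads' "$conf"

# Without -F, [multilib] is a bracket expression matching ONE character,
# so combined with -x it can never match the literal section header.
if grep -qx '[multilib]' "$conf"; then echo "regex: match"; else echo "regex: no match"; fi
# -F treats the pattern as a fixed string, so the header matches.
if grep -Fxq '[multilib]' "$conf"; then echo "fixed: match"; else echo "fixed: no match"; fi
```

The same `sed` expression also rewrites a commented `#ParallelDownloads = 5` line, so one pattern covers both the old and the current shipped form.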
+*** 2026-06-25 Thu @ 03:38:28 -0400 P3 cutover: Testinfra is now the authoritative validator
+run-test.sh dropped the =run_all_validations= + =validate_all_services= shell-sweep calls; =run_testinfra_validation= now drives =TEST_PASSED= (returns pytest's rc; "couldn't run" = fail, not a silent pass). It surfaces pytest's pass/skip/fail counts through the shared =VALIDATION_*= counters and parses =testinfra-attribution.txt= into the issue arrays so =generate_issue_report= still buckets failures archsetup/base/unknown. Validated the failure path against the still-up VM: pytest rc=1, failure correctly bucketed to [archsetup]. P5 (physically delete the dead shell-sweep functions) is NOT done here — =run-test-baremetal.sh= still calls =run_all_validations=/=validate_all_services=, so deletion must wait until the bare-metal runner is migrated too (filed below). Final step: fresh =make test= to confirm the pass path (ParallelDownloads now 10) with pytest as the gate.
+*** 2026-06-25 Thu @ 08:35:26 -0400 Final run hit the harness 90-min install cap (not a regression)
+The fresh =make test= timed out at 9/12 steps while building =vagrant= from AUR (=ARCHSETUP timed out after 90 minutes=, exit 124), so validation ran against a half-installed system → 10 pytest failures, all late-step (issue/sysctl/vconsole/mkinitcpio/docker/state-markers). The suite worked correctly — it caught an incomplete install. Verified my ParallelDownloads sed is clean (no pacman corruption) and archsetup logged 0 errors. Root cause: =MAX_POLLS=180= (90 min) is too tight for a full install with heavy AUR builds; bumped to 300 (150 min). Re-running.
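The cap arithmetic, as a quick check. The 30-second poll interval is an inference from the figures above (180 polls = 90 min); it is not stated directly, and `cap_minutes` is an invented name.

```shell
#!/usr/bin/env bash
# Assumption: one poll every 30 s, inferred from 180 polls = 90 minutes.
poll_interval_s=30

# Convert a MAX_POLLS value into the install cap in minutes.
cap_minutes() { echo $(( $1 * poll_interval_s / 60 )); }

cap_minutes 180   # prints: 90
cap_minutes 300   # prints: 150
```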
+Create comprehensive integration tests using Testinfra (Python + pytest) to validate archsetup installations.
+
+Tests should cover:
+- Smoke tests: user created, key packages installed, dotfiles present
+- Integration tests: services running, configs valid, X11 starts, apps launch
+- End-to-end tests: login as user, startx, open terminal, run emacs, verify workflows
+
+Framework: Testinfra with pytest (SSH-native, built-in modules for files/packages/services/commands)
+Location: scripts/testing/tests/ directory
+Integration: Run via pytest against test VMs after archsetup completes
+Benefits: Expressive Python tests, excellent reporting, can test interactive scenarios
+
+A design doc (not yet written) should cover:
+- Complete example test suite (test_integration.py)
+- Tiered testing strategy (smoke/integration/end-to-end)
+- How to run tests and integrate with run-test.sh
+- Comparison with alternatives (Goss)
+** DONE [#C] Proton Mail Bridge font size :chore:quick:
+CLOSED: [2026-06-24 Wed]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+From the roam inbox (2026-06-22): adjust the Proton Mail Bridge UI font to a comfortable size. The bridge is a Qt app, so it likely keys off Qt scaling or the qt5ct/qt6ct config like the other Qt apps (QT_SCALE_FACTOR or a font setting).
+
+Done 2026-06-24 (dotfiles =hyprland.conf:47=): the bridge is a Qt6 *QML* app, so it ignores the qt6ct General font — bumped the UI font via =QT_FONT_DPI= on the autostart instead. Changed the exec-once to =env QT_FONT_DPI=108 protonmail-bridge --no-window= (default DPI is 96; 108 = 1.125x). Iterated live with Craig: 120 too big, 108 comfortable. hyprland.conf is a stow symlink so the change is already live; applies at every login. The =~/.config/autostart/Proton Mail Bridge.desktop= entry is dormant under Hyprland (no XDG-autostart), so it was left as-is.
+** DONE [#C] Wallpaper login-restore is hardcoded, not waypaper --restore :hyprland:quick:solo:
+CLOSED: [2026-06-24 Wed]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+The Hyprland =exec-once= (=hyprland.conf:26=) restores the wallpaper with a hardcoded =awww img ~/pictures/wallpaper/trondheim-norway.jpg=, so any wallpaper set later (via =set-wallpaper=, waypaper, or the dirvish =bg=) reverts on relogin. =set-wallpaper= now persists the choice to =waypaper/config.ini=, so switch the exec-once to =waypaper --restore= (after =awww-daemon= is up) to make set wallpapers survive a relogin. Small, dotfiles-only; verify by setting a different wallpaper, relogging, and confirming it sticks.
+
+Done 2026-06-24 (dotfiles): swapped the line-26 exec-once from the hardcoded =awww img …/trondheim-norway.jpg= to =awww-daemon & sleep 1 && waypaper --restore=. waypaper has a real =awww= backend (in its =--backend= list), and the stowed =waypaper/config.ini= carries =backend = awww= plus a default =wallpaper == line, so =--restore= works on a fresh install too. Mechanism verified live: =waypaper --restore= reapplied the persisted wallpaper via awww, exit 0. Relogin confirmation filed under "Manual testing and validation". Follow-up filed: =set-wallpaper='s =mv= detached the live =waypaper/config.ini= from its stow symlink, so set-wallpaper changes no longer flow back to dotfiles.
+** DONE [#B] VM test harness shared one NVRAM file across filesystem profiles :bug:test:
+CLOSED: [2026-06-27 Sat]
+The harness shared one OVMF NVRAM file (=vm-images/OVMF_VARS.fd=) across the btrfs
+and zfs profiles (=init_vm_paths= suffixed the disk image per profile but not the
+NVRAM). NVRAM lives outside the qcow2, so a disk-snapshot revert can't restore it,
+and a zfs run's ZFSBootMenu boot entries clobbered the btrfs GRUB entry. With no
+removable =\EFI\BOOT\BOOTX64.EFI= fallback on the base ESP, the next btrfs run
+booted into UEFI with no bootable device ("BdsDxe: No bootable option or device
+was found", then PXE/HTTP, then SSH timeout before archsetup ran). Found
+2026-06-27 trying to VM-validate the installer refactor.
+
+Fixed: =OVMF_VARS= now carries the same per-profile suffix as the disk image
+(=OVMF_VARS${img_suffix}.fd=) in =vm-utils.sh init_vm_paths=, so btrfs and zfs keep
+separate NVRAM. Validated by a full green zfs run 2026-06-27 (ArchSetup exit 0,
+Testinfra 96 passed / 0 failed). Remaining hardening tracked below.
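A sketch of the per-profile path derivation, not the actual `vm-utils.sh` code. The `OVMF_VARS${img_suffix}.fd` form, the `vm-images/` directory, and the `archsetup-base-zfs.qcow2` image name come from the entries above; the `DISK_IMAGE` variable name and the empty suffix for the default btrfs profile are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the suffixing idea in init_vm_paths.
init_vm_paths() {
    local profile="${FS_PROFILE:-btrfs}" img_suffix
    case "$profile" in
        btrfs) img_suffix="" ;;        # assumption: default profile keeps unsuffixed names
        zfs)   img_suffix="-zfs" ;;
        *)     echo "unknown FS_PROFILE: $profile" >&2; return 1 ;;
    esac
    # Disk image and NVRAM share one suffix, so a profile never boots with
    # another profile's UEFI boot entries.
    DISK_IMAGE="vm-images/archsetup-base${img_suffix}.qcow2"
    OVMF_VARS="vm-images/OVMF_VARS${img_suffix}.fd"
}

FS_PROFILE=zfs init_vm_paths
echo "$DISK_IMAGE"
echo "$OVMF_VARS"
```

Deriving both paths from the same `img_suffix` in one place is what keeps them from drifting apart again.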
+** DONE [#B] Guard against live mesa/hyprland/wayland-runtime updates :hyprland:
+CLOSED: [2026-06-28 Sun]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-09
+:END:
+A live =pacman -Syu= that swaps mesa/hyprland/wayland runtime libs out from under a running Hyprland session can crash the compositor: the next GPU-lib call hits a now-"(deleted)" library and SIGABRTs, taking the Wayland clients down with it. Hit on ratio 2026-06-07 (mesa 26.0.6 -> 26.1.2 + hyprland upgraded live; Hyprland SIGABRT took down awww/insync/emacs). Likely the driver behind that host's high lifetime unsafe-shutdown ratio: a crashed compositor forces a hard reset.
+
+Shipped as a pacman PreTransaction hook rather than a wrapper, so it fires no matter how the upgrade is launched (pacman, yay, topgrade). =scripts/hypr-live-update-guard= aborts the transaction before any package is swapped when the GPU/compositor runtime set is being upgraded AND Hyprland is running, pointing the user to re-run from a TTY with the session stopped; it stays quiet when Hyprland isn't running (the safe from-a-TTY path). Override via =HYPR_ALLOW_LIVE_UPDATE=1= or by touching the sentinel file named in the abort message. archsetup installs the script to =/usr/local/bin= and the hook to =/etc/pacman.d/hooks/= in the hyprland path. Decision logic unit-tested (=tests/hypr-live-update-guard=, 9 cases). Live firing test filed under Manual testing and validation. Commits: archsetup (this session).
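The guard's decision rule, reduced to a pure function for illustration. This is not `scripts/hypr-live-update-guard`: the `should_abort` name, its argument convention, and the three-package watch list are assumptions, and the sentinel-file override is left out. Only the rule itself (abort when a watched runtime package is in the transaction AND Hyprland is running, unless `HYPR_ALLOW_LIVE_UPDATE=1`) comes from the entry above.

```shell
#!/usr/bin/env bash
# Returns 0 when the transaction should be aborted, 1 when it may proceed.
# Usage (hypothetical): should_abort <yes|no: Hyprland running> <pkg>...
should_abort() {
    local hyprland_running="$1" pkg
    shift
    # Explicit override wins over everything else.
    [ "${HYPR_ALLOW_LIVE_UPDATE:-0}" = "1" ] && return 1
    # From a TTY with the session stopped, the upgrade is safe: stay quiet.
    [ "$hyprland_running" = "yes" ] || return 1
    for pkg in "$@"; do
        case "$pkg" in
            mesa|hyprland|wayland) return 0 ;;   # illustrative watch list only
        esac
    done
    return 1
}

unset HYPR_ALLOW_LIVE_UPDATE
if should_abort yes linux mesa; then echo "abort"; else echo "proceed"; fi
if should_abort no linux mesa; then echo "abort"; else echo "proceed"; fi
```

Keeping the decision separate from process detection and package-list parsing is one way to make it unit-testable without a live Hyprland session.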
+** DONE [#B] Collapsible waybar sides :waybar:
+CLOSED: [2026-06-27 Sat]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-09
+:END:
+Let either side of the waybar collapse horizontally to a minimal base set, toggled by a click. Each collapsible side carries a small triangle / arrowhead pointing toward the screen edge it collapses into (away from center). Clicking it collapses that side to its base set and flips the arrow to point back toward center; clicking again restores the full side. Same shape-changes-with-state idea as the auto-dim indicator.
+
+Spec (2026-06-19): [[file:assets/2026-06-19-collapsible-waybar-sides-spec.org]]. Spike that settled the mechanism: [[file:assets/2026-06-18-collapsible-waybar-sides-spike-findings.org]].
+
+Decisions locked: right base set = date + worldclock + tray; left base set = menu + workspaces; per-side independent; host-agnostic (base set constant, full set is each host's existing config). Mechanism = config-swap + SIGUSR2 reload via an active-config copy in =$XDG_RUNTIME_DIR= (the CSS/state-file approach was disproven — GTK3 can't reflow-hide native modules). Lives in =~/.dotfiles/hyprland/=.
+
+Shipped per spec (dotfiles 804bef6): 3 TDD'd scripts (=waybar-active-config=, =waybar-collapse=, =waybar-arrow=; 22 cases), arrow modules wired into the config (left arrow innermost-left, right arrow innermost-right), CSS ×3, =$mod+[= / =$mod+]= keybinds, and =waybar-toggle= relaunch updated to load the active config so a crash preserves collapse state. Verified live: click, keybind, and per-side independence all work; expand round-trips exactly to canonical.
+** DONE [#C] Collapse waybar sysmonitor to a single icon + hover :feature:waybar:
+CLOSED: [2026-06-27 Sat]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+From the roam inbox (2026-06-22): replace the spread-out sysmonitor readouts (temp, cpu, mem, storage) with one visible icon showing a single chosen metric, the rest in the hover tooltip. Open question: fold it into the battery component instead of a standalone module. Implementation lives in the waybar config under ~/.dotfiles.
+
+Shipped as a standalone =custom/sysmon= module (Craig's call: host-dependent primary — battery on laptop, disk on desktop — rather than fold into battery, which is laptop-only). Backing script =waybar-sysmon= gathers cpu/temp/mem/disk/battery, shows the host-appropriate metric, rest in tooltip; 13-case TDD suite; removed the 5 native modules + their CSS across all 3 themes. Dotfiles be7469b.
+** DONE [#C] Rename idle inhibitor to something more intuitive :chore:waybar:
+CLOSED: [2026-06-27 Sat]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+From the roam inbox (2026-06-24): the "idle inhibitor" name doesn't work as a mnemonic — something like "sleep" (i.e. "keep awake" / "no-sleep") would land better. Decide the new name, then rename across the touchpoints: the =custom/idle= waybar module, the keybind mnemonic, and the backing script names (=hypridle-toggle= / =waybar-idle= from the 2026-06-24 idle-inhibitor work). Needs Craig's call on the name first, so not solo.
+
+Renamed to "caffeine" (Craig's call, 2026-06-27): =custom/caffeine= module, =waybar-caffeine= + =caffeine-toggle= scripts, tooltip "Caffeine: ON/OFF", CSS + test suites updated. Keybind stays =$mod+I= (=$mod+C= is hyprpicker). Shipped in dotfiles 8b45b51.