61 files changed, 5045 insertions, 1352 deletions
@@ -125,6 +125,15 @@ Full palette reference: `assets/color-themes/dupre/dupre-palette.org`
 - Desktop file overrides go in `~/.dotfiles/hyprland/.local/share/applications/`
 - MPD is configured but mpv handles audio file associations
 - Firewall is ufw (configured in `archsetup`, default-deny incoming, explicit allow list). Tailscale traffic **does** traverse ufw on ratio — a probe from a tailnet IP is still blocked unless a rule covers the port. Don't assume tailnet-only services bypass the firewall; they need an explicit ufw rule like any other.
-- This machine is **ratio**; **velox** is a laptop. Both run Hyprland (Wayland). archsetup still supports dwm/X11, but no current machine uses it.
+- Never assume which machine this is — always run `uname -n` to find the hostname (the `hostname` binary is absent, so `uname -n` is the source of truth; `uname -r` is the kernel release, not the host). The fleet is **ratio** (workstation) and **velox** (laptop), both Hyprland (Wayland). archsetup still supports dwm/X11, but no current machine uses it.
 - Remote repository on cjennings.net
 - .ai/ is gitignored; living project context is in .ai/notes.org
+
+## Codified Insights
+
+- **VM tests run committed code, not your working tree.** `scripts/testing/run-test.sh` provisions the VM from `git bundle create <file> HEAD` (it simulates `git clone`), so an uncommitted edit to `archsetup` or the pytest suite silently runs the old code. Commit (even a throwaway WIP commit) before `make test FS_PROFILE=...`, or the change isn't exercised. (`gotcha` — 2026-06-25)
+- **Iterate the pytest sweep against a kept VM, not a reinstall.** `make test-keep FS_PROFILE=...` leaves the VM up after the install and writes `testinfra_ssh_config` + `root_key` into `test-results/<timestamp>/`. Point pytest at that ssh-config to re-run only the Testinfra checks in ~30s instead of a ~70-minute full reinstall. Use it when iterating test assertions, not installer logic. (`pattern` — 2026-06-25)
+- **VM UEFI NVRAM lives outside the qcow2 and must be per-profile.** OVMF boot entries live in the `OVMF_VARS` file, not the disk image, so reverting the `clean-install` snapshot does NOT restore them. The base ESPs have no removable `\EFI\BOOT\BOOTX64.EFI` fallback, so a base boots only via its NVRAM entry — lose or overwrite it and the VM dies in UEFI ("No bootable option") and SSH-times-out before archsetup runs. `init_vm_paths` now suffixes `OVMF_VARS` per `FS_PROFILE` (matching the disk image); never share one NVRAM file across btrfs/zfs. (`gotcha` — 2026-06-28)
+- **sed/awk function extraction breaks on column-0 `}` inside heredocs.** The `tests/` harness and any `/^name() {/,/^}/` extraction stop at the first line beginning with `}` — but a JSON heredoc body (e.g. the docker `daemon.json` in `developer_workstation`) has a column-0 `}` that is NOT the function's close. Find the real closing brace before slicing, or the bounds are silently wrong. (`gotcha` — 2026-06-28)
+- **AUR builds need ≥8 GiB VM RAM.** `makepkg` runs `-j$VM_CPUS`, and parallel `cc1plus` (~700 MB each on heavy C++ AUR packages) OOM-killed under the old 4 GiB `VM_RAM` default; the install still passed (yay retries) but the kills showed as attributed issues. Default is now 8192 MB. If you raise `VM_CPUS`, raise `VM_RAM` with it. (`threshold` — 2026-06-28)
+- **Guard live upgrades with a PreTransaction hook, not a wrapper.** `hypr-live-update-guard` is a pacman `PreTransaction` hook (`AbortOnFail` + `NeedsTargets`) so it fires no matter how the upgrade launches (pacman, yay, topgrade) and aborts before any package is swapped — the safe point, since nothing is replaced yet. A shell wrapper around `pacman` would be bypassed by the other front-ends. (`pattern` — 2026-06-28)
@@ -7,6 +7,17 @@
 .PHONY: help deps test-unit test test-keep test-vm-base package-diff
+# Filesystem profile for the VM harness: btrfs (default) or zfs. Selects the
+# base image the scripts build/use; exported so create-base-vm.sh + run-test.sh
+# pick the matching archangel config and image. e.g. make test FS_PROFILE=zfs
+FS_PROFILE ?= btrfs
+export FS_PROFILE
+ifeq ($(FS_PROFILE),btrfs)
+BASE_IMAGE := vm-images/archsetup-base.qcow2
+else
+BASE_IMAGE := vm-images/archsetup-base-$(FS_PROFILE).qcow2
+endif
+
 # Default target - show help
 help:
 	@echo "archsetup - install and test"
@@ -19,6 +30,9 @@ help:
 	@echo " test-vm-base Create base VM only (runs archangel)"
 	@echo " package-diff Compare archsetup's declared packages vs this system"
 	@echo ""
+	@echo "Filesystem profile (test, test-keep, test-vm-base):"
+	@echo " FS_PROFILE=btrfs (default) or zfs, e.g. make test FS_PROFILE=zfs"
+	@echo ""
 	@echo "Dotfile stow operations now live in the dotfiles repo:"
 	@echo " cd ~/.dotfiles && make stow|restow|reset|unstow|import <de>"
 	@echo ""
@@ -27,7 +41,8 @@ help:
 deps:
 	@echo "Installing VM testing dependencies..."
 	sudo pacman -S --needed qemu-full virt-manager virt-viewer libguestfs \
-		bridge-utils dnsmasq archiso sshpass socat
+		bridge-utils dnsmasq archiso sshpass socat \
+		python-pytest python-pytest-testinfra
 	@echo ""
 	@echo "Done. For VM testing, also ensure libvirtd is running:"
 	@echo " sudo systemctl enable --now libvirtd"
@@ -50,18 +65,18 @@ test-vm-base:
 # Test - run full VM integration test suite (creates base VM if needed)
 test:
-	@if [ ! -f vm-images/archsetup-base.qcow2 ] || \
-	! qemu-img snapshot -l vm-images/archsetup-base.qcow2 2>/dev/null | grep -q "clean-install"; then \
-		echo "Base VM not found or missing snapshot, creating..."; \
+	@if [ ! -f $(BASE_IMAGE) ] || \
+	! qemu-img snapshot -l $(BASE_IMAGE) 2>/dev/null | grep -q "clean-install"; then \
+		echo "Base VM not found or missing snapshot, creating ($(FS_PROFILE))..."; \
 		bash scripts/testing/create-base-vm.sh; \
 	fi
 	@bash scripts/testing/run-test.sh
 # Test and keep VM running (for manual testing after archsetup)
 test-keep:
-	@if [ ! -f vm-images/archsetup-base.qcow2 ] || \
-	! qemu-img snapshot -l vm-images/archsetup-base.qcow2 2>/dev/null | grep -q "clean-install"; then \
-		echo "Base VM not found or missing snapshot, creating..."; \
+	@if [ ! -f $(BASE_IMAGE) ] || \
+	! qemu-img snapshot -l $(BASE_IMAGE) 2>/dev/null | grep -q "clean-install"; then \
+		echo "Base VM not found or missing snapshot, creating ($(FS_PROFILE))..."; \
 		bash scripts/testing/create-base-vm.sh; \
 	fi
 	@bash scripts/testing/run-test.sh --keep
@@ -1,4 +1,5 @@
 #!/bin/bash
+# SPDX-License-Identifier: GPL-3.0-or-later
 # ArchSetup - Craig Jennings <craigmartinjennings@gmail.com>
 # https://cjennings.net/archsetup
 # License: GNU GPLv3
@@ -35,6 +36,7 @@ show_status_only=false
 skip_gpu_drivers=false
 enable_autologin="" # empty=auto-detect, true=force enable, false=skip
 install_claude_code=true # false to skip the claude-code native (curl|sh) install
+install_device_udev_rules=true # false to skip device-specific udev rules (the Logitech BRIO camera rule)
 while [ $# -gt 0 ]; do
     case "$1" in
@@ -123,6 +125,8 @@ load_config() {
     [[ "$NO_GPU_DRIVERS" == "yes" ]] && skip_gpu_drivers=true
     [[ "$INSTALL_CLAUDE_CODE" == "yes" ]] && install_claude_code=true
     [[ "$INSTALL_CLAUDE_CODE" == "no" ]] && install_claude_code=false
+    [[ "$INSTALL_DEVICE_UDEV_RULES" == "yes" ]] && install_device_udev_rules=true
+    [[ "$INSTALL_DEVICE_UDEV_RULES" == "no" ]] && install_device_udev_rules=false
     # Repository overrides
     [[ -n "$DWM_REPO" ]] && dwm_repo="$DWM_REPO"
@@ -188,6 +192,10 @@ validate_config() {
         echo "ERROR: INSTALL_CLAUDE_CODE must be 'yes' or 'no'. Got: '$INSTALL_CLAUDE_CODE'" >&2
         exit 1
     fi
+    if [[ -n "$INSTALL_DEVICE_UDEV_RULES" && "$INSTALL_DEVICE_UDEV_RULES" != "yes" && "$INSTALL_DEVICE_UDEV_RULES" != "no" ]]; then
+        echo "ERROR: INSTALL_DEVICE_UDEV_RULES must be 'yes' or 'no'. Got: '$INSTALL_DEVICE_UDEV_RULES'" >&2
+        exit 1
+    fi
     if [[ -n "$locale" && ! "$locale" =~ ^[a-z]{2,3}(_[A-Z]{2})?(\.[A-Za-z0-9-]+)?(@[A-Za-z]+)?$ ]]; then
         echo "ERROR: LOCALE looks malformed: '$locale'. Expected e.g. en_US.UTF-8" >&2
@@ -373,13 +381,39 @@ safe_rm_rf() {
     rm -rf "$target"
 }
+# backup_system_file <path>
+# Snapshot a pre-existing system file to <path>.archsetup.bak before archsetup
+# edits it in place, so a botched in-place edit (fstab, mkinitcpio.conf,
+# sudoers, ...) is recoverable. Idempotent: never overwrites an existing
+# backup, so the pristine original survives repeated edits within a run and
+# across re-runs of the installer. Returns 0 and does nothing when <path>
+# does not exist (nothing to back up) or when a backup is already present.
+# Uses `cp -p` so a restored sudoers/fstab keeps its mode and ownership.
+# Prints its own warning to stderr and returns non-zero only when the copy
+# itself fails. Self-contained — no dependency on error_warn.
+backup_system_file() {
+    local target="$1"
+    local backup="${target}.archsetup.bak"
+
+    if [ -z "$target" ]; then
+        echo "backup_system_file: empty target" >&2
+        return 1
+    fi
+    [ -f "$target" ] || return 0 # nothing to back up
+    [ -e "$backup" ] && return 0 # pristine backup already captured
+    if ! cp -p -- "$target" "$backup"; then
+        echo "backup_system_file: failed to back up '$target'" >&2
+        return 1
+    fi
+}
+
 # Handle --status flag (must be after state_dir is defined)
-if $show_status_only; then
+if [ "$show_status_only" = "true" ]; then
     show_status
 fi
 # Handle --fresh flag
-if $fresh_install; then
+if [ "$fresh_install" = "true" ]; then
     echo "Starting fresh installation (removing previous state)..."
     safe_rm_rf "$state_dir" "/var/lib/archsetup"
 fi
@@ -561,6 +595,29 @@ display() {
 ### Installation Helpers
+# Describe-run-warn primitive. Announces a task, runs the command with
+# stdout+stderr appended to $logfile, and on failure logs a non-fatal
+# warning carrying the command's real exit code. Replaces the recurring
+# action="desc" && display "task" "$action"
+# cmd >> "$logfile" 2>&1 || error_warn "$action" "$?"
+# idiom with a single call:
+# run_task "desc" cmd arg...
+run_task() {
+    local desc="$1"
+    shift
+    display "task" "$desc"
+    "$@" >> "$logfile" 2>&1 || error_warn "$desc" "$?"
+}
+
+# Enable one or more systemd units with the conventional wording.
+# Each unit is announced and warned independently via run_task.
+enable_service() {
+    local unit
+    for unit in "$@"; do
+        run_task "enabling $unit service" systemctl enable "$unit"
+    done
+}
+
 MAX_INSTALL_RETRIES=3
 retry_install() {
     local pkg="$1"
@@ -743,7 +800,7 @@ EOF
 # GPU Driver Installation
 install_gpu_drivers() {
-    if $skip_gpu_drivers; then
+    if [ "$skip_gpu_drivers" = "true" ]; then
         display "task" "Skipping GPU driver installation (--no-gpu-drivers)"
         return 0
     fi
@@ -770,7 +827,7 @@ install_gpu_drivers() {
     done
     # Fallback: check PCI bus modalias if DRM not available (early boot/chroot)
-    if ! $detected_intel && ! $detected_amd && ! $detected_nvidia; then
+    if [ "$detected_intel" != "true" ] && [ "$detected_amd" != "true" ] && [ "$detected_nvidia" != "true" ]; then
         for modalias_file in /sys/bus/pci/devices/*/modalias; do
             if [[ -r "$modalias_file" ]]; then
                 modalias=$(cat "$modalias_file" 2>/dev/null)
@@ -787,21 +844,21 @@ install_gpu_drivers() {
     fi
     # Install drivers based on detected hardware
-    if $detected_intel; then
+    if [ "$detected_intel" = "true" ]; then
         display "task" "Intel GPU detected (via modalias) - installing drivers"
         pacman_install mesa
         pacman_install intel-media-driver # hardware video acceleration
         pacman_install vulkan-intel # Vulkan support
     fi
-    if $detected_amd; then
+    if [ "$detected_amd" = "true" ]; then
         display "task" "AMD GPU detected (via modalias) - installing drivers"
         pacman_install mesa # includes VAAPI drivers (libva-mesa-driver was folded in)
         pacman_install xf86-video-amdgpu
         pacman_install vulkan-radeon
     fi
-    if $detected_nvidia; then
+    if [ "$detected_nvidia" = "true" ]; then
         display "task" "NVIDIA GPU detected (via modalias) - installing drivers"
         # nvidia-dkms left the repos; nvidia-open-dkms is the packaged driver
         # (Turing and newer — pre-Turing cards need an AUR legacy variant,
@@ -813,7 +870,7 @@ install_gpu_drivers() {
     fi
     # Fallback for VMs or unknown hardware
-    if ! $detected_intel && ! $detected_amd && ! $detected_nvidia; then
+    if [ "$detected_intel" != "true" ] && [ "$detected_amd" != "true" ] && [ "$detected_nvidia" != "true" ]; then
         display "task" "No GPU detected via modalias - installing generic drivers"
         pacman_install mesa
         pacman_install xf86-video-vesa
@@ -840,8 +897,26 @@ prerequisites() {
     display "title" "Prerequisites"
+    bootstrap_pacman_keyring
+    install_required_software
+    configure_build_environment
+    configure_package_mirrors
+}
+
+bootstrap_pacman_keyring() {
     display "subtitle" "Bootstrapping"
+    # If the base ships informant (e.g. an archangel-installed system), it
+    # registers a pacman PreTransaction hook (AbortOnFail) that blocks every
+    # package transaction while Arch news is unread. Mark it all read up front
+    # so the keyring/refresh/install steps below don't abort. --all marks
+    # without printing or prompting; a bare `informant read` is interactive and
+    # would hang an unattended run. No-op when informant isn't installed.
+    if command -v informant >/dev/null 2>&1; then
+        action="marking Arch news read (informant)" && display "task" "$action"
+        informant read --all >> "$logfile" 2>&1 || true
+    fi
+
     action="ensuring current Arch Linux keyring" && display "task" "$action"
     (pacman -Syy) >> "$logfile" 2>&1 || error_fatal "$action" "$?"
     (pacman -S --noconfirm archlinux-keyring) >> "$logfile" 2>&1 || \
@@ -867,6 +942,9 @@ prerequisites() {
     done
     $refresh_ok || error_fatal "$action" "$?"
+}
+
+install_required_software() {
     display "subtitle" "Required Software"
     for software in linux-firmware wireless-regdb base-devel ca-certificates \
@@ -875,6 +953,9 @@ prerequisites() {
         pacman_install "$software"
     done
+}
+
+configure_build_environment() {
     display "subtitle" "Environment Configuration"
     # configure locale (must happen before package installs that depend on locale)
@@ -883,6 +964,7 @@ prerequisites() {
     action="configuring locale ($locale)" && display "task" "$action"
     # Uncomment the selected locale in locale.gen (format: "en_US.UTF-8 UTF-8")
     locale_entry="${locale} ${locale##*.}" # e.g., "en_US.UTF-8 UTF-8"
+    backup_system_file /etc/locale.gen
     sed -i "s|^#${locale_entry}|${locale_entry}|" /etc/locale.gen
     (locale-gen >> "$logfile" 2>&1) || error_warn "$action" "$?"
     echo "LANG=$locale" > /etc/locale.conf
@@ -903,6 +985,7 @@ prerequisites() {
     systemctl enable chronyd.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
     action="configuring compiler to use all processor cores" && display "task" "$action"
+    backup_system_file /etc/makepkg.conf
     sed -i "s/-j2/-j$(nproc)/;s/^#MAKEFLAGS/MAKEFLAGS/" /etc/makepkg.conf >> "$logfile" 2>&1
     action="disabling debug packages in makepkg" && display "task" "$action"
@@ -910,13 +993,26 @@ prerequisites() {
     # enable pacman concurrent downloads and color
     action="enabling concurrent downloads" && display "task" "$action"
-    sed -i "s/^#ParallelDownloads.*$/ParallelDownloads = 10/;s/^#Color$/Color/" /etc/pacman.conf
+    backup_system_file /etc/pacman.conf
+    # Match a commented OR already-uncommented ParallelDownloads: current Arch
+    # ships it uncommented at 5, so a "^#"-only match silently leaves it at 5.
+    sed -i "s/^#\?ParallelDownloads.*$/ParallelDownloads = 10/;s/^#Color$/Color/" /etc/pacman.conf
     # enable multilib repository (required for 32-bit libraries, Steam, etc.)
     action="enabling multilib repository" && display "task" "$action"
     sed -i '/^#\[multilib\]/{s/^#//;n;s/^#//}' /etc/pacman.conf
+
+    # Keep pacman.conf world-readable. User-level makepkg/yay reads it to
+    # resolve dependencies, so a root-only file makes every AUR build fail with
+    # "/etc/pacman.conf could not be read: Permission denied". Some base images
+    # (a current archangel ISO) ship it 0600; 0644 is the Arch default.
+    chmod 644 /etc/pacman.conf
+
     pacman -Sy >> "$logfile" 2>&1
+}
+
+configure_package_mirrors() {
     action="Package Mirrors" && display "subtitle" "$action"
     pacman_install reflector
@@ -933,6 +1029,21 @@ prerequisites() {
     --save /etc/pacman.d/mirrorlist
 EOF
+    # Run reflector now, not only on the timer. The base image ships the full
+    # unsorted worldwide mirrorlist (hundreds of mirrors), so the heavy package
+    # installs below can stall for many minutes on a slow or unresponsive one.
+    # Curate to a few fast, recently-synced HTTPS mirrors up front, then refresh
+    # the databases against them. Bounded by `timeout` and non-fatal: reflector's
+    # own per-probe timeouts cap it, and the base list still works if it can't
+    # finish, so a flaky mirror-status service never blocks the install.
+    action="selecting fast mirrors (reflector)" && display "task" "$action"
+    backup_system_file /etc/pacman.d/mirrorlist
+    if timeout 120 reflector @/etc/xdg/reflector/reflector.conf >> "$logfile" 2>&1; then
+        pacman -Syy >> "$logfile" 2>&1 || error_warn "refreshing databases after reflector" "$?"
+    else
+        error_warn "selecting fast mirrors (reflector)" "$?"
+    fi
+
     action="enabling the reflector timer" && display "task" "$action"
     (systemctl enable reflector.timer >> "$logfile" 2>&1) || \
         error_warn "$action" "$?"
@@ -973,7 +1084,8 @@ create_user() {
     mkdir -p "/home/$username/.cache/zsh/"
     # give $username sudo nopasswd rights (required for aur installs)
-    display "task" "granting permissions"
+    action="granting permissions" && display "task" "$action"
+    backup_system_file /etc/sudoers
     (echo "%$username ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers) \
         || error_warn "$action" "$?"
@@ -1003,6 +1115,16 @@ create_user() {
 user_customizations() {
     action="User Customizations" && display "title" "$action"
+    clone_user_repos
+    stow_dotfiles
+    prune_waybar_battery
+    refresh_desktop_caches
+    configure_dconf_defaults
+    finalize_dotfiles
+    create_user_directories
+}
+
+clone_user_repos() {
     # Clone archsetup to user's home directory so dotfile symlinks are accessible.
     # This ensures symlinks point to a user-readable location regardless of how
     # archsetup was invoked (curl|bash, from /root, etc.)
@@ -1035,6 +1157,9 @@ user_customizations() {
     # root runs stow/restore against the user-owned clone; mark it safe.
     git config --global --add safe.directory "$dotfiles_dir" >> "$logfile" 2>&1 || true
+}
+
+stow_dotfiles() {
     # Stow the universal layer plus the per-environment layer. Headless installs
     # (none) get the standalone minimal/ tree instead of common/.
     case "$desktop_env" in
@@ -1065,6 +1190,9 @@ user_customizations() {
         ;;
     esac
+}
+
+prune_waybar_battery() {
     # Remove battery module from waybar config on desktops with no battery
     # (hyprland only — waybar isn't part of the dwm or minimal trees).
     if [[ "$desktop_env" == "hyprland" ]] && ! ls /sys/class/power_supply/BAT* &>/dev/null; then
@@ -1077,12 +1205,14 @@ user_customizations() {
         sed -i '/"battery": {/,/^ },$/d' "$waybar_config"
     fi
+}
+
+refresh_desktop_caches() {
     # install fontconfig before refreshing cache (provides fc-cache)
     pacman_install fontconfig
     # Refresh font cache for any fonts in dotfiles
-    action="refreshing font cache" && display "task" "$action"
-    fc-cache -f >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "refreshing font cache" fc-cache -f
     # install desktop-file-utils before updating database (provides update-desktop-database)
     pacman_install desktop-file-utils
@@ -1092,6 +1222,9 @@ user_customizations() {
     (sudo -u "$username" update-desktop-database "/home/$username/.local/share/applications" \
         >> "$logfile" 2>&1 ) || true
+}
+
+configure_dconf_defaults() {
     # GTK and GNOME desktop interface settings — read by GTK apps and
     # xdg-desktop-portal-gtk. Written as a system-wide dconf db rather than
     # per-user dbus-launch dconf writes: the system path needs no session
@@ -1120,6 +1253,9 @@ EOF
     dconf update
     ) >> "$logfile" 2>&1 || error_warn "$action" "$?"
+}
+
+finalize_dotfiles() {
     action="marking archsetup dir as safe.directory" && display "task" "$action"
     git config --global --add safe.directory "$user_archsetup_dir" >> "$logfile" 2>&1 \
         || error_warn "$action" "$?"
@@ -1129,9 +1265,11 @@ EOF
     # (e.g. the /etc/skel .bashrc/.bash_profile a fresh user starts with). Runs
     # for every desktop_env, including none — minimal/ ships those skel-colliding
     # files too, so its --adopt needs the same restore.
-    action="restoring dotfile versions" && display "task" "$action"
-    git -C "$dotfiles_dir" restore . >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "restoring dotfile versions" git -C "$dotfiles_dir" restore .
+
+}
+create_user_directories() {
     action="creating common directories" && display "task" "$action"
     # Create default directories and grant permissions
     {
@@ -1181,22 +1319,36 @@ aur_installer() {
 ### Essential Services
 essential_services() {
     display "title" "Essential Services"
+    configure_randomness
+    configure_networking
+    configure_power
+    configure_ssh_server
+    configure_fail2ban
+    configure_firewall
+    configure_service_discovery
+    configure_job_scheduling
+    configure_package_cache
+    configure_snapshots
+    configure_user_lingering
+}
+
+configure_randomness() {
     # Randomness
     display "subtitle" "Randomness"
     pacman_install rng-tools
-    action="enabling rngd service" && display "task" "$action"
-    systemctl enable rngd >> "$logfile" 2>&1 || error_warn "$action" "$?"
-    action="starting rngd service" && display "task" "$action"
-    systemctl start rngd >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    enable_service rngd
+    run_task "starting rngd service" systemctl start rngd
+}
+
+configure_networking() {
     # Networking
     display "subtitle" "Networking"
     pacman_install networkmanager
-    action="enabling NetworkManager" && display "task" "$action"
-    systemctl enable NetworkManager.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling NetworkManager" systemctl enable NetworkManager.service
     action="configuring MAC address randomization" && display "task" "$action"
     mkdir -p /etc/NetworkManager/conf.d
@@ -1219,6 +1371,7 @@ EOF
     current_lang="${LANG:-en_US.UTF-8}"
     wireless_region="${current_lang:3:2}" # extract country code (positions 3-4)
     action="configuring wireless regulatory domain ($wireless_region)" && display "task" "$action"
+    backup_system_file /etc/conf.d/wireless-regdom
     sed -i "s|^#WIRELESS_REGDOM=\"${wireless_region}\"|WIRELESS_REGDOM=\"${wireless_region}\"|" /etc/conf.d/wireless-regdom
     # Encrypted DNS (DNS over TLS)
@@ -1245,28 +1398,40 @@ EOF
     # Note: If Docker containers have DNS issues, systemd-resolved's stub resolver
     # (127.0.0.53) may be the cause. Fix: configure Docker to use direct DNS, or
     # disable systemd-resolved and use /etc/resolv.conf directly. (2026-01-18)
-    action="enabling systemd-resolved" && display "task" "$action"
-    systemctl enable systemd-resolved >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling systemd-resolved" systemctl enable systemd-resolved
     # Create resolv.conf symlink to systemd-resolved
-    action="linking resolv.conf to systemd-resolved" && display "task" "$action"
-    ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "linking resolv.conf to systemd-resolved" ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
+}
+
+configure_power() {
     # Power
     display "subtitle" "Power"
     pacman_install upower
-    action="enabling upower service" && display "task" "$action"
-    systemctl enable upower >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    enable_service upower
+}
+
+configure_ssh_server() {
     # Secure Shell
     display "subtitle" "Secure Shell"
     pacman_install openssh
-    action="enabling the openssh service to run at boot" && display "task" "$action"
-    systemctl enable sshd >> "$logfile" 2>&1 || error_warn "$action" "$?"
-    action="starting the openssh service" && display "task" "$action"
-    systemctl start sshd >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling the openssh service to run at boot" systemctl enable sshd
+    run_task "starting the openssh service" systemctl start sshd
+
+    action="hardening sshd (root login by key only)" && display "task" "$action"
+    cat << 'EOF' > /etc/ssh/sshd_config.d/10-hardening.conf
+# Root may log in by key only, never by password. PasswordAuthentication is
+# left at the default so a normal user can still bootstrap a key via ssh-copy-id.
+PermitRootLogin prohibit-password
+EOF
+    systemctl reload sshd >> "$logfile" 2>&1 || error_warn "$action" "$?"
+}
+
+configure_fail2ban() {
     # SSH Brute Force Protection
@@ -1291,16 +1456,16 @@ maxretry = 3
 bantime = 1h
 EOF
-    action="enabling fail2ban service" && display "task" "$action"
-    systemctl enable fail2ban >> "$logfile" 2>&1 || error_warn "$action" "$?"
-    action="starting fail2ban service" && display "task" "$action"
-    systemctl start fail2ban >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    enable_service fail2ban
+    run_task "starting fail2ban service" systemctl start fail2ban
+}
+
+configure_firewall() {
     display "subtitle" "Firewall"
     pacman_install ufw
-    action="configuring ufw to deny by default" && display "task" "$action"
-    ufw default deny incoming >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "configuring ufw to deny by default" ufw default deny incoming
     # Firewall rules - only open ports for services we actually run
     for protocol in \
@@ -1328,11 +1493,9 @@ EOF
     action="rate-limiting SSH to protect from brute force attacks" && display "task" "$action"
     (ufw limit 22/tcp >> "$logfile" 2>&1) || error_warn "$action" "$?"
-    action="enabling firewall" && display "task" "$action"
-    ufw --force enable >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling firewall" ufw --force enable
-    action="enabling firewall service to launch on boot" && display "task" "$action"
-    systemctl enable ufw.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling firewall service to launch on boot" systemctl enable ufw.service
     # Verify firewall is actually active
     # Note: In VM environments, UFW may show inactive due to missing kernel
@@ -1343,6 +1506,9 @@ EOF
         error_messages=("FIREWALL NOT ACTIVE - run: sudo ufw enable" "${error_messages[@]}")
         error_warn "$action" "1"
     fi
+}
+
+configure_service_discovery() {
     # Service Discovery
@@ -1354,27 +1520,26 @@ EOF
         display "task" "skipping avahi (already running)"
     else
         pacman_install avahi # service discovery on a local network using mdns
-        action="enabling avahi for mDNS discovery" && display "task" "$action"
-        systemctl enable avahi-daemon.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        run_task "enabling avahi for mDNS discovery" systemctl enable avahi-daemon.service
     fi
     pacman_install wsdd
-    action="enabling wsdd for Windows network discovery" && display "task" "$action"
-    systemctl enable wsdd.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling wsdd for Windows network discovery" systemctl enable wsdd.service
     pacman_install geoclue # geolocation service for location-aware apps
-    action="enabling geoclue geolocation service" && display "task" "$action"
-    systemctl enable geoclue.service >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling geoclue geolocation service" systemctl enable geoclue.service
     # Enable BeaconDB as geoclue wifi location provider (default MLS/Ichnaea API is defunct)
     action="configuring geoclue to use BeaconDB location service" && display "task" "$action"
     if grep -q '^#url=https://api.beacondb.net/v1/geolocate' /etc/geoclue/geoclue.conf 2>/dev/null; then
+        backup_system_file /etc/geoclue/geoclue.conf
         sed -i 's|^#url=https://api.beacondb.net/v1/geolocate|url=https://api.beacondb.net/v1/geolocate|' /etc/geoclue/geoclue.conf
     fi
     # Whitelist gammastep in geoclue config (geoclue demo agent is started via hyprland.conf exec-once)
     action="whitelisting gammastep in geoclue" && display "task" "$action"
     if ! grep -q "^\[gammastep\]" /etc/geoclue/geoclue.conf 2>/dev/null; then
+        backup_system_file /etc/geoclue/geoclue.conf
         cat >> /etc/geoclue/geoclue.conf << 'EOF'
 [gammastep]
@@ -1392,36 +1557,52 @@ EOF
 After=systemd-sysusers.service
 EOF
+}
+
+configure_job_scheduling() {
     # Job Scheduling
     display "subtitle" "Job Scheduling"
     pacman_install cronie
-    action="enabling cronie to launch at boot" && display "task" "$action"
-    systemctl enable cronie >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling cronie to launch at boot" systemctl enable cronie
     pacman_install at
-    action="enabling the batch delayed command scheduler" && display "task" "$action"
-    systemctl enable atd >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling the batch delayed command scheduler" systemctl enable atd
     action="installing log cleanup cron job" && display "task" "$action"
     (sudo -u "$username" crontab -l 2>/dev/null; \
     echo "0 12 * * * \$HOME/.local/bin/cron/log-cleanup") \
     | sudo -u "$username" crontab - \
     >> "$logfile" 2>&1 || error_warn "$action" "$?"
+}
+
+configure_package_cache() {
     # Package Repository Cache Maintenance
     display "subtitle" "Package Repository Cache Maintenance"
     pacman_install pacman-contrib
-    action="enabling the package cache cleanup timer" && display "task" "$action"
-    systemctl enable --now paccache.timer >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling the package cache cleanup timer" systemctl enable --now paccache.timer
     action="configuring paccache to keep 3 versions" && display "task" "$action"
+    backup_system_file /etc/conf.d/pacman-contrib
     sed -i 's/^PACCACHE_ARGS=.*/PACCACHE_ARGS=-k3/' /etc/conf.d/pacman-contrib
-    # Snapshot Service - filesystem-aware
+}
+
+configure_snapshots() {
+    display "subtitle" "Snapshot Service"
     if is_zfs_root; then
+        configure_zfs_snapshots
+    elif is_btrfs_root; then
+        configure_btrfs_snapshots
+    else
+        display "task" "ext4/other filesystem detected"
+    fi
+}
+
+configure_zfs_snapshots() {
         # ZFS: Install sanoid for snapshot management
         display "task" "ZFS detected - installing sanoid"
         aur_install sanoid
@@ -1525,8 +1706,7 @@ Persistent=true
 WantedBy=timers.target
 EOF
-        action="enabling sanoid timer" && display "task" "$action"
-        systemctl enable sanoid.timer >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        run_task "enabling sanoid timer" systemctl enable sanoid.timer
         action="enabling weekly ZFS scrub" && display "task" "$action"
         # Get pool name dynamically (usually zroot)
@@ -1538,7 +1718,9 @@ EOF
         # systemctl enable --now zfs-replicate.timer
         display "task" "zfs-replicate timer created (enable after SSH key setup to TrueNAS)"
-    elif is_btrfs_root; then
+}
+
+configure_btrfs_snapshots() {
         # Btrfs: Install snapper for snapshot management
         display "task" "btrfs detected - installing snapper and grub-btrfs"
         pacman_install snapper
@@ -1580,16 +1762,13 @@ EOF
         snapper -c root set-config "TIMELINE_LIMIT_MONTHLY=1" >> "$logfile" 2>&1
         snapper -c root set-config "TIMELINE_LIMIT_YEARLY=0" >> "$logfile" 2>&1
-        action="enabling snapper timeline timer" && display "task" "$action"
-        systemctl enable snapper-timeline.timer >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        run_task "enabling snapper timeline timer" systemctl enable snapper-timeline.timer
         systemctl enable snapper-cleanup.timer >> "$logfile" 2>&1 || error_warn "$action" "$?"
-        action="enabling grub-btrfsd for boot menu snapshots" && display "task" "$action"
-        systemctl enable grub-btrfsd >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        run_task "enabling grub-btrfsd for boot menu snapshots" systemctl enable grub-btrfsd
         # Allow user to use snapper without root (required for snapper-gui)
-        action="allowing wheel group to use snapper" && display "task" "$action"
-        snapper -c root set-config "ALLOW_GROUPS=wheel" >> "$logfile" 2>&1 || error_warn "$action" "$?"
+        run_task "allowing wheel group to use snapper" snapper -c root set-config "ALLOW_GROUPS=wheel"
         snapper -c root set-config "SYNC_ACL=yes" >> "$logfile" 2>&1 || error_warn "$action" "$?"
         # Set ACL on .snapshots directory for wheel group access
         setfacl -m g:wheel:rx /.snapshots >> "$logfile" 2>&1 || error_warn "$action" "$?"
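Reviewer note on the `run_task` conversions in the hunks above: they all rely on `"$?"`, expanded on the right-hand side of `||`, still holding the failed command's exit status. The sketch below reproduces that behaviour in isolation. The `run_task` body is copied from this diff; `display` and `error_warn` are simplified stand-ins written for the demo (the real archsetup helpers are not part of this diff), and the temporary log file and the `warnings` counter are likewise assumptions.

```shell
#!/bin/bash
# Demo of the describe-run-warn pattern. display/error_warn are hypothetical
# stand-ins, not archsetup's real implementations.
logfile="$(mktemp)"
warnings=0

display() {        # stand-in: print "[kind] message"
    printf '[%s] %s\n' "$1" "$2"
}

error_warn() {     # stand-in: count and report a non-fatal failure
    warnings=$((warnings + 1))
    printf 'WARN: %s (exit %s)\n' "$1" "$2" >&2
}

run_task() {
    local desc="$1"
    shift
    display "task" "$desc"
    # "$?" is expanded only when the left side fails, so it carries the
    # command's own exit status into error_warn.
    "$@" >> "$logfile" 2>&1 || error_warn "$desc" "$?"
}

run_task "listing the root directory" ls /
run_task "running a command that fails" sh -c 'echo boom; exit 3'
echo "warnings recorded: $warnings"
```

One thing that may be worth checking at converted call sites: `run_task` keeps its description in a local `desc` and never assigns `$action`, so a neighbouring line that still ends in `|| error_warn "$action" "$?"` (the `snapper-cleanup.timer` line above appears to be one) would report whatever description `$action` last held.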
@@ -1597,9 +1776,9 @@ EOF
         # Install snapper GUI (AUR)
         aur_install snapper-gui-git
-    else
-        display "task" "ext4/other filesystem detected"
-    fi
+}
+
+configure_user_lingering() {
     # User Services Lingering
     # Keeps user-level systemd services (e.g., protonmail-bridge) running without
@@ -1607,8 +1786,7 @@ EOF
     # user-level IMAP/SMTP daemons over SSH or from remote agents.
     display "subtitle" "User Services"
-    action="enabling user-services lingering for $username" && display "task" "$action"
-    loginctl enable-linger "$username" >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "enabling user-services lingering for $username" loginctl enable-linger "$username"
 }
 ### Xorg Display Manager
@@ -1633,8 +1811,7 @@ Section "ServerFlags"
     Option "DontZap" "True"
 EndSection
 EOF
-    action="configuring xorg server" && display "task" "$action"
-    chmod 644 /etc/X11/xorg.conf.d/00-no-vt-or-zap.conf >> "$logfile" 2>&1 || error_warn "$action" "$?"
+    run_task "configuring xorg server" chmod 644 /etc/X11/xorg.conf.d/00-no-vt-or-zap.conf
     # Install GPU-specific drivers
     install_gpu_drivers
@@ -1714,7 +1891,10 @@ hyprland() {
     pacman_install wev # wayland event debugger (xev equivalent)
-    # Logitech BRIO webcam auto-configuration
+    # Device-specific udev rules (opt-out via INSTALL_DEVICE_UDEV_RULES=no).
+    # Currently the Logitech BRIO webcam auto-configuration; tied to hardware not
+    # everyone has, so it's gated for the open-source release.
+ if [ "$install_device_udev_rules" = "true" ]; then action="creating Logitech BRIO udev rule" && display "task" "$action" cat > /etc/udev/rules.d/99-logitech-brio.rules << 'UDEVEOF' # Apply camera settings when Logitech BRIO is connected @@ -1722,6 +1902,49 @@ ACTION=="add", SUBSYSTEM=="video4linux", ATTRS{idVendor}=="046d", ATTRS{idProduc UDEVEOF sed -i "s/ARCHSETUP_USERNAME/${username}/" /etc/udev/rules.d/99-logitech-brio.rules chmod 644 /etc/udev/rules.d/99-logitech-brio.rules + fi + + # Live-update guard: a pacman PreTransaction hook that aborts an upgrade of + # GPU/compositor runtime libraries while a Hyprland session is running, so + # the live compositor doesn't SIGABRT when a library is swapped underneath + # it (hit ratio 2026-06-07: live mesa + hyprland upgrade crashed Hyprland and + # its clients). Re-run the upgrade from a TTY with Hyprland stopped and the + # guard stays quiet. + action="Live-Update Guard" && display "subtitle" "$action" + run_task "installing the live GPU/compositor update guard" \ + cp "$user_archsetup_dir/scripts/hypr-live-update-guard" /usr/local/bin/hypr-live-update-guard + chmod 755 /usr/local/bin/hypr-live-update-guard + + action="installing the live-update guard pacman hook" && display "task" "$action" + mkdir -p /etc/pacman.d/hooks + cat > /etc/pacman.d/hooks/hypr-live-update-guard.hook << 'HOOKEOF' +[Trigger] +Operation = Upgrade +Type = Package +Target = mesa +Target = mesa-* +Target = wayland +Target = libdrm +Target = libglvnd +Target = hyprland +Target = aquamarine +Target = hyprutils +Target = hyprgraphics +Target = vulkan-radeon +Target = vulkan-intel +Target = vulkan-mesa-layers +Target = nvidia-utils +Target = lib32-nvidia-utils +Target = xorg-xwayland + +[Action] +Description = Checking for a live Hyprland session before swapping GPU/compositor libs... 
+When = PreTransaction +Exec = /usr/local/bin/hypr-live-update-guard +AbortOnFail +NeedsTargets +HOOKEOF + chmod 644 /etc/pacman.d/hooks/hypr-live-update-guard.hook } ### Display Server (conditional) @@ -1879,8 +2102,7 @@ desktop_environment() { pacman_install "$software" done pacman_install solaar # Logitech device manager - action="enabling bluetooth to launch at boot" && display "task" "$action" - systemctl enable bluetooth.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling bluetooth to launch at boot" systemctl enable bluetooth.service # Command Line Utilities @@ -1996,8 +2218,7 @@ gaming() { pacman_install steam # Enable gamemode service for user - action="enabling gamemode for user" && display "task" "$action" - sudo -u "$username" systemctl --user enable gamemoded.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling gamemode for user" sudo -u "$username" systemctl --user enable gamemoded.service } ### Zig Toolchain Pin @@ -2103,6 +2324,14 @@ developer_workstation() { action="Developer Workstation" && display "title" "$action" + install_programming_languages + install_editors + install_android_utilities + install_vpn_tools + install_devops_utilities +} + +install_programming_languages() { action="Programming Languages and Utilities" && display "subtitle" "$action" # Rust (via rustup — must precede AUR packages that compile with rust) pacman_install rustup # Rust toolchain manager @@ -2146,7 +2375,7 @@ developer_workstation() { # AI coding assistant (native install to ~/.local/bin), opt-out via # INSTALL_CLAUDE_CODE=no / --no-claude-code. Gated because it's curl|sh from # a third party and not every user wants AI tooling. 
- if [ "$install_claude_code" = true ]; then + if [ "$install_claude_code" = "true" ]; then action="installing claude-code via native installer" && display "task" "$action" (sudo -u "$username" bash -c 'curl -fsSL https://claude.ai/install.sh | sh' >> "$logfile" 2>&1) || \ error_warn "$action" "$?" @@ -2165,7 +2394,14 @@ developer_workstation() { pacman_install meld # Visual diff pacman_install ripgrep # Fast grep utility pacman_install zoxide # Smart cd command that learns your habits + pacman_install bat # cat with syntax highlighting + git markers + pacman_install dust # intuitive du: proportional disk-usage tree + pacman_install hyperfine # statistical command-line benchmarking + pacman_install doggo # modern dig: readable DNS client, DoH/DoT/DoQ +} + +install_editors() { action="Programming Editors" && display "subtitle" "$action" pacman_install mg # mini emacs @@ -2224,19 +2460,27 @@ developer_workstation() { >> "$logfile" 2>&1 || error_warn "$action" "$?" fi +} + +install_android_utilities() { action="Android Utilities" && display "subtitle" "$action" pacman_install android-file-transfer pacman_install android-tools +} + +install_vpn_tools() { action="VPN Tools" && display "subtitle" "$action" pacman_install wireguard-tools # VPN - add configs to /etc/wireguard/ pacman_install systemd-resolvconf # resolvconf for wg-quick DNS integration pacman_install proton-vpn-gtk-app # Proton VPN GUI client with system tray pacman_install tailscale # mesh VPN - run 'tailscale up' to authenticate - action="enabling tailscale service" && display "task" "$action" - systemctl enable tailscaled >> "$logfile" 2>&1 || error_warn "$action" "$?" 
+ run_task "enabling tailscale service" systemctl enable tailscaled +} + +install_devops_utilities() { action="DevOps Utilities" && display "subtitle" "$action" action="installing devops virtualization and automation tools" && display "task" "$action" @@ -2264,8 +2508,7 @@ developer_workstation() { } EOF fi - action="enabling docker service to launch on boot" && display "task" "$action" - systemctl enable docker.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling docker service to launch on boot" systemctl enable docker.service # podman (rootless containers for winvm) pacman_install podman @@ -2398,9 +2641,12 @@ supplemental_software() { aur_install warpinator # secure file transfers aur_install zsh-fast-syntax-highlighting-git # Optimized and extended zsh-syntax-highlighting - # working around an temp integ issue with python-lyricsgenius expiration date - action="prep to workaround tidal-dl issue" && display "task" "$action" - yay -S --noconfirm --mflags --skipinteg python-lyricsgenius >> "$logfile" 2>&1 || error_warn "$action" "$?" + # python-lyricsgenius needs --skipinteg: its AUR PKGBUILD pins a b2sum for a + # LICENSE.txt pulled from the project's github master (a moving target), so + # makepkg's integrity check fails on that file even though the package tarball + # itself verifies. Rechecked 2026-06-24 — the original expired-PGP-signature + # cause is gone, but this LICENSE-drift keeps the workaround necessary. 
+ run_task "installing python-lyricsgenius (integrity workaround)" yay -S --noconfirm --mflags --skipinteg python-lyricsgenius aur_install tidal-dl # tidal-dl:tidal as yt-dlp:youtube aur_install tidaler # tidal downloader (tidal-dl-ng fork) aur_install freetube # privacy-focused YouTube desktop client @@ -2410,6 +2656,16 @@ supplemental_software() { boot_ux() { action="Boot UX" && display "title" "$action" + tighten_efi_permissions + add_nvme_early_module + configure_initramfs_hook + configure_encrypted_autologin + configure_tlp_power + trim_firmware + configure_grub +} + +tighten_efi_permissions() { # Tighten /efi mount permissions so kernel images, initramfs, and # bootloader config aren't world-readable. archinstall's defaults leave # them at 0755; fmask/dmask below makes files 0600 and dirs 0700. @@ -2417,14 +2673,19 @@ boot_ux() { if grep -qE "^[^#].*[[:space:]]/efi[[:space:]]+vfat[[:space:]]" /etc/fstab \ && ! grep -E "^[^#].*[[:space:]]/efi[[:space:]]+vfat[[:space:]]" /etc/fstab | grep -q "fmask="; then action="tightening /efi mount permissions in fstab" && display "task" "$action" + backup_system_file /etc/fstab sed -i -E '/^[^#].*[[:space:]]\/efi[[:space:]]+vfat[[:space:]]/ s/([[:space:]]+vfat[[:space:]]+)([^[:space:]]+)/\1\2,fmask=0177,dmask=0077/' /etc/fstab \ || error_warn "$action" "$?" fi +} + +add_nvme_early_module() { # Add nvme module for early loading on NVMe systems # Ensures NVMe devices are available when ZFS/other hooks try to access them if has_nvme_drives; then action="adding nvme to mkinitcpio MODULES for early loading" && display "task" "$action" + backup_system_file /etc/mkinitcpio.conf if grep -q "^MODULES=()" /etc/mkinitcpio.conf; then sed -i 's/^MODULES=()/MODULES=(nvme)/' /etc/mkinitcpio.conf elif grep -q "^MODULES=(" /etc/mkinitcpio.conf && ! grep -q "nvme" /etc/mkinitcpio.conf; then @@ -2440,16 +2701,21 @@ boot_ux() { error_warn "$action" "$?" 
action="configuring console font" && display "task" "$action" + backup_system_file /etc/vconsole.conf if grep -q "^FONT=" /etc/vconsole.conf 2>/dev/null; then sed -i 's/^FONT=.*/FONT=ter-132n/' /etc/vconsole.conf else echo "FONT=ter-132n" >> /etc/vconsole.conf fi +} + +configure_initramfs_hook() { # Only switch to systemd hook for non-ZFS systems # ZFS initramfs hook is busybox-based and incompatible with systemd hook if ! is_zfs_root; then action="delegating fsck messages from udev to systemd" && display "task" "$action" + backup_system_file /etc/mkinitcpio.conf sed -i '/^HOOKS=/ s/\budev\b/systemd/' /etc/mkinitcpio.conf || error_warn "$action" "$?" mkinitcpio -P >> "$logfile" 2>&1 || error_warn "running mkinitcpio -P to silence fsck messages" "$?" fi @@ -2469,6 +2735,9 @@ StandardOutput=null StandardError=journal+console EOF +} + +configure_encrypted_autologin() { # Automatic login for encrypted systems (prompts if no CLI flag and root is encrypted) configure_autologin @@ -2491,6 +2760,9 @@ HandleLidSwitchExternalPower=ignore HandleLidSwitchDocked=ignore EOF +} + +configure_tlp_power() { # TLP power management — laptops only (battery present). Manages wifi, # USB, PCIe, and CPU power policy on AC/battery transitions. systemd-rfkill # is masked per TLP's docs (it fights TLP's radio-state handling). @@ -2509,12 +2781,14 @@ PLATFORM_PROFILE_ON_BAT=low-power # Off by default — uncomment (and match the BAT name) to enable. #STOP_CHARGE_THRESH_BAT1=80 EOF - action="enabling TLP service" && display "task" "$action" - systemctl enable tlp.service >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "enabling TLP service" systemctl enable tlp.service systemctl mask systemd-rfkill.service systemd-rfkill.socket >> "$logfile" 2>&1 || \ error_warn "masking systemd-rfkill for TLP" "$?" 
fi +} + +trim_firmware() { # Firmware trim — Framework 13 Intel only (matched by DMI), where the # hardware set is known: i915 graphics (linux-firmware-intel), ath9k wifi # (linux-firmware-atheros, firmware-free driver but kept for safety), and @@ -2532,10 +2806,12 @@ EOF linux-firmware-mellanox linux-firmware-nfp linux-firmware-nvidia \ linux-firmware-other linux-firmware-qlogic linux-firmware-radeon \ >> "$logfile" 2>&1 || error_warn "$action" "$?" - action="rebuilding initramfs after firmware trim" && display "task" "$action" - mkinitcpio -P >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "rebuilding initramfs after firmware trim" mkinitcpio -P fi +} + +configure_grub() { # GRUB: reset timeouts, adjust log levels, larger menu for HiDPI screens, and show splashscreen # Note: nvme.noacpi=1 disables NVMe ACPI power management to prevent freezes on some drives. # Safe to keep on newer drives (minor power cost), remove if battery life is critical. @@ -2553,8 +2829,7 @@ EOF # Regenerate GRUB config after all modifications if [ -f /etc/default/grub ]; then - action="generating grub configuration" && display "task" "$action" - grub-mkconfig -o /boot/grub/grub.cfg >> "$logfile" 2>&1 || error_warn "$action" "$?" + run_task "generating grub configuration" grub-mkconfig -o /boot/grub/grub.cfg fi } diff --git a/archsetup.conf.example b/archsetup.conf.example index 962d05f..82f6898 100644 --- a/archsetup.conf.example +++ b/archsetup.conf.example @@ -39,6 +39,11 @@ # Set to "no" to skip it (no AI tooling, avoids the curl-pipe-bash step). #INSTALL_CLAUDE_CODE=yes +# Install device-specific udev rules (default: yes) +# Currently the Logitech BRIO webcam auto-configuration rule. These target +# specific hardware; set to "no" if your machine doesn't have those devices. 
+#INSTALL_DEVICE_UDEV_RULES=yes + # System locale (default: prompt if not already configured) # Common options: en_US.UTF-8, en_GB.UTF-8, de_DE.UTF-8, es_ES.UTF-8, # fr_FR.UTF-8, pt_BR.UTF-8, ja_JP.UTF-8, zh_CN.UTF-8 diff --git a/working/collapsible-waybar-sides/spike-findings.org b/assets/2026-06-18-collapsible-waybar-sides-spike-findings.org index 4d45ed1..4d45ed1 100644 --- a/working/collapsible-waybar-sides/spike-findings.org +++ b/assets/2026-06-18-collapsible-waybar-sides-spike-findings.org diff --git a/working/collapsible-waybar-sides/collapsible-waybar-sides-spec.org b/assets/2026-06-19-collapsible-waybar-sides-spec.org index b9ddc0d..b9ddc0d 100644 --- a/working/collapsible-waybar-sides/collapsible-waybar-sides-spec.org +++ b/assets/2026-06-19-collapsible-waybar-sides-spec.org diff --git a/assets/color-themes/generate-palette.sh b/assets/color-themes/generate-palette.sh index 456d1a4..5a11264 100755 --- a/assets/color-themes/generate-palette.sh +++ b/assets/color-themes/generate-palette.sh @@ -1,4 +1,5 @@ #!/bin/sh +# SPDX-License-Identifier: GPL-3.0-or-later # Generate dupre-palette.png from color definitions using ImageMagick. 
# Output: assets/color-themes/dupre/dupre-palette.png diff --git a/assets/easyeffects-eq-presets.sh b/assets/easyeffects-eq-presets.sh index 40e9cd9..9a2ecef 100755 --- a/assets/easyeffects-eq-presets.sh +++ b/assets/easyeffects-eq-presets.sh @@ -1,4 +1,5 @@ #!/usr/bin/env bash +# SPDX-License-Identifier: GPL-3.0-or-later # Install EasyEffects parametric EQ presets (Harman target) # # Presets: diff --git a/assets/outbox/2026-06-24-2314-from-.emacs.d-delivered-side-pointed-dirvish-bg-cj.org b/assets/outbox/2026-06-24-2314-from-.emacs.d-delivered-side-pointed-dirvish-bg-cj.org new file mode 100644 index 0000000..32d8940 --- /dev/null +++ b/assets/outbox/2026-06-24-2314-from-.emacs.d-delivered-side-pointed-dirvish-bg-cj.org @@ -0,0 +1,5 @@ +#+TITLE: Delivered side: pointed dirvish 'bg' (cj/set-wallpaper in mo +#+SOURCE: from .emacs.d +#+DATE: 2026-06-24 23:14:07 -0400 + +Delivered side: pointed dirvish 'bg' (cj/set-wallpaper in modules/dirvish-config.el) at the set-wallpaper script for the Wayland branch, replacing the dead swww call. Test updated + green, live-reloaded into the daemon, set-wallpaper confirmed on PATH (your dotfiles 8be2484 symlink). The wallpaper dependency is closed — you can drop the :blocker:. The separate 'dirvish doesn't preview images' item stays open on my side. 
diff --git a/assets/outbox/2026-06-24-lint-followups-resolved.org b/assets/outbox/2026-06-24-lint-followups-resolved.org new file mode 100644 index 0000000..5f3b06f --- /dev/null +++ b/assets/outbox/2026-06-24-lint-followups-resolved.org @@ -0,0 +1,6 @@ +* 2026-06-24 Wed — Task-review health: 34 top-level [#A]/[#B]/[#C] tasks unreviewed for >30 days (daily review may have slipped) + +* lint-org follow-ups — todo.org (2026-06-24) +** TODO obsolete-properties-drawer — Incorrect contents for PROPERTIES drawer (line 138) + +* 2026-06-24 Wed — Task-review health: 27 top-level [#A]/[#B]/[#C] tasks unreviewed for >30 days (daily review may have slipped) diff --git a/assets/outbox/2026-06-25-1248-from-archangel-accepted-the-stale-baked-archzfs-db-zfs.org b/assets/outbox/2026-06-25-1248-from-archangel-accepted-the-stale-baked-archzfs-db-zfs.org new file mode 100644 index 0000000..1e0ebf4 --- /dev/null +++ b/assets/outbox/2026-06-25-1248-from-archangel-accepted-the-stale-baked-archzfs-db-zfs.org @@ -0,0 +1,5 @@ +#+TITLE: Accepted: the stale-baked-archzfs-db ZFS install bug. Diagno +#+SOURCE: from archangel +#+DATE: 2026-06-25 12:48:41 -0400 + +Accepted: the stale-baked-archzfs-db ZFS install bug. Diagnosis confirmed against source — install_base (installer/archangel:759) appends [archzfs] at ~771 and runs pacstrap -K at 805 with no db refresh between, so pacstrap resolves zfs-dkms from the baked ISO db (2.3.3) instead of the current channel (2.4.2). Filed as archangel [#A] :bug:blocker: 'Refresh archzfs db before pacstrap in install_base', tagged :blocker: so it surfaces as priority work. Fix direction: pacman -Sy after the repo appends and before pacstrap -K. A fresh ISO rebuild still needed for aged ISOs, but the refresh is the durable fix. Will notify you when a fixed ISO is available so you can retry make test FS_PROFILE=zfs. 
diff --git a/assets/outbox/2026-06-25-1347-from-archangel-delivered-you-re-unblocked-the-stale.org b/assets/outbox/2026-06-25-1347-from-archangel-delivered-you-re-unblocked-the-stale.org new file mode 100644 index 0000000..296e37c --- /dev/null +++ b/assets/outbox/2026-06-25-1347-from-archangel-delivered-you-re-unblocked-the-stale.org @@ -0,0 +1,5 @@ +#+TITLE: Delivered — you're unblocked. The stale-baked-archzfs-db ZFS +#+SOURCE: from archangel +#+DATE: 2026-06-25 13:47:29 -0400 + +Delivered — you're unblocked. The stale-baked-archzfs-db ZFS install bug is fixed and verified. installer/archangel now runs pacman -Syy before pacstrap (commit 89691a0, pushed to main). I rebuilt the ISO and ran the single-disk ZFS install in QEMU (scripts/test-install.sh single-disk = FILESYSTEM=zfs, NO_ENCRYPT=yes, the same scenario you reproduced): pacstrap resolved zfs-dkms-2.4.2-1 / zfs-utils-2.4.2-2 (not the stale 2.3.3), DKMS built zfs/2.4.2 against kernel 6.18.36-1-lts, the system rebooted, and the ZFS pool came up healthy. TEST PASSED. Fixed ISO: out/archangel-2026-06-25-vmlinuz-6.18.36-lts-x86_64.iso (kernel 6.18.36-lts). Retry make test FS_PROFILE=zfs against this ISO. Note: build.sh clears the host pacoloco archzfs cache on each build, so a fresh build is what guarantees current packages — if you ever see 2.3.3 again, rebuild rather than reusing an aged ISO. diff --git a/assets/outbox/2026-06-25-1359-from-archangel-got-the-heads-up-everything-it-flags-is.org b/assets/outbox/2026-06-25-1359-from-archangel-got-the-heads-up-everything-it-flags-is.org new file mode 100644 index 0000000..e437d7d --- /dev/null +++ b/assets/outbox/2026-06-25-1359-from-archangel-got-the-heads-up-everything-it-flags-is.org @@ -0,0 +1,5 @@ +#+TITLE: Got the heads-up — everything it flags is already handled. I +#+SOURCE: from archangel +#+DATE: 2026-06-25 13:59:17 -0400 + +Got the heads-up — everything it flags is already handled. 
I adopted the staged pacman -Syy edit as-is and committed it (89691a0, pushed to main), so no mystery edit remains. My own ISO rebuild ran clean (your work/ teardown left out/ intact; the new build created and tore down its own work/). Verified the ZFS single-disk install end-to-end before the unblock note I sent at 13:47 — that note supersedes this thread. You're clear to retry create-base-vm FS_PROFILE=zfs against out/archangel-2026-06-25-vmlinuz-6.18.36-lts-x86_64.iso. diff --git a/assets/outbox/2026-06-27-2148-from-archangel-accepted-both-archangel-bug-handoffs.org b/assets/outbox/2026-06-27-2148-from-archangel-accepted-both-archangel-bug-handoffs.org new file mode 100644 index 0000000..213017d --- /dev/null +++ b/assets/outbox/2026-06-27-2148-from-archangel-accepted-both-archangel-bug-handoffs.org @@ -0,0 +1,5 @@ +#+TITLE: Accepted both archangel bug handoffs (pacman.conf 0600, 2026 +#+SOURCE: from archangel +#+DATE: 2026-06-27 21:48:03 -0400 + +Accepted both archangel bug handoffs (pacman.conf 0600, 2026-06-25; baked AUR list breaks btrfs, 2026-06-27). Both diagnoses confirmed against archangel source. Filed as tasks; the baked-AUR-list one is tagged :blocker: since it blocks your btrfs base creation. Starting the fixes now — will ping you with a fixed ISO once both land and verify so you can rebuild the btrfs base and re-add btrfs to the green-test matrix. diff --git a/assets/outbox/2026-06-27-2301-from-archangel-delivered-both-archangel-bugs-fixed-and.org b/assets/outbox/2026-06-27-2301-from-archangel-delivered-both-archangel-bugs-fixed-and.org new file mode 100644 index 0000000..9384f88 --- /dev/null +++ b/assets/outbox/2026-06-27-2301-from-archangel-delivered-both-archangel-bugs-fixed-and.org @@ -0,0 +1,15 @@ +#+TITLE: Delivered — both archangel bugs fixed and verified, fixed IS +#+SOURCE: from archangel +#+DATE: 2026-06-27 23:01:40 -0400 + +Delivered — both archangel bugs fixed and verified, fixed ISO ready. 
+ +ISO: /home/cjennings/code/archangel/out/archangel-2026-06-27-vmlinuz-6.18.36-lts-x86_64.iso (2.1G, built 2026-06-27 22:36) + +Bug #2 (the blocker — baked AUR list broke btrfs): fixed. install_base now filters the baked AUR set by target filesystem, dropping zfs-only tooling (zfs-auto-snapshot, zrepl) on a non-zfs install. The ISO still bakes the full set; only the install selection is filtered. Verified end-to-end in a VM: a btrfs-single unattended install now completes cleanly (the 266-pkg pacstrap transaction no longer includes zfs-auto-snapshot, so the 'unable to satisfy dependency zfs' abort is gone). A zfs single-disk install still passes, so the zfs path is unaffected. + +Bug #1 (installed /etc/pacman.conf landing 0600): fixed. strip_repo_stanza now writes through the existing config instead of mv-ing a 0600 mktemp over it, so the installed pacman.conf keeps its 0644. Unit-tested for mode preservation; the btrfs/zfs installs both completed past the strip step. + +You're unblocked: rebuild the btrfs base from this ISO and re-add btrfs to the green-test matrix. Fix is committed on archangel main (2ead674) and pushed. + +Note: the companion EFI \EFI\BOOT\BOOTX64.EFI removable-fallback hardening you mentioned is filed on the archangel side but not done in this pass — separate, optional, not part of this unblock. 
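The Bug #1 fix described above turns on a file-mode detail. A temp file created mktemp-style is 0600, so renaming it over the config swaps in a 0600 file, while rewriting the existing file keeps its inode and its 0644 mode. The sketch below illustrates that difference in Python. It assumes nothing about archangel's real strip_repo_stanza, which is shell and is not part of this diff. The function names and the stanza-stripping logic are hypothetical.

```python
import os
import stat
import tempfile


def _without_stanza(text: str, stanza: str) -> str:
    """Drop the ini-style section named `stanza` (header plus body) from text."""
    kept, skipping = [], False
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Any section header ends a skip in progress; only the target starts one.
            skipping = stripped == f"[{stanza}]"
        if not skipping:
            kept.append(line)
    return "".join(kept)


def strip_by_rename(path: str, stanza: str) -> None:
    """Problem pattern: build a temp file, then rename it over the original."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    # mkstemp creates the file readable and writable by the owner only (0600).
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(_without_stanza(text, stanza))
    os.replace(tmp, path)  # path now carries the temp file's mode


def strip_write_through(path: str, stanza: str) -> None:
    """Fixed pattern: truncate and rewrite the existing file in place."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    # Opening an existing file with "w" truncates it; inode and mode are unchanged.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(_without_stanza(text, stanza))


def mode_of(path: str) -> int:
    """Return only the permission bits of path."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

The write-through variant is not atomic: a crash mid-write can leave a truncated file. An alternative that keeps the rename is to copy the original mode onto the temp file before renaming.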
diff --git a/assets/outbox/2026-06-28-lint-followups-task-review-health.org b/assets/outbox/2026-06-28-lint-followups-task-review-health.org new file mode 100644 index 0000000..28c2e38 --- /dev/null +++ b/assets/outbox/2026-06-28-lint-followups-task-review-health.org @@ -0,0 +1,2 @@ + +* 2026-06-28 Sun — Task-review health: 27 top-level [#A]/[#B]/[#C] tasks unreviewed for >30 days (daily review may have slipped) diff --git a/docs/design/2026-06-25-testinfra-validation.org b/docs/design/2026-06-25-testinfra-validation.org new file mode 100644 index 0000000..5c82aa2 --- /dev/null +++ b/docs/design/2026-06-25-testinfra-validation.org @@ -0,0 +1,238 @@ +#+TITLE: Design: Testinfra Post-Install Validation for archsetup +#+AUTHOR: Craig Jennings +#+DATE: 2026-06-25 +#+STATUS: Accepted (2026-06-25) + +* Problem + +The VM integration harness (=scripts/testing/run-test.sh=) runs archsetup in a +QEMU VM, then verifies the result two ways: + +1. Parses archsetup's own install log for its Error Summary and the + =ARCHSETUP_EXECUTION_COMPLETE= marker (did the script finish, did it log + errors). +2. Runs =run_all_validations= from =scripts/testing/lib/validation.sh= — a + hand-rolled, shell-based post-install assertion sweep of ~26 checks over SSH. + +The shell sweep works, but each check is 6-40 lines of =ssh_cmd= + +=validation_pass/fail= + =attribute_issue= boilerplate, the pass/fail counters +are hand-maintained globals, and the reporting is bespoke. Adding or reading a +check is heavier than it should be, and growing the suite (archsetup configures +far more than the 26 checks cover) compounds that weight. + +This doc proposes porting the post-install validation to Testinfra (Python + +pytest) for more expressive checks and better reporting, then growing coverage. + +* Decision + +Port the post-install validation layer to Testinfra + pytest, reaching parity +with the existing =validation.sh= sweep, then expand coverage. 
Recorded +rationale: the up-front port cost (parity rewrite + a test-only dependency) is +an accepted trade — the priority is a robust, well-reported, growing validation +suite over feature speed. The framework swap alone buys ergonomics and +reporting, not coverage, so it is paired with real new coverage (below). + +This replaces the shell sweep; it does not touch archsetup's own install-log +parsing (that stays as a separate signal). The full coverage expansion (P4) +lands in this task too, sequenced strictly after the parity cutover so the +parity verification stays clean. + +* Current harness (what exists today) + +** Flow (run-test.sh) +1. Revert VM to base snapshot, boot, wait for SSH. +2. =capture_pre_install_state=. +3. Bundle + copy archsetup + dotfiles into the VM, run archsetup in background, + poll to completion. +4. =capture_post_install_state=. +5. =run_all_validations= (the shell sweep). +6. =analyze_log_diff= + =generate_issue_report= (issue attribution). +7. Explicit pass/fail exit code; cleanup. + +** The shell sweep (validation.sh) +~26 checks under =run_all_validations=: user created / shell / groups, dotfiles, +yay, pacman working, window manager, firewall, DNS, avahi, fail2ban, +NetworkManager, emacs, git config, dev tools, zfs, boot config, autologin, +gnome-keyring, terminus font, mkinitcpio hooks, initramfs consolefont, nvme +module, archsetup log, state markers. + +** Issue attribution +=attribute_issue <msg> <bucket>= sorts each failure into one of three arrays — +=ARCHSETUP_ISSUES=, =BASE_INSTALL_ISSUES=, =UNKNOWN_ISSUES= — and +=generate_issue_report= writes them out (base-install issues route to the +archzfs inbox). This is domain logic Testinfra has no equivalent for; the port +must preserve it. + +** Connection +=ssh_cmd= uses =sshpass -p "$ROOT_PASSWORD" ssh ... -p "$SSH_PORT" root@$VM_IP=, +with =VM_IP=localhost=, =SSH_PORT=2222=, =ROOT_PASSWORD=archsetup=. 
+ +* Design + +** Where Testinfra fits +Replace the =run_all_validations= call (step 5) with a pytest invocation against +the running VM. Steps 1-4 and 6-7 are unchanged; =analyze_log_diff= stays. +Testinfra connects over the same SSH the harness already exposes. + +** Connection model +Testinfra's paramiko/ssh backend targets the live VM via its host spec: + +#+begin_src sh +pytest scripts/testing/tests/ \ + --hosts="ssh://root@localhost:2222" \ + --ssh-config=<generated> \ + --json-report --json-report-file="$TEST_RESULTS_DIR/testinfra.json" +#+end_src + +Password auth: generate a throwaway ssh-config (or reuse sshpass via a +=--ssh-identity= once archsetup drops the key, but at validation time we only +have the root password). Simplest: a tiny generated ssh config + sshpass +wrapper, or switch the test VM to a known test key injected pre-run. Open +question below. + +** Test layout +#+begin_example +scripts/testing/tests/ + conftest.py # host fixture, markers, attribution hook, report glue + test_users.py # user created / shell / groups + test_dotfiles.py # stow symlinks, readable by user + test_packages.py # yay, pacman working, dev tools, key packages + test_services.py # firewall, dns, avahi, fail2ban, networkmanager + test_boot.py # zfs, mkinitcpio hooks, nvme, consolefont, terminus + test_desktop.py # window manager, autologin, gnome-keyring + test_archsetup.py # install log, state markers + test_hardening.py # NEW: sshd drop-in, sysctl, /etc fstab perms, backups +#+end_example + +** Example tests (parity) +#+begin_src python +def test_ufw_enabled(host): + assert host.service("ufw").is_enabled + +def test_user_cjennings_exists(host): + u = host.user("cjennings") + assert u.exists + assert u.shell == "/usr/bin/zsh" + +def test_zshrc_stowed_and_readable(host): + f = host.file("/home/cjennings/.zshrc") + assert f.is_symlink + assert ".dotfiles/" in f.linked_to + assert f.exists # not broken + assert host.run("sudo -u cjennings test -r %s" % f.path).rc == 0 + 
+def test_mkinitcpio_systemd_hook(host): + # non-ZFS systems delegate fsck from udev to systemd + conf = host.file("/etc/mkinitcpio.conf").content_string + assert "systemd" in conf +#+end_src + +Compare =test_ufw_enabled= (1 line) to the current =validate_firewall= (8 lines +of ssh_cmd + branch + counters). + +** Preserving issue attribution +Map the three buckets to pytest markers and collect them in a =conftest.py= +hook: + +#+begin_src python +@pytest.mark.attribution("archsetup") # or "base_install" / "unknown" +def test_ufw_enabled(host): ... +#+end_src + +A =pytest_runtest_makereport= hook records each failure under its marker's +bucket and writes the same three-way report =generate_issue_report= produces +(base-install failures still route to the archzfs inbox). Default bucket = +archsetup when unmarked. + +** Tiered strategy +Markers =@pytest.mark.smoke= (user, key packages, dotfiles present) and +=@pytest.mark.integration= (services, configs, boot). =pytest -m smoke= for a +fast gate, full run otherwise. Drop the task's original X11/startx end-to-end +slice — the fleet is Wayland/Hyprland and headless GUI e2e is flaky and +expensive; a Wayland-session smoke check can be reconsidered later as its own +task. + +** Reporting +=pytest-json-report= (or junit-xml) → =$TEST_RESULTS_DIR/=, surfaced in the +test report alongside the install-log analysis. pytest's own per-test +pass/fail/skip output replaces the hand-maintained counters. + +* Coverage + +** Parity (port all current checks) +All ~26 =validation.sh= checks, grouped per the layout above. + +** Expansion (new — the coverage win) +archsetup configures much that isn't validated today. Candidates: +- sshd hardening drop-in (=/etc/ssh/sshd_config.d/10-hardening.conf=, + PermitRootLogin prohibit-password). +- =backup_system_file= behavior — assert =.archsetup.bak= exists for files + archsetup edited in place (fstab, mkinitcpio.conf, sudoers, …). 
+- pacman.conf (ParallelDownloads, Color, multilib) and makepkg.conf (MAKEFLAGS, + OPTIONS) settings actually applied. +- systemd-resolved DNS-over-TLS drop-in; NetworkManager wifi-privacy. +- fail2ban jail.local present; reflector config; sysctl printk; /etc/issue + emptied; vconsole font; fstab /efi fmask/dmask perms. +- sanoid / zfs-replicate units (ZFS hosts). + +* Dependencies + +Add =python-pytest=, =python-pytest-testinfra= (pulls paramiko), and a JSON +reporter to =make deps= (test host only — not installed by archsetup itself). +Note: the existing unit suites run under =python3 -m unittest=; the integration +layer runs under pytest. Two runners, both Python; =make test-unit= unchanged, +=make test= gains the pytest step. + +* Goss comparison (the task asked) + +- *Goss* — YAML-declarative health specs, a single Go binary executed *on the + target*. Fast, no Python. But the spec must be pushed into the VM and run + there, the assertions are less programmable, and it adds a Go binary to the + flow. +- *Testinfra* — Python, runs *on the host* over SSH (nothing installed in the + VM), assertions are full Python with rich built-in modules + (File/Package/Service/User/Command), integrates with pytest's tooling. + +Choose Testinfra: it runs from the host (the VM stays clean), it's far more +programmable for the conditional checks archsetup needs (DESKTOP_ENV branches, +ZFS-vs-not), and it aligns with the repo's existing Python test tooling. + +* Migration plan (phased, TDD where the helper logic is ours) + +- *P1 — Scaffold.* conftest.py (host fixture + connection), the attribution + marker + report hook, and 3 parity checks (firewall, user, dotfiles). Wire a + pytest step into run-test.sh behind a flag so the shell sweep still runs. +- *P2 — Full parity.* Port all ~26 checks; diff a real VM run's results against + the shell sweep to confirm no check was lost. 
+- *P3 — Cut over.* Make pytest the primary sweep in run-test.sh; keep + =analyze_log_diff= and the install-log signal. +- *P4 — Expand.* Add the new coverage (hardening, backups, applied settings). +- *P5 — Retire.* Remove =run_all_validations= from validation.sh (keep the + capture/analyze helpers that pytest doesn't replace). + +* Acceptance criteria + +- =make test= runs archsetup in a VM, then a pytest sweep over SSH, and a real + run reports parity with (or a superset of) the current shell checks. +- Failures still sort into archsetup / base-install / unknown, with base-install + issues routed to the archzfs inbox as today. +- =make deps= installs the test dependencies; the VM has nothing extra installed. +- A documented =pytest -m smoke= fast path exists. + +* Resolved decisions (2026-06-25) + +1. *Auth at validation time — inject a throwaway test key.* Pre-run, generate + an ephemeral keypair, push the pubkey into the VM's + =/root/.ssh/authorized_keys= over the existing sshpass channel, and point + Testinfra at the private key via a generated ssh-config. No password in the + pytest invocation; paramiko key auth just works; the keypair is discarded + after the run. (Chosen over wrapping sshpass around Testinfra, which is + awkward since Testinfra spawns its own ssh connections.) +2. *Cut over — run both through parity, then switch.* Keep the shell sweep + running alongside pytest through P2 so a real VM run can diff pytest's + results against the shell sweep and prove no check was dropped. pytest + becomes primary at P3; =run_all_validations= is deleted at P5 after the + expanded suite proves out. +3. *Expansion scope — full, in this task, after cutover.* All of P4 lands here, + sequenced strictly after the P3 parity cutover so the parity diff is clean + before new checks are added. 
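Resolved decision 1 above can be sketched as two small helpers: one creates the throwaway keypair, the other renders the ssh-config that Testinfra is pointed at. This is a hedged illustration, not the harness's implementation, which lives in shell and is not shown in this diff. The function names, the `archsetup-vm` host alias, and the choice to disable host-key checking for a disposable VM are all assumptions.

```python
import subprocess
from pathlib import Path


def generate_keypair(key_path: Path) -> Path:
    """Create an unencrypted ed25519 keypair; return the public key path."""
    subprocess.run(
        ["ssh-keygen", "-q", "-t", "ed25519", "-N", "", "-f", str(key_path)],
        check=True,
    )
    return key_path.with_name(key_path.name + ".pub")


def render_ssh_config(alias, hostname, port, identity_file, user="root") -> str:
    """Render a single-host ssh_config stanza that uses key auth only."""
    options = [
        ("HostName", hostname),
        ("Port", str(port)),
        ("User", user),
        ("IdentityFile", str(identity_file)),
        ("IdentitiesOnly", "yes"),
        # Assumption: a snapshot-reverted test VM is not worth pinning host keys for.
        ("StrictHostKeyChecking", "no"),
        ("UserKnownHostsFile", "/dev/null"),
    ]
    body = "\n".join(f"    {key} {value}" for key, value in options)
    return f"Host {alias}\n{body}\n"
```

The rendered text would be written to a file in the results directory and passed through the `--ssh-config` option the design already uses. The keypair is deleted once the run finishes.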
diff --git a/docs/design/2026-06-25-zfs-vm-test-coverage.org b/docs/design/2026-06-25-zfs-vm-test-coverage.org new file mode 100644 index 0000000..d9625e0 --- /dev/null +++ b/docs/design/2026-06-25-zfs-vm-test-coverage.org @@ -0,0 +1,139 @@ +#+TITLE: Design: ZFS VM Test Coverage + Bare-Metal Runner Migration +#+AUTHOR: Craig Jennings +#+DATE: 2026-06-25 +#+STATUS: Draft — for review + +* Problem + +Two gaps, one root: + +1. *The ZFS install path is untested in automation.* The VM harness + (=make test=) uses a single non-ZFS base image, so every ZFS-conditional + check skips (mkinitcpio udev hook on ZFS, sanoid, zfs-scrub timer, the whole + ZFS branch of archsetup). ZFS is exercised *only* by =run-test-baremetal.sh= + against real hardware. + +2. *=run-test-baremetal.sh= is latently broken by the sshd hardening.* It SSHes + to the target as root *by password* throughout the run, exactly the pattern + archsetup's =PermitRootLogin prohibit-password= (shipped 2026-06-24) kills + mid-install. The VM runner already hit and fixed this (=inject_root_key= + + key auth, commit f50fc1d); the bare-metal runner never got that fix, so it + almost certainly aborts mid-install now, the same way the VM runner did. + +The fix for both is the same shape: a ZFS base VM gives a safe, repeatable, +snapshot-rollback ZFS target (no sacrificial hardware), which both fills the +coverage gap *and* provides a target to migrate + validate the bare-metal +runner against. This also unblocks P5 (deleting the dead shell-sweep functions +from validation.sh), which is gated on the bare-metal runner leaving the shell +sweep. + +* Decision + +Build a ZFS base VM via archangel, add a filesystem-profile selector to the VM +harness so =make test= can target zfs or non-zfs, then migrate +=run-test-baremetal.sh= to key auth + the Testinfra sweep and validate it +against the ZFS VM. Finish by deleting the now-dead shell-sweep functions (P5). 
+ +Explicitly rejected: loosening =PermitRootLogin= (or adding a skip-hardening +test flag). That trades a real security feature for harness convenience and +would mean never validating the hardened config. Key auth is the correct fix, +already proven in the VM runner. + +* Current state (grounded) + +- =create-base-vm.sh= boots an =archangel-*.iso=, copies =archsetup-test.conf= + into the live env, runs =archangel --config-file /root/archsetup-test.conf= + (the base-OS install — partitioning/filesystem live here), powers off, and + snapshots =clean-install= onto =vm-images/archsetup-base.qcow2=. +- =run-test.sh= hardcodes that one image + snapshot, and copies + =scripts/testing/archsetup-vm.conf= (DESKTOP_ENV=hyprland, non-ZFS) into the + VM as the archsetup config. +- =run-test-baremetal.sh= takes =--host= / =--password=, SSHes as root by + password, rolls back ZFS =@genesis= snapshots, transfers + runs archsetup, + then calls =run_all_validations= / =validate_all_services= (overriding + =VM_IP= to the target). It is the only remaining caller of the shell sweep. +- Key auth machinery already exists and is reusable: =inject_root_key= and + =SSH_KEY_OPT= in =vm-utils.sh=, and =run_testinfra_validation= in + =testinfra.sh= (drives connection from a generated ssh-config keyed on + =VM_IP= / =SSH_PORT=). + +* Design + +** A. ZFS archangel base +Add a ZFS archangel config (a =archsetup-test-zfs.conf= or equivalent) that +installs a ZFS root. Confirm archangel supports a ZFS-root config (it's a +separate project — verify its config options first). Unencrypted ZFS for the +test VM (skip the passphrase prompt; encryption isn't what we're validating). + +** B. Per-profile base images + selector +- =create-base-vm.sh= takes a profile (e.g. =FS_PROFILE=zfs|ext4=, default + current/non-ZFS), picks the matching archangel config, and writes a + profile-named image: =vm-images/archsetup-base.qcow2= (default) vs + =vm-images/archsetup-base-zfs.qcow2=. 
Same =clean-install= snapshot name. +- =run-test.sh= + Makefile take the same =FS_PROFILE= and select the image (via + =init_vm_paths=). The archsetup run config (=archsetup-vm.conf=) is *shared* — + archsetup auto-detects ZFS from the live root, so no per-profile run config is + needed. =make test FS_PROFILE=zfs=. + +** C. Bare-metal runner migration +Mirror the VM runner's fix in =run-test-baremetal.sh=: +- After the first successful SSH to =TARGET_HOST=, call =inject_root_key= (it + authorizes a key over the password session; set =VM_IP=TARGET_HOST=, + =SSH_PORT=22= so the helpers + ssh-config target the real host). +- Replace =run_all_validations= / =validate_all_services= with + =run_testinfra_validation= (now authoritative). +- Everything downstream already routes through =$SSH_KEY_OPT= (the vm-utils + helpers) and the ssh-config, so it survives the hardening. + +** D. Validate +- =make test FS_PROFILE=zfs= → the ZFS-conditional pytest checks now *run* + (not skip): mkinitcpio uses the udev hook, sanoid installed, zfs-scrub timer, + zfs root. Fix any real ZFS-path findings archsetup has. +- Point =run-test-baremetal.sh= at the ZFS VM (or real hardware) → confirm the + key-auth migration carries it through the hardening to a green pytest sweep. + +** E. Delete the shell sweep (P5) +Once both runners use =run_testinfra_validation=, delete the dead functions from +=validation.sh= (run_all_validations, validate_all_services, the ~26 validate_* +checks, validate_service*, run_full_validation, validation_pass/fail/warn/skip). +Keep the live helpers: ssh_cmd, attribute_issue, capture_pre/post_install_state, +analyze_log_diff, categorize_errors, generate_issue_report, VALIDATION_*. + +* Phases +- *P-A* archangel ZFS config (verify archangel ZFS support first). +- *P-B* create-base-vm.sh + run-test.sh + Makefile profile selector; build the + ZFS base image + snapshot. +- *P-C* =make test FS_PROFILE=zfs= green (ZFS-conditional tests run; fix + findings). 
VM-validatable here. +- *P-D* migrate run-test-baremetal.sh to key auth + Testinfra; validate against + the ZFS VM. +- *P-E* delete the dead shell-sweep functions (the standing P5 follow-up). + +* Open questions +1. *Does archangel support a ZFS-root config out of the box?* RESOLVED (yes). + ZFS is archangel's *default* filesystem (=FILESYSTEM=zfs=, validated by + =installer/lib/config.sh:validate_filesystem=), with =NO_ENCRYPT=yes= for an + unattended unencrypted install and a ready =installer/velox-zfs.conf.example= + to model. No archangel work needed. +2. *Two images vs one image + two snapshots?* RESOLVED — two images. ZFS vs + btrfs are different on-disk layouts; cleaner than juggling snapshots on one + disk. =btrfs= keeps the legacy unsuffixed =archsetup-base.qcow2=; =zfs= gets + =archsetup-base-zfs.qcow2=. +3. *Profile on run-test.sh vs a separate run-test-zfs.sh?* RESOLVED — + =FS_PROFILE= env param on the existing runner + Makefile, no duplicate + harness. +4. *Disk size / RAM for the ZFS VM* — start at the 4G RAM / 50G disk defaults; + bump =VM_RAM= only if the ZFS install OOMs (decide at P-C build time). +5. *Should the bare-metal runner stay at all once a ZFS VM exists*, or does the + ZFS VM profile make it redundant for everything except real-hardware smoke? + Defer until after P-D. + +* Design corrections (found during P-A/P-B grounding) +- The "non-ZFS" base is *btrfs*, not ext4 — =archsetup-test.conf= sets + =FILESYSTEM=btrfs=. The profile axis is zfs vs btrfs throughout. +- *No =archsetup-vm-zfs.conf= is needed.* archsetup reads no filesystem key; it + auto-detects ZFS from the live root via =is_zfs_root()= (=findmnt -n -o FSTYPE + /=, archsetup:688). The ZFS branch (sanoid, zfs-scrub timer, mkinitcpio udev + hook, docker zfs storage driver) fires whenever the running root is ZFS. So + only the *archangel* base config and the base *image* differ per profile; the + archsetup run config (=archsetup-vm.conf=) is shared. 
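The profile axis settled above (zfs vs btrfs, two images, one shared run config) could be sketched as follows. The real selector lives in the shell harness (=init_vm_paths=), so this Python version only illustrates the mapping. The image names and the =findmnt= invocation come from the design text, while =BASE_IMAGES=, =base_image_for=, and =is_zfs_fstype= are names invented for the example.

```python
import subprocess
from pathlib import Path

# Profile -> base image, per the resolved "two images" decision:
# btrfs keeps the legacy unsuffixed name, zfs gets the -zfs suffix.
BASE_IMAGES = {
    "btrfs": "archsetup-base.qcow2",
    "zfs": "archsetup-base-zfs.qcow2",
}


def base_image_for(profile: str, images_dir: Path = Path("vm-images")) -> Path:
    """Return the base image path for an FS_PROFILE value."""
    try:
        return images_dir / BASE_IMAGES[profile]
    except KeyError:
        known = ", ".join(sorted(BASE_IMAGES))
        raise ValueError(f"unknown FS_PROFILE {profile!r} (expected: {known})") from None


def is_zfs_fstype(findmnt_output: str) -> bool:
    """Interpret the output of `findmnt -n -o FSTYPE /`."""
    return findmnt_output.strip() == "zfs"


def is_zfs_root() -> bool:
    """Detect a ZFS root from the live system, with no config key involved."""
    result = subprocess.run(
        ["findmnt", "-n", "-o", "FSTYPE", "/"],
        capture_output=True, text=True, check=True,
    )
    return is_zfs_fstype(result.stdout)
```

Keeping the parsing in =is_zfs_fstype= separate from the subprocess call means the detection logic can be unit-tested without a ZFS host.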
diff --git a/docs/design/2026-06-29-waybar-network-module-spec.org b/docs/design/2026-06-29-waybar-network-module-spec.org new file mode 100644 index 0000000..37b87b0 --- /dev/null +++ b/docs/design/2026-06-29-waybar-network-module-spec.org @@ -0,0 +1,1585 @@ +#+TITLE: Waybar Network Module — Design Spec +#+AUTHOR: Craig Jennings & Claude +#+DATE: 2026-06-29 + +* Status + +Ready for Phase 1; Ready-with-caveats overall. Three Codex review rounds + Craig's +cj comments are all incorporated — every finding has a disposition and the findings +cookie reads complete ([31/31]), with no open decisions (enterprise scope settled: +open + WPA-PSK in v1, 802.1X add/edit vNext, activate-only). The cj comments +reshaped several decisions (no separate credential store — use NM's own; =net +doctor= + Makefile console-recovery in v1; rfkill + full-stack-bounce repair; +airplane module absorbed; VPN a later Phase 5). The only remaining caveats are +Phase-2/3 build unknowns named under Open items (gtk4-layer-shell anchoring, the +=captive= =--json= refactor) — not Phase-1 blockers. Phase 1 (indicator + console +recovery) is ready to build. + +* Goal + +One waybar network component that does the whole job: shows connection state +(including the missing "associated but no internet / captive portal" state), +manages connections from a dropdown (nmcli-backed; secrets stay in +NetworkManager's own store, no separate credential file), and runs the network +diagnostics and remediation off the same place +(captive-portal detection + forcing, bounce/reset, gateway/DNS checks, speed +test). + +It unifies three todo tasks that are really one feature: +- =[#C]= "archsetup Waybar Wi-Fi module should show no-internet state" — the + indicator state plus the 2026-06-22 roam expansion (bounce, diagnostics, speed + test off the component). +- =[#B]= "Network-manager dropdown, nmcli-backed" — the management dropdown. 
(The + todo task's original "GPG-stored secrets" framing is superseded: secrets stay in + NM's own store, decision 5.) +- The network diagnostics already shipped in =captive= (the hotel/captive-portal + tool, formerly =login-page=) become this module's diagnostics engine rather + than a standalone CLI. + +* Scope + +** In +- *Indicator* — wifi/ethernet icon + signal + SSID, plus an internet sub-state: + online / captive / no-internet / connecting / disconnected / airplane. +- *Absorbs the airplane module* — the airplane state + toggle move into + =custom/net= (airplane is a network concern). Once this ships, the standalone + =custom/airplane= module, the =waybar-airplane= + =airplane-mode= scripts, their + =tests/=, and the css are deleted (listed under Files touched). The + desktop-settings panel (sibling =[#B]=) no longer needs an airplane row. +- *Interface-correct* — targets the wifi (or chosen) device, not the + default-route interface, so an active USB tether or wired link can't mask + wifi state. (Same lesson =captive= fixed; the current =custom/netspeed= keys + off the default route and has the bug.) +- *Connection management (panel)* — list saved connections most-recently-used + first, live signal for in-range wifi, click to switch; add / edit / remove for + open + WPA-PSK; activate any existing saved profile (including enterprise ones + NM already stores); ethernet↔wifi and wifi↔wifi switching even when a link + appears mid-session. +- *Diagnostics (panel)* — read-only Diagnose (captive probe 204-vs-portal with + the extracted portal URL, gateway ping, DNS config) separated from mutating + Repair. Repair has tiers, lightest first: rfkill-unblock, per-connection reset + (fresh MAC), full-stack bounce (=nmcli networking off/on=, then restart + NetworkManager if that fails), and the temporary 1.1.1.1 override test. Each + Repair action confirms and verifies cleanup. 
+- *Speed test (panel)* — down/up/ping with a progress indicator and last-result + shown, via the already-installed =speedtest-go --json=. +- *Connection secrets* — none of our own. Settings and passwords live where NM + already keeps them: =/etc/NetworkManager/system-connections/*.nmconnection= + (root-only =0600=, the PSK/EAP secret stored inline). We read/write them through + nmcli, which handles the privilege. No separate file, no GPG, no gpg-agent — one + fewer dependency, and NM's store is already the secure-at-rest source of truth. +- *Persistence* — connectivity probe result cached in the runtime dir so the + bar reads it cheaply between probes. +- *Observability* — a redacted JSONL event log so a post-failure session can + diagnose without re-running destructive actions. + +** Out (v1, note for later) +- No replacement of NetworkManager's connection engine. NM stays the thing that + connects; we drive it via nmcli. +- No add/edit *form* for WPA-Enterprise / 802.1X in v1. The reason is effort vs + payoff: 802.1X has many interdependent fields (CA cert, client cert, identity, + anonymous identity, phase-2 auth) where a wrong entry silently fails auth, so a + trustworthy form is a lot of UI for connections Craig rarely adds (open + + WPA-PSK covers home, hotels, and phone hotspots). v1 still *activates* existing + saved enterprise profiles and points editing at =nmtui=/=nmcli=. Settled + (Craig, 2026-06-29): enterprise add/edit is vNext — 24 saved profiles on velox, + 0 enterprise, so the form would be unused UI; if one ever appears nmtui adds it + once and the module activates it thereafter. +- No per-connection captive-portal *auto-login* in v1. (That would mean storing a + portal's login form answers — room number, surname, a checkbox — and replaying + them automatically when a known portal is detected, so the page never appears. + Out for v1 because every portal's form differs and it means storing per-venue + answers; v1 just opens the portal for you.) 
+- No graphing/history of speed-test results beyond the last run. +- No static-IP / proxy / metered / MAC-randomization editing in v1 (activate + existing, edit elsewhere). +- No VPN / WireGuard management in v1, but it's a planned later phase (Phase 5), + not a permanent exclusion — it folds the existing archsetup wireguard tooling + into the same panel/CLI. +- The desktop-settings dropdown (sibling =[#B]=) is a separate module, but it + shares the GTK4 layer-shell panel shell built here. + +* Architecture + +Three layers. Keep the bar cheap, the panel rich, the logic in one tested place. + +1. *Engine* — a =net= Python package (src-layout, unittest), exposing a CLI. Wraps + every nmcli op and owns the diagnostics. Emits JSON. This is the testable + core (fake =nmcli= / =curl= / =speedtest-go= on PATH, like the existing + =waybar-netspeed= and =waybar-sysmon= test harnesses). Precedent: pocketbook is + Python in the dotfiles repo; =wtimer= is Python for the same testability + reason. +2. *Indicator* — a thin =waybar-net= script that calls =net status --json= and + renders icon + signal + state + tooltip. Replaces =custom/netspeed= + (throughput folds into the tooltip). +3. *Panel* — a GTK4 + gtk4-layer-shell app (mirrors pocketbook's structure) + that imports the engine. Hosts connection management, diagnostics, and the + speed test. + +How the existing pieces map in: +- =captive= (bash, shipped) — the engine shells out to it for the heavy, + interactive portal-force flow (sudo reset, DNS override, browser launch). Its + cheap portal-detection logic is mirrored natively in the engine for the fast + status path so the bar never blocks on a subprocess. =captive= stays a usable + standalone CLI. The refactor (below) extracts its probe + reset into functions + the engine can call non-interactively. +- =waybar-netspeed= (sh, shipped) — retired; its throughput sampling moves into + the engine's status output and renders in the indicator tooltip only. 
+- =nmcli= — the connection backend for every op. + +Language note: the engine is Python; the indicator is a thin Python or sh +wrapper over =net status --json=. The bar path must stay fast (see Performance +budgets), so the indicator does no network I/O itself — it reads link state and +the cached connectivity result. + +* Repository + dependencies + +- *Code lives in the dotfiles repo* (=~/.dotfiles=), not archsetup. The =net= + package sits in-tree like pocketbook (src-layout, unittest, Makefile target); + =waybar-net= and the =net= CLI entry live in the hyprland tier + (=hyprland/.local/bin/=). Tests under =tests/net/= and =tests/waybar-net/=. + archsetup owns only the *dependency install*, not the code. +- *archsetup installs the deps* in its Hyprland step: =gtk4-layer-shell=, + =python-gobject=, plus =nmcli=/=curl=/=resolvectl=/=rfkill= (already present via + NetworkManager/curl/systemd/util-linux). Speed test uses =speedtest-go= (AUR + =speedtest-go-bin=, already installed on velox); archsetup adds it to the AUR + list. librespeed-cli is the documented fallback if a self-hosted LibreSpeed + server is ever wanted. No =gpg= dependency (secrets live in NM's own store). +- *Daily-drivers*: a stowed-script + AUR-dep feature, so ratio needs the same + =git pull= + stow + the archsetup-added deps. Note the manual dep step in the + rollout. + +** Makefile targets (console recovery is a first-class path) +=net doctor= and the diagnostics are reachable from a bare TTY when waybar and +the GUI are down — that's the case where you most need them. The dotfiles +Makefile carries targets that wrap the =net= CLI so "get back online" is one make +command from the console: +- =make online= — =net doctor --fix= (diagnose, then apply the lightest repair: + rfkill-unblock → reset → bounce → open portal). The headline recovery target. +- =make net-doctor= — =net doctor= (read-only diagnose + recommendation). 
+- =make net-status= / =make net-diagnose= / =make net-portal= / =make net-reset= + / =make net-bounce= — the individual ops. +- =make test= — already runs =tests/*=; the =net= package's unittest suites are + collected the same way. +These intentionally need only nmcli/curl/rfkill (no GUI, no waybar, no Python +GTK), so they work from a TTY on a broken graphical session. + +* Connectivity model — split cadence + +The indicator polls every ~2s, but a real internet/captive probe every 2s wastes +battery and can re-trigger a captive portal. So split it: + +- *Fast path (every poll, cheap, no network)* — interface, type, SSID, signal, + IPv4 presence, throughput sample. From nmcli / sysfs only. No network I/O. +- *Slow path (cached, TTL ~45s)* — the actual internet/captive probe (the 204 + check + meta-refresh portal extraction). Result cached at + =$XDG_RUNTIME_DIR/waybar/net-connectivity.json= with a timestamp. + +The indicator reads the cache each poll. When the cache is older than the TTL, +=net status= kicks =net probe= in the background (spawn + detach, never awaited) +and renders the last cached sub-state meanwhile. A user-triggered +diagnose/reconnect refreshes the cache immediately. This keeps the bar +responsive and the portal un-poked. + +** Concurrency, atomicity, staleness +- *Single-flight* — =net probe= takes a lock file at + =$XDG_RUNTIME_DIR/waybar/net-probe.lock= (flock, non-blocking). A second probe + while one runs is a no-op, so a flapping 2s poll can't pile up overlapping + probes. +- *Atomic writes* — the cache is written to a temp file + =os.replace= (atomic + rename), so a reader never sees a half-written cache. Same pattern as =wtimer=. +- *Max probe runtime* — the probe has a hard timeout (≤ 6s total: curl + =--max-time 5= + slack). On timeout it writes an =unknown= result, never hangs. 
+- *Stale classes* the indicator distinguishes: fresh (< TTL), stale (TTL..3×TTL, + shown with a subdued/aging hint), expired (> 3×TTL → treat as unknown), + unknown (no cache / probe failed). The bar never shows a confident "online" + past the expired threshold. +- *Invalidation* — the cache records the iface + SSID + active-connection UUID it + was taken under; a change in any of them invalidates it immediately (a + reconnect must not show the old network's verdict). +- *Crash cleanup* — a stale lock older than the max runtime is ignored/reclaimed. + +* Performance budgets (hot path) + +The bar exec path (=waybar-net= → =net status=) must stay responsive: +- *Budget*: =net status= returns in < 100ms typical, < 250ms worst case. +- *No sleeping in the bar path.* Throughput is sampled from two reads of + =/sys/class/net/<iface>/statistics/{rx,tx}_bytes= across the *waybar poll + interval itself* (delta since the last cached sample + timestamp), not via an + in-process =sleep= like the old =waybar-netspeed=. The cache holds the prior + counters. +- *Subprocess cap*: at most one =nmcli= invocation on the hot path (a single + =nmcli -t -f ...= multi-field query), plus sysfs reads. Never a per-field + nmcli call. +- *Every subprocess has a timeout* (=nmcli --wait 2=, =subprocess timeout=). On + timeout or error the indicator emits a degraded JSON state (class + =net-degraded=, a neutral glyph) rather than blocking or crashing waybar. +- *Benchmark test*: a fake slow =nmcli= asserts =net status= still returns within + budget by falling back to the degraded state. + +* Engine — =net= CLI surface + +All subcommands take =--json= where a machine reads them. Pure formatting/state +functions under the CLI; IO (nmcli, curl, file) at the edges. Every subcommand +exits non-zero with a JSON error envelope (see JSON schemas) on failure. + +- =net status [--json] [--iface IF]= — fast link state + cached connectivity + sub-state + throughput. The indicator's source. 
Never does network I/O. +- =net probe [--iface IF]= — run the connectivity/captive probe now, update the + cache (single-flight, atomic), print online | captive (+ portal URL) | + no-internet | unknown. Mirrors =captive='s cheap detection natively. +- =net list [--json]= — saved connections, MRU order, active flag, plus in-range + wifi with signal. +- =net up <uuid>= / =net down [--iface IF]= — switch / disconnect. Operates on + UUID, not name (see nmcli contract). +- =net add= / =net edit <uuid>= / =net remove <uuid>= — manage connections + (open + WPA-PSK) through nmcli; the secret lands in NM's own + =.nmconnection=. Enterprise profiles are activate-only. +- =net rescan [--iface IF]= — wifi rescan. +- =net diagnose [--json]= — read-only report: gateway ping, DNS config, captive + probe. The structured contract below. Doubles as the post-failure snapshot. +- =net repair <action> [--json]= — mutating remediation, lightest first: + =rfkill= (unblock + radio on), =reset= (fresh MAC), =bounce= (full-stack: + =nmcli networking off/on=, escalating to =systemctl restart NetworkManager=), + =dns-test= (temporary 1.1.1.1 override, auto-reverted). Each confirms via the + caller and verifies cleanup. +- =net doctor [--json] [--fix]= — one-shot "get me online" mode for the console: + runs the full diagnose, then applies the lightest repair that fits (unblock + rfkill, reset, bounce, open portal) — read-only without =--fix=, acting with + it. The TTY recovery path when waybar/the GUI is down (see the Makefile + targets). +- =net portal= — run =captive='s portal-force flow (reset if needed, extract + + open the portal page). +- =net speedtest [--json]= — =speedtest-go --json= run; down/up/ping. + +* nmcli contract + +The command wrapper is the reliability boundary; SSIDs and connection names +contain spaces, colons, duplicates, hidden names, and non-ASCII. 
Rules: + +- *Terse, field-selected output*: =nmcli -t -f <fields> --escape yes ...= and + =nmcli -g <fields> ...= (get-values) for single-value reads. Parse with the + documented escaping (=\:= and =\\=); never naive =cut -d:=. +- *UUID is the handle.* Every saved-profile op (=up=, =down=, =modify=, =delete=) + uses the connection UUID, never the display name — names duplicate and contain + separators. =net list= surfaces UUIDs; the panel maps row → UUID. +- *Wait budgets*: activation/deactivation use =nmcli --wait <n>= with an explicit + budget (hot-path reads =--wait 2=; activation =--wait 30=). No unbounded waits. +- *Connectivity*: NM's own =nmcli networking connectivity= can return + =none/portal/limited/full/unknown=. Use it as a *cheap hint* on the fast path + when present, but the authoritative captive verdict is still our own probe + (NM's portal detection is coarser and config-dependent). +- *Parser tests* (fake nmcli fixtures): escaped colons and backslashes in SSIDs, + embedded newlines, duplicate connection names, hidden SSID (empty name), + non-ASCII SSID, the wired-appears-mid-session case, and the multi-active case + (wifi + tether both up). + +* JSON schemas + +Versioned (="v": 1=) envelopes so tests lock the contract. Sketches (fields +nullable unless noted): + +- =status=: ={v, iface, type: wifi|ethernet|none, ssid, signal, ipv4, + gateway, throughput: {rx_bps, tx_bps}, connectivity: online|captive|no-internet|unknown, + connectivity_age_s, connectivity_class: fresh|stale|expired|unknown, state: + online|captive|no-internet|connecting|disconnected|airplane|wired|degraded}=. +- =probe=: ={v, result: online|captive|no-internet|unknown, portal_url, http_code, + redirect_host, elapsed_ms, ts}=. +- =list=: ={v, connections: [{uuid, name, type, active, last_used, signal, + in_range, security}]}=. +- =diagnose=: ={v, steps: [<diagnostic step, see contract>], overall: + ok|warn|fail}=. 
+- =speedtest=: ={v, down_mbps, up_mbps, ping_ms, server, elapsed_ms, ts}=. +- error envelope (any command): ={v, error: {code, message, detail, partial: + bool}}= with a non-zero exit. + +* Diagnostics contract + +=net diagnose --json= returns an ordered list of steps. Each step is the unit the +panel renders and the log records: + +- =id= — stable identifier (e.g. =link=, =dhcp=, =gateway-ping=, =dns-config=, + =dns-resolve=, =http-probe=, =portal=). +- =status= — =pending | running | pass | warn | fail | skipped=. +- =title= — short human label. +- =evidence= — redacted detail (the value seen), per the redaction rules. +- =elapsed_ms=. +- =safety= — =read-only= or =mutating= (diagnose steps are all read-only). +- =next_action= — what the user/agent should do on warn/fail (e.g. "open portal", + "reset connection", "switch network"). + +Repair actions (=net repair=) carry the same shape but =safety: mutating=, plus a +=cleanup_verified: bool= field (e.g. the DNS override was reverted) and a +terminal =cleanup-unverified= status when revert can't be confirmed. + +** Diagnose vs Repair (read-only vs mutating) +The panel separates them visually and behaviorally: +- *Diagnose* — probe, gateway ping, DNS config read, captive check. No state + change, no sudo, runnable freely. +- *Repair* — reset (fresh MAC, deletes+recreates the NM profile), DNS override + test (mutates resolver, auto-reverts), portal force. Each needs an explicit + confirm, shows that it's privacy/state-changing, and verifies cleanup. A + Repair whose cleanup can't be verified ends in a visible =cleanup-unverified= + state, never a silent success. + +* Failure states, messages, recovery + +Each row below gives the *exact, final* user-facing string (not a template) with +=<placeholders>= for redacted evidence, plus the evidence field included and the +next action. The string is canonical: every surface renders the same text, so +there's one source of truth. 
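A minimal sketch of the "one canonical string" idea, combined with the error envelope from the JSON schemas section. The three message strings are taken from the failure rows in this spec. The failure codes (=no-dhcp=, =auth-rejected=, =timeout=) and the function names are assumptions for illustration only, since the spec does not fix them.

```python
import re

# Canonical user-facing strings, keyed by a stable error code.
# The codes here are hypothetical; the strings are from the failure rows.
MESSAGES = {
    "no-dhcp": "Connected to <SSID>, no IP (DHCP failed)",
    "auth-rejected": "Saved password for <SSID> was rejected",
    "timeout": "<op> timed out; the system was left unchanged",
}

_PLACEHOLDER = re.compile(r"<([A-Za-z]+)>")


def render(code: str, **evidence: str) -> str:
    """Fill the canonical string for `code` with already-redacted evidence.

    A missing placeholder value raises KeyError, so no surface can ever
    show a half-filled template.
    """
    template = MESSAGES[code]
    return _PLACEHOLDER.sub(lambda m: str(evidence[m.group(1)]), template)


def error_envelope(code: str, detail: dict, partial: bool = False, **evidence: str) -> dict:
    """Build the versioned error envelope that wraps the canonical string."""
    return {
        "v": 1,
        "error": {
            "code": code,
            "message": render(code, **evidence),
            "detail": detail,
            "partial": partial,
        },
    }
```

Because every surface would call the same =render=, the tooltip, the notification body, the CLI stderr line, and the panel banner cannot drift apart.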
+ +Per-surface rendering of the canonical string: +- *Indicator* — the matching glyph + CSS class; the string is the tooltip + (untruncated). +- *Notification* (=notify=) — title = the failure label, body = the string. +- *CLI* — the string on stderr; =--json= puts it in =error.message= with the + evidence in =error.detail= and a stable =error.code=. +- *Panel* — the string as the section banner, with the diagnostic step's evidence + shown beneath. +Evidence is always redacted per the redaction rules (SSID/host shown; PSK/EAP/ +portal tokens never). + +- *associated, no DHCP* — "Connected to <SSID>, no IP (DHCP failed)" → + evidence: SSID, iface → reset / reconnect. +- *no-internet* — "On <SSID>, no internet (gateway reachable, no route out)" → + diagnose / switch network. +- *captive* — "Captive portal at <host> — login required" → Open portal. +- *DNS hijack* — "DNS is being redirected (portal)" → Open portal. +- *DNS broken* — "DNS not resolving (hotel DNS down); 1.1.1.1 works" → use + override / report. +- *HTTP intercepted* — "Traffic is being intercepted before it leaves" → Open + portal. +- *sudo declined* — "Reset needs admin; it was declined — nothing changed" → + retry with auth. +- *command timed out* — "<op> timed out; the system was left unchanged" → retry. +- *partial mutation* — "<op> partially applied: <what>; rolled back to <state>" + → review. +- *missing speedtest-go* — "speedtest-go not installed" → install hint. +- *no wifi hardware* (desktop) — wifi rows hidden; ethernet-only view. +- *wifi rfkill-blocked* — "WiFi is blocked (rfkill)" → unblock. The indicator + detects a soft-blocked radio (=rfkill list= shows the radio off though hardware + is present) and shows this distinct from disconnected. =net repair rfkill= (and + =net doctor --fix= as its first step) runs =rfkill unblock wifi= + =nmcli radio + wifi on= and reconnects. 
This is the framework-laptop case: an out-of-power + shutdown sometimes leaves wifi soft-blocked at next boot, and yes — the module + recovers it (the rfkill state is the indicator; the rfkill repair / doctor is + the one-step fix). A *hard* block (physical switch) is reported as + not-recoverable-in-software with that message. +- *wifi rfkill hard-blocked* — "WiFi is blocked by the hardware switch" → + evidence: rfkill hard state → flip the physical switch. +- *wrong password / missing secret* — "Saved password for <SSID> was rejected" → + evidence: SSID, NM auth-failure reason → re-enter the password. +- *enterprise auth/cert failure* — "Enterprise login failed for <SSID> (802.1X)" + → evidence: SSID, EAP failure reason → edit the profile in nmtui/nmcli. +- *upstream / AP / provider* — "On <SSID>, link is fine but the network has no + uplink" → evidence: gateway reachable, no route out, not a portal → switch + network or contact the venue. +- *VPN-routed* — "Connected; internet is routed through a VPN (<dev>)" → + evidence: default route on a tun/wg device or non-NM DNS owner → check the VPN, + not WiFi. +- *HTTP interception, no parseable portal URL* — "A portal is intercepting + traffic but didn't give a login link" → evidence: HTTP code, redirect host → + opens neverssl + the gateway page to log in manually. +- *DNS override cleanup unverified* — "Couldn't confirm DNS was restored after the + test" → evidence: iface, attempted revert → revert DNS manually + (=resolvectl revert <iface>=). + +Each message names whether the system was left unchanged, partially changed (with +what), or fully changed, so the user knows the residue. + +* Doctor: escalation, classification, terminal states + +=net doctor= diagnoses, classifies the failure, then (with =--fix=) applies the +*lightest* repair that fits and re-checks — it never loops destructive repairs +against a failure they can't fix. 
Each failure resolves to one of four outcomes, +and the doctor stops at any terminal one: + +- =fixable= — a local repair should help. Escalate lightest-first: rfkill-unblock + → reset (fresh MAC) → bounce (full stack) → portal, re-probing after each, and + stop as soon as the probe returns online. +- =needs-user-action= (terminal) — no reset/bounce will help; doctor stops and + names the exact next step. Covers: wrong WPA password / missing NM secret + (enter the password), locked keyring or polkit denial (retry with auth), + enterprise 802.1X cert/identity failure (edit the profile in =nmtui=/=nmcli=), + captive portal login-required (open the portal + accept terms). Doctor must not + delete/recreate the profile against these — that loses the saved password and + makes things worse. +- =upstream-not-local= (terminal) — the local link is up but the problem is past + it: AP has no uplink, gateway down/dropping traffic, DHCP server broken, ISP + outage, portal backend failing. =diagnose= proves it (link up + IP + gateway + reachable, but no route out and not a captive redirect), and =doctor --fix= + stops after local repairs are exhausted with "local repairs tried; likely + upstream/AP/provider" + the evidence. Next action: switch networks or contact + the venue. +- =deferred/vpn= (terminal for v1) — an active VPN / policy route / non-NM + resolver owns the default route or DNS, so "no internet" may be the VPN's fault, + not WiFi's. v1 *detects* this (default route on a =tun/wg= device, or DNS owned + by something other than the NM link) and classifies it separately — "link is + fine; internet is VPN-routed" — rather than misclassifying it as a WiFi failure. + v1 does not repair it (VPN management is Phase 5); it names the VPN as the likely + owner and stops. + +** DNS handling in doctor (explicit per class) +- *Captive DNS hijack* — open the portal (the hijack clears on login). No DNS + mutation. 
+- *Broken resolver, 1.1.1.1 works* — doctor offers an explicit *temporary* 1.1.1.1 + override as a repair with cleanup verification (auto-revert, =cleanup_verified=); + without =--fix= it only recommends the command. It does not leave a permanent + resolver change. +- *Port-53 / egress blocked* (even 1.1.1.1 fails) — terminal =upstream-not-local=; + doctor stops, since it's not locally fixable. + +* Failure-mode coverage + +For each common field failure: does =net diagnose= detect it, can =net doctor +--fix= repair it, and what terminal user action remains when it can't. (The +=needs-user-action= / =upstream-not-local= / =deferred/vpn= outcomes are defined +above.) + +| Failure mode | diagnose detects | doctor --fix | terminal user action | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| rfkill soft block | yes | yes (unblock) | none | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| rfkill hard block | yes | no | flip the physical switch | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| no wifi hardware | yes | n/a | use ethernet | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| associated, no DHCP | yes | yes (reset/bounce) | none, else switch network | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| gateway unreachable | yes | yes (bounce) | switch network if it | +| | | | persists | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| captive DNS hijack | yes | opens portal | log in at the portal | 
+|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| broken DNS, 1.1.1.1 works | yes | yes (temp override, | report the venue's DNS | +| | | auto-reverted) | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| HTTP captive portal | yes | opens portal | log in at the portal | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| HTTP interception, no | yes | opens neverssl + gateway | log in manually | +| parseable URL | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| upstream / AP outage | yes (link up, no route out) | no (stops after local) | switch network / contact | +| | | | venue | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| wrong WPA password / | yes | no | enter the password | +| missing secret | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| enterprise auth / cert | yes | no | edit the profile in | +| failure | | | nmtui/nmcli | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| duplicate SSID / | yes (UUID-keyed) | yes (activate by UUID) | none | +| connection-name | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| hidden SSID | yes | yes (connect by name) | enter SSID + password | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| multiple active links | yes | n/a | pick the interface | +| (wifi+tether) | | | | 
+|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| wedged NetworkManager | yes | yes (bounce → restart NM) | none, else reboot | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| slow / hung command | yes (degraded) | retries within budget | retry | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| stale / corrupt cache | yes | self-heals (atomic + | none | +| | | invalidation) | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| DNS cleanup failure | yes | flags cleanup-unverified | revert DNS manually | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| missing speedtest backend | yes | n/a | install speedtest-go | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| VPN / policy-routing | yes (route/DNS ownership) | no (deferred to Phase 5) | check the VPN | +| interference | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| + +* Observability — logging + redaction + +- *Event log*: JSONL at =$XDG_STATE_HOME/net/events.jsonl= (fallback + =~/.local/state/net/events.jsonl=), size-rotated (e.g. 1 MB × 3). Every + mutating op and probe appends an event: =ts, op, argv (redacted), exit_code, + stderr_tail, elapsed_ms, iface, nm_uuid, probe_url_class, http_code, + redirect_host, cache_event=. +- *Redaction (always on)*: PSKs, EAP identities/passwords, NM secrets, and + portal query tokens are never logged. MAC addresses, full IPs, and SSID are + redacted when configured (=redact_mac=, =redact_ip=, =redact_ssid= in config). 
+- *Post-failure diagnosis*: =net doctor --json= is the snapshot + recommendation + (diagnose plus the suggested repair), =net diagnose --json= the raw report, and + the event log the history. =net doctor= is the console-recoverable entry point + (reachable as =make online= / =make net-doctor=). +- *Secret-leak tests*: assert no PSK/EAP/portal-token ever appears in any JSON + output, log line, or error message. + +* Indicator (task #C — Phase 1, the fast win) + +** States (internet sub-state on top of link state) +- online — associated and the probe returned 204. Normal icon. +- captive — associated, probe hit a portal. Distinct glyph + warning CSS class; + tooltip names the portal host; left-click opens diagnostics with the portal + ready to open (Phase 2+; see interactions for the Phase-1 interim). +- no-internet — associated, probe failed (no portal, no 204). Distinct glyph + + warning class. +- degraded — =net status= couldn't read link state within budget (slow/failed + nmcli). Neutral glyph, =net-degraded= class. Never blocks the bar. +- rfkill-blocked — the radio is soft-blocked (=rfkill=), distinct from + disconnected. Distinct glyph; the fix is =net repair rfkill= / =net doctor=. +- connecting / disconnected / airplane / wired — as today, plus wired shown + correctly even when it appears after session start. (airplane is now this + module's state, absorbed from the retired airplane module.) + +** Glyphs +Nerd-font codepoints, final values verified live before merge (same discipline +as wtimer). Reuse the signal-strength ramp already in =waybar-netspeed=; add a +captive / no-internet / degraded overlay glyph. + +** Tooltip +SSID + signal + IPv4 + gateway + the throughput readout (absorbed from +netspeed) + the last probe result and its age (stale/expired hinted). 
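The indicator states above can be read as a small pure function from link state and probe result to a state name, which is then rendered as the JSON object a waybar custom module with =return-type: json= expects. The following is only a sketch under assumptions: the function names, the probe values ("204", "portal", anything else meaning failed), the precedence of degraded over rfkill-blocked, the placeholder glyph strings, and every class name other than =net-degraded= and the warning class are hypothetical, not taken from the spec.

```python
import json

# Placeholder glyph strings (hypothetical); the spec verifies real Nerd-font
# codepoints live before merge.
GLYPHS = {
    "online": "[net]",
    "captive": "[portal]",
    "no-internet": "[no-inet]",
    "degraded": "[?]",
    "rfkill-blocked": "[rfkill]",
}

# captive / no-internet use the warning class and degraded uses net-degraded,
# per the spec; any other state falls back to its own name (an assumption).
CSS_CLASS = {
    "captive": "warning",
    "no-internet": "warning",
    "degraded": "net-degraded",
}


def indicator_state(link, probe, rfkill_soft=False, status_timed_out=False):
    """Map link state + last probe result to an indicator state name."""
    if status_timed_out:
        # Link state could not be read within budget: never block the bar.
        return "degraded"
    if rfkill_soft:
        return "rfkill-blocked"
    if link != "associated":
        # connecting / disconnected / airplane / wired pass through unchanged.
        return link
    if probe == "204":
        return "online"
    if probe == "portal":
        return "captive"
    return "no-internet"


def waybar_payload(state, tooltip):
    """Render one state as the JSON object waybar reads from the module."""
    return json.dumps({
        "text": GLYPHS.get(state, "?"),
        "tooltip": tooltip,
        "class": CSS_CLASS.get(state, state),
    })
```

Keeping the mapping pure (no nmcli or curl calls inside it) is what would let the indicator tests assert the JSON per state without touching the network.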
+ +** Interactions (phase-aware; no keyboard-modifier clicks — waybar can't qualify +clicks by modifier, so the rich actions live in the panel, not ctrl/super-click) +Clicks never block the bar: each dispatches a detached background job and reports +via =notify=, single-flight per action. +- *Phase 1 (no panel yet)*: left-click runs =net probe= + notify (refreshes the + state on demand) and keeps the existing =pypr toggle network= scratchpad as the + interim manager; right-click runs =net repair reset= in the background + + notify; middle-click runs =net portal=. +- *Phase 2+ (panel exists)*: left-click opens the panel (focused on the relevant + section — diagnostics when captive); right/middle keep the background + reset/portal shortcuts. + +* Panel (tasks #B + #C diagnostics — Phases 2-3) + +GTK4 + gtk4-layer-shell, pocketbook scaffold (src-layout package, unittest, +Makefile, gtk4-layer-shell anchored dropdown under the bar). One panel shell, +reused by the future desktop-settings panel. + +Sections: +1. *Connections* — list, MRU-first, active marked, live signal bars for in-range + wifi; row click switches; buttons for add / edit / remove; a rescan control. +2. *Diagnose* (read-only) — Probe (204/captive, shows portal URL + Open), Gateway + ping, DNS config. Streaming step output (the diagnostics contract). +3. *Repair* (mutating, confirmed) — tiered lightest-first: Unblock rfkill, Reset + (fresh MAC), Bounce (full stack), DNS override test, Force portal. A "Get me + online" button runs =net doctor --fix= (the auto-escalating sequence). +4. *Speed test* — Run button, progress, down/up/ping result + last-run line. + +** Panel state, cancellation, permissions +State machines for: connection-list loading, rescan-in-progress, +activation-in-progress, diagnose-running, repair-running, speedtest-running. Plus +the real terminal states on this two-machine fleet: no-wifi-hardware (desktop → +ethernet-only view) and missing speedtest-go. 
(No GPG-key state — there's no +credential store; secrets live in NM.) ("No NetworkManager" is not a modeled +state — NM is always present +on these machines; if nmcli is somehow absent the panel shows a single hard-error +and exits.) Long operations show elapsed time and are cancellable where the +underlying op allows (rescan, speedtest, probe); clearly non-cancellable ones +(an in-flight activation) show elapsed + a disabled control. Permission-denied +(sudo/polkit declined) is a first-class outcome with the "nothing changed" +message, never a silent failure. + +Interaction-pattern catalog (=~/code/rulesets/patterns/=) principles that apply: +- transient-state-buttons — all the network levers in one place, reachable by + one chord (the bar click), state visible. +- default-most-common-friction-proportional — connections MRU-ordered so the + common pick is first; destructive ops (remove) and privacy-changing ones + (reset, override) get a confirm, switching does not. +- one-prompt-picker-typed-prefix — if the connection picker ever goes + keyboard-driven, kind (wifi/eth/saved/in-range) + name in one typed picker. + +** Panel UX flow (settle before Phase 2) +The concrete interaction defaults, so the GTK build isn't inventing them: +- *Default focus*: the Connections section, current connection's row selected. If + the indicator opened the panel because of a captive/no-internet state, focus + Diagnose instead with the relevant action highlighted. +- *Row content*: glyph (signal bars / wired / active check) + name + a secondary + line (security type, "active"/last-used). The active row is visually pinned at + top of its group. +- *Buttons*: one *primary* per section (Connections: Connect to the selected row; + Diagnose: Run diagnose; Repair: "Get me online"; Speed test: Run). Secondary + actions (add / edit / remove / rescan; individual repair tiers) are smaller and + grouped. 
+- *Disabled rules*: Connect disabled on the already-active row; Repair tiers + disabled while one runs; Speed test disabled while running; add/edit disabled + for enterprise (with the "edit in nmtui/nmcli" hint). +- *Confirmations* (exact wording): Reset → "Reset <SSID>? This drops the + connection and reconnects with a new MAC."; Bounce → "Restart networking? All + links drop briefly."; DNS override → "Temporarily set DNS to 1.1.1.1 for the + test? It reverts automatically."; Remove → "Forget <SSID>? The saved password is + deleted." +- *"Get me online" reporting*: shows each escalation step live (Unblock rfkill → + Reset → Bounce → Portal) with per-step pass/fail and stops at the first that + restores internet or at a terminal state, naming the next action. +- *After close*: the bar reflects the new state immediately (signal/refresh on + next poll); a running speedtest/diagnose keeps running and notifies on finish + (panel close doesn't cancel it). +- *Keyboard*: Esc closes; Tab moves between sections; arrows move rows; Enter + fires the section primary; the connection list is type-to-filter. + +* Connection management (nmcli) + +- Every op via nmcli per the nmcli contract above (terse, escaped, UUID-keyed, + bounded =--wait=). +- MRU ordering from NM's =connection.timestamp= (last activated), descending. +- Ethernet appears in the list whenever a wired device is present, selectable at + any time; switching just brings the chosen connection up. +- *Mutation safety + rollback*: switching keeps the current connection up until + the new one activates successfully (=nmcli --wait 30=); on failure it does not + tear down the working link, surfaces the failure, and leaves the prior + connection active. =net down= notes that NM may auto-reactivate a profile and + reports the post-op active connection so the user isn't surprised. A switch that + needs a password it doesn't have prompts (or fails with "password required"), + never silently strands. 
The exact NM command sequence (preflight active-state + read → activate target → verify default route → on failure, confirm prior + still up) is pinned in the engine and tested against fake nmcli. +- *Add/edit scope*: open + WPA-PSK only in v1. Existing saved profiles of any + type (including enterprise) can be *activated*; editing an enterprise profile + shows "edit via nmtui/nmcli" rather than a broken partial form. + +* Connection secrets (no separate store) + +Per Craig's call: don't build a parallel credential store. Settings and secrets +live where NetworkManager already keeps them, so there's one source of truth and +no extra dependency (no GPG, no gpg-agent, no =~/.config/net/connections=). + +- *Where secrets live*: =/etc/NetworkManager/system-connections/<name>.nmconnection=, + root-owned =0600=, with the PSK/EAP secret stored inline (the default + =secret-flags=0= "owned by NM"). That's already secure-at-rest (root-only) and + is what =nmcli= reads/writes. +- *How we touch them*: every add/edit/remove goes through =nmcli= (=connection add + / modify / delete=), which writes the =.nmconnection= with the right ownership + and perms. We never read or write =system-connections= files directly (root) and + never copy a secret out of them. +- *No export / import / sync* — there's nothing to sync. A new machine gets its + connections the way it always has (the user joins, or restores NM profiles), + not from a tool-specific vault. +- *config file*: =~/.config/net/config= still exists, but only for non-secret + preferences (speedtest server, redaction flags, probe TTL). It holds no + credentials. +- *No secret leakage*: PSK/EAP never appear in =net=' =--json= output, the event + log, or error text (tested) — even though NM is the store, our surfaces must not + echo a secret =nmcli= happens to return. + +* Speed test + +- Backend: *=speedtest-go=* (=--json=, =--server=, =--no-download/--no-upload=), + already installed on velox (AUR =speedtest-go-bin=). 
No new dependency for v1. + librespeed-cli is the documented fallback for a self-hosted LibreSpeed server. +- =net speedtest --json= parses speedtest-go's JSON into the =speedtest= schema. +- *Server policy*: auto-select nearest by default; allow a pinned server id in + =~/.config/net/config=. +- *Timeout + cancellation*: a hard run timeout (e.g. 60s); the panel run is + cancellable (kills the child). Offline / rate-limited / no-server errors map to + the failure-message table. +- *Tests*: fixture JSON (success) and fixture stderr (offline, no server, + malformed output) drive =net speedtest= parsing without touching the network. + +* Help + documentation + +In-app help has three layers, each reachable in the situation it's needed: + +- *CLI help (works from a dead-GUI TTY)*: =net --help= lists the subcommands in + one screen; =net <cmd> --help= documents each (flags, what it mutates, the + console-recovery targets). The Makefile targets are self-describing (=make help= + lists =online= / =net-doctor= / etc. with one-line descriptions). This is the + layer that matters most when you're at a console with no network. +- *Panel help (in the GUI)*: a small =?= affordance in the panel header opens an + inline help pane — what each section does, which Repair actions mutate state, + what the indicator glyphs/colors mean. Per-control tooltips on the less-obvious + buttons (rfkill, bounce, DNS override). No external help browser. +- *User guide (the durable doc)*: a README / docs page covering every command, + the indicator states + glyphs, the panel sections, the config file keys, the + recovery make targets, troubleshooting (the failure-message table), and + rollback. Written so a future session — or Craig six months out — can operate + and recover the module from the doc alone. + +The failure-message table above is the single source of truth for the +troubleshooting text; the guide and the panel help both render from it rather +than restating it. 
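One way to keep the failure-message table as the single source of truth is to hold it as data and let the CLI, the panel help, and the guide render from that one structure. This is a minimal sketch, assuming a hypothetical =FailureMessage= record and hypothetical keys and renderer names; the three message texts, evidence notes, and next actions are copied from the failure-message table, and the ={ssid}= / ={dev}= placeholders stand in for its =<SSID>= / =<dev>= slots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMessage:
    failure_class: str
    text: str          # may contain {ssid} / {dev} placeholders
    evidence: str
    next_action: str


# Three rows taken from the failure-message table; the dict keys are hypothetical.
FAILURE_MESSAGES = {
    "rfkill-hard": FailureMessage(
        "wifi rfkill hard-blocked",
        "WiFi is blocked by the hardware switch",
        "rfkill hard state",
        "flip the physical switch",
    ),
    "wrong-password": FailureMessage(
        "wrong password / missing secret",
        "Saved password for {ssid} was rejected",
        "SSID, NM auth-failure reason",
        "re-enter the password",
    ),
    "vpn-routed": FailureMessage(
        "VPN-routed",
        "Connected; internet is routed through a VPN ({dev})",
        "default route on a tun/wg device or non-NM DNS owner",
        "check the VPN, not WiFi",
    ),
}


def render_cli(key, **ctx):
    """Render one message for the terminal, filling in the placeholders."""
    msg = FAILURE_MESSAGES[key]
    return f"{msg.text.format(**ctx)}\n  next: {msg.next_action}"


def render_guide_rows():
    """Render every message as an org table row for the user guide."""
    rows = []
    for msg in FAILURE_MESSAGES.values():
        text = msg.text.format(ssid="<SSID>", dev="<dev>")
        rows.append(f"| {msg.failure_class} | {text} | {msg.next_action} |")
    return rows
```

For example, =render_cli("wrong-password", ssid="HotelWiFi")= yields the rejected-password line followed by its next action, while =render_guide_rows()= produces the same wording with the generic =<SSID>= slot for the guide.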
+ +* Enhancement radar + +Low-cost adjacent affordances, each dispositioned so cheap wins aren't lost and +the v1 panel stays focused. (Several are already in v1 by virtue of other +sections; marked here so the consideration is visible.) + +| Enhancement | Disposition | Reason | +|-------------------------------------+-------------+--------------------------------------------------------| +| Open / copy portal URL | v1 | already in the captive flow; trivial Open + Copy | +|-------------------------------------+-------------+--------------------------------------------------------| +| Forget network | v1 | it's the remove op, already specced | +|-------------------------------------+-------------+--------------------------------------------------------| +| Rescan now | v1 | already a Connections control | +|-------------------------------------+-------------+--------------------------------------------------------| +| Retry with hardware MAC | v1 | captive already has --hardware-mac; expose in Repair | +|-------------------------------------+-------------+--------------------------------------------------------| +| Pin speedtest server | v1 | already a config key | +|-------------------------------------+-------------+--------------------------------------------------------| +| Copy redacted doctor report | v1 | cheap, serves the observability/support goal | +|-------------------------------------+-------------+--------------------------------------------------------| +| Show last good network / result | vNext | needs small history persistence | +|-------------------------------------+-------------+--------------------------------------------------------| +| Watch mode for net doctor | vNext | a --watch loop; handy at a TTY, not v1-critical | +|-------------------------------------+-------------+--------------------------------------------------------| +| Actionable desktop notifications | vNext | dunst supports actions; extra wiring | 
+|-------------------------------------+-------------+--------------------------------------------------------| +| Keyboard connection picker (fuzzel) | vNext | the typed-prefix pattern; panel covers v1 | +|-------------------------------------+-------------+--------------------------------------------------------| +| QR-code share / import WiFi | rejected | low value for a personal 2-machine setup; phones do QR | +|-------------------------------------+-------------+--------------------------------------------------------| + +* Waybar wiring + +- Replace =custom/netspeed= with =custom/net= in the bar's module list (same + slot). +- Module def: =exec: waybar-net=, =return-type: json=, =interval: 2=, a =signal= + for on-demand refresh (next free signal after wtimer's 14), =on-click=, + =on-click-right=, =on-click-middle= per the phase-aware interactions (each + dispatches a detached job, never blocks). +- Remove the old =on-click: pypr toggle network= scratchpad only once the panel + replaces it (Phase 2); Phase 1 keeps it as the interim manager. + +* Testing plan (TDD) + +- *Engine (normal)* — fake =nmcli= + =curl= + =speedtest-go= on PATH; assert + command sequences and parsed/emitted JSON for status, list, up/down, + add/edit/remove, probe, diagnose, repair, speedtest. Pure state/format + functions tested directly. JSON schemas locked by example. +- *Portal parser* — already covered in =tests/captive= (Normal/Boundary/Error + + the real SONIFI body). The engine's native probe reuses the same cases. +- *nmcli parsing* — escaped colon/backslash/newline in SSID, duplicate names, + hidden SSID, non-ASCII, wired-mid-session, multi-active (wifi+tether). 
+- *Failure + concurrency (the risky classes)* — slow/hung nmcli/curl/speedtest + (degraded state within budget), concurrent =net status= probe refresh + (single-flight), corrupt cache (recovered), stale cache after SSID change + (invalidated), permission denied / sudo declined, DNS-override cleanup failure + (=cleanup-unverified=), NM partial activation (rollback keeps prior link), + secret redaction, missing speedtest-go, no wifi hardware, rfkill soft/hard + block. +- *Doctor classification* — fixture-driven =net doctor= over fake nmcli/curl + asserting the right terminal classification + that =--fix= stops before + destructive repairs: auth failures (=needs-user-action=), upstream/AP failure + (=upstream-not-local=), VPN-routed failure (=deferred/vpn=), and the DNS classes + (hijack → portal, broken-but-1.1.1.1-works → offered override, egress-blocked → + upstream). Assert the failure-mode coverage table's "detects / repairs / terminal + action" holds for each row. +- *Indicator* — drive =net status --json= through =waybar-net=, assert the JSON + per state (online / captive / no-internet / degraded / wired / disconnected / + rfkill), iface override via env. +- *Panel* — pocketbook-style: backing logic (list ordering, op dispatch, + state-machine transitions), not GTK widgets. +- *NM secrets / no-leak* — add/edit writes the secret into NM via nmcli (asserted + against fake nmcli, never to a tool-owned file); assert no PSK/EAP appears in any + =--json=, log line, or error (there is no credential store to round-trip). +- *Live checklist (gated out of the suite)* — a "Manual testing and validation" + task per phase for the real-network states (captive at a hotel, no-internet, + switch under load, reset, speedtest) that can't be faked. 
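The "fake =nmcli= on PATH" approach named in the engine tests can be sketched with =unittest= alone: write an executable stub into a temp directory, put that directory first on PATH, and assert both the parsed result and the logged argv. Everything here is illustrative: =make_fake=, =active_ssid=, the =FAKE_LOG= variable, and the canned output are hypothetical, and the parser deliberately ignores nmcli's backslash escaping of colons, which the real engine would have to handle.

```python
import os
import subprocess
import sys
import tempfile
import unittest

# Stub body: append the argv to a log file, then print one canned terse row.
STUB = """#!{python}
import os, sys
with open(os.environ["FAKE_LOG"], "a") as log:
    log.write(" ".join(sys.argv[1:]) + "\\n")
print("yes:HotelWiFi")
"""


def make_fake(bin_dir, name):
    """Write an executable stub called `name` into bin_dir."""
    path = os.path.join(bin_dir, name)
    with open(path, "w") as fh:
        fh.write(STUB.format(python=sys.executable))
    os.chmod(path, 0o755)
    return path


def active_ssid(env):
    """Hypothetical engine helper: SSID of the active wifi row, or None."""
    out = subprocess.run(
        ["nmcli", "-t", "-f", "ACTIVE,SSID", "device", "wifi"],
        capture_output=True, text=True, env=env, timeout=5, check=True,
    ).stdout
    for line in out.splitlines():
        active, _, ssid = line.partition(":")  # no escape handling in this sketch
        if active == "yes":
            return ssid
    return None


class FakeNmcliTest(unittest.TestCase):
    def test_active_ssid_uses_fake_and_logs_argv(self):
        with tempfile.TemporaryDirectory() as tmp:
            make_fake(tmp, "nmcli")
            log = os.path.join(tmp, "calls.log")
            env = dict(
                os.environ,
                PATH=tmp + os.pathsep + os.environ.get("PATH", ""),
                FAKE_LOG=log,
            )
            self.assertEqual(active_ssid(env), "HotelWiFi")
            with open(log) as fh:
                self.assertEqual(
                    fh.read().splitlines(),
                    ["-t -f ACTIVE,SSID device wifi"],
                )
```

A stub that sleeps past the =timeout= would exercise the slow/hung-command case the same way, with the test asserting the degraded result instead of an SSID.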
+ +** Harness + coverage gate +The concrete contract, matching the repo's existing convention (not pytest — the +dotfiles suites are =unittest=, run by =make test= as =python3 -m unittest= over +=tests/*/test_*.py=; 33 suites today): +- *Framework*: =unittest=. Each suite is =tests/<name>/test_<name>.py= + (=tests/net/=, =tests/waybar-net/=), collected by the existing =make test= loop + — no new runner, no pytest dependency. +- *Fakes on a temp PATH*: =fake-nmcli=, =fake-curl=, =fake-speedtest-go=, + =fake-rfkill=, =fake-resolvectl= live as executable stubs in =tests/<name>/= + (the =tests/layout-navigate/fake-hyprctl= pattern). A fixture file encodes the + command→canned-output map and the stub appends each invocation to a log the test + asserts against. Subprocess timeouts are simulated by a stub that sleeps past the + budget; =net status= must still return the degraded state. +- *Waybar wrappers end-to-end*: =waybar-net= is run as a subprocess with the fake + PATH and the env overrides (iface, cache path), asserting the emitted JSON — same + as =tests/waybar-netspeed=. +- *Coverage*: coverage.py is absent system-wide (and not importable), so coverage + runs in a throwaway venv (=python3 -m venv=, =pip install coverage=, =coverage + run -m unittest=, =coverage report=) — the method the wtimer suite used (95%). + Target: *branch* coverage over =net/= and the wrapper, ≥ 90% on the pure + classifier/parser modules. + +** Coverage as a gap-finder, not a number (per phase) +Line coverage alone misses the branches that matter here, so each phase ends with +a *coverage-gap pass*, not just a percentage: +- After the first green run, read the branch report and map every uncovered branch + to either a new test or a consciously-excluded live-only behavior (with a comment + or a Manual-testing entry naming it). 
+- *Branch coverage is required* for the pure logic: the doctor classifier (every + outcome — fixable / needs-user-action / upstream-not-local / deferred-vpn), the + cleanup-unverified path, the redaction paths, the degraded hot-path fallback, the + timeout branches, and the portal/nmcli parsers. +- A phase isn't "done" until its coverage-gap pass is recorded — uncovered logic is + either tested or explicitly excused, never silently uncovered. + +* Files touched (planned, all in =~/.dotfiles=) + +- =net/= package (src-layout, like pocketbook) — engine + panel. +- =hyprland/.local/bin/waybar-net= — the indicator (replaces =waybar-netspeed=). +- =hyprland/.local/bin/net= — engine CLI entry (console-script shim). +- =hyprland/.config/waybar/config= — swap =custom/netspeed= → =custom/net=; + remove =custom/airplane=. +- =hyprland/.config/waybar/style.css= — captive / no-internet / degraded / + rfkill classes; remove airplane classes. +- =tests/net/=, =tests/waybar-net/= — suites. +- =captive= — refactor: extract probe + reset into functions callable + non-interactively (a =--json= probe mode) so the engine reuses them. +- =~/.config/net/config= — seed config (probe TTL, speedtest server, redaction + flags). No secrets; not a credential store. +- dotfiles =Makefile= — add the console-recovery targets (=online=, =net-doctor=, + =net-status=, =net-diagnose=, =net-portal=, =net-reset=, =net-bounce=). +- *Deletions once net ships* (the airplane module is absorbed): + =hyprland/.local/bin/waybar-airplane=, =hyprland/.local/bin/airplane-mode=, + =tests/waybar-airplane/=, =tests/airplane-mode/=, and the =custom/airplane= + module + its css. +- archsetup Hyprland step — add =gtk4-layer-shell=, =python-gobject=, + =speedtest-go-bin= to the install lists (the only archsetup change; no =gpg= + added, secrets stay in NM's store). + +* Resolved decisions (Craig's calls + this response) + +1. 
Panel UI tech → GTK4 + gtk4-layer-shell, shared pocketbook scaffold (one + panel shell, reused by the desktop-settings sibling). +2. Engine language → Python =net= package; shells out to =captive= for the + portal-force flow, native cheap probe for the bar path. +3. Connectivity probe → split cadence (fast link poll every 2s + slow cached + internet/captive probe, TTL ~45s) with single-flight + atomic cache. +4. No keyboard-modifier clicks (waybar can't qualify them) — the panel hosts the + rich actions; bar clicks dispatch detached jobs (phase-aware). +5. No separate credential store (Craig's call, cj). Secrets live in NM's own + =system-connections= (root =0600=, inline), touched via nmcli. No GPG, no + gpg-agent, no =~/.config/net/connections=. Supersedes the earlier GPG-store + design. +6. =custom/netspeed= absorbed into =custom/net=; throughput moves to the tooltip. +7. Speed-test backend → =speedtest-go= (already installed), not a new + librespeed-cli dependency; librespeed-cli is the self-hosted fallback. +8. Code lives in the dotfiles repo; archsetup only installs deps. +9. v1 add/edit scope = open + WPA-PSK; enterprise/802.1X is activate-only, + add/edit is vNext (settled by Craig 2026-06-29 — no enterprise networks in his + history, so the form would be unused UI). +10. =net doctor= is in v1 (Craig's call, cj) — a one-shot diagnose+fix mode, + reachable from a TTY via =make online= / =make net-doctor=. (The earlier + "defer the doctor/bundle command" decision is reversed.) +11. Diagnose (read-only) and Repair (mutating, confirmed) are separated in the + panel and the CLI; Repair is tiered lightest-first (rfkill → reset → bounce). +12. =custom/net= absorbs the airplane module (Craig's call, cj); the standalone + airplane module + scripts + tests are deleted once net ships. +13. Repair includes a full-stack bounce and an rfkill-unblock (Craig's calls, + cj) — the latter recovers the framework-laptop post-power-loss soft-block. +14. 
VPN / WireGuard is a planned Phase 5 (Craig's call, cj), not a permanent + exclusion. + +* Implementation phases + +- *Phase 1 — Indicator + console recovery (task #C).* =net status= + =net probe= + (native cheap probe, reusing captive's logic) + the =captive= probe refactor + + =waybar-net= + the split-cadence cache (single-flight, atomic, stale classes) + + CSS states (incl. rfkill) + performance budget. Plus the CLI-only recovery path: + =net repair= tiers (rfkill / reset / bounce), =net doctor [--fix]=, and the + Makefile targets (=make online= etc.) — all testable without the GTK panel. + Absorbs the airplane state and removes the standalone airplane module. Interim + left-click keeps the existing scratchpad until the panel lands. + - *Acceptance*: fresh-login waybar smoke test shows correct state on + online/captive/no-internet/wired/rfkill; =net status= stays within budget + under a fake slow nmcli (degraded state); =net doctor --fix= recovers a + soft-blocked radio from a TTY; the live captive checklist passes at a real + portal; the airplane state works and the old airplane module is gone; + reverting = swap =custom/netspeed= + =custom/airplane= back. +- *Phase 2 — Panel shell + connection management (task #B core).* GTK4 + layer-shell scaffold + =net list/up/down/add/edit/remove/rescan= + MRU list + + mutation safety/rollback + panel state machines. + - *Acceptance*: switch wifi↔wifi and ethernet↔wifi without stranding; a failed + switch leaves the prior link up; add/edit open + WPA-PSK writes the secret to + NM; remove confirms; panel states render for loading/rescan/activation. +- *Phase 3 — Diagnostics + speed test in the panel.* Wire =net diagnose= / + =net repair= / =net doctor= / =net portal= / =net speedtest= into the Diagnose + vs Repair sections; the "Get me online" button; portal Open button; speedtest + progress + cancel. 
+ - *Acceptance*: diagnose runs read-only; each repair tier confirms + verifies + cleanup (DNS override reverts, shown); speedtest result parses from + speedtest-go and a fixture-driven failure shows the right message. +- *Phase 4 — Docs + rollout.* In-app help (=net --help= / per-command help, the + panel help affordance), README/user-guide (commands, panel, config, + troubleshooting, the make targets, rollback), and the manual dep step on ratio. + - *Acceptance*: =net --help= and each subcommand's help are complete; the + user-guide covers every command + the recovery targets; ratio rollout + documented. +- *Phase 5 — VPN / WireGuard (future).* Fold the existing archsetup wireguard + tooling into the same panel + CLI (=net vpn ...=). Out of the v1 milestone; + specced separately when picked up. + +* Open items / risks + +- gtk4-layer-shell dropdown anchoring under a waybar module needs the same + positioning work pocketbook solved; reuse it. (Phase 2.) +- The =captive= refactor must keep the standalone CLI behavior identical while + exposing a non-interactive =--json= probe; covered by the existing + =tests/captive= suite plus new probe-mode tests. (Phase 1.) +- speedtest-go server selection variance (nearest-server flapping) — pin a server in + config if results are noisy. (Phase 3.) +- The background-probe kick from =net status= must be truly non-blocking (spawn + + detach); enforced by the single-flight lock and the performance benchmark test. + +* Rollback + +Each phase is independent. The indicator (Phase 1) is a drop-in replacement for +=custom/netspeed= (and =custom/airplane=); reverting is swapping those modules +back in the config and restoring their scripts. The panel is additive — not +wiring its clicks leaves the bar working as before. No credential store to roll +back (secrets stay in NM throughout). 
+ +* Review findings [31/31] + +** DONE Define the structured diagnostics contract :blocking: +The spec says the engine "emits JSON" and that diagnostics "reuse =captive= +verbatim", but the current =~/.dotfiles/common/.local/bin/captive= flow is a +human-readable bash script that mixes diagnostics, sudo prompts, DNS mutation, +browser launch, and terminal prose. A GTK panel cannot reliably turn that into +clear state, progress, cancellation, or useful error messages. Define the +machine contract before implementation: every diagnostic step should have a +stable id, status (=pending/running/pass/warn/fail/skipped=), redacted evidence, +elapsed time, safety outcome, and next action. Keep =captive= as the interactive +CLI, but either refactor reusable probe/reset functions behind =net diagnose +--json= or make =captive= expose a non-interactive JSON mode. This blocks the +panel and logging work because otherwise the implementer must invent the +boundary. + +Disposition: accept — added the "Diagnostics contract" section (per-step id / +status / evidence / elapsed / safety / next_action) and the =captive= =--json= +probe-mode refactor under Architecture + Files touched. + +** DONE Specify user-facing failure messages and recovery actions :blocking: +The spec names failure states like =no-internet=, =captive=, failed probe, +failed reset, missing DNS, and missing speed-test backend, but it does not define +the messages the user sees or what each message tells them to do next. For this +feature, "error" is not enough: a user needs to know whether WiFi is associated, +whether DHCP succeeded, whether DNS is hijacked/broken, whether HTTP is +intercepted, whether sudo was declined, whether a command timed out, and whether +the system was left unchanged or partially changed. Add a message table for the +indicator, panel, and CLI with: failure class, visible text, evidence included, +redaction rule, and next action. 
This is blocking because UX quality here is the +product, not an implementation detail. + +Disposition: accept — added the "Failure states, messages, recovery" section +covering each class, the visible message, the "what changed" residue note, and +the next action across indicator/panel/CLI. + +** DONE Define the debug log and redacted support bundle :blocking: +There is no observability section. When this fails in a hotel or cafe, an agent +needs enough evidence to diagnose it without rerunning destructive actions. Add +log location, rotation/retention, JSONL event schema, command argv logging, +exit-code/stderr capture, elapsed time, selected iface, NM active connection +UUID, probe URL class, HTTP code, redirect host, DNS servers, and cache +read/write events. Also define a =net doctor --json= or =net debug-bundle= +command that emits redacted status, recent log events, dependency versions, and +a reproduction command. Redact SSID if configured, MAC addresses, portal query +tokens, PSKs, EAP identities/passwords, IPs when requested, and all GPG/NM +secrets. This blocks implementation readiness because post-failure diagnosis is +currently left to ad hoc terminal spelunking. + +Disposition: modify — accepted the JSONL event log, the schema, and the redaction +rules in full (new "Observability" section). Deferred the dedicated =net +debug-bundle= / =net doctor= command to vNext: for a single-user tool =net +diagnose --json= (the snapshot) plus the event log (the history) cover +post-failure diagnosis; a bundle command is gold-plating for v1. Recorded under +Out + Resolved decision 10. + +** DONE Pin the nmcli parsing and timeout contract :blocking: +The spec lists nmcli operations but not the exact fields, output modes, escaping +rules, ID semantics, or timeouts. This is risky because SSIDs and connection +names can contain spaces, colons, duplicates, hidden names, and non-ASCII; the +current =waybar-netspeed= already had an SSID parsing bug. 
The nmcli manual +documents =--terse=, =--get-values=, =--escape=, =--wait=, ID/UUID/path +selection, =passwd-file=, and built-in connectivity states +(=none/portal/limited/full/unknown=) at +https://man.archlinux.org/man/nmcli.1.en. The spec should require UUIDs for +saved-profile operations, explicit =--wait= budgets, parser tests for escaped +colons/backslashes/newlines/duplicate names/hidden SSIDs, and a decision on when +to use or ignore =nmcli networking connectivity [check]=. This is blocking +because the command wrapper is the core reliability boundary. + +Disposition: accept — added the "nmcli contract" section: terse + =--escape= + +=--get-values=, UUID-keyed ops, explicit =--wait= budgets, NM connectivity as a +cheap hint (our probe authoritative), and the parser test matrix. + +** DONE Define cache concurrency, atomicity, and stale-state behavior :blocking: +=net status= may spawn =net probe= whenever the cache is stale, but the spec +does not define locking, process coalescing, atomic writes, crash cleanup, or +what happens when the probe hangs. With a 2s Waybar interval, a bad network could +start overlapping probes, corrupt the runtime cache, or keep showing stale +"online" while the link is gone. Add a single-flight lock under +=$XDG_RUNTIME_DIR/waybar=, atomic write+rename for cache updates, max probe +runtime, stale age classes (fresh/stale/expired/unknown), cache invalidation on +iface/SSID/connection UUID change, and tests for concurrent =net status= calls. +This blocks the fast-path design because it is the main performance and +correctness risk. + +Disposition: accept — added "Concurrency, atomicity, staleness" under the +Connectivity model: flock single-flight, temp+rename atomic write, ≤6s probe +timeout, fresh/stale/expired/unknown classes, iface/SSID/UUID invalidation, stale +lock reclaim, plus concurrency tests in the test plan. 
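The accepted disposition above names three mechanisms: a flock single-flight lock, a temp+rename atomic write, and fresh/stale/expired/unknown age classes. The sketch below shows one way they could fit together. The function names and the age thresholds are assumptions for illustration; the spec names the classes but the numbers here are not from it.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager

FRESH_S = 10   # hypothetical threshold, not from the spec
STALE_S = 60   # hypothetical threshold, not from the spec


@contextmanager
def single_flight(lock_path):
    """Yield True if this caller owns the lock, False if a probe already runs."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            owned = True
        except BlockingIOError:
            owned = False
        yield owned
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def write_cache_atomic(path, payload):
    """Write JSON to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".cache-")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def age_class(written_at, now, fresh_s=FRESH_S, stale_s=STALE_S):
    """Map a cache timestamp to fresh / stale / expired / unknown."""
    if written_at is None or now < written_at:
        return "unknown"
    age = now - written_at
    if age <= fresh_s:
        return "fresh"
    if age <= stale_s:
        return "stale"
    return "expired"
```

A caller that gets =False= from =single_flight= would skip spawning a probe and serve the cached value with its age class. Because the kernel drops a flock when the holding process exits, a crashed probe does not leave the lock held, although the lock file itself remains on disk.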
+ +** DONE Bound hot-path performance with measured budgets :blocking: +The spec says the cheap poll should be sub-100ms, but the proposed fast path +still may call multiple =nmcli= commands every two seconds, read sysfs, parse +throughput, and maybe spawn a background probe. The existing =waybar-netspeed= +had a deliberate sleep for throughput sampling; replacing it must define how +throughput is sampled without sleeping in the bar path. Add a per-command budget +for =waybar-net= and =net status=, a maximum number of subprocesses on the hot +path, a timeout for every subprocess, benchmark tests with fake slow =nmcli=, +and a rule that the indicator emits a degraded JSON state rather than blocking. +This is blocking because Waybar custom modules can visibly freeze or lag when +their exec path stalls. + +Disposition: accept — added the "Performance budgets" section: <100ms typical / +<250ms worst, throughput sampled across the poll interval (no in-process sleep), +one nmcli call max on the hot path, timeouts on every subprocess, the degraded +state, and a fake-slow-nmcli benchmark test. + +** DONE Make click actions non-blocking and visible :blocking: +Waybar right-click runs =net reset= and middle-click runs =net portal= directly. +Those operations can require sudo, open browsers, mutate DNS, delete/recreate NM +profiles, or hang on network commands, but Waybar click handlers provide no +panel, terminal, progress, or cancellation surface by default. Define whether +right/middle click instead opens the panel focused on the action, dispatches a +background job with notifications, or is removed from v1. If kept, specify +single-flight behavior, how sudo/polkit prompts surface, how success/failure is +reported, and how the user can inspect logs. This blocks UX readiness because +the fastest remediation path is currently the easiest place to hide failure. + +Disposition: modify — accepted the concern; made the interactions phase-aware and +non-blocking. 
Every click dispatches a detached, single-flight background job and +reports via =notify=; sudo surfaces through polkit/the normal prompt; failures go +to the notify + the event log. In Phase 1 (no panel) left-click runs probe + +notify and keeps the scratchpad; from Phase 2 left-click opens the panel focused +on the action. Recorded in the Indicator "Interactions" subsection. + +** DONE Specify connection mutation safety and rollback :blocking: +The spec says row click switches connections and remove gets a confirm, but it +does not define what happens when a switch partially succeeds, disconnects the +current working link, needs a password, loses the default route, or triggers +auto-activation. The nmcli manual warns that =connection down= does not prevent +future auto-activation and may internally block a profile until user action. +Define preflight, the exact NM command sequence, whether the old active +connection is kept until the new one proves usable, when rollback is attempted, +how long activation waits, and what the panel says when rollback fails. This is +blocking because the module can strand the user offline. + +Disposition: accept — added "Mutation safety + rollback" under Connection +management: keep the prior link up until the target activates (=--wait 30=), no +teardown on failure, password-required surfaced not stranded, =net down= reports +post-op active state + the auto-reactivation caveat, and the pinned NM command +sequence is tested against fake nmcli. + +** DONE Define the credential-store security model :blocking: +The GPG store is described as optional and default-unencrypted, but the spec does +not define file modes, schema, secret-source rules, import/export prompts, +recipient verification, stale secret handling, or what is logged. It also says +NM remains source of truth while the user-owned store contains PSK/EAP secrets, +which creates two truth sources for sensitive data. 
Add a precise schema, +=0600= file creation with parent-dir permissions, encrypted-recipient checks, +plaintext warning text, explicit opt-in flow, redaction requirements, behavior +when NM has a secret not in the store, behavior when the store has a secret NM +rejects, and tests for no secret leakage in JSON/logs/errors. This blocks Phase +4 and the full spec because otherwise the implementer must make security +decisions mid-code. + +Disposition: accept — rewrote "Credential storage" with the versioned schema, +=0600= file / =0700= dir, recipient verification on opt-in, the plaintext +warning, secret-source rule (entered/exported, never harvested from root store), +the two-source reconciliation policy (NM wins live, store wins for what NM +lacks, stale-secret flagging), and the no-leak tests. + +** DONE Define EAP, enterprise WiFi, and unsupported connection behavior :blocking: +The store says "PSK/EAP" and connection management says add/edit, but there is +no v1 contract for WPA-Enterprise fields, certificates, identity vs anonymous +identity, hidden networks, static IP, proxy settings, metered flags, MAC +randomization, or 802.1X prompt behavior. Either scope v1 to open/WPA-PSK plus +existing saved-profile activation, or define the minimum EAP form and the +unsupported-state messages. This blocks add/edit/import because enterprise WiFi +is too sensitive to hand-wave. + +Disposition: modify (scope) — scoped v1 to open + WPA-PSK add/edit, with +*activation* of any existing saved profile (including enterprise). Enterprise / +802.1X add/edit, static-IP, proxy, metered, and MAC-randomization editing are +vNext, shown as "edit via nmtui/nmcli". Recorded in Scope/Out, Connection +management, and Resolved decision 9. 
+ +** DONE Split read-only diagnostics from mutating remediation :blocking: +The panel's diagnostics section includes probe, bounce/reset, gateway ping, and +DNS override test in one area, while =captive= currently performs resets and +temporary DNS changes as part of its flow. Users need to know which buttons are +read-only and which mutate NM profiles, MAC mode, DNS, or browser state. Add +separate "Diagnose" and "Repair" actions, confirmations for destructive or +privacy-changing operations, explicit cleanup verification for DNS override, and +a terminal state when cleanup is unverified. This blocks readiness because +network repair must not surprise the user or leave hidden residue. + +Disposition: accept — split the panel into a read-only Diagnose section and a +confirmed, mutating Repair section (and split the CLI into =net diagnose= vs =net +repair=). Added =cleanup_verified= + a terminal =cleanup-unverified= state to the +diagnostics contract. + +** DONE Define panel state, cancellation, and permissions UX :blocking: +The panel sections list buttons and a streaming output area, but not loading +states, disabled states, empty states, keyboard/focus behavior, cancellation, or +permission-denied handling. Add panel state machines for connection list loading, +rescan in progress, activation in progress, diagnostics running, speedtest +running, and no NetworkManager/no WiFi/no permissions/no GPG key/no +librespeed-cli. Each long operation should be cancellable where possible or +clearly non-cancellable with an elapsed-time display. This blocks the GTK work +because without it the implementer must invent the user flow. 
+ +Disposition: modify — accepted the state-machine requirement (added "Panel state, +cancellation, permissions"), but scoped the state set to what can actually occur +on the two-machine fleet: dropped "no NetworkManager" as a modeled state (NM is +always present; a missing nmcli is a single hard-error exit) and kept +no-wifi-hardware, missing speedtest-go, no-GPG-key, plus the in-progress states +with elapsed-time + cancellation where the op allows. + +** DONE Verify speed-test dependency, server choice, and failure contract :blocking: +The spec chooses =librespeed-cli= and notes availability/default-server research +as an open risk, but Phase 3 still depends on parsing its JSON and showing +progress. I checked the upstream project page +(https://github.com/librespeed/speedtest-cli) and the AUR URL named by search is +not sufficient as a verified package/install contract in this spec. Add the +exact package name/source to install, command version expected, JSON shape, +server-selection policy, timeout, cancellation behavior, offline/rate-limited +messages, and tests with fixture JSON and fixture stderr. This blocks Phase 3 +because speed-test failure modes are otherwise undefined. + +Disposition: modify — verified live and changed the backend: =speedtest-go= (AUR +=speedtest-go-bin=, 1.x) is already installed on velox and supports =--json=, +=--server=, =--no-download/--no-upload=, so v1 needs no new dependency. +librespeed-cli (AUR =librespeed-cli= / =-bin=) is the documented self-hosted +fallback. Added the "Speed test" section with server policy, timeout, +cancellation, the failure-message mapping, and fixture-JSON/stderr tests. + +** DONE Define dependency installation and repo boundaries :blocking: +The files touched section alternates between archsetup paths and the external +dotfiles repo, while pocketbook has been folded into this repo and its previous +archsetup provisioning was intentionally removed. 
The spec should state where +the =net= package actually lives, which repository owns the scripts/tests, +whether =gtk4-layer-shell=, =python-gobject=, =librespeed-cli=, =gpg=, =nmcli=, +=curl=, and =resolvectl= are installed by archsetup or assumed present, and the +Makefile targets for test/lint/install. This blocks implementation because the +current path plan can produce code that is not installed on a fresh machine. + +Disposition: accept — added the "Repository + dependencies" section: all code in +=~/.dotfiles= (=net/= package in-tree like pocketbook, scripts in the hyprland +tier, tests under =tests/=), archsetup owns only the dep install +(=gtk4-layer-shell=, =python-gobject=, =speedtest-go-bin=; nmcli/curl/resolvectl +already present), Makefile =make test= collects the package suite, and a +daily-drivers note for ratio. Rewrote Files touched to match. + +** DONE Expand the test plan for failure, concurrency, and live verification :blocking: +The testing plan covers normal parsing and fake command sequences, but it misses +the riskiest behaviors: slow/hung =nmcli=/=curl=/=librespeed=, concurrent +=net status= cache refresh, corrupt cache, stale cache after SSID change, +permission denied, sudo declined, DNS override cleanup failure, NM partial +activation, duplicate connection names, secret redaction, missing optional +dependencies, no WiFi hardware, wired+tether+WiFi ambiguity, portal redirect +tokens, and Waybar click handlers. Add unit/fixture tests for each class plus a +manual/live checklist gated out of the normal suite. This is blocking because +the current plan would leave the exact "things that can go wrong here" mostly +untested. + +Disposition: accept — rewrote the Testing plan with the "Failure + concurrency" +class (slow/hung commands, single-flight, corrupt/stale cache, perm-denied, +cleanup-failure, partial activation, redaction, missing deps, no-wifi, +multi-active) and a per-phase live checklist gated out of the suite. 
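The "slow/hung commands" class in the test plan above can be exercised with a fake binary placed first on PATH. The sketch below is one possible shape, assuming a POSIX =/bin/sh=. The helper names and the returned dictionary are hypothetical, and the fake script stands in for =nmcli= without reproducing its real output.

```python
import os
import subprocess
import tempfile


def make_fake(bin_dir, name, body):
    """Write an executable shell stub called `name` into bin_dir."""
    path = os.path.join(bin_dir, name)
    with open(path, "w") as fh:
        fh.write("#!/bin/sh\n" + body + "\n")
    os.chmod(path, 0o755)
    return path


def run_with_budget(argv, bin_dir, timeout_s):
    """Run argv with bin_dir first on PATH; report degraded instead of blocking."""
    env = dict(os.environ)
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    try:
        done = subprocess.run(
            argv, env=env, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # Hypothetical result shape: the caller renders a degraded state.
        return {"state": "degraded", "reason": "timeout"}
    return {"state": "ok", "rc": done.returncode, "stdout": done.stdout}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as bin_dir:
        # `exec` replaces the shell, so the timeout kill reaches the sleep itself.
        make_fake(bin_dir, "nmcli", "exec sleep 5")
        print(run_with_budget(["nmcli", "-t", "device"], bin_dir, timeout_s=0.3))
```

The same stub directory can hold fakes for the other commands the plan lists, so each failure class gets its own small script.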
+ +** DONE Define status JSON schemas and compatibility rules +The spec says all subcommands take =--json= but does not define schemas. Add +versioned JSON examples for =status=, =probe=, =list=, =diagnose=, =speedtest=, +and error envelopes, including nullable fields and unknown/degraded states. This +is non-blocking for product direction but should be fixed before code so tests +can lock the CLI contract. + +Disposition: accept — added the "JSON schemas" section with versioned (=v:1=) +envelopes for status / probe / list / diagnose / speedtest and a shared error +envelope, including the degraded/unknown states. + +** DONE Rename or alias the phasing section for workflow compatibility +The spec has a usable =Phasing= section, but the spec-review workflow expects an +=Implementation phases= section that can be lifted into =todo.org=. Rename it or +add an alias heading during response. This is non-blocking because the existing +phase decomposition is understandable, but aligning the heading prevents future +workflow friction. + +Disposition: accept — renamed =Phasing= → =Implementation phases= and added +per-phase acceptance criteria. + +** DONE Add documentation and rollout acceptance checks +Rollback is described, but docs and rollout are thin. Add README/user-guide +updates for commands, panel behavior, config file, GPG opt-in, troubleshooting, +and rollback; add acceptance checks for each phase, including a fresh-login +Waybar smoke test and restoring =custom/netspeed=. This is non-blocking but +important for handing the feature to a future session without re-discovery. + +Disposition: accept — added per-phase acceptance criteria under Implementation +phases (incl. the fresh-login waybar smoke test and the =custom/netspeed= +restore), a Phase 4 "Docs + rollout", and (answering Craig's cj follow-up) a +dedicated "Help + documentation" section with the three help layers (CLI help, +panel help affordance, user guide). 
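The JSON schemas finding above settles on versioned (=v:1=) envelopes plus a shared error envelope. A minimal sketch of what such envelopes might look like is below. Only the =v= key and its value come from the spec; the other field names (=kind=, =data=, =error=, =code=, =message=, =detail=) and the helper names are assumptions for illustration.

```python
import json

SCHEMA_VERSION = 1  # matches the v:1 envelopes named in the disposition


def envelope(kind, data):
    """Wrap a successful result; `kind` is e.g. "status" or "probe"."""
    return {"v": SCHEMA_VERSION, "kind": kind, "data": data, "error": None}


def error_envelope(kind, code, message, detail=None):
    """Wrap a failure in the same outer shape, with data left null."""
    return {
        "v": SCHEMA_VERSION,
        "kind": kind,
        "data": None,
        "error": {"code": code, "message": message, "detail": detail},
    }


def check_version(doc):
    """Reject documents whose version this reader does not understand."""
    if doc.get("v") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema version: {doc.get('v')!r}")
    return doc


if __name__ == "__main__":
    print(json.dumps(envelope("status", {"state": "degraded", "ssid": None})))
    print(json.dumps(error_envelope("probe", "timeout", "probe timed out")))
```

Keeping the outer shape identical for success and failure lets a test lock the contract with one set of key assertions per subcommand.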
+ +** DONE Add a failure-mode coverage table :blocking: +The spec now names many individual network failures, but it still does not carry +one compact coverage matrix that says, for each common failure mode, whether +=net diagnose= detects it, whether =net doctor --fix= can repair it, and what +terminal user action remains when it cannot. Add a table covering at least: +rfkill soft block, rfkill hard block, no WiFi hardware, associated/no DHCP, +gateway unreachable, captive DNS hijack, broken DNS where 1.1.1.1 works, HTTP +portal, HTTP interception without a parseable portal URL, upstream/AP outage, +wrong WPA password or missing secret, enterprise auth/cert failure, duplicate +SSID/connection-name ambiguity, hidden SSID, multiple active links, wedged +NetworkManager, slow/hung command, stale/corrupt cache, DNS cleanup failure, +missing speedtest backend, and VPN/routing interference. This blocks because +Craig asked for confidence that the diagnostics and doctor cover the real field +failures, and prose scattered across sections is too easy to misread. + +Disposition: accept — added the "Failure-mode coverage" section: a 22-row table +(every mode the finding named) with detect / doctor-fix / terminal-action +columns, conformed to the org-table standard (rules under every row, ≤120). + +** DONE Pin DNS repair semantics in doctor :blocking: +The spec diagnoses DNS hijack, broken hotel DNS, and the temporary 1.1.1.1 +override test, but =net doctor --fix= does not say whether it merely recommends +the override, applies a temporary override during recovery, or leaves DNS alone +after diagnosis. Define the exact behavior for each DNS class: captive hijack +should open the portal, broken DNS where 1.1.1.1 works should either offer an +explicit temporary repair with cleanup verification or recommend the command, +and port-53/egress blocking should stop as upstream/not locally fixable. 
This is +blocking because DNS is one of the most common "connected but unusable" failures +and the current doctor contract is ambiguous. + +Disposition: accept — added "DNS handling in doctor (explicit per class)" under +the new Doctor section: hijack → open portal (no DNS mutation); broken-but-1.1.1.1 +→ explicit temporary override with cleanup verification under =--fix=, recommend +otherwise; egress-blocked → terminal =upstream-not-local=. + +** DONE Make auth failures terminal user-action states :blocking: +Wrong WPA password, missing NM secret, locked keyring/polkit denial, enterprise +802.1X certificate/identity failure, and portal login-required are not fixed by +resetting or bouncing NetworkManager. The doctor sequence should classify these +as =needs-user-action= terminal states, stop before looping through destructive +repairs, and tell the user the exact next action (enter password, edit profile in +=nmtui=/=nmcli=, accept portal terms, provide cert/identity, or retry with +admin auth). This blocks because repeated reset/bounce against auth failures is +slow, noisy, and can make the network state worse without helping. + +Disposition: accept — added the =needs-user-action= terminal outcome to the +Doctor section: wrong password / missing secret / keyring-or-polkit denial / +802.1X cert-or-identity failure / portal-login-required all stop the doctor before +any destructive repair and name the exact next step. + +** DONE Define upstream/AP/provider failure terminal states :blocking: +Some failures are not client-repairable: AP has no uplink, hotel gateway is +down, DHCP server is broken, gateway drops traffic, ISP outage, or captive +portal backend is failing. The spec should define how =diagnose= proves "local +link is up but upstream is broken" and how =doctor --fix= stops after local +repairs are exhausted with a clear message like "local repairs tried; likely +upstream/AP/provider" plus the evidence. 
This blocks because users need to know +when to stop poking the laptop and switch networks or contact the venue. + +Disposition: accept — added the =upstream-not-local= terminal outcome: diagnose +proves link-up + IP + gateway-reachable but no route out and no captive redirect; +=doctor --fix= stops after local repairs with "local repairs tried; likely +upstream/AP/provider" + evidence → switch network / contact venue. + +** DONE Decide how VPN and policy routing affect v1 diagnosis +VPN/WireGuard management is Phase 5, but active VPNs, policy routes, DNS +overrides, and firewall killswitches can break apparent internet access in v1. +The current spec does not say whether v1 detects active VPN/policy routing and +classifies "network is fine, VPN route/DNS is broken" separately from WiFi +failure. Add either a v1 diagnostic check for active VPN/default-route/DNS +ownership with a "deferred repair" outcome, or explicitly state that VPN-routed +failures are out of scope and may be misclassified. This is blocking if Craig +expects the module to diagnose normal daily-driver network failures while VPN +tooling remains separate. + +Disposition: accept (chose the detect-and-classify option) — v1 detects an active +VPN / non-NM default route / non-NM DNS owner and classifies =deferred/vpn= ("link +is fine; internet is VPN-routed"), distinct from a WiFi failure. v1 does not +repair it (VPN management is Phase 5); it names the VPN as the likely owner and +stops. Added to the Doctor section + the coverage table + a doctor-classification +test. + +** DONE Remove stale GPG-store references from the resolved spec +The spec now decides "no separate credential store; secrets live in +NetworkManager", but the Testing plan still mentions =gpg round-trip= and =GPG +store= tests, and the panel-state list still mentions a no-GPG-key state. Remove +those stale references and replace them with NM-secret/no-secret-leak tests. 
+This is non-blocking for product behavior but blocking for implementation +clarity: otherwise tests will be written for a credential store that no longer +exists. + +Disposition: accept — replaced the Testing-plan =gpg round-trip= / =GPG store= +bullets with an "NM secrets / no-leak" test (add/edit writes the secret via nmcli; +assert no PSK/EAP in any JSON/log/error; no store to round-trip) and dropped the +=no-GPG-key= panel state. Residue from the cj-comment pass that dropped the store. + +** DONE Reconcile status, goal, and task text before implementation :blocking: +The spec status says "Implementation-ready with caveats" and "Phase 1 ready to +build", but the body still has an unresolved enterprise add/edit VERIFY, the +Goal still says "optional GPG-encrypted secret store", and the unified task title +still names "GPG-stored secrets" even though the accepted design removed the +store. Before implementation, make the top-level status, goal, scope, task +mapping, and resolved decisions agree with the current design. This blocks +readiness because a developer starting from the top of the file would still build +or plan around abandoned GPG-store behavior. + +Disposition: accept — fixed the Goal ("secrets stay in NM's own store"), the +=[#B]= task-mapping line (notes the "GPG-stored secrets" framing is superseded by +decision 5), the enterprise VERIFY (now resolved → Status updated), and corrected +the stale =pytest= mentions to =unittest= (the repo's actual harness). Top-of-file +status/goal/scope/decisions now agree with the design. + +** DONE Resolve enterprise add/edit scope or make the caveat explicit :blocking: +The spec still says "One open question for Craig: pull enterprise add/edit into +v1?" and points to a VERIFY in =todo.org=. 
That is a real product-scope decision: +if enterprise add/edit is in v1, panel forms, nmcli command sequences, tests, +error messages, and docs change materially; if it is out, the UI must consistently +show activate-only with "edit in nmtui/nmcli". Decide it in the spec before +implementation, or downgrade the status to =Ready with caveats= with this exact +accepted caveat. As written, the spec cannot be plain =Ready=. + +Disposition: accept — Craig decided (2026-06-29): enterprise add/edit is vNext, +activate-only in v1. Settled in the Status line, the Scope/Out bullet, decision 9, +and the VERIFY (now DONE in todo.org). The UI shows activate-only with "edit in +nmtui/nmcli" consistently. Evidence: 24 saved profiles, 0 enterprise. + +** DONE Define the concrete test harness and coverage gate :blocking: +The spec says TDD, fake binaries on PATH, and benchmark tests, but it does not +define the actual harness contract: pytest vs unittest for the =net= package, +where fake =nmcli=/=curl=/=speedtest-go=/=rfkill=/=resolvectl= live, how test +fixtures encode command histories, how subprocess timeouts are simulated, how +Waybar scripts are executed end-to-end, and how coverage is run. Add the exact +Makefile targets (=test=, =test-unit= or package-local =pytest=), pytest config, +coverage command (e.g. branch coverage over =net/= and =waybar-net= wrappers), +minimum threshold, and the rule for reading the coverage report to add missing +tests before declaring a phase done. This blocks readiness because "what is the +test harness?" is still answerable only by analogy to older suites. + +Disposition: accept — added the "Harness + coverage gate" section. Corrected the +premise: the repo is =unittest= (=make test= → =python3 -m unittest=, 33 suites), +not pytest. 
Pinned the fake-binary stub convention (=tests/<name>/fake-*= on a +temp PATH), the fixture command→output map, timeout simulation, the end-to-end +=waybar-net= subprocess run, and coverage via a throwaway venv (coverage.py is +absent system-wide) with a ≥90% branch target on the pure modules. + +** DONE Use coverage to find missing behavior, not just report a percentage :blocking: +The spec does not say how coverage findings affect implementation. For this +feature, line coverage alone can miss the important holes: doctor classification +branches, cleanup-unverified paths, redaction paths, degraded hot-path fallbacks, +timeout branches, and auth/upstream/VPN terminal states. Define coverage review +criteria per phase: branch coverage for pure classifiers and parsers, named +untested branches allowed only with comments or manual-check entries, and a +required "coverage gap pass" after the first green test run that maps uncovered +logic back to tests or consciously excluded live-only behavior. This blocks +readiness because the current test plan is broad but does not force the suite to +expose missing edge tests. + +Disposition: accept — added the "Coverage as a gap-finder, not a number (per +phase)" subsection: branch coverage required for the doctor classifier (every +outcome), cleanup-unverified, redaction, degraded-fallback, timeout, and the +parsers; a mandatory coverage-gap pass after the first green run mapping each +uncovered branch to a test or a named live-only exclusion; a phase isn't done +until that pass is recorded. + +** DONE Convert error classes into exact user-facing strings and evidence fields :blocking: +The failure table and doctor outcomes classify errors well, but many messages +are still templates or descriptions rather than final text. 
Add exact strings +for indicator tooltip, notification, CLI stderr, JSON =error.message=, and panel +banner/step text for every failure-mode row, including cases doctor cannot fix: +wrong password, missing secret, enterprise cert failure, upstream/AP/provider +failure, VPN-routed failure, hard rfkill block, DNS cleanup failure, speedtest +missing, and HTTP interception without parseable URL. For each string, specify +the redacted evidence included and the next action. This blocks UX readiness +because "useful error" is only testable once the actual text and evidence are +defined. + +Disposition: accept — rewrote the Failure states section: each row now carries the +exact final string (with =<placeholder>= evidence), the evidence field, and the +next action, plus a per-surface rendering rule (indicator tooltip / notify / +CLI+JSON error.message+detail+code / panel banner all render the one canonical +string). Added the missing doctor-unfixable rows: hard rfkill, wrong password / +missing secret, enterprise cert failure, upstream/AP/provider, VPN-routed, HTTP +interception without a parseable URL, and DNS cleanup-unverified. + +** DONE Add an enhancement disposition table +The spec captures several good enhancements (doctor, Makefile recovery, rfkill, +airplane absorption, VPN phase), but it does not show that low-cost adjacent +enhancements were considered and accepted/deferred/rejected. Add a small radar +table for likely affordances: copy redacted doctor report, open/copy portal URL, +retry with hardware MAC, forget network, rescan now, pin speedtest server, show +last good network/result, watch mode for =net doctor=, desktop notification +actions, QR-code/share WiFi import/export, and keyboard picker. Mark each +=v1=, =vNext=, or =rejected= with a one-line reason. This is non-blocking, but it +prevents accidental loss of cheap UX wins and keeps the v1 panel focused. 
+ +Disposition: accept — added the "Enhancement radar" table dispositioning all the +named affordances: open/copy portal URL, forget network, rescan, hardware-MAC +retry, pin speedtest server, copy redacted doctor report = v1; last-good +network/result, doctor watch mode, actionable notifications, keyboard picker = +vNext; QR-share = rejected (low value for a 2-machine personal setup). + +** DONE Tighten the panel UX flow before Phase 2 +The panel has sections and state machines, but not a concrete interaction flow: +default focused section, row content, primary/secondary buttons, disabled-state +rules, confirmation wording for reset/bounce/DNS override, how "Get me online" +reports each escalation, what stays visible after the panel closes, and keyboard +navigation. Add a short UX flow spec or wire-level outline before Phase 2. This +is non-blocking for Phase 1, but it blocks Phase 2 implementation because a GTK +panel can easily become noisy or surprising if these defaults are invented while +coding. + +Disposition: accept — added the "Panel UX flow (settle before Phase 2)" +subsection: default focus (Connections, or Diagnose when opened from a captive +state), row content, one primary button per section, disabled-state rules, exact +confirmation wording for reset/bounce/DNS-override/remove, the live "Get me +online" escalation reporting, what survives panel close, and keyboard nav. + +* Review and iteration history + +** 2026-06-29 Mon @ 17:00:39 -0400 — Codex — reviewer + +- *What changed or was recommended:* Rubric: =Not ready=. Applied the + spec-review workflow and added blocking findings for diagnostics structure, + user-facing errors, observability, nmcli contracts, cache concurrency, + hot-path performance, Waybar click actions, mutation rollback, credential + security, unsupported WiFi types, panel states, speed-test dependency + verification, install boundaries, and test coverage. 
+- *Why:* The spec has the right broad shape, but this feature combines a status + indicator, NetworkManager mutation, sudo-backed remediation, captive-portal + probing, secret storage, and GTK UI. Too many failure modes were still left for + implementers to invent during coding. +- *Artifacts:* Findings recorded in =Review findings=. Local code read included + =~/.dotfiles/hyprland/.local/bin/waybar-netspeed=, + =~/.dotfiles/common/.local/bin/captive=, + =~/.dotfiles/tests/waybar-netspeed/test_waybar_netspeed.py=, + =~/.dotfiles/tests/captive/test_captive.py=, pocketbook's GTK layer-shell + scaffold, current Waybar config, and the linked =todo.org= tasks. External + checks included the Arch nmcli man page and upstream LibreSpeed CLI page. + +** 2026-06-29 Mon @ 17:35:00 -0400 — Claude Code (archsetup) — responder + +- *What changed:* Processed all 18 findings — 13 accepted, 5 modified, 0 + rejected. Accepts added new sections: Repository + dependencies, Performance + budgets, nmcli contract, JSON schemas, Diagnostics contract (+ Diagnose vs + Repair), Failure states/messages/recovery, Observability, cache + concurrency/atomicity/staleness, mutation safety + rollback, the rewritten + credential security model, the expanded test plan, the renamed + Implementation phases with per-phase acceptance criteria, and a Status line. + Modifies: deferred the =net debug-bundle= command (log + =diagnose --json= + cover it), made click actions phase-aware + non-blocking, scoped enterprise + WiFi to activate-only (add/edit vNext), trimmed the panel state set to states + that occur on the two-machine fleet, and switched the speed-test backend to the + already-installed =speedtest-go= (verified live) instead of a new + librespeed-cli dependency.
+- *Why:* Codex's review was strong and mostly right; the modifies are scope calls + for a single-user, two-machine tool (avoid gold-plating, use installed deps, + don't model impossible states) and one phasing reconciliation (the panel is + Phase 2, so Phase-1 clicks can't open it). +- *Artifacts:* All 18 findings completed =DONE= in place with dispositions. Live + check: =pacman -Qq | grep speed= (speedtest-go-bin installed; =speedtest-go + --json= confirmed). Status now "review incorporated (Codex, 2026-06-29)", + implementation-ready with the named Phase-2/3 caveats; Phase 1 ready to build. + +** 2026-06-29 Mon @ 17:37:58 -0400 — Claude Code (archsetup) — responder (cj comments) + +- *What changed:* Folded in Craig's 10 cj comments on the spec. Design changes: + (1) dropped the separate credential store entirely — secrets stay in NM's own + =system-connections= (root =0600=, inline), touched via nmcli; no GPG, no + gpg-agent (rewrote the secrets section, decision 5, dropped Phase 4 + the gpg + dep). (2) Added =net doctor [--fix]= + Makefile console-recovery targets + (=make online= etc.) as a first-class TTY path; reversed the earlier + defer-the-doctor call (decision 10). (3) Added a full-stack =bounce= repair and + an =rfkill= unblock repair + indicator state — the rfkill one recovers the + framework-laptop post-power-loss soft-block Craig hit. (4) =custom/net= absorbs + the airplane module; the standalone airplane scripts/tests/module are deleted on + ship. (5) Moved VPN/WireGuard from "out" to a planned Phase 5. (6) Added a + "Help + documentation" section (CLI help / panel help / user guide). Answered + the enterprise-defer rationale and the captive-auto-login explanation inline. +- *Why:* Craig's comments simplify (NM-only secrets, fewer deps) and harden the + recovery story (doctor + make targets + rfkill/bounce reachable from a dead-GUI + console — his stated need). Enterprise add/edit in v1 is the one open call, + raised as a VERIFY. 
+- *Artifacts:* All 10 cj blocks removed. Live checks: airplane scripts/tests + present (confirmed deletion targets); =/etc/NetworkManager/system-connections/= + is root =0600= with inline secrets (confirms the NM-only secret model); + =rfkill= present. VERIFY filed in =todo.org= for the enterprise-scope decision. + +** 2026-06-29 Mon @ 17:49:24 -0400 — Codex — reviewer + +- *What changed or was recommended:* Added a second feedback batch from the + common network failure-mode review: require a diagnose/doctor coverage table, + pin DNS repair semantics, make auth failures terminal user-action states, + define upstream/AP/provider terminal states, decide how active VPN/policy + routing affects v1 diagnosis, and remove stale GPG-store references. +- *Why:* The design now has a solid recovery shape, but the field question is + sharper: for each common failure, does =diagnose= catch it, does =doctor --fix= + fix it, or does the user need to switch networks/provide credentials/contact + the venue? That mapping needs to live in the spec, not in chat. +- *Artifacts:* Findings appended under =Review findings= after reviewing the + current spec's Diagnostics, Failure states, Doctor, Testing plan, and + responder history. + +** 2026-06-29 Mon @ 18:06:22 -0400 — Codex — reviewer + +- *What changed or was recommended:* Added a readiness pass focused on + development-time questions, missed enhancements, UX cleanliness, error wording, + and test/coverage mechanics. New findings require reconciling stale status/goal + text, resolving or explicitly caveating enterprise add/edit, defining the + concrete harness and coverage gate, making coverage actionable, turning failure + classes into exact user-facing strings, adding an enhancement disposition + table, and tightening the Phase-2 panel UX flow. 
+- *Why:* The network failure model is now strong, but implementation would still + force developers to invent test infrastructure, coverage expectations, final + error copy, and several UX details. The spec should decide those up front or + explicitly mark them as caveats. +- *Artifacts:* Findings appended under =Review findings= after reading the + updated spec, current archsetup Makefile test targets, pocketbook's pyproject, + and the linked =todo.org= enterprise VERIFY. + +** 2026-06-29 Mon @ 17:55:08 -0400 — Claude Code (archsetup) — responder (2nd review) + +- *What changed:* Processed all 6 second-batch findings — 6 accepted, 0 modified, + 0 rejected. Added a "Doctor: escalation, classification, terminal states" + section with four outcomes (=fixable= escalating lightest-first; + =needs-user-action=, =upstream-not-local=, =deferred/vpn= terminal) and explicit per-class + DNS handling. Added a "Failure-mode coverage" table (22 rows: detect / doctor-fix + / terminal action, org-table-standard conformant, ≤120). Cleaned the last + GPG-store residue (Testing plan → NM-secret/no-leak test; dropped the no-GPG-key + panel state). Added a doctor-classification test class. +- *Why:* The findings sharpen the doctor from "escalate repairs" to "classify, + then stop at the right terminal state" — the safety property that keeps + =doctor --fix= from looping destructive repairs against a wrong password or an + upstream outage. The VPN finding got the detect-and-classify option (not repair; + that's Phase 5) so v1 doesn't misread a VPN-routed failure as WiFi down. The + coverage table is the single artifact Craig asked for to trust the field + coverage. +- *Artifacts:* All 6 findings =DONE= in place; cookie =[24/24]=. Table conformed + via =wrap-org-table.el= (max line 120). No new findings introduced; no scope + expansion past what the findings asked. Spec remains implementation-ready, with the + enterprise add/edit VERIFY still the one open decision.
+ +** 2026-06-29 Mon @ 18:24:51 -0400 — Claude Code (archsetup) — responder (3rd review) + +- *What changed:* Processed the third review (7 findings). The enterprise-scope + finding closed first on Craig's call (vNext, activate-only). The other 6, all + accepted: (1) reconciled the top-of-file text — fixed the Goal's GPG-store + wording, the =[#B]= task-mapping line, the resolved enterprise VERIFY, and + corrected the stale =pytest= mentions to =unittest= (the repo's real harness). + (2) Added a "Harness + coverage gate" section (unittest, fake-binary stubs on a + temp PATH, venv coverage, ≥90% branch on pure modules). (3) Added a per-phase + "coverage as a gap-finder" pass. (4) Rewrote the Failure states section to exact + final strings + evidence fields + a per-surface rendering rule, and added the + missing doctor-unfixable rows. (5) Added the "Enhancement radar" table + (v1/vNext/rejected). (6) Added the "Panel UX flow" subsection. +- *Why:* The findings close the gap between "design decided" and "a developer can + start": the harness/coverage contract, the exact UX strings, and the panel flow + are the things otherwise invented mid-code. The =pytest=→=unittest= correction + was a real defect — the spec contradicted the repo's actual test convention. +- *Artifacts:* All 31 findings =DONE=; cookie =[31/31]=. Both new tables conformed + via =wrap-org-table.el= (coverage 120, radar 110). Harness verified against the + live repo (33 unittest suites, =make test=, coverage.py absent → venv). Status + raised to "Ready for Phase 1; Ready-with-caveats overall" — no open decisions + remain. 
diff --git a/docs/design/2026-06-29-waybar-timer-module-spec.org b/docs/design/2026-06-29-waybar-timer-module-spec.org new file mode 100644 index 0000000..4b0ed0e --- /dev/null +++ b/docs/design/2026-06-29-waybar-timer-module-spec.org @@ -0,0 +1,217 @@ +#+TITLE: Waybar Timer Module (wtimer) — Design Spec +#+AUTHOR: Craig Jennings & Claude +#+DATE: 2026-06-29 + +* Goal + +One always-visible waybar module that keeps time four ways — countdown timer, +wall-clock alarm, count-up stopwatch, and pomodoro — with several items running +at once. The bar shows the most urgent item with a per-type glyph; the tooltip +lists them all. Backed by a single =wtimer= script over a small JSON state file. +notify fires on completion. fuzzel drives creation. No GTK app. + +Source task: archsetup =todo.org= "Waybar timer module" (=:waybar:=), including +the folded roam-capture scope expansion (mode-selectable single panel, +stopwatch, multiple simultaneous, per-mode hover text). + +* Scope + +** In +- *Timer* — count down a duration, notify on elapse, then remove. +- *Alarm* — fire at a wall-clock time, notify, then remove. +- *Stopwatch* — count up from start; pause/resume; manual stop. +- *Pomodoro* — work/break cycles (25/5, long break 15 after 4 works), auto-advance with a notify at each phase change, runs until cancelled. +- *Multiple simultaneous* — N items of any mix held in state. Bar shows one primary item plus a =+N= badge; tooltip lists every item with its remaining/elapsed and label. +- *Pause / resume* per item; *cancel* one or all. +- *Interactions* — click to create (fuzzel), middle-click pause/resume primary, right-click cancel (fuzzel pick), scroll to cycle which item is primary. +- *Per-type glyph + CSS state classes* (running / paused / urgent / break). +- *Persistence across waybar restarts* (state file in the runtime dir). + +** Out (v1, note for later) +- No GTK panel — waybar module + tooltip + fuzzel only. 
+- No persistence across *reboot* (runtime-dir state clears). Alarms set before a reboot won't survive. Acceptable v1; revisit with =~/.local/state= + a catch-up-on-boot pass if wanted. +- No sound selection per item (uses notify's type sound). +- No history/stats of completed pomodoros beyond the current run's cycle count. + +* Architecture + +- =wtimer= — a single executable Python script in =hyprland/.local/bin/=. Chosen over POSIX sh (the other waybar backings) deliberately: the multi-item state machine, time arithmetic, pomodoro FSM, and JSON I/O are cleaner in Python, and it gives real line/branch *coverage numbers* (Craig asked for them). Precedent: pocketbook is Python in this repo. +- *Pure core + thin IO shell.* All logic is pure functions taking =now= as a parameter (dependency-injected clock — satisfies testing.md: no recursion, no scope-shadowing, production reads =time.time()=, tests pass an explicit instant). The CLI layer does the IO: read state, call pure fns, write state, emit JSON, shell out to notify/fuzzel. +- *State file*: =$XDG_RUNTIME_DIR/waybar/wtimer.json= (env override =WTIMER_STATE= for tests). Same runtime-dir convention as =sysmon-metric=. +- *Heartbeat*: waybar calls =wtimer render= every 1s. =render= runs the tick logic first (detect elapsed items, fire notify, advance pomodoro, drop finished timers/alarms), then prints the waybar JSON. One entry point waybar polls; no separate daemon. +- *Concurrency (BLOCKER from review).* The 1s =render= and the click/scroll handlers (=add=, =toggle=, =cancel=, =cycle=) are separate processes doing read-modify-write on the same state file. Without serialization, last-writer-wins drops a click's =add=, or clobbers render's "item removed/advanced" write so the same item ticks and notifies again next second. 
So every read-modify-write takes an exclusive =flock= on the state file for the whole cycle, and writes go through a temp file + =os.replace= (atomic), so a concurrent render never reads a half-written file. This is what actually makes "notify fires once" true — the mutation is only authoritative under the lock. +- *State dir*: =render= and the mutating commands =mkdir -p= the state dir first (=$XDG_RUNTIME_DIR/waybar/= may not exist on a fresh boot). +- *Clock injection everywhere*: =now= comes from =WTIMER_NOW= (epoch) if set, else =time.time()=. Pure fns take =now= as a parameter; the CLI seeds it from the env. This lets the CLI integration tests hit boundary instants (exactly-at-target), not just the pure-fn tests. +- *Instant refresh*: after any mutating command, send waybar =SIGRTMIN+14= (the module's signal) so the bar updates immediately instead of lagging up to 1s. Faked in tests (=WTIMER_REFRESH= override, default =pkill -RTMIN+14 waybar=). + +* State model + +#+begin_src json +{ + "items": [ + {"id": "1", "type": "timer", "label": "tea", "target": 1751240400, "duration": 300, "paused_left": null}, + {"id": "2", "type": "alarm", "label": "", "target": 1751251200, "paused_left": null}, + {"id": "3", "type": "stopwatch", "label": "", "start": 1751240000, "paused_elapsed": null}, + {"id": "4", "type": "pomodoro", "label": "", "target": 1751241900, "phase": "work", + "cycle": 1, "work": 1500, "short": 300, "long": 900, "interval": 4, "paused_left": null} + ], + "primary": "1", + "seq": 4 +} +#+end_src + +- =seq= is the monotonic id source (string ids). +- *Paused* timer/pomodoro: =paused_left= holds seconds remaining; =target= ignored while paused; resume sets =target = now + paused_left=, =paused_left = null=. +- *Paused* stopwatch: =paused_elapsed= holds elapsed seconds; resume sets =start = now - paused_elapsed=. +- =primary= is the id the bar shows; =null= or stale → auto-select (below). + +* Display logic + +** Primary selection (bar text) +1. 
If =primary= names a live item, show it. +2. Else the running countdown (timer/alarm/pomodoro) with the smallest remaining. +3. Else the first running stopwatch. +4. Else idle (no items). + +** Bar text +- =<glyph> <time>= for the primary, plus = +N= when N other items exist. +- Idle: a dim timer glyph alone (or empty — decide at render; lean dim glyph so the module has a stable click target). +- =time= formatting: =M:SS= under 1h, =H:MM:SS= at/over 1h. Stopwatch counts up; timer/alarm/pomodoro count down to target. +- Paused item: prefix a pause glyph or rely on the =paused= class (CSS dims it). + +** Glyphs (nerd font; final codepoints verified live before merge) +- timer , alarm , stopwatch , pomodoro-work , pomodoro-break (coffee), paused , idle (dim). +- One glyph table at the top of the script so a live-render tweak is one edit. + +** Tooltip (all items) +One line per item: =<glyph> <label-or-type> <remaining/elapsed> (<state>)=. Pomodoro line shows phase + cycle (e.g. =work 2/4=). Header line summarizes count. Empty state: "No timers". + +** CSS classes (the =alt=/=class= field) +=timer= / =alarm= / =stopwatch= / =pomodoro-work= / =pomodoro-break=, plus =paused= and =urgent= (remaining < 60s). Drives color in style.css + both themes. 
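
The bar-text formatting and primary-selection rules above can be sketched as two pure functions. This is a minimal illustration, not the shipped =wtimer= code: the names =format_time= and =select_primary= come from the testing plan, but the helper =is_paused=, the =COUNTDOWN_TYPES= set, and the exact signatures are assumptions made here for the example. It also assumes that "running" means not paused, and that a state holding only paused items with no live =primary= renders as idle.

```python
from typing import Optional

# Assumption: these three types count down to `target`; stopwatch counts up.
COUNTDOWN_TYPES = {"timer", "alarm", "pomodoro"}


def format_time(seconds: float) -> str:
    """M:SS under one hour, H:MM:SS at or over one hour; negatives clamp to 0."""
    total = max(0, int(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def is_paused(item: dict) -> bool:
    """Hypothetical helper: an item is paused when either paused_* field is set."""
    return (
        item.get("paused_left") is not None
        or item.get("paused_elapsed") is not None
    )


def select_primary(state: dict, now: float) -> Optional[dict]:
    """Apply the four selection rules in order; None means idle."""
    items = state.get("items", [])
    by_id = {item["id"]: item for item in items}

    # Rule 1: an explicit primary that still names a live item wins.
    pinned = by_id.get(state.get("primary"))
    if pinned is not None:
        return pinned

    # Rule 2: the running countdown with the smallest remaining time.
    countdowns = [
        item for item in items
        if item["type"] in COUNTDOWN_TYPES and not is_paused(item)
    ]
    if countdowns:
        return min(countdowns, key=lambda item: item["target"] - now)

    # Rule 3: the first running stopwatch, in state-list order.
    for item in items:
        if item["type"] == "stopwatch" and not is_paused(item):
            return item

    # Rule 4: idle.
    return None
```

Both functions take =now= as a parameter rather than reading the clock, so a test can pass an exact boundary instant (for example, 3600 seconds remaining) without sleeping or patching =time.time()=.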
+
+* Commands (CLI)
+
+| Command                          | Effect                                                                               |
+|----------------------------------+--------------------------------------------------------------------------------------|
+| =wtimer render=                  | tick + emit waybar JSON (the heartbeat)                                              |
+| =wtimer add timer <dur> [label]= | add a countdown (=dur= like =25m=, =90s=, =1h30m=, =5= → minutes)                    |
+| =wtimer add alarm <HH:MM> [lbl]= | add a wall-clock alarm (next occurrence of that time)                                |
+| =wtimer add stopwatch [label]=   | start a count-up                                                                     |
+| =wtimer add pomodoro [label]=    | start a pomodoro at work phase                                                       |
+| =wtimer new=                     | fuzzel: pick type, prompt value, dispatch to =add= (thin wrapper)                    |
+| =wtimer toggle [id]=             | pause/resume the item (default: primary)                                             |
+| =wtimer cancel <id>=             | remove one item                                                                      |
+| =wtimer pick-cancel=             | fuzzel: choose an item to cancel (right-click handler)                               |
+| =wtimer cancel-all=              | clear all                                                                            |
+| =wtimer cycle [next/prev]=       | move the primary pointer across all items (incl. paused), state-list order, wrapping |
+
+Duration parse: =Nh=, =Nm=, =Ns= combos, or a bare integer (taken as minutes). Reject
+unparseable input (exit non-zero, notify nothing). Alarm parse: =HH:MM= 24h; if
+that time today already passed, target tomorrow.
+
+* Notifications
+
+- Timer elapse: =notify alarm "Timer" "<label or duration> done" --persist=.
+- Alarm fire: =notify alarm "Alarm" "<HH:MM><, label>" --persist=.
+- Pomodoro phase change: =notify info "Pomodoro" "Work → short break (3/4)"= (no =--persist=; phase nudges shouldn't pile up), long-break and work-resume worded accordingly.
+- notify is faked on PATH in tests; assert type + that it fired once per event.
+
+* Pomodoro semantics
+
+- Defaults: work 25m, short 5m, long 15m, interval 4 (long break after every 4th work).
+- FSM: work → short → work → short → work → short → work → long → work …
+- =cycle= counts completed works in the current set (1..interval); resets after a long break.
+- Each phase elapse advances =phase=, recomputes =target=, fires the phase notify.
Pomodoro never auto-removes; cancel ends it.
+
+* Waybar wiring
+
+** Module def (config) — signal 14 (next free; 8–13 used)
+#+begin_src json
+"custom/timer": {
+  "exec": "wtimer render",
+  "return-type": "json",
+  "interval": 1,
+  "signal": 14,
+  "on-click": "wtimer new",
+  "on-click-middle": "wtimer toggle",
+  "on-click-right": "wtimer pick-cancel",
+  "on-scroll-up": "wtimer cycle next",
+  "on-scroll-down": "wtimer cycle prev"
+}
+#+end_src
+
+** Position — right of the sysmon (battery/resource) module
+Insert =custom/timer= into =modules-right= immediately after =custom/sysmon=
+(between =custom/sysmon= and =custom/netspeed=). On screen that places it just
+right of the battery/resource readout.
+
+** Not collapsible — survives the right-side collapse
+The module *definition* lives in the canonical config object, and =waybar-collapse=
+only swaps the =modules-right= *array* in the runtime copy (which it seeds from
+canonical, so the def is always present). So making the timer non-collapsible is
+purely an array-membership change: add =custom/timer= to the =waybar-collapse=
+right *base set* so it stays listed when the right side collapses:
+- laptop: =["custom/arrow-right","custom/sysmon","custom/timer","tray","custom/date","custom/worldclock"]=
+- desktop: =["custom/arrow-right","custom/timer","tray","custom/date","custom/worldclock"]=
+Update the =tests/waybar-collapse= base-set expectations to match (TDD the change).
+
+* CSS
+
+Add =#custom-timer= plus the state classes to all three stylesheets. Keep the
+*selectors and structure* parallel across the three (what the theme-drift test
+checks); the actual color *values* are per-theme (dupre vs hudson) and differ by
+design, so this is structural parity, not byte-identity. Confirm against the real
+CSS files what the drift test compares before editing.
+- =hyprland/.config/waybar/style.css= +- =hyprland/.config/themes/dupre/...= waybar css +- =hyprland/.config/themes/hudson/...= waybar css +Colors: normal = foreground; =urgent= = a warning hue (reuse the sysmon +warn/crit palette); =paused= = dimmed; =pomodoro-break= = a calmer accent. + +* Testing plan (TDD) + +- Suite: =tests/wtimer/test_wtimer.py= (auto-discovered by =make test='s =tests/*/test_*.py= glob — no enumeration gap). +- *Pure-function tests* (fast, the bulk), explicit injected =now=: + - =parse_duration=: =25m=, =90s=, =1h30m=, =5= (→min), =0=, negative, garbage, empty (Normal/Boundary/Error). + - =parse_alarm=: future today, already-passed-today → tomorrow, =00:00=, =23:59=, =24:00=/=12:60= invalid, non-=HH:MM=. + - =format_time=: 0, 59s, 60s, 3599s, 3600s, multi-hour, negative clamps to 0. + - =add_item= for each type; =seq= increments; ids unique. + - =tick=: timer not-yet-elapsed (no change), exactly-at-target, past-target (fires once, removed); alarm same; pomodoro work→short→…→long→work advance + cycle counting + the 4th-work→long boundary; paused items never tick; multiple items in one tick. + - =select_primary=: explicit primary, stale primary falls back, soonest-remaining rule, stopwatch-only, empty. + - =render_payload=: text/tooltip/class for each type + paused + urgent + =+N= badge + idle. + - =toggle= pause then resume round-trips remaining/elapsed exactly; =cycle= wraps; =cancel= / =cancel-all=. +- *CLI integration tests* (subprocess, fakes on PATH, =WTIMER_NOW= to hit boundaries): =add= then =render= round-trip; =render= fires the faked =notify= once on an elapsed item and drops it; state file created if absent; *missing parent dir* created (fresh-boot case); corrupt/empty state file → treated as empty, no crash; mutating command sends the faked refresh signal. 
+- *Concurrency test*: spawn overlapping =render= + a mutating command against one state file; assert no lost update (the added item survives) and exactly-once notify (no double-fire from a clobbered tick). This is the regression guard for the flock/atomic-write fix.
+- *Mocking boundary*: fake =notify=, =fuzzel=, and =pkill= (the default refresh command) on PATH (record calls); never mock the wtimer logic. Clock injected as a parameter.
+- *Coverage*: measure with =coverage.py= if present (target 90%+ on the logic per testing.md business-logic bar); report the actual number. If =coverage= is absent, report per-command/per-branch case coverage explicitly and flag the tool gap (verification.md).
+- =tests/waybar-collapse= base-set expectations updated for the new module.
+- =tests/= theme-drift check stays green (CSS parity).
+
+* Files touched
+
+dotfiles branch =waybar-timer-module=:
+- =hyprland/.local/bin/wtimer= (new, executable)
+- =tests/wtimer/test_wtimer.py= (new)
+- =hyprland/.config/waybar/config= (module def + modules-right position)
+- =hyprland/.local/bin/waybar-collapse= (base-set) + =tests/waybar-collapse/...= (expectations)
+- =hyprland/.config/waybar/style.css= + dupre + hudson waybar css (CSS)
+
+archsetup (main, at the end):
+- this spec
+- =todo.org= task closure
+
+* Resolved decisions (no approvals — my calls)
+
+- Python, not sh — testability + coverage; pocketbook precedent.
+- One =render= heartbeat (no daemon) — simplest, waybar already polls.
+- notify fires from =render='s tick; the state mutation, made under the =flock=, guarantees once-only.
+- Primary = user-cycled, else soonest-remaining; =+N= badge for the rest.
+- Multiple simultaneous via tooltip list + badge (not a GTK panel) — keeps it "cool yet simple".
+- Pomodoro is one self-advancing item, not four chained timers.
+- Runtime-dir state (waybar-restart durable, not reboot durable) — v1.
+
+* Rollback
+
+All code on the dotfiles =waybar-timer-module= branch off =09815f3=.
Squash-merge +at the end; =git switch main && git branch -D waybar-timer-module= reverts cleanly +if it goes sideways. @@ -1,4 +1,5 @@ #!/bin/sh +# SPDX-License-Identifier: GPL-3.0-or-later # update and install tmux and git pacman -Sy --noconfirm >> /dev/null diff --git a/scripts/arch-distrobox b/scripts/arch-distrobox index 4afe3d1..99c295d 100755 --- a/scripts/arch-distrobox +++ b/scripts/arch-distrobox @@ -1,4 +1,5 @@ #!/bin/sh +# SPDX-License-Identifier: GPL-3.0-or-later # ArchDistrobox - Craig Jennings <craigmartinjennings@gmail.com> # License: GNU GPLv3 diff --git a/scripts/audit-packages.sh b/scripts/audit-packages.sh index f7af19f..e41b79c 100755 --- a/scripts/audit-packages.sh +++ b/scripts/audit-packages.sh @@ -1,4 +1,5 @@ #!/bin/bash +# SPDX-License-Identifier: GPL-3.0-or-later # audit-packages.sh — verify every package archsetup installs still exists # at its declared source, and flag packages that moved between the official # repos and the AUR. diff --git a/scripts/cmail-setup-finish.sh b/scripts/cmail-setup-finish.sh index 704b707..7f9d3fc 100755 --- a/scripts/cmail-setup-finish.sh +++ b/scripts/cmail-setup-finish.sh @@ -1,4 +1,5 @@ #!/usr/bin/env bash +# SPDX-License-Identifier: GPL-3.0-or-later # cmail-setup-finish.sh — finish Proton Mail Bridge + cmail-action setup after # Bridge first-run. Idempotent; safe to re-run after a Bridge cert rotation or # a claude-templates re-clone. diff --git a/scripts/games.sh b/scripts/games.sh index de6a476..2ccdcb4 100755 --- a/scripts/games.sh +++ b/scripts/games.sh @@ -1,4 +1,5 @@ #!/bin/bash +# SPDX-License-Identifier: GPL-3.0-or-later # games installations via flatpak set -uo pipefail diff --git a/scripts/hypr-live-update-guard b/scripts/hypr-live-update-guard new file mode 100755 index 0000000..4f561ae --- /dev/null +++ b/scripts/hypr-live-update-guard @@ -0,0 +1,70 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-3.0-or-later +# hypr-live-update-guard - abort a live GPU/compositor library upgrade. 
+#
+# Installed as a pacman PreTransaction hook. When an upgrade transaction
+# includes GPU/compositor runtime libraries (mesa, hyprland, wayland, GPU
+# drivers, ...) AND a Hyprland session is running, this aborts the
+# transaction BEFORE any package is swapped. Replacing those libraries out
+# from under a live compositor makes the next GPU-lib call hit a now
+# "(deleted)" file and SIGABRT, taking the Wayland clients down with it
+# (hit on ratio 2026-06-07: mesa + hyprland upgraded live, Hyprland crashed
+# and took awww/insync/emacs with it). Aborting at PreTransaction is the
+# safe point: nothing has been replaced yet, so the running session is
+# untouched and the user can re-run the upgrade from a TTY.
+#
+# Pacman feeds the matched package names on stdin (NeedsTargets).
+#
+# Test seams / overrides (env):
+#   HYPR_GUARD_RUNNING      1/0 forces the running check (default: pgrep Hyprland)
+#   HYPR_ALLOW_LIVE_UPDATE  1 proceeds anyway (skip the guard)
+#   HYPR_GUARD_SENTINEL     path whose existence also proceeds anyway
+#                           (default /run/archsetup-allow-live-gpu-update,
+#                           cleared on reboot since /run is tmpfs)
+
+set -u
+
+sentinel="${HYPR_GUARD_SENTINEL:-/run/archsetup-allow-live-gpu-update}"
+
+# Explicit override: the user knows what they're doing.
+if [ "${HYPR_ALLOW_LIVE_UPDATE:-0}" = "1" ] || [ -e "$sentinel" ]; then
+    exit 0
+fi
+
+hyprland_running() {
+    if [ -n "${HYPR_GUARD_RUNNING:-}" ]; then
+        [ "$HYPR_GUARD_RUNNING" = "1" ]
+        return
+    fi
+    pgrep -x Hyprland >/dev/null 2>&1
+}
+
+# No live session means no live swap to worry about. Let the upgrade run --
+# this is exactly the from-a-TTY-after-logout path the warning points to.
+hyprland_running || exit 0
+
+# Collect the triggering packages (stdin from NeedsTargets) for the message.
+pkgs=$(cat 2>/dev/null | sort -u | tr '\n' ' ') + +cat >&2 <<EOF + +========================================================================== + BLOCKED: live GPU/compositor library upgrade while Hyprland is running +========================================================================== + Packages in this upgrade can crash the running compositor if swapped now: + ${pkgs:-(GPU/compositor runtime libraries)} + + Replacing these out from under a live Hyprland session makes the next + GPU-lib call hit a deleted library and SIGABRT, taking your Wayland apps + down with it (and risking an unclean shutdown). + + Do it safely instead -- from a TTY with Hyprland stopped: + 1. Log out of Hyprland, or switch to a console (Ctrl+Alt+F2) and log in. + 2. Re-run the upgrade there: sudo pacman -Syu + + To override and proceed anyway (not recommended while Hyprland runs): + sudo touch $sentinel && sudo pacman -Syu +========================================================================== + +EOF +exit 1 diff --git a/scripts/normalize-notify-sounds.sh b/scripts/normalize-notify-sounds.sh index 52c1d36..72c4c33 100755 --- a/scripts/normalize-notify-sounds.sh +++ b/scripts/normalize-notify-sounds.sh @@ -1,4 +1,5 @@ #!/bin/bash +# SPDX-License-Identifier: GPL-3.0-or-later # Normalize notify sound files to a uniform RMS loudness so every notification # plays at the same perceived level. Re-encodes each file in place (ogg -> ogg). # Run once after adding or changing a sound in the notify set. 
diff --git a/scripts/package-inventory b/scripts/package-inventory index 2dda44b..0a4acf7 100755 --- a/scripts/package-inventory +++ b/scripts/package-inventory @@ -1,4 +1,5 @@ #!/bin/bash +# SPDX-License-Identifier: GPL-3.0-or-later # package-inventory - Compare archsetup packages vs live system # Shows: packages in archsetup but missing from system, # packages on system but not in archsetup diff --git a/scripts/post-install.sh b/scripts/post-install.sh index 9045398..f7dd206 100755 --- a/scripts/post-install.sh +++ b/scripts/post-install.sh @@ -1,4 +1,5 @@ #!/bin/sh +# SPDX-License-Identifier: GPL-3.0-or-later logfile="$HOME/post-install.log" touch "$logfile" diff --git a/scripts/setup-chess.sh b/scripts/setup-chess.sh index 6ac8749..648eea9 100755 --- a/scripts/setup-chess.sh +++ b/scripts/setup-chess.sh @@ -1,4 +1,5 @@ #!/usr/bin/env bash +# SPDX-License-Identifier: GPL-3.0-or-later set -euo pipefail # En Croissant + lc0 + Maia + Stockfish setup script for Arch Linux. diff --git a/scripts/testing/archsetup-test-zfs.conf b/scripts/testing/archsetup-test-zfs.conf new file mode 100644 index 0000000..a5459cf --- /dev/null +++ b/scripts/testing/archsetup-test-zfs.conf @@ -0,0 +1,21 @@ +# archsetup-test-zfs.conf - Archangel config for archsetup ZFS test VMs +# Used by create-base-vm.sh (FS_PROFILE=zfs) for fully automated base VM creation +# +# Usage: archangel --config-file /root/archsetup-test.conf +# +# Note: User creation is handled by archsetup, not archangel. +# See archsetup-vm.conf for archsetup configuration (shared across profiles - +# archsetup detects ZFS from the live root, so it needs no filesystem setting). +# +# Unencrypted ZFS root: encryption isn't what the harness validates, and +# NO_ENCRYPT=yes skips the passphrase prompt for a fully unattended install. 
+
+FILESYSTEM=zfs
+HOSTNAME=archsetup-test
+TIMEZONE=America/Chicago
+LOCALE=en_US.UTF-8
+KEYMAP=us
+DISKS=/dev/vda
+NO_ENCRYPT=yes
+ROOT_PASSWORD=archsetup
+ENABLE_SSH=yes
diff --git a/scripts/testing/cleanup-tests.sh b/scripts/testing/cleanup-tests.sh
index 5c0153b..390d7e5 100755
--- a/scripts/testing/cleanup-tests.sh
+++ b/scripts/testing/cleanup-tests.sh
@@ -1,4 +1,5 @@
 #!/bin/bash
+# SPDX-License-Identifier: GPL-3.0-or-later
 # Clean up old test VMs and artifacts
 # Author: Craig Jennings <craigmartinjennings@gmail.com>
 # License: GNU GPLv3
diff --git a/scripts/testing/create-base-vm.sh b/scripts/testing/create-base-vm.sh
index 4ecf4d6..e626813 100755
--- a/scripts/testing/create-base-vm.sh
+++ b/scripts/testing/create-base-vm.sh
@@ -1,4 +1,5 @@
 #!/bin/bash
+# SPDX-License-Identifier: GPL-3.0-or-later
 # Create base VM for archsetup testing - Automated via Archangel ISO
 # Author: Craig Jennings <craigmartinjennings@gmail.com>
 # License: GNU GPLv3
@@ -19,10 +20,19 @@ source "$SCRIPT_DIR/lib/vm-utils.sh"
 # Configuration
 VM_IMAGES_DIR="$PROJECT_ROOT/vm-images"
-CONFIG_FILE="$SCRIPT_DIR/archsetup-test.conf"
 LIVE_ISO_PASSWORD="archangel"
 SNAPSHOT_NAME="clean-install"
+# FS_PROFILE (btrfs default / zfs) picks the archangel base-install config.
+# btrfs -> archsetup-test.conf, zfs -> archsetup-test-zfs.conf. The matching
+# base image name is derived from FS_PROFILE by init_vm_paths.
+FS_PROFILE="${FS_PROFILE:-btrfs}"
+if [ "$FS_PROFILE" = "btrfs" ]; then
+    CONFIG_FILE="$SCRIPT_DIR/archsetup-test.conf"
+else
+    CONFIG_FILE="$SCRIPT_DIR/archsetup-test-${FS_PROFILE}.conf"
+fi
+
 # Initialize logging
 mkdir -p "$PROJECT_ROOT/test-results"
 LOGFILE="$PROJECT_ROOT/test-results/create-base-vm-$(date +'%Y%m%d-%H%M%S').log"
diff --git a/scripts/testing/debug-vm.sh b/scripts/testing/debug-vm.sh
index 32f377c..b0fa2b9 100755
--- a/scripts/testing/debug-vm.sh
+++ b/scripts/testing/debug-vm.sh
@@ -1,4 +1,5 @@
 #!/bin/bash
+# SPDX-License-Identifier: GPL-3.0-or-later
 # Launch VM for interactive debugging
 # Author: Craig Jennings <craigmartinjennings@gmail.com>
 # License: GNU GPLv3
diff --git a/scripts/testing/lib/logging.sh b/scripts/testing/lib/logging.sh
index ed20707..809d396 100755
--- a/scripts/testing/lib/logging.sh
+++ b/scripts/testing/lib/logging.sh
@@ -1,4 +1,5 @@
 #!/bin/bash
+# SPDX-License-Identifier: GPL-3.0-or-later
 # Logging utilities for archsetup testing
 # Author: Craig Jennings <craigmartinjennings@gmail.com>
 # License: GNU GPLv3
diff --git a/scripts/testing/lib/network-diagnostics.sh b/scripts/testing/lib/network-diagnostics.sh
index 674aeba..38788e5 100644
--- a/scripts/testing/lib/network-diagnostics.sh
+++ b/scripts/testing/lib/network-diagnostics.sh
@@ -1,4 +1,5 @@
 #!/bin/bash
+# SPDX-License-Identifier: GPL-3.0-or-later
 # Network diagnostics for VM testing
 # Author: Craig Jennings <craigmartinjennings@gmail.com>
 # License: GNU GPLv3
diff --git a/scripts/testing/lib/testinfra.sh b/scripts/testing/lib/testinfra.sh
new file mode 100644
index 0000000..0822a9f
--- /dev/null
+++ b/scripts/testing/lib/testinfra.sh
@@ -0,0 +1,120 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0-or-later
+#
+# Testinfra post-install validation sweep (runs on the host, over SSH).
+#
+# This is the primary post-install validator (it replaced the shell
+# run_all_validations sweep).
It connects to the freshly-installed VM over SSH
+# and runs the pytest suite under scripts/testing/tests/. Its result drives the
+# run's pass/fail, and per-test failures are bucketed (archsetup / base_install
+# / unknown) into the same issue-attribution report the shell sweep produced.
+#
+# Auth: reuse the root key the harness already authorized (inject_root_key),
+# which survives the sshd prohibit-password hardening; mint our own only if the
+# harness didn't (standalone use). pytest connects key-only via a generated
+# ssh-config. Key + config live in the results dir and are discarded with it.
+#
+# Uses globals from run-test.sh / vm-utils.sh: SCRIPT_DIR, VM_IP, SSH_PORT,
+# ROOT_PASSWORD, ROOT_SSH_KEY, ARCHSETUP_VM_CONF, plus the validation.sh
+# helpers attribute_issue / VALIDATION_*. Toggle with RUN_TESTINFRA=false.
+
+# Record each pytest failure from the attribution file into the issue arrays
+# (validation.sh's attribute_issue), so generate_issue_report covers them.
+_testinfra_record_attribution() {
+    local file="$1" bucket=""
+    [ -f "$file" ] || return 0
+    while IFS= read -r line; do
+        case "$line" in
+            "[archsetup]") bucket=archsetup ;;
+            "[base_install]") bucket=base ;;
+            "[unknown]") bucket=unknown ;;
+            " "*) attribute_issue "testinfra: ${line# }" "$bucket" ;;
+        esac
+    done < "$file"
+}
+
+# run_testinfra_validation <results_dir>
+# Returns 0 only when the pytest sweep ran and passed. Returns non-zero when it
+# failed OR could not run (missing tooling / SSH setup) — a sweep that can't run
+# is not a pass. RUN_TESTINFRA=false is the one explicit opt-out (returns 0).
+run_testinfra_validation() {
+    local results_dir="$1"
+    local tests_dir="$SCRIPT_DIR/tests"
+    local key="$results_dir/testinfra_key"
+    local sshcfg="$results_dir/testinfra_ssh_config"
+
+    if [ "${RUN_TESTINFRA:-true}" != "true" ]; then
+        warn "RUN_TESTINFRA=false - skipping the Testinfra validation sweep"
+        return 0
+    fi
+    if ! command -v pytest >/dev/null 2>&1 || !
python3 -c 'import testinfra' >/dev/null 2>&1; then
+        error "Testinfra/pytest not installed on host - cannot validate (run: make deps)"
+        return 1
+    fi
+
+    section "Running Validation Checks (Testinfra)"
+
+    # Prefer the harness's already-authorized root key; mint one if absent.
+    if [ -n "${ROOT_SSH_KEY:-}" ] && [ -f "${ROOT_SSH_KEY}" ]; then
+        key="$ROOT_SSH_KEY"
+    else
+        rm -f "$key" "$key.pub"
+        if ! ssh-keygen -t ed25519 -N "" -q -f "$key"; then
+            error "testinfra: ssh-keygen failed"
+            return 1
+        fi
+        if ! copy_to_vm "$key.pub" "/tmp/testinfra_key.pub" "$ROOT_PASSWORD"; then
+            error "testinfra: pubkey copy failed"
+            return 1
+        fi
+        if ! vm_exec "$ROOT_PASSWORD" \
+            "mkdir -p /root/.ssh && chmod 700 /root/.ssh && cat /tmp/testinfra_key.pub >> /root/.ssh/authorized_keys && chmod 600 /root/.ssh/authorized_keys"; then
+            error "testinfra: authorizing key in VM failed"
+            return 1
+        fi
+    fi
+
+    # ssh-config so testinfra connects key-only, no host-key prompt.
+    cat > "$sshcfg" <<EOF
+Host testinfra-target
+ HostName ${VM_IP:-localhost}
+ Port ${SSH_PORT:-2222}
+ User root
+ IdentityFile $key
+ IdentitiesOnly yes
+ StrictHostKeyChecking no
+ UserKnownHostsFile /dev/null
+EOF
+
+    # The account archsetup created, for the tests that need it.
+    local test_user
+    test_user=$(sed -n 's/^USERNAME=//p' "$ARCHSETUP_VM_CONF" 2>/dev/null | head -n1)
+    : "${test_user:=cjennings}"
+
+    local logf="$results_dir/testinfra.log"
+    ARCHSETUP_TEST_USER="$test_user" pytest "$tests_dir" \
+        --hosts="ssh://testinfra-target" \
+        --ssh-config="$sshcfg" \
+        --attribution-file="$results_dir/testinfra-attribution.txt" \
+        -v >> "$logf" 2>&1
+    local rc=$?
+
+    # Surface pytest's counts through the shared validation counters so the
+    # issue report summary is meaningful (the shell sweep no longer runs).
+    local summary
+    summary=$(grep -oE '[0-9]+ (passed|failed|error|errors|skipped)' "$logf" | tail -10)
+    VALIDATION_PASSED=$(echo "$summary" | awk '/passed/{print $1}' | tail -1); VALIDATION_PASSED=${VALIDATION_PASSED:-0}
+    VALIDATION_WARNINGS=$(echo "$summary" | awk '/skipped/{print $1}' | tail -1); VALIDATION_WARNINGS=${VALIDATION_WARNINGS:-0}
+    local nfail nerr
+    nfail=$(echo "$summary" | awk '/failed/{print $1}' | tail -1); nfail=${nfail:-0}
+    nerr=$(echo "$summary" | awk '/error/{print $1}' | tail -1); nerr=${nerr:-0}
+    VALIDATION_FAILED=$((nfail + nerr))
+
+    if [ "$rc" -eq 0 ]; then
+        success "Testinfra validation passed ($VALIDATION_PASSED passed, $VALIDATION_WARNINGS skipped)"
+    else
+        error "Testinfra validation failed ($VALIDATION_FAILED failed/error; see testinfra.log)"
+        _testinfra_record_attribution "$results_dir/testinfra-attribution.txt"
+    fi
+    return "$rc"
+}
diff --git a/scripts/testing/lib/validation.sh b/scripts/testing/lib/validation.sh
index 91270ef..fa7ddcc 100644
--- a/scripts/testing/lib/validation.sh
+++ b/scripts/testing/lib/validation.sh
@@ -1,4 +1,5 @@
 #!/bin/bash
+# SPDX-License-Identifier: GPL-3.0-or-later
 # Validation utilities for archsetup testing
 # Author: Craig Jennings <craigmartinjennings@gmail.com>
 # License: GNU GPLv3
@@ -20,38 +21,7 @@ declare -a UNKNOWN_ISSUES
 # SSH helper (uses globals: VM_IP, ROOT_PASSWORD)
 ssh_cmd() {
     sshpass -p "$ROOT_PASSWORD" ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
-        -o ConnectTimeout=10 -p "${SSH_PORT:-22}" "root@$VM_IP" "$@" 2>/dev/null
-}
-
-# Validation result helpers
-validation_pass() {
-    local test_name="$1"
-    success "$test_name"
-    ((VALIDATION_PASSED++)) || true
-}
-
-validation_fail() {
-    local test_name="$1"
-    local details="${2:-}"
-    error "$test_name"
-    [ -n "$details" ] && info " Details: $details"
-    ((VALIDATION_FAILED++)) || true
-}
-
-validation_warn() {
-    local test_name="$1"
-    local details="${2:-}"
-    warn "$test_name"
-    [ -n "$details" ] && info " Details:
$details"
-    ((VALIDATION_WARNINGS++)) || true
-}
-
-# A check whose precondition can't hold in this environment (headless VM,
-# slirp networking, pre-reboot state). Logged for the record, counted nowhere
-# — a warning that fires on every run trains readers to ignore warnings.
-validation_skip() {
-    local test_name="$1"
-    info "SKIP: $test_name"
+        -o ConnectTimeout=10 ${SSH_KEY_OPT:-} -p "${SSH_PORT:-22}" "root@$VM_IP" "$@" 2>/dev/null
 }
 # Attribute an issue to archsetup or base install
@@ -264,802 +234,6 @@ categorize_errors() {
 }
 #=============================================================================
-# VALIDATION CHECKS
-#=============================================================================
-
-run_all_validations() {
-    section "Running Validation Checks"
-
-    # User & Authentication
-    validate_user_created
-    validate_user_shell
-    validate_user_groups
-
-    # Dotfiles
-    validate_dotfiles
-
-    # Package Managers
-    validate_yay_installed
-    validate_pacman_working
-
-    # Window Manager (detects DWM or Hyprland automatically)
-    validate_window_manager
-
-    # Essential Services
-    validate_firewall
-    validate_dns_config
-    validate_avahi
-    validate_fail2ban
-    validate_networkmanager
-
-    # Developer Tools
-    validate_emacs
-    validate_git_config
-    validate_dev_tools
-
-    # System Configuration
-    validate_zfs_config
-    validate_boot_config
-    validate_autologin_config
-    validate_gnome_keyring_setup
-
-    # Boot & Initramfs (critical for ZFS systems)
-    validate_terminus_font
-    validate_mkinitcpio_hooks
-    validate_initramfs_consolefont
-    validate_nvme_module
-
-    # Archsetup Specific
-    validate_archsetup_log
-    validate_state_markers
-}
-
-#-----------------------------------------------------------------------------
-# User & Authentication Validations
-#-----------------------------------------------------------------------------
-
-validate_user_created() {
-    step "Checking if user 'cjennings' exists"
-    if ssh_cmd "id cjennings" &>> "$LOGFILE"; then
-        validation_pass
"User cjennings exists" - else - validation_fail "User cjennings not found" - attribute_issue "User cjennings not created" "archsetup" - fi -} - -validate_user_shell() { - step "Checking if ZSH is default shell" - local shell=$(ssh_cmd "getent passwd cjennings | cut -d: -f7") - if [ "$shell" = "/bin/zsh" ] || [ "$shell" = "/usr/bin/zsh" ]; then - validation_pass "ZSH is default shell" - else - validation_fail "ZSH not default shell (got: $shell)" - attribute_issue "ZSH not set as default shell" "archsetup" - fi -} - -validate_user_groups() { - step "Checking user group memberships" - # Groups added by archsetup: - # - wheel (useradd -G wheel) - # - sys,adm,network,scanner,power,uucp,audio,lp,rfkill,video,storage,optical,users (usermod -aG) - # - docker (gpasswd -a, added later in developer_workstation) - local expected_groups="wheel sys adm network scanner power uucp audio lp rfkill video storage optical users docker" - local missing_groups="" - - for group in $expected_groups; do - if ! ssh_cmd "groups cjennings" | grep -q "\b$group\b"; then - missing_groups="$missing_groups $group" - fi - done - - if [ -z "$missing_groups" ]; then - validation_pass "User in all expected groups (15 groups)" - else - validation_fail "User missing groups:$missing_groups" - attribute_issue "User missing groups:$missing_groups" "archsetup" - fi -} - -#----------------------------------------------------------------------------- -# Dotfiles Validations -#----------------------------------------------------------------------------- - -validate_dotfiles() { - step "Checking dotfiles setup" - - # 1. Check if .zshrc is a symlink - if ! ssh_cmd "test -L /home/cjennings/.zshrc"; then - validation_fail "Dotfiles not stowed (.zshrc is not a symlink)" - attribute_issue "Dotfiles stow failed" "archsetup" - return 1 - fi - - # 2. Check symlink points to correct location. archsetup now clones the - # dotfiles repo to ~/.dotfiles and stows from there (DOTFILES_DIR default). 
- # Which tree owns .zshrc depends on DESKTOP_ENV: none stows the standalone - # minimal/ tree; dwm and hyprland stow common/. - local target=$(ssh_cmd "readlink /home/cjennings/.zshrc") - local desktop_env=$(sed -n 's/^DESKTOP_ENV=//p' "$ARCHSETUP_VM_CONF" 2>/dev/null | head -n1) - local expected_pattern=".dotfiles/common/.zshrc" - [ "$desktop_env" = "none" ] && expected_pattern=".dotfiles/minimal/.zshrc" - - if ! echo "$target" | grep -q "$expected_pattern"; then - validation_fail "Dotfiles symlink points to wrong location: $target" - attribute_issue "Dotfiles symlink incorrect: $target" "archsetup" - return 1 - fi - - # 3. Check the target file actually exists (not a broken symlink) - if ! ssh_cmd "test -f /home/cjennings/.zshrc"; then - validation_fail "Dotfiles symlink is broken (target doesn't exist)" - ssh_cmd "ls -la /home/cjennings/.zshrc" >> "$LOGFILE" 2>&1 - attribute_issue "Dotfiles symlink broken" "archsetup" - return 1 - fi - - # 4. Check user can actually read the file (not just root) - local result=$(ssh_cmd "sudo -u cjennings cat /home/cjennings/.zshrc > /dev/null 2>&1 && echo OK || echo FAIL") - if [ "$result" != "OK" ]; then - validation_fail "Dotfiles not readable by user (permission issue)" - ssh_cmd "ls -la /home/cjennings/.zshrc" >> "$LOGFILE" 2>&1 - attribute_issue "Dotfiles not readable by user" "archsetup" - return 1 - fi - - validation_pass "Dotfiles configured correctly (symlink to $target, readable by user)" -} - -#----------------------------------------------------------------------------- -# Package Manager Validations -#----------------------------------------------------------------------------- - -validate_yay_installed() { - step "Checking if yay (AUR helper) is installed and functional" - - # Check binary exists - if ! 
ssh_cmd "which yay" &>> "$LOGFILE"; then - validation_fail "yay not found" - attribute_issue "yay not installed" "archsetup" - return 1 - fi - - # Check yay can query packages (functional test) - if ssh_cmd "sudo -u cjennings yay -Qi yay" &>> "$LOGFILE"; then - validation_pass "yay is installed and functional" - else - validation_fail "yay binary exists but query failed" - attribute_issue "yay not functional" "archsetup" - fi -} - -validate_pacman_working() { - step "Checking if pacman is functional" - if ssh_cmd "pacman -Qi base" &>> "$LOGFILE"; then - validation_pass "pacman is functional" - else - validation_fail "pacman query failed" - attribute_issue "pacman not functional" "unknown" - fi -} - -#----------------------------------------------------------------------------- -# Window Manager Validations -#----------------------------------------------------------------------------- - -validate_suckless_tools() { - step "Checking suckless tools (dwm, st, dmenu, slock)" - local missing="" - - for tool in dwm st dmenu slock; do - if ! ssh_cmd "test -f /usr/local/bin/$tool"; then - missing="$missing $tool" - fi - done - - if [ -z "$missing" ]; then - validation_pass "All suckless tools installed (dwm, st, dmenu, slock)" - else - validation_fail "Missing suckless tools:$missing" - attribute_issue "Missing suckless tools:$missing" "archsetup" - fi -} - -validate_hyprland_tools() { - step "Checking Hyprland tools" - local missing="" - - # Check core Hyprland packages - for pkg in hyprland hypridle hyprlock waybar fuzzel swww grim slurp gammastep foot; do - if ! 
ssh_cmd "pacman -Q $pkg &>/dev/null"; then - missing="$missing $pkg" - fi - done - - if [ -z "$missing" ]; then - validation_pass "All Hyprland tools installed" - else - validation_fail "Missing Hyprland tools:$missing" - attribute_issue "Missing Hyprland tools:$missing" "archsetup" - fi -} - -validate_hyprland_config() { - step "Checking Hyprland configuration files" - local missing="" - - for config in ".config/hypr/hyprland.conf" ".config/hypr/hypridle.conf" \ - ".config/hypr/hyprlock.conf" ".config/waybar/config" \ - ".config/fuzzel/fuzzel.ini" ".config/gammastep/config.ini"; do - if ! ssh_cmd "test -f /home/cjennings/$config"; then - missing="$missing $config" - fi - done - - if [ -z "$missing" ]; then - validation_pass "All Hyprland config files present" - else - validation_fail "Missing Hyprland configs:$missing" - attribute_issue "Missing Hyprland configs:$missing" "archsetup" - fi -} - -validate_hyprland_socket() { - step "Checking Hyprland IPC socket" - # The socket only exists while the compositor runs. In the headless test - # VM nobody logs in graphically, so a missing socket with no Hyprland - # process is the expected state, not a finding. - if ssh_cmd "test -S /tmp/hypr/*/.socket.sock 2>/dev/null"; then - validation_pass "Hyprland socket exists" - elif ! ssh_cmd "pgrep -x Hyprland >/dev/null 2>&1"; then - validation_skip "Hyprland not running (headless) — socket check not applicable" - else - validation_warn "Hyprland running but IPC socket not found" - fi -} - -validate_portal_dark_mode() { - step "Checking Settings portal returns dark mode" - - # Check portals.conf exists and uses gtk for Settings - local portals_conf="/home/cjennings/.config/xdg-desktop-portal/portals.conf" - if ! 
ssh_cmd "test -f $portals_conf"; then - validation_fail "portals.conf not found" - attribute_issue "xdg-desktop-portal portals.conf missing" "archsetup" - return 1 - fi - - local settings_backend=$(ssh_cmd "grep 'org.freedesktop.impl.portal.Settings' $portals_conf 2>/dev/null | cut -d= -f2") - if [ "$settings_backend" = "none" ]; then - validation_fail "Settings portal disabled (set to 'none')" - attribute_issue "Settings portal disabled in portals.conf" "archsetup" - return 1 - fi - - # Query the portal for color-scheme (requires portal services running) - # Returns "v v u 1" for dark mode (1 = prefer-dark) - local color_scheme=$(ssh_cmd "sudo -u cjennings busctl --user call org.freedesktop.portal.Desktop /org/freedesktop/portal/desktop org.freedesktop.portal.Settings Read 'ss' 'org.freedesktop.appearance' 'color-scheme' 2>/dev/null | grep -o 'u [0-9]' | cut -d' ' -f2") - - if [ "$color_scheme" = "1" ]; then - validation_pass "Settings portal returns dark mode (color-scheme=1)" - elif [ -z "$color_scheme" ] && ! ssh_cmd "pgrep -x Hyprland >/dev/null 2>&1"; then - # No compositor → no graphical session bus to query. A socket-activated - # xdg-desktop-portal process can exist even headless, so the compositor - # is the real precondition (same condition as the socket check). The - # conf-file checks above already validated what install controls. 
- validation_skip "No compositor running (headless) — portal query not applicable" - elif [ -z "$color_scheme" ]; then - validation_warn "Could not query Settings portal (portal may not be running)" - else - validation_fail "Settings portal not returning dark mode (color-scheme=$color_scheme, expected 1)" - attribute_issue "Settings portal not configured for dark mode" "archsetup" - fi -} - -validate_window_manager() { - # Detect which desktop environment is installed and validate accordingly - if ssh_cmd "pacman -Q hyprland &>/dev/null"; then - section "Hyprland Desktop Environment" - validate_hyprland_tools - validate_hyprland_config - validate_hyprland_socket - validate_portal_dark_mode - elif ssh_cmd "test -f /usr/local/bin/dwm"; then - section "DWM Desktop Environment" - validate_suckless_tools - else - validation_warn "No window manager detected (DESKTOP_ENV=none?)" - fi -} - -#----------------------------------------------------------------------------- -# Essential Services Validations -#----------------------------------------------------------------------------- - -validate_firewall() { - step "Checking if firewall (ufw) is enabled" - local status=$(ssh_cmd "systemctl is-enabled ufw.service 2>/dev/null || echo disabled") - if [ "$status" = "enabled" ]; then - validation_pass "UFW firewall is enabled" - else - validation_fail "UFW firewall not enabled" - attribute_issue "UFW not enabled" "archsetup" - fi -} - -validate_dns_config() { - step "Checking DNS-over-TLS configuration" - if ssh_cmd "grep -q 'DNS=.*#' /etc/systemd/resolved.conf 2>/dev/null"; then - validation_pass "DNS-over-TLS configured" - else - validation_warn "DNS-over-TLS may not be configured" - fi -} - -validate_avahi() { - step "Checking avahi-daemon status" - local status=$(ssh_cmd "systemctl is-enabled avahi-daemon.service 2>/dev/null || echo disabled") - if [ "$status" = "enabled" ]; then - validation_pass "avahi-daemon is enabled" - - # Full-stack mDNS test: ping hostname.local. 
QEMU user-mode (slirp, - # 10.0.2.x) doesn't pass multicast, so mDNS genuinely can't resolve - # there — only run the ping on real networking. - if ssh_cmd "ip -4 addr show" 2>/dev/null | grep -q "10\.0\.2\."; then - validation_skip "mDNS ping not possible on slirp networking (no multicast)" - else - local hostname=$(ssh_cmd "hostname") - if ssh_cmd "ping -c 1 -W 2 ${hostname}.local" &>> "$LOGFILE"; then - validation_pass "mDNS working (${hostname}.local responds to ping)" - else - validation_warn "mDNS ping failed (avahi may need time to propagate)" - fi - fi - else - # This might be OK if avahi was pre-installed - validation_warn "avahi-daemon not enabled (may have been pre-configured)" - fi -} - -validate_fail2ban() { - step "Checking fail2ban status" - local status=$(ssh_cmd "systemctl is-enabled fail2ban.service 2>/dev/null || echo disabled") - if [ "$status" = "enabled" ]; then - validation_pass "fail2ban is enabled" - else - validation_fail "fail2ban not enabled" - attribute_issue "fail2ban not enabled" "archsetup" - fi -} - -validate_networkmanager() { - step "Checking NetworkManager status" - local status=$(ssh_cmd "systemctl is-enabled NetworkManager.service 2>/dev/null || echo disabled") - if [ "$status" = "enabled" ]; then - validation_pass "NetworkManager is enabled" - # Functional test - if ssh_cmd "nmcli general status" &>> "$LOGFILE"; then - validation_pass "NetworkManager is functional" - else - validation_warn "NetworkManager enabled but not responding" - fi - else - validation_fail "NetworkManager not enabled" - attribute_issue "NetworkManager not enabled" "archsetup" - fi -} - -#----------------------------------------------------------------------------- -# Service-Specific Validations -#----------------------------------------------------------------------------- - -validate_all_services() { - section "Service Validations" - - # Core services (always expected) - validate_service "sshd" "enabled" "active" - validate_service "systemd-resolved" 
"enabled" "active" - validate_service "ufw" "enabled" "" # VM lacks iptables modules, can't be active - validate_service "fail2ban" "enabled" "active" - validate_service "NetworkManager" "enabled" "active" - validate_service "rngd" "enabled" "active" - validate_service "cronie" "enabled" "" - validate_service "atd" "enabled" "" - - # Cron job: log cleanup - step "Checking log-cleanup cron job" - local crontab_entry=$(ssh_cmd "sudo -u cjennings crontab -l 2>/dev/null | grep log-cleanup") - if [ -n "$crontab_entry" ]; then - validation_pass "log-cleanup cron job installed" - else - validation_fail "log-cleanup cron job not in crontab" - attribute_issue "log-cleanup cron job missing from user crontab" "archsetup" - fi - - # Timer services - validate_service "reflector.timer" "enabled" "" - validate_service "paccache.timer" "enabled" "" - - # Optional services (warn if missing, don't fail) - validate_service_optional "avahi-daemon" "enabled" - validate_service_optional "bluetooth" "enabled" - validate_service_optional "cups" "enabled" - validate_service_optional "docker" "enabled" - validate_service_optional "tailscaled" "enabled" - # Syncthing uses user service (not system), check lingering is enabled. - # test -e, not ls: ls prints the path on success, so the old capture held - # "path\nyes" and could never equal "yes" — the check warned on every - # run, even with lingering correctly enabled. 
- step "Checking user lingering for syncthing" - local linger_enabled=$(ssh_cmd "test -e /var/lib/systemd/linger/cjennings && echo yes || echo no") - if [ "$linger_enabled" = "yes" ]; then - validation_pass "User lingering enabled for syncthing user service" - else - validation_warn "User lingering not enabled (syncthing may not autostart)" - fi - - # Filesystem-specific - validate_zfs_services - validate_btrfs_services - - # Functional tests - validate_service_functions -} - -validate_service() { - local service="$1" - local expected_enabled="$2" # "enabled" or "" - local expected_active="$3" # "active" or "" - - step "Checking $service" - - if [ -n "$expected_enabled" ]; then - local enabled=$(ssh_cmd "systemctl is-enabled $service 2>/dev/null || echo disabled") - if [ "$enabled" = "enabled" ]; then - validation_pass "$service is enabled" - else - validation_fail "$service not enabled (got: $enabled)" - attribute_issue "$service not enabled" "archsetup" - return 1 - fi - fi - - if [ -n "$expected_active" ]; then - local active=$(ssh_cmd "systemctl is-active $service 2>/dev/null || echo inactive") - if [ "$active" = "active" ]; then - validation_pass "$service is active" - else - validation_fail "$service not active (got: $active)" - attribute_issue "$service not active" "archsetup" - return 1 - fi - fi - - return 0 -} - -validate_service_optional() { - local service="$1" - local expected_enabled="$2" - - step "Checking optional service: $service" - - local enabled=$(ssh_cmd "systemctl is-enabled $service 2>/dev/null || echo disabled") - if [ "$enabled" = "enabled" ]; then - validation_pass "$service is enabled" - else - validation_warn "$service not enabled (optional)" - fi -} - -validate_zfs_services() { - # Only check if ZFS is installed - if ! 
ssh_cmd "which zfs" &>> "$LOGFILE"; then - return 0 - fi - - step "Checking ZFS-specific services" - - validate_service_optional "sanoid.timer" "enabled" - - # Check for zfs-scrub timer (pool name varies) - local scrub_enabled - scrub_enabled=$(ssh_cmd "systemctl list-unit-files 'zfs-scrub*' 2>/dev/null | grep -c enabled" | tr -d '[:space:]') - scrub_enabled=${scrub_enabled:-0} - if [ "$scrub_enabled" -gt 0 ]; then - validation_pass "ZFS scrub timer enabled" - else - validation_warn "ZFS scrub timer not found" - fi -} - -validate_btrfs_services() { - # Only check if btrfs root - if ! ssh_cmd "mount | grep 'on / ' | grep -q btrfs"; then - return 0 - fi - - step "Checking btrfs-specific services" - validate_service_optional "grub-btrfsd" "enabled" -} - -validate_service_functions() { - section "Service Functional Tests" - - # UFW functional test - # NOTE: VM environment lacks iptables kernel modules, so UFW cannot activate. - # We only verify it's enabled; active status requires real hardware. 
- step "Testing UFW functionality" - local ufw_enabled - ufw_enabled=$(ssh_cmd "systemctl is-enabled ufw.service 2>/dev/null || echo disabled") - if [ "$ufw_enabled" = "enabled" ]; then - validation_pass "UFW is enabled (activation requires iptables kernel modules)" - else - validation_fail "UFW not enabled" - attribute_issue "UFW not enabled" "archsetup" - fi - - # fail2ban functional test - step "Testing fail2ban functionality" - if ssh_cmd "fail2ban-client status" &>> "$LOGFILE"; then - validation_pass "fail2ban is responding" - else - validation_fail "fail2ban not responding" - attribute_issue "fail2ban not functioning" "archsetup" - fi - - # DNS resolution test - step "Testing DNS resolution" - if ssh_cmd "resolvectl query archlinux.org" &>> "$LOGFILE"; then - validation_pass "DNS resolution working" - else - validation_warn "DNS resolution test failed (may be network issue)" - fi - - # Docker functional test (if enabled) - if ssh_cmd "systemctl is-enabled docker" &>> "$LOGFILE"; then - step "Testing Docker functionality" - if ssh_cmd "docker info" &>> "$LOGFILE"; then - validation_pass "Docker is responding" - elif ! ssh_cmd "systemctl is-active --quiet docker"; then - # archsetup enables docker for next boot (enable, not enable --now, - # by design — the daemon is heavy). Validation runs pre-reboot, so - # enabled-but-not-started is the correct installed state. 
- validation_skip "Docker enabled but not started (starts on boot by design)" - else - validation_warn "Docker active but not responding" - fi - fi -} - -#----------------------------------------------------------------------------- -# Developer Tools Validations -#----------------------------------------------------------------------------- - -validate_emacs() { - step "Checking if Emacs is installed" - if ssh_cmd "which emacs" &>> "$LOGFILE"; then - validation_pass "Emacs is installed" - - # Check if config exists - if ssh_cmd "test -d /home/cjennings/.emacs.d"; then - validation_pass "Emacs config directory exists" - - # Check user can access the directory - local result - result=$(ssh_cmd "sudo -u cjennings ls /home/cjennings/.emacs.d > /dev/null 2>&1 && echo OK || echo FAIL") - if [ "$result" = "OK" ]; then - validation_pass "Emacs config readable by user" - else - validation_fail "Emacs config not readable by user (permission issue)" - attribute_issue "Emacs .emacs.d not readable by user" "archsetup" - fi - else - validation_warn "Emacs config directory not found" - fi - else - validation_fail "Emacs not found" - attribute_issue "Emacs not installed" "archsetup" - fi -} - -validate_git_config() { - step "Checking git installation" - if ssh_cmd "which git" &>> "$LOGFILE"; then - validation_pass "git is installed" - else - validation_fail "git not found" - attribute_issue "git not installed" "archsetup" - fi -} - -validate_dev_tools() { - step "Checking developer tools" - local tools="python node npm go rustc" - local missing="" - - for tool in $tools; do - if ! 
ssh_cmd "which $tool" &>> "$LOGFILE"; then - missing="$missing $tool" - fi - done - - if [ -z "$missing" ]; then - validation_pass "Core dev tools installed" - else - validation_warn "Some dev tools missing:$missing" - fi -} - -#----------------------------------------------------------------------------- -# System Configuration Validations -#----------------------------------------------------------------------------- - -validate_zfs_config() { - step "Checking ZFS configuration (if applicable)" - if ssh_cmd "which zfs" &>> "$LOGFILE"; then - # ZFS is installed, check for sanoid - if ssh_cmd "which sanoid" &>> "$LOGFILE"; then - validation_pass "ZFS with sanoid detected" - else - validation_warn "ZFS detected but sanoid not installed" - fi - else - info "ZFS not installed (non-ZFS system)" - fi -} - -validate_boot_config() { - step "Checking GRUB configuration" - if ssh_cmd "test -f /boot/grub/grub.cfg" &>> "$LOGFILE"; then - validation_pass "GRUB config exists" - else - validation_warn "GRUB config not found (may use different bootloader)" - fi -} - -validate_terminus_font() { - step "Checking terminus-font installation" - if ssh_cmd "pacman -Q terminus-font" &>> "$LOGFILE"; then - validation_pass "terminus-font package installed" - else - validation_fail "terminus-font package not installed" - attribute_issue "terminus-font not installed via pacman" "archsetup" - fi -} - -validate_mkinitcpio_hooks() { - step "Checking mkinitcpio HOOKS configuration" - local hooks=$(ssh_cmd "grep '^HOOKS=' /etc/mkinitcpio.conf") - local is_zfs=$(ssh_cmd "findmnt -n -o FSTYPE / 2>/dev/null") - - if [ "$is_zfs" = "zfs" ]; then - # ZFS system: must use udev, not systemd - if echo "$hooks" | grep -q '\budev\b'; then - validation_pass "ZFS system uses udev hook (correct)" - elif echo "$hooks" | grep -q '\bsystemd\b'; then - validation_fail "ZFS system uses systemd hook (will break boot)" - attribute_issue "mkinitcpio uses systemd hook on ZFS system" "archsetup" - else - 
validation_warn "Could not determine init hook type" - fi - else - # Non-ZFS: systemd hook is fine - if echo "$hooks" | grep -q '\bsystemd\b'; then - validation_pass "Non-ZFS system uses systemd hook" - elif echo "$hooks" | grep -q '\budev\b'; then - validation_pass "Non-ZFS system uses udev hook" - fi - fi -} - -validate_initramfs_consolefont() { - step "Checking console font in initramfs" - local font_in_initramfs=$(ssh_cmd "lsinitcpio /boot/initramfs-linux*.img 2>/dev/null | grep -c 'consolefont.psf\\|ter-'") - - if [ "${font_in_initramfs:-0}" -gt 0 ]; then - validation_pass "Console font included in initramfs" - else - validation_warn "Console font may not be in initramfs" - fi -} - -validate_nvme_module() { - step "Checking NVMe module configuration" - local has_nvme=$(ssh_cmd "ls /dev/nvme* 2>/dev/null | head -1") - - if [ -n "$has_nvme" ]; then - # System has NVMe drives - local modules=$(ssh_cmd "grep '^MODULES=' /etc/mkinitcpio.conf") - if echo "$modules" | grep -q 'nvme'; then - validation_pass "NVMe module in mkinitcpio MODULES" - else - validation_warn "NVMe system but nvme not in MODULES (may cause slow boot)" - fi - else - info "No NVMe drives detected" - fi -} - -validate_autologin_config() { - step "Checking autologin configuration" - if ssh_cmd "test -f /etc/systemd/system/getty@tty1.service.d/autologin.conf" &>> "$LOGFILE"; then - validation_pass "Autologin configured" - else - info "Autologin not configured (may be intentional)" - fi -} - -validate_gnome_keyring_setup() { - step "Checking gnome-keyring pre-configuration" - local keyring_dir="/home/cjennings/.local/share/keyrings" - - # Check directory exists - if ! 
ssh_cmd "test -d $keyring_dir"; then - validation_fail "Keyring directory not created" - attribute_issue "gnome-keyring directory not pre-created" "archsetup" - return 1 - fi - - # Check directory permissions (should be 700) - local perms=$(ssh_cmd "stat -c '%a' $keyring_dir") - if [ "$perms" != "700" ]; then - validation_fail "Keyring directory has wrong permissions: $perms (expected 700)" - attribute_issue "gnome-keyring directory wrong permissions" "archsetup" - return 1 - fi - - # Check ownership - local owner=$(ssh_cmd "stat -c '%U' $keyring_dir") - if [ "$owner" != "cjennings" ]; then - validation_fail "Keyring directory owned by $owner (expected cjennings)" - attribute_issue "gnome-keyring directory wrong ownership" "archsetup" - return 1 - fi - - # Check default file exists and contains "login" - local default_keyring=$(ssh_cmd "cat $keyring_dir/default 2>/dev/null") - if [ "$default_keyring" != "login" ]; then - validation_fail "Default keyring not set to 'login' (got: '$default_keyring')" - attribute_issue "gnome-keyring default not set to login" "archsetup" - return 1 - fi - - validation_pass "gnome-keyring pre-configured (default=login, dir=700)" -} - -#----------------------------------------------------------------------------- -# Archsetup-Specific Validations -#----------------------------------------------------------------------------- - -validate_archsetup_log() { - step "Checking archsetup log for errors" - local error_count - # Use grep -h to suppress filenames, then wc -l to count total matches - error_count=$(ssh_cmd "grep -h '^Error:' /var/log/archsetup-*.log 2>/dev/null | wc -l" | tr -d '[:space:]') - error_count=${error_count:-0} - - if [ "$error_count" = "0" ]; then - validation_pass "No errors in archsetup log" - else - validation_fail "Found $error_count errors in archsetup log" - attribute_issue "Errors in archsetup log: $error_count" "archsetup" - fi -} - -validate_state_markers() { - step "Checking archsetup state markers" - local 
state_count=$(ssh_cmd "ls /var/lib/archsetup/state/ 2>/dev/null | wc -l") - - if [ "$state_count" -ge 12 ]; then - validation_pass "All 12 installation steps completed" - else - validation_warn "Only $state_count/12 steps completed" - fi -} - -#============================================================================= # ISSUE REPORTING #============================================================================= @@ -1138,18 +312,3 @@ EOF info "Issue report saved: $report_file" } -#============================================================================= -# MAIN VALIDATION ENTRY POINT -#============================================================================= - -run_full_validation() { - local output_dir="$1" - local archzfs_inbox="${2:-}" - - run_all_validations - analyze_log_diff "$output_dir" - generate_issue_report "$output_dir" "$archzfs_inbox" - - # Return success if no failures - [ $VALIDATION_FAILED -eq 0 ] -} diff --git a/scripts/testing/lib/vm-utils.sh b/scripts/testing/lib/vm-utils.sh index a8736a3..b85e773 100755 --- a/scripts/testing/lib/vm-utils.sh +++ b/scripts/testing/lib/vm-utils.sh @@ -1,4 +1,5 @@ #!/bin/bash +# SPDX-License-Identifier: GPL-3.0-or-later # VM management utilities for archsetup testing (direct QEMU) # Author: Craig Jennings <craigmartinjennings@gmail.com> # License: GNU GPLv3 @@ -10,13 +11,26 @@ # VM configuration defaults VM_CPUS="${VM_CPUS:-4}" -VM_RAM="${VM_RAM:-4096}" # MB +# 8 GiB headroom for AUR builds: makepkg runs -j$VM_CPUS, and parallel cc1plus +# (~700 MB each on heavy C++ packages) OOM-killed under the old 4 GiB default. +VM_RAM="${VM_RAM:-8192}" # MB VM_DISK_SIZE="${VM_DISK_SIZE:-50}" # GB +# Filesystem profile: selects which base image + archangel config the harness +# targets. "btrfs" is the historical default (its image name stays unsuffixed +# so existing base images keep working); "zfs" gets its own image, since the +# two on-disk layouts can't share a disk. Honoured by init_vm_paths below. 
+FS_PROFILE="${FS_PROFILE:-btrfs}" + # SSH configuration SSH_PORT="${SSH_PORT:-2222}" SSH_OPTS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=10" ROOT_PASSWORD="${ROOT_PASSWORD:-archsetup}" +# Set by inject_root_key once a root key is authorized in the VM. When set, the +# ssh/scp helpers add "-i <key>" so they keep working after archsetup hardens +# sshd to PermitRootLogin prohibit-password (which kills root *password* login +# but still allows key auth). Left unquoted at use sites, like SSH_OPTS. +SSH_KEY_OPT="${SSH_KEY_OPT:-}" # OVMF firmware paths OVMF_CODE="/usr/share/edk2/x64/OVMF_CODE.4m.fd" @@ -36,9 +50,22 @@ init_vm_paths() { local images_dir="${1:-$VM_IMAGES_DIR}" [ -z "$images_dir" ] && fatal "VM_IMAGES_DIR not set" + case "$FS_PROFILE" in + btrfs|zfs) ;; + *) fatal "Invalid FS_PROFILE: $FS_PROFILE (must be 'btrfs' or 'zfs')" ;; + esac + VM_IMAGES_DIR="$images_dir" - DISK_PATH="$VM_IMAGES_DIR/archsetup-base.qcow2" - OVMF_VARS="$VM_IMAGES_DIR/OVMF_VARS.fd" + # btrfs keeps the legacy unsuffixed name; other profiles get a suffix so + # their images sit side by side without clobbering each other. + local img_suffix="" + [ "$FS_PROFILE" != "btrfs" ] && img_suffix="-$FS_PROFILE" + DISK_PATH="$VM_IMAGES_DIR/archsetup-base${img_suffix}.qcow2" + # Per-profile NVRAM: UEFI boot entries live here, outside the qcow2, so a + # disk-snapshot revert can't restore them. Sharing one file across profiles + # let a zfs run's ZFSBootMenu entries clobber the btrfs GRUB entry, leaving + # the btrfs base unbootable (no removable ESP fallback to recover from). + OVMF_VARS="$VM_IMAGES_DIR/OVMF_VARS${img_suffix}.fd" PID_FILE="$VM_IMAGES_DIR/qemu.pid" MONITOR_SOCK="$VM_IMAGES_DIR/qemu-monitor.sock" SERIAL_LOG="$VM_IMAGES_DIR/qemu-serial.log" @@ -350,7 +377,7 @@ wait_for_ssh() { progress "Waiting for SSH on localhost:$SSH_PORT..." 
while [ "$elapsed" -lt "$timeout" ]; do - if sshpass -p "$password" ssh $SSH_OPTS -p "$SSH_PORT" root@localhost true 2>/dev/null; then + if sshpass -p "$password" ssh $SSH_OPTS $SSH_KEY_OPT -p "$SSH_PORT" root@localhost true 2>/dev/null; then success "SSH is available" return 0 fi @@ -366,7 +393,7 @@ wait_for_ssh() { vm_exec() { local password="${1:-$ROOT_PASSWORD}" shift - sshpass -p "$password" ssh $SSH_OPTS \ + sshpass -p "$password" ssh $SSH_OPTS $SSH_KEY_OPT \ -o ServerAliveInterval=30 -o ServerAliveCountMax=10 \ -p "$SSH_PORT" root@localhost "$@" 2>> "$LOGFILE" } @@ -378,7 +405,7 @@ copy_to_vm() { local password="${3:-$ROOT_PASSWORD}" step "Copying $(basename "$local_file") to VM:$remote_path" - if sshpass -p "$password" scp $SSH_OPTS -P "$SSH_PORT" \ + if sshpass -p "$password" scp $SSH_OPTS $SSH_KEY_OPT -P "$SSH_PORT" \ "$local_file" "root@localhost:$remote_path" >> "$LOGFILE" 2>&1; then success "File copied to VM" return 0 @@ -395,7 +422,7 @@ copy_from_vm() { local password="${3:-$ROOT_PASSWORD}" step "Copying $remote_file from VM" - if sshpass -p "$password" scp $SSH_OPTS -P "$SSH_PORT" \ + if sshpass -p "$password" scp $SSH_OPTS $SSH_KEY_OPT -P "$SSH_PORT" \ "root@localhost:$remote_file" "$local_path" >> "$LOGFILE" 2>&1; then success "File copied from VM" return 0 @@ -404,3 +431,31 @@ copy_from_vm() { return 1 fi } + +# inject_root_key <key_path> +# Authorize a throwaway root key over the initial password session and switch +# all the helpers above to key auth (sets SSH_KEY_OPT + ROOT_SSH_KEY). Call once, +# right after wait_for_ssh and before running archsetup: archsetup sets +# PermitRootLogin prohibit-password and reloads sshd partway through, which kills +# root *password* login. Without a key in place first, every SSH after that step +# fails and the run aborts before any validation. Key auth survives the hardening. 
+# Targets root@$VM_IP on $SSH_PORT so it works for both the local VM runner +# (VM_IP=localhost, port 2222) and the bare-metal runner (VM_IP=host, port 22). +inject_root_key() { + local key="$1" + rm -f "$key" "$key.pub" + if ! ssh-keygen -t ed25519 -N "" -q -f "$key"; then + warn "Root key generation failed - run may break at sshd hardening" + return 1 + fi + if sshpass -p "$ROOT_PASSWORD" ssh $SSH_OPTS -p "$SSH_PORT" "root@${VM_IP:-localhost}" \ + "mkdir -p /root/.ssh && chmod 700 /root/.ssh && cat >> /root/.ssh/authorized_keys && chmod 600 /root/.ssh/authorized_keys" \ + < "$key.pub" >> "$LOGFILE" 2>&1; then + SSH_KEY_OPT="-i $key" + export ROOT_SSH_KEY="$key" + success "Root SSH key authorized (survives sshd prohibit-password hardening)" + return 0 + fi + warn "Root key authorization failed - run may break at sshd hardening" + return 1 +} diff --git a/scripts/testing/run-test-baremetal.sh b/scripts/testing/run-test-baremetal.sh index b6d1ab1..d22c424 100755 --- a/scripts/testing/run-test-baremetal.sh +++ b/scripts/testing/run-test-baremetal.sh @@ -1,4 +1,5 @@ #!/bin/bash +# SPDX-License-Identifier: GPL-3.0-or-later # Run archsetup test on bare metal ZFS system # Author: Craig Jennings <craigmartinjennings@gmail.com> # License: GNU GPLv3 @@ -19,13 +20,16 @@ PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." 
&& pwd)" # Source utilities source "$SCRIPT_DIR/lib/logging.sh" -source "$SCRIPT_DIR/lib/validation.sh" +source "$SCRIPT_DIR/lib/validation.sh" # live helpers: ssh_cmd, capture_*_state, analyze_log_diff, generate_issue_report +source "$SCRIPT_DIR/lib/vm-utils.sh" # inject_root_key + SSH_OPTS/SSH_KEY_OPT for key auth +source "$SCRIPT_DIR/lib/testinfra.sh" # run_testinfra_validation (authoritative validator) # Parse arguments ROLLBACK_FIRST=false ROLLBACK_AFTER=false TARGET_HOST="" ROOT_PASSWORD="" +PORT="22" usage() { echo "Usage: $0 --host <hostname> --password <root_password> [options]" @@ -35,6 +39,7 @@ usage() { echo " --password <password> Root password for SSH" echo "" echo "Options:" + echo " --port <port> SSH port (default 22; use 2222 to target a test VM)" echo " --rollback-first Roll back to genesis snapshots before running" echo " --rollback-after Roll back to genesis snapshots after test (cleanup)" echo " --validate-only Skip archsetup, only run validation checks" @@ -54,6 +59,10 @@ while [[ $# -gt 0 ]]; do ROOT_PASSWORD="${2:?--password requires a value}" shift 2 ;; + --port) + PORT="${2:?--port requires a value}" + shift 2 + ;; --rollback-first) ROLLBACK_FIRST=true shift @@ -93,9 +102,16 @@ cleanup_baremetal() { } trap cleanup_baremetal EXIT -# Override VM_IP for validation.sh ssh_cmd function -# shellcheck disable=SC2034 # consumed by the sourced validation.sh +# Connection globals consumed by ssh_cmd (validation.sh), inject_root_key +# (vm-utils.sh), and run_testinfra_validation (testinfra.sh). +# shellcheck disable=SC2034 # consumed by the sourced libraries VM_IP="$TARGET_HOST" +# shellcheck disable=SC2034 +SSH_PORT="$PORT" +# Test-user source for testinfra (reads USERNAME); the bare-metal user is the +# archsetup default, cjennings, same as the VM conf. 
+# shellcheck disable=SC2034 +ARCHSETUP_VM_CONF="$SCRIPT_DIR/archsetup-vm.conf" # Initialize logging mkdir -p "$TEST_RESULTS_DIR" @@ -108,8 +124,8 @@ info "Target: $TARGET_HOST" # Test SSH connectivity step "Testing SSH connectivity to $TARGET_HOST" if ! sshpass -p "$ROOT_PASSWORD" ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ - -o ConnectTimeout=10 "root@$TARGET_HOST" "echo connected" &>/dev/null; then - fatal "Cannot connect to $TARGET_HOST via SSH" + -o ConnectTimeout=10 -p "$PORT" "root@$TARGET_HOST" "echo connected" &>/dev/null; then + fatal "Cannot connect to $TARGET_HOST:$PORT via SSH" fi success "SSH connection OK" @@ -146,6 +162,15 @@ if $ROLLBACK_FIRST; then success "Reconnected" fi +# Authorize a throwaway root key before archsetup hardens sshd. archsetup sets +# PermitRootLogin prohibit-password and reloads sshd partway through, which kills +# root *password* SSH; key auth survives it, so every later ssh_cmd and the +# Testinfra sweep keep working. Placed after any genesis rollback so the key +# isn't rolled away. Best-effort: a failure only risks the post-hardening steps. +step "Authorizing throwaway root key (survives sshd hardening)" +inject_root_key "$TEST_RESULTS_DIR/root_key" || \ + warn "Continuing without an injected root key — SSH may fail after archsetup hardens sshd" + if ! $VALIDATE_ONLY; then # Capture pre-install state capture_pre_install_state "$TEST_RESULTS_DIR" @@ -160,7 +185,7 @@ if ! 
$VALIDATE_ONLY; then step "Transferring to $TARGET_HOST" ssh_cmd "rm -rf /tmp/archsetup-test && mkdir -p /tmp/archsetup-test" sshpass -p "$ROOT_PASSWORD" scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ - "$BUNDLE_FILE" "root@$TARGET_HOST:/tmp/archsetup.bundle" >> "$LOGFILE" 2>&1 + ${SSH_KEY_OPT:-} -P "$PORT" "$BUNDLE_FILE" "root@$TARGET_HOST:/tmp/archsetup.bundle" >> "$LOGFILE" 2>&1 step "Extracting on target" ssh_cmd "cd /tmp && git clone /tmp/archsetup.bundle archsetup-test && rm /tmp/archsetup.bundle" >> "$LOGFILE" 2>&1 @@ -222,12 +247,12 @@ if ! $VALIDATE_ONLY; then step "Copying archsetup log" sshpass -p "$ROOT_PASSWORD" scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ - "root@$TARGET_HOST:/var/log/archsetup-*.log" "$TEST_RESULTS_DIR/" 2>> "$LOGFILE" || \ + ${SSH_KEY_OPT:-} -P "$PORT" "root@$TARGET_HOST:/var/log/archsetup-*.log" "$TEST_RESULTS_DIR/" 2>> "$LOGFILE" || \ warn "Could not copy archsetup log" step "Copying archsetup output" sshpass -p "$ROOT_PASSWORD" scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ - "root@$TARGET_HOST:$REMOTE_LOG" "$TEST_RESULTS_DIR/archsetup-output.log" 2>> "$LOGFILE" || \ + ${SSH_KEY_OPT:-} -P "$PORT" "root@$TARGET_HOST:$REMOTE_LOG" "$TEST_RESULTS_DIR/archsetup-output.log" 2>> "$LOGFILE" || \ warn "Could not copy output log" # Capture post-install state @@ -238,13 +263,14 @@ else mkdir -p "$TEST_RESULTS_DIR/pre-install" "$TEST_RESULTS_DIR/post-install" fi -# Run validations -run_all_validations -validate_all_services - -# Additional ZFS-specific validations -section "ZFS-Specific Validations" -validate_zfs_services +# Run validations. Testinfra is the authoritative validator (same as the VM +# runner); its ZFS-conditional pytest checks cover what validate_zfs_services +# used to, and it connects over the key authorized above. 
set +e because it +# returns pytest's rc (non-zero on failures) and the report + summary below must +# still run; the verdict is carried by testinfra_rc and the exit code at the end. +set +e +run_testinfra_validation "$TEST_RESULTS_DIR" +testinfra_rc=$? # Analyze logs if we ran archsetup if ! $VALIDATE_ONLY; then @@ -254,8 +280,8 @@ fi # Generate reports generate_issue_report "$TEST_RESULTS_DIR" "$ARCHZFS_INBOX" -# Set validation result (TEST_PASSED is the boolean; VALIDATION_PASSED stays the counter) -if [ "$VALIDATION_FAILED" -eq 0 ]; then +# The run passes only if the Testinfra sweep passed. +if [ "$testinfra_rc" -eq 0 ]; then TEST_PASSED=true else TEST_PASSED=false diff --git a/scripts/testing/run-test.sh b/scripts/testing/run-test.sh index 5830ed9..f962df3 100755 --- a/scripts/testing/run-test.sh +++ b/scripts/testing/run-test.sh @@ -1,4 +1,5 @@ #!/bin/bash +# SPDX-License-Identifier: GPL-3.0-or-later # Run archsetup test in a VM using snapshots # Author: Craig Jennings <craigmartinjennings@gmail.com> # License: GNU GPLv3 @@ -23,6 +24,7 @@ source "$SCRIPT_DIR/lib/logging.sh" source "$SCRIPT_DIR/lib/vm-utils.sh" source "$SCRIPT_DIR/lib/network-diagnostics.sh" source "$SCRIPT_DIR/lib/validation.sh" +source "$SCRIPT_DIR/lib/testinfra.sh" # Parse arguments KEEP_VM=false @@ -48,6 +50,9 @@ while [[ $# -gt 0 ]]; do echo " --keep Keep VM in post-test state (for debugging)" echo " --script Specify custom archsetup script to test" echo " --snapshot Snapshot name to revert to (default: clean-install)" + echo "" + echo "Env: FS_PROFILE=btrfs|zfs (default btrfs) selects the base image" + echo " built by create-base-vm.sh. e.g. FS_PROFILE=zfs $0" exit 1 ;; esac @@ -98,6 +103,7 @@ init_logging "$LOGFILE" init_vm_paths "$VM_IMAGES_DIR" section "ArchSetup Test Run: $TIMESTAMP" +info "Filesystem profile: $FS_PROFILE (image: $(basename "$DISK_PATH"))" # Verify archsetup script exists if [ ! -f "$ARCHSETUP_SCRIPT" ]; then @@ -106,7 +112,11 @@ fi # Check disk exists if [ ! 
-f "$DISK_PATH" ]; then - info "Create it first: ./scripts/testing/create-base-vm.sh" + if [ "$FS_PROFILE" = "btrfs" ]; then + info "Create it first: ./scripts/testing/create-base-vm.sh" + else + info "Create it first: FS_PROFILE=$FS_PROFILE ./scripts/testing/create-base-vm.sh" + fi fatal "Base disk not found: $DISK_PATH" fi @@ -140,6 +150,13 @@ start_qemu "$DISK_PATH" "disk" "" "none" || fatal "Failed to start VM" wait_for_ssh "$ROOT_PASSWORD" 120 || fatal "VM SSH not available" stop_timer "boot" +# Authorize a root key now, before archsetup runs. archsetup hardens sshd to +# PermitRootLogin prohibit-password partway through, which breaks the harness's +# root password SSH; key auth survives it. Without this, the run aborts mid-way +# (before any validation) once the hardening step lands. +inject_root_key "$TEST_RESULTS_DIR/root_key" || \ + warn "Continuing without root key - run may break at the sshd hardening step" + # Run network diagnostics if ! run_network_diagnostics; then fatal "Network diagnostics failed - aborting test" @@ -240,7 +257,8 @@ fi # Poll for completion step "Monitoring archsetup progress (polling every 30 seconds)..." POLL_COUNT=0 -MAX_POLLS=180 # 90 minutes max (180 * 30 seconds) +MAX_POLLS=300 # 150 minutes max (300 * 30 seconds); a full install with heavy + # AUR builds (e.g. vagrant) can exceed 90 min on a slow mirror while [ $POLL_COUNT -lt $MAX_POLLS ]; do # Check if archsetup process is still running @@ -261,7 +279,7 @@ while [ $POLL_COUNT -lt $MAX_POLLS ]; do done if [ $POLL_COUNT -ge $MAX_POLLS ]; then - error "ArchSetup timed out after 90 minutes" + error "ArchSetup timed out after 150 minutes" ARCHSETUP_EXIT_CODE=124 else # Get exit code from the remote log @@ -307,18 +325,17 @@ copy_from_vm "/var/log/archsetup-installed-packages.txt" "$TEST_RESULTS_DIR/" "$ # Capture post-install state capture_post_install_state "$TEST_RESULTS_DIR" -# Run comprehensive validation -# This uses the validation.sh library for all checks. 
+# Run comprehensive validation (Testinfra/pytest is the primary validator; the +# old shell run_all_validations sweep was retired once pytest reached parity). # # From here to the end of the script, errexit is disabled on purpose: the -# validation functions are designed to fail-and-count (see VALIDATION_FAILED) -# rather than abort, and the analysis/report-generation steps below can also -# legitimately return non-zero. With `set -e` active, a single failed check -# would kill the run before the test report is written or the VM is cleaned -# up. Pass/fail is signalled explicitly by the exit code at the bottom. +# analysis/report-generation steps below can legitimately return non-zero, and +# with `set -e` active a single failed check would kill the run before the test +# report is written or the VM is cleaned up. Pass/fail is signalled explicitly +# by the exit code at the bottom. set +e -run_all_validations -validate_all_services +run_testinfra_validation "$TEST_RESULTS_DIR" +testinfra_rc=$? # Analyze log differences (pre vs post install) analyze_log_diff "$TEST_RESULTS_DIR" @@ -327,8 +344,8 @@ analyze_log_diff "$TEST_RESULTS_DIR" # If base install issues found and archzfs inbox exists, create issue file generate_issue_report "$TEST_RESULTS_DIR" "$ARCHZFS_INBOX" -# Set validation result based on failure count -if [ "$VALIDATION_FAILED" -eq 0 ]; then +# The run passes only if the Testinfra sweep passed. 
+if [ "$testinfra_rc" -eq 0 ]; then
     TEST_PASSED=true
 else
     TEST_PASSED=false
diff --git a/scripts/testing/setup-testing-env.sh b/scripts/testing/setup-testing-env.sh
index fb0628b..b5b584f 100755
--- a/scripts/testing/setup-testing-env.sh
+++ b/scripts/testing/setup-testing-env.sh
@@ -1,4 +1,5 @@
 #!/bin/bash
+# SPDX-License-Identifier: GPL-3.0-or-later
 # Setup testing environment for archsetup
 # Author: Craig Jennings <craigmartinjennings@gmail.com>
 # License: GNU GPLv3
diff --git a/scripts/testing/tests/conftest.py b/scripts/testing/tests/conftest.py
new file mode 100644
index 0000000..680c967
--- /dev/null
+++ b/scripts/testing/tests/conftest.py
@@ -0,0 +1,111 @@
+# SPDX-License-Identifier: GPL-3.0-or-later
+"""Pytest + Testinfra config for archsetup post-install validation.
+
+These tests run on the *host* and connect to the freshly-installed VM over SSH
+(Testinfra provides the `host` fixture, parametrized from --hosts). This file
+adds two things the bespoke shell harness had that Testinfra does not:
+
+  - Failure attribution. Each check is marked with the layer that owns a
+    failure (archsetup | base_install | unknown), mirroring validation.sh's
+    attribute_issue. Failures are bucketed and written to --attribution-file
+    so run-test.sh can route base-install issues to the archzfs inbox as before.
+  - Tiering markers (smoke | integration) so `pytest -m smoke` is a fast gate.
+
+The `target_user` fixture supplies the account archsetup created; it reads
+ARCHSETUP_TEST_USER (set by run-test.sh from the VM conf) and defaults to the
+historical "cjennings".
+"""
+
+import os
+
+import pytest
+
+
+_ATTRIBUTION_BUCKETS = ("archsetup", "base_install", "unknown")
+_failures = {bucket: [] for bucket in _ATTRIBUTION_BUCKETS}
+
+
+def pytest_addoption(parser):
+    parser.addoption(
+        "--attribution-file",
+        action="store",
+        default=None,
+        help="write the failure attribution report (archsetup/base_install/unknown) here",
+    )
+
+
+def pytest_configure(config):
+    config.addinivalue_line(
+        "markers",
+        "attribution(bucket): layer that owns a failure — archsetup, base_install, or unknown",
+    )
+    config.addinivalue_line("markers", "smoke: fast subset (user, key packages, dotfiles present)")
+    config.addinivalue_line("markers", "integration: full post-install checks")
+
+
+@pytest.hookimpl(wrapper=True)
+def pytest_runtest_makereport(item, call):
+    report = yield
+    if report.when == "call" and report.failed:
+        marker = item.get_closest_marker("attribution")
+        bucket = marker.args[0] if (marker and marker.args) else "archsetup"
+        if bucket not in _failures:
+            bucket = "unknown"
+        _failures[bucket].append(item.nodeid)
+    return report
+
+
+def pytest_sessionfinish(session, exitstatus):
+    path = session.config.getoption("--attribution-file")
+    if not path:
+        return
+    with open(path, "w") as fh:
+        for bucket in _ATTRIBUTION_BUCKETS:
+            fh.write("[%s]\n" % bucket)
+            for nodeid in _failures[bucket]:
+                fh.write(" %s\n" % nodeid)
+
+
+@pytest.fixture(scope="session")
+def target_user():
+    """The account archsetup created in the VM under test."""
+    return os.environ.get("ARCHSETUP_TEST_USER", "cjennings")
+
+
+@pytest.fixture(scope="session")
+def home(target_user):
+    return "/home/%s" % target_user
+
+
+@pytest.fixture(scope="module")
+def zfs_root(host):
+    """True when the VM's root filesystem is ZFS (gates ZFS-specific checks)."""
+    return host.run("findmnt -n -o FSTYPE /").stdout.strip() == "zfs"
+
+
+@pytest.fixture(scope="module")
+def has_nvme(host):
+    """True when the VM exposes an NVMe device."""
+    return host.run("ls /dev/nvme0n1 2>/dev/null").rc == 0
+
+
+@pytest.fixture(scope="module")
+def hyprland_installed(host):
+    return host.package("hyprland").is_installed
+
+
+@pytest.fixture(scope="module")
+def dwm_installed(host):
+    return host.file("/usr/local/bin/dwm").exists
+
+
+@pytest.fixture(scope="module")
+def compositor_running(host):
+    """A graphical session is live (gates socket/portal checks that need one)."""
+    return host.run("pgrep -x Hyprland").rc == 0
+
+
+@pytest.fixture(scope="module")
+def on_slirp(host):
+    """QEMU user-mode networking (10.0.2.x) — no multicast, so mDNS can't work."""
+    return "10.0.2." in host.run("ip -4 addr show").stdout
diff --git a/scripts/testing/tests/test_archsetup.py b/scripts/testing/tests/test_archsetup.py
new file mode 100644
index 0000000..52fe3f7
--- /dev/null
+++ b/scripts/testing/tests/test_archsetup.py
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: GPL-3.0-or-later
+"""Post-install checks: archsetup's own log and state markers.
+
+Parity port of validate_archsetup_log and validate_state_markers.
+"""
+
+import pytest
+
+
+EXPECTED_STATE_STEPS = 12
+
+
+@pytest.mark.attribution("archsetup")
+def test_no_errors_in_archsetup_log(host):
+    out = host.run("grep -h '^Error:' /var/log/archsetup-*.log 2>/dev/null | wc -l")
+    count = int((out.stdout.strip() or "0"))
+    assert count == 0, "archsetup log reported %d Error: lines" % count
+
+
+@pytest.mark.attribution("archsetup")
+def test_all_install_steps_completed(host):
+    out = host.run("ls /var/lib/archsetup/state/ 2>/dev/null | wc -l")
+    count = int((out.stdout.strip() or "0"))
+    assert count >= EXPECTED_STATE_STEPS, (
+        "only %d/%d install steps completed" % (count, EXPECTED_STATE_STEPS)
+    )
diff --git a/scripts/testing/tests/test_backups.py b/scripts/testing/tests/test_backups.py
new file mode 100644
index 0000000..07da5ec
--- /dev/null
+++ b/scripts/testing/tests/test_backups.py
@@ -0,0 +1,44 @@
+# SPDX-License-Identifier: GPL-3.0-or-later
+"""Post-install checks: backup_system_file ran during a real install.
+
+Expansion coverage (P4). The unit suite (tests/backup-system-file/) covers the
+helper's logic; this confirms it actually fires end-to-end — archsetup leaves a
+<file>.archsetup.bak next to each pre-existing file it edits in place.
+
+These targets are edited unconditionally on every run (pacman.conf/makepkg.conf
+always sed'd, sudoers always appended), so their backups must exist.
+mkinitcpio.conf is edited only conditionally (the systemd-hook switch on
+non-ZFS, or the nvme module on NVMe systems), so it gets its own fixture-gated
+check below. Conditionally-edited files (locale.gen, geoclue, fstab) aren't
+asserted here since their edits depend on the base image.
+"""
+
+import pytest
+
+
+ALWAYS_BACKED_UP = [
+    "/etc/pacman.conf",
+    "/etc/makepkg.conf",
+    "/etc/sudoers",
+]
+
+
+@pytest.mark.attribution("archsetup")
+@pytest.mark.parametrize("path", ALWAYS_BACKED_UP)
+def test_backup_created_for_edited_file(host, path):
+    bak = host.file(path + ".archsetup.bak")
+    assert bak.exists, "%s.archsetup.bak missing — backup_system_file did not fire" % path
+    assert bak.is_file
+
+
+@pytest.mark.attribution("archsetup")
+def test_backup_created_for_mkinitcpio(host, zfs_root, has_nvme):
+    # archsetup edits /etc/mkinitcpio.conf only when it has something to change:
+    # the systemd-hook switch (non-ZFS only) or adding the nvme module (NVMe
+    # systems). A ZFS root with no NVMe touches neither, so there's no backup.
+    if zfs_root and not has_nvme:
+        pytest.skip("ZFS root + no NVMe: archsetup doesn't edit mkinitcpio.conf")
+    bak = host.file("/etc/mkinitcpio.conf.archsetup.bak")
+    assert bak.exists, \
+        "/etc/mkinitcpio.conf.archsetup.bak missing — backup_system_file did not fire"
+    assert bak.is_file
diff --git a/scripts/testing/tests/test_boot.py b/scripts/testing/tests/test_boot.py
new file mode 100644
index 0000000..78b4404
--- /dev/null
+++ b/scripts/testing/tests/test_boot.py
@@ -0,0 +1,67 @@
+# SPDX-License-Identifier: GPL-3.0-or-later
+"""Post-install checks: boot, initramfs, and filesystem config.
+
+Parity port of validate_zfs_config, validate_boot_config,
+validate_mkinitcpio_hooks, validate_initramfs_consolefont, validate_nvme_module.
+Filesystem/hardware-specific checks are gated on fixtures.
+"""
+
+import pytest
+
+
+@pytest.mark.attribution("archsetup")
+def test_bootloader_installed(host, zfs_root):
+    # A ZFS root boots via ZFSBootMenu (archangel installs the EFI binary under
+    # /efi/EFI/ZBM), so there is no GRUB; a non-ZFS root uses GRUB.
+    if zfs_root:
+        assert host.file("/efi/EFI/ZBM/zfsbootmenu.efi").exists, \
+            "ZFS root must have the ZFSBootMenu EFI binary"
+    else:
+        assert host.file("/boot/grub/grub.cfg").exists, \
+            "non-ZFS root must have a GRUB config"
+
+
+@pytest.mark.attribution("archsetup")
+def test_mkinitcpio_hooks(host, zfs_root):
+    hooks = host.run("grep '^HOOKS=' /etc/mkinitcpio.conf").stdout
+    if zfs_root:
+        # ZFS must use the udev hook; the systemd hook breaks a ZFS boot.
+        assert " udev" in hooks or "(udev" in hooks, "ZFS root must use the udev hook"
+        assert "systemd" not in hooks, "ZFS root must not use the systemd hook"
+    else:
+        # Non-ZFS: either hook is acceptable.
+        assert ("systemd" in hooks) or ("udev" in hooks)
+
+
+@pytest.mark.attribution("archsetup")
+def test_console_font_configured(host, zfs_root):
+    # archsetup sets FONT=ter-132n in /etc/vconsole.conf on every run.
+    assert host.file("/etc/vconsole.conf").contains("^FONT=ter-132n"), \
+        "archsetup should set FONT=ter-132n in /etc/vconsole.conf"
+    # On non-ZFS it also rebuilds the initramfs (mkinitcpio -P) so the font is
+    # baked in for early boot. On ZFS that rebuild is skipped (the busybox ZFS
+    # hook is incompatible with the systemd-hook switch), so the font applies at
+    # the vconsole layer once userspace starts, not inside the initramfs.
+    if zfs_root:
+        return
+    # Pick the main initramfs (this fleet runs linux-lts, so the name is
+    # initramfs-linux-lts.img, not initramfs-linux.img); skip the fallback image.
+    img = host.run(
+        "ls /boot/initramfs-*.img 2>/dev/null | grep -v fallback | head -1"
+    ).stdout.strip()
+    assert img, "no initramfs image found under /boot"
+    out = host.run("lsinitcpio %s 2>/dev/null | grep -cE 'consolefont.psf|ter-'" % img)
+    assert int((out.stdout.strip() or "0")) > 0, "console font not found in %s" % img
+
+
+def test_nvme_module_when_nvme_present(host, has_nvme):
+    if not has_nvme:
+        pytest.skip("no NVMe device present")
+    modules = host.run("grep '^MODULES=' /etc/mkinitcpio.conf").stdout
+    assert "nvme" in modules, "NVMe system should list nvme in mkinitcpio MODULES"
+
+
+def test_zfs_has_sanoid(host):
+    if not host.exists("zfs"):
+        pytest.skip("ZFS not installed (non-ZFS system)")
+    assert host.exists("sanoid"), "ZFS system should have sanoid installed"
diff --git a/scripts/testing/tests/test_config_applied.py b/scripts/testing/tests/test_config_applied.py
new file mode 100644
index 0000000..00c410e
--- /dev/null
+++ b/scripts/testing/tests/test_config_applied.py
@@ -0,0 +1,55 @@
+# SPDX-License-Identifier: GPL-3.0-or-later
+"""Post-install checks: archsetup's in-place config edits actually took effect.
+
+Expansion coverage (P4). These assert the *content* archsetup writes, not just
+that a service is enabled — catching cases where a sed silently no-ops (e.g.
+ParallelDownloads, which current Arch ships uncommented so a "^#"-only match
+left it at 5).
+"""
+
+import pytest
+
+
+@pytest.mark.attribution("archsetup")
+def test_pacman_parallel_downloads(host):
+    line = host.run("grep -E '^ParallelDownloads' /etc/pacman.conf").stdout
+    assert "ParallelDownloads = 10" in line, "ParallelDownloads not set to 10 (got: %r)" % line
+
+
+@pytest.mark.attribution("archsetup")
+def test_pacman_color_enabled(host):
+    assert host.run("grep -qx Color /etc/pacman.conf").rc == 0
+
+
+@pytest.mark.attribution("archsetup")
+def test_pacman_multilib_enabled(host):
+    # -F: [multilib] is a literal section header, not a regex character class.
+ assert host.run("grep -Fxq '[multilib]' /etc/pacman.conf").rc == 0 + + +@pytest.mark.attribution("archsetup") +def test_makepkg_parallel_make(host): + line = host.run("grep -E '^MAKEFLAGS' /etc/makepkg.conf").stdout + assert "-j" in line, "MAKEFLAGS not configured for parallel make (got: %r)" % line + + +@pytest.mark.attribution("archsetup") +def test_makepkg_options_trimmed(host): + opts = host.run("grep -E '^OPTIONS' /etc/makepkg.conf").stdout + assert "!debug" in opts and "purge" in opts, "makepkg OPTIONS not customized" + + +@pytest.mark.attribution("archsetup") +@pytest.mark.parametrize("rel", ["dns.conf", "wifi-privacy.conf"]) +def test_networkmanager_dropin(host, rel): + assert host.file("/etc/NetworkManager/conf.d/%s" % rel).exists + + +@pytest.mark.attribution("archsetup") +def test_fail2ban_jail_local(host): + assert host.file("/etc/fail2ban/jail.local").exists + + +@pytest.mark.attribution("archsetup") +def test_reflector_config(host): + assert host.file("/etc/xdg/reflector/reflector.conf").exists diff --git a/scripts/testing/tests/test_desktop.py b/scripts/testing/tests/test_desktop.py new file mode 100644 index 0000000..c02d2b6 --- /dev/null +++ b/scripts/testing/tests/test_desktop.py @@ -0,0 +1,111 @@ +# SPDX-License-Identifier: GPL-3.0-or-later +"""Post-install checks: window manager + desktop integration. + +Parity port of validate_window_manager and its Hyprland/DWM branches, plus +validate_autologin_config. Hyprland and DWM checks are gated on which DE the +run installed; socket/portal-query checks are gated on a live compositor (the +headless test VM has none). + +Note: validate_hyprland_tools historically checked `swww`, but archsetup now +installs `awww` (swww successor) and `pacman -Q swww` no longer matches — so +this checks awww. That divergence from the shell sweep is a correctness fix. 
+""" + +import pytest + + +HYPRLAND_TOOLS = [ + "hyprland", "hypridle", "hyprlock", "waybar", "fuzzel", + "awww", "grim", "slurp", "gammastep", "foot", +] + +HYPRLAND_CONFIGS = [ + ".config/hypr/hyprland.conf", + ".config/hypr/hypridle.conf", + ".config/hypr/hyprlock.conf", + ".config/waybar/config", + ".config/fuzzel/fuzzel.ini", + ".config/gammastep/config.ini", +] + +SUCKLESS_TOOLS = ["dwm", "st", "dmenu", "slock"] + +PORTALS_CONF = ".config/xdg-desktop-portal/portals.conf" + + +@pytest.mark.attribution("archsetup") +@pytest.mark.parametrize("pkg", HYPRLAND_TOOLS) +def test_hyprland_tool_installed(host, hyprland_installed, pkg): + if not hyprland_installed: + pytest.skip("Hyprland not installed (DESKTOP_ENV != hyprland)") + assert host.package(pkg).is_installed, "%s missing" % pkg + + +@pytest.mark.attribution("archsetup") +@pytest.mark.parametrize("rel", HYPRLAND_CONFIGS) +def test_hyprland_config_present(host, hyprland_installed, home, rel): + if not hyprland_installed: + pytest.skip("Hyprland not installed (DESKTOP_ENV != hyprland)") + assert host.file("%s/%s" % (home, rel)).exists, "%s missing" % rel + + +@pytest.mark.attribution("archsetup") +def test_live_update_guard_installed(host, hyprland_installed): + if not hyprland_installed: + pytest.skip("Hyprland not installed (DESKTOP_ENV != hyprland)") + guard = host.file("/usr/local/bin/hypr-live-update-guard") + assert guard.exists, "live-update guard script missing" + assert guard.mode & 0o111, "live-update guard not executable" + hook = host.file("/etc/pacman.d/hooks/hypr-live-update-guard.hook") + assert hook.exists, "live-update guard pacman hook missing" + assert "hypr-live-update-guard" in hook.content_string, \ + "hook does not invoke the guard script" + + +@pytest.mark.attribution("archsetup") +def test_portal_settings_backend_not_disabled(host, hyprland_installed, home): + if not hyprland_installed: + pytest.skip("Hyprland not installed") + conf = host.file("%s/%s" % (home, PORTALS_CONF)) + assert 
+ conf.exists, "portals.conf missing" + line = host.run( + "grep org.freedesktop.impl.portal.Settings %s" % conf.path + ).stdout + assert "=none" not in line.replace(" ", ""), "Settings portal disabled (=none)" + + +def test_portal_returns_dark_mode(host, hyprland_installed, compositor_running, target_user): + if not hyprland_installed: + pytest.skip("Hyprland not installed") + if not compositor_running: + pytest.skip("no compositor running (headless) — portal query not applicable") + cmd = ( + "sudo -u %s busctl --user call org.freedesktop.portal.Desktop " + "/org/freedesktop/portal/desktop org.freedesktop.portal.Settings Read " + "'ss' 'org.freedesktop.appearance' 'color-scheme'" % target_user + ) + out = host.run(cmd).stdout + assert "u 1" in out, "Settings portal should report color-scheme=1 (dark)" + + +def test_hyprland_socket(host, hyprland_installed, compositor_running): + if not hyprland_installed: + pytest.skip("Hyprland not installed") + if not compositor_running: + pytest.skip("Hyprland not running (headless) — socket check not applicable") + assert host.run("test -S /tmp/hypr/*/.socket.sock || test -S /run/user/*/hypr/*/.socket.sock").rc == 0 + + +@pytest.mark.attribution("archsetup") +@pytest.mark.parametrize("tool", SUCKLESS_TOOLS) +def test_suckless_tool_installed(host, dwm_installed, tool): + if not dwm_installed: + pytest.skip("DWM not installed (DESKTOP_ENV != dwm)") + assert host.file("/usr/local/bin/%s" % tool).exists, "%s missing" % tool + + +def test_autologin_configured(host): + conf = host.file("/etc/systemd/system/getty@tty1.service.d/autologin.conf") + if not conf.exists: + pytest.skip("autologin not configured (AUTOLOGIN=no, may be intentional)") + assert conf.exists diff --git a/scripts/testing/tests/test_dotfiles.py b/scripts/testing/tests/test_dotfiles.py new file mode 100644 index 0000000..cd6e474 --- /dev/null +++ b/scripts/testing/tests/test_dotfiles.py @@ -0,0 +1,19 @@ +# SPDX-License-Identifier: GPL-3.0-or-later +"""Post-install checks: dotfiles stowed for the user. 
+ +Parity port of validate_dotfiles from validation.sh: .zshrc must be a symlink +into the ~/.dotfiles stow tree, not broken, and readable by the user (not just +root). +""" + +import pytest + + +@pytest.mark.attribution("archsetup") +def test_zshrc_stowed_and_readable(host, target_user): + zshrc = host.file("/home/%s/.zshrc" % target_user) + assert zshrc.is_symlink, ".zshrc should be a stow symlink" + assert ".dotfiles/" in zshrc.linked_to, "symlink should point into ~/.dotfiles" + assert zshrc.exists, "symlink target must exist (not broken)" + # Readable by the user, not only root. + assert host.run("sudo -u %s test -r %s" % (target_user, zshrc.path)).rc == 0 diff --git a/scripts/testing/tests/test_hardening.py b/scripts/testing/tests/test_hardening.py new file mode 100644 index 0000000..f12b0e6 --- /dev/null +++ b/scripts/testing/tests/test_hardening.py @@ -0,0 +1,50 @@ +# SPDX-License-Identifier: GPL-3.0-or-later +"""Post-install checks: security/system hardening archsetup applies. + +Expansion coverage (P4) — these were not in the original shell sweep. They +assert the system-level changes archsetup makes in place: sshd root hardening, +quiet kernel console, an emptied /etc/issue, the console font, and the EFI +mount permission tightening. +""" + +import pytest + + +@pytest.mark.smoke +@pytest.mark.attribution("archsetup") +def test_sshd_root_prohibit_password(host): + conf = host.file("/etc/ssh/sshd_config.d/10-hardening.conf") + assert conf.exists, "sshd hardening drop-in missing" + assert "PermitRootLogin prohibit-password" in conf.content_string + + +@pytest.mark.attribution("archsetup") +def test_quiet_printk_sysctl(host): + conf = host.file("/etc/sysctl.d/20-quiet-printk.conf") + assert conf.exists + assert "kernel.printk" in conf.content_string + + +@pytest.mark.attribution("archsetup") +def test_issue_emptied(host): + # archsetup truncates /etc/issue to drop the distro/date banner. 
+ assert host.file("/etc/issue").size == 0 + + +@pytest.mark.attribution("archsetup") +def test_console_font_configured(host): + assert "ter-132n" in host.file("/etc/vconsole.conf").content_string + + +@pytest.mark.attribution("archsetup") +def test_efi_mount_permissions_tightened(host): + # archsetup adds fmask/dmask to the /efi vfat line so it isn't world-readable. + fstab = host.file("/etc/fstab").content_string + efi_lines = [ + ln for ln in fstab.splitlines() + if ln.strip() and not ln.lstrip().startswith("#") + and " /efi " in ln and " vfat " in ln + ] + if not efi_lines: + pytest.skip("no /efi vfat line in fstab") + assert all("fmask=" in ln for ln in efi_lines), "/efi mount not permission-tightened" diff --git a/scripts/testing/tests/test_keyring.py b/scripts/testing/tests/test_keyring.py new file mode 100644 index 0000000..99d322d --- /dev/null +++ b/scripts/testing/tests/test_keyring.py @@ -0,0 +1,35 @@ +# SPDX-License-Identifier: GPL-3.0-or-later +"""Post-install checks: gnome-keyring pre-configuration. + +Parity port of validate_gnome_keyring_setup: the keyrings dir must exist, be +mode 700, owned by the user, with the default keyring set to "login". 
+""" + +import pytest + + +@pytest.fixture(scope="session") +def keyring_dir(home): + return "%s/.local/share/keyrings" % home + + +@pytest.mark.attribution("archsetup") +def test_keyring_dir_exists(host, keyring_dir): + assert host.file(keyring_dir).is_directory + + +@pytest.mark.attribution("archsetup") +def test_keyring_dir_mode_700(host, keyring_dir): + assert host.file(keyring_dir).mode == 0o700 + + +@pytest.mark.attribution("archsetup") +def test_keyring_dir_owned_by_user(host, keyring_dir, target_user): + assert host.file(keyring_dir).user == target_user + + +@pytest.mark.attribution("archsetup") +def test_default_keyring_is_login(host, keyring_dir): + default = host.file("%s/default" % keyring_dir) + assert default.exists, "default keyring file missing" + assert default.content_string.strip() == "login" diff --git a/scripts/testing/tests/test_packages.py b/scripts/testing/tests/test_packages.py new file mode 100644 index 0000000..f237088 --- /dev/null +++ b/scripts/testing/tests/test_packages.py @@ -0,0 +1,60 @@ +# SPDX-License-Identifier: GPL-3.0-or-later +"""Post-install checks: package managers and key packages. + +Parity port of validate_yay_installed, validate_pacman_working, +validate_terminus_font, validate_emacs, validate_git_config, validate_dev_tools. +""" + +import pytest + + +DEV_TOOLS = ["python", "node", "npm", "go", "rustc"] + + +@pytest.mark.smoke +@pytest.mark.attribution("archsetup") +def test_yay_installed(host): + assert host.exists("yay"), "yay binary not on PATH" + + +@pytest.mark.attribution("archsetup") +def test_yay_functional(host, target_user): + # yay must actually query the package DB as the user, not just exist. 
+ assert host.run("sudo -u %s yay -Qi yay" % target_user).rc == 0 + + +@pytest.mark.smoke +@pytest.mark.attribution("unknown") +def test_pacman_functional(host): + assert host.package("base").is_installed + + +@pytest.mark.attribution("archsetup") +def test_terminus_font_installed(host): + assert host.package("terminus-font").is_installed + + +@pytest.mark.attribution("archsetup") +def test_emacs_installed(host): + assert host.exists("emacs") + + +@pytest.mark.attribution("archsetup") +def test_emacs_config_readable_by_user(host, target_user, home): + emacsd = host.file("%s/.emacs.d" % home) + if not emacsd.exists: + pytest.skip(".emacs.d not present (config dir optional on some profiles)") + assert emacsd.is_directory + assert host.run("sudo -u %s ls %s" % (target_user, emacsd.path)).rc == 0 + + +@pytest.mark.smoke +@pytest.mark.attribution("archsetup") +def test_git_installed(host): + assert host.exists("git") + + +@pytest.mark.attribution("archsetup") +@pytest.mark.parametrize("tool", DEV_TOOLS) +def test_dev_tool_present(host, tool): + assert host.exists(tool), "dev tool %s missing from PATH" % tool diff --git a/scripts/testing/tests/test_services.py b/scripts/testing/tests/test_services.py new file mode 100644 index 0000000..0ca3970 --- /dev/null +++ b/scripts/testing/tests/test_services.py @@ -0,0 +1,103 @@ +# SPDX-License-Identifier: GPL-3.0-or-later +"""Post-install checks: services, timers, and their functional health. + +Parity port of validate_firewall, validate_dns_config, validate_avahi, +validate_fail2ban, validate_networkmanager, and validate_all_services / +validate_service_functions. + +Mapping of the shell sweep's three outcomes: + - validation_fail (hard) -> assert + - validation_warn (soft) -> pytest.skip with the reason (visible, never red) + - validation_skip (precond)-> pytest.skip gated on a fixture +""" + +import pytest + + +# Required services: (name, must_be_active). 
ufw can't activate in the VM (no +# iptables kernel modules), so it's enabled-only; cronie/atd are enabled-only too. +REQUIRED_ENABLED_ACTIVE = ["sshd", "systemd-resolved", "fail2ban", "NetworkManager", "rngd"] +REQUIRED_ENABLED_ONLY = ["ufw", "cronie", "atd"] +REQUIRED_TIMERS = ["reflector.timer", "paccache.timer"] +OPTIONAL_SERVICES = ["avahi-daemon", "bluetooth", "cups", "docker", "tailscaled"] + + +@pytest.mark.attribution("archsetup") +@pytest.mark.parametrize("svc", REQUIRED_ENABLED_ACTIVE) +def test_required_service_enabled_and_active(host, svc): + s = host.service(svc) + assert s.is_enabled, "%s should be enabled" % svc + assert s.is_running, "%s should be active" % svc + + +@pytest.mark.smoke +@pytest.mark.attribution("archsetup") +@pytest.mark.parametrize("svc", REQUIRED_ENABLED_ONLY) +def test_required_service_enabled(host, svc): + assert host.service(svc).is_enabled, "%s should be enabled" % svc + + +@pytest.mark.attribution("archsetup") +@pytest.mark.parametrize("timer", REQUIRED_TIMERS) +def test_required_timer_enabled(host, timer): + assert host.service(timer).is_enabled, "%s should be enabled" % timer + + +@pytest.mark.parametrize("svc", OPTIONAL_SERVICES) +def test_optional_service(host, svc): + # Optional: warn-if-missing in the shell sweep -> skip here so it never reds. + if not host.service(svc).is_enabled: + pytest.skip("%s not enabled (optional)" % svc) + + +@pytest.mark.attribution("archsetup") +def test_dns_over_tls_dropin_present(host): + # archsetup ships /etc/systemd/resolved.conf.d/dns-over-tls.conf. 
+ assert host.file("/etc/systemd/resolved.conf.d/dns-over-tls.conf").exists + + +@pytest.mark.attribution("archsetup") +def test_fail2ban_responds(host): + assert host.run("fail2ban-client status").rc == 0 + + +@pytest.mark.attribution("archsetup") +def test_networkmanager_responds(host): + assert host.run("nmcli general status").rc == 0 + + +@pytest.mark.attribution("archsetup") +def test_log_cleanup_cron_installed(host, target_user): + out = host.run("sudo -u %s crontab -l" % target_user).stdout + assert "log-cleanup" in out, "log-cleanup entry missing from user crontab" + + +@pytest.mark.attribution("archsetup") +def test_syncthing_user_lingering_enabled(host, target_user): + # syncthing runs as a user service; lingering must be on for autostart. + assert host.file("/var/lib/systemd/linger/%s" % target_user).exists + + +def test_dns_resolution(host): + # Network-dependent; advisory in the shell sweep. Skip on failure. + if host.run("resolvectl query archlinux.org").rc != 0: + pytest.skip("DNS resolution query failed (network-dependent)") + + +def test_mdns_resolves(host, on_slirp): + # mDNS needs multicast, which QEMU slirp doesn't pass. + if on_slirp: + pytest.skip("mDNS not possible on slirp networking (no multicast)") + if not host.service("avahi-daemon").is_enabled: + pytest.skip("avahi-daemon not enabled") + hostname = host.run("uname -n").stdout.strip() + assert host.run("ping -c 1 -W 2 %s.local" % hostname).rc == 0 + + +def test_docker_functional(host): + if not host.service("docker").is_enabled: + pytest.skip("docker not enabled") + if not host.service("docker").is_running: + # archsetup enables docker for next boot, not --now; pre-reboot this is correct. 
+ pytest.skip("docker enabled but not started (starts on boot by design)") + assert host.run("docker info").rc == 0 diff --git a/scripts/testing/tests/test_users.py b/scripts/testing/tests/test_users.py new file mode 100644 index 0000000..c0097ed --- /dev/null +++ b/scripts/testing/tests/test_users.py @@ -0,0 +1,34 @@ +# SPDX-License-Identifier: GPL-3.0-or-later +"""Post-install checks: the user account archsetup creates. + +Parity port of validate_user_created / validate_user_shell / validate_user_groups. +""" + +import pytest + + +# Groups archsetup adds: wheel (useradd -G), the usermod -aG set, and docker +# (added later in the developer-workstation step). +EXPECTED_GROUPS = [ + "wheel", "sys", "adm", "network", "scanner", "power", "uucp", + "audio", "lp", "rfkill", "video", "storage", "optical", "users", "docker", +] + + +@pytest.mark.smoke +@pytest.mark.attribution("archsetup") +def test_user_exists(host, target_user): + assert host.user(target_user).exists + + +@pytest.mark.attribution("archsetup") +def test_user_shell_is_zsh(host, target_user): + # archsetup may set either path depending on how zsh resolves. + assert host.user(target_user).shell in ("/bin/zsh", "/usr/bin/zsh") + + +@pytest.mark.attribution("archsetup") +@pytest.mark.parametrize("group", EXPECTED_GROUPS) +def test_user_in_group(host, target_user, group): + # Parametrized so a failure names the exact missing group. 
+ assert group in host.user(target_user).groups diff --git a/scripts/wipedisk b/scripts/wipedisk index 0c08c72..b833407 100644 --- a/scripts/wipedisk +++ b/scripts/wipedisk @@ -1,4 +1,5 @@ #!/usr/bin/env bash +# SPDX-License-Identifier: GPL-3.0-or-later # Craig Jennings <c@cjennings.net> # identify disk and erase diff --git a/scripts/zfs-replicate b/scripts/zfs-replicate index cf946f1..02ffcf5 100755 --- a/scripts/zfs-replicate +++ b/scripts/zfs-replicate @@ -1,4 +1,5 @@ #!/bin/bash +# SPDX-License-Identifier: GPL-3.0-or-later # zfs-replicate - Replicate ZFS datasets to TrueNAS # # Usage: diff --git a/tests/backup-system-file/test_backup_system_file.py b/tests/backup-system-file/test_backup_system_file.py new file mode 100644 index 0000000..5d48d03 --- /dev/null +++ b/tests/backup-system-file/test_backup_system_file.py @@ -0,0 +1,161 @@ +"""Tests for the backup_system_file helper in the archsetup installer. + +backup_system_file snapshots a pre-existing system file to +`<path>.archsetup.bak` before archsetup edits it in place, so a botched +in-place edit (fstab, mkinitcpio.conf, sudoers, ...) is recoverable. It is +idempotent: it never overwrites an existing backup, so the pristine original +survives repeated edits within a run and across re-runs of the installer. It +no-ops (success) when the target does not exist. + +These tests exercise the REAL function body, extracted from the `archsetup` +script at run time (not a copy), so the production code path is what runs. +Edits run against real temp files the test creates and tears down. 
+ +Run from repo root: + python3 -m unittest tests.backup-system-file.test_backup_system_file +""" + +import os +import shutil +import stat +import subprocess +import tempfile +import unittest + + +REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) +ARCHSETUP = os.path.join(REPO_ROOT, "archsetup") + + +class BackupHarness(unittest.TestCase): + """Source backup_system_file out of the real archsetup script and invoke it.""" + + def setUp(self): + self.tmp = tempfile.mkdtemp(prefix="backup-system-file-test-") + # A bash wrapper that extracts just the backup_system_file function from + # the real installer and invokes it with the test's arg. Sourcing the + # sed-extracted function means we test the production code path, not a + # reimplementation. The helper is self-contained (prints its own + # warnings), so no logger stub is needed. + self.wrapper = os.path.join(self.tmp, "run.sh") + with open(self.wrapper, "w") as f: + f.write( + "#!/bin/bash\n" + 'ARCHSETUP="$1"; shift\n' + "source <(sed -n '/^backup_system_file() {/,/^}/p' \"$ARCHSETUP\")\n" + 'backup_system_file "$@"\n' + ) + os.chmod(self.wrapper, 0o755) + + def tearDown(self): + # Restore writability in case a test made a dir read-only. 
+ for root, dirs, _ in os.walk(self.tmp): + for d in dirs: + os.chmod(os.path.join(root, d), 0o755) + shutil.rmtree(self.tmp, ignore_errors=True) + + def run_backup(self, target): + return subprocess.run( + ["bash", self.wrapper, ARCHSETUP, target], + capture_output=True, text=True, timeout=10, + ) + + def write(self, name, content, mode=None): + path = os.path.join(self.tmp, name) + with open(path, "w") as f: + f.write(content) + if mode is not None: + os.chmod(path, mode) + return path + + +# ----------------------------------------------------------------------------- +# Normal cases +# ----------------------------------------------------------------------------- + +class TestBackupNormal(BackupHarness): + + def test_existing_file_is_backed_up_with_same_content(self): + target = self.write("fstab", "UUID=abc / ext4 defaults 0 1\n") + result = self.run_backup(target) + self.assertEqual(result.returncode, 0, msg=result.stderr) + backup = target + ".archsetup.bak" + self.assertTrue(os.path.isfile(backup), "backup should be created") + with open(backup) as f: + self.assertEqual(f.read(), "UUID=abc / ext4 defaults 0 1\n") + + def test_backup_preserves_mode(self): + # sudoers ships 0440; a restored backup must keep restrictive perms. + target = self.write("sudoers", "root ALL=(ALL) ALL\n", mode=0o440) + result = self.run_backup(target) + self.assertEqual(result.returncode, 0, msg=result.stderr) + backup = target + ".archsetup.bak" + self.assertEqual(stat.S_IMODE(os.stat(backup).st_mode), 0o440) + + +# ----------------------------------------------------------------------------- +# Boundary cases +# ----------------------------------------------------------------------------- + +class TestBackupBoundary(BackupHarness): + + def test_existing_backup_is_not_overwritten(self): + # The pristine original must survive a later edit + second backup call. 
+ target = self.write("pacman.conf", "PRISTINE\n") + self.assertEqual(self.run_backup(target).returncode, 0) + # Simulate archsetup editing the file in place, then backing up again. + with open(target, "w") as f: + f.write("EDITED\n") + result = self.run_backup(target) + self.assertEqual(result.returncode, 0, msg=result.stderr) + with open(target + ".archsetup.bak") as f: + self.assertEqual(f.read(), "PRISTINE\n", "backup must stay pristine") + + def test_missing_target_is_a_quiet_noop(self): + target = os.path.join(self.tmp, "never-existed.conf") + result = self.run_backup(target) + self.assertEqual(result.returncode, 0, msg=result.stderr) + self.assertFalse(os.path.exists(target + ".archsetup.bak")) + + def test_second_call_same_run_is_a_noop(self): + # A file edited twice in one run (e.g. mkinitcpio MODULES then HOOKS) + # gets backed up once; the second call must not error or re-copy. + target = self.write("mkinitcpio.conf", "HOOKS=(base udev)\n") + self.assertEqual(self.run_backup(target).returncode, 0) + backup = target + ".archsetup.bak" + first_mtime = os.stat(backup).st_mtime_ns + result = self.run_backup(target) + self.assertEqual(result.returncode, 0, msg=result.stderr) + self.assertEqual(os.stat(backup).st_mtime_ns, first_mtime, + "backup must not be rewritten on the second call") + + +# ----------------------------------------------------------------------------- +# Error cases +# ----------------------------------------------------------------------------- + +class TestBackupErrors(BackupHarness): + + def test_empty_target_is_refused(self): + result = self.run_backup("") + self.assertNotEqual(result.returncode, 0) + + def test_copy_failure_returns_nonzero(self): + # Target exists but its directory is read-only, so the .bak can't be + # written. The helper must report failure rather than silently skip. 
+ subdir = os.path.join(self.tmp, "ro") + os.makedirs(subdir) + target = os.path.join(subdir, "fstab") + with open(target, "w") as f: + f.write("data\n") + os.chmod(subdir, 0o500) # r-x: owner cannot create the .bak here + try: + result = self.run_backup(target) + finally: + os.chmod(subdir, 0o755) + self.assertNotEqual(result.returncode, 0) + self.assertFalse(os.path.exists(target + ".archsetup.bak")) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/hypr-live-update-guard/test_hypr_live_update_guard.py b/tests/hypr-live-update-guard/test_hypr_live_update_guard.py new file mode 100644 index 0000000..5ec5ce8 --- /dev/null +++ b/tests/hypr-live-update-guard/test_hypr_live_update_guard.py @@ -0,0 +1,95 @@ +"""Tests for the hypr-live-update-guard pacman PreTransaction hook script. + +The guard aborts a live pacman upgrade of GPU/compositor runtime libraries +(mesa, hyprland, wayland, GPU drivers) while a Hyprland session is running, +so the compositor doesn't SIGABRT when a now-"(deleted)" library is next +called. It reads the triggering package names on stdin (pacman NeedsTargets) +and exits non-zero to abort the transaction (AbortOnFail) before any package +is swapped. When Hyprland isn't running, or an override is set, it exits 0 +and the upgrade proceeds. 
+ +Test seams (env vars the production script honors): + HYPR_GUARD_RUNNING 1/0 forces the Hyprland-running check (default: pgrep) + HYPR_ALLOW_LIVE_UPDATE 1 overrides the guard (proceed anyway) + HYPR_GUARD_SENTINEL path whose existence also overrides the guard + +Run from repo root: + python3 -m unittest tests.hypr-live-update-guard.test_hypr_live_update_guard +""" + +import os +import subprocess +import tempfile +import unittest + + +REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) +GUARD = os.path.join(REPO_ROOT, "scripts", "hypr-live-update-guard") + + +def run_guard(stdin="mesa\n", running="1", allow=None, sentinel=None): + env = dict(os.environ) + env["HYPR_GUARD_RUNNING"] = running + if allow is not None: + env["HYPR_ALLOW_LIVE_UPDATE"] = allow + # Point the sentinel at a path that does not exist unless a test sets one, + # so the host's real /run state can't leak into the result. + env["HYPR_GUARD_SENTINEL"] = sentinel if sentinel else "/nonexistent/guard-sentinel" + return subprocess.run( + ["sh", GUARD], + input=stdin, capture_output=True, text=True, timeout=10, env=env, + ) + + +class HyprLiveUpdateGuard(unittest.TestCase): + # --- Normal cases --------------------------------------------------- + + def test_running_with_dangerous_pkg_aborts(self): + r = run_guard(stdin="mesa\n", running="1") + self.assertEqual(r.returncode, 1, r.stderr) + + def test_abort_message_names_the_package_and_tty_remedy(self): + r = run_guard(stdin="mesa\n", running="1") + self.assertIn("mesa", r.stderr) + self.assertIn("TTY", r.stderr) + + def test_not_running_allows(self): + r = run_guard(stdin="mesa\n", running="0") + self.assertEqual(r.returncode, 0, r.stderr) + + def test_not_running_is_silent(self): + r = run_guard(stdin="mesa\nhyprland\n", running="0") + self.assertEqual(r.stderr.strip(), "") + + # --- Boundary cases ------------------------------------------------- + + def test_multiple_packages_all_listed(self): + r = 
run_guard(stdin="mesa\nhyprland\nvulkan-radeon\n", running="1") + self.assertEqual(r.returncode, 1) + for pkg in ("mesa", "hyprland", "vulkan-radeon"): + self.assertIn(pkg, r.stderr) + + def test_running_with_empty_stdin_still_guards(self): + # The hook only fires when dangerous targets exist, so an empty target + # list shouldn't normally happen; if Hyprland is up, stay safe (abort). + r = run_guard(stdin="", running="1") + self.assertEqual(r.returncode, 1) + + # --- Override / error cases ----------------------------------------- + + def test_env_override_proceeds_even_when_running(self): + r = run_guard(stdin="mesa\n", running="1", allow="1") + self.assertEqual(r.returncode, 0, r.stderr) + + def test_sentinel_file_override_proceeds(self): + with tempfile.NamedTemporaryFile(prefix="guard-allow-") as f: + r = run_guard(stdin="mesa\n", running="1", sentinel=f.name) + self.assertEqual(r.returncode, 0, r.stderr) + + def test_override_env_zero_does_not_bypass(self): + r = run_guard(stdin="mesa\n", running="1", allow="0") + self.assertEqual(r.returncode, 1, r.stderr) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/installer-steps/test_orchestrators.py b/tests/installer-steps/test_orchestrators.py new file mode 100644 index 0000000..e62c198 --- /dev/null +++ b/tests/installer-steps/test_orchestrators.py @@ -0,0 +1,117 @@ +"""Characterization tests for the decomposed installer step orchestrators. + +The 2026 decomposition turned the giant step functions into thin +orchestrators that call one named sub-function per concern. These tests pin +the call SEQUENCE of each orchestrator: a dropped, added, or reordered +sub-step call fails the test. They guard the wiring, not the sub-functions' +own behavior (those mutate the system and are exercised by the VM harness). 
+ +Method: sed-extract the orchestrator from the real `archsetup` (its body is +now just `display` + sub-function calls), source it with `display` silenced +and every sub-function replaced by a recorder that echoes its own name, run +it, and assert stdout is the expected ordered list. + +Run from repo root: + python3 -m unittest tests.installer-steps.test_orchestrators +""" + +import os +import subprocess +import textwrap +import unittest + + +REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..")) +ARCHSETUP = os.path.join(REPO_ROOT, "archsetup") + +# orchestrator -> exact ordered sub-step calls +ORCHESTRATORS = { + "essential_services": [ + "configure_randomness", "configure_networking", "configure_power", + "configure_ssh_server", "configure_fail2ban", "configure_firewall", + "configure_service_discovery", "configure_job_scheduling", + "configure_package_cache", "configure_snapshots", + "configure_user_lingering", + ], + "prerequisites": [ + "bootstrap_pacman_keyring", "install_required_software", + "configure_build_environment", "configure_package_mirrors", + ], + "developer_workstation": [ + "install_programming_languages", "install_editors", + "install_android_utilities", "install_vpn_tools", + "install_devops_utilities", + ], + "boot_ux": [ + "tighten_efi_permissions", "add_nvme_early_module", + "configure_initramfs_hook", "configure_encrypted_autologin", + "configure_tlp_power", "trim_firmware", "configure_grub", + ], + "user_customizations": [ + "clone_user_repos", "stow_dotfiles", "prune_waybar_battery", + "refresh_desktop_caches", "configure_dconf_defaults", + "finalize_dotfiles", "create_user_directories", + ], +} + + +def run_orchestrator(func, stubs, extra_defs=""): + """Source `func` from archsetup with `stubs` recording their names.""" + stub_defs = "\n".join(f"{s}() {{ echo {s}; }}" for s in stubs) + script = textwrap.dedent(f"""\ + display() {{ :; }} + {stub_defs} + {extra_defs} + source <(sed -n '/^{func}() {{/,/^}}/p' 
"{ARCHSETUP}") + {func} + """) + result = subprocess.run( + ["bash", "-c", script], + capture_output=True, text=True, timeout=10, + ) + return result + + +class OrchestratorSequence(unittest.TestCase): + def test_each_orchestrator_calls_substeps_in_order(self): + for func, expected in ORCHESTRATORS.items(): + with self.subTest(orchestrator=func): + result = run_orchestrator(func, expected) + self.assertEqual(result.returncode, 0, result.stderr) + got = result.stdout.split() + self.assertEqual(got, expected, + f"{func} call sequence drifted") + + +class SnapshotDispatch(unittest.TestCase): + """configure_snapshots branches on filesystem; pin each branch.""" + + SUBS = ["configure_zfs_snapshots", "configure_btrfs_snapshots"] + + def test_zfs_root_runs_zfs_snapshots(self): + result = run_orchestrator( + "configure_snapshots", self.SUBS, + extra_defs="is_zfs_root() { return 0; }\nis_btrfs_root() { return 1; }", + ) + self.assertEqual(result.returncode, 0, result.stderr) + self.assertEqual(result.stdout.split(), ["configure_zfs_snapshots"]) + + def test_btrfs_root_runs_btrfs_snapshots(self): + result = run_orchestrator( + "configure_snapshots", self.SUBS, + extra_defs="is_zfs_root() { return 1; }\nis_btrfs_root() { return 0; }", + ) + self.assertEqual(result.returncode, 0, result.stderr) + self.assertEqual(result.stdout.split(), ["configure_btrfs_snapshots"]) + + def test_other_filesystem_runs_neither(self): + result = run_orchestrator( + "configure_snapshots", self.SUBS, + extra_defs="is_zfs_root() { return 1; }\nis_btrfs_root() { return 1; }", + ) + self.assertEqual(result.returncode, 0, result.stderr) + self.assertEqual(result.stdout.split(), []) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/run-task/test_run_task.py b/tests/run-task/test_run_task.py new file mode 100644 index 0000000..35036dd --- /dev/null +++ b/tests/run-task/test_run_task.py @@ -0,0 +1,172 @@ +"""Tests for the run_task / enable_service helpers in the archsetup installer. 
+
+run_task is the installer's describe-run-warn primitive. It replaces the
+hand-written idiom that recurs ~100 times across the script:
+
+    action="enabling rngd service" && display "task" "$action"
+    systemctl enable rngd >> "$logfile" 2>&1 || error_warn "$action" "$?"
+
+as a single call:
+
+    run_task "enabling rngd service" systemctl enable rngd
+
+It announces the task via display, runs the command with stdout+stderr
+appended to $logfile, and on failure calls error_warn with the command's
+real exit code (non-fatal). enable_service is a thin wrapper that enables
+one or more systemd units with the conventional "enabling <unit> service"
+wording.
+
+These tests exercise the REAL function bodies, extracted from the
+`archsetup` script at run time (not a copy), with recording stubs standing
+in for display, error_warn, and systemctl. The command run by run_task is
+genuinely executed.
+
+Run from repo root:
+    python3 -m unittest tests.run-task.test_run_task
+"""
+
+import os
+import shutil
+import subprocess
+import tempfile
+import unittest
+
+
+REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
+ARCHSETUP = os.path.join(REPO_ROOT, "archsetup")
+
+# A bash harness that sources the real run_task + enable_service out of the
+# installer, with recording stubs for their dependencies. Each stub appends a
+# tab-separated record to a file named by an env var, so the Python side can
+# assert what was called. The real command passed to run_task still runs.
+WRAPPER = r"""#!/bin/bash
+ARCHSETUP="$1"; shift
+logfile="$LOGFILE"
+
+display() { printf '%s\t%s\n' "$1" "$2" >> "$DISPLAY_LOG"; }
+error_warn() { printf '%s\t%s\n' "$1" "$2" >> "$ERRWARN_LOG"; return 1; }
+systemctl() { printf 'systemctl %s\n' "$*"; }
+
+source <(sed -n '/^run_task() {/,/^}/p' "$ARCHSETUP")
+source <(sed -n '/^enable_service() {/,/^}/p' "$ARCHSETUP")
+
+"$@"
+"""
+
+
+class RunTaskHarness(unittest.TestCase):
+    def setUp(self):
+        self.tmp = tempfile.mkdtemp(prefix="run-task-test-")
+        self.wrapper = os.path.join(self.tmp, "run.sh")
+        with open(self.wrapper, "w") as f:
+            f.write(WRAPPER)
+        os.chmod(self.wrapper, 0o755)
+        self.logfile = os.path.join(self.tmp, "install.log")
+        self.display_log = os.path.join(self.tmp, "display.log")
+        self.errwarn_log = os.path.join(self.tmp, "errwarn.log")
+
+    def tearDown(self):
+        shutil.rmtree(self.tmp, ignore_errors=True)
+
+    def call(self, *args):
+        env = dict(os.environ)
+        env["LOGFILE"] = self.logfile
+        env["DISPLAY_LOG"] = self.display_log
+        env["ERRWARN_LOG"] = self.errwarn_log
+        return subprocess.run(
+            ["bash", self.wrapper, ARCHSETUP, *args],
+            capture_output=True, text=True, timeout=10, env=env,
+        )
+
+    def read(self, path):
+        if not os.path.exists(path):
+            return ""
+        with open(path) as f:
+            return f.read()
+
+    # --- Normal cases -----------------------------------------------------
+
+    def test_run_task_success_announces_and_runs(self):
+        result = self.call("run_task", "doing a thing", "true")
+        self.assertEqual(result.returncode, 0, result.stderr)
+        # Announced as a "task" with the exact description.
+        self.assertEqual(self.read(self.display_log), "task\tdoing a thing\n")
+        # No warning on success.
+        self.assertEqual(self.read(self.errwarn_log), "")
+
+    def test_run_task_captures_command_output_to_logfile(self):
+        result = self.call("run_task", "echo something", "echo", "hello-from-cmd")
+        self.assertEqual(result.returncode, 0, result.stderr)
+        self.assertIn("hello-from-cmd", self.read(self.logfile))
+        # Command output is logged, not printed to the terminal.
+        self.assertNotIn("hello-from-cmd", result.stdout)
+
+    def test_run_task_captures_stderr_to_logfile(self):
+        # `ls` of a missing path writes to stderr; it must land in the logfile.
+        missing = os.path.join(self.tmp, "no-such-path")
+        self.call("run_task", "listing", "ls", missing)
+        self.assertIn("no-such-path", self.read(self.logfile))
+
+    def test_run_task_preserves_multiple_arguments(self):
+        self.call("run_task", "multi-arg", "printf", "%s|%s|%s", "a", "b", "c")
+        self.assertIn("a|b|c", self.read(self.logfile))
+
+    def test_run_task_preserves_arguments_with_spaces(self):
+        self.call("run_task", "spacey", "printf", "[%s]", "two words")
+        self.assertIn("[two words]", self.read(self.logfile))
+
+    # --- enable_service ---------------------------------------------------
+
+    def test_enable_service_single_unit(self):
+        self.call("enable_service", "rngd")
+        self.assertEqual(self.read(self.display_log), "task\tenabling rngd service\n")
+        self.assertIn("systemctl enable rngd", self.read(self.logfile))
+
+    def test_enable_service_multiple_units(self):
+        self.call("enable_service", "foo", "bar", "baz")
+        disp = self.read(self.display_log)
+        self.assertIn("task\tenabling foo service\n", disp)
+        self.assertIn("task\tenabling bar service\n", disp)
+        self.assertIn("task\tenabling baz service\n", disp)
+        log = self.read(self.logfile)
+        self.assertIn("systemctl enable foo", log)
+        self.assertIn("systemctl enable bar", log)
+        self.assertIn("systemctl enable baz", log)
+
+    # --- Error cases ------------------------------------------------------
+
+    def test_run_task_failure_warns_with_description(self):
+        result = self.call("run_task", "failing thing", "false")
+        self.assertNotEqual(result.returncode, 0)
+        self.assertEqual(self.read(self.errwarn_log), "failing thing\t1\n")
+
+    def test_run_task_failure_propagates_real_exit_code(self):
+        # `bash -c 'exit 42'` must surface 42 to error_warn, not a clobbered 0.
+        self.call("run_task", "exit-42", "bash", "-c", "exit 42")
+        self.assertEqual(self.read(self.errwarn_log), "exit-42\t42\n")
+
+    def test_enable_service_failure_warns_per_unit(self):
+        # Override systemctl to fail; each unit should produce a warning.
+        env = dict(os.environ)
+        env["LOGFILE"] = self.logfile
+        env["DISPLAY_LOG"] = self.display_log
+        env["ERRWARN_LOG"] = self.errwarn_log
+        # Re-create wrapper with a failing systemctl stub for this case.
+        failing = os.path.join(self.tmp, "run-fail.sh")
+        with open(failing, "w") as f:
+            f.write(WRAPPER.replace(
+                "systemctl() { printf 'systemctl %s\\n' \"$*\"; }",
+                "systemctl() { printf 'systemctl %s\\n' \"$*\"; return 1; }",
+            ))
+        os.chmod(failing, 0o755)
+        subprocess.run(
+            ["bash", failing, ARCHSETUP, "enable_service", "alpha", "beta"],
+            capture_output=True, text=True, timeout=10, env=env,
+        )
+        warns = self.read(self.errwarn_log)
+        self.assertIn("enabling alpha service\t1\n", warns)
+        self.assertIn("enabling beta service\t1\n", warns)
+
+
+if __name__ == "__main__":
+    unittest.main()
@@ -21,7 +21,7 @@ The vocabulary is open — topic tags are coined as needed — so these are conv
 - *Effort / autonomy*: =:quick:= a spare-moment fix (minutes, not a sitting); =:solo:= Claude can carry it end to end — there's a build path, a test path, and no upfront decision needed (a leftover manual spot-check doesn't disqualify it).
 - *Topic / area* (open): the subsystem a task touches — e.g. =:hyprland:= =:waybar:= =:mpd:= =:music:= =:network:= =:tooling:= =:llm:= =:eask:= =:pocketbook:= =:cmail:=. Coin a new one when it aids filtering.
* Archsetup Open Work -** TODO [#B] Scrolling layout: frame fit + wrap-around :hyprland: +** TODO [#B] Scrolling/Carousel layout: frame fit + wrap-around :hyprland: :PROPERTIES: :LAST_REVIEWED: 2026-06-13 :END: @@ -32,49 +32,12 @@ Disabled 2026-06-12 (bind and cycle entry points removed; Super+Shift+S reassign The support machinery was deliberately kept for this task: =layout-navigate= and =layout-resize= retain their scrolling branches, =waybar-layout= still renders the scrolling state, and the unbound legacy =cycle-layout= script still lists it. Re-enabling is two lines: add =scrolling= back to =LAYOUTS= in =layout-cycle= and restore a direct-jump bind (the old chord is taken now — pick a new one). The =tests/layout-cycle= suite pins the disabled state and will go red on re-enable, which is the reminder to update it. -** TODO [#C] Waybar indicators unevenly spaced :quick:solo: +** TODO [#B] Pocketbook finish-or-cancel decision :pocketbook: +SCHEDULED: <2026-08-23 Sun> :PROPERTIES: -:LAST_REVIEWED: 2026-06-09 -:END: -The right-side module icons don't sit at even intervals — spacing reads as inconsistent across the group. Tune the per-module margin/padding in =dotfiles/hyprland/.config/waybar/style.css= so the icons are evenly distributed. Noticed 2026-05-21 after adding the airplane indicator. -** TODO [#C] Wlogout exit-menu buttons are rectangular, not square -:PROPERTIES: -:LAST_REVIEWED: 2026-06-09 -:END: -The wlogout exit menu renders its buttons taller than they are wide on velox, so the cells read as vertical rectangles instead of squares. They render square (centered) correctly on ratio, so this is a per-host / resolution difference, not a flat bug. Fix the button sizing in the wlogout style (=~/.dotfiles/hyprland/.config/wlogout/style.css=) so each cell is square on both hosts. Noticed 2026-05-21. Related: the [#D] VERIFY about wlogout sizing across displays. 
- -Add a regression test so the square-cell fix doesn't silently break on a resolution change: assert the rendered (or computed) wlogout button cells are square across ratio's and velox's resolutions. Dropped :quick: — the cross-host test pushes this past a spare-moment fix. -** TODO [#B] Guard against live mesa/hyprland/wayland-runtime updates :hyprland: -:PROPERTIES: -:LAST_REVIEWED: 2026-06-09 -:END: -A live =pacman -Syu= that swaps mesa/hyprland/wayland runtime libs out from under a running Hyprland session can crash the compositor: the next GPU-lib call hits a now-"(deleted)" library and SIGABRTs, taking the Wayland clients down with it. Hit ratio 2026-06-07 (mesa 26.0.6 -> 26.1.2 + hyprland upgraded live; Hyprland SIGABRT took down awww/insync/emacs). Likely the driver behind ratio's high lifetime unsafe-shutdown ratio — a crashed compositor forces a hard reset. - -Ship a guard: an update wrapper, or a documented practice, that when a pending =-Syu= set includes mesa/hyprland/wayland runtime libs advises running it from a TTY (or after logging out of Hyprland) rather than live. Returned to archsetup from archangel 2026-06-09 — hyprland/mesa are installed and managed by archsetup, not the ISO installer. - -** TODO [#C] Pocketbook development backlog :pocketbook: -:PROPERTIES: -:LAST_REVIEWED: 2026-05-26 +:LAST_REVIEWED: 2026-06-24 :END: -Pocketbook (GTK4 layer-shell notes panel, toggled via waybar) was pulled from publication 2026-05-26 — github repo + cjennings.net repo deleted, mirror hook removed — and folded into this repo at =pocketbook/= until it's ready to spin back out. Src-layout Python package with pytest tests and a Makefile. Develop it in-tree; the backing modules are =store/note/panel/layer_shell/app/note_widget= + =style.css=. - -Backlog (unordered; promote items to their own dated tasks as they're picked up): - -- Configurable options, possibly a dedicated configuration panel. -- Lose-focus hides pocketbook — configurable on/off. 
-- Configurable display order: chronological by creation date (asc/desc), manual, alphabetical (asc/desc). -- Search / filter notes. -- Global toggle keybind (Hyprland =bind=) alongside the waybar click; document the waybar integration. -- Note CRUD polish (create/edit/delete) + optional markdown rendering. -- Pin / favorite notes. -- Tags or notebooks / categories. -- Persistence: confirm store format + =~/.local/share/pocketbook/= location, add versioning/migration, decide a backup/sync story. -- Theming: track the dupre/hudson theme system so =style.css= follows =set-theme=. -- Layer-shell geometry config (anchor edge, width, margins) + HiDPI / multi-monitor behavior — ties into [[file:docs/PLAN-per-host-overrides.org][per-host overrides]] scaling work. -- Config file format (toml) + reload-without-restart. -- Expand test coverage (TDD per testing standards; =tests/= already exists). -- Release prep for the eventual spin-back-out: pyproject metadata, version, license. -- Re-wire the archsetup install (gtk4-layer-shell dep + install step + post-install clone) when pocketbook ships. Removed 2026-05-26 — see git history of =archsetup= / =scripts/post-install.sh=. +Decide whether to finish the pocketbook app or close and cancel the project. Removed from the waybar setup 2026-06-23 (the org-capture popup covers quick reminders and text for now), so it's out of daily use — this is the checkpoint to commit to it or retire it. Backlog above: [[*Pocketbook development backlog][Pocketbook development backlog]]. ** TODO [#B] Provision Eask in archsetup :tooling:eask: :PROPERTIES: @@ -87,10 +50,13 @@ Add =@emacs-eask/cli= to archsetup's provisioning so fresh machines get it. Eask - Decision: also set a persistent user npm prefix (=~/.npmrc= with =prefix=${HOME}/.local=)? If yes, that =~/.npmrc= is a legitimate dotfile to stow; if no, rely on the explicit =--prefix= flag alone. =~/.eask/= is a regenerable cache — leave un-stowed. 
- Acceptance: fresh run leaves =eask= on PATH at =~/.local/bin/eask= (no root); =cd ~/code/chime && make setup && make test= works. -** TODO [#B] Waybar timer module :waybar: +** DONE [#B] Waybar timer module :waybar: +CLOSED: [2026-06-29 Mon] :PROPERTIES: :LAST_REVIEWED: 2026-05-26 :END: +Shipped as =wtimer= in the dotfiles repo (=134d61e=), a single always-visible module right of the battery/resource readout, non-collapsible. Covers all four modes (timer / alarm / stopwatch / pomodoro) with multiple running at once: the bar shows the most urgent item with a per-type glyph + "+N" badge, the tooltip lists them all. Left-click creates (fuzzel), middle-click pauses, right-click cancels, scroll cycles the primary; notify fires on completion and pomodoro phase changes. Pure-functions-over-injected-clock design; CLI serializes state with flock + atomic write so the 1s render and click handlers never lose an update or double-fire. TDD: 86 cases, 95% line coverage. Design spec: [[file:docs/design/2026-06-29-waybar-timer-module-spec.org][docs/design/2026-06-29-waybar-timer-module-spec.org]]. Live-verified on velox (glyph renders, position, countdown); the color states + click interactions filed under Manual testing and validation. + A custom waybar module providing three time-keeping functions, surfaced in the bar with click/scroll controls and dunst notifications on completion. - *Alarm* — fire a notification at a wall-clock time (e.g. 2:00pm). Builds on the existing =notify= + =at= pattern from protocols.org. @@ -99,42 +65,102 @@ A custom waybar module providing three time-keeping functions, surfaced in the b Implementation notes (to flesh out when picked up): waybar =custom= module(s) with =exec= polling or a persistent =exec= script emitting JSON; click actions to start/pause/reset; a small state file under =~/.local/state= or =~/.local/var=. Lives in the hyprland tier (=dotfiles/hyprland/.config/waybar/= + a backing script in =hyprland/.local/bin/=). 
TDD the backing script per testing.md. -** TODO [#B] Collapsible waybar sides :waybar: -:PROPERTIES: -:LAST_REVIEWED: 2026-06-09 -:END: -Let either side of the waybar collapse horizontally to a minimal base set, toggled by a click. Each collapsible side carries a small triangle / arrowhead pointing toward the screen edge it collapses into (away from center). Clicking it collapses that side to its base set and flips the arrow to point back toward center; clicking again restores the full side. Same shape-changes-with-state idea as the auto-dim indicator. - -Spec ready (2026-06-19): [[file:working/collapsible-waybar-sides/collapsible-waybar-sides-spec.org]]. Spike settled the mechanism: [[file:working/collapsible-waybar-sides/spike-findings.org]]. - -Decisions locked: right base set = date + worldclock + tray; left base set = menu + workspaces; per-side independent; host-agnostic (base set constant, full set is each host's existing config). Mechanism = config-swap + SIGUSR2 reload via an active-config copy in =$XDG_RUNTIME_DIR= (the CSS/state-file approach was disproven — GTK3 can't reflow-hide native modules). Lives in =~/.dotfiles/hyprland/=. Next: implement per the spec (TDD the toggle + arrow scripts). - -** TODO [#B] Network-manager dropdown, nmcli-backed with GPG-stored secrets :waybar:network: -:PROPERTIES: -:LAST_REVIEWED: 2026-06-09 -:END: -Replace the current wifi/network waybar component with a self-contained network manager driving nmcli directly (no =nmtui= dependency). Same look as the existing indicator; clicking it drops down the management interface (design open, keep it minimalistic). - -Core functionality: -- Add / edit / remove connections. -- List saved connections by SSID, ordered by recency (most-recently-used first); select one to switch to it. -- Recognize a wired/ethernet connection even when plugged in after the session started, and allow selecting it at any time. Switch freely: ethernet↔wifi, wifi↔wifi. 
-- Match all current functionality of the existing wifi/network component (status icon, signal strength, tooltip). - -Credential storage: -- Store connection definitions + passwords in a GPG-encrypted file under =~/.config= (appropriate XDG location), encrypted to Craig's private key. -- Passphrase cadence configurable: decrypt once per session, once per hour (via gpg-agent cache TTL), or never (plaintext / stays decrypted). *Default is unencrypted* — encryption is opt-in. - -Design / open questions (propose before building): -- Dropdown UI tech: a GTK layer-shell panel (like pocketbook), a fuzzel/rofi-style menu, or a waybar-native expanding group. -- Relationship to NetworkManager's own store (=/etc/NetworkManager/system-connections=, root-only): does the GPG store supplement or replace it, and how do they stay in sync. -- Whether to keep the existing =custom/netspeed= throughput readout alongside the new SSID/status indicator. - -Implementation notes: backing scripts in the dotfiles repo (hyprland tier); nmcli for every NM op (device status, con up/down, add/modify/delete, wifi rescan/list). TDD the nmcli-wrapper logic with a fake nmcli on PATH. Sizable — worth a =docs/design/= doc before implementation. +*** 2026-06-24 Wed @ 17:32:37 -0400 Scope expansion from roam capture (folded duplicate) +A roam-inbox capture asked for the same widget and expands the scope, so folding it in here rather than duplicating: +- *One panel, mode-selectable* — a single component where you choose timer / stopwatch / alarm; the icon changes to reflect the selected mode. +- *Stopwatch* — a count-up (the third function alongside the alarm/timer/pomodoro above), hover shows start time ("Stopwatch started: 12:22pm"). +- Hover text per mode: timer "Timer: 5 min", alarm "Alarm: 12:15pm", stopwatch "Stopwatch started: 12:22pm". +- *Multiple simultaneous* — several timers/alarms/stopwatches set and displayed at once, in one panel. 
+- Deliverable includes proposing a few panel designs and recommending one before building.
+
+** DONE [#B] Sysmon module right-click cycles the visible metric :feature:waybar:solo:
+CLOSED: [2026-06-28 Sun]
+Shipped in the dotfiles repo (=f7b6896=, implemented from this archsetup session per Craig). =waybar-sysmon= reads a selected metric from =$XDG_RUNTIME_DIR/waybar/sysmon-metric= (absent = host default, so the old behavior is preserved); the new =sysmon-cycle= helper advances through a host-appropriate ring (battery only on a laptop), wraps, and refreshes waybar via signal 12 wired to =on-click-right=. Left-click stays the btop popup. Added cpu/temp/mem icons + thresholds. TDD: 13 new =waybar-sysmon= selection cases + a 9-case =sysmon-cycle= suite, full dotfiles suite green (29 suites). =sysmon-cycle= symlinked into =~/.local/bin= on velox. Live visual/relogin check filed under "Manual testing and validation". Handoff sent to the dotfiles inbox.
+Builds on the just-shipped =custom/sysmon= collapse (dotfiles be7469b). Right-clicking the module rotates which metric is the visible one, in a fixed order: battery → cpu → temp → mem → disk → back to battery. Each click advances one step and wraps around. The host default (battery on a laptop, disk on a desktop) is the starting/reset metric; the tooltip keeps showing all metrics regardless. Left-click stays =pypr toggle monitor= (the btop popup) — the cycle lives on =on-click-right=.
+
+Implementation notes: =waybar-sysmon= needs a persisted selection (a state file in =$XDG_RUNTIME_DIR/waybar/=, absent = host default) that it reads to pick the visible metric. A new =sysmon-cycle= helper bumps the index and signals the module to refresh (add a =signal= to =custom/sysmon=, like the other custom modules; wire =sysmon-cycle= to =on-click-right=). TDD both — extend =tests/waybar-sysmon= for selection-driven output, add a =tests/sysmon-cycle= for the index advance/wrap and the signal.
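The ring-advance described in the sysmon entry above can be sketched in a few lines. This is a hypothetical Python illustration only, not the shipped =sysmon-cycle= helper (whose language and internals are not shown in this diff). The function names, the `is_laptop` parameter, and the temp-file-then-rename write are assumptions; the ring order, the laptop-only battery entry, and the "absent state = host default" rule come from the entry itself. The waybar refresh signal is left out.

```python
import os
import tempfile
from pathlib import Path

# Fixed order from the task entry: battery -> cpu -> temp -> mem -> disk -> wrap.
RING = ["battery", "cpu", "temp", "mem", "disk"]


def host_ring(is_laptop):
    """Host-appropriate ring: battery only appears on a laptop."""
    return [m for m in RING if is_laptop or m != "battery"]


def host_default(is_laptop):
    """Starting/reset metric: battery on a laptop, disk on a desktop."""
    return "battery" if is_laptop else "disk"


def read_metric(state_file, is_laptop):
    """Return the selected metric; a missing or unknown value means host default."""
    try:
        value = state_file.read_text().strip()
    except FileNotFoundError:
        return host_default(is_laptop)
    return value if value in host_ring(is_laptop) else host_default(is_laptop)


def next_metric(current, is_laptop):
    """Advance one step around the ring, wrapping at the end."""
    ring = host_ring(is_laptop)
    return ring[(ring.index(current) + 1) % len(ring)]


def cycle(state_file, is_laptop):
    """Persist and return the next metric (assumed write: temp file, then rename)."""
    new = next_metric(read_metric(state_file, is_laptop), is_laptop)
    state_file.parent.mkdir(parents=True, exist_ok=True)
    tmp = state_file.with_name(state_file.name + ".tmp")
    tmp.write_text(new + "\n")
    os.replace(tmp, state_file)
    return new


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for $XDG_RUNTIME_DIR.
    with tempfile.TemporaryDirectory() as d:
        state = Path(d) / "waybar" / "sysmon-metric"
        for _ in range(5):
            print(cycle(state, is_laptop=False))
```

Keeping `next_metric` pure (no file or signal access) is what would make the advance/wrap cases easy to pin in a unit suite; only `cycle` touches the state file.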
+
+** TODO [#B] Waybar network module — custom/net :feature:waybar:network:
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-29
+:END:
+Unifies the old wifi-no-internet indicator (was =[#C]=) and the network-manager
+dropdown (was =[#B]=) into one =custom/net= module: a tested Python =net= engine
+(nmcli + diagnostics), a thin bar indicator, and a GTK4 layer-shell panel. Code
+lives in the dotfiles repo (hyprland tier + a =net/= package like pocketbook);
+archsetup only installs deps. Secrets stay in NetworkManager's own store (no
+separate credential store). The =captive= script becomes the diagnostics engine.
+Full design, acceptance criteria, and the failure-mode coverage table:
+[[file:docs/design/2026-06-29-waybar-network-module-spec.org][2026-06-29-waybar-network-module-spec.org]].
+
+Phases below, dependency order. Engine/unit work is agent-verifiable (=unittest=
++ fakes on PATH, coverage via venv); the live-network and visual states need real
+conditions, filed under "Manual testing and validation".
+
+*** TODO Phase 1 — indicator + console recovery :network:
+Deliverable: =net status= + =net probe= (native cheap captive probe) + the
+=captive= =--json= probe refactor; =waybar-net= replacing =custom/netspeed=;
+split-cadence cache (single-flight flock, atomic write, fresh/stale/expired/unknown
+classes, iface/SSID/UUID invalidation); CSS states (captive/no-internet/degraded/
+rfkill); =net repair= tiers (rfkill/reset/bounce); =net doctor [--fix]= with the
+four terminal classifications; Makefile console-recovery targets (=make online=
+etc.); absorb the airplane state and delete the standalone airplane module
+(=waybar-airplane=, =airplane-mode=, their tests, =custom/airplane= + css).
+Tests: =tests/net/= + =tests/waybar-net/= unittest with fake nmcli/curl/rfkill/
+resolvectl on a temp PATH; doctor-classification fixtures; degraded-under-slow-nmcli
+benchmark; branch coverage ≥90% on pure modules via a throwaway venv; coverage-gap
+pass.
+Verify (manual, live): see Manual testing and validation.
+
+*** TODO Phase 2 — panel shell + connection management :network:
+Deliverable: GTK4 + gtk4-layer-shell panel (pocketbook scaffold); =net list/up/
+down/add/edit/remove/rescan= (open + WPA-PSK; enterprise activate-only); MRU list
+with live signal; mutation safety + rollback (keep prior link until target
+activates, no stranding); panel state machines; the panel UX flow (default focus,
+primary buttons, disabled rules, confirmation wording, keyboard nav).
+Tests: fake-nmcli command-sequence assertions (UUID-keyed, escaped parsing:
+colon/backslash/newline/duplicate/hidden/non-ASCII); rollback keeps prior link on
+failed switch; NM-secret write + no-secret-leak; panel state-machine transitions.
+Verify (manual, live): see Manual testing and validation.
+
+*** TODO Phase 3 — diagnostics + speed test in the panel :network:
+Deliverable: wire =net diagnose= / =net repair= / =net doctor= / =net portal= /
+=net speedtest= into the Diagnose (read-only) vs Repair (mutating, confirmed)
+sections; "Get me online" with live escalation reporting; portal Open button;
+speedtest (=speedtest-go --json=) progress + cancel; failure-mode → exact-string
+rendering across surfaces.
+Tests: diagnose read-only; each repair tier confirms + verifies cleanup (DNS
+override reverts → cleanup_verified, else cleanup-unverified); speedtest parse from
+fixture JSON + fixture stderr failure messages.
+Verify (manual, live): see Manual testing and validation.
+
+*** TODO Phase 4 — docs + rollout :network:
+Deliverable: in-app help (=net --help= + per-command, panel help affordance);
+README/user-guide (commands, indicator states, panel, config keys, make targets,
+troubleshooting from the failure table, rollback); archsetup Hyprland dep install
+(=gtk4-layer-shell=, =python-gobject=, =speedtest-go-bin=); ratio manual dep +
+stow step.
+Verify: =net --help= and each subcommand complete; user-guide covers every command ++ the recovery targets. + +*** TODO Phase 5 — VPN / WireGuard (vNext) :network: +Fold the existing archsetup wireguard tooling into the panel + CLI (=net vpn ...=). +Out of the v1 milestone; spec separately when picked up. (v1 only detects + +classifies a VPN-routed failure, it doesn't repair it.) + +** DONE [#B] Network module: enterprise WiFi add/edit deferred to vNext :waybar:network: +CLOSED: [2026-06-29 Mon] +Decided 2026-06-29 (Craig): keep v1 to open + WPA-PSK add/edit; the +WPA-Enterprise / 802.1X add/edit form is vNext, not a v1 phase. v1 still +*activates* any saved enterprise profile and points editing at nmtui/nmcli. +Evidence that settled it: 24 saved profiles on velox, 18 WPA-PSK, 0 enterprise — +no 802.1X network in Craig's history, so the form would be unused UI. If one ever +appears, nmtui adds it once and the module activates it thereafter. Spec: +[[file:docs/design/2026-06-29-waybar-network-module-spec.org][2026-06-29-waybar-network-module-spec.org]]. ** TODO [#B] Desktop-settings dropdown panel :waybar: :PROPERTIES: -:LAST_REVIEWED: 2026-06-09 +:LAST_REVIEWED: 2026-06-24 :END: One waybar dropdown gathering the desktop toggles and sliders into a single settings panel, opened from a gear/settings glyph on the bar. Incorporate: - *Auto-dim* toggle (the =custom/dim= feature just shipped — fold in here, or keep the standalone indicator and mirror it). @@ -142,7 +168,7 @@ One waybar dropdown gathering the desktop toggles and sliders into a single sett - *Keyboard-backlight* brightness slider (brightnessctl on the kbd_backlight class). - *Mouse* enable/disable toggle — shown only when a mouse is connected. - *Trackpad* enable/disable toggle — shown only when a trackpad is connected (mirror =toggle-touchpad= / =touchpad-auto=). -- *Idle inhibitor* (the existing =idle_inhibitor= module). 
+- *Idle inhibitor* (the =custom/idle= module that replaced the built-in =idle_inhibitor= 2026-06-24 — toggles the hypridle daemon, state-synced icon). - *Airplane mode* (the existing =airplane-mode= toggle; laptop-only). The conditional rows (mouse, trackpad, airplane) appear only when their hardware/context applies — reuse the laptop/device detection the airplane and touchpad indicators already do. @@ -153,14 +179,6 @@ Design / open questions (propose before building): Implementation notes: a small GTK layer-shell app (mirror pocketbook's structure: src-layout Python package, pytest, Makefile) talking to brightnessctl / hyprctl / the touchpad + airplane helpers. Lives in the dotfiles repo or in-tree like pocketbook. TDD the backing toggle/slider logic. Sizable — worth a design doc first. -** TODO [#B] Separate mpd playlist_directory from music_directory :mpd:music:quick: -:PROPERTIES: -:LAST_REVIEWED: 2026-06-09 -:END: -Spec written and approved (option 1), pinned before execution on 2026-06-03. Root issue: mpd.conf has =playlist_directory= == =music_directory= == ~/music, so the whole audio library is the playlist store and radio streams mix with curated playlists. Option 1: radio stream playlists (portable, 73 in the dotfiles repo) move to a dedicated =playlist_directory= (=~/.local/share/mpd/playlists=) via stow; the 22 curated local playlists (machine-specific track refs) live in the music tree. Also removes the broken ~/music/radio/ orphan (73 dead symlinks). - -Full step-by-step spec (mpd.conf edit, repo restructure of =common/music/= → =common/.local/share/mpd/playlists/=, curated relocation, restow, verification incl. the 7 relative-path curated playlists, ratio propagation) is in the 2026-06-03 session record under .ai/sessions/. Two open decisions before executing: (1) drop the empty =60s Sounds.m3u= or refill with the SomaFM 60s URL; (2) curated playlists into =~/music/playlists/= subdir vs leave flat in ~/music/. 
Side cleanup surfaced: a stray audio file =Black Flamingos - Space Bar.m4a= is wrongly committed in the dotfiles repo's =common/music/= — git rm it and move to the synced library. - ** TODO [#B] Local offline LLM runtime + per-host model cache :tooling:llm: :PROPERTIES: :LAST_REVIEWED: 2026-05-29 @@ -199,114 +217,6 @@ Boot the configured endpoint and send a short prompt; surface success/failure + Acceptance: fresh VM install of the ratio profile reaches an endpoint on =:8081= that answers a smoke prompt; velox profile gets Q4_K_M + 8B and answers a prompt within reasonable laptop latency; network-down install completes successfully with the pending-models warning surfaced. -** DOING [#B] Prepare for GitHub open-source release -:PROPERTIES: -:LAST_REVIEWED: 2026-06-09 -:END: -Remove personal info, credentials, and code quality issues before publishing. -*** 2026-06-16 Tue @ 00:55:39 -0500 Six dotfiles-scoped sub-tasks moved to the ~/.dotfiles project -Per the 2026-06-16 task audit, the six sub-tasks targeting files now owned by the standalone =~/.dotfiles= repo were handed off to that project (newly bootstrapped as its own AI project) and removed from this epic: "Remove credentials and secrets from dotfiles", "Remove/template personal info from dotfiles", "Remove binary font files from repo", "Move battery out of waybar sysmonitor group", "Resolution-adaptive scratchpad sizing", and "Dynamic waybar/foot config based on screen resolution". Handoff: =~/.dotfiles/inbox/2026-06-16-0053-from-archsetup-dotfiles-release-prep-handoff.org=. This epic now covers archsetup-proper release work only (scripts personal-info, device-specific config, history scrub, shellcheck, SPDX headers, README/LICENSE). The 2026-06-09 reconciliation note below is the prior state. 
-*** 2026-06-09 Tue @ 19:21:36 -0500 Reconciliation: six sub-tasks now target the ~/.dotfiles repo, not archsetup -Phase 3.2 removed the in-repo =dotfiles/= tree, so six sub-tasks below no longer describe archsetup content — they target files now owned by the =~/.dotfiles= repo (=git.cjennings.net/dotfiles.git=): "Remove credentials and secrets from dotfiles", "Remove/template personal info from dotfiles", "Remove binary font files from repo", "Move battery out of waybar sysmonitor group", "Resolution-adaptive scratchpad sizing", and "Dynamic waybar/foot config based on screen resolution". Their paths are relative to that repo now. Kept here for tracking per Craig (2026-06-09); he'll re-scope the archsetup-vs-dotfiles split shortly. archsetup-proper release work (scripts personal-info, device-specific config, shellcheck, and scrubbing the pre-=b10cba5= dotfiles secrets from archsetup's own history) stays this task. -*** 2026-05-11 Mon @ 13:01:29 -0500 AI Response: Open-source-prep source audit -Checked each subtask below against the source / git state. Bottom line: almost nothing is fully done. =LICENSE= and =README.md= were added this session (see those subtasks); the rest still stands. -- *Remove credentials and secrets from dotfiles* — NOT DONE. All five named files still tracked: =dotfiles/common/.config/.tidal-dl.token.json=, =.config/calibre/smtp.py.json=, =.config/transmission/settings.json=, =.msmtprc=, =.mbsyncrc=. =.gitignore= lists none of them; no =.example= templates exist. -- *Remove/template personal info from scripts* — PARTIALLY DONE. Repo URLs ARE config-driven (=archsetup:141-146= use =${dwm_repo:-https://git.cjennings.net/...}=, documented in =archsetup.conf.example=). Still personal: =archsetup:2-3= (email/website header), =init:8,21= (=root:welcome=), =scripts/post-install.sh:17-56= (personal repos). -- *Remove/template personal info from dotfiles* — NOT DONE. 
=.gitconfig= has =c@cjennings.net=, =name = Craig Jennings=, =github user = cjennings=, =safe.directory= and employer creds; =.config/mpd/musicpd.conf= + =mpd.conf= still use =~cjennings/= / =/home/cjennings/= paths; =.ssh/config= has personal/employer hosts; =.config/yt-dlp/config:2= has =c@cjennings.net=; =hyprland.conf:3= has personal attribution. -- *Scrub git history of secrets* — NOT DONE. 275 commits; history not fresh, no filter-repo evidence. -- *Remove device-specific configuration* — NOT DONE. =archsetup:1486-1493= still creates the Logitech BRIO udev rule unconditionally; no config flag. -- *Add README.md for GitHub* — DONE (this session — initial draft, pending review). See subtask below. -- *Add LICENSE file* — DONE (this session — GPL-3). See subtask below. -- *Remove binary font files from repo* — NOT DONE. =dotfiles/common/.local/share/fonts/= still tracks 8 PragmataPro =.ttf= files, =AppleColorEmoji.ttf=, and other commercial fonts (Cartograph, MonoLisa, ComicCode, etc.). -- *Make claude-code installation optional* — NOT DONE. =archsetup:1817-1818= runs =curl -fsSL https://claude.ai/install.sh | sh= unconditionally; no flag. -- *Add input validation for username and paths* — PARTIALLY DONE. =archsetup:326-328= validates =$username= against =^[a-z][a-z0-9_]*$= (plus reserved-names check, marked DONE separately). No validation of =$source_dir= or other path vars. -- *Move battery out of waybar sysmonitor group* — NOT DONE. =dotfiles/hyprland/.config/waybar/config:27-37= still has =battery= inside =group/sysmonitor=. -- *Resolution-adaptive scratchpad sizing* — NOT DONE. No size/move windowrules for scratchpads in =hypr/conf.d=. -- *Dynamic waybar/foot config based on screen resolution* — NOT DONE. No resolution-detection/generation script. -- *Bulk shellcheck cleanup* — PARTIALLY DONE. =shellcheck archsetup= still shows 68 findings: 30×SC2329, 16×SC2174, 15×SC2024, 4×SC2086, 1 each SC2155/SC2129/SC2005. 
The 4 SC2086 (unquoted) are the ones a reviewer would flag — those are the priority. -- *Document testing process in README* — NOT DONE. =scripts/testing/README.org= exists but isn't the project README. (Now unblocked — root README exists.) -- *Add guard for rm -rf on constructed paths* — DONE 2026-05-20. All three constructed-path deletes routed through a =safe_rm_rf= guard (absolute / no-=..= / inside-allowed-prefix / real-dir checks); unit-tested in =tests/safe-rm-rf/=. -- *Standardize boolean comparison style* — NOT DONE. Mixed: =[ "$var" = "true" ]= at =archsetup:542,544,569= vs bare =if $var;= form ~7 places elsewhere. -- *Replace eval with safer alternatives* — NOT DONE. =archsetup:442= still =if eval "$cmd" >> "$logfile" 2>&1;= in =retry_install=. - -*** TODO [#A] Rotate exposed calendar feed URLs -Needs the ratio GUI (browser-based regeneration), so deferred until I'm in front of ratio. Three private ical URLs sat in git history (commit =500b1f5=, 2026-05-13) until the 2026-05-20 scrub. The scrub removed them from local + remote history, but anyone who pulled the repo between those dates still has the tokens, so regenerate all three: -- Google personal (=craigmartinjennings@gmail.com= private ical URL) -- Proton (calendar.proton.me URL with PassphraseKey) -- Google DeepSat (=craig.jennings@deepsat.com= private ical URL) -After regenerating, update the live =~/.emacs.d/calendar-sync.local.el= (now owned by the emacs/dotemacs project — see its inbox handoff from 2026-05-20). - -*** 2026-05-20 Wed @ 12:09:32 -0500 Scrubbed the calendar secret from git history -=dotfiles/common/.emacs.d/calendar-sync.local.el= (private Google/Proton/DeepSat ical URLs, added in =500b1f5= for stow distribution) was discovered while folding tmux-util into stow. Sent the file back to the emacs project's inbox, =git rm='d it, then =git filter-repo --invert-paths --path= purged it from all 29 affected commits. 
Force-pushed (=0921e4d...618e6cc=, with lease) and ran =reflog expire= + =gc --prune=now= on the bare repo at =/var/git/archsetup.git=. Verified: the file is in zero commits, the secret tokens return zero matches across all history, and =500b1f5= / =0921e4d= are unreachable on both local and remote. Rotation of the URLs tracked as the sibling TODO above. This also proves =filter-repo= works cleanly here — relevant precedent for the broader [[*Scrub git history of secrets (or start fresh)][history-scrub task]] below (the 5 credential files are still in history). - -*** TODO [#A] Remove/template personal information from scripts -- =archsetup= lines 2-3: personal email and website in header -- =archsetup= lines 141-146: hardcoded =git.cjennings.net= repository URLs — make configurable via conf -- =scripts/post-install.sh=: personal git repos and server URLs (the old =scripts/gitrepos.sh= was consolidated into this script in =dae7659=, so its personal =git.cjennings.net= clone targets now live here) -- =init= line 8: hardcoded password =welcome= - -*** TODO [#A] Scrub git history of secrets (or start fresh) -Even after removing files, secrets remain in git history. -Options: =git filter-repo= to rewrite history, or start a fresh repo for the GitHub remote. -Recommend: fresh repo for GitHub (keep cjennings.net remote with full history). - -*** TODO [#A] Remove device-specific configuration :quick: -=archsetup= lines 1458-1463: Logitech BRIO webcam udev rule — move to optional/configurable section. - -*** DOING [#A] Add README.md for GitHub -Project description, features, requirements, installation instructions, -configuration guide (archsetup.conf), security considerations, -contributing guidelines (or separate CONTRIBUTING.md), and license. - -**** 2026-05-11 Mon @ 13:01:29 -0500 AI Response: Initial README draft -Drafted =README.md= at repo root, modeled on =~/code/chime/README.org=. First pass — review and run a voice/style pass before committing. 
Personal info (emails, =cjennings.net= URLs, personal repo names) intentionally replaced with placeholders for the eventual public release. - -*** 2026-05-19 Tue @ 01:54:29 -0500 Added GPL-3 LICENSE file at repo root -GPL-3 chosen. Canonical GPLv3 text landed at =LICENSE= on 2026-05-11 (commit =f80e664=). README already links to it. SPDX/license headers across source files (or a NOTICE file) split out as a new sub-task below for the eventual public release. - -*** TODO [#A] Add SPDX/license headers to source files :quick: -For a real GPL-3 release on GitHub, every source file should carry an SPDX-License-Identifier header (or the repo should ship a NOTICE file naming the license + contributors). Not blocking — the =LICENSE= file at root is what GitHub needs. Worth doing once the credentials-cleanup work for open-sourcing actually ships. Light lift: pick a header template, sweep the install script + scripts/ + dotfile scripts. - -*** 2026-06-09 Tue @ 19:21:36 -0500 Made claude-code install optional -Shipped in =f2dad22= (feat: make the claude-code install optional). The =curl | sh= from claude.ai now sits behind a config flag instead of running unconditionally. - -*** 2026-06-09 Tue @ 19:21:36 -0500 Input validation added (validate_config + validate_username) -validate_config + validate_username shipped (detail in the 2026-05-11 note below). The =$source_dir= path check was judged unnecessary — it derives from the now-always-validated =$username= (=/home/$username/.local/src=). Closed as done. - -**** 2026-05-11 Mon @ 18:20:49 -0500 AI Response: validate_config + validate_username added -Added two pre-flight validators to =archsetup= (right after =load_config=, before any install step): -- =validate_username()= — the lowercase / starts-with-letter / =[a-z0-9_]= / not-reserved check, extracted from the inline block in =preflight_checks()=. 
Fixes an existing gap: the inline check only ran on the *prompted* path, so a config with =USERNAME=root= (or =USERNAME=foo bar=) slipped through unvalidated. Now both =preflight_checks= and =validate_config= call it. -- =validate_config()= — runs whenever =--config-file= is used: rejects unknown =DESKTOP_ENV= (must be dwm/hyprland/none) early instead of dying in step 7-9; rejects =AUTOLOGIN=/=NO_GPU_DRIVERS= values that aren't =yes=/=no= (currently silently ignored); basic shape check on =LOCALE=; and a scheme + no-whitespace/no-leading-dash check on the six =*_REPO= URLs that get passed to =git clone= (rejects e.g. =--upload-pack=…= injection). Plain =echo …>&2; exit 1= (the logging helpers aren't defined that early). =$source_dir= needs no separate check — it's =/home/$username/.local/src=, derived from the now-always-validated =$username=. -Not a security boundary (=load_config= sources the config as bash; a hostile config can already run anything) — it's typo-catching. Verified with =bash -n= and a smoke-test matrix of good/bad inputs through both functions. The next =make test= run confirms valid configs still install. Leaving as DOING for review. - -*** 2026-05-20 Wed @ 06:50:25 -0500 Swept shellcheck across the shell scripts -Census across the 16 shell scripts (=archsetup=, =init=, =scripts/*.sh=, =scripts/testing/=): 124 findings, zero errors. Triaged against "what matters for public review" and confirmed the 2026-01-24 read — most are intentional or documented-acceptable: -- SC2024 (14, sudo redirects), SC2174 (16, =mkdir -p -m=), SC1091 (13, unfollowable sources), SC2329 (32, functions invoked indirectly via the =STEPS= dispatch array), SC2153 (1, =DISK_PATH= sourced from =vm-utils.sh=) — all false positives or accepted. -- SC2086 on =$SSH_OPTS= in =vm-utils.sh= (×4) and =$TEMP_DISKS= in =cleanup-tests.sh= — intentional word-splitting; quoting would break them. 
The SSH_OPTS-as-array refactor is the proper fix, deliberately deferred (codebase-wide, one atomic change). -- SC2086 integer tests in =[ ]= (=archsetup=, =cleanup-tests=) — safe, note-level style; left to avoid churn in the just-fixed =retry_install=. -- SC2015 (×2, =vm_exec && success || warn=) — =success=/=warn= return 0, so C won't spuriously fire. Idiomatic. - -Fixed the four that are genuine: =init= (a =#!/bin/sh= script) used =$(</etc/hostname)= (SC3034 bashism → =$(cat ...)=) and an unquoted =$interface_up= (SC2086 → quoted); =shellcheck init= now clean, =sh -n= passes. Suppressed the two =VM_IP= SC2034 warnings with documented =# shellcheck disable= directives (consumed by the sourced =validation.sh=, which shellcheck can't follow). 124 → 120; the remaining 120 are the triaged-acceptable set above. - -*** 2026-05-20 Wed @ 06:32:17 -0500 Documented the testing process in the README -The README only covered the VM integration harness; the unit-test layer under =tests/= (Python =unittest=, fake-binary-on-PATH, one dir per script — =layout-navigate=, =tmux-util=) was undocumented. Added a =make test-unit= target that runs every =tests/*/test_*.py= suite explicitly (=unittest discover= can't find them — hyphenated dir names aren't valid package paths), then rewrote the README Testing section into "Unit tests" and "Integration tests (VM harness)" subsections, including how to add a suite for a new script. Updated Contributing to point at =make test-unit= for script changes. 61 unit tests pass via the new target. -*** 2026-05-20 Wed @ 18:22:42 -0400 Added safe_rm_rf guard on constructed-path deletes -Added a self-contained =safe_rm_rf <path> <allowed_prefix>= helper to =archsetup= and routed all three constructed-path deletes through it. 
The guard refuses to run unless the target is absolute, free of =..=, deeper than a bare top-level dir, strictly inside the allowed prefix (not the prefix itself), and a real directory (not a symlink); otherwise it prints the reason and returns non-zero without deleting. On the happy path it delegates to =rm -rf=. - -Sites converted (the line numbers in the original task body were stale — actual sites located by grep): -- =--fresh= state-dir wipe — prefix =/var/lib/archsetup=. -- =git_install= clone-retry cleanup (=build_dir= under =$source_dir=). -- =aur_installer= yay clone-retry cleanup (same prefix). - -The helper is defined before the top-level =--fresh= handler (which runs at load time, before the logging helpers exist), so it carries no =error_warn= dependency and reports refusals to stderr itself. The two in-function sites keep their existing =|| error_warn= / =|| error_fatal= handling. - -Tests: =tests/safe-rm-rf/test_safe_rm_rf.py= sources the real function out of the script and exercises Normal/Boundary/Error cases (13 tests) against real temp dirs. =make test-unit= green (61 tests), =bash -n= clean, no new shellcheck warnings. -*** TODO [#A] Standardize boolean comparison style :quick: -Mixed =[ "$var" = "true" ]= vs =$var= evaluation — pick one pattern. - -*** 2026-05-26 Tue @ 15:27:09 -0500 eval task moot — the line-434 eval is gone, the survivor is deliberate -Verified: the only =eval= left in =archsetup= is line 578 in =retry_install=, and it's intentional and documented — it captures =$?= directly from =eval "$cmd"= to dodge the if-compound-swallows-exit-code trap. Replacing it with an array would reintroduce that bug. The line-434 eval this task pointed at no longer exists. Nothing to change. - ** TODO [#B] Review post-archsetup laptop setup steps (velox 2026-04-10) :PROPERTIES: :LAST_REVIEWED: 2026-06-09 @@ -367,26 +277,98 @@ docs/ dirs (gitignored) for ~/code and ~/projects repos needed scp/rsync from ra Same for ~/.emacs.d/docs/. 
Not in git, so not available after clone. Consider: document as post-install step or create a sync script. -** TODO [#C] Ensure sleep/suspend works on laptops +** TODO [#B] Test + CI infrastructure :test: :PROPERTIES: -:LAST_REVIEWED: 2026-06-09 +:LAST_REVIEWED: 2026-06-28 :END: -Critical functionality for laptop use - current battery drain unacceptable -*NOTE:* This applies to Framework Laptop (velox), not Framework Desktop (ratio) -Add kernel parameter: ~rtc_cmos.use_acpi_alarm=1~ (will become systemd default) -Consider: ~acpi_mask_gpe=0x1A~ for battery drain, suspend-then-hibernate config -See Framework community notes on logind.conf and sleep.conf settings +Umbrella for the test-harness and CI-automation buildout. Consolidated from the 2026-06-28 task audit: these were scattered top-level tasks circling one effort, re-homed as children so the work reads as a unit. Each child ships independently and keeps the priority it carried before. No CI runner exists yet, so the CI/CD-pipeline child gates several of the others. -** TODO [#C] Build CI/CD pipeline that runs archsetup on every commit +*** TODO [#C] Build CI/CD pipeline that runs archsetup on every commit :PROPERTIES: :LAST_REVIEWED: 2026-06-13 :END: Core automation infrastructure - enables continuous validation +*** TODO [#B] Generate recovery scripts from test failures +:PROPERTIES: +:LAST_REVIEWED: 2026-06-13 +:END: +Auto-create post-install fix scripts for failed packages - makes failures actionable +*** TODO [#B] Establish monthly review workflow +:PROPERTIES: +:LAST_REVIEWED: 2026-06-13 +:END: +The diff engine now exists (=scripts/package-inventory= / =make package-diff=), so what remains here is the cadence, not the tooling: a scheduled prompt to run the diff and act on it. Subtasks 1-2 are the recurring human judgment the engine feeds; subtask 3 is the automation to schedule it. 
+**** TODO [#B] For packages in archsetup but not on system: determine if still needed
+**** TODO [#B] For packages on system but not in archsetup: decide add or remove
+**** TODO [#B] Schedule monthly package diff review
+*** TODO [#C] Set up automated test schedule
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-28
+:END:
+Weekly full run to catch deprecated packages even without commits
+*** TODO [#C] Implement manual test trigger capability
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-28
+:END:
+Allow on-demand test runs when automation is toggled off
+*** TODO [#C] Create test results dashboard/reporting
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-28
+:END:
+Make test outcomes visible and actionable
+*** TODO [#B] Block merges to main if tests fail
+:PROPERTIES:
+:LAST_REVIEWED: 2026-05-21
+:END:
+Enforce quality gate - broken changes don't enter main branch
+*** TODO [#B] Add network failure testing to test suite
+:PROPERTIES:
+:LAST_REVIEWED: 2026-05-21
+:END:
+Simulate network disconnect mid-install to verify resilience
+*** TODO [#B] Keep VM base images up to date
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-28
+:END:
+Regular updates to the Arch base VM image (qemu, built by =create-base-vm.sh=) with a review process and schedule. The harness is VM/qemu-based, not containers.
+*** TODO [#B] Persist test logs for historical analysis
+:PROPERTIES:
+:LAST_REVIEWED: 2026-05-21
+:END:
+Archive logs with review process and schedule to identify failure patterns and trends
+*** TODO [#B] Implement automated deprecation detection
+:PROPERTIES:
+:LAST_REVIEWED: 2026-05-21
+:END:
+Parse package warnings and repo metadata to catch upcoming deprecations proactively
+*** TODO [#C] Monitor and optimize test execution time
+:PROPERTIES:
+:LAST_REVIEWED: 2026-05-21
+:END:
+Keep test runs performant as installs and post-install tests grow (target < 2 hours)
+*** TODO [#C] Set up alerts for deprecated packages
+:PROPERTIES:
+:LAST_REVIEWED: 2026-05-21
+:END:
+Proactive monitoring integrated with testing
+*** TODO [#C] Fix VM cloning machine-ID conflicts for parallel testing
+:PROPERTIES:
+:LAST_REVIEWED: 2026-05-21
+:END:
+Currently using snapshot-based testing which works but limits to sequential test runs
+Cloned VMs fail to get DHCP/network even with machine-ID manipulation (truncate/remove)
+Root cause: Truncating /etc/machine-id breaks systemd/NetworkManager startup
+Need to investigate proper machine-ID regeneration that doesn't break networking
+Would enable parallel test execution in CI/CD
+Priority C because snapshot-based testing meets current needs
 ** TODO [#B] Fix install errors surfaced by the 2026-05-11 VM test run
 :PROPERTIES:
 :LAST_REVIEWED: 2026-06-15
 :END:
+*** 2026-06-28 Sun @ 13:29:29 -0400 Audit reconcile: 2026-06-28 btrfs+zfs runs reproduce the same residual set
+Newer full runs landed since the 2026-06-11 reconcile below: the 2026-06-25 zfs run (Testinfra 96/0) and the 2026-06-28 btrfs+zfs runs (97/0, "zero attributed issues"). The residual four were NOT fixed and reproduce unchanged: =enabling firewall= (archsetup:1496-1498, carries a VM-kernel note), =enabling gamemode for user= (archsetup:2221, non-critical), and =tidaler (AUR)=.
Zero archsetup-attributed Testinfra issues across both profiles confirms these are environment / non-critical, not archsetup bugs. Bare-metal confirmation of the firewall pair is still the open thread. + *** 2026-06-15 Mon @ 23:53:21 -0500 Audit reconcile: latest VM run (2026-06-11) confirms the surviving error set The most recent VM run (=test-results/20260611-113904/=) carries four error-summary entries: =enabling firewall= + =verifying firewall is active= (the iptables/nf_tables "Could not fetch rule set generation id" pair, still unconfirmed on bare metal), =enabling gamemode for user= (non-critical), and =tidaler (AUR)=. The earlier fontconfig/dconf fixes held — none reappear. So the count is down from the 7→6 anchor below to four, all of them the known-residual items already itemized. Errors logged during the VM install. Status as of the 2026-05-11 18:36 run (=test-results/20260511-183643/archsetup-output.log=) after the =48c9439= fontconfig/dconf fix: 7 → 6. @@ -408,13 +390,10 @@ Root cause was in =retry_install=: =last_exit_code=$?= ran AFTER =if eval ...; t *** 2026-05-19 Tue @ 01:25:26 -0500 Verified the b9907c7 emacs-stow fix end-to-end =make test= 21:44 → 22:29 (42 min), =test-results/20260518-214516/=. 52/0/5, =ArchSetup Exit Code: 0=. The third-branch path fired correctly — install log =archsetup-2026-05-18-21-45-46.log:14358-14365= shows =From https://git.cjennings.net/dotemacs= → =[new branch] main -> origin/main= → =Reset branch 'main'= → =branch 'main' set up to track 'origin/main'=. No exit-128, no =fatal: not a git repository=. Error Summary down to 7 (was 13 on 2026-05-16); the emacs entry is gone. AUR exit-0 logging triggered for 2 packages this run (mkinitcpio-firmware, tidaler) vs 6 on 2026-05-16 — same bug class, fewer triggers, still tracked under =[#B] AUR exit-0 logged as error=. Issue Attribution: 1 ARCHSETUP entry (Proton VPN Daemon failed — known VM-no-VPN-config artifact). Cleanup ran clean via the normal path. 
-** TODO [#B] Generate recovery scripts from test failures +** TODO [#B] Review undeclared ratio packages for installer inclusion :PROPERTIES: -:LAST_REVIEWED: 2026-06-13 +:LAST_REVIEWED: 2026-06-24 :END: -Auto-create post-install fix scripts for failed packages - makes failures actionable - -** TODO [#B] Review undeclared ratio packages for installer inclusion Triggered by the 2026-06-14 =make package-diff= run on ratio: 62 packages are installed but not declared in archsetup. Stripped of the structural buckets — pacstrap base/boot/kernel (base, linux*, grub, efibootmgr, sudo, btrfs-progs, fwupd, logrotate, ex-vi-compat, linux-lts-strix, zram-generator), the =make deps= VM set (qemu-full, virt-manager, virt-viewer, libguestfs, bridge-utils, dnsmasq, archiso), and the yay bootstrap — these 40 remain. Check the ones to add to the installer, then rerun =make package-diff= to confirm they clear. Some entries are libraries likely pulled in as dependencies (blas-openblas, openblas, eigen, tk, lib32-openal, pkcs11-helper, gtk4-layer-shell, webkit2gtk, sane, freerdp, rust-bindgen) — check those only if you want them declared explicitly rather than left to dependency resolution. @@ -460,163 +439,169 @@ Some entries are libraries likely pulled in as dependencies (blas-openblas, open - [ ] webkit2gtk - [ ] whisper.cpp -** TODO [#B] Establish monthly review workflow -:PROPERTIES: -:LAST_REVIEWED: 2026-06-13 -:END: -The diff engine now exists (=scripts/package-inventory= / =make package-diff=), so what remains here is the cadence, not the tooling: a scheduled prompt to run the diff and act on it. Subtasks 1-2 are the recurring human judgment the engine feeds; subtask 3 is the automation to schedule it. 
-*** TODO [#A] For packages in archsetup but not on system: determine if still needed
-*** TODO [#A] For packages on system but not in archsetup: decide add or remove
-*** TODO [#A] Schedule monthly package diff review
-
-** TODO [#B] Complete security education within 3 months
-:PROPERTIES:
-:LAST_REVIEWED: 2026-05-21
-:END:
-Read recommended resources to make informed security decisions (see metrics for Claude suggestions)
-
 ** TODO [#B] All error messages should be actionable with recovery steps
 :PROPERTIES:
-:LAST_REVIEWED: 2026-05-21
+:LAST_REVIEWED: 2026-06-24
 :END:
 Currently just reports errors without guidance on how to fix them
 ** TODO [#B] Improve logging consistency
 :PROPERTIES:
-:LAST_REVIEWED: 2026-05-21
+:LAST_REVIEWED: 2026-06-24
 :END:
 Some operations log to ~$logfile~, others don't - standardize logging
 All package installs should log, all system modifications should log, all errors should log with context
 Makes debugging failed installations easier
-** TODO [#B] Add backup before system file modifications
+** TODO [#B] Security hardening + audit :security:
 :PROPERTIES:
-:LAST_REVIEWED: 2026-05-21
+:LAST_REVIEWED: 2026-06-28
 :END:
-Safety net for /etc/X11/xorg.conf.d and other system file edits
-Files like ~/etc/sudoers~, ~/etc/pacman.conf~, ~/etc/default/grub~ modified without backup
-If modifications fail or are incorrect, difficult to recover - should backup files to ~.backup~ before modifying
+Umbrella for the security-hardening and audit effort. Consolidated from the 2026-06-28 task audit, re-homing the scattered security tasks as children so the work reads as a unit. Each child ships independently and keeps its prior priority.
-** TODO [#B] Implement Testinfra test suite for archsetup
+*** TODO [#B] Test security + functionality together
 :PROPERTIES:
 :LAST_REVIEWED: 2026-05-21
 :END:
-Create comprehensive integration tests using Testinfra (Python + pytest) to validate archsetup installations
-
-Tests should cover:
-- Smoke tests: user created, key packages installed, dotfiles present
-- Integration tests: services running, configs valid, X11 starts, apps launch
-- End-to-end tests: login as user, startx, open terminal, run emacs, verify workflows
-
-Framework: Testinfra with pytest (SSH-native, built-in modules for files/packages/services/commands)
-Location: scripts/testing/tests/ directory
-Integration: Run via pytest against test VMs after archsetup completes
-Benefits: Expressive Python tests, excellent reporting, can test interactive scenarios
-
-A design doc (not yet written) should cover:
-- Complete example test suite (test_integration.py)
-- Tiered testing strategy (smoke/integration/end-to-end)
-- How to run tests and integrate with run-test.sh
-- Comparison with alternatives (Goss)
-
-** TODO [#B] Set up automated test schedule
+**** TODO [#B] Verify no unexpected open ports or services
+*** TODO [#B] Security audit tooling
 :PROPERTIES:
 :LAST_REVIEWED: 2026-05-21
 :END:
-Weekly full run to catch deprecated packages even without commits
-
-** TODO [#B] Implement manual test trigger capability
+**** TODO [#B] Implement port scanning check
+**** TODO [#B] Create security posture verification script
+**** TODO [#B] Set up intrusion detection monitoring
+*** TODO [#B] Document threat model and mitigations within 6 months
 :PROPERTIES:
 :LAST_REVIEWED: 2026-05-21
 :END:
-Allow on-demand test runs when automation is toggled off
-
-** TODO [#B] Create test results dashboard/reporting
+Identify attack vectors, what's mitigated, what remains
+*** TODO [#C] Complete security education within 3 months
 :PROPERTIES:
-:LAST_REVIEWED: 2026-05-21
+:LAST_REVIEWED: 2026-06-24
 :END:
-Make test outcomes visible and actionable
-
-** TODO [#B] Block merges to main if tests fail
+Read recommended resources to make informed security decisions (see metrics for Claude suggestions)
+*** TODO [#C] Create security checklist for cafe/public wifi scenarios
 :PROPERTIES:
 :LAST_REVIEWED: 2026-05-21
 :END:
-Enforce quality gate - broken changes don't enter main branch
+Practical guidelines for working in public spaces
-** TODO [#B] Add network failure testing to test suite
+** TODO [#B] Test each modernization thoroughly before replacing
 :PROPERTIES:
-:LAST_REVIEWED: 2026-05-21
+:LAST_REVIEWED: 2026-06-28
 :END:
-Simulate network disconnect mid-install to verify resilience
+Ensure new tools integrate with the Hyprland environment and don't break workflow (the fleet is all Hyprland now; archsetup still supports DWM/X11 but no current machine uses it)
-** TODO [#B] Keep container base images up to date
+** TODO [#B] Add NVIDIA preflight check for Hyprland
 :PROPERTIES:
 :LAST_REVIEWED: 2026-05-21
 :END:
-Regular updates to Arch base image with review process and schedule
+Detect NVIDIA GPU and warn user about potential Wayland issues:
+- Require driver version 535+ or abort
+- Document required env vars (LIBVA_DRIVER_NAME, GBM_BACKEND, etc.)
+- Prompt to continue or abort if NVIDIA detected
-** TODO [#B] Persist test logs for historical analysis
+** CANCELLED [#B] Migrate terminal emulator from foot to ghostty :tooling:
+CLOSED: [2026-06-28 Sun 13:58]
 :PROPERTIES:
-:LAST_REVIEWED: 2026-05-21
+:LAST_REVIEWED: 2026-06-24
 :END:
-Archive logs with review process and schedule to identify failure patterns and trends
+Decision (Craig, 2026-06-24): switch from foot to ghostty. Drivers: ligatures (foot won't add them) and kitty-graphics + sixel image support (foot is sixel-only, no kitty-graphics plans). ghostty is pure-Wayland on Hyprland, declarative config that fits the theme system, runtime config reload (keybind / SIGUSR2 since 1.2).
Trade-off accepted: slightly higher input latency than foot. Already in use as Emacs's terminal renderer, so the config + rendering are familiar and the 06-18 tmux theme was tuned against that surface. Full evaluation: [[file:docs/2026-06-10-terminal-emulator-evaluation.org][docs/2026-06-10-terminal-emulator-evaluation.org]]. -** TODO [#B] Implement automated deprecation detection -:PROPERTIES: -:LAST_REVIEWED: 2026-05-21 -:END: -Parse package warnings and repo metadata to catch upcoming deprecations proactively +Migration scope: +- archsetup: add =ghostty= to the package list; decide whether to keep =foot= installed as a fallback or drop it. +- dotfiles: port =foot.ini= → ghostty config (flat key=value). The shared foot.ini sets no font (per-host via =host.ini= include) — replicate that per-host font split for ghostty. +- Themes: the dupre/hudson =themes/<name>/= dirs hold foot configs; add ghostty theme files and teach =set-theme= to write + reload the ghostty config. Watch the reload-clobbers-OSC-10/11 bug (ghostty #2795) when wiring runtime theme switch. +- hyprland.conf: default-terminal keybind, pyprland scratchpad terminals, and any other =foot= references → ghostty. +- Verify on velox + ratio: ligatures render, latency acceptable in tmux+vterm use, dupre theme correct, sixel/kitty-graphics previews work. -** TODO [#B] Audit dotfiles/common directory +** DONE [#C] Scratchpad launch turns on focus-follows-mouse :bug:hyprland: +CLOSED: [2026-06-28 Sun] :PROPERTIES: -:LAST_REVIEWED: 2026-05-21 +:LAST_REVIEWED: 2026-06-28 :END: -*** TODO [#B] Review all 50+ scripts in ~/.local/bin - remove unused scripts -*** TODO [#B] Check dotfiles for uninstalled packages - remove orphaned configs -*** TODO [#B] Verify all stowed files are actually used +Root cause: =float_switch_override_focus = 1= in hyprland.conf. 
With =follow_mouse = 0=, focus still jumped to the window under the pointer when it crossed a floating-tiled boundary, so launching a floating scratchpad re-enabled focus-follows-mouse onto tiled windows. Fixed by setting it to 0 (dotfiles =5619342=). Not a pyprland side effect. + +Imported from roam inbox 2026-06-25. Repro: with two tiled windows, moving the mouse over the other tile does nothing (focus-follows-mouse off, as expected). Then launch a terminal (scratchpad), move the mouse over a tile, and focus now switches to the window under the pointer. Something about the scratchpad/terminal launch flips focus-follows-mouse on. Find what re-enables it (likely a Hyprland focus/input setting or a pyprland scratchpad side effect) and keep it off. -** TODO [#B] Test security + functionality together +** DONE [#B] mod+J/K focus navigation: raise to front, reach floating, monocle fix :feature:bug:hyprland: +CLOSED: [2026-06-29 Mon] +Three improvements to =layout-navigate= (mod+J/K), validated live on velox: +- Raise the focused window to the front on focus navigation, so focusing a window behind an overlapping floating one brings it forward (dotfiles =5619342=, bundled with the =float_switch_override_focus = 0= scratchpad fix tracked above). +- Cycle into floating windows, so you can navigate back to a scratchpad like any window instead of it being a one-way trip (dotfiles =f2107f7=). +- Fixed a monocle regression from that change: the =cyclenext= dispatcher no-ops between monocle-stacked tiles, so focus navigation now computes the workspace window list and focuses the next/prev by address — layout-independent and floating-inclusive (dotfiles =09815f3=). 
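Editorial illustration, not part of the commit: the monocle fix above (compute the workspace window list, then focus the next/prev window by address) could look roughly like the sketch below. It is a minimal sketch under stated assumptions, not the actual =layout-navigate= code. The function names and the position-based ordering are hypothetical; the field names (=address=, =workspace.id=, =at=, =mapped=, =hidden=) follow the JSON that =hyprctl clients -j= prints.

```python
import json
import subprocess


def workspace_windows(clients, workspace_id):
    """Return visible windows on one workspace, tiled and floating alike."""
    windows = [
        c for c in clients
        if c["workspace"]["id"] == workspace_id
        and c.get("mapped", True)
        and not c.get("hidden", False)
    ]
    # Assumption: order by on-screen position (x, then y), with the address
    # as a tiebreak so monocle-stacked tiles at the same spot stay stable.
    # The ordering the real script uses is not stated in the source.
    return sorted(windows, key=lambda c: (c["at"][0], c["at"][1], c["address"]))


def neighbour_address(clients, workspace_id, current_address, step):
    """Pick the address step positions away from the current window, wrapping."""
    addresses = [w["address"] for w in workspace_windows(clients, workspace_id)]
    if not addresses:
        return None
    if current_address not in addresses:
        return addresses[0]
    return addresses[(addresses.index(current_address) + step) % len(addresses)]


def hyprctl_json(*args):
    """Run a hyprctl query with JSON output and parse it."""
    result = subprocess.run(
        ["hyprctl", *args, "-j"], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def focus_relative(step):
    """Focus the next (step=1) or previous (step=-1) window by address."""
    clients = hyprctl_json("clients")
    active = hyprctl_json("activewindow")
    if not active.get("address"):
        return  # nothing focused, nothing to cycle from
    target = neighbour_address(
        clients, active["workspace"]["id"], active["address"], step
    )
    if target:
        subprocess.run(
            ["hyprctl", "dispatch", "focuswindow", f"address:{target}"],
            check=True,
        )
```

Calling =focus_relative(1)= from a mod+J bind and =focus_relative(-1)= from mod+K would cycle through every visible window on the workspace. Because selection is by address rather than by the =cyclenext= dispatcher, it does not depend on the active layout. The sketch leaves out the raise-to-front step.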
+ +** TODO [#C] Wlogout exit-menu buttons are rectangular, not square :PROPERTIES: -:LAST_REVIEWED: 2026-05-21 +:LAST_REVIEWED: 2026-06-24 :END: -*** TODO [#B] Verify no unexpected open ports or services +The wlogout exit menu renders its buttons taller than they are wide on velox, so the cells read as vertical rectangles instead of squares. They render square (centered) correctly on ratio, so this is a per-host / resolution difference, not a flat bug. Fix the button sizing in the wlogout style (=~/.dotfiles/hyprland/.config/wlogout/style.css=) so each cell is square on both hosts. Noticed 2026-05-21. Related: the [#D] VERIFY about wlogout sizing across displays. -** TODO [#B] Security audit tooling +The wlogout config uses fixed pixel margins, which is the likely reason sizing differs across the two displays — adjusting them for the laptop screen is part of the fix (folded in from the former "Test wlogout menu on laptop" VERIFY, 2026-06-24). + +Add a regression test so the square-cell fix doesn't silently break on a resolution change: assert the rendered (or computed) wlogout button cells are square across ratio's and velox's resolutions. Dropped :quick: — the cross-host test pushes this past a spare-moment fix. +** TODO [#C] Window focus lost when unhiding stashed windows :bug:hyprland: :PROPERTIES: -:LAST_REVIEWED: 2026-05-21 +:LAST_REVIEWED: 2026-06-24 :END: -*** TODO [#B] Implement port scanning check -*** TODO [#B] Create security posture verification script -*** TODO [#B] Set up intrusion detection monitoring +From the roam inbox: hiding a window (e.g. the org-capture popup) then unhiding it should leave the unhidden window focused, but another window typically takes focus. Also =ctrl+j/k= (layout-navigate) can't reach the unhidden window afterward — it should always reach any visible window except the waybar. Involves stash-restore + layout-navigate; needs interactive reproduction with Craig. 
-** TODO [#B] Document threat model and mitigations within 6 months +** TODO [#C] Pocketbook development backlog :pocketbook: :PROPERTIES: -:LAST_REVIEWED: 2026-05-21 +:LAST_REVIEWED: 2026-05-26 :END: -Identify attack vectors, what's mitigated, what remains +Pocketbook (GTK4 layer-shell notes panel, toggled via waybar) was pulled from publication 2026-05-26 — github repo + cjennings.net repo deleted, mirror hook removed — and folded into this repo at =pocketbook/= until it's ready to spin back out. Src-layout Python package with pytest tests and a Makefile. Develop it in-tree; the backing modules are =store/note/panel/layer_shell/app/note_widget= + =style.css=. + +Backlog (unordered; promote items to their own dated tasks as they're picked up): -** TODO [#B] Verify package signature verification not bypassed by --noconfirm +- Configurable options, possibly a dedicated configuration panel. +- Lose-focus hides pocketbook — configurable on/off. +- Configurable display order: chronological by creation date (asc/desc), manual, alphabetical (asc/desc). +- Search / filter notes. +- Global toggle keybind (Hyprland =bind=) alongside the waybar click; document the waybar integration. +- Note CRUD polish (create/edit/delete) + optional markdown rendering. +- Pin / favorite notes. +- Tags or notebooks / categories. +- Persistence: confirm store format + =~/.local/share/pocketbook/= location, add versioning/migration, decide a backup/sync story. +- Theming: track the dupre/hudson theme system so =style.css= follows =set-theme=. +- Layer-shell geometry config (anchor edge, width, margins) + HiDPI / multi-monitor behavior — ties into [[file:docs/PLAN-per-host-overrides.org][per-host overrides]] scaling work. +- Config file format (toml) + reload-without-restart. +- Expand test coverage (TDD per testing standards; =tests/= already exists). +- Release prep for the eventual spin-back-out: pyproject metadata, version, license. 
+- Re-wire the archsetup install (gtk4-layer-shell dep + install step + post-install clone) when pocketbook ships. Removed 2026-05-26 — see git history of =archsetup= / =scripts/post-install.sh=. + +** TODO [#C] Fn+F9 toggles pocketbook — source unlocated :hyprland:pocketbook: :PROPERTIES: -:LAST_REVIEWED: 2026-05-21 +:LAST_REVIEWED: 2026-06-23 :END: -Packages installed with ~--noconfirm~ may skip signature checks -AUR had issues previously requiring --noconfirm workaround - verify this doesn't compromise security -Ensure package signatures are still verified despite --noconfirm flag +On velox, pressing Fn+F9 (physical function key) toggles the pocketbook panel. It shouldn't. Raised from a home-project session 2026-06-23. -** TODO [#B] Test each modernization thoroughly before replacing +Investigated 2026-06-23 and could not locate the trigger in any config. Ruled out, three ways: +- No F9 bind (bare / $mod / keycode) in the live =hyprland.conf= (now a stow symlink), the velox host tier =conf.d/local.conf=, or the waybar config. +- =hyprctl binds= runtime (all 90 active binds, authoritative) execs pocketbook on ONLY =SUPER+P=. No F9/XF86 path reaches it. The old touchpad toggle that used to sit on =$mod+F9= was moved to =$mod+M=, so F9 is unbound in Hyprland. +- No input remapper (keyd/xremap/input-remapper) and no hotkey daemon (sxhkd/swhkd) running or configured; pocketbook's own source has no F9 / GlobalShortcuts / portal / dbus listener (its GTK ShortcutController binds only Esc/Ctrl-n/Ctrl-j/Ctrl-k/Del/Return). pocketbook is a single-instance Gtk.Application, so any path that re-runs =pocketbook= toggles it. + +Parked at Craig's call (not worth deeper investigation now). If it resurfaces, the one unfinished step is to capture what keysym Fn+F9 actually emits (=wev -f wl_keyboard:key=, press Fn+F9, read the =sym:= / =code:=) and grep for that. Most likely folds into removing pocketbook from the waybar setup — if pocketbook leaves the bar, retire this with it. 
+ +** TODO [#C] Ensure sleep/suspend works on laptops :PROPERTIES: -:LAST_REVIEWED: 2026-05-21 +:LAST_REVIEWED: 2026-06-09 :END: -Ensure new tools integrate with DWM environment and don't break workflow +Critical functionality for laptop use - current battery drain unacceptable +*NOTE:* This applies to Framework Laptop (velox), not Framework Desktop (ratio) +Add kernel parameter: ~rtc_cmos.use_acpi_alarm=1~ (will become systemd default) +Consider: ~acpi_mask_gpe=0x1A~ for battery drain, suspend-then-hibernate config +See Framework community notes on logind.conf and sleep.conf settings -** TODO [#B] Add NVIDIA preflight check for Hyprland +** TODO [#C] Re-check python-lyricsgenius --skipinteg workaround :solo: :PROPERTIES: -:LAST_REVIEWED: 2026-05-21 +:LAST_REVIEWED: 2026-06-24 :END: -Detect NVIDIA GPU and warn user about potential Wayland issues: -- Require driver version 535+ or abort -- Document required env vars (LIBVA_DRIVER_NAME, GBM_BACKEND, etc.) -- Prompt to continue or abort if NVIDIA detected +archsetup installs =python-lyricsgenius= with =--mflags --skipinteg=, skipping makepkg integrity + PGP checks — a workaround originally for an expired-signature issue upstream (surfaced by the 2026-06-23 --noconfirm audit). Periodically test whether the cause has cleared: if a plain =aur_install python-lyricsgenius= builds without complaint, drop the =--skipinteg= workaround. Removal needs a real AUR build to confirm, so it isn't a blind change. + +*** 2026-06-24 Wed @ 17:55:34 -0400 Rechecked: still needed, but the cause changed +Ran =makepkg --verifysource= on the current AUR PKGBUILD (3.7.0-1). The package tarball =lyricsgenius-3.7.0.tar.gz= now passes its b2sum — the original expired-PGP-signature problem is gone (the PKGBUILD no longer carries any =validpgpkeys=). But integrity still FAILS, on a different file: =LICENSE.txt=, which the PKGBUILD fetches from the project's github master and pins a b2sum for. 
github master is a moving target, so that b2sum drifts and =--skipinteg= is still required. This is structural (not a transient upstream fix away), so it likely won't clear until the maintainer pins the LICENSE to a tagged release. Updated the archsetup comment to the real cause. Keep rechecking, but lower expectations of it clearing. ** TODO [#C] Review theme config architecture for dunst/fuzzel :PROPERTIES: @@ -632,67 +617,31 @@ error-prone — changes must be made in both places. Consider: - Same situation applies to fuzzel.ini The goal is a single place to edit each config, not two. -** TODO [#C] Monitor and optimize test execution time -:PROPERTIES: -:LAST_REVIEWED: 2026-05-21 -:END: -Keep test runs performant as installs and post-install tests grow (target < 2 hours) - -** TODO [#C] Set up alerts for deprecated packages -:PROPERTIES: -:LAST_REVIEWED: 2026-05-21 -:END: -Proactive monitoring integrated with testing - -** TODO [#C] Fix VM cloning machine-ID conflicts for parallel testing -:PROPERTIES: -:LAST_REVIEWED: 2026-05-21 -:END: -Currently using snapshot-based testing which works but limits to sequential test runs -Cloned VMs fail to get DHCP/network even with machine-ID manipulation (truncate/remove) -Root cause: Truncating /etc/machine-id breaks systemd/NetworkManager startup -Need to investigate proper machine-ID regeneration that doesn't break networking -Would enable parallel test execution in CI/CD -Priority C because snapshot-based testing meets current needs - -** TODO [#C] Create security checklist for cafe/public wifi scenarios -:PROPERTIES: -:LAST_REVIEWED: 2026-05-21 -:END: -Practical guidelines for working in public spaces - -** TODO [#C] Build security dashboard command :solo: +** TODO [#C] Review current tool pain points annually :PROPERTIES: :LAST_REVIEWED: 2026-05-21 :END: -Single command shows: encryption status, firewall status, open ports, running services - -** VERIFY [#C] Evaluate modern CLI tool replacements -:PROPERTIES: 
-:LAST_REVIEWED: 2026-06-10 -:END: -Research done 2026-06-10, adoption decisions pending. Full report: [[file:docs/2026-06-10-modern-cli-tools-evaluation.org][docs/2026-06-10-modern-cli-tools-evaluation.org]]. Recommendation: adopt bat, dust, hyperfine, tealdeer, doggo (all in extra, all actively maintained); optional xh/jless/sd/ouch; nothing already-adopted has been superseded. Say which to install and I'll add them to archsetup + the machines. - -** 2026-06-10 Wed @ 18:25:11 -0500 paru vs yay — evaluated, staying with yay -Research done 2026-06-10: [[file:docs/2026-06-10-paru-vs-yay-evaluation.org][docs/2026-06-10-paru-vs-yay-evaluation.org]]. The maintenance picture inverted since the task was filed: yay released v12.6.0 on 2026-06-07 with active triage, while paru has had no release in 11 months, no commit in 5, and a stable that fails to build against current libalpm (issue #1468 open 6 months). For an installer that bootstraps the AUR helper unattended, paru is the riskier choice on every axis that matters. No decision needed — the evidence closes this one; revisit only if paru's maintenance resumes. +Once-yearly systematic inventory of known deficiencies and friction points in current toolset -** VERIFY [#C] Evaluate terminal emulator alternatives -:PROPERTIES: -:LAST_REVIEWED: 2026-06-10 -:END: -Research done 2026-06-10, your read pending. Full report: [[file:docs/2026-06-10-terminal-emulator-evaluation.org][docs/2026-06-10-terminal-emulator-evaluation.org]]. Recommendation: stay with foot — it wins on latency, ties on Wayland purity, fits the theme system, and stays healthy (1.26.0, Mar 2026). ghostty is the only real challenger (and the original ligature motivation favors it — foot still doesn't do ligatures), so the open question is whether ligatures matter enough to trade foot's latency edge. wezterm is effectively unmaintained (no release since Feb 2024). 
+** CANCELLED [#C] archsetup Waybar Wi-Fi module should show no-internet state :feature:waybar: +CLOSED: [2026-06-29 Mon] +Consolidated, not dropped: the no-internet/captive indicator + the diagnostics/ +bounce/speed-test scope are now Phase 1 + Phase 3 of the unified +[[*Waybar network module — custom/net][Waybar network module — custom/net]] parent. The work continues there; +this separate entry is retired so it's tracked in one place. Spec: +[[file:docs/design/2026-06-29-waybar-network-module-spec.org][2026-06-29-waybar-network-module-spec.org]]. -** VERIFY [#C] Review file manager options for Wayland +** TODO [#C] Waybar emacs-service status + control :feature:waybar: :PROPERTIES: -:LAST_REVIEWED: 2026-06-10 +:LAST_REVIEWED: 2026-06-24 :END: -Research done 2026-06-10, adoption call pending. Full report: [[file:docs/2026-06-10-file-manager-evaluation.org][docs/2026-06-10-file-manager-evaluation.org]]. Recommendation: keep nautilus (only candidate that's Wayland-native, libadwaita-dark native, and actively developed); add yazi as the Wayland TUI (v26.5.6, monthly releases, sixel previews work in foot with zero scripting, zoxide built in — it has matured substantially since the problematic 2026-02 try). ranger upstream is effectively frozen (still 1.9.4, 700+ open issues), so porting it to the Wayland machines is the one option the evidence rules out. Original body's history preserved in git. +From the roam inbox (2026-06-22): with Emacs integrated into the system as file manager and instant note-taker, make bouncing it trivial. A waybar component showing the emacs service status, with detail on hover, that turns the server on / off / bounce via right-click. Pairs with running the Emacs daemon as a managed systemd user service. 
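One possible shape for the status half of that waybar component, sketched under assumptions: the daemon runs as a systemd user unit named =emacs.service= (the name is a guess), and the module is configured with waybar's JSON return type. =emacs_state= and =emacs_status_json= are hypothetical names, not existing scripts.

```shell
# Hypothetical waybar custom-module helpers for an Emacs systemd user service.
# Assumption: the unit is called emacs.service; override with EMACS_UNIT.
UNIT=${EMACS_UNIT:-emacs.service}

# Print the single-line JSON object waybar reads from a custom module that
# uses "return-type": "json". $1 is the unit state (active, inactive, failed...).
# The state doubles as the CSS class so the bar can color each state.
emacs_status_json() {
    state=$1
    printf '{"text":"emacs","tooltip":"emacs service: %s","class":"%s"}\n' \
        "$state" "$state"
}

# Ask systemd for the unit state. is-active prints the state name and exits
# non-zero whenever the unit is not active, so the exit status is ignored.
emacs_state() {
    systemctl --user is-active "$UNIT" 2>/dev/null || true
}
```

The module's =exec= script would end with =emacs_status_json "$(emacs_state)"=, and the right-click action could map to =systemctl --user restart emacs.service=. Whether on / off / bounce live in one menu or on separate clicks is still open in the task.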
-** TODO [#C] Review current tool pain points annually +** TODO [#C] set-wallpaper detaches waypaper config from its stow symlink :bug:hyprland:quick:solo: :PROPERTIES: -:LAST_REVIEWED: 2026-05-21 +:LAST_REVIEWED: 2026-06-28 :END: -Once-yearly systematic inventory of known deficiencies and friction points in current toolset +=set-wallpaper= persists with =mv "$tmp" "$CONFIG"=, which replaces the =~/.config/waypaper/config.ini= stow symlink with a real file. After the first run the live config is detached from =~/.dotfiles/hyprland/.config/waypaper/config.ini=, so a later =git pull= + restow won't update it and set-wallpaper changes never flow back to the repo. Fix: write in place rather than =mv= over the symlink — e.g. =cp "$tmp" "$CONFIG"= (follows the symlink to the real dotfiles file), or resolve the link target and write there. Lives in =~/.dotfiles/hyprland/.local/bin/set-wallpaper=; it has a test suite, so add a Boundary case for "CONFIG is a symlink". ** TODO [#D] Consider Customizing Hyprland Animations Current: windows pop in, scratchpads slide from bottom. @@ -717,23 +666,69 @@ animation = fade, 1, 2, default animation = layers, 1, 2, default, fade #+end_src -** VERIFY [#D] Test wlogout menu on laptop -Test wlogout exit menu on laptop to verify sizing works on different display. -Current config uses fixed pixel margins - may need adjustment for laptop screen. - ** TODO [#D] Parse and improve AUR error reporting Parse yay errors and provide specific, actionable fixes instead of generic error messages ** TODO [#D] Improve progress indicators throughout install Enhance existing indicators to show what's happening in real-time -** TODO [#C] Teach archsetup to stow the host tier :solo: -:PROPERTIES: -:LAST_REVIEWED: 2026-06-11 -:END: -Phase 5 of the per-host overrides spec, deferred from the 2026-06-11 implementation: the installer's stow calls in =user_customizations()= stow =common= + the DE package only. 
Add the host tier (=$(cat /etc/hostname)= at install time, or a conf key) guarded so a host without a tier is skipped with a message — same semantics as the dotfiles Makefile. Matters only for fresh installs of ratio/velox-named machines; the post-install =make stow= path already handles it.
-
 ** TODO Manual testing and validation
+*** wtimer: color states + click/scroll interactions on the live bar
+What we're verifying: the timer module's interactions and CSS state colors render right on the live bar. The glyph, position (right of battery), countdown, and "+N" badge are already verified live; the per-state colors and the real mouse/scroll bindings are what's left. The logic is unit-tested (86 cases); this is the human-in-the-loop visual + input check.
+- Left-click the timer module — a fuzzel menu offers timer / alarm / stopwatch / pomodoro; pick timer, enter =5s=.
+- Watch it count down; at under a minute it should turn the urgent color (dupre orange #d47c59).
+Expected: the timer reaches 0:00, a persistent notification fires, and the item disappears from the bar.
+- Create two timers (e.g. =3m= and =10m=); a =+1= badge shows; scroll over the module.
+Expected: scrolling cycles which item is primary (the displayed time/glyph changes); the badge count stays correct.
+- Middle-click the module while a timer runs.
+Expected: it pauses (dimmed paused color #5f5c52) and the countdown freezes; middle-click again resumes from where it left off.
+- Right-click the module with items present.
+Expected: a fuzzel menu lists the items; choosing one cancels it.
+- Start a pomodoro (left-click → pomodoro); let a work phase elapse (or set short test phases by editing state).
+Expected: the glyph + color switch between work (gold #d7af5f) and break (#8a9a5b), a notification fires at each phase change, and the cycle count advances.
+*** Sysmon right-click cycles the visible metric (live waybar) +What we're verifying: right-clicking the collapsed sysmon module rotates the visible metric and the bar refreshes at once, left-click still opens btop, and the cpu/temp/mem icons render as real glyphs (not tofu boxes). The cycle logic is unit-tested; this is the live-waybar + visual confirmation. +- Reload waybar so it picks up the new =signal= / =on-click-right= config (Super+B relaunches it, or =pkill waybar; waybar &= from a terminal) +- Right-click the sysmon module several times, watching the visible metric +- Left-click the sysmon module once +Expected: each right-click advances the visible metric battery → cpu → temp → mem → disk → back to battery (velox is a laptop, so battery is in the ring) and the bar updates immediately. Every metric shows a sensible icon plus its value, no tofu. Left-click still opens the btop popup. The tooltip still lists all metrics. +*** Give the README a final read before public release +What we're verifying: =README.md= reads cleanly and accurately for a first-time reader, with no stale personal info and consistent public-fork placeholders. +- Open =~/code/archsetup/README.md= +- Read it end to end as if you've never seen the project +Expected: every section is accurate, the personal-project disclaimer reads right, the placeholders (=<your-domain>=, =github.com/yourusername=) are consistent, and nothing personal leaked into the public-facing draft. +*** 2026-06-28 Sun @ 12:54:47 -0400 Live-update guard verified on velox (live Hyprland) +Verified the =hypr-live-update-guard= PreTransaction hook end-to-end on velox +with Hyprland running (pid 1997). velox predated the feature, so the guard was +absent — placed =/usr/local/bin/hypr-live-update-guard= (755) and +=/etc/pacman.d/hooks/hypr-live-update-guard.hook= (644), byte-matching the +archsetup hyprland-step install. The guard now ships on velox permanently. 
+
+Results:
+- Quick contract (=printf 'mesa\nhyprland\n' | guard=) → exit=1, BLOCKED banner,
+  sorted pkgs, correct TTY remedy + sentinel path.
+- Not-running branch (=HYPR_GUARD_RUNNING=0=) → exit=0, silent.
+- Env override (=HYPR_ALLOW_LIVE_UPDATE=1=) → exit=0.
+- Sentinel (=touch /run/archsetup-allow-live-gpu-update=) → exit=0; removed →
+  re-armed exit=1.
+- Real firing through pacman: =sudo pacman -S mesa= (same-version reinstall =
+  Upgrade op on a guarded target). pacman ran the hook, fed =mesa= via
+  =NeedsTargets=, the guard aborted, =AbortOnFail= stopped the transaction
+  ("no packages were upgraded"); mesa unchanged at 1:26.1.3-2. This is the
+  authoritative proof pacman parses + wires the hook.
+- Full-logout end-to-end (guard quiet, upgrade completes after logout): covered
+  by construction — the not-running branch exits 0, and a 0-exit PreTransaction
+  hook lets pacman proceed normally (proven by the mesa abort showing the hook
+  path runs). Not re-run under a real logout; no separate residual.
+*** Wallpaper survives relogin (waypaper --restore)
+What we're verifying: the hyprland =exec-once= now runs =waypaper --restore= instead of a hardcoded =awww img=, so a wallpaper chosen via =set-wallpaper= / waypaper / dirvish persists across a relogin. The exec-once only fires at Hyprland startup, so this can't be confirmed without a real relogin. (Mechanism already verified: =waypaper --restore= applied the persisted wallpaper via the awww backend, exit 0.)
+- Set a wallpaper different from the current one (or pick one in waypaper, Super+Shift+P):
+#+begin_src sh :results output
+set-wallpaper ~/pictures/wallpaper/trondheim-norway.jpg
+#+end_src
+- Log out of Hyprland and back in (or reboot)
+Expected: the wallpaper you just set is what comes back after login — not whatever was showing before, and never the old hardcoded default unless that's what you set.
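The persistence this relogin check relies on goes through the waypaper config, which the =set-wallpaper= symlink task describes as a stow symlink that =mv= detaches. A minimal sketch of that difference, run in a throwaway directory: the file names are made up for the demo and are not the real config paths.

```shell
# Throwaway demo directory; every file name below is illustrative only.
demo=$(mktemp -d)
printf 'wallpaper = old.jpg\n' > "$demo/repo-config.ini"   # stands in for the dotfiles file
ln -s "$demo/repo-config.ini" "$demo/live-config.ini"      # stands in for the stow symlink

# cp opens the destination for writing, so it follows the symlink and
# updates the repo file while leaving the link in place.
printf 'wallpaper = new.jpg\n' > "$demo/tmp1"
cp "$demo/tmp1" "$demo/live-config.ini"
if [ -L "$demo/live-config.ini" ]; then after_cp=symlink; else after_cp=file; fi
repo_after_cp=$(cat "$demo/repo-config.ini")

# mv renames the temp file onto the link's path, replacing the link itself
# with a regular file; the repo file no longer sees any change.
printf 'wallpaper = newer.jpg\n' > "$demo/tmp2"
mv "$demo/tmp2" "$demo/live-config.ini"
if [ -L "$demo/live-config.ini" ]; then after_mv=symlink; else after_mv=file; fi
repo_after_mv=$(cat "$demo/repo-config.ini")

rm -rf "$demo"
```

After the =cp= step the live path is still a symlink and the repo file carries the new value; after the =mv= step the live path is a regular file and the repo file is unchanged. The same two checks could be the core of the "CONFIG is a symlink" boundary case the task asks for.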
+ *** velox per-host env applies after Hyprland restart What we're verifying: the velox tier's env lines (GDK_SCALE/QT_SCALE_FACTOR 1.5, XCURSOR_SIZE 36) only apply at Hyprland startup, and the foot font moved to host.ini — neither can be confirmed over ssh. - On velox, log out of Hyprland and back in (or reboot) @@ -753,22 +748,187 @@ Expected: near-black frame (#151311), dark toolbar/omnibox (#252321), gold links *** 2026-06-10 Wed @ 17:46:34 -0500 velox post-trim reboot verified; realtek firmware restored Craig rebooted velox (passphrase at console); checks ran over SSH after boot. Wifi connected, TLP active, graphics fine. One dmesg hit: r8152 failed to load rtl_nic/rtl8156b-2.fw — the Framework Ethernet expansion card (RTL8156B) is Realtek, so the trim list wrongly dropped linux-firmware-realtek (a Realtek laptop camera is on USB too). Reinstalled the package on velox (its hook rebuilt the initramfs) and removed realtek from archsetup's trim list. The driver worked even without the blob (internal-defaults fallback), so this was correctness, not breakage. -** TODO [#B] Enlarge org-capture popup to scratchpad size :hyprland: -From a .emacs.d inbox handoff (2026-06-15, captured via roam): the quick-capture / org-protocol popup is too small to be effective — it should be about the size of a terminal scratchpad. +*** Super+F dirvish popup: launch, float, dismiss-on-focus-loss, q +What we're verifying: the physical keychord opens a floating Dirvish popup; opening any file launches it independently (never inside the popup frame) and the popup auto-dismisses when focus leaves; navigating dirs keeps it; q is the manual close. The Wayland focus event that drives the auto-dismiss can't be driven headlessly — only a real keypress + real app launch confirms it. 
+- Press Super+F
+- Expected: a Dirvish frame opens floating and centered, rooted at ~/ (home)
+- Navigate into a directory with RET (or right-arrow)
+- Expected: the popup stays open and shows that directory (browsing keeps it up)
+- Open a video with RET (or o)
+- Expected: the video opens in its player and the popup vanishes on its own — no q needed, nothing left in the way, and q never lands on the video
+- Press Super+F again, open a PDF or image with RET
+- Expected: it opens in zathura / feh (externally), NOT inside the popup frame; popup dismisses
+- Press Super+F again, open a .txt or .org file with RET
+- Expected: it opens in a NEW emacsclient frame (separate from your working session), not adopted into your current session; popup dismisses
+- Press Super+F, then click another window without opening anything
+- Expected: the popup dismisses on focus loss
+- Press Super+F, then press q
+- Expected: the popup closes completely (manual dismiss still works; no empty leftover frame)
+- Press Super+F, then press Super+F again while it's still open
+- Expected: still exactly one popup — the second press focuses the existing one, no second frame, no stray buffer (for several independent file managers, use C-x d)
+- Press Super+Shift+F
+- Expected: GUI nautilus opens (the binding nautilus moved to)
+*** Network module Phase 1 — indicator states on the live bar
+What we're verifying: =custom/net= shows the right state for each real network condition. The engine logic is unit-tested; this is the live-bar + visual check (states can't be faked on the running bar). Phase 2-3 tests get added under this task as those phases land.
+- Reload waybar to pick up =custom/net= (Super+B, or =pkill waybar; waybar &=).
+- On a normal connected network, read the module.
+- Expected: wifi glyph + signal + SSID; tooltip shows IPv4, gateway, throughput, and a recent "online" probe result.
+- Join the hotel/captive network (or any portal network).
+- Expected: the module shows the captive state (distinct glyph + warning color), tooltip names the portal host.
+- Unplug to a network with no internet (or block egress).
+- Expected: the no-internet state (distinct from captive and from disconnected).
+*** Network module Phase 1 — net doctor recovers rfkill from a TTY
+What we're verifying: the console-recovery path works with no GUI, and recovers the framework's post-power-loss soft-block.
+#+begin_src sh :results output
+rfkill block wifi # simulate the soft-block
+rfkill list wifi # confirm Soft blocked: yes
+#+end_src
+- Switch to a TTY (Ctrl+Alt+F3) and log in (no Hyprland).
+#+begin_src sh :results output
+make -C ~/.dotfiles online # or: net doctor --fix
+#+end_src
+- Expected: doctor reports the rfkill block, runs =rfkill unblock wifi= + =nmcli radio wifi on=, reconnects, and ends "online" — all from the TTY.
+*** Network module Phase 1 — airplane state absorbed, old module gone
+What we're verifying: the airplane toggle/state moved into =custom/net= and the standalone module is removed cleanly.
+- Toggle airplane mode (the keybind / the net module's control).
+- Expected: =custom/net= reflects the airplane state; wifi drops and restores.
+- Check the bar has no separate =custom/airplane= module, and =waybar-airplane= / =airplane-mode= are gone from =~/.local/bin=.
+- Expected: no duplicate airplane indicator; no stale scripts.
-*** 2026-06-15 Mon @ 19:19:55 -0500 AI Response: popup size is the frame's char-cell count, not the Hyprland rule
-Triaged under auto inbox-zero. The popup is the emacsclient frame named "org-capture", created by =~/.dotfiles/hyprland/.local/bin/quick-capture= with =(width . 90) (height . 22)= — 90 columns by 22 lines. Emacs sizes by character cells and overrides the Hyprland rule =windowrule = match:title ^(org-capture)$, size 900 500= (hyprland.conf:182). The live frame measured ~889x860 px; the width tracks the 90-column count, not the window rule.
Setting the Hyprland rule to =size 55% 65%= (the scratchpad's pyprland spec) did not change the frame width, so I reverted it — dotfiles left clean. +** DOING [#B] Prepare for GitHub open-source release +:PROPERTIES: +:LAST_REVIEWED: 2026-06-28 +:END: +Remove personal info, credentials, and code quality issues before publishing. +*** 2026-06-16 Tue @ 00:55:39 -0500 Six dotfiles-scoped sub-tasks moved to the ~/.dotfiles project +Per the 2026-06-16 task audit, the six sub-tasks targeting files now owned by the standalone =~/.dotfiles= repo were handed off to that project (newly bootstrapped as its own AI project) and removed from this epic: "Remove credentials and secrets from dotfiles", "Remove/template personal info from dotfiles", "Remove binary font files from repo", "Move battery out of waybar sysmonitor group", "Resolution-adaptive scratchpad sizing", and "Dynamic waybar/foot config based on screen resolution". Handoff: =~/.dotfiles/inbox/2026-06-16-0053-from-archsetup-dotfiles-release-prep-handoff.org=. This epic now covers archsetup-proper release work only (scripts personal-info, device-specific config, history scrub, shellcheck, SPDX headers, README/LICENSE). The 2026-06-09 reconciliation note below is the prior state. +*** 2026-06-09 Tue @ 19:21:36 -0500 Reconciliation: six sub-tasks now target the ~/.dotfiles repo, not archsetup +Phase 3.2 removed the in-repo =dotfiles/= tree, so six sub-tasks below no longer describe archsetup content — they target files now owned by the =~/.dotfiles= repo (=git.cjennings.net/dotfiles.git=): "Remove credentials and secrets from dotfiles", "Remove/template personal info from dotfiles", "Remove binary font files from repo", "Move battery out of waybar sysmonitor group", "Resolution-adaptive scratchpad sizing", and "Dynamic waybar/foot config based on screen resolution". Their paths are relative to that repo now. Kept here for tracking per Craig (2026-06-09); he'll re-scope the archsetup-vs-dotfiles split shortly. 
archsetup-proper release work (scripts personal-info, device-specific config, shellcheck, and scrubbing the pre-=b10cba5= dotfiles secrets from archsetup's own history) stays this task. +*** 2026-05-11 Mon @ 13:01:29 -0500 AI Response: Open-source-prep source audit +Checked each subtask below against the source / git state. Bottom line: almost nothing is fully done. =LICENSE= and =README.md= were added this session (see those subtasks); the rest still stands. +- *Remove credentials and secrets from dotfiles* — NOT DONE. All five named files still tracked: =dotfiles/common/.config/.tidal-dl.token.json=, =.config/calibre/smtp.py.json=, =.config/transmission/settings.json=, =.msmtprc=, =.mbsyncrc=. =.gitignore= lists none of them; no =.example= templates exist. +- *Remove/template personal info from scripts* — PARTIALLY DONE. Repo URLs ARE config-driven (=archsetup:141-146= use =${dwm_repo:-https://git.cjennings.net/...}=, documented in =archsetup.conf.example=). Still personal: =archsetup:2-3= (email/website header), =init:8,21= (=root:welcome=), =scripts/post-install.sh:17-56= (personal repos). +- *Remove/template personal info from dotfiles* — NOT DONE. =.gitconfig= has =c@cjennings.net=, =name = Craig Jennings=, =github user = cjennings=, =safe.directory= and employer creds; =.config/mpd/musicpd.conf= + =mpd.conf= still use =~cjennings/= / =/home/cjennings/= paths; =.ssh/config= has personal/employer hosts; =.config/yt-dlp/config:2= has =c@cjennings.net=; =hyprland.conf:3= has personal attribution. +- *Scrub git history of secrets* — NOT DONE. 275 commits; history not fresh, no filter-repo evidence. +- *Remove device-specific configuration* — NOT DONE. =archsetup:1486-1493= still creates the Logitech BRIO udev rule unconditionally; no config flag. +- *Add README.md for GitHub* — DONE (this session — initial draft, pending review). See subtask below. +- *Add LICENSE file* — DONE (this session — GPL-3). See subtask below. 
+- *Remove binary font files from repo* — NOT DONE. =dotfiles/common/.local/share/fonts/= still tracks 8 PragmataPro =.ttf= files, =AppleColorEmoji.ttf=, and other commercial fonts (Cartograph, MonoLisa, ComicCode, etc.). +- *Make claude-code installation optional* — NOT DONE. =archsetup:1817-1818= runs =curl -fsSL https://claude.ai/install.sh | sh= unconditionally; no flag. +- *Add input validation for username and paths* — PARTIALLY DONE. =archsetup:326-328= validates =$username= against =^[a-z][a-z0-9_]*$= (plus reserved-names check, marked DONE separately). No validation of =$source_dir= or other path vars. +- *Move battery out of waybar sysmonitor group* — NOT DONE. =dotfiles/hyprland/.config/waybar/config:27-37= still has =battery= inside =group/sysmonitor=. +- *Resolution-adaptive scratchpad sizing* — NOT DONE. No size/move windowrules for scratchpads in =hypr/conf.d=. +- *Dynamic waybar/foot config based on screen resolution* — NOT DONE. No resolution-detection/generation script. +- *Bulk shellcheck cleanup* — PARTIALLY DONE. =shellcheck archsetup= still shows 68 findings: 30×SC2329, 16×SC2174, 15×SC2024, 4×SC2086, 1 each SC2155/SC2129/SC2005. The 4 SC2086 (unquoted) are the ones a reviewer would flag — those are the priority. +- *Document testing process in README* — NOT DONE. =scripts/testing/README.org= exists but isn't the project README. (Now unblocked — root README exists.) +- *Add guard for rm -rf on constructed paths* — DONE 2026-05-20. All three constructed-path deletes routed through a =safe_rm_rf= guard (absolute / no-=..= / inside-allowed-prefix / real-dir checks); unit-tested in =tests/safe-rm-rf/=. +- *Standardize boolean comparison style* — NOT DONE. Mixed: =[ "$var" = "true" ]= at =archsetup:542,544,569= vs bare =if $var;= form ~7 places elsewhere. +- *Replace eval with safer alternatives* — NOT DONE. =archsetup:442= still =if eval "$cmd" >> "$logfile" 2>&1;= in =retry_install=. -Real lever: the column/line count in the quick-capture script. 
Scratchpad reference on ratio (DP-4, 3440x1440) is 55% 65% ~= 1892x936 px ~= 190 cols by 24 lines. Why this isn't a solo auto-fix — it needs a tradeoff decision: -- The script lives in the shared =hyprland/= stow tier, so a fixed ~190 columns overflows velox's 1920-wide laptop, and 24+ lines overflows velox's 1080 height (22 lines ~= 860 px is already near the safe max there). -- Emacs char-cell sizing doesn't adapt to the monitor the way pyprland's percentage does, so "scratchpad-size on both machines" needs one of: a fixed compromise count, a per-host override via the ratio/velox tiers, or a script that computes columns from the active monitor. -Options to weigh: (a) a safe-on-both compromise like width 120-130 / height 24; (b) per-host width through the ratio/velox tiers; (c) dynamic sizing in quick-capture from =hyprctl monitors=. Pick the tradeoff and I'll implement. +*** 2026-06-28 Sun @ 13:34:03 -0400 Cancelled: calendar-feed URL rotation +Craig's call — not rotating. The three private iCal URLs (Google personal, Proton with PassphraseKey, Google DeepSat) sat in git history from =500b1f5= (2026-05-13) until the 2026-05-20 filter-repo scrub, which removed them from local + remote history. The residual exposure is only to anyone who cloned the repo in that 2026-05-13..05-20 window; Craig accepts that window rather than regenerating all three tokens on ratio. The history scrub already happened; the live =calendar-sync.local.el= is owned by the emacs project. Closing without rotation. + +*** 2026-05-20 Wed @ 12:09:32 -0500 Scrubbed the calendar secret from git history +=dotfiles/common/.emacs.d/calendar-sync.local.el= (private Google/Proton/DeepSat ical URLs, added in =500b1f5= for stow distribution) was discovered while folding tmux-util into stow. Sent the file back to the emacs project's inbox, =git rm='d it, then =git filter-repo --invert-paths --path= purged it from all 29 affected commits. 
Force-pushed (=0921e4d...618e6cc=, with lease) and ran =reflog expire= + =gc --prune=now= on the bare repo at =/var/git/archsetup.git=. Verified: the file is in zero commits, the secret tokens return zero matches across all history, and =500b1f5= / =0921e4d= are unreachable on both local and remote. Rotation of the URLs tracked as the sibling TODO above. This also proves =filter-repo= works cleanly here — relevant precedent for the broader [[*Scrub git history of secrets (or start fresh)][history-scrub task]] below (the 5 credential files are still in history). + +*** TODO [#B] Remove/template personal information from scripts +- =archsetup= lines 3-4: personal email and website in header +- =scripts/post-install.sh=: personal git repos and server URLs (the old =scripts/gitrepos.sh= was consolidated into this script in =dae7659=, so its personal =git.cjennings.net= clone targets now live here) +- =init= line 9: hardcoded password =welcome= +**** 2026-06-28 Sun @ 13:29:29 -0400 Reconciled: dotfiles repo URLs already config-driven +Dropped the "lines 141-146 hardcoded =git.cjennings.net= URLs" bullet. archsetup:138-140 reads =DOTFILES_REPO= / =DOTFILES_BRANCH= / =DOTFILES_DIR= overrides (defaults only, documented in =archsetup.conf.example=), so that item is already done. Refreshed the stale line numbers on the remaining bullets (header email/site now lines 3-4, init password now line 9, after the SPDX headers shifted the files). -** TODO [#C] archsetup Waybar Wi-Fi module should show no-internet state :feature: +*** TODO [#B] Scrub git history of secrets (or start fresh) +Even after removing files, secrets remain in git history. +Options: =git filter-repo= to rewrite history, or start a fresh repo for the GitHub remote. +Recommend: fresh repo for GitHub (keep cjennings.net remote with full history). +**** 2026-06-28 Sun @ 13:29:29 -0400 Reconciled: 589 commits, 5 credential files still in history +History is now 589 commits (the 2026-05-11 note's "275" is stale). 
Only the calendar-feed file has been filter-repo'd so far (2026-05-20). The five credential files remain in history at their pre-=b10cba5= paths: =.tidal-dl.token.json= (5 commits), =calibre/smtp.py.json= (6), =transmission/settings.json= (5), =.msmtprc= (8), =.mbsyncrc= (9). None are tracked in the current tree. The scrub-or-fresh-repo decision still stands. + +*** 2026-06-24 Wed @ 19:41:56 -0400 Gated device-specific udev rules behind a flag +The Logitech BRIO udev rule is now wrapped in =if [ "$install_device_udev_rules" = "true" ]=, fed by a new =INSTALL_DEVICE_UDEV_RULES= key (default yes, opt-out — still mainly a personal project). Added the var default, the config read, a =validate_config= check, and an =archsetup.conf.example= entry. Verified: default/yes writes the rule, no skips it, bogus is rejected; =bash -n= clean. + +*** 2026-06-28 Sun @ 13:37:33 -0400 Added README.md — full draft complete, final read filed +=README.md= is substantively done at repo root (10.9 KB), covering project description, features, requirements, installation, the =archsetup.conf= configuration guide, security considerations, contributing, and license, with generic placeholders for the eventual public fork. The 2026-05-11 "first pass" note below is superseded. Craig's final read before public release is filed under "Manual testing and validation"; closing as code-complete pending that human check, per the audit rule. + +**** 2026-05-11 Mon @ 13:01:29 -0500 AI Response: Initial README draft +Drafted =README.md= at repo root, modeled on =~/code/chime/README.org=. First pass — review and run a voice/style pass before committing. Personal info (emails, =cjennings.net= URLs, personal repo names) intentionally replaced with placeholders for the eventual public release. + +*** 2026-05-19 Tue @ 01:54:29 -0500 Added GPL-3 LICENSE file at repo root +GPL-3 chosen. Canonical GPLv3 text landed at =LICENSE= on 2026-05-11 (commit =f80e664=). README already links to it. 
SPDX/license headers across source files (or a NOTICE file) split out as a new sub-task below for the eventual public release. + +*** 2026-06-24 Wed @ 19:41:56 -0400 Added SPDX headers to all shell scripts +Swept =# SPDX-License-Identifier: GPL-3.0-or-later= in right after the shebang of all 24 shell scripts in the repo (=archsetup=, =init=, =scripts/**/*.sh= incl. =scripts/testing/=). The dotfiles are a separate repo now, so they aren't swept here. Verified the header sits at line 2 (after the shebang) and syntax still passes. + +*** 2026-06-09 Tue @ 19:21:36 -0500 Made claude-code install optional +Shipped in =f2dad22= (feat: make the claude-code install optional). The =curl | sh= from claude.ai now sits behind a config flag instead of running unconditionally. + +*** 2026-06-09 Tue @ 19:21:36 -0500 Input validation added (validate_config + validate_username) +validate_config + validate_username shipped (detail in the 2026-05-11 note below). The =$source_dir= path check was judged unnecessary — it derives from the now-always-validated =$username= (=/home/$username/.local/src=). Closed as done. + +**** 2026-05-11 Mon @ 18:20:49 -0500 AI Response: validate_config + validate_username added +Added two pre-flight validators to =archsetup= (right after =load_config=, before any install step): +- =validate_username()= — the lowercase / starts-with-letter / =[a-z0-9_]= / not-reserved check, extracted from the inline block in =preflight_checks()=. Fixes an existing gap: the inline check only ran on the *prompted* path, so a config with =USERNAME=root= (or =USERNAME=foo bar=) slipped through unvalidated. Now both =preflight_checks= and =validate_config= call it. 
+- =validate_config()= — runs whenever =--config-file= is used: rejects unknown =DESKTOP_ENV= (must be dwm/hyprland/none) early instead of dying in step 7-9; rejects =AUTOLOGIN=/=NO_GPU_DRIVERS= values that aren't =yes=/=no= (currently silently ignored); basic shape check on =LOCALE=; and a scheme + no-whitespace/no-leading-dash check on the six =*_REPO= URLs that get passed to =git clone= (rejects e.g. =--upload-pack=…= injection). Plain =echo …>&2; exit 1= (the logging helpers aren't defined that early). =$source_dir= needs no separate check — it's =/home/$username/.local/src=, derived from the now-always-validated =$username=. +Not a security boundary (=load_config= sources the config as bash; a hostile config can already run anything) — it's typo-catching. Verified with =bash -n= and a smoke-test matrix of good/bad inputs through both functions. The next =make test= run confirms valid configs still install. Leaving as DOING for review. + +*** 2026-05-20 Wed @ 06:50:25 -0500 Swept shellcheck across the shell scripts +Census across the 16 shell scripts (=archsetup=, =init=, =scripts/*.sh=, =scripts/testing/=): 124 findings, zero errors. Triaged against "what matters for public review" and confirmed the 2026-01-24 read — most are intentional or documented-acceptable: +- SC2024 (14, sudo redirects), SC2174 (16, =mkdir -p -m=), SC1091 (13, unfollowable sources), SC2329 (32, functions invoked indirectly via the =STEPS= dispatch array), SC2153 (1, =DISK_PATH= sourced from =vm-utils.sh=) — all false positives or accepted. +- SC2086 on =$SSH_OPTS= in =vm-utils.sh= (×4) and =$TEMP_DISKS= in =cleanup-tests.sh= — intentional word-splitting; quoting would break them. The SSH_OPTS-as-array refactor is the proper fix, deliberately deferred (codebase-wide, one atomic change). +- SC2086 integer tests in =[ ]= (=archsetup=, =cleanup-tests=) — safe, note-level style; left to avoid churn in the just-fixed =retry_install=. 
+- SC2015 (×2, =vm_exec && success || warn=) — =success=/=warn= return 0, so C won't spuriously fire. Idiomatic. + +Fixed the four that are genuine: =init= (a =#!/bin/sh= script) used =$(</etc/hostname)= (SC3034 bashism → =$(cat ...)=) and an unquoted =$interface_up= (SC2086 → quoted); =shellcheck init= now clean, =sh -n= passes. Suppressed the two =VM_IP= SC2034 warnings with documented =# shellcheck disable= directives (consumed by the sourced =validation.sh=, which shellcheck can't follow). 124 → 120; the remaining 120 are the triaged-acceptable set above. + +*** 2026-05-20 Wed @ 06:32:17 -0500 Documented the testing process in the README +The README only covered the VM integration harness; the unit-test layer under =tests/= (Python =unittest=, fake-binary-on-PATH, one dir per script — =layout-navigate=, =tmux-util=) was undocumented. Added a =make test-unit= target that runs every =tests/*/test_*.py= suite explicitly (=unittest discover= can't find them — hyphenated dir names aren't valid package paths), then rewrote the README Testing section into "Unit tests" and "Integration tests (VM harness)" subsections, including how to add a suite for a new script. Updated Contributing to point at =make test-unit= for script changes. 61 unit tests pass via the new target. +*** 2026-05-20 Wed @ 18:22:42 -0400 Added safe_rm_rf guard on constructed-path deletes +Added a self-contained =safe_rm_rf <path> <allowed_prefix>= helper to =archsetup= and routed all three constructed-path deletes through it. The guard refuses to run unless the target is absolute, free of =..=, deeper than a bare top-level dir, strictly inside the allowed prefix (not the prefix itself), and a real directory (not a symlink); otherwise it prints the reason and returns non-zero without deleting. On the happy path it delegates to =rm -rf=. + +Sites converted (the line numbers in the original task body were stale — actual sites located by grep): +- =--fresh= state-dir wipe — prefix =/var/lib/archsetup=. 
+- =git_install= clone-retry cleanup (=build_dir= under =$source_dir=). +- =aur_installer= yay clone-retry cleanup (same prefix). + +The helper is defined before the top-level =--fresh= handler (which runs at load time, before the logging helpers exist), so it carries no =error_warn= dependency and reports refusals to stderr itself. The two in-function sites keep their existing =|| error_warn= / =|| error_fatal= handling. + +Tests: =tests/safe-rm-rf/test_safe_rm_rf.py= sources the real function out of the script and exercises Normal/Boundary/Error cases (13 tests) against real temp dirs. =make test-unit= green (61 tests), =bash -n= clean, no new shellcheck warnings. +*** 2026-06-24 Wed @ 19:41:56 -0400 Standardized boolean comparisons on the explicit form +Rewrote the bare =if $var= boolean conditionals (=show_status_only=, =fresh_install=, =skip_gpu_drivers=, =detected_intel/amd/nvidia=, plus two =! $var= negation chains) to the explicit =[ "$var" = "true" ]= / =!= "true"= form, and quoted the one unquoted =install_claude_code = true=. Left =if $step_func= alone — that's the STEPS function-dispatch, not a boolean. Verified: only =step_func= remains bare, all comparisons are quoted, =bash -n= clean. + +*** 2026-05-26 Tue @ 15:27:09 -0500 eval task moot — the line-434 eval is gone, the survivor is deliberate +Verified: the only =eval= left in =archsetup= is line 578 in =retry_install=, and it's intentional and documented — it captures =$?= directly from =eval "$cmd"= to dodge the if-compound-swallows-exit-code trap. Replacing it with an array would reintroduce that bug. The line-434 eval this task pointed at no longer exists. Nothing to change. + +** CANCELLED [#B] Audit dotfiles/common directory +CLOSED: [2026-06-28 Sun] +Refiled to the standalone =~/.dotfiles= repo, which owns this content since the 2026-06-16 split. Handoff sent 2026-06-28: =~/.dotfiles/inbox/2026-06-28-1335-from-archsetup-refiled-from-archsetup-task-audit-2026.org=. 
The three sub-tasks (review ~/.local/bin scripts, remove orphaned configs, verify stowed files are used) travel with it. Cancelled here, not abandoned. + +** CANCELLED [#C] Zoom launches in a tiny window :bug:hyprland: +CLOSED: [2026-06-28 Sun 13:56] :PROPERTIES: -:LAST_REVIEWED: 2026-06-13 +:LAST_REVIEWED: 2026-06-24 :END: -From the roam inbox: the Waybar Wi-Fi module should distinguish "connected to an access point" from "connected and has internet." Add a no-internet state or indicator to the archsetup Waybar configuration. Not marked quick/solo because it needs the archsetup environment and live network-state verification. +From the roam inbox: Zoom opens at a tiny size. Needs diagnosis (HiDPI scaling vs a window rule vs XWayland) and live verification with Zoom actually running — held for a Craig-driven debug pass, not a blind fix. + +** DONE [#B] btrfs base VM unbuildable — archangel ISO bakes zfs-auto-snapshot :bug:test: +CLOSED: [2026-06-28 Sun] +Resolved: archangel shipped a fixed ISO (2026-06-27) that conditions the baked AUR list on the filesystem, so a btrfs install no longer drags in =zfs-auto-snapshot=. The btrfs base rebuilt and went green in the 2026-06-28 VM run (97/0, zero attributed issues). The EFI removable-fallback hardening is archangel-side and optional. +=make test-vm-base= (btrfs) fails in archangel's installer: the ISO bakes a fixed +AUR list ("downgrade yay informant zrepl pacman-cleanup-hook zfs-auto-snapshot +topgrade ventoy-bin") into every install regardless of =FILESYSTEM=. On a btrfs +install =zfs= isn't present, so =zfs-auto-snapshot='s =zfs= dependency can't +resolve and the unattended pacstrap aborts ("unable to satisfy dependency 'zfs' +required by zfs-auto-snapshot"). This is an archangel ISO bug (the baked list isn't +controllable from =archsetup-test.conf=), so it blocks btrfs-profile VM testing +until archangel ships an ISO that conditions the AUR list on the filesystem (or +drops zfs tooling from non-zfs installs). 
The 2026-06-27 btrfs base regen attempt +also wiped the prior (unbootable) btrfs base, so there's no btrfs base image until +this is fixed. zfs-profile testing works (=make test FS_PROFILE=zfs=). + +Companion hardening (defense-in-depth, archangel-side): install the bootloader +with a removable =\EFI\BOOT\BOOTX64.EFI= fallback so a base boots even from +fresh/empty NVRAM, and real installs survive firmware that drops boot entries. * Archsetup Resolved @@ -1244,3 +1404,249 @@ CLOSED: [2026-06-14 Sun] Make package diff a runnable script instead of manual process Resolved 2026-06-14: the runnable script already existed — =scripts/package-inventory= (built 2026-02-06) extracts archsetup's declared packages and diffs them against the live system (=--summary= / =--archsetup-only= / =--system-only= / full report). This pass added the missing coverage: 7 characterization tests in =tests/package-inventory/= pinning the extraction and both diff directions behind injectable =PKGINV_ARCHSETUP= / =PKGINV_PACMAN= seams, plus a =make package-diff= target for discoverability. Full unit suite green (26 tests, 3 suites). +** DONE [#B] Idle-inhibitor keybind + synced waybar indicator :hyprland:waybar: +CLOSED: [2026-06-23 Tue] +Shipped 2026-06-23 as dotfiles commit =a004201=. Super+I toggles the hypridle daemon (kill = inhibit, relaunch = restore). The built-in waybar =idle_inhibitor= module was replaced with a =custom/idle= module backed by a =waybar-idle= script, so the keybind, the bar click, and the icon share one source of truth (whether hypridle is running) and stay in sync. Icons inhibited / active, with a 5s poll safety net. Freed =Super+I= by pruning the unused ai-term pyprland scratchpad from both host configs. TDD'd (=waybar-idle= + =hypridle-toggle= suites); dupre/hudson theme CSS updated. From a home-project handoff 2026-06-23; Craig confirmed it works live. 
+** DONE [#B] Verify package signature verification not bypassed by --noconfirm +CLOSED: [2026-06-23 Tue] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-21 +:END: +Audited 2026-06-23. =--noconfirm= does not bypass signature verification — it only auto-answers interactive prompts. Signature checking is governed by =SigLevel= in =/etc/pacman.conf=, which archsetup leaves at the Arch default (=Required DatabaseOptional=): its only pacman.conf edits are ParallelDownloads, Color, and enabling multilib (=archsetup:913,917=), none of which touch =SigLevel=. So every repo package stays signature-verified regardless of =--noconfirm=. + +One real integrity bypass exists, and it is not =--noconfirm=: =archsetup:2403= runs =yay -S --noconfirm --mflags --skipinteg python-lyricsgenius=, where =--skipinteg= skips makepkg's checksum and PGP-signature checks for that one AUR package (a documented workaround for an expired-signature issue upstream). It's scoped to a single package, not global. Tracked for periodic re-check below. +** DONE [#C] Harden sshd in the installer (explicit prohibit-password) :solo: +CLOSED: [2026-06-24 Wed] +Done 2026-06-24: the openssh block (=archsetup:1271-1277=) now writes =/etc/ssh/sshd_config.d/10-hardening.conf= with =PermitRootLogin prohibit-password= and reloads sshd, right after starting the service. =PasswordAuthentication= left untouched so ssh-copy-id to the user still works. Makes the posture intentional rather than dependent on the upstream default. Velox and ratio (which carried an explicit =PermitRootLogin yes= at =sshd_config:33= from earlier provisioning) were already fixed by hand 2026-06-23. Verified =bash -n= + =shellcheck -S error= clean; full drop-in-on-fresh-install confirmation is VM-deferred (the unit harness covers helpers, not inline install steps). 
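For reference, the sshd drop-in step above can be sketched roughly as follows. This is a minimal sketch, not the installer's code: the helper name =write_sshd_hardening= and its directory argument are assumptions added so the step can be run against a scratch path, and the real openssh block in =archsetup= is inline and may differ.

```shell
# Hypothetical helper; archsetup's real openssh block is inline and may differ.
# Usage: write_sshd_hardening [dropin_dir]
write_sshd_hardening() {
    local dropin_dir="${1:-/etc/ssh/sshd_config.d}"
    local dropin="$dropin_dir/10-hardening.conf"

    mkdir -p "$dropin_dir" || return 1
    # Root may log in with a key only. PasswordAuthentication is left alone
    # so ssh-copy-id to the regular user keeps working.
    printf '%s\n' 'PermitRootLogin prohibit-password' > "$dropin" || return 1
    chmod 644 "$dropin"
}

# On a real host, follow the write with a syntax check and a reload:
#   sshd -t && systemctl reload sshd
```

Writing a drop-in rather than editing =sshd_config= keeps the change easy to find and easy to revert: deleting the one file restores the upstream default.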
+** DONE [#C] Build security dashboard command :solo: +CLOSED: [2026-06-23 Tue] +:PROPERTIES: +:LAST_REVIEWED: 2026-05-21 +:END: +Shipped 2026-06-23 as dotfiles commit =1b9b205=: =security-status= (=common/.local/bin=, on PATH). Read-only dashboard showing disk encryption (LUKS *and* ZFS native — the fleet runs ZFS, so a LUKS-only check would have falsely reported "no encryption"), ufw state, externally-reachable ports (counts all listening, lists only the non-loopback exposures), and running/failed service counts. Command lookups are env-overridable; parsing covered by unit tests against canned output. New file, so ratio needs =git pull && make stow hyprland= to link it. +** DONE [#C] paru vs yay — evaluated, staying with yay +CLOSED: [2026-06-10 Wed] +Research done 2026-06-10: [[file:docs/2026-06-10-paru-vs-yay-evaluation.org][docs/2026-06-10-paru-vs-yay-evaluation.org]]. The maintenance picture inverted since the task was filed: yay released v12.6.0 on 2026-06-07 with active triage, while paru has had no release in 11 months, no commit in 5, and a stable that fails to build against current libalpm (issue #1468 open 6 months). For an installer that bootstraps the AUR helper unattended, paru is the riskier choice on every axis that matters. No decision needed — the evidence closes this one; revisit only if paru's maintenance resumes. +** DONE [#C] Teach archsetup to stow the host tier :solo: +CLOSED: [2026-06-23 Tue] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-11 +:END: +Already implemented in =user_customizations()= (=archsetup:1049-1058=): after stowing =common= + the DE package, it derives =host_tier="$(cat /etc/hostname 2>/dev/null || uname -n)"= and stows that package when =$dotfiles_dir/$host_tier= exists, else prints "no host tier for '<host>' — skipping". The =/etc/hostname=-first detection is the right call for install time (=uname -n= still reports the ISO's name until reboot), and it's the same skip-if-absent semantics as the dotfiles Makefile. 
Verified by reading the installer 2026-06-23; no code change needed. +** DONE [#C] Waybar indicators unevenly spaced :quick:solo:waybar: +CLOSED: [2026-06-24 Wed] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-24 +:END: +The right-side module icons don't sit at even intervals — spacing reads as inconsistent across the group. Noticed 2026-05-21 after adding the airplane indicator. + +Done 2026-06-24: a screenshot showed the standalone module icons were already even — the unevenness was the tray, whose icons clustered tight (tray =spacing: 4= vs the ~0.3rem margins on every other module). Bumped tray =spacing= 4 → 10 in the waybar =config=; restarting waybar and re-screenshotting confirmed the row reads even. The lever was the tray spacing, not the per-module CSS the original body guessed at. +** DONE [#B] Separate mpd playlist_directory from music_directory :mpd:music:quick: +CLOSED: [2026-06-24 Wed] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-24 +:END: +Done 2026-06-24 (dotfiles a9bfdf3): set =playlist_directory= to =~/.local/share/mpd/playlists= (separate from =music_directory= ~/music). git-moved the 73 radio-stream playlists from =common/music/= into =common/.local/share/mpd/playlists/= (history preserved); dropped the empty =60s Sounds.m3u= (Craig's call); git rm'd the stray =Black Flamingos - Space Bar.m4a= and moved the real track into the music library. Curated playlists left flat in ~/music (Craig's call — avoids rewriting the 7 relative-path ones). The ~/music/radio orphan was already gone. Relinked surgically (a pre-existing =whereami= stow conflict blocked a full =stow common=). mpd restarted clean: 73 radio playlists load from playlist_directory (verified SomaFM stream URLs), 24 curated browsable from the music tree. ratio needs the same restow + mpd restart on its next pull (reminder filed). Decisions answered: 60s dropped, curated flat. +Spec written and approved (option 1), pinned before execution on 2026-06-03. 
Root issue: mpd.conf has =playlist_directory= == =music_directory= == ~/music, so the whole audio library is the playlist store and radio streams mix with curated playlists. Option 1: radio stream playlists (portable, 73 in the dotfiles repo) move to a dedicated =playlist_directory= (=~/.local/share/mpd/playlists=) via stow; the 22 curated local playlists (machine-specific track refs) live in the music tree. Also removes the broken ~/music/radio/ orphan (73 dead symlinks). + +Full step-by-step spec (mpd.conf edit, repo restructure of =common/music/= → =common/.local/share/mpd/playlists/=, curated relocation, restow, verification incl. the 7 relative-path curated playlists, ratio propagation) is in the 2026-06-03 session record under .ai/sessions/. Two open decisions before executing: (1) drop the empty =60s Sounds.m3u= or refill with the SomaFM 60s URL; (2) curated playlists into =~/music/playlists/= subdir vs leave flat in ~/music/. Side cleanup surfaced: a stray audio file =Black Flamingos - Space Bar.m4a= is wrongly committed in the dotfiles repo's =common/music/= — git rm it and move to the synced library. +** DONE [#C] Install adopted modern CLI tools :tooling:solo: +CLOSED: [2026-06-24 Wed] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-24 +:END: +Done 2026-06-24: added bat/dust/hyperfine/doggo to archsetup General Utilities (tealdeer was already declared), installed all five on velox, set =BAT_THEME=ansi= in =common/.profile.d/tools.sh= (tracks the dupre terminal palette), seeded the tldr cache. ratio still needs the =pacman -S= (additive; lands on its next archsetup run). +Decision (Craig, 2026-06-24): adopt all five recommended tools — =bat=, =dust=, =hyperfine=, =tealdeer=, =doggo= (all in extra). Add them to archsetup's package list and install on both machines. Optional candidates (=xh=/=jless=/=sd=/=ouch=) declined for now. Full evaluation: [[file:docs/2026-06-10-modern-cli-tools-evaluation.org][docs/2026-06-10-modern-cli-tools-evaluation.org]]. 
+ +- Add the five to the appropriate pacman package section in =archsetup=. +- =pacman -S bat dust hyperfine tealdeer doggo= on velox + ratio. +- =bat=: set =BAT_THEME= to match the dupre palette once installed. +- =tealdeer=: run =tldr --update= to seed the cache after install. +** DONE [#C] Review file manager options for Wayland +CLOSED: [2026-06-24 Wed] +Decision (Craig, 2026-06-24): keep nautilus only; skip yazi. File management lives in Emacs dired plus the Super+F dirvish popup, so a TUI file manager has no daily user here. ranger was already ruled out (frozen upstream). Full evaluation: [[file:docs/2026-06-10-file-manager-evaluation.org][docs/2026-06-10-file-manager-evaluation.org]]. Follow-on surfaced: nautilus needs dark theming (filed as its own task). +** DONE [#B] Theme nautilus to a dark theme :bug:solo: +CLOSED: [2026-06-24 Wed] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-24 +:END: +nautilus rendered blindingly white (Craig, 2026-06-24). As a GTK4/libadwaita app it follows the appearance portal's =org.freedesktop.appearance color-scheme=, which mirrors =org.gnome.desktop.interface color-scheme=. Two stacked causes: + +1. velox had no system-wide dconf db at all — no =/etc/dconf/profile/user=, no =/etc/dconf/db/site.d/00-archsetup-defaults=, no compiled =site= db — so archsetup's declared default (=color-scheme='prefer-dark'=, =archsetup:1109-1119=) never reached the machine (velox predates that block). Created the profile + site defaults as archsetup writes them and ran =dconf update=. =gsettings get= then returned =prefer-dark=. + +2. That alone did NOT fix the running session: a system-db default emits no GSettings change signal, so the appearance portal kept reporting =0= (no-preference → light), and libadwaita reads the portal, not =GTK_THEME=. (An early screenshot looked dark only because the shell env carries =GTK_THEME=Adwaita:dark=, which Hyprland-launched apps don't inherit — masking the real state.) 
Fix: a user-level =gsettings set org.gnome.desktop.interface color-scheme prefer-dark=, which signals the portal live. It now reports =1=, and a portal-driven nautilus (GTK_THEME unset) renders dark — screenshot-verified. + +Durable: the user value persists in =~/.config/dconf/user=; archsetup's system-db handles fresh installs (the portal reads the default fresh at login, so no signal is needed there). No archsetup change. ratio may need the same one-two — see the Active Reminder. +** CANCELLED [#D] Test wlogout menu on laptop +CLOSED: [2026-06-24 Wed] +Merged into the "Wlogout exit-menu buttons are rectangular, not square" task ([#C]) — same effort (per-host wlogout button sizing across velox/ratio). The fixed-pixel-margins hint was folded into that task's body. +** DONE [#B] Enlarge org-capture popup to scratchpad size :hyprland: +CLOSED: [2026-06-24 Wed] +From a .emacs.d inbox handoff (2026-06-15, captured via roam): the quick-capture / org-protocol popup is too small to be effective — it should be about the size of a terminal scratchpad. + +*** 2026-06-24 Wed @ 17:21:11 -0400 Sized the popup to the scratchpad, per-host in pixels +The 06-15 read was wrong: the real size lever is the Hyprland window rule, not the quick-capture char-cell count. The =size 900 500= rule on the org-capture window pinned it to 900x500 regardless of the frame's requested geometry (demoing 120x24 vs 180x32 looked identical because both clamped to 900x500). Tried a percentage rule (=size 75% 70%=) to auto-adapt per host like the pyprland scratchpad — native window rules do NOT honor percentages (only pyprland does), so the frame fell back to char-cell geometry and overflowed the screen. Fix: absolute pixels matching each host's terminal scratchpad, placed in the host tier (=<host>/conf.d/local.conf=) since pixels don't adapt across monitors. velox = 1078x671 (75%x70% of its 1437x958 logical desktop) — verified on-screen. 
ratio = 1892x936 (55%x65% of 3440x1440) — set but not yet eyeballed on ratio (tracked as an Active Reminder in notes.org). The shared hyprland.conf keeps float/center/stay_focused and a comment pointing at the per-host size. dotfiles change — needs commit in =~/.dotfiles=. + +*** 2026-06-15 Mon @ 19:19:55 -0500 AI Response: popup size is the frame's char-cell count, not the Hyprland rule +Triaged under auto inbox-zero. The popup is the emacsclient frame named "org-capture", created by =~/.dotfiles/hyprland/.local/bin/quick-capture= with =(width . 90) (height . 22)= — 90 columns by 22 lines. Emacs sizes by character cells and overrides the Hyprland rule =windowrule = match:title ^(org-capture)$, size 900 500= (hyprland.conf:182). The live frame measured ~889x860 px; the width tracks the 90-column count, not the window rule. Setting the Hyprland rule to =size 55% 65%= (the scratchpad's pyprland spec) did not change the frame width, so I reverted it — dotfiles left clean. + +Real lever: the column/line count in the quick-capture script. Scratchpad reference on ratio (DP-4, 3440x1440) is 55% 65% ~= 1892x936 px ~= 190 cols by 24 lines. Why this isn't a solo auto-fix — it needs a tradeoff decision: +- The script lives in the shared =hyprland/= stow tier, so a fixed ~190 columns overflows velox's 1920-wide laptop, and 24+ lines overflows velox's 1080 height (22 lines ~= 860 px is already near the safe max there). +- Emacs char-cell sizing doesn't adapt to the monitor the way pyprland's percentage does, so "scratchpad-size on both machines" needs one of: a fixed compromise count, a per-host override via the ratio/velox tiers, or a script that computes columns from the active monitor. +Options to weigh: (a) a safe-on-both compromise like width 120-130 / height 24; (b) per-host width through the ratio/velox tiers; (c) dynamic sizing in quick-capture from =hyprctl monitors=. Pick the tradeoff and I'll implement. 
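The per-host pixel sizes in the 2026-06-24 note are plain percentage arithmetic, and can be reproduced with a short sketch. The function names =pct_of= and =scratchpad_size= are hypothetical, and round-to-nearest is an assumption that happens to match both figures in the note.

```shell
# pct_of <pixels> <percent>: integer percentage, rounded to the nearest pixel.
pct_of() {
    echo $(( ($1 * $2 + 50) / 100 ))
}

# scratchpad_size <width> <height> <width_pct> <height_pct>
# Prints "W H", the two values a fixed-pixel size rule needs.
scratchpad_size() {
    echo "$(pct_of "$1" "$3") $(pct_of "$2" "$4")"
}

scratchpad_size 1437 958 75 70    # velox logical desktop -> 1078 671
scratchpad_size 3440 1440 55 65   # ratio, DP-4           -> 1892 936
```

A script like this could feed a generated per-host =local.conf=, though the note's fix was to write the numbers by hand in each host tier.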
+** DONE [#C] Highlight current month and year in the calendar hover :feature:waybar:quick:solo: +CLOSED: [2026-06-24 Wed] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-24 +:END: +From the roam inbox (2026-06-24): the waybar clock's calendar tooltip highlights today's date in goldenrod; the current month and year header should be goldenrod too. + +Done 2026-06-24: the date module is the custom =waybar-date= script (not the built-in clock), so the highlight lives in its tooltip markup. Added a sed wrapping line 1 of the current-month =cal= output (the centered "Month Year") in the same =#daa520= goldenrod the day highlight uses. Verified the tooltip JSON carries =<span color='#daa520'><b>June 2026</b></span>= with today's highlight intact and waybar live; the on-hover look is Craig's spot-check. +** DONE [#C] Wallpaper-set from dirvish doesn't work on Wayland :hyprland: +CLOSED: [2026-06-24 Wed] +From the roam inbox (2026-06-24, claimed for archsetup by Craig): typing =bg= in the dirvish popup doesn't change the wallpaper — Craig's read is it may still be wired to feh/X11 instead of a Wayland utility. + +Findings (2026-06-24): the Wayland wallpaper utility on this setup is =awww= (waypaper's configured =backend = awww=; =set-theme= sets the default via =awww img <file>=). There was no shared wallpaper script (=bg= on PATH is just the shell builtin), and the dirvish =bg= command lives in the Emacs config, so it was calling the wrong (or no Wayland) setter. + +Done 2026-06-24 (dotfiles 8be2484): added =set-wallpaper <image>= to the hyprland tier — sets live via =awww img= and persists the choice into =waypaper/config.ini=, the single Wayland-correct entry point. Resolves relative paths, validates the file, exits non-zero without persisting if awww fails. 8 Normal/Boundary/Error tests green; live-verified (awww set it, config rewrote). 
Notified =.emacs.d= to point the dirvish =bg= command at =set-wallpaper <file>= — that wiring is its piece (dependency cleared, =:blocker:= dropped). + +Follow-up (separate, small): the login restore =exec-once= in =hyprland.conf= is hardcoded to =trondheim-norway.jpg=, so a wallpaper set via =set-wallpaper= shows live but won't survive a relogin until the exec-once becomes =waypaper --restore= (which reads the now-persisted config). Filed below. +** DONE [#B] Add backup before system file modifications :solo: +CLOSED: [2026-06-25 Thu] +:PROPERTIES: +:LAST_REVIEWED: 2026-06-24 +:END: +Safety net for /etc/X11/xorg.conf.d and other system file edits +Files like ~/etc/sudoers~, ~/etc/pacman.conf~, ~/etc/default/grub~ modified without backup +If modifications fail or are incorrect, difficult to recover - should backup files to ~.backup~ before modifying + +Done 2026-06-25: added a =backup_system_file <path>= helper next to =safe_rm_rf= — it snapshots a pre-existing file to =<path>.archsetup.bak= before an in-place edit, idempotent (never clobbers an existing backup, so the pristine original survives repeated edits and re-runs), =cp -p= to preserve mode/ownership, no-op when the file is absent. Took the narrow scope (Craig's call): route only the in-place =sed -i= / append edits to *pre-existing* files through it — locale.gen, makepkg.conf, pacman.conf, sudoers, conf.d/wireless-regdom, geoclue.conf, conf.d/pacman-contrib, fstab, mkinitcpio.conf, vconsole.conf — and skip the brand-new drop-in files archsetup fully owns (nothing to back up; recovery is just deleting them). Tests: =tests/backup-system-file/= (7 Normal/Boundary/Error, incl. mode-preserved, existing-backup-not-overwritten, missing-target no-op, cp-failure). =make test-unit= green across all 5 suites; =bash -n= clean; only shellcheck note is the known SC2329 false positive (indirect STEPS dispatch). Integration verification is the next VM run. 
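The behaviour described for =backup_system_file= (snapshot once, never clobber, no-op on a missing file) can be sketched as below. This is an illustration written from the description above, not the helper as committed; the real function in =archsetup= may differ in details such as logging.

```shell
# Hypothetical sketch of backup_system_file <path>; the real helper may differ.
backup_system_file() {
    local target="$1"
    local backup="${target}.archsetup.bak"

    # Missing target: nothing to snapshot, so succeed quietly.
    if [ ! -e "$target" ]; then
        return 0
    fi
    # Existing backup: leave it alone so the pristine original survives re-runs.
    if [ -e "$backup" ]; then
        return 0
    fi
    # -p preserves mode, ownership, and timestamps; a cp failure propagates.
    cp -p -- "$target" "$backup"
}

# Illustrative call site (the sed expression is an assumption):
#   backup_system_file /etc/pacman.conf && sed -i 's/^#Color/Color/' /etc/pacman.conf
```

Checking for an existing backup before copying is what makes the helper idempotent: a second installer run sees the =.archsetup.bak= file and leaves it untouched, so the backup always holds the pre-archsetup content.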
+** DONE [#B] Migrate bare-metal test runner to Testinfra, then delete the shell sweep :test:
+CLOSED: [2026-06-25 Thu]
+Plan + ZFS-coverage expansion: [[file:docs/design/2026-06-25-zfs-vm-test-coverage.org]] (build a ZFS base VM via archangel + a =FS_PROFILE= selector so =make test= covers the ZFS path, then migrate this runner to key auth + Testinfra against it, then delete the dead =validation.sh= functions = phase E here).
+=run-test.sh= (VM) now uses the Testinfra/pytest sweep as its authoritative validator, but =run-test-baremetal.sh= (lines ~243-244) still calls the old =run_all_validations= / =validate_all_services= from =scripts/testing/lib/validation.sh=. Migrate the bare-metal runner to =run_testinfra_validation= too (same key + ssh-config approach, adapted for a real host), then delete the now-dead shell-sweep functions from =validation.sh=. Keep the live helpers: =ssh_cmd=, =attribute_issue=, =capture_pre/post_install_state=, =analyze_log_diff=, =categorize_errors=, =generate_issue_report=, and the =VALIDATION_*= counters/arrays. Deferred from the Testinfra cutover because it needs a bare-metal test loop to validate, out of scope for the VM-only autonomous run.
+*** 2026-06-25 Thu @ 12:37:02 -0400 P-A/P-B shipped (FS_PROFILE selector); P-C blocked on archangel ZFS-install bug
+P-A + P-B landed in =353b179=: =archsetup-test-zfs.conf= (archangel ZFS config) + an =FS_PROFILE= (btrfs default / zfs) selector across =vm-utils.sh= (=init_vm_paths= derives a per-profile image + validates the profile), =create-base-vm.sh= (selects the archangel config), =run-test.sh= (--help + profile display), and the Makefile (=make test FS_PROFILE=zfs=). Design simplification recorded: no =archsetup-vm-zfs.conf= needed — archsetup auto-detects ZFS from the live root via =is_zfs_root()=, so the archsetup run config is shared; only the archangel base config + base image differ. Open Q1 resolved: archangel supports ZFS root natively (it's the default FS).
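A minimal sketch of the kind of per-profile derivation and validation the =FS_PROFILE= selector implies. This is not the real =init_vm_paths=: the function name, the empty suffix for btrfs, and the btrfs image name are assumptions for illustration. Only the =btrfs= / =zfs= profile names, the btrfs default, and the =archsetup-base-zfs.qcow2= image name appear in these notes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real init_vm_paths in vm-utils.sh may differ.
derive_base_image() {
    local profile="${1:-btrfs}"     # btrfs is the default profile
    local images_dir="vm-images"
    local img_suffix

    # Validate the profile and map it to an image-name suffix.
    case "$profile" in
        btrfs) img_suffix="" ;;     # assumption: default profile carries no suffix
        zfs)   img_suffix="-zfs" ;;
        *)
            echo "unknown FS_PROFILE: $profile (expected btrfs or zfs)" >&2
            return 1
            ;;
    esac

    # Print the per-profile base image path.
    echo "${images_dir}/archsetup-base${img_suffix}.qcow2"
}
```

With this shape, =derive_base_image zfs= prints =vm-images/archsetup-base-zfs.qcow2=, and an unrecognized profile fails early instead of silently falling back to the default image.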
+
+P-C (build the ZFS base image) is BLOCKED on archangel. =create-base-vm.sh FS_PROFILE=zfs= built the disk + booted the archangel ISO fine, but the archangel install died: =dkms install zfs/2.3.3 -k 6.18.36-1-lts= exited 1, ZFS module not built. Root cause is in archangel, not archsetup: it appends the [archzfs] experimental repo then runs =pacstrap -K= with no =pacman -Sy= refresh, so it uses the archzfs sync db baked into the Feb-2026 ISO (zfs-dkms 2.3.3) while linux-lts is pulled fresh (6.18.36). 2.3.3 doesn't build against 6.18. velox runs zfs-dkms 2.4.2 on the same kernel from the same channel, so the fix exists upstream — archangel just needs to refresh the db before pacstrap (+ a fresh ISO). Bug + dependency handoff sent to archangel inbox (=2026-06-25-1236-from-archsetup-bug-zfs-install-fails-stale-baked.org=). Retry P-C once a fixed archangel ISO is available. P-D (bare-metal migration code) is still workable in the meantime against the btrfs VM / velox.
+
+*** 2026-06-25 Thu @ 16:05:07 -0400 archangel unblocked; ZFS base built; 3 archsetup bugs fixed (local); re-run paused
+archangel shipped the fix (archangel =89691a0=: =pacman -Syy= before pacstrap) + rebuilt the ISO. With it, =create-base-vm.sh FS_PROFILE=zfs= built a verified ZFS-root base (=archsetup-base-zfs.qcow2=, clean-install snapshot, kernel 6.18.36). =make test FS_PROFILE=zfs= then surfaced three real archsetup bugs against the current archangel base, each fixed in a LOCAL (unpushed) commit:
+- =8ed42b9= informant: the base ships informant; its pacman PreTransaction hook (AbortOnFail) blocked archsetup's first transaction. Fix: =informant read --all= up front (guarded). PROVEN.
+- =66caeb5= pacman.conf perms: the base ships =/etc/pacman.conf= 0600 (archangel =strip_repo_stanza= mktemp+mv clobbers perms), breaking user =makepkg=/=yay=. Fix: =chmod 644= after archsetup's edits. PROVEN (run reached 75 min deep).
+- =05ec096= reflector: archsetup configured reflector's timer but never ran it, so installs used the base's 425-mirror worldwide list and pacman stalled ~15 min on a slow/unresponsive mirror. Fix: run reflector once before the heavy installs (=timeout=-bounded, non-fatal). NOT yet integration-proven — the next re-run validates it.
+Second archangel handoff sent for the pacman.conf-0600 root cause (=2026-06-25-1440-...=); archsetup's chmod is defensive, archangel should ship 0644. Paused before the re-run at Craig's request (he starts =sudo make test FS_PROFILE=zfs= from the laptop). Possible harness-side factor on the stall: slirp IPv6 blackholing (one stalled conn was IPv6) — watch if it recurs despite reflector.
+
+*** 2026-06-25 Thu @ 21:56:12 -0400 P-C GREEN — ZFS VM test path passes end to end
+=make test FS_PROFILE=zfs= PASSED: archsetup exit 0 (full ~68-min ZFS install, reflector held — no stall), pytest =95 passed, 0 failed, 11 skipped=. The ZFS-conditional checks now run the ZFS branch instead of skipping: =test_bootloader_installed= (ZFSBootMenu EFI binary at /efi/EFI/ZBM), =test_mkinitcpio_hooks= (zfs udev hook), =test_console_font_configured= (vconsole.conf), =test_zfs_has_sanoid= all PASS; =test_backup_created_for_mkinitcpio= correctly SKIPs (ZFS+virtio edits nothing). The 3 archsetup issues (gamemode, mu, signal-cli AUR) are the known non-critical residuals, same as on btrfs. Four commits pushed to main: =8ed42b9= informant news-hook, =66caeb5= pacman.conf 0644, =05ec096= reflector-during-install, =eb379c3= ZFS-aware boot/backup tests. P-C (ZFS coverage, design phases A-C) is DONE. Remaining on this task: P-D (migrate run-test-baremetal.sh to inject_root_key + run_testinfra_validation) and P-E (delete the dead validation.sh shell sweep).
+*** 2026-06-25 Thu @ 23:26:02 -0400 P-D + P-E done — whole epic closed
+P-D (=771b92e=): migrated =run-test-baremetal.sh= to key auth + Testinfra.
=inject_root_key= generalized to =root@$VM_IP= (vm-utils) so it serves both runners; the bare-metal runner now injects the key after the genesis rollback, threads =SSH_KEY_OPT= + a new =--port= through every ssh/scp, and validates via =run_testinfra_validation= instead of the shell sweep. Follow-up fix =fb495d4=: =set +e= around the validator (it returns pytest's rc, which under =set -e= aborted before the report) — caught by the smoke test. Validated against the ZFS VM (=--validate-only=, localhost:2222): connectivity, ZFS check, key auth, Testinfra connect+run, report all work; a green bare-metal install still needs real ZFS hardware.
+
+P-E (=a4a339b=): deleted the dead shell sweep from =validation.sh= now that both runners use Testinfra — run_all_validations, validate_all_services, run_full_validation, the ~35 validate_* checks, validation_pass/fail/warn/skip. Kept the live helpers (ssh_cmd, attribute_issue, capture_pre/post_install_state, analyze_log_diff, categorize_errors, generate_issue_report, VALIDATION_* counters + arrays). 1156 → 314 lines. Verified: no dangling refs, both runners parse + smoke-run clean, unit suite green.
+
+Known follow-ups (not blockers): (1) archangel still owes the pacman.conf-0600 root-cause fix (handoff in its inbox; archsetup's chmod is the defensive layer). (2) The bare-metal runner runs =bash archsetup= with no --config-file — pre-existing, would prompt on real hardware; out of this epic's scope. (3) A true green bare-metal run needs real ZFS hardware (ratio).
+** DONE [#B] Implement Testinfra test suite for archsetup
+CLOSED: [2026-06-25 Thu]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+*** 2026-06-25 Thu @ Final fresh make test GREEN — Testinfra is the validator
+=make test= (fresh build, 150-min cap) PASSED: =TEST PASSED=, =Validation: PASSED=, pytest =96 passed, 10 skipped, 0 failed, 0 errors=, pytest as the authoritative gate. ParallelDownloads now =10= on the fixed build.
End-state: the VM test runner validates post-install via the Testinfra/pytest sweep (=scripts/testing/tests/=, 88 tests + conftest fixtures) — full parity with the old shell sweep plus expansion coverage (sshd hardening, =backup_system_file= .bak files, applied pacman/makepkg/NM/fail2ban/reflector config). Three real bugs surfaced + fixed by this work: (1) the 2026-06-24 sshd hardening had silently broken =make test= (root password SSH died mid-run → key auth, f50fc1d); (2) =ParallelDownloads= stuck at Arch's default 5 (sed only matched the commented form → fixed, 2d63802); (3) install monitor cap too tight at 90 min (→ 150, fe84b71). Follow-up filed: migrate =run-test-baremetal.sh= off the shell sweep, then delete the dead =validation.sh= functions (P5).
+*** 2026-06-25 Thu @ Decision: port to Testinfra + expand coverage, design doc first
+Reviewed against the existing harness: =scripts/testing/lib/validation.sh= already runs ~14 post-install checks (=run_all_validations=), so this isn't net-new capability — it's porting that shell validation to Testinfra/pytest for better expressiveness + reporting, then growing coverage. Craig's call (prioritizes test investment over feature speed): do the port and expand. Starting with a design doc in =docs/design/= per the task's own "design doc not yet written" note. Stale slice to drop/rescope: the X11/startx end-to-end tests (fleet is Wayland/Hyprland now).
+*** 2026-06-25 Thu @ 00:54:22 -0400 P1 scaffold landed (advisory, alongside shell sweep)
+Built the Testinfra harness skeleton: =scripts/testing/tests/= (conftest.py with the attribution marker + report hook + =target_user= fixture; 3 parity checks — user exists/shell, ufw enabled, dotfiles stowed+readable), =scripts/testing/lib/testinfra.sh= (=run_testinfra_validation=: ephemeral-key injection, ssh-config, pytest-over-SSH; advisory + non-fatal, =RUN_TESTINFRA= toggle), wired into run-test.sh after the shell sweep, and added =python-pytest python-pytest-testinfra= to =make deps=. Verified on host: py_compile clean, =pytest --collect-only= green in a throwaway venv (4 tests, fixtures resolve), =bash -n= + shellcheck clean, unit suite still green. Integration (the pytest sweep actually running against a VM) is unverified here — needs a =make test= run. Decisions locked: inject test key; run both through parity; full expansion (P4) in this task after the P3 cutover.
+*** 2026-06-25 Thu @ 01:12:09 -0400 P2 full parity port (88 tests)
+Ported the whole shell sweep to pytest: test_users (exists/shell/15 groups parametrized), test_packages (yay+functional, pacman, terminus-font, emacs+config readable, git, 5 dev tools), test_services (required enabled/active, enabled-only, timers, optional skip-if-absent, DoT drop-in, fail2ban/nmcli responds, log-cleanup cron, syncthing lingering, DNS/mDNS/docker skips), test_desktop (Hyprland tools+configs+portal+socket gated on install/compositor, DWM suckless, autologin), test_boot (grub, mkinitcpio hooks branched on zfs_root, console-font-in-initramfs, nvme gated, zfs/sanoid), test_keyring (dir 700/owner/default=login), test_archsetup (log no Error:, ≥12 state markers). conftest fixtures: target_user/home/zfs_root/has_nvme/hyprland_installed/dwm_installed/compositor_running/on_slirp. 88 tests collected, py_compile clean. Correctness fix vs the shell sweep: check =awww= not the stale =swww=.
Installed python-pytest-testinfra on velox so the harness gate passes. Next: VM run to diff pytest vs shell sweep for parity.
+*** 2026-06-25 Thu @ 01:24:11 -0400 Fixed: sshd hardening had silently broken =make test=
+VM run #1 aborted ~6 min in (Error 5), before any validation ran. Root cause (pre-existing, not the Testinfra work): the 2026-06-24 sshd hardening sets =PermitRootLogin prohibit-password= + reloads sshd mid-install, and the harness SSHes as root by *password* throughout — so every op after that step got "Permission denied" and run-test.sh fataled before validations. Fix: =inject_root_key= authorizes a throwaway root key right after first SSH (before archsetup runs) and all helpers (=wait_for_ssh=/=vm_exec=/=copy_to_vm=/=copy_from_vm=/=ssh_cmd=) gained =$SSH_KEY_OPT= so they use key auth, which =prohibit-password= still allows. testinfra.sh reuses that key. Additive (password stays as fallback). bash -n + shellcheck clean. Re-running the VM suite to confirm it now reaches the validation + pytest phases.
+*** 2026-06-25 Thu @ 03:33:33 -0400 Parity proven + P4 expansion validated on a live VM
+VM run #3 (=make test-keep=, kept VM up): pytest parity = 78 passed / 10 skipped / 0 fail / 0 err — matches & exceeds the shell sweep (53/0/0). Then built P4 expansion against the live VM (iterating in ~30s, no rebuild): test_hardening (sshd prohibit-password, sysctl printk, /etc/issue emptied, vconsole font, /efi fmask), test_config_applied (pacman ParallelDownloads/Color/multilib, makepkg MAKEFLAGS/OPTIONS, NM dns+wifi-privacy drop-ins, fail2ban jail, reflector), test_backups (=.archsetup.bak= present for pacman.conf/makepkg.conf/sudoers/mkinitcpio.conf — end-to-end proof of the backup feature). Full suite vs live VM: 95 passed / 10 skipped / 1 fail.
The 1 fail = a REAL archsetup bug the tests caught: =ParallelDownloads= stayed at the Arch default 5 because the sed only matched a commented =#ParallelDownloads=, but current Arch ships it uncommented — fixed the sed to match both (=^#\?ParallelDownloads=). Also fixed a test bug (=grep -qx '[multilib]'= → =grep -Fxq=, the brackets were a regex char class). Remaining: P3 cutover (pytest authoritative) + P5 retire shell sweep, then a final fresh =make test=.
+*** 2026-06-25 Thu @ 03:38:28 -0400 P3 cutover: Testinfra is now the authoritative validator
+run-test.sh dropped the =run_all_validations= + =validate_all_services= shell-sweep calls; =run_testinfra_validation= now drives =TEST_PASSED= (returns pytest's rc; "couldn't run" = fail, not a silent pass). It surfaces pytest's pass/skip/fail counts through the shared =VALIDATION_*= counters and parses =testinfra-attribution.txt= into the issue arrays so =generate_issue_report= still buckets failures archsetup/base/unknown. Validated the failure path against the still-up VM: pytest rc=1, failure correctly bucketed to [archsetup]. P5 (physically delete the dead shell-sweep functions) is NOT done here — =run-test-baremetal.sh= still calls =run_all_validations=/=validate_all_services=, so deletion must wait until the bare-metal runner is migrated too (filed below). Final step: fresh =make test= to confirm the pass path (ParallelDownloads now 10) with pytest as the gate.
+*** 2026-06-25 Thu @ 08:35:26 -0400 Final run hit the harness 90-min install cap (not a regression)
+The fresh =make test= timed out at 9/12 steps while building =vagrant= from AUR (=ARCHSETUP timed out after 90 minutes=, exit 124), so validation ran against a half-installed system → 10 pytest failures, all late-step (issue/sysctl/vconsole/mkinitcpio/docker/state-markers). The suite worked correctly — it caught an incomplete install. Verified my ParallelDownloads sed is clean (no pacman corruption) and archsetup logged 0 errors.
Root cause: =MAX_POLLS=180= (90 min) is too tight for a full install with heavy AUR builds; bumped to 300 (150 min). Re-running.
+Create comprehensive integration tests using Testinfra (Python + pytest) to validate archsetup installations
+
+Tests should cover:
+- Smoke tests: user created, key packages installed, dotfiles present
+- Integration tests: services running, configs valid, X11 starts, apps launch
+- End-to-end tests: login as user, startx, open terminal, run emacs, verify workflows
+
+Framework: Testinfra with pytest (SSH-native, built-in modules for files/packages/services/commands)
+Location: scripts/testing/tests/ directory
+Integration: Run via pytest against test VMs after archsetup completes
+Benefits: Expressive Python tests, excellent reporting, can test interactive scenarios
+
+A design doc (not yet written) should cover:
+- Complete example test suite (test_integration.py)
+- Tiered testing strategy (smoke/integration/end-to-end)
+- How to run tests and integrate with run-test.sh
+- Comparison with alternatives (Goss)
+** DONE [#C] Proton Mail Bridge font size :chore:quick:
+CLOSED: [2026-06-24 Wed]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+From the roam inbox (2026-06-22): adjust the Proton Mail Bridge UI font to a comfortable size. The bridge is a Qt app, so it likely keys off Qt scaling or the qt5ct/qt6ct config like the other Qt apps (QT_SCALE_FACTOR or a font setting).
+
+Done 2026-06-24 (dotfiles =hyprland.conf:47=): the bridge is a Qt6 *QML* app, so it ignores the qt6ct General font — bumped the UI font via =QT_FONT_DPI= on the autostart instead. Changed the exec-once to =env QT_FONT_DPI=108 protonmail-bridge --no-window= (default DPI is 96; 108 = 1.125x). Iterated live with Craig: 120 too big, 108 comfortable. hyprland.conf is a stow symlink so the change is already live; applies at every login. The =~/.config/autostart/Proton Mail Bridge.desktop= entry is dormant under Hyprland (no XDG-autostart), so it was left as-is.
+** DONE [#C] Wallpaper login-restore is hardcoded, not waypaper --restore :hyprland:quick:solo:
+CLOSED: [2026-06-24 Wed]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+The Hyprland =exec-once= (=hyprland.conf:26=) restores the wallpaper with a hardcoded =awww img ~/pictures/wallpaper/trondheim-norway.jpg=, so any wallpaper set later (via =set-wallpaper=, waypaper, or the dirvish =bg=) reverts on relogin. =set-wallpaper= now persists the choice to =waypaper/config.ini=, so switch the exec-once to =waypaper --restore= (after =awww-daemon= is up) to make set wallpapers survive a relogin. Small, dotfiles-only; verify by setting a different wallpaper, relogging, and confirming it sticks.
+
+Done 2026-06-24 (dotfiles): swapped the line-26 exec-once from the hardcoded =awww img …/trondheim-norway.jpg= to =awww-daemon & sleep 1 && waypaper --restore=. waypaper has a real =awww= backend (in its =--backend= list), the stowed =waypaper/config.ini= carries =backend = awww= plus a default =wallpaper == line, so =--restore= works on a fresh install too. Mechanism verified live: =waypaper --restore= reapplied the persisted wallpaper via awww, exit 0. Relogin confirmation filed under "Manual testing and validation". Follow-up filed: =set-wallpaper='s =mv= detached the live =waypaper/config.ini= from its stow symlink, so set-wallpaper changes no longer flow back to dotfiles.
+** DONE [#B] VM test harness shared one NVRAM file across filesystem profiles :bug:test:
+CLOSED: [2026-06-27 Sat]
+The harness shared one OVMF NVRAM file (=vm-images/OVMF_VARS.fd=) across the btrfs
+and zfs profiles (=init_vm_paths= suffixed the disk image per profile but not the
+NVRAM). NVRAM lives outside the qcow2, so a disk-snapshot revert can't restore it,
+and a zfs run's ZFSBootMenu boot entries clobbered the btrfs GRUB entry.
With no
+removable =\EFI\BOOT\BOOTX64.EFI= fallback on the base ESP, the next btrfs run
+booted into UEFI with no bootable device ("BdsDxe: No bootable option or device
+was found", then PXE/HTTP, then SSH timeout before archsetup ran). Found
+2026-06-27 trying to VM-validate the installer refactor.
+
+Fixed: =OVMF_VARS= now carries the same per-profile suffix as the disk image
+(=OVMF_VARS${img_suffix}.fd=) in =vm-utils.sh init_vm_paths=, so btrfs and zfs keep
+separate NVRAM. Validated by a full green zfs run 2026-06-27 (ArchSetup exit 0,
+Testinfra 96 passed / 0 failed). Remaining hardening tracked below.
+** DONE [#B] Guard against live mesa/hyprland/wayland-runtime updates :hyprland:
+CLOSED: [2026-06-28 Sun]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-09
+:END:
+A live =pacman -Syu= that swaps mesa/hyprland/wayland runtime libs out from under a running Hyprland session can crash the compositor: the next GPU-lib call hits a now-"(deleted)" library and SIGABRTs, taking the Wayland clients down with it. This hit ratio on 2026-06-07 (mesa 26.0.6 -> 26.1.2 + hyprland upgraded live; Hyprland SIGABRT took down awww/insync/emacs). Likely the driver behind ratio's high lifetime unsafe-shutdown ratio — a crashed compositor forces a hard reset.
+
+Shipped as a pacman PreTransaction hook rather than a wrapper, so it fires no matter how the upgrade is launched (pacman, yay, topgrade). =scripts/hypr-live-update-guard= aborts the transaction before any package is swapped when the GPU/compositor runtime set is being upgraded AND Hyprland is running, pointing the user to re-run from a TTY with the session stopped; it stays quiet when Hyprland isn't running (the safe from-a-TTY path). Override via =HYPR_ALLOW_LIVE_UPDATE=1= or by touching the sentinel file named in the abort message. archsetup installs the script to =/usr/local/bin= and the hook to =/etc/pacman.d/hooks/= in the hyprland path. Decision logic unit-tested (=tests/hypr-live-update-guard=, 9 cases).
Live firing test filed under Manual testing and validation. Commits: archsetup (this session).
+** DONE [#B] Collapsible waybar sides :waybar:
+CLOSED: [2026-06-27 Sat]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-09
+:END:
+Let either side of the waybar collapse horizontally to a minimal base set, toggled by a click. Each collapsible side carries a small triangle / arrowhead pointing toward the screen edge it collapses into (away from center). Clicking it collapses that side to its base set and flips the arrow to point back toward center; clicking again restores the full side. Same shape-changes-with-state idea as the auto-dim indicator.
+
+Spec (2026-06-19): [[file:assets/2026-06-19-collapsible-waybar-sides-spec.org]]. Spike that settled the mechanism: [[file:assets/2026-06-18-collapsible-waybar-sides-spike-findings.org]].
+
+Decisions locked: right base set = date + worldclock + tray; left base set = menu + workspaces; per-side independent; host-agnostic (base set constant, full set is each host's existing config). Mechanism = config-swap + SIGUSR2 reload via an active-config copy in =$XDG_RUNTIME_DIR= (the CSS/state-file approach was disproven — GTK3 can't reflow-hide native modules). Lives in =~/.dotfiles/hyprland/=.
+
+Shipped per spec (dotfiles 804bef6): 3 TDD'd scripts (=waybar-active-config=, =waybar-collapse=, =waybar-arrow=; 22 cases), arrow modules wired into the config (left arrow innermost-left, right arrow innermost-right), CSS ×3, =$mod+[= / =$mod+]= keybinds, and =waybar-toggle= relaunch updated to load the active config so a crash preserves collapse state. Verified live: click, keybind, and per-side independence all work; expand round-trips exactly to canonical.
+** DONE [#C] Collapse waybar sysmonitor to a single icon + hover :feature:waybar:
+CLOSED: [2026-06-27 Sat]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+From the roam inbox (2026-06-22): replace the spread-out sysmonitor readouts (temp, cpu, mem, storage) with one visible icon showing a single chosen metric, the rest in the hover tooltip. Open question: fold it into the battery component instead of a standalone module. Implementation lives in the waybar config under ~/.dotfiles.
+
+Shipped as a standalone =custom/sysmon= module (Craig's call: host-dependent primary — battery on laptop, disk on desktop — rather than fold into battery, which is laptop-only). Backing script =waybar-sysmon= gathers cpu/temp/mem/disk/battery, shows the host-appropriate metric, rest in tooltip; 13-case TDD suite; removed the 5 native modules + their CSS across all 3 themes. Dotfiles be7469b.
+** DONE [#C] Rename idle inhibitor to something more intuitive :chore:waybar:
+CLOSED: [2026-06-27 Sat]
+:PROPERTIES:
+:LAST_REVIEWED: 2026-06-24
+:END:
+From the roam inbox (2026-06-24): the "idle inhibitor" name doesn't work as a mnemonic — something like "sleep" (i.e. "keep awake" / "no-sleep") would land better. Decide the new name, then rename across the touchpoints: the =custom/idle= waybar module, the keybind mnemonic, and the backing script names (=hypridle-toggle= / =waybar-idle= from the 2026-06-24 idle-inhibitor work). Needs Craig's call on the name first, so not solo.
+
+Renamed to "caffeine" (Craig's call, 2026-06-27): =custom/caffeine= module, =waybar-caffeine= + =caffeine-toggle= scripts, tooltip "Caffeine: ON/OFF", CSS + test suites updated. Keybind stays =$mod+I= (=$mod+C= is hyprpicker). Shipped in dotfiles 8b45b51.
