diff options
Diffstat (limited to 'docs/design')
| -rw-r--r-- | docs/design/2026-06-25-testinfra-validation.org | 238 | ||||
| -rw-r--r-- | docs/design/2026-06-25-zfs-vm-test-coverage.org | 139 | ||||
| -rw-r--r-- | docs/design/2026-06-29-waybar-network-module-spec.org | 1601 | ||||
| -rw-r--r-- | docs/design/2026-06-29-waybar-timer-module-spec.org | 217 |
4 files changed, 2195 insertions, 0 deletions
diff --git a/docs/design/2026-06-25-testinfra-validation.org b/docs/design/2026-06-25-testinfra-validation.org new file mode 100644 index 0000000..5c82aa2 --- /dev/null +++ b/docs/design/2026-06-25-testinfra-validation.org @@ -0,0 +1,238 @@ +#+TITLE: Design: Testinfra Post-Install Validation for archsetup +#+AUTHOR: Craig Jennings +#+DATE: 2026-06-25 +#+STATUS: Accepted (2026-06-25) + +* Problem + +The VM integration harness (=scripts/testing/run-test.sh=) runs archsetup in a +QEMU VM, then verifies the result two ways: + +1. Parses archsetup's own install log for its Error Summary and the + =ARCHSETUP_EXECUTION_COMPLETE= marker (did the script finish, did it log + errors). +2. Runs =run_all_validations= from =scripts/testing/lib/validation.sh= — a + hand-rolled, shell-based post-install assertion sweep of ~26 checks over SSH. + +The shell sweep works, but each check is 6-40 lines of =ssh_cmd= + +=validation_pass/fail= + =attribute_issue= boilerplate, the pass/fail counters +are hand-maintained globals, and the reporting is bespoke. Adding or reading a +check is heavier than it should be, and growing the suite (archsetup configures +far more than the 26 checks cover) compounds that weight. + +This doc proposes porting the post-install validation to Testinfra (Python + +pytest) for more expressive checks and better reporting, then growing coverage. + +* Decision + +Port the post-install validation layer to Testinfra + pytest, reaching parity +with the existing =validation.sh= sweep, then expand coverage. Recorded +rationale: the up-front port cost (parity rewrite + a test-only dependency) is +an accepted trade — the priority is a robust, well-reported, growing validation +suite over feature speed. The framework swap alone buys ergonomics and +reporting, not coverage, so it is paired with real new coverage (below). + +This replaces the shell sweep; it does not touch archsetup's own install-log +parsing (that stays as a separate signal). The full coverage expansion (P4) +lands in this task too, sequenced strictly after the parity cutover so the +parity verification stays clean. + +* Current harness (what exists today) + +** Flow (run-test.sh) +1. Revert VM to base snapshot, boot, wait for SSH. +2. =capture_pre_install_state=. +3. Bundle + copy archsetup + dotfiles into the VM, run archsetup in background, + poll to completion. +4. =capture_post_install_state=. +5. =run_all_validations= (the shell sweep). +6. =analyze_log_diff= + =generate_issue_report= (issue attribution). +7. Explicit pass/fail exit code; cleanup. + +** The shell sweep (validation.sh) +~26 checks under =run_all_validations=: user created / shell / groups, dotfiles, +yay, pacman working, window manager, firewall, DNS, avahi, fail2ban, +NetworkManager, emacs, git config, dev tools, zfs, boot config, autologin, +gnome-keyring, terminus font, mkinitcpio hooks, initramfs consolefont, nvme +module, archsetup log, state markers. + +** Issue attribution +=attribute_issue <msg> <bucket>= sorts each failure into one of three arrays — +=ARCHSETUP_ISSUES=, =BASE_INSTALL_ISSUES=, =UNKNOWN_ISSUES= — and +=generate_issue_report= writes them out (base-install issues route to the +archzfs inbox). This is domain logic Testinfra has no equivalent for; the port +must preserve it. + +** Connection +=ssh_cmd= uses =sshpass -p "$ROOT_PASSWORD" ssh ... -p "$SSH_PORT" root@$VM_IP=, +with =VM_IP=localhost=, =SSH_PORT=2222=, =ROOT_PASSWORD=archsetup=. + +* Design + +** Where Testinfra fits +Replace the =run_all_validations= call (step 5) with a pytest invocation against +the running VM. Steps 1-4 and 6-7 are unchanged; =analyze_log_diff= stays. +Testinfra connects over the same SSH the harness already exposes. + +** Connection model +Testinfra's paramiko/ssh backend targets the live VM via its host spec: + +#+begin_src sh +pytest scripts/testing/tests/ \ + --hosts="ssh://root@localhost:2222" \ + --ssh-config=<generated> \ + --json-report --json-report-file="$TEST_RESULTS_DIR/testinfra.json" +#+end_src + +Password auth: generate a throwaway ssh-config (or reuse sshpass via a +=--ssh-identity= once archsetup drops the key, but at validation time we only +have the root password). Simplest: a tiny generated ssh config + sshpass +wrapper, or switch the test VM to a known test key injected pre-run. Open +question below. + +** Test layout +#+begin_example +scripts/testing/tests/ + conftest.py # host fixture, markers, attribution hook, report glue + test_users.py # user created / shell / groups + test_dotfiles.py # stow symlinks, readable by user + test_packages.py # yay, pacman working, dev tools, key packages + test_services.py # firewall, dns, avahi, fail2ban, networkmanager + test_boot.py # zfs, mkinitcpio hooks, nvme, consolefont, terminus + test_desktop.py # window manager, autologin, gnome-keyring + test_archsetup.py # install log, state markers + test_hardening.py # NEW: sshd drop-in, sysctl, /etc fstab perms, backups +#+end_example + +** Example tests (parity) +#+begin_src python +def test_ufw_enabled(host): + assert host.service("ufw").is_enabled + +def test_user_cjennings_exists(host): + u = host.user("cjennings") + assert u.exists + assert u.shell == "/usr/bin/zsh" + +def test_zshrc_stowed_and_readable(host): + f = host.file("/home/cjennings/.zshrc") + assert f.is_symlink + assert ".dotfiles/" in f.linked_to + assert f.exists # not broken + assert host.run("sudo -u cjennings test -r %s" % f.path).rc == 0 + +def test_mkinitcpio_systemd_hook(host): + # non-ZFS systems delegate fsck from udev to systemd + conf = host.file("/etc/mkinitcpio.conf").content_string + assert "systemd" in conf +#+end_src + +Compare =test_ufw_enabled= (1 line) to the current =validate_firewall= (8 lines +of ssh_cmd + branch + counters). + +** Preserving issue attribution +Map the three buckets to pytest markers and collect them in a =conftest.py= +hook: + +#+begin_src python +@pytest.mark.attribution("archsetup") # or "base_install" / "unknown" +def test_ufw_enabled(host): ... +#+end_src + +A =pytest_runtest_makereport= hook records each failure under its marker's +bucket and writes the same three-way report =generate_issue_report= produces +(base-install failures still route to the archzfs inbox). Default bucket = +archsetup when unmarked. + +** Tiered strategy +Markers =@pytest.mark.smoke= (user, key packages, dotfiles present) and +=@pytest.mark.integration= (services, configs, boot). =pytest -m smoke= for a +fast gate, full run otherwise. Drop the task's original X11/startx end-to-end +slice — the fleet is Wayland/Hyprland and headless GUI e2e is flaky and +expensive; a Wayland-session smoke check can be reconsidered later as its own +task. + +** Reporting +=pytest-json-report= (or junit-xml) → =$TEST_RESULTS_DIR/=, surfaced in the +test report alongside the install-log analysis. pytest's own per-test +pass/fail/skip output replaces the hand-maintained counters. + +* Coverage + +** Parity (port all current checks) +All ~26 =validation.sh= checks, grouped per the layout above. + +** Expansion (new — the coverage win) +archsetup configures much that isn't validated today. Candidates: +- sshd hardening drop-in (=/etc/ssh/sshd_config.d/10-hardening.conf=, + PermitRootLogin prohibit-password). +- =backup_system_file= behavior — assert =.archsetup.bak= exists for files + archsetup edited in place (fstab, mkinitcpio.conf, sudoers, …). +- pacman.conf (ParallelDownloads, Color, multilib) and makepkg.conf (MAKEFLAGS, + OPTIONS) settings actually applied. +- systemd-resolved DNS-over-TLS drop-in; NetworkManager wifi-privacy. +- fail2ban jail.local present; reflector config; sysctl printk; /etc/issue + emptied; vconsole font; fstab /efi fmask/dmask perms. +- sanoid / zfs-replicate units (ZFS hosts). + +* Dependencies + +Add =python-pytest=, =python-pytest-testinfra= (pulls paramiko), and a JSON +reporter to =make deps= (test host only — not installed by archsetup itself). +Note: the existing unit suites run under =python3 -m unittest=; the integration +layer runs under pytest. Two runners, both Python; =make test-unit= unchanged, +=make test= gains the pytest step. + +* Goss comparison (the task asked) + +- *Goss* — YAML-declarative health specs, a single Go binary executed *on the + target*. Fast, no Python. But the spec must be pushed into the VM and run + there, the assertions are less programmable, and it adds a Go binary to the + flow. +- *Testinfra* — Python, runs *on the host* over SSH (nothing installed in the + VM), assertions are full Python with rich built-in modules + (File/Package/Service/User/Command), integrates with pytest's tooling. + +Choose Testinfra: it runs from the host (the VM stays clean), it's far more +programmable for the conditional checks archsetup needs (DESKTOP_ENV branches, +ZFS-vs-not), and it aligns with the repo's existing Python test tooling. + +* Migration plan (phased, TDD where the helper logic is ours) + +- *P1 — Scaffold.* conftest.py (host fixture + connection), the attribution + marker + report hook, and 3 parity checks (firewall, user, dotfiles). Wire a + pytest step into run-test.sh behind a flag so the shell sweep still runs. +- *P2 — Full parity.* Port all ~26 checks; diff a real VM run's results against + the shell sweep to confirm no check was lost. +- *P3 — Cut over.* Make pytest the primary sweep in run-test.sh; keep + =analyze_log_diff= and the install-log signal. +- *P4 — Expand.* Add the new coverage (hardening, backups, applied settings). +- *P5 — Retire.* Remove =run_all_validations= from validation.sh (keep the + capture/analyze helpers that pytest doesn't replace). + +* Acceptance criteria + +- =make test= runs archsetup in a VM, then a pytest sweep over SSH, and a real + run reports parity with (or a superset of) the current shell checks. +- Failures still sort into archsetup / base-install / unknown, with base-install + issues routed to the archzfs inbox as today. +- =make deps= installs the test dependencies; the VM has nothing extra installed. +- A documented =pytest -m smoke= fast path exists. + +* Resolved decisions (2026-06-25) + +1. *Auth at validation time — inject a throwaway test key.* Pre-run, generate + an ephemeral keypair, push the pubkey into the VM's + =/root/.ssh/authorized_keys= over the existing sshpass channel, and point + Testinfra at the private key via a generated ssh-config. No password in the + pytest invocation; paramiko key auth just works; the keypair is discarded + after the run. (Chosen over wrapping sshpass around Testinfra, which is + awkward since Testinfra spawns its own ssh connections.) +2. *Cut over — run both through parity, then switch.* Keep the shell sweep + running alongside pytest through P2 so a real VM run can diff pytest's + results against the shell sweep and prove no check was dropped. pytest + becomes primary at P3; =run_all_validations= is deleted at P5 after the + expanded suite proves out. +3. *Expansion scope — full, in this task, after cutover.* All of P4 lands here, + sequenced strictly after the P3 parity cutover so the parity diff is clean + before new checks are added. diff --git a/docs/design/2026-06-25-zfs-vm-test-coverage.org b/docs/design/2026-06-25-zfs-vm-test-coverage.org new file mode 100644 index 0000000..d9625e0 --- /dev/null +++ b/docs/design/2026-06-25-zfs-vm-test-coverage.org @@ -0,0 +1,139 @@ +#+TITLE: Design: ZFS VM Test Coverage + Bare-Metal Runner Migration +#+AUTHOR: Craig Jennings +#+DATE: 2026-06-25 +#+STATUS: Draft — for review + +* Problem + +Two gaps, one root: + +1. *The ZFS install path is untested in automation.* The VM harness + (=make test=) uses a single non-ZFS base image, so every ZFS-conditional + check skips (mkinitcpio udev hook on ZFS, sanoid, zfs-scrub timer, the whole + ZFS branch of archsetup). ZFS is exercised *only* by =run-test-baremetal.sh= + against real hardware. + +2. *=run-test-baremetal.sh= is latently broken by the sshd hardening.* It SSHes + to the target as root *by password* throughout the run, exactly the pattern + archsetup's =PermitRootLogin prohibit-password= (shipped 2026-06-24) kills + mid-install. The VM runner already hit and fixed this (=inject_root_key= + + key auth, commit f50fc1d); the bare-metal runner never got that fix, so it + almost certainly aborts mid-install now, the same way the VM runner did. + +The fix for both is the same shape: a ZFS base VM gives a safe, repeatable, +snapshot-rollback ZFS target (no sacrificial hardware), which both fills the +coverage gap *and* provides a target to migrate + validate the bare-metal +runner against. This also unblocks P5 (deleting the dead shell-sweep functions +from validation.sh), which is gated on the bare-metal runner leaving the shell +sweep. + +* Decision + +Build a ZFS base VM via archangel, add a filesystem-profile selector to the VM +harness so =make test= can target zfs or non-zfs, then migrate +=run-test-baremetal.sh= to key auth + the Testinfra sweep and validate it +against the ZFS VM. Finish by deleting the now-dead shell-sweep functions (P5). + +Explicitly rejected: loosening =PermitRootLogin= (or adding a skip-hardening +test flag). That trades a real security feature for harness convenience and +would mean never validating the hardened config. Key auth is the correct fix, +already proven in the VM runner. + +* Current state (grounded) + +- =create-base-vm.sh= boots an =archangel-*.iso=, copies =archsetup-test.conf= + into the live env, runs =archangel --config-file /root/archsetup-test.conf= + (the base-OS install — partitioning/filesystem live here), powers off, and + snapshots =clean-install= onto =vm-images/archsetup-base.qcow2=. +- =run-test.sh= hardcodes that one image + snapshot, and copies + =scripts/testing/archsetup-vm.conf= (DESKTOP_ENV=hyprland, non-ZFS) into the + VM as the archsetup config. +- =run-test-baremetal.sh= takes =--host= / =--password=, SSHes as root by + password, rolls back ZFS =@genesis= snapshots, transfers + runs archsetup, + then calls =run_all_validations= / =validate_all_services= (overriding + =VM_IP= to the target). It is the only remaining caller of the shell sweep. +- Key auth machinery already exists and is reusable: =inject_root_key= and + =SSH_KEY_OPT= in =vm-utils.sh=, and =run_testinfra_validation= in + =testinfra.sh= (drives connection from a generated ssh-config keyed on + =VM_IP= / =SSH_PORT=). + +* Design + +** A. ZFS archangel base +Add a ZFS archangel config (a =archsetup-test-zfs.conf= or equivalent) that +installs a ZFS root. Confirm archangel supports a ZFS-root config (it's a +separate project — verify its config options first). Unencrypted ZFS for the +test VM (skip the passphrase prompt; encryption isn't what we're validating). + +** B. Per-profile base images + selector +- =create-base-vm.sh= takes a profile (e.g. =FS_PROFILE=zfs|ext4=, default + current/non-ZFS), picks the matching archangel config, and writes a + profile-named image: =vm-images/archsetup-base.qcow2= (default) vs + =vm-images/archsetup-base-zfs.qcow2=. Same =clean-install= snapshot name. +- =run-test.sh= + Makefile take the same =FS_PROFILE= and select the image (via + =init_vm_paths=). The archsetup run config (=archsetup-vm.conf=) is *shared* — + archsetup auto-detects ZFS from the live root, so no per-profile run config is + needed. =make test FS_PROFILE=zfs=. + +** C. Bare-metal runner migration +Mirror the VM runner's fix in =run-test-baremetal.sh=: +- After the first successful SSH to =TARGET_HOST=, call =inject_root_key= (it + authorizes a key over the password session; set =VM_IP=TARGET_HOST=, + =SSH_PORT=22= so the helpers + ssh-config target the real host). +- Replace =run_all_validations= / =validate_all_services= with + =run_testinfra_validation= (now authoritative). +- Everything downstream already routes through =$SSH_KEY_OPT= (the vm-utils + helpers) and the ssh-config, so it survives the hardening. + +** D. Validate +- =make test FS_PROFILE=zfs= → the ZFS-conditional pytest checks now *run* + (not skip): mkinitcpio uses the udev hook, sanoid installed, zfs-scrub timer, + zfs root. Fix any real ZFS-path findings archsetup has. +- Point =run-test-baremetal.sh= at the ZFS VM (or real hardware) → confirm the + key-auth migration carries it through the hardening to a green pytest sweep. + +** E. Delete the shell sweep (P5) +Once both runners use =run_testinfra_validation=, delete the dead functions from +=validation.sh= (run_all_validations, validate_all_services, the ~26 validate_* +checks, validate_service*, run_full_validation, validation_pass/fail/warn/skip). +Keep the live helpers: ssh_cmd, attribute_issue, capture_pre/post_install_state, +analyze_log_diff, categorize_errors, generate_issue_report, VALIDATION_*. + +* Phases +- *P-A* archangel ZFS config (verify archangel ZFS support first). +- *P-B* create-base-vm.sh + run-test.sh + Makefile profile selector; build the + ZFS base image + snapshot. +- *P-C* =make test FS_PROFILE=zfs= green (ZFS-conditional tests run; fix + findings). VM-validatable here. +- *P-D* migrate run-test-baremetal.sh to key auth + Testinfra; validate against + the ZFS VM. +- *P-E* delete the dead shell-sweep functions (the standing P5 follow-up). + +* Open questions +1. *Does archangel support a ZFS-root config out of the box?* RESOLVED (yes). + ZFS is archangel's *default* filesystem (=FILESYSTEM=zfs=, validated by + =installer/lib/config.sh:validate_filesystem=), with =NO_ENCRYPT=yes= for an + unattended unencrypted install and a ready =installer/velox-zfs.conf.example= + to model. No archangel work needed. +2. *Two images vs one image + two snapshots?* RESOLVED — two images. ZFS vs + btrfs are different on-disk layouts; cleaner than juggling snapshots on one + disk. =btrfs= keeps the legacy unsuffixed =archsetup-base.qcow2=; =zfs= gets + =archsetup-base-zfs.qcow2=. +3. *Profile on run-test.sh vs a separate run-test-zfs.sh?* RESOLVED — + =FS_PROFILE= env param on the existing runner + Makefile, no duplicate + harness. +4. *Disk size / RAM for the ZFS VM* — start at the 4G RAM / 50G disk defaults; + bump =VM_RAM= only if the ZFS install OOMs (decide at P-C build time). +5. *Should the bare-metal runner stay at all once a ZFS VM exists*, or does the + ZFS VM profile make it redundant for everything except real-hardware smoke? + Defer until after P-D. + +* Design corrections (found during P-A/P-B grounding) +- The "non-ZFS" base is *btrfs*, not ext4 — =archsetup-test.conf= sets + =FILESYSTEM=btrfs=. The profile axis is zfs vs btrfs throughout. +- *No =archsetup-vm-zfs.conf= is needed.* archsetup reads no filesystem key; it + auto-detects ZFS from the live root via =is_zfs_root()= (=findmnt -n -o FSTYPE + /=, archsetup:688). The ZFS branch (sanoid, zfs-scrub timer, mkinitcpio udev + hook, docker zfs storage driver) fires whenever the running root is ZFS. So + only the *archangel* base config and the base *image* differ per profile; the + archsetup run config (=archsetup-vm.conf=) is shared. diff --git a/docs/design/2026-06-29-waybar-network-module-spec.org b/docs/design/2026-06-29-waybar-network-module-spec.org new file mode 100644 index 0000000..db9657d --- /dev/null +++ b/docs/design/2026-06-29-waybar-network-module-spec.org @@ -0,0 +1,1601 @@ +#+TITLE: Waybar Network Module — Design Spec +#+AUTHOR: Craig Jennings & Claude +#+DATE: 2026-06-29 + +* Status + +*Phase 1 SHIPPED* (2026-06-29, dotfiles =5254bd8=..=c095a22=, 10 commits): engine +(=net status/probe/diagnose/repair/doctor/portal=) + =waybar-net= indicator + +split-cadence cache + redacted event log + Makefile recovery targets + airplane +absorption. 160 net tests; pure modules ≥90% branch. One as-built deviation from +this spec: airplane absorption is *display-only* (Craig's call, option 1) — net +shows the airplane state but the =airplane-mode= low-power toggle is KEPT (it does +radios + CPU + brightness + services, not a network concern); only =waybar-airplane= ++ =custom/airplane= + =waybar-netspeed= were deleted. See decision 12. Phases 2-5 +remain. Live waybar eyeball is under todo.org "Manual testing and validation". + +Ready for Phase 1; Ready-with-caveats overall. Three Codex review rounds + Craig's +cj comments are all incorporated — every finding has a disposition and the findings +cookie reads complete ([31/31]), with no open decisions (enterprise scope settled: +open + WPA-PSK in v1, 802.1X add/edit vNext, activate-only). The cj comments +reshaped several decisions (no separate credential store — use NM's own; =net +doctor= + Makefile console-recovery in v1; rfkill + full-stack-bounce repair; +airplane module absorbed; VPN a later Phase 5). The only remaining caveats are +Phase-2/3 build unknowns named under Open items (gtk4-layer-shell anchoring, the +=captive= =--json= refactor) — not Phase-1 blockers. Phase 1 (indicator + console +recovery) is ready to build. + +* Goal + +One waybar network component that does the whole job: shows connection state +(including the missing "associated but no internet / captive portal" state), +manages connections from a dropdown (nmcli-backed; secrets stay in +NetworkManager's own store, no separate credential file), and runs the network +diagnostics and remediation off the same place +(captive-portal detection + forcing, bounce/reset, gateway/DNS checks, speed +test). + +It unifies three todo tasks that are really one feature: +- =[#C]= "archsetup Waybar Wi-Fi module should show no-internet state" — the + indicator state plus the 2026-06-22 roam expansion (bounce, diagnostics, speed + test off the component). +- =[#B]= "Network-manager dropdown, nmcli-backed" — the management dropdown. (The + todo task's original "GPG-stored secrets" framing is superseded: secrets stay in + NM's own store, decision 5.) +- The network diagnostics already shipped in =captive= (the hotel/captive-portal + tool, formerly =login-page=) become this module's diagnostics engine rather + than a standalone CLI. + +* Scope + +** In +- *Indicator* — wifi/ethernet icon + signal + SSID, plus an internet sub-state: + online / captive / no-internet / connecting / disconnected / airplane. +- *Absorbs the airplane module* — the airplane state + toggle move into + =custom/net= (airplane is a network concern). Once this ships, the standalone + =custom/airplane= module, the =waybar-airplane= + =airplane-mode= scripts, their + =tests/=, and the css are deleted (listed under Files touched). The + desktop-settings panel (sibling =[#B]=) no longer needs an airplane row. +- *Interface-correct* — targets the wifi (or chosen) device, not the + default-route interface, so an active USB tether or wired link can't mask + wifi state. (Same lesson =captive= fixed; the current =custom/netspeed= keys + off the default route and has the bug.) +- *Connection management (panel)* — list saved connections most-recently-used + first, live signal for in-range wifi, click to switch; add / edit / remove for + open + WPA-PSK; activate any existing saved profile (including enterprise ones + NM already stores); ethernet↔wifi and wifi↔wifi switching even when a link + appears mid-session. +- *Diagnostics (panel)* — read-only Diagnose (captive probe 204-vs-portal with + the extracted portal URL, gateway ping, DNS config) separated from mutating + Repair. Repair has tiers, lightest first: rfkill-unblock, per-connection reset + (fresh MAC), full-stack bounce (=nmcli networking off/on=, then restart + NetworkManager if that fails), and the temporary 1.1.1.1 override test. Each + Repair action confirms and verifies cleanup. +- *Speed test (panel)* — down/up/ping with a progress indicator and last-result + shown, via the already-installed =speedtest-go --json=. +- *Connection secrets* — none of our own. Settings and passwords live where NM + already keeps them: =/etc/NetworkManager/system-connections/*.nmconnection= + (root-only =0600=, the PSK/EAP secret stored inline). We read/write them through + nmcli, which handles the privilege. No separate file, no GPG, no gpg-agent — one + fewer dependency, and NM's store is already the secure-at-rest source of truth. +- *Persistence* — connectivity probe result cached in the runtime dir so the + bar reads it cheaply between probes. +- *Observability* — a redacted JSONL event log so a post-failure session can + diagnose without re-running destructive actions. + +** Out (v1, note for later) +- No replacement of NetworkManager's connection engine. NM stays the thing that + connects; we drive it via nmcli. +- No add/edit *form* for WPA-Enterprise / 802.1X in v1. The reason is effort vs + payoff: 802.1X has many interdependent fields (CA cert, client cert, identity, + anonymous identity, phase-2 auth) where a wrong entry silently fails auth, so a + trustworthy form is a lot of UI for connections Craig rarely adds (open + + WPA-PSK covers home, hotels, and phone hotspots). v1 still *activates* existing + saved enterprise profiles and points editing at =nmtui=/=nmcli=. Settled + (Craig, 2026-06-29): enterprise add/edit is vNext — 24 saved profiles on velox, + 0 enterprise, so the form would be unused UI; if one ever appears nmtui adds it + once and the module activates it thereafter. +- No per-connection captive-portal *auto-login* in v1. (That would mean storing a + portal's login form answers — room number, surname, a checkbox — and replaying + them automatically when a known portal is detected, so the page never appears. + Out for v1 because every portal's form differs and it means storing per-venue + answers; v1 just opens the portal for you.) +- No graphing/history of speed-test results beyond the last run. +- No static-IP / proxy / metered / MAC-randomization editing in v1 (activate + existing, edit elsewhere). +- No VPN / WireGuard management in v1, but it's a planned later phase (Phase 5), + not a permanent exclusion — it folds the existing archsetup wireguard tooling + into the same panel/CLI. +- The desktop-settings dropdown (sibling =[#B]=) is a separate module, but it + shares the GTK4 layer-shell panel shell built here. + +* Architecture + +Three layers. Keep the bar cheap, the panel rich, the logic in one tested place. + +1. *Engine* — a =net= Python package (src-layout, unittest), exposing a CLI. Wraps + every nmcli op and owns the diagnostics. Emits JSON. This is the testable + core (fake =nmcli= / =curl= / =speedtest-go= on PATH, like the existing + =waybar-netspeed= and =waybar-sysmon= test harnesses). Precedent: pocketbook is + Python in the dotfiles repo; =wtimer= is Python for the same testability + reason. +2. *Indicator* — a thin =waybar-net= script that calls =net status --json= and + renders icon + signal + state + tooltip. Replaces =custom/netspeed= + (throughput folds into the tooltip). +3. *Panel* — a GTK4 + gtk4-layer-shell app (mirrors pocketbook's structure) + that imports the engine. Hosts connection management, diagnostics, and the + speed test. + +How the existing pieces map in: +- =captive= (bash, shipped) — the engine shells out to it for the heavy, + interactive portal-force flow (sudo reset, DNS override, browser launch). Its + cheap portal-detection logic is mirrored natively in the engine for the fast + status path so the bar never blocks on a subprocess. =captive= stays a usable + standalone CLI. The refactor (below) extracts its probe + reset into functions + the engine can call non-interactively. +- =waybar-netspeed= (sh, shipped) — retired; its throughput sampling moves into + the engine's status output and renders in the indicator tooltip only. +- =nmcli= — the connection backend for every op. + +Language note: the engine is Python; the indicator is a thin Python or sh +wrapper over =net status --json=. The bar path must stay fast (see Performance +budgets), so the indicator does no network I/O itself — it reads link state and +the cached connectivity result. + +* Repository + dependencies + +- *Code lives in the dotfiles repo* (=~/.dotfiles=), not archsetup. The =net= + package sits in-tree like pocketbook (src-layout, unittest, Makefile target); + =waybar-net= and the =net= CLI entry live in the hyprland tier + (=hyprland/.local/bin/=). Tests under =tests/net/= and =tests/waybar-net/=. + archsetup owns only the *dependency install*, not the code. +- *archsetup installs the deps* in its Hyprland step: =gtk4-layer-shell=, + =python-gobject=, plus =nmcli=/=curl=/=resolvectl=/=rfkill= (already present via + NetworkManager/curl/systemd/util-linux). Speed test uses =speedtest-go= (AUR + =speedtest-go-bin=, already installed on velox); archsetup adds it to the AUR + list. librespeed-cli is the documented fallback if a self-hosted LibreSpeed + server is ever wanted. No =gpg= dependency (secrets live in NM's own store). +- *Daily-drivers*: a stowed-script + AUR-dep feature, so ratio needs the same + =git pull= + stow + the archsetup-added deps. Note the manual dep step in the + rollout. + +** Makefile targets (console recovery is a first-class path) +=net doctor= and the diagnostics are reachable from a bare TTY when waybar and +the GUI are down — that's the case where you most need them. The dotfiles +Makefile carries targets that wrap the =net= CLI so "get back online" is one make +command from the console: +- =make online= — =net doctor --fix= (diagnose, then apply the lightest repair: + rfkill-unblock → reset → bounce → open portal). The headline recovery target. +- =make net-doctor= — =net doctor= (read-only diagnose + recommendation). +- =make net-status= / =make net-diagnose= / =make net-portal= / =make net-reset= + / =make net-bounce= — the individual ops. +- =make test= — already runs =tests/*=; the =net= package's unittest suites are + collected the same way. +These intentionally need only nmcli/curl/rfkill (no GUI, no waybar, no Python +GTK), so they work from a TTY on a broken graphical session. + +* Connectivity model — split cadence + +The indicator polls every ~2s, but a real internet/captive probe every 2s wastes +battery and can re-trigger a captive portal. So split it: + +- *Fast path (every poll, cheap, no network)* — interface, type, SSID, signal, + IPv4 presence, throughput sample. From nmcli / sysfs only. No network I/O. +- *Slow path (cached, TTL ~45s)* — the actual internet/captive probe (the 204 + check + meta-refresh portal extraction). Result cached at + =$XDG_RUNTIME_DIR/waybar/net-connectivity.json= with a timestamp. + +The indicator reads the cache each poll. When the cache is older than the TTL, +=net status= kicks =net probe= in the background (spawn + detach, never awaited) +and renders the last cached sub-state meanwhile. A user-triggered +diagnose/reconnect refreshes the cache immediately. This keeps the bar +responsive and the portal un-poked. + +** Concurrency, atomicity, staleness +- *Single-flight* — =net probe= takes a lock file at + =$XDG_RUNTIME_DIR/waybar/net-probe.lock= (flock, non-blocking). A second probe + while one runs is a no-op, so a flapping 2s poll can't pile up overlapping + probes. +- *Atomic writes* — the cache is written to a temp file + =os.replace= (atomic + rename), so a reader never sees a half-written cache. Same pattern as =wtimer=. +- *Max probe runtime* — the probe has a hard timeout (≤ 6s total: curl + =--max-time 5= + slack). On timeout it writes an =unknown= result, never hangs. +- *Stale classes* the indicator distinguishes: fresh (< TTL), stale (TTL..3×TTL, + shown with a subdued/aging hint), expired (> 3×TTL → treat as unknown), + unknown (no cache / probe failed). The bar never shows a confident "online" + past the expired threshold. +- *Invalidation* — the cache records the iface + SSID + active-connection UUID it + was taken under; a change in any of them invalidates it immediately (a + reconnect must not show the old network's verdict). +- *Crash cleanup* — a stale lock older than the max runtime is ignored/reclaimed. + +* Performance budgets (hot path) + +The bar exec path (=waybar-net= → =net status=) must stay responsive: +- *Budget*: =net status= returns in < 100ms typical, < 250ms worst case. +- *No sleeping in the bar path.* Throughput is sampled from two reads of + =/sys/class/net/<iface>/statistics/{rx,tx}_bytes= across the *waybar poll + interval itself* (delta since the last cached sample + timestamp), not via an + in-process =sleep= like the old =waybar-netspeed=. The cache holds the prior + counters. +- *Subprocess cap*: at most one =nmcli= invocation on the hot path (a single + =nmcli -t -f ...= multi-field query), plus sysfs reads. Never a per-field + nmcli call. +- *Every subprocess has a timeout* (=nmcli --wait 2=, =subprocess timeout=). On + timeout or error the indicator emits a degraded JSON state (class + =net-degraded=, a neutral glyph) rather than blocking or crashing waybar. +- *Benchmark test*: a fake slow =nmcli= asserts =net status= still returns within + budget by falling back to the degraded state. + +* Engine — =net= CLI surface + +All subcommands take =--json= where a machine reads them. Pure formatting/state +functions under the CLI; IO (nmcli, curl, file) at the edges. Every subcommand +exits non-zero with a JSON error envelope (see JSON schemas) on failure. + +- =net status [--json] [--iface IF]= — fast link state + cached connectivity + sub-state + throughput. The indicator's source. Never does network I/O. +- =net probe [--iface IF]= — run the connectivity/captive probe now, update the + cache (single-flight, atomic), print online | captive (+ portal URL) | + no-internet | unknown. Mirrors =captive='s cheap detection natively. +- =net list [--json]= — saved connections, MRU order, active flag, plus in-range + wifi with signal. +- =net up <uuid>= / =net down [--iface IF]= — switch / disconnect. Operates on + UUID, not name (see nmcli contract). +- =net add= / =net edit <uuid>= / =net remove <uuid>= — manage connections + (open + WPA-PSK) through nmcli; the secret lands in NM's own + =.nmconnection=. Enterprise profiles are activate-only. +- =net rescan [--iface IF]= — wifi rescan. +- =net diagnose [--json]= — read-only report: gateway ping, DNS config, captive + probe. The structured contract below. Doubles as the post-failure snapshot. +- =net repair <action> [--json]= — mutating remediation, lightest first: + =rfkill= (unblock + radio on), =reset= (fresh MAC), =bounce= (full-stack: + =nmcli networking off/on=, escalating to =systemctl restart NetworkManager=), + =dns-test= (temporary 1.1.1.1 override, auto-reverted). Each confirms via the + caller and verifies cleanup. +- =net doctor [--json] [--fix]= — one-shot "get me online" mode for the console: + runs the full diagnose, then applies the lightest repair that fits (unblock + rfkill, reset, bounce, open portal) — read-only without =--fix=, acting with + it. The TTY recovery path when waybar/the GUI is down (see the Makefile + targets). +- =net portal= — run =captive='s portal-force flow (reset if needed, extract + + open the portal page). +- =net speedtest [--json]= — =speedtest-go --json= run; down/up/ping. + +* nmcli contract + +The command wrapper is the reliability boundary; SSIDs and connection names +contain spaces, colons, duplicates, hidden names, and non-ASCII. Rules: + +- *Terse, field-selected output*: =nmcli -t -f <fields> --escape yes ...= and + =nmcli -g <fields> ...= (get-values) for single-value reads. Parse with the + documented escaping (=\:= and =\\=); never naive =cut -d:=. +- *UUID is the handle.* Every saved-profile op (=up=, =down=, =modify=, =delete=) + uses the connection UUID, never the display name — names duplicate and contain + separators. =net list= surfaces UUIDs; the panel maps row → UUID. +- *Wait budgets*: activation/deactivation use =nmcli --wait <n>= with an explicit + budget (hot-path reads =--wait 2=; activation =--wait 30=). No unbounded waits. +- *Connectivity*: NM's own =nmcli networking connectivity= can return + =none/portal/limited/full/unknown=. Use it as a *cheap hint* on the fast path + when present, but the authoritative captive verdict is still our own probe + (NM's portal detection is coarser and config-dependent). +- *Parser tests* (fake nmcli fixtures): escaped colons and backslashes in SSIDs, + embedded newlines, duplicate connection names, hidden SSID (empty name), + non-ASCII SSID, the wired-appears-mid-session case, and the multi-active case + (wifi + tether both up). + +* JSON schemas + +Versioned (="v": 1=) envelopes so tests lock the contract. Sketches (fields +nullable unless noted): + +- =status=: ={v, iface, type: wifi|ethernet|none, ssid, signal, ipv4, + gateway, throughput: {rx_bps, tx_bps}, connectivity: online|captive|no-internet|unknown, + connectivity_age_s, connectivity_class: fresh|stale|expired|unknown, state: + online|captive|no-internet|connecting|disconnected|airplane|wired|degraded}=. +- =probe=: ={v, result: online|captive|no-internet|unknown, portal_url, http_code, + redirect_host, elapsed_ms, ts}=. +- =list=: ={v, connections: [{uuid, name, type, active, last_used, signal, + in_range, security}]}=. +- =diagnose=: ={v, steps: [<diagnostic step, see contract>], overall: + ok|warn|fail}=. +- =speedtest=: ={v, down_mbps, up_mbps, ping_ms, server, elapsed_ms, ts}=. +- error envelope (any command): ={v, error: {code, message, detail, partial: + bool}}= with a non-zero exit. + +* Diagnostics contract + +=net diagnose --json= returns an ordered list of steps. Each step is the unit the +panel renders and the log records: + +- =id= — stable identifier (e.g. =link=, =dhcp=, =gateway-ping=, =dns-config=, + =dns-resolve=, =http-probe=, =portal=). +- =status= — =pending | running | pass | warn | fail | skipped=. +- =title= — short human label. +- =evidence= — redacted detail (the value seen), per the redaction rules. +- =elapsed_ms=. +- =safety= — =read-only= or =mutating= (diagnose steps are all read-only). +- =next_action= — what the user/agent should do on warn/fail (e.g. "open portal", + "reset connection", "switch network"). + +Repair actions (=net repair=) carry the same shape but =safety: mutating=, plus a +=cleanup_verified: bool= field (e.g. the DNS override was reverted) and a +terminal =cleanup-unverified= status when revert can't be confirmed. + +** Diagnose vs Repair (read-only vs mutating) +The panel separates them visually and behaviorally: +- *Diagnose* — probe, gateway ping, DNS config read, captive check. No state + change, no sudo, runnable freely. +- *Repair* — reset (fresh MAC, deletes+recreates the NM profile), DNS override + test (mutates resolver, auto-reverts), portal force. Each needs an explicit + confirm, shows that it's privacy/state-changing, and verifies cleanup. A + Repair whose cleanup can't be verified ends in a visible =cleanup-unverified= + state, never a silent success. + +* Failure states, messages, recovery + +Each row below gives the *exact, final* user-facing string (not a template) with +=<placeholders>= for redacted evidence, plus the evidence field included and the +next action. The string is canonical: every surface renders the same text, so +there's one source of truth. + +Per-surface rendering of the canonical string: +- *Indicator* — the matching glyph + CSS class; the string is the tooltip + (untruncated). +- *Notification* (=notify=) — title = the failure label, body = the string. +- *CLI* — the string on stderr; =--json= puts it in =error.message= with the + evidence in =error.detail= and a stable =error.code=. +- *Panel* — the string as the section banner, with the diagnostic step's evidence + shown beneath. +Evidence is always redacted per the redaction rules (SSID/host shown; PSK/EAP/ +portal tokens never). + +- *associated, no DHCP* — "Connected to <SSID>, no IP (DHCP failed)" → + evidence: SSID, iface → reset / reconnect. +- *no-internet* — "On <SSID>, no internet (gateway reachable, no route out)" → + diagnose / switch network. +- *captive* — "Captive portal at <host> — login required" → Open portal. +- *DNS hijack* — "DNS is being redirected (portal)" → Open portal. +- *DNS broken* — "DNS not resolving (hotel DNS down); 1.1.1.1 works" → use + override / report. +- *HTTP intercepted* — "Traffic is being intercepted before it leaves" → Open + portal. +- *sudo declined* — "Reset needs admin; it was declined — nothing changed" → + retry with auth. +- *command timed out* — "<op> timed out; the system was left unchanged" → retry. +- *partial mutation* — "<op> partially applied: <what>; rolled back to <state>" + → review. +- *missing speedtest-go* — "speedtest-go not installed" → install hint. +- *no wifi hardware* (desktop) — wifi rows hidden; ethernet-only view. +- *wifi rfkill-blocked* — "WiFi is blocked (rfkill)" → unblock. The indicator + detects a soft-blocked radio (=rfkill list= shows the radio off though hardware + is present) and shows this distinct from disconnected. =net repair rfkill= (and + =net doctor --fix= as its first step) runs =rfkill unblock wifi= + =nmcli radio + wifi on= and reconnects. This is the framework-laptop case: an out-of-power + shutdown sometimes leaves wifi soft-blocked at next boot, and yes — the module + recovers it (the rfkill state is the indicator; the rfkill repair / doctor is + the one-step fix). A *hard* block (physical switch) is reported as + not-recoverable-in-software with that message. +- *wifi rfkill hard-blocked* — "WiFi is blocked by the hardware switch" → + evidence: rfkill hard state → flip the physical switch. +- *wrong password / missing secret* — "Saved password for <SSID> was rejected" → + evidence: SSID, NM auth-failure reason → re-enter the password. +- *enterprise auth/cert failure* — "Enterprise login failed for <SSID> (802.1X)" + → evidence: SSID, EAP failure reason → edit the profile in nmtui/nmcli. +- *upstream / AP / provider* — "On <SSID>, link is fine but the network has no + uplink" → evidence: gateway reachable, no route out, not a portal → switch + network or contact the venue. +- *VPN-routed* — "Connected; internet is routed through a VPN (<dev>)" → + evidence: default route on a tun/wg device or non-NM DNS owner → check the VPN, + not WiFi. +- *HTTP interception, no parseable portal URL* — "A portal is intercepting + traffic but didn't give a login link" → evidence: HTTP code, redirect host → + opens neverssl + the gateway page to log in manually. +- *DNS override cleanup unverified* — "Couldn't confirm DNS was restored after the + test" → evidence: iface, attempted revert → revert DNS manually + (=resolvectl revert <iface>=). + +Each message names whether the system was left unchanged, partially changed (with +what), or fully changed, so the user knows the residue. + +* Doctor: escalation, classification, terminal states + +=net doctor= diagnoses, classifies the failure, then (with =--fix=) applies the +*lightest* repair that fits and re-checks — it never loops destructive repairs +against a failure they can't fix. Each failure resolves to one of four outcomes, +and the doctor stops at any terminal one: + +- =fixable= — a local repair should help. Escalate lightest-first: rfkill-unblock + → reset (fresh MAC) → bounce (full stack) → portal, re-probing after each, and + stop as soon as the probe returns online. +- =needs-user-action= (terminal) — no reset/bounce will help; doctor stops and + names the exact next step. Covers: wrong WPA password / missing NM secret + (enter the password), locked keyring or polkit denial (retry with auth), + enterprise 802.1X cert/identity failure (edit the profile in =nmtui=/=nmcli=), + captive portal login-required (open the portal + accept terms). Doctor must not + delete/recreate the profile against these — that loses the saved password and + makes things worse. +- =upstream-not-local= (terminal) — the local link is up but the problem is past + it: AP has no uplink, gateway down/dropping traffic, DHCP server broken, ISP + outage, portal backend failing. =diagnose= proves it (link up + IP + gateway + reachable, but no route out and not a captive redirect), and =doctor --fix= + stops after local repairs are exhausted with "local repairs tried; likely + upstream/AP/provider" + the evidence. Next action: switch networks or contact + the venue. +- =deferred/vpn= (terminal for v1) — an active VPN / policy route / non-NM + resolver owns the default route or DNS, so "no internet" may be the VPN's fault, + not WiFi's. v1 *detects* this (default route on a =tun/wg= device, or DNS owned + by something other than the NM link) and classifies it separately — "link is + fine; internet is VPN-routed" — rather than misclassifying it as a WiFi failure. + v1 does not repair it (VPN management is Phase 5); it names the VPN as the likely + owner and stops. + +** DNS handling in doctor (explicit per class) +- *Captive DNS hijack* — open the portal (the hijack clears on login). No DNS + mutation. +- *Broken resolver, 1.1.1.1 works* — doctor offers an explicit *temporary* 1.1.1.1 + override as a repair with cleanup verification (auto-revert, =cleanup_verified=); + without =--fix= it only recommends the command. It does not leave a permanent + resolver change. +- *Port-53 / egress blocked* (even 1.1.1.1 fails) — terminal =upstream-not-local=; + doctor stops, since it's not locally fixable. + +* Failure-mode coverage + +For each common field failure: does =net diagnose= detect it, can =net doctor +--fix= repair it, and what terminal user action remains when it can't. (The +=needs-user-action= / =upstream-not-local= / =deferred/vpn= outcomes are defined +above.) + +| Failure mode | diagnose detects | doctor --fix | terminal user action | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| rfkill soft block | yes | yes (unblock) | none | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| rfkill hard block | yes | no | flip the physical switch | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| no wifi hardware | yes | n/a | use ethernet | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| associated, no DHCP | yes | yes (reset/bounce) | none, else switch network | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| gateway unreachable | yes | yes (bounce) | switch network if it | +| | | | persists | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| captive DNS hijack | yes | opens portal | log in at the portal | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| broken DNS, 1.1.1.1 works | yes | yes (temp override, | report the venue's DNS | +| | | auto-reverted) | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| HTTP captive portal | yes | opens portal | log in at the portal | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| HTTP interception, no | yes | opens neverssl + gateway | log in manually | +| parseable URL | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| upstream / AP outage | yes (link up, no route out) | no (stops after local) | switch network / contact | +| | | | venue | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| wrong WPA password / | yes | no | enter the password | +| missing secret | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| enterprise auth / cert | yes | no | edit the profile in | +| failure | | | nmtui/nmcli | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| duplicate SSID / | yes (UUID-keyed) | yes (activate by UUID) | none | +| connection-name | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| hidden SSID | yes | yes (connect by name) | enter SSID + password | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| multiple active links | yes | n/a | pick the interface | +| (wifi+tether) | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| wedged NetworkManager | yes | yes (bounce → restart NM) | none, else reboot | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| slow / hung command | yes (degraded) | retries within budget | retry | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| stale / corrupt cache | yes | self-heals (atomic + | none | +| | | invalidation) | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| DNS cleanup failure | yes | flags cleanup-unverified | revert DNS manually | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| missing speedtest backend | yes | n/a | install speedtest-go | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| VPN / policy-routing | yes (route/DNS ownership) | no (deferred to Phase 5) | check the VPN | +| interference | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| + +* Observability — logging + redaction + +- *Event log*: JSONL at =$XDG_STATE_HOME/net/events.jsonl= (fallback + =~/.local/state/net/events.jsonl=), size-rotated (e.g. 1 MB × 3). Every + mutating op and probe appends an event: =ts, op, argv (redacted), exit_code, + stderr_tail, elapsed_ms, iface, nm_uuid, probe_url_class, http_code, + redirect_host, cache_event=. +- *Redaction (always on)*: PSKs, EAP identities/passwords, NM secrets, and + portal query tokens are never logged. MAC addresses, full IPs, and SSID are + redacted when configured (=redact_mac=, =redact_ip=, =redact_ssid= in config). +- *Post-failure diagnosis*: =net doctor --json= is the snapshot + recommendation + (diagnose plus the suggested repair), =net diagnose --json= the raw report, and + the event log the history. =net doctor= is the console-recoverable entry point + (reachable as =make online= / =make net-doctor=). +- *Secret-leak tests*: assert no PSK/EAP/portal-token ever appears in any JSON + output, log line, or error message. + +* Indicator (task #C — Phase 1, the fast win) + +** States (internet sub-state on top of link state) +- online — associated and the probe returned 204. Normal icon. +- captive — associated, probe hit a portal. Distinct glyph + warning CSS class; + tooltip names the portal host; left-click opens diagnostics with the portal + ready to open (Phase 2+; see interactions for the Phase-1 interim). +- no-internet — associated, probe failed (no portal, no 204). Distinct glyph + + warning class. +- degraded — =net status= couldn't read link state within budget (slow/failed + nmcli). Neutral glyph, =net-degraded= class. Never blocks the bar. +- rfkill-blocked — the radio is soft-blocked (=rfkill=), distinct from + disconnected. Distinct glyph; the fix is =net repair rfkill= / =net doctor=. +- connecting / disconnected / airplane / wired — as today, plus wired shown + correctly even when it appears after session start. (airplane is now this + module's state, absorbed from the retired airplane module.) + +** Glyphs +Nerd-font codepoints, final values verified live before merge (same discipline +as wtimer). Reuse the signal-strength ramp already in =waybar-netspeed=; add a +captive / no-internet / degraded overlay glyph. + +** Tooltip +SSID + signal + IPv4 + gateway + the throughput readout (absorbed from +netspeed) + the last probe result and its age (stale/expired hinted). + +** Interactions (phase-aware; no keyboard-modifier clicks — waybar can't qualify +clicks by modifier, so the rich actions live in the panel, not ctrl/super-click) +Clicks never block the bar: each dispatches a detached background job and reports +via =notify=, single-flight per action. +- *Phase 1 (no panel yet)*: left-click runs =net probe= + notify (refreshes the + state on demand) and keeps the existing =pypr toggle network= scratchpad as the + interim manager; right-click runs =net repair reset= in the background + + notify; middle-click runs =net portal=. +- *Phase 2+ (panel exists)*: left-click opens the panel (focused on the relevant + section — diagnostics when captive); right/middle keep the background + reset/portal shortcuts. + +* Panel (tasks #B + #C diagnostics — Phases 2-3) + +GTK4 + gtk4-layer-shell, pocketbook scaffold (src-layout package, unittest, +Makefile, gtk4-layer-shell anchored dropdown under the bar). One panel shell, +reused by the future desktop-settings panel. + +Sections: +1. *Connections* — list, MRU-first, active marked, live signal bars for in-range + wifi; row click switches; buttons for add / edit / remove; a rescan control. +2. *Diagnose* (read-only) — Probe (204/captive, shows portal URL + Open), Gateway + ping, DNS config. Streaming step output (the diagnostics contract). +3. *Repair* (mutating, confirmed) — tiered lightest-first: Unblock rfkill, Reset + (fresh MAC), Bounce (full stack), DNS override test, Force portal. A "Get me + online" button runs =net doctor --fix= (the auto-escalating sequence). +4. *Speed test* — Run button, progress, down/up/ping result + last-run line. + +** Panel state, cancellation, permissions +State machines for: connection-list loading, rescan-in-progress, +activation-in-progress, diagnose-running, repair-running, speedtest-running. Plus +the real terminal states on this two-machine fleet: no-wifi-hardware (desktop → +ethernet-only view) and missing speedtest-go. (No GPG-key state — there's no +credential store; secrets live in NM.) ("No NetworkManager" is not a modeled +state — NM is always present +on these machines; if nmcli is somehow absent the panel shows a single hard-error +and exits.) Long operations show elapsed time and are cancellable where the +underlying op allows (rescan, speedtest, probe); clearly non-cancellable ones +(an in-flight activation) show elapsed + a disabled control. Permission-denied +(sudo/polkit declined) is a first-class outcome with the "nothing changed" +message, never a silent failure. + +Interaction-pattern catalog (=~/code/rulesets/patterns/=) principles that apply: +- transient-state-buttons — all the network levers in one place, reachable by + one chord (the bar click), state visible. +- default-most-common-friction-proportional — connections MRU-ordered so the + common pick is first; destructive ops (remove) and privacy-changing ones + (reset, override) get a confirm, switching does not. +- one-prompt-picker-typed-prefix — if the connection picker ever goes + keyboard-driven, kind (wifi/eth/saved/in-range) + name in one typed picker. + +** Panel UX flow (settle before Phase 2) +The concrete interaction defaults, so the GTK build isn't inventing them: +- *Default focus*: the Connections section, current connection's row selected. If + the indicator opened the panel because of a captive/no-internet state, focus + Diagnose instead with the relevant action highlighted. +- *Row content*: glyph (signal bars / wired / active check) + name + a secondary + line (security type, "active"/last-used). The active row is visually pinned at + top of its group. +- *Buttons*: one *primary* per section (Connections: Connect to the selected row; + Diagnose: Run diagnose; Repair: "Get me online"; Speed test: Run). Secondary + actions (add / edit / remove / rescan; individual repair tiers) are smaller and + grouped. +- *Disabled rules*: Connect disabled on the already-active row; Repair tiers + disabled while one runs; Speed test disabled while running; add/edit disabled + for enterprise (with the "edit in nmtui/nmcli" hint). +- *Confirmations* (exact wording): Reset → "Reset <SSID>? This drops the + connection and reconnects with a new MAC."; Bounce → "Restart networking? All + links drop briefly."; DNS override → "Temporarily set DNS to 1.1.1.1 for the + test? It reverts automatically."; Remove → "Forget <SSID>? The saved password is + deleted." +- *"Get me online" reporting*: shows each escalation step live (Unblock rfkill → + Reset → Bounce → Portal) with per-step pass/fail and stops at the first that + restores internet or at a terminal state, naming the next action. +- *After close*: the bar reflects the new state immediately (signal/refresh on + next poll); a running speedtest/diagnose keeps running and notifies on finish + (panel close doesn't cancel it). +- *Keyboard*: Esc closes; Tab moves between sections; arrows move rows; Enter + fires the section primary; the connection list is type-to-filter. + +* Connection management (nmcli) + +- Every op via nmcli per the nmcli contract above (terse, escaped, UUID-keyed, + bounded =--wait=). +- MRU ordering from NM's =connection.timestamp= (last activated), descending. +- Ethernet appears in the list whenever a wired device is present, selectable at + any time; switching just brings the chosen connection up. +- *Mutation safety + rollback*: switching keeps the current connection up until + the new one activates successfully (=nmcli --wait 30=); on failure it does not + tear down the working link, surfaces the failure, and leaves the prior + connection active. =net down= notes that NM may auto-reactivate a profile and + reports the post-op active connection so the user isn't surprised. A switch that + needs a password it doesn't have prompts (or fails with "password required"), + never silently strands. The exact NM command sequence (preflight active-state + read → activate target → verify default route → on failure, confirm prior + still up) is pinned in the engine and tested against fake nmcli. +- *Add/edit scope*: open + WPA-PSK only in v1. Existing saved profiles of any + type (including enterprise) can be *activated*; editing an enterprise profile + shows "edit via nmtui/nmcli" rather than a broken partial form. + +* Connection secrets (no separate store) + +Per Craig's call: don't build a parallel credential store. Settings and secrets +live where NetworkManager already keeps them, so there's one source of truth and +no extra dependency (no GPG, no gpg-agent, no =~/.config/net/connections=). + +- *Where secrets live*: =/etc/NetworkManager/system-connections/<name>.nmconnection=, + root-owned =0600=, with the PSK/EAP secret stored inline (the default + =secret-flags=0= "owned by NM"). That's already secure-at-rest (root-only) and + is what =nmcli= reads/writes. +- *How we touch them*: every add/edit/remove goes through =nmcli= (=connection add + / modify / delete=), which writes the =.nmconnection= with the right ownership + and perms. We never read or write =system-connections= files directly (root) and + never copy a secret out of them. +- *No export / import / sync* — there's nothing to sync. A new machine gets its + connections the way it always has (the user joins, or restores NM profiles), + not from a tool-specific vault. +- *config file*: =~/.config/net/config= still exists, but only for non-secret + preferences (speedtest server, redaction flags, probe TTL). It holds no + credentials. +- *No secret leakage*: PSK/EAP never appear in =net=' =--json= output, the event + log, or error text (tested) — even though NM is the store, our surfaces must not + echo a secret =nmcli= happens to return. + +* Speed test + +- Backend: *=speedtest-go=* (=--json=, =--server=, =--no-download/--no-upload=), + already installed on velox (AUR =speedtest-go-bin=). No new dependency for v1. + librespeed-cli is the documented fallback for a self-hosted LibreSpeed server. +- =net speedtest --json= parses speedtest-go's JSON into the =speedtest= schema. +- *Server policy*: auto-select nearest by default; allow a pinned server id in + =~/.config/net/config=. +- *Timeout + cancellation*: a hard run timeout (e.g. 60s); the panel run is + cancellable (kills the child). Offline / rate-limited / no-server errors map to + the failure-message table. +- *Tests*: fixture JSON (success) and fixture stderr (offline, no server, + malformed output) drive =net speedtest= parsing without touching the network. + +* Help + documentation + +In-app help has three layers, each reachable in the situation it's needed: + +- *CLI help (works from a dead-GUI TTY)*: =net --help= lists the subcommands in + one screen; =net <cmd> --help= documents each (flags, what it mutates, the + console-recovery targets). The Makefile targets are self-describing (=make help= + lists =online= / =net-doctor= / etc. with one-line descriptions). This is the + layer that matters most when you're at a console with no network. +- *Panel help (in the GUI)*: a small =?= affordance in the panel header opens an + inline help pane — what each section does, which Repair actions mutate state, + what the indicator glyphs/colors mean. Per-control tooltips on the less-obvious + buttons (rfkill, bounce, DNS override). No external help browser. +- *User guide (the durable doc)*: a README / docs page covering every command, + the indicator states + glyphs, the panel sections, the config file keys, the + recovery make targets, troubleshooting (the failure-message table), and + rollback. Written so a future session — or Craig six months out — can operate + and recover the module from the doc alone. + +The failure-message table above is the single source of truth for the +troubleshooting text; the guide and the panel help both render from it rather +than restating it. + +* Enhancement radar + +Low-cost adjacent affordances, each dispositioned so cheap wins aren't lost and +the v1 panel stays focused. (Several are already in v1 by virtue of other +sections; marked here so the consideration is visible.) + +| Enhancement | Disposition | Reason | +|-------------------------------------+-------------+--------------------------------------------------------| +| Open / copy portal URL | v1 | already in the captive flow; trivial Open + Copy | +|-------------------------------------+-------------+--------------------------------------------------------| +| Forget network | v1 | it's the remove op, already specced | +|-------------------------------------+-------------+--------------------------------------------------------| +| Rescan now | v1 | already a Connections control | +|-------------------------------------+-------------+--------------------------------------------------------| +| Retry with hardware MAC | v1 | captive already has --hardware-mac; expose in Repair | +|-------------------------------------+-------------+--------------------------------------------------------| +| Pin speedtest server | v1 | already a config key | +|-------------------------------------+-------------+--------------------------------------------------------| +| Copy redacted doctor report | v1 | cheap, serves the observability/support goal | +|-------------------------------------+-------------+--------------------------------------------------------| +| Show last good network / result | vNext | needs small history persistence | +|-------------------------------------+-------------+--------------------------------------------------------| +| Watch mode for net doctor | vNext | a --watch loop; handy at a TTY, not v1-critical | +|-------------------------------------+-------------+--------------------------------------------------------| +| Actionable desktop notifications | vNext | dunst supports actions; extra wiring | +|-------------------------------------+-------------+--------------------------------------------------------| +| Keyboard connection picker (fuzzel) | vNext | the typed-prefix pattern; panel covers v1 | +|-------------------------------------+-------------+--------------------------------------------------------| +| QR-code share / import WiFi | rejected | low value for a personal 2-machine setup; phones do QR | +|-------------------------------------+-------------+--------------------------------------------------------| + +* Waybar wiring + +- Replace =custom/netspeed= with =custom/net= in the bar's module list (same + slot). +- Module def: =exec: waybar-net=, =return-type: json=, =interval: 2=, a =signal= + for on-demand refresh (next free signal after wtimer's 14), =on-click=, + =on-click-right=, =on-click-middle= per the phase-aware interactions (each + dispatches a detached job, never blocks). +- Remove the old =on-click: pypr toggle network= scratchpad only once the panel + replaces it (Phase 2); Phase 1 keeps it as the interim manager. + +* Testing plan (TDD) + +- *Engine (normal)* — fake =nmcli= + =curl= + =speedtest-go= on PATH; assert + command sequences and parsed/emitted JSON for status, list, up/down, + add/edit/remove, probe, diagnose, repair, speedtest. Pure state/format + functions tested directly. JSON schemas locked by example. +- *Portal parser* — already covered in =tests/captive= (Normal/Boundary/Error + + the real SONIFI body). The engine's native probe reuses the same cases. +- *nmcli parsing* — escaped colon/backslash/newline in SSID, duplicate names, + hidden SSID, non-ASCII, wired-mid-session, multi-active (wifi+tether). +- *Failure + concurrency (the risky classes)* — slow/hung nmcli/curl/speedtest + (degraded state within budget), concurrent =net status= probe refresh + (single-flight), corrupt cache (recovered), stale cache after SSID change + (invalidated), permission denied / sudo declined, DNS-override cleanup failure + (=cleanup-unverified=), NM partial activation (rollback keeps prior link), + secret redaction, missing speedtest-go, no wifi hardware, rfkill soft/hard + block. +- *Doctor classification* — fixture-driven =net doctor= over fake nmcli/curl + asserting the right terminal classification + that =--fix= stops before + destructive repairs: auth failures (=needs-user-action=), upstream/AP failure + (=upstream-not-local=), VPN-routed failure (=deferred/vpn=), and the DNS classes + (hijack → portal, broken-but-1.1.1.1-works → offered override, egress-blocked → + upstream). Assert the failure-mode coverage table's "detects / repairs / terminal + action" holds for each row. +- *Indicator* — drive =net status --json= through =waybar-net=, assert the JSON + per state (online / captive / no-internet / degraded / wired / disconnected / + rfkill), iface override via env. +- *Panel* — pocketbook-style: backing logic (list ordering, op dispatch, + state-machine transitions), not GTK widgets. +- *NM secrets / no-leak* — add/edit writes the secret into NM via nmcli (asserted + against fake nmcli, never to a tool-owned file); assert no PSK/EAP appears in any + =--json=, log line, or error (there is no credential store to round-trip). +- *Live checklist (gated out of the suite)* — a "Manual testing and validation" + task per phase for the real-network states (captive at a hotel, no-internet, + switch under load, reset, speedtest) that can't be faked. + +** Harness + coverage gate +The concrete contract, matching the repo's existing convention (not pytest — the +dotfiles suites are =unittest=, run by =make test= as =python3 -m unittest= over +=tests/*/test_*.py=; 33 suites today): +- *Framework*: =unittest=. Each suite is =tests/<name>/test_<name>.py= + (=tests/net/=, =tests/waybar-net/=), collected by the existing =make test= loop + — no new runner, no pytest dependency. +- *Fakes on a temp PATH*: =fake-nmcli=, =fake-curl=, =fake-speedtest-go=, + =fake-rfkill=, =fake-resolvectl= live as executable stubs in =tests/<name>/= + (the =tests/layout-navigate/fake-hyprctl= pattern). A fixture file encodes the + command→canned-output map and the stub appends each invocation to a log the test + asserts against. Subprocess timeouts are simulated by a stub that sleeps past the + budget; =net status= must still return the degraded state. +- *Waybar wrappers end-to-end*: =waybar-net= is run as a subprocess with the fake + PATH and the env overrides (iface, cache path), asserting the emitted JSON — same + as =tests/waybar-netspeed=. +- *Coverage*: coverage.py is absent system-wide (and not importable), so coverage + runs in a throwaway venv (=python3 -m venv=, =pip install coverage=, =coverage + run -m unittest=, =coverage report=) — the method the wtimer suite used (95%). + Target: *branch* coverage over =net/= and the wrapper, ≥ 90% on the pure + classifier/parser modules. + +** Coverage as a gap-finder, not a number (per phase) +Line coverage alone misses the branches that matter here, so each phase ends with +a *coverage-gap pass*, not just a percentage: +- After the first green run, read the branch report and map every uncovered branch + to either a new test or a consciously-excluded live-only behavior (with a comment + or a Manual-testing entry naming it). +- *Branch coverage is required* for the pure logic: the doctor classifier (every + outcome — fixable / needs-user-action / upstream-not-local / deferred-vpn), the + cleanup-unverified path, the redaction paths, the degraded hot-path fallback, the + timeout branches, and the portal/nmcli parsers. +- A phase isn't "done" until its coverage-gap pass is recorded — uncovered logic is + either tested or explicitly excused, never silently uncovered. + +* Files touched (planned, all in =~/.dotfiles=) + +- =net/= package (src-layout, like pocketbook) — engine + panel. +- =hyprland/.local/bin/waybar-net= — the indicator (replaces =waybar-netspeed=). +- =hyprland/.local/bin/net= — engine CLI entry (console-script shim). +- =hyprland/.config/waybar/config= — swap =custom/netspeed= → =custom/net=; + remove =custom/airplane=. +- =hyprland/.config/waybar/style.css= — captive / no-internet / degraded / + rfkill classes; remove airplane classes. +- =tests/net/=, =tests/waybar-net/= — suites. +- =captive= — refactor: extract probe + reset into functions callable + non-interactively (a =--json= probe mode) so the engine reuses them. +- =~/.config/net/config= — seed config (probe TTL, speedtest server, redaction + flags). No secrets; not a credential store. +- dotfiles =Makefile= — add the console-recovery targets (=online=, =net-doctor=, + =net-status=, =net-diagnose=, =net-portal=, =net-reset=, =net-bounce=). +- *Deletions once net ships* (the airplane module is absorbed): + =hyprland/.local/bin/waybar-airplane=, =hyprland/.local/bin/airplane-mode=, + =tests/waybar-airplane/=, =tests/airplane-mode/=, and the =custom/airplane= + module + its css. +- archsetup Hyprland step — add =gtk4-layer-shell=, =python-gobject=, + =speedtest-go-bin= to the install lists (the only archsetup change; no =gpg= + added, secrets stay in NM's store). + +* Resolved decisions (Craig's calls + this response) + +1. Panel UI tech → GTK4 + gtk4-layer-shell, shared pocketbook scaffold (one + panel shell, reused by the desktop-settings sibling). +2. Engine language → Python =net= package; shells out to =captive= for the + portal-force flow, native cheap probe for the bar path. +3. Connectivity probe → split cadence (fast link poll every 2s + slow cached + internet/captive probe, TTL ~45s) with single-flight + atomic cache. +4. No keyboard-modifier clicks (waybar can't qualify them) — the panel hosts the + rich actions; bar clicks dispatch detached jobs (phase-aware). +5. No separate credential store (Craig's call, cj). Secrets live in NM's own + =system-connections= (root =0600=, inline), touched via nmcli. No GPG, no + gpg-agent, no =~/.config/net/connections=. Supersedes the earlier GPG-store + design. +6. =custom/netspeed= absorbed into =custom/net=; throughput moves to the tooltip. +7. Speed-test backend → =speedtest-go= (already installed), not a new + librespeed-cli dependency; librespeed-cli is the self-hosted fallback. +8. Code lives in the dotfiles repo; archsetup only installs deps. +9. v1 add/edit scope = open + WPA-PSK; enterprise/802.1X is activate-only, + add/edit is vNext (settled by Craig 2026-06-29 — no enterprise networks in his + history, so the form would be unused UI). +10. =net doctor= is in v1 (Craig's call, cj) — a one-shot diagnose+fix mode, + reachable from a TTY via =make online= / =make net-doctor=. (The earlier + "defer the doctor/bundle command" decision is reversed.) +11. Diagnose (read-only) and Repair (mutating, confirmed) are separated in the + panel and the CLI; Repair is tiered lightest-first (rfkill → reset → bounce). +12. =custom/net= absorbs the airplane module (Craig's call, cj). *As built + (2026-06-29, option 1): display-only.* net shows the airplane state (reads + the airplane-mode state file); the =airplane-mode= low-power toggle is kept + (radios + CPU + brightness + services is not a network concern) and moved to + =custom/net='s right-click + signal 15. Only the redundant display pieces — + =waybar-airplane=, =custom/airplane=, and the retired =waybar-netspeed= — + plus their tests/css were deleted. The earlier "delete airplane-mode" framing + is superseded. +13. Repair includes a full-stack bounce and an rfkill-unblock (Craig's calls, + cj) — the latter recovers the framework-laptop post-power-loss soft-block. +14. VPN / WireGuard is a planned Phase 5 (Craig's call, cj), not a permanent + exclusion. + +* Implementation phases + +- *Phase 1 — Indicator + console recovery (task #C).* =net status= + =net probe= + (native cheap probe, reusing captive's logic) + the =captive= probe refactor + + =waybar-net= + the split-cadence cache (single-flight, atomic, stale classes) + + CSS states (incl. rfkill) + performance budget. Plus the CLI-only recovery path: + =net repair= tiers (rfkill / reset / bounce), =net doctor [--fix]=, and the + Makefile targets (=make online= etc.) — all testable without the GTK panel. + Absorbs the airplane state and removes the standalone airplane module. Interim + left-click keeps the existing scratchpad until the panel lands. + - *Acceptance*: fresh-login waybar smoke test shows correct state on + online/captive/no-internet/wired/rfkill; =net status= stays within budget + under a fake slow nmcli (degraded state); =net doctor --fix= recovers a + soft-blocked radio from a TTY; the live captive checklist passes at a real + portal; the airplane state works and the old airplane module is gone; + reverting = swap =custom/netspeed= + =custom/airplane= back. +- *Phase 2 — Panel shell + connection management (task #B core).* GTK4 + layer-shell scaffold + =net list/up/down/add/edit/remove/rescan= + MRU list + + mutation safety/rollback + panel state machines. + - *Acceptance*: switch wifi↔wifi and ethernet↔wifi without stranding; a failed + switch leaves the prior link up; add/edit open + WPA-PSK writes the secret to + NM; remove confirms; panel states render for loading/rescan/activation. +- *Phase 3 — Diagnostics + speed test in the panel.* Wire =net diagnose= / + =net repair= / =net doctor= / =net portal= / =net speedtest= into the Diagnose + vs Repair sections; the "Get me online" button; portal Open button; speedtest + progress + cancel. + - *Acceptance*: diagnose runs read-only; each repair tier confirms + verifies + cleanup (DNS override reverts, shown); speedtest result parses from + speedtest-go and a fixture-driven failure shows the right message. +- *Phase 4 — Docs + rollout.* In-app help (=net --help= / per-command help, the + panel help affordance), README/user-guide (commands, panel, config, + troubleshooting, the make targets, rollback), and the manual dep step on ratio. + - *Acceptance*: =net --help= and each subcommand's help are complete; the + user-guide covers every command + the recovery targets; ratio rollout + documented. +- *Phase 5 — VPN / WireGuard (future).* Fold the existing archsetup wireguard + tooling into the same panel + CLI (=net vpn ...=). Out of the v1 milestone; + specced separately when picked up. + +* Open items / risks + +- gtk4-layer-shell dropdown anchoring under a waybar module needs the same + positioning work pocketbook solved; reuse it. (Phase 2.) +- The =captive= refactor must keep the standalone CLI behavior identical while + exposing a non-interactive =--json= probe; covered by the existing + =tests/captive= suite plus new probe-mode tests. (Phase 1.) +- speedtest-go server selection variance (nearest-server flor) — pin a server in + config if results are noisy. (Phase 3.) +- The background-probe kick from =net status= must be truly non-blocking (spawn + + detach); enforced by the single-flight lock and the performance benchmark test. + +* Rollback + +Each phase is independent. The indicator (Phase 1) is a drop-in replacement for +=custom/netspeed= (and =custom/airplane=); reverting is swapping those modules +back in the config and restoring their scripts. The panel is additive — not +wiring its clicks leaves the bar working as before. No credential store to roll +back (secrets stay in NM throughout). + +* Review findings [31/31] + +** DONE Define the structured diagnostics contract :blocking: +The spec says the engine "emits JSON" and that diagnostics "reuse =captive= +verbatim", but the current =~/.dotfiles/common/.local/bin/captive= flow is a +human-readable bash script that mixes diagnostics, sudo prompts, DNS mutation, +browser launch, and terminal prose. A GTK panel cannot reliably turn that into +clear state, progress, cancellation, or useful error messages. Define the +machine contract before implementation: every diagnostic step should have a +stable id, status (=pending/running/pass/warn/fail/skipped=), redacted evidence, +elapsed time, safety outcome, and next action. Keep =captive= as the interactive +CLI, but either refactor reusable probe/reset functions behind =net diagnose +--json= or make =captive= expose a non-interactive JSON mode. This blocks the +panel and logging work because otherwise the implementer must invent the +boundary. + +Disposition: accept — added the "Diagnostics contract" section (per-step id / +status / evidence / elapsed / safety / next_action) and the =captive= =--json= +probe-mode refactor under Architecture + Files touched. + +** DONE Specify user-facing failure messages and recovery actions :blocking: +The spec names failure states like =no-internet=, =captive=, failed probe, +failed reset, missing DNS, and missing speed-test backend, but it does not define +the messages the user sees or what each message tells them to do next. For this +feature, "error" is not enough: a user needs to know whether WiFi is associated, +whether DHCP succeeded, whether DNS is hijacked/broken, whether HTTP is +intercepted, whether sudo was declined, whether a command timed out, and whether +the system was left unchanged or partially changed. Add a message table for the +indicator, panel, and CLI with: failure class, visible text, evidence included, +redaction rule, and next action. This is blocking because UX quality here is the +product, not an implementation detail. + +Disposition: accept — added the "Failure states, messages, recovery" section +covering each class, the visible message, the "what changed" residue note, and +the next action across indicator/panel/CLI. + +** DONE Define the debug log and redacted support bundle :blocking: +There is no observability section. When this fails in a hotel or cafe, an agent +needs enough evidence to diagnose it without rerunning destructive actions. Add +log location, rotation/retention, JSONL event schema, command argv logging, +exit-code/stderr capture, elapsed time, selected iface, NM active connection +UUID, probe URL class, HTTP code, redirect host, DNS servers, and cache +read/write events. Also define a =net doctor --json= or =net debug-bundle= +command that emits redacted status, recent log events, dependency versions, and +a reproduction command. Redact SSID if configured, MAC addresses, portal query +tokens, PSKs, EAP identities/passwords, IPs when requested, and all GPG/NM +secrets. This blocks implementation readiness because post-failure diagnosis is +currently left to ad hoc terminal spelunking. + +Disposition: modify — accepted the JSONL event log, the schema, and the redaction +rules in full (new "Observability" section). Deferred the dedicated =net +debug-bundle= / =net doctor= command to vNext: for a single-user tool =net +diagnose --json= (the snapshot) plus the event log (the history) cover +post-failure diagnosis; a bundle command is gold-plating for v1. Recorded under +Out + Resolved decision 10. + +** DONE Pin the nmcli parsing and timeout contract :blocking: +The spec lists nmcli operations but not the exact fields, output modes, escaping +rules, ID semantics, or timeouts. This is risky because SSIDs and connection +names can contain spaces, colons, duplicates, hidden names, and non-ASCII; the +current =waybar-netspeed= already had an SSID parsing bug. The nmcli manual +documents =--terse=, =--get-values=, =--escape=, =--wait=, ID/UUID/path +selection, =passwd-file=, and built-in connectivity states +(=none/portal/limited/full/unknown=) at +https://man.archlinux.org/man/nmcli.1.en. The spec should require UUIDs for +saved-profile operations, explicit =--wait= budgets, parser tests for escaped +colons/backslashes/newlines/duplicate names/hidden SSIDs, and a decision on when +to use or ignore =nmcli networking connectivity [check]=. This is blocking +because the command wrapper is the core reliability boundary. + +Disposition: accept — added the "nmcli contract" section: terse + =--escape= + +=--get-values=, UUID-keyed ops, explicit =--wait= budgets, NM connectivity as a +cheap hint (our probe authoritative), and the parser test matrix. + +** DONE Define cache concurrency, atomicity, and stale-state behavior :blocking: +=net status= may spawn =net probe= whenever the cache is stale, but the spec +does not define locking, process coalescing, atomic writes, crash cleanup, or +what happens when the probe hangs. With a 2s Waybar interval, a bad network could +start overlapping probes, corrupt the runtime cache, or keep showing stale +"online" while the link is gone. Add a single-flight lock under +=$XDG_RUNTIME_DIR/waybar=, atomic write+rename for cache updates, max probe +runtime, stale age classes (fresh/stale/expired/unknown), cache invalidation on +iface/SSID/connection UUID change, and tests for concurrent =net status= calls. +This blocks the fast-path design because it is the main performance and +correctness risk. + +Disposition: accept — added "Concurrency, atomicity, staleness" under the +Connectivity model: flock single-flight, temp+rename atomic write, ≤6s probe +timeout, fresh/stale/expired/unknown classes, iface/SSID/UUID invalidation, stale +lock reclaim, plus concurrency tests in the test plan. + +** DONE Bound hot-path performance with measured budgets :blocking: +The spec says the cheap poll should be sub-100ms, but the proposed fast path +still may call multiple =nmcli= commands every two seconds, read sysfs, parse +throughput, and maybe spawn a background probe. The existing =waybar-netspeed= +had a deliberate sleep for throughput sampling; replacing it must define how +throughput is sampled without sleeping in the bar path. Add a per-command budget +for =waybar-net= and =net status=, a maximum number of subprocesses on the hot +path, a timeout for every subprocess, benchmark tests with fake slow =nmcli=, +and a rule that the indicator emits a degraded JSON state rather than blocking. +This is blocking because Waybar custom modules can visibly freeze or lag when +their exec path stalls. + +Disposition: accept — added the "Performance budgets" section: <100ms typical / +<250ms worst, throughput sampled across the poll interval (no in-process sleep), +one nmcli call max on the hot path, timeouts on every subprocess, the degraded +state, and a fake-slow-nmcli benchmark test. + +** DONE Make click actions non-blocking and visible :blocking: +Waybar right-click runs =net reset= and middle-click runs =net portal= directly. +Those operations can require sudo, open browsers, mutate DNS, delete/recreate NM +profiles, or hang on network commands, but Waybar click handlers provide no +panel, terminal, progress, or cancellation surface by default. Define whether +right/middle click instead opens the panel focused on the action, dispatches a +background job with notifications, or is removed from v1. If kept, specify +single-flight behavior, how sudo/polkit prompts surface, how success/failure is +reported, and how the user can inspect logs. This blocks UX readiness because +the fastest remediation path is currently the easiest place to hide failure. + +Disposition: modify — accepted the concern; made the interactions phase-aware and +non-blocking. Every click dispatches a detached, single-flight background job and +reports via =notify=; sudo surfaces through polkit/the normal prompt; failures go +to the notify + the event log. In Phase 1 (no panel) left-click runs probe + +notify and keeps the scratchpad; from Phase 2 left-click opens the panel focused +on the action. Recorded in the Indicator "Interactions" subsection. + +** DONE Specify connection mutation safety and rollback :blocking: +The spec says row click switches connections and remove gets a confirm, but it +does not define what happens when a switch partially succeeds, disconnects the +current working link, needs a password, loses the default route, or triggers +auto-activation. The nmcli manual warns that =connection down= does not prevent +future auto-activation and may internally block a profile until user action. +Define preflight, the exact NM command sequence, whether the old active +connection is kept until the new one proves usable, when rollback is attempted, +how long activation waits, and what the panel says when rollback fails. This is +blocking because the module can strand the user offline. + +Disposition: accept — added "Mutation safety + rollback" under Connection +management: keep the prior link up until the target activates (=--wait 30=), no +teardown on failure, password-required surfaced not stranded, =net down= reports +post-op active state + the auto-reactivation caveat, and the pinned NM command +sequence is tested against fake nmcli. + +** DONE Define the credential-store security model :blocking: +The GPG store is described as optional and default-unencrypted, but the spec does +not define file modes, schema, secret-source rules, import/export prompts, +recipient verification, stale secret handling, or what is logged. It also says +NM remains source of truth while the user-owned store contains PSK/EAP secrets, +which creates two truth sources for sensitive data. Add a precise schema, +=0600= file creation with parent-dir permissions, encrypted-recipient checks, +plaintext warning text, explicit opt-in flow, redaction requirements, behavior +when NM has a secret not in the store, behavior when the store has a secret NM +rejects, and tests for no secret leakage in JSON/logs/errors. This blocks Phase +4 and the full spec because otherwise the implementer must make security +decisions mid-code. + +Disposition: accept — rewrote "Credential storage" with the versioned schema, +=0600= file / =0700= dir, recipient verification on opt-in, the plaintext +warning, secret-source rule (entered/exported, never harvested from root store), +the two-source reconciliation policy (NM wins live, store wins for what NM +lacks, stale-secret flagging), and the no-leak tests. + +** DONE Define EAP, enterprise WiFi, and unsupported connection behavior :blocking: +The store says "PSK/EAP" and connection management says add/edit, but there is +no v1 contract for WPA-Enterprise fields, certificates, identity vs anonymous +identity, hidden networks, static IP, proxy settings, metered flags, MAC +randomization, or 802.1X prompt behavior. Either scope v1 to open/WPA-PSK plus +existing saved-profile activation, or define the minimum EAP form and the +unsupported-state messages. This blocks add/edit/import because enterprise WiFi +is too sensitive to hand-wave. + +Disposition: modify (scope) — scoped v1 to open + WPA-PSK add/edit, with +*activation* of any existing saved profile (including enterprise). Enterprise / +802.1X add/edit, static-IP, proxy, metered, and MAC-randomization editing are +vNext, shown as "edit via nmtui/nmcli". Recorded in Scope/Out, Connection +management, and Resolved decision 9. + +** DONE Split read-only diagnostics from mutating remediation :blocking: +The panel's diagnostics section includes probe, bounce/reset, gateway ping, and +DNS override test in one area, while =captive= currently performs resets and +temporary DNS changes as part of its flow. Users need to know which buttons are +read-only and which mutate NM profiles, MAC mode, DNS, or browser state. Add +separate "Diagnose" and "Repair" actions, confirmations for destructive or +privacy-changing operations, explicit cleanup verification for DNS override, and +a terminal state when cleanup is unverified. This blocks readiness because +network repair must not surprise the user or leave hidden residue. + +Disposition: accept — split the panel into a read-only Diagnose section and a +confirmed, mutating Repair section (and split the CLI into =net diagnose= vs =net +repair=). Added =cleanup_verified= + a terminal =cleanup-unverified= state to the +diagnostics contract. + +** DONE Define panel state, cancellation, and permissions UX :blocking: +The panel sections list buttons and a streaming output area, but not loading +states, disabled states, empty states, keyboard/focus behavior, cancellation, or +permission-denied handling. Add panel state machines for connection list loading, +rescan in progress, activation in progress, diagnostics running, speedtest +running, and no NetworkManager/no WiFi/no permissions/no GPG key/no +librespeed-cli. Each long operation should be cancellable where possible or +clearly non-cancellable with an elapsed-time display. This blocks the GTK work +because without it the implementer must invent the user flow. + +Disposition: modify — accepted the state-machine requirement (added "Panel state, +cancellation, permissions"), but scoped the state set to what can actually occur +on the two-machine fleet: dropped "no NetworkManager" as a modeled state (NM is +always present; a missing nmcli is a single hard-error exit) and kept +no-wifi-hardware, missing speedtest-go, no-GPG-key, plus the in-progress states +with elapsed-time + cancellation where the op allows. + +** DONE Verify speed-test dependency, server choice, and failure contract :blocking: +The spec chooses =librespeed-cli= and notes availability/default-server research +as an open risk, but Phase 3 still depends on parsing its JSON and showing +progress. I checked the upstream project page +(https://github.com/librespeed/speedtest-cli) and the AUR URL named by search is +not sufficient as a verified package/install contract in this spec. Add the +exact package name/source to install, command version expected, JSON shape, +server-selection policy, timeout, cancellation behavior, offline/rate-limited +messages, and tests with fixture JSON and fixture stderr. This blocks Phase 3 +because speed-test failure modes are otherwise undefined. + +Disposition: modify — verified live and changed the backend: =speedtest-go= (AUR +=speedtest-go-bin=, 1.x) is already installed on velox and supports =--json=, +=--server=, =--no-download/--no-upload=, so v1 needs no new dependency. +librespeed-cli (AUR =librespeed-cli= / =-bin=) is the documented self-hosted +fallback. Added the "Speed test" section with server policy, timeout, +cancellation, the failure-message mapping, and fixture-JSON/stderr tests. + +** DONE Define dependency installation and repo boundaries :blocking: +The files touched section alternates between archsetup paths and the external +dotfiles repo, while pocketbook has been folded into this repo and its previous +archsetup provisioning was intentionally removed. The spec should state where +the =net= package actually lives, which repository owns the scripts/tests, +whether =gtk4-layer-shell=, =python-gobject=, =librespeed-cli=, =gpg=, =nmcli=, +=curl=, and =resolvectl= are installed by archsetup or assumed present, and the +Makefile targets for test/lint/install. This blocks implementation because the +current path plan can produce code that is not installed on a fresh machine. + +Disposition: accept — added the "Repository + dependencies" section: all code in +=~/.dotfiles= (=net/= package in-tree like pocketbook, scripts in the hyprland +tier, tests under =tests/=), archsetup owns only the dep install +(=gtk4-layer-shell=, =python-gobject=, =speedtest-go-bin=; nmcli/curl/resolvectl +already present), Makefile =make test= collects the package suite, and a +daily-drivers note for ratio. Rewrote Files touched to match. + +** DONE Expand the test plan for failure, concurrency, and live verification :blocking: +The testing plan covers normal parsing and fake command sequences, but it misses +the riskiest behaviors: slow/hung =nmcli=/=curl=/=librespeed=, concurrent +=net status= cache refresh, corrupt cache, stale cache after SSID change, +permission denied, sudo declined, DNS override cleanup failure, NM partial +activation, duplicate connection names, secret redaction, missing optional +dependencies, no WiFi hardware, wired+tether+WiFi ambiguity, portal redirect +tokens, and Waybar click handlers. Add unit/fixture tests for each class plus a +manual/live checklist gated out of the normal suite. This is blocking because +the current plan would leave the exact "things that can go wrong here" mostly +untested. + +Disposition: accept — rewrote the Testing plan with the "Failure + concurrency" +class (slow/hung commands, single-flight, corrupt/stale cache, perm-denied, +cleanup-failure, partial activation, redaction, missing deps, no-wifi, +multi-active) and a per-phase live checklist gated out of the suite. + +** DONE Define status JSON schemas and compatibility rules +The spec says all subcommands take =--json= but does not define schemas. Add +versioned JSON examples for =status=, =probe=, =list=, =diagnose=, =speedtest=, +and error envelopes, including nullable fields and unknown/degraded states. This +is non-blocking for product direction but should be fixed before code so tests +can lock the CLI contract. + +Disposition: accept — added the "JSON schemas" section with versioned (=v:1=) +envelopes for status / probe / list / diagnose / speedtest and a shared error +envelope, including the degraded/unknown states. + +** DONE Rename or alias the phasing section for workflow compatibility +The spec has a usable =Phasing= section, but the spec-review workflow expects an +=Implementation phases= section that can be lifted into =todo.org=. Rename it or +add an alias heading during response. This is non-blocking because the existing +phase decomposition is understandable, but aligning the heading prevents future +workflow friction. + +Disposition: accept — renamed =Phasing= → =Implementation phases= and added +per-phase acceptance criteria. + +** DONE Add documentation and rollout acceptance checks +Rollback is described, but docs and rollout are thin. Add README/user-guide +updates for commands, panel behavior, config file, GPG opt-in, troubleshooting, +and rollback; add acceptance checks for each phase, including a fresh-login +Waybar smoke test and restoring =custom/netspeed=. This is non-blocking but +important for handing the feature to a future session without re-discovery. + +Disposition: accept — added per-phase acceptance criteria under Implementation +phases (incl. the fresh-login waybar smoke test and the =custom/netspeed= +restore), a Phase 4 "Docs + rollout", and (answering Craig's cj follow-up) a +dedicated "Help + documentation" section with the three help layers (CLI help, +panel help affordance, user guide). + +** DONE Add a failure-mode coverage table :blocking: +The spec now names many individual network failures, but it still does not carry +one compact coverage matrix that says, for each common failure mode, whether +=net diagnose= detects it, whether =net doctor --fix= can repair it, and what +terminal user action remains when it cannot. Add a table covering at least: +rfkill soft block, rfkill hard block, no WiFi hardware, associated/no DHCP, +gateway unreachable, captive DNS hijack, broken DNS where 1.1.1.1 works, HTTP +portal, HTTP interception without a parseable portal URL, upstream/AP outage, +wrong WPA password or missing secret, enterprise auth/cert failure, duplicate +SSID/connection-name ambiguity, hidden SSID, multiple active links, wedged +NetworkManager, slow/hung command, stale/corrupt cache, DNS cleanup failure, +missing speedtest backend, and VPN/routing interference. This blocks because +Craig asked for confidence that the diagnostics and doctor cover the real field +failures, and prose scattered across sections is too easy to misread. + +Disposition: accept — added the "Failure-mode coverage" section: a 22-row table +(every mode the finding named) with detect / doctor-fix / terminal-action +columns, conformed to the org-table standard (rules under every row, ≤120). + +** DONE Pin DNS repair semantics in doctor :blocking: +The spec diagnoses DNS hijack, broken hotel DNS, and the temporary 1.1.1.1 +override test, but =net doctor --fix= does not say whether it merely recommends +the override, applies a temporary override during recovery, or leaves DNS alone +after diagnosis. Define the exact behavior for each DNS class: captive hijack +should open the portal, broken DNS where 1.1.1.1 works should either offer an +explicit temporary repair with cleanup verification or recommend the command, +and port-53/egress blocking should stop as upstream/not locally fixable. This is +blocking because DNS is one of the most common "connected but unusable" failures +and the current doctor contract is ambiguous. + +Disposition: accept — added "DNS handling in doctor (explicit per class)" under +the new Doctor section: hijack → open portal (no DNS mutation); broken-but-1.1.1.1 +→ explicit temporary override with cleanup verification under =--fix=, recommend +otherwise; egress-blocked → terminal =upstream-not-local=. + +** DONE Make auth failures terminal user-action states :blocking: +Wrong WPA password, missing NM secret, locked keyring/polkit denial, enterprise +802.1X certificate/identity failure, and portal login-required are not fixed by +resetting or bouncing NetworkManager. The doctor sequence should classify these +as =needs-user-action= terminal states, stop before looping through destructive +repairs, and tell the user the exact next action (enter password, edit profile in +=nmtui=/=nmcli=, accept portal terms, provide cert/identity, or retry with +admin auth). This blocks because repeated reset/bounce against auth failures is +slow, noisy, and can make the network state worse without helping. + +Disposition: accept — added the =needs-user-action= terminal outcome to the +Doctor section: wrong password / missing secret / keyring-or-polkit denial / +802.1X cert-or-identity failure / portal-login-required all stop the doctor before +any destructive repair and name the exact next step. + +** DONE Define upstream/AP/provider failure terminal states :blocking: +Some failures are not client-repairable: AP has no uplink, hotel gateway is +down, DHCP server is broken, gateway drops traffic, ISP outage, or captive +portal backend is failing. The spec should define how =diagnose= proves "local +link is up but upstream is broken" and how =doctor --fix= stops after local +repairs are exhausted with a clear message like "local repairs tried; likely +upstream/AP/provider" plus the evidence. This blocks because users need to know +when to stop poking the laptop and switch networks or contact the venue. + +Disposition: accept — added the =upstream-not-local= terminal outcome: diagnose +proves link-up + IP + gateway-reachable but no route out and no captive redirect; +=doctor --fix= stops after local repairs with "local repairs tried; likely +upstream/AP/provider" + evidence → switch network / contact venue. + +** DONE Decide how VPN and policy routing affect v1 diagnosis +VPN/WireGuard management is Phase 5, but active VPNs, policy routes, DNS +overrides, and firewall killswitches can break apparent internet access in v1. +The current spec does not say whether v1 detects active VPN/policy routing and +classifies "network is fine, VPN route/DNS is broken" separately from WiFi +failure. Add either a v1 diagnostic check for active VPN/default-route/DNS +ownership with a "deferred repair" outcome, or explicitly state that VPN-routed +failures are out of scope and may be misclassified. This is blocking if Craig +expects the module to diagnose normal daily-driver network failures while VPN +tooling remains separate. + +Disposition: accept (chose the detect-and-classify option) — v1 detects an active +VPN / non-NM default route / non-NM DNS owner and classifies =deferred/vpn= ("link +is fine; internet is VPN-routed"), distinct from a WiFi failure. v1 does not +repair it (VPN management is Phase 5); it names the VPN as the likely owner and +stops. Added to the Doctor section + the coverage table + a doctor-classification +test. + +** DONE Remove stale GPG-store references from the resolved spec +The spec now decides "no separate credential store; secrets live in +NetworkManager", but the Testing plan still mentions =gpg round-trip= and =GPG +store= tests, and the panel-state list still mentions a no-GPG-key state. Remove +those stale references and replace them with NM-secret/no-secret-leak tests. +This is non-blocking for product behavior but blocking for implementation +clarity: otherwise tests will be written for a credential store that no longer +exists. + +Disposition: accept — replaced the Testing-plan =gpg round-trip= / =GPG store= +bullets with an "NM secrets / no-leak" test (add/edit writes the secret via nmcli; +assert no PSK/EAP in any JSON/log/error; no store to round-trip) and dropped the +=no-GPG-key= panel state. Residue from the cj-comment pass that dropped the store. + +** DONE Reconcile status, goal, and task text before implementation :blocking: +The spec status says "Implementation-ready with caveats" and "Phase 1 ready to +build", but the body still has an unresolved enterprise add/edit VERIFY, the +Goal still says "optional GPG-encrypted secret store", and the unified task title +still names "GPG-stored secrets" even though the accepted design removed the +store. Before implementation, make the top-level status, goal, scope, task +mapping, and resolved decisions agree with the current design. This blocks +readiness because a developer starting from the top of the file would still build +or plan around abandoned GPG-store behavior. + +Disposition: accept — fixed the Goal ("secrets stay in NM's own store"), the +=[#B]= task-mapping line (notes the "GPG-stored secrets" framing is superseded by +decision 5), the enterprise VERIFY (now resolved → Status updated), and corrected +the stale =pytest= mentions to =unittest= (the repo's actual harness). Top-of-file +status/goal/scope/decisions now agree with the design. + +** DONE Resolve enterprise add/edit scope or make the caveat explicit :blocking: +The spec still says "One open question for Craig: pull enterprise add/edit into +v1?" and points to a VERIFY in =todo.org=. That is a real product-scope decision: +if enterprise add/edit is in v1, panel forms, nmcli command sequences, tests, +error messages, and docs change materially; if it is out, the UI must consistently +show activate-only with "edit in nmtui/nmcli". Decide it in the spec before +implementation, or downgrade the status to =Ready with caveats= with this exact +accepted caveat. As written, the spec cannot be plain =Ready=. + +Disposition: accept — Craig decided (2026-06-29): enterprise add/edit is vNext, +activate-only in v1. Settled in the Status line, the Scope/Out bullet, decision 9, +and the VERIFY (now DONE in todo.org). The UI shows activate-only with "edit in +nmtui/nmcli" consistently. Evidence: 24 saved profiles, 0 enterprise. + +** DONE Define the concrete test harness and coverage gate :blocking: +The spec says TDD, fake binaries on PATH, and benchmark tests, but it does not +define the actual harness contract: pytest vs unittest for the =net= package, +where fake =nmcli=/=curl=/=speedtest-go=/=rfkill=/=resolvectl= live, how test +fixtures encode command histories, how subprocess timeouts are simulated, how +Waybar scripts are executed end-to-end, and how coverage is run. Add the exact +Makefile targets (=test=, =test-unit= or package-local =pytest=), pytest config, +coverage command (e.g. branch coverage over =net/= and =waybar-net= wrappers), +minimum threshold, and the rule for reading the coverage report to add missing +tests before declaring a phase done. This blocks readiness because "what is the +test harness?" is still answerable only by analogy to older suites. + +Disposition: accept — added the "Harness + coverage gate" section. Corrected the +premise: the repo is =unittest= (=make test= → =python3 -m unittest=, 33 suites), +not pytest. Pinned the fake-binary stub convention (=tests/<name>/fake-*= on a +temp PATH), the fixture command→output map, timeout simulation, the end-to-end +=waybar-net= subprocess run, and coverage via a throwaway venv (coverage.py is +absent system-wide) with a ≥90% branch target on the pure modules. + +** DONE Use coverage to find missing behavior, not just report a percentage :blocking: +The spec does not say how coverage findings affect implementation. For this +feature, line coverage alone can miss the important holes: doctor classification +branches, cleanup-unverified paths, redaction paths, degraded hot-path fallbacks, +timeout branches, and auth/upstream/VPN terminal states. Define coverage review +criteria per phase: branch coverage for pure classifiers and parsers, named +untested branches allowed only with comments or manual-check entries, and a +required "coverage gap pass" after the first green test run that maps uncovered +logic back to tests or consciously excluded live-only behavior. This blocks +readiness because the current test plan is broad but does not force the suite to +expose missing edge tests. + +Disposition: accept — added the "Coverage as a gap-finder, not a number (per +phase)" subsection: branch coverage required for the doctor classifier (every +outcome), cleanup-unverified, redaction, degraded-fallback, timeout, and the +parsers; a mandatory coverage-gap pass after the first green run mapping each +uncovered branch to a test or a named live-only exclusion; a phase isn't done +until that pass is recorded. + +** DONE Convert error classes into exact user-facing strings and evidence fields :blocking: +The failure table and doctor outcomes classify errors well, but many messages +are still templates or descriptions rather than final text. Add exact strings +for indicator tooltip, notification, CLI stderr, JSON =error.message=, and panel +banner/step text for every failure-mode row, including cases doctor cannot fix: +wrong password, missing secret, enterprise cert failure, upstream/AP/provider +failure, VPN-routed failure, hard rfkill block, DNS cleanup failure, speedtest +missing, and HTTP interception without parseable URL. For each string, specify +the redacted evidence included and the next action. This blocks UX readiness +because "useful error" is only testable once the actual text and evidence are +defined. + +Disposition: accept — rewrote the Failure states section: each row now carries the +exact final string (with =<placeholder>= evidence), the evidence field, and the +next action, plus a per-surface rendering rule (indicator tooltip / notify / +CLI+JSON error.message+detail+code / panel banner all render the one canonical +string). Added the missing doctor-unfixable rows: hard rfkill, wrong password / +missing secret, enterprise cert failure, upstream/AP/provider, VPN-routed, HTTP +interception without a parseable URL, and DNS cleanup-unverified. + +** DONE Add an enhancement disposition table +The spec captures several good enhancements (doctor, Makefile recovery, rfkill, +airplane absorption, VPN phase), but it does not show that low-cost adjacent +enhancements were considered and accepted/deferred/rejected. Add a small radar +table for likely affordances: copy redacted doctor report, open/copy portal URL, +retry with hardware MAC, forget network, rescan now, pin speedtest server, show +last good network/result, watch mode for =net doctor=, desktop notification +actions, QR-code/share WiFi import/export, and keyboard picker. Mark each +=v1=, =vNext=, or =rejected= with a one-line reason. This is non-blocking, but it +prevents accidental loss of cheap UX wins and keeps the v1 panel focused. + +Disposition: accept — added the "Enhancement radar" table dispositioning all the +named affordances: open/copy portal URL, forget network, rescan, hardware-MAC +retry, pin speedtest server, copy redacted doctor report = v1; last-good +network/result, doctor watch mode, actionable notifications, keyboard picker = +vNext; QR-share = rejected (low value for a 2-machine personal setup). + +** DONE Tighten the panel UX flow before Phase 2 +The panel has sections and state machines, but not a concrete interaction flow: +default focused section, row content, primary/secondary buttons, disabled-state +rules, confirmation wording for reset/bounce/DNS override, how "Get me online" +reports each escalation, what stays visible after the panel closes, and keyboard +navigation. Add a short UX flow spec or wire-level outline before Phase 2. This +is non-blocking for Phase 1, but it blocks Phase 2 implementation because a GTK +panel can easily become noisy or surprising if these defaults are invented while +coding. + +Disposition: accept — added the "Panel UX flow (settle before Phase 2)" +subsection: default focus (Connections, or Diagnose when opened from a captive +state), row content, one primary button per section, disabled-state rules, exact +confirmation wording for reset/bounce/DNS-override/remove, the live "Get me +online" escalation reporting, what survives panel close, and keyboard nav. + +* Review and iteration history + +** 2026-06-29 Mon @ 17:00:39 -0400 — Codex — reviewer + +- *What changed or was recommended:* Rubric: =Not ready=. Applied the + spec-review workflow and added blocking findings for diagnostics structure, + user-facing errors, observability, nmcli contracts, cache concurrency, + hot-path performance, Waybar click actions, mutation rollback, credential + security, unsupported WiFi types, panel states, speed-test dependency + verification, install boundaries, and test coverage. +- *Why:* The spec has the right broad shape, but this feature combines a status + indicator, NetworkManager mutation, sudo-backed remediation, captive-portal + probing, secret storage, and GTK UI. Too many failure modes were still left for + implementers to invent during coding. +- *Artifacts:* Findings recorded in =Review findings=. Local code read included + =~/.dotfiles/hyprland/.local/bin/waybar-netspeed=, + =~/.dotfiles/common/.local/bin/captive=, + =~/.dotfiles/tests/waybar-netspeed/test_waybar_netspeed.py=, + =~/.dotfiles/tests/captive/test_captive.py=, pocketbook's GTK layer-shell + scaffold, current Waybar config, and the linked =todo.org= tasks. External + checks included the Arch nmcli man page and upstream LibreSpeed CLI page. + +** 2026-06-29 Mon @ 17:35:00 -0400 — Claude Code (archsetup) — responder + +- *What changed:* Processed all 18 findings — 13 accepted, 5 modified, 0 + rejected. Accepts added new sections: Repository + dependencies, Performance + budgets, nmcli contract, JSON schemas, Diagnostics contract (+ Diagnose vs + Repair), Failure states/messages/recovery, Observability, cache + concurrency/atomicity/staleness, mutation safety + rollback, the rewritten + credential security model, the expanded test plan, JSON schemas, the renamed + Implementation phases with per-phase acceptance criteria, and a Status line. + Modifies: deferred the =net debug-bundle= command (log + =diagnose --json= + cover it), made click actions phase-aware + non-blocking, scoped enterprise + WiFi to activate-only (add/edit vNext), trimmed the panel state set to states + that occur on the two-machine fleet, and switched the speed-test backend to the + already-installed =speedtest-go= (verified live) instead of a new + librespeed-cli dependency. +- *Why:* Codex's review was strong and mostly right; the modifies are scope calls + for a single-user, two-machine tool (avoid gold-plating, use installed deps, + don't model impossible states) and one phasing reconciliation (the panel is + Phase 2, so Phase-1 clicks can't open it). +- *Artifacts:* All 18 findings completed =DONE= in place with dispositions. Live + check: =pacman -Qq | grep speed= (speedtest-go-bin installed; =speedtest-go + --json= confirmed). Status now "review incorporated (Codex, 2026-06-29)", + implementation-ready with the named Phase-2/3 caveats; Phase 1 ready to build. + +** 2026-06-29 Mon @ 17:37:58 -0400 — Claude Code (archsetup) — responder (cj comments) + +- *What changed:* Folded in Craig's 10 cj comments on the spec. Design changes: + (1) dropped the separate credential store entirely — secrets stay in NM's own + =system-connections= (root =0600=, inline), touched via nmcli; no GPG, no + gpg-agent (rewrote the secrets section, decision 5, dropped Phase 4 + the gpg + dep). (2) Added =net doctor [--fix]= + Makefile console-recovery targets + (=make online= etc.) as a first-class TTY path; reversed the earlier + defer-the-doctor call (decision 10). (3) Added a full-stack =bounce= repair and + an =rfkill= unblock repair + indicator state — the rfkill one recovers the + framework-laptop post-power-loss soft-block Craig hit. (4) =custom/net= absorbs + the airplane module; the standalone airplane scripts/tests/module are deleted on + ship. (5) Moved VPN/WireGuard from "out" to a planned Phase 5. (6) Added a + "Help + documentation" section (CLI help / panel help / user guide). Answered + the enterprise-defer rationale and the captive-auto-login explanation inline. +- *Why:* Craig's comments simplify (NM-only secrets, fewer deps) and harden the + recovery story (doctor + make targets + rfkill/bounce reachable from a dead-GUI + console — his stated need). Enterprise add/edit in v1 is the one open call, + raised as a VERIFY. +- *Artifacts:* All 10 cj blocks removed. Live checks: airplane scripts/tests + present (confirmed deletion targets); =/etc/NetworkManager/system-connections/= + is root =0600= with inline secrets (confirms the NM-only secret model); + =rfkill= present. VERIFY filed in =todo.org= for the enterprise-scope decision. + +** 2026-06-29 Mon @ 17:49:24 -0400 — Codex — reviewer + +- *What changed or was recommended:* Added a second feedback batch from the + common network failure-mode review: require a diagnose/doctor coverage table, + pin DNS repair semantics, make auth failures terminal user-action states, + define upstream/AP/provider terminal states, decide how active VPN/policy + routing affects v1 diagnosis, and remove stale GPG-store references. +- *Why:* The design now has a solid recovery shape, but the field question is + sharper: for each common failure, does =diagnose= catch it, does =doctor --fix= + fix it, or does the user need to switch networks/provide credentials/contact + the venue? That mapping needs to live in the spec, not in chat. +- *Artifacts:* Findings appended under =Review findings= after reviewing the + current spec's Diagnostics, Failure states, Doctor, Testing plan, and + responder history. + +** 2026-06-29 Mon @ 18:06:22 -0400 — Codex — reviewer + +- *What changed or was recommended:* Added a readiness pass focused on + development-time questions, missed enhancements, UX cleanliness, error wording, + and test/coverage mechanics. New findings require reconciling stale status/goal + text, resolving or explicitly caveating enterprise add/edit, defining the + concrete harness and coverage gate, making coverage actionable, turning failure + classes into exact user-facing strings, adding an enhancement disposition + table, and tightening the Phase-2 panel UX flow. +- *Why:* The network failure model is now strong, but implementation would still + force developers to invent test infrastructure, coverage expectations, final + error copy, and several UX details. The spec should decide those up front or + explicitly mark them as caveats. +- *Artifacts:* Findings appended under =Review findings= after reading the + updated spec, current archsetup Makefile test targets, pocketbook's pyproject, + and the linked =todo.org= enterprise VERIFY. + +** 2026-06-29 Mon @ 17:55:08 -0400 — Claude Code (archsetup) — responder (2nd review) + +- *What changed:* Processed all 6 second-batch findings — 6 accepted, 0 modified, + 0 rejected. Added a "Doctor: escalation, classification, terminal states" + section with four outcomes (=fixable= escalating lightest-first; =needs-user- + action=, =upstream-not-local=, =deferred/vpn= terminal) and explicit per-class + DNS handling. Added a "Failure-mode coverage" table (22 rows: detect / doctor-fix + / terminal action, org-table-standard conformant, ≤120). Cleaned the last + GPG-store residue (Testing plan → NM-secret/no-leak test; dropped the no-GPG-key + panel state). Added a doctor-classification test class. +- *Why:* The findings sharpen the doctor from "escalate repairs" to "classify, + then stop at the right terminal state" — the safety property that keeps + =doctor --fix= from looping destructive repairs against a wrong password or an + upstream outage. The VPN finding got the detect-and-classify option (not repair; + that's Phase 5) so v1 doesn't misread a VPN-routed failure as WiFi down. The + coverage table is the single artifact Craig asked for to trust the field + coverage. +- *Artifacts:* All 6 findings =DONE= in place; cookie =[24/24]=. Table conformed + via =wrap-org-table.el= (max line 120). No new findings introduced; no scope + expansion past what the findings asked. Spec remains implementation-ready, the + enterprise add/edit VERIFY still the one open decision. + +** 2026-06-29 Mon @ 18:24:51 -0400 — Claude Code (archsetup) — responder (3rd review) + +- *What changed:* Processed the third review (7 findings). The enterprise-scope + finding closed first on Craig's call (vNext, activate-only). The other 6, all + accepted: (1) reconciled the top-of-file text — fixed the Goal's GPG-store + wording, the =[#B]= task-mapping line, the resolved enterprise VERIFY, and + corrected the stale =pytest= mentions to =unittest= (the repo's real harness). + (2) Added a "Harness + coverage gate" section (unittest, fake-binary stubs on a + temp PATH, venv coverage, ≥90% branch on pure modules). (3) Added a per-phase + "coverage as a gap-finder" pass. (4) Rewrote the Failure states section to exact + final strings + evidence fields + a per-surface rendering rule, and added the + missing doctor-unfixable rows. (5) Added the "Enhancement radar" table + (v1/vNext/rejected). (6) Added the "Panel UX flow" subsection. +- *Why:* The findings close the gap between "design decided" and "a developer can + start": the harness/coverage contract, the exact UX strings, and the panel flow + are the things otherwise invented mid-code. The =pytest=→=unittest= correction + was a real defect — the spec contradicted the repo's actual test convention. +- *Artifacts:* All 31 findings =DONE=; cookie =[31/31]=. Both new tables conformed + via =wrap-org-table.el= (coverage 120, radar 110). Harness verified against the + live repo (33 unittest suites, =make test=, coverage.py absent → venv). Status + raised to "Ready for Phase 1; Ready-with-caveats overall" — no open decisions + remain. diff --git a/docs/design/2026-06-29-waybar-timer-module-spec.org b/docs/design/2026-06-29-waybar-timer-module-spec.org new file mode 100644 index 0000000..4b0ed0e --- /dev/null +++ b/docs/design/2026-06-29-waybar-timer-module-spec.org @@ -0,0 +1,217 @@ +#+TITLE: Waybar Timer Module (wtimer) — Design Spec +#+AUTHOR: Craig Jennings & Claude +#+DATE: 2026-06-29 + +* Goal + +One always-visible waybar module that keeps time four ways — countdown timer, +wall-clock alarm, count-up stopwatch, and pomodoro — with several items running +at once. The bar shows the most urgent item with a per-type glyph; the tooltip +lists them all. Backed by a single =wtimer= script over a small JSON state file. +notify fires on completion. fuzzel drives creation. No GTK app. + +Source task: archsetup =todo.org= "Waybar timer module" (=:waybar:=), including +the folded roam-capture scope expansion (mode-selectable single panel, +stopwatch, multiple simultaneous, per-mode hover text). + +* Scope + +** In +- *Timer* — count down a duration, notify on elapse, then remove. +- *Alarm* — fire at a wall-clock time, notify, then remove. +- *Stopwatch* — count up from start; pause/resume; manual stop. +- *Pomodoro* — work/break cycles (25/5, long break 15 after 4 works), auto-advance with a notify at each phase change, runs until cancelled. +- *Multiple simultaneous* — N items of any mix held in state. Bar shows one primary item plus a =+N= badge; tooltip lists every item with its remaining/elapsed and label. +- *Pause / resume* per item; *cancel* one or all. +- *Interactions* — click to create (fuzzel), middle-click pause/resume primary, right-click cancel (fuzzel pick), scroll to cycle which item is primary. +- *Per-type glyph + CSS state classes* (running / paused / urgent / break). +- *Persistence across waybar restarts* (state file in the runtime dir). + +** Out (v1, note for later) +- No GTK panel — waybar module + tooltip + fuzzel only. +- No persistence across *reboot* (runtime-dir state clears). Alarms set before a reboot won't survive. Acceptable v1; revisit with =~/.local/state= + a catch-up-on-boot pass if wanted. +- No sound selection per item (uses notify's type sound). +- No history/stats of completed pomodoros beyond the current run's cycle count. + +* Architecture + +- =wtimer= — a single executable Python script in =hyprland/.local/bin/=. Chosen over POSIX sh (the other waybar backings) deliberately: the multi-item state machine, time arithmetic, pomodoro FSM, and JSON I/O are cleaner in Python, and it gives real line/branch *coverage numbers* (Craig asked for them). Precedent: pocketbook is Python in this repo. +- *Pure core + thin IO shell.* All logic is pure functions taking =now= as a parameter (dependency-injected clock — satisfies testing.md: no recursion, no scope-shadowing, production reads =time.time()=, tests pass an explicit instant). The CLI layer does the IO: read state, call pure fns, write state, emit JSON, shell out to notify/fuzzel. +- *State file*: =$XDG_RUNTIME_DIR/waybar/wtimer.json= (env override =WTIMER_STATE= for tests). Same runtime-dir convention as =sysmon-metric=. +- *Heartbeat*: waybar calls =wtimer render= every 1s. =render= runs the tick logic first (detect elapsed items, fire notify, advance pomodoro, drop finished timers/alarms), then prints the waybar JSON. One entry point waybar polls; no separate daemon. +- *Concurrency (BLOCKER from review).* The 1s =render= and the click/scroll handlers (=add=, =toggle=, =cancel=, =cycle=) are separate processes doing read-modify-write on the same state file. Without serialization, last-writer-wins drops a click's =add=, or clobbers render's "item removed/advanced" write so the same item ticks and notifies again next second. So every read-modify-write takes an exclusive =flock= on the state file for the whole cycle, and writes go through a temp file + =os.replace= (atomic), so a concurrent render never reads a half-written file. This is what actually makes "notify fires once" true — the mutation is only authoritative under the lock. +- *State dir*: =render= and the mutating commands =mkdir -p= the state dir first (=$XDG_RUNTIME_DIR/waybar/= may not exist on a fresh boot). +- *Clock injection everywhere*: =now= comes from =WTIMER_NOW= (epoch) if set, else =time.time()=. Pure fns take =now= as a parameter; the CLI seeds it from the env. This lets the CLI integration tests hit boundary instants (exactly-at-target), not just the pure-fn tests. +- *Instant refresh*: after any mutating command, send waybar =SIGRTMIN+14= (the module's signal) so the bar updates immediately instead of lagging up to 1s. Faked in tests (=WTIMER_REFRESH= override, default =pkill -RTMIN+14 waybar=). + +* State model + +#+begin_src json +{ + "items": [ + {"id": "1", "type": "timer", "label": "tea", "target": 1751240400, "duration": 300, "paused_left": null}, + {"id": "2", "type": "alarm", "label": "", "target": 1751251200, "paused_left": null}, + {"id": "3", "type": "stopwatch", "label": "", "start": 1751240000, "paused_elapsed": null}, + {"id": "4", "type": "pomodoro", "label": "", "target": 1751241900, "phase": "work", + "cycle": 1, "work": 1500, "short": 300, "long": 900, "interval": 4, "paused_left": null} + ], + "primary": "1", + "seq": 4 +} +#+end_src + +- =seq= is the monotonic id source (string ids). +- *Paused* timer/pomodoro: =paused_left= holds seconds remaining; =target= ignored while paused; resume sets =target = now + paused_left=, =paused_left = null=. +- *Paused* stopwatch: =paused_elapsed= holds elapsed seconds; resume sets =start = now - paused_elapsed=. +- =primary= is the id the bar shows; =null= or stale → auto-select (below). + +* Display logic + +** Primary selection (bar text) +1. If =primary= names a live item, show it. +2. Else the running countdown (timer/alarm/pomodoro) with the smallest remaining. +3. Else the first running stopwatch. +4. Else idle (no items). + +** Bar text +- =<glyph> <time>= for the primary, plus = +N= when N other items exist. +- Idle: a dim timer glyph alone (or empty — decide at render; lean dim glyph so the module has a stable click target). +- =time= formatting: =M:SS= under 1h, =H:MM:SS= at/over 1h. Stopwatch counts up; timer/alarm/pomodoro count down to target. +- Paused item: prefix a pause glyph or rely on the =paused= class (CSS dims it). + +** Glyphs (nerd font; final codepoints verified live before merge) +- timer , alarm , stopwatch , pomodoro-work , pomodoro-break (coffee), paused , idle (dim). +- One glyph table at the top of the script so a live-render tweak is one edit. + +** Tooltip (all items) +One line per item: =<glyph> <label-or-type> <remaining/elapsed> (<state>)=. Pomodoro line shows phase + cycle (e.g. =work 2/4=). Header line summarizes count. Empty state: "No timers". + +** CSS classes (the =alt=/=class= field) +=timer= / =alarm= / =stopwatch= / =pomodoro-work= / =pomodoro-break=, plus =paused= and =urgent= (remaining < 60s). Drives color in style.css + both themes. + +* Commands (CLI) + +| Command | Effect | +|---------------------------------+---------------------------------------------------------------------| +| =wtimer render= | tick + emit waybar JSON (the heartbeat) | +| =wtimer add timer <dur> [label]=| add a countdown (=dur= like =25m=, =90s=, =1h30m=, =5= → minutes) | +| =wtimer add alarm <HH:MM> [lbl]=| add a wall-clock alarm (next occurrence of that time) | +| =wtimer add stopwatch [label]= | start a count-up | +| =wtimer add pomodoro [label]= | start a pomodoro at work phase | +| =wtimer new= | fuzzel: pick type, prompt value, dispatch to =add= (thin wrapper) | +| =wtimer toggle [id]= | pause/resume the item (default: primary) | +| =wtimer cancel <id>= | remove one item | +| =wtimer pick-cancel= | fuzzel: choose an item to cancel (right-click handler) | +| =wtimer cancel-all= | clear all | +| =wtimer cycle [next|prev]= | move the primary pointer across all items (incl. paused), state-list order, wrapping | + +Duration parse: =Nh=, =Nm=, =Ns= combos, or a bare integer = minutes. Reject +unparseable input (exit non-zero, notify nothing). Alarm parse: =HH:MM= 24h; if +that time today already passed, target tomorrow. + +* Notifications + +- Timer elapse: =notify alarm "Timer" "<label or duration> done" --persist=. +- Alarm fire: =notify alarm "Alarm" "<HH:MM><, label>" --persist=. +- Pomodoro phase change: =notify info "Pomodoro" "Work → short break (3/4)"= (no =--persist=; phase nudges shouldn't pile up), long-break and work-resume worded accordingly. +- notify is faked on PATH in tests; assert type + that it fired once per event. + +* Pomodoro semantics + +- Defaults: work 25m, short 5m, long 15m, interval 4 (long break after every 4th work). +- FSM: work → short → work → short → work → short → work → long → work … +- =cycle= counts completed works in the current set (1..interval); resets after a long break. +- Each phase elapse advances =phase=, recomputes =target=, fires the phase notify. Pomodoro never auto-removes; cancel ends it. + +* Waybar wiring + +** Module def (config) — signal 14 (next free; 8–13 used) +#+begin_src json +"custom/timer": { + "exec": "wtimer render", + "return-type": "json", + "interval": 1, + "signal": 14, + "on-click": "wtimer new", + "on-click-middle": "wtimer toggle", + "on-click-right": "wtimer pick-cancel", + "on-scroll-up": "wtimer cycle next", + "on-scroll-down": "wtimer cycle prev" +} +#+end_src + +** Position — right of the sysmon (battery/resource) module +Insert =custom/timer= into =modules-right= immediately after =custom/sysmon= +(between =custom/sysmon= and =custom/netspeed=). On screen that places it just +right of the battery/resource readout. + +** Not collapsible — survives the right-side collapse +The module *definition* lives in the canonical config object, and =waybar-collapse= +only swaps the =modules-right= *array* in the runtime copy (which it seeds from +canonical, so the def is always present). So making the timer non-collapsible is +purely an array-membership change: add =custom/timer= to the =waybar-collapse= +right *base set* so it stays listed when the right side collapses: +- laptop: =["custom/arrow-right","custom/sysmon","custom/timer","tray","custom/date","custom/worldclock"]= +- desktop: =["custom/arrow-right","custom/timer","tray","custom/date","custom/worldclock"]= +Update the =tests/waybar-collapse= base-set expectations to match (TDD the change). + +* CSS + +Add =#custom-timer= plus the state classes to all three stylesheets. Keep the +*selectors and structure* parallel across the three (what the theme-drift test +checks); the actual color *values* are per-theme (dupre vs hudson) and differ by +design, so this is structural parity, not byte-identity. Confirm against the real +CSS files what the drift test compares before editing. +- =hyprland/.config/waybar/style.css= +- =hyprland/.config/themes/dupre/...= waybar css +- =hyprland/.config/themes/hudson/...= waybar css +Colors: normal = foreground; =urgent= = a warning hue (reuse the sysmon +warn/crit palette); =paused= = dimmed; =pomodoro-break= = a calmer accent. + +* Testing plan (TDD) + +- Suite: =tests/wtimer/test_wtimer.py= (auto-discovered by =make test='s =tests/*/test_*.py= glob — no enumeration gap). +- *Pure-function tests* (fast, the bulk), explicit injected =now=: + - =parse_duration=: =25m=, =90s=, =1h30m=, =5= (→min), =0=, negative, garbage, empty (Normal/Boundary/Error). + - =parse_alarm=: future today, already-passed-today → tomorrow, =00:00=, =23:59=, =24:00=/=12:60= invalid, non-=HH:MM=. + - =format_time=: 0, 59s, 60s, 3599s, 3600s, multi-hour, negative clamps to 0. + - =add_item= for each type; =seq= increments; ids unique. + - =tick=: timer not-yet-elapsed (no change), exactly-at-target, past-target (fires once, removed); alarm same; pomodoro work→short→…→long→work advance + cycle counting + the 4th-work→long boundary; paused items never tick; multiple items in one tick. + - =select_primary=: explicit primary, stale primary falls back, soonest-remaining rule, stopwatch-only, empty. + - =render_payload=: text/tooltip/class for each type + paused + urgent + =+N= badge + idle. + - =toggle= pause then resume round-trips remaining/elapsed exactly; =cycle= wraps; =cancel= / =cancel-all=. +- *CLI integration tests* (subprocess, fakes on PATH, =WTIMER_NOW= to hit boundaries): =add= then =render= round-trip; =render= fires the faked =notify= once on an elapsed item and drops it; state file created if absent; *missing parent dir* created (fresh-boot case); corrupt/empty state file → treated as empty, no crash; mutating command sends the faked refresh signal. +- *Concurrency test*: spawn overlapping =render= + a mutating command against one state file; assert no lost update (the added item survives) and exactly-once notify (no double-fire from a clobbered tick). This is the regression guard for the flock/atomic-write fix. +- *Mocking boundary*: fake =notify=, =fuzzel=, =killall= on PATH (record calls); never mock the wtimer logic. Clock injected as a parameter. +- *Coverage*: measure with =coverage.py= if present (target 90%+ on the logic per testing.md business-logic bar); report the actual number. If =coverage= is absent, report per-command/per-branch case coverage explicitly and flag the tool gap (verification.md). +- =tests/waybar-collapse= base-set expectations updated for the new module. +- =tests/= theme-drift check stays green (CSS parity). + +* Files touched + +dotfiles branch =waybar-timer-module=: +- =hyprland/.local/bin/wtimer= (new, executable) +- =tests/wtimer/test_wtimer.py= (new) +- =hyprland/.config/waybar/config= (module def + modules-right position) +- =hyprland/.local/bin/waybar-collapse= (base-set) + =tests/waybar-collapse/...= (expectations) +- =hyprland/.config/waybar/style.css= + dupre + hudson waybar css (CSS) + +archsetup (main, at the end): +- this spec +- =todo.org= task closure + +* Resolved decisions (no approvals — my calls) + +- Python, not sh — testability + coverage; pocketbook precedent. +- One =render= heartbeat (no daemon) — simplest, waybar already polls. +- notify fires from =render='s tick, mutation guarantees once-only. +- Primary = user-cycled, else soonest-remaining; =+N= badge for the rest. +- Multiple simultaneous via tooltip list + badge (not a GTK panel) — keeps it "cool yet simple". +- Pomodoro is one self-advancing item, not four chained timers. +- Runtime-dir state (waybar-restart durable, not reboot durable) — v1. + +* Rollback + +All code on the dotfiles =waybar-timer-module= branch off =09815f3=. Squash-merge +at the end; =git switch main && git branch -D waybar-timer-module= reverts cleanly +if it goes sideways. |
