diff options
Diffstat (limited to 'docs/design')
| -rw-r--r-- | docs/design/2026-06-29-waybar-network-module-spec.org | 2094 | ||||
| -rw-r--r-- | docs/design/2026-06-29-waybar-timer-module-spec.org | 217 | ||||
| -rw-r--r-- | docs/design/2026-06-29-zfs-pre-snapshot-installer.org | 106 | ||||
| -rw-r--r-- | docs/design/2026-06-30-captive-portal-login.org | 89 | ||||
| -rw-r--r-- | docs/design/2026-07-02-bluetooth-panel-spec.org | 470 | ||||
| -rw-r--r-- | docs/design/2026-07-02-desktop-settings-panel-spec.org | 128 | ||||
| -rw-r--r-- | docs/design/2026-07-02-file-manager-swallow-spec.org | 141 | ||||
| -rw-r--r-- | docs/design/2026-07-02-net-panel-other-interfaces-spec.org | 189 | ||||
| -rw-r--r-- | docs/design/2026-07-02-timer-panel-spec.org | 113 | ||||
| -rw-r--r-- | docs/design/2026-07-02-waybar-expansion-animation-feasibility.org | 53 | ||||
| -rw-r--r-- | docs/design/2026-07-03-audio-panel-spec.org | 160 | ||||
| -rw-r--r-- | docs/design/2026-07-03-instrument-console-panels-spec.org | 158 |
12 files changed, 3918 insertions, 0 deletions
diff --git a/docs/design/2026-06-29-waybar-network-module-spec.org b/docs/design/2026-06-29-waybar-network-module-spec.org new file mode 100644 index 0000000..3a1260c --- /dev/null +++ b/docs/design/2026-06-29-waybar-network-module-spec.org @@ -0,0 +1,2094 @@ +#+TITLE: Waybar Network Module — Design Spec +#+AUTHOR: Craig Jennings +#+DATE: 2026-06-29 + +* Status + +*Phases 1-3 SHIPPED* (2026-06-29 → 2026-06-30, dotfiles). The core module is live: +the =net= engine (=status/probe/list/up/down/add/edit/remove/rescan/diagnose/repair/ +doctor/portal/speedtest=), the =waybar-net= indicator (split-cadence cache, redacted +event log, display-only airplane absorption per decision 12), and the GTK4 +layer-shell panel (Connections / Diagnose / Repair / Speed test) with the settled bar +clicks (left = panel, middle = =net portal=, right = =net-fix=; airplane on +Super+Shift+A). 230+ net tests; full dotfiles suite green. Live-verified on velox. + +Built on top since the original spec: +- *Captive-portal login engine* (2026-06-30, dotfiles =a7d7559=) — =net portal= now + runs a native =portal-login= repair tier (drop DoT → recover the portal URL from + the redirect → open a throwaway browser profile → auto-restore DoT once online), + replacing the old shell-out to =captive= for the force-portal flow. =net portal + --restore= is the manual fallback. +- *Portal UX fixes from live testing* (2026-06-30, dotfiles =eef6b0b=) — removed a + polkit-gated =resolvectl flush-caches= that popped an auth dialog (the DoT-drop + restart already clears the cache); added an already-online short-circuit so a + forced run on a working connection opens nothing; suppressed Chrome's first-run + wizard; moved =net portal= off the terminal into the panel status line; hardened + the portal-URL extractor against Firefox's detection page. +- *Panel auto-hide + Close button* (2026-06-30, dotfiles =450b7f0=) — the panel + closes on focus-out (popup behavior, suppressed while a child dialog holds focus) + and carries a Close button bottom-right. + +*V2 redesign in flight* (designed 2026-06-30, not yet built — see todo.org "Network +panel redesign — no terminals, verify-everything, full failure coverage"). It +reverses two earlier choices and widens coverage: +- *No terminals anywhere.* =net-popup= is removed; every action and result renders + in the panel. This depends on a passwordless privileged path — a root-owned helper + plus a narrow NOPASSWD sudoers rule, archsetup-installed — because an in-panel + worker thread has no tty to prompt for a password. Reverses decision 11's + "privileged tiers run in a terminal". +- *New navigation* — top tabs Connections | Diagnostics | Performance. Diagnostics + merges Diagnose + Repair (sub-row Diagnose | Get Me Online | Advanced; a shared + area below shows diagnose items and streams repair progress; Advanced reveals the + individual repair buttons, renamed with tooltips). Speed test lives under + Performance. +- *Verify every action* (each mutating op confirms its effect before reporting + success) and *detect + respond to every failure mode* — the full ~44-mode catalog, + edge cases included, lives in the redesign task and supersedes the table below. + +Phase 4 (docs / rollout) and Phase 5 (VPN) remain. Review incorporated (Codex, +2026-06-30): four review rounds + Craig's cj comments are all dispositioned +([40/40], no open findings) — the fourth round reshaped the V2 panel UX (single nav +target, saved-vs-available groups, join-from-row, the auth matrix, progressive +loading, a findable diagnostics report, and the Waybar visual contract; see "V2 panel +UX"). Phases 1-3's manual live checks are under todo.org "Manual testing and +validation". + +* Goal + +One waybar network component that does the whole job: shows connection state +(including the missing "associated but no internet / captive portal" state), +manages connections from a dropdown (nmcli-backed; secrets stay in +NetworkManager's own store, no separate credential file), and runs the network +diagnostics and remediation off the same place +(captive-portal detection + forcing, bounce/reset, gateway/DNS checks, speed +test). + +It unifies three todo tasks that are really one feature: +- =[#C]= "archsetup Waybar Wi-Fi module should show no-internet state" — the + indicator state plus the 2026-06-22 roam expansion (bounce, diagnostics, speed + test off the component). +- =[#B]= "Network-manager dropdown, nmcli-backed" — the management dropdown. (The + todo task's original "GPG-stored secrets" framing is superseded: secrets stay in + NM's own store, decision 5.) +- The network diagnostics already shipped in =captive= (the hotel/captive-portal + tool, formerly =login-page=) become this module's diagnostics engine rather + than a standalone CLI. + +* Scope + +** In +- *Indicator* — wifi/ethernet icon + signal + SSID, plus an internet sub-state: + online / captive / no-internet / connecting / disconnected / airplane. +- *Absorbs the airplane module* — the airplane state + toggle move into + =custom/net= (airplane is a network concern). Once this ships, the standalone + =custom/airplane= module, the =waybar-airplane= + =airplane-mode= scripts, their + =tests/=, and the css are deleted (listed under Files touched). The + desktop-settings panel (sibling =[#B]=) no longer needs an airplane row. +- *Interface-correct* — targets the wifi (or chosen) device, not the + default-route interface, so an active USB tether or wired link can't mask + wifi state. (Same lesson =captive= fixed; the current =custom/netspeed= keys + off the default route and has the bug.) +- *Connection management (panel)* — list saved connections most-recently-used + first, live signal for in-range wifi, click to switch; add / edit / remove for + open + WPA-PSK; activate any existing saved profile (including enterprise ones + NM already stores); ethernet↔wifi and wifi↔wifi switching even when a link + appears mid-session. +- *Diagnostics (panel)* — read-only Diagnose (captive probe 204-vs-portal with + the extracted portal URL, gateway ping, DNS config) separated from mutating + Repair. Repair has tiers, lightest first: rfkill-unblock, per-connection reset + (fresh MAC), full-stack bounce (=nmcli networking off/on=, then restart + NetworkManager if that fails), and the temporary 1.1.1.1 override test. Each + Repair action confirms and verifies cleanup. +- *Speed test (panel)* — down/up/ping with a progress indicator and last-result + shown, via the already-installed =speedtest-go --json=. +- *Connection secrets* — none of our own. Settings and passwords live where NM + already keeps them: =/etc/NetworkManager/system-connections/*.nmconnection= + (root-only =0600=, the PSK/EAP secret stored inline). We read/write them through + nmcli, which handles the privilege. No separate file, no GPG, no gpg-agent — one + fewer dependency, and NM's store is already the secure-at-rest source of truth. +- *Persistence* — connectivity probe result cached in the runtime dir so the + bar reads it cheaply between probes. +- *Observability* — a redacted JSONL event log so a post-failure session can + diagnose without re-running destructive actions. + +** Out (v1, note for later) +- No replacement of NetworkManager's connection engine. NM stays the thing that + connects; we drive it via nmcli. +- No add/edit *form* for WPA-Enterprise / 802.1X in v1. The reason is effort vs + payoff: 802.1X has many interdependent fields (CA cert, client cert, identity, + anonymous identity, phase-2 auth) where a wrong entry silently fails auth, so a + trustworthy form is a lot of UI for connections Craig rarely adds (open + + WPA-PSK covers home, hotels, and phone hotspots). v1 still *activates* existing + saved enterprise profiles and points editing at =nmtui=/=nmcli=. Settled + (Craig, 2026-06-29): enterprise add/edit is vNext — 24 saved profiles on velox, + 0 enterprise, so the form would be unused UI; if one ever appears nmtui adds it + once and the module activates it thereafter. +- No per-connection captive-portal *auto-login* in v1. (That would mean storing a + portal's login form answers — room number, surname, a checkbox — and replaying + them automatically when a known portal is detected, so the page never appears. + Out for v1 because every portal's form differs and it means storing per-venue + answers; v1 just opens the portal for you.) +- No graphing/history of speed-test results beyond the last run. +- No static-IP / proxy / metered / MAC-randomization editing in v1 (activate + existing, edit elsewhere). +- No VPN / WireGuard management in v1, but it's a planned later phase (Phase 5), + not a permanent exclusion — it folds the existing archsetup wireguard tooling + into the same panel/CLI. +- The desktop-settings dropdown (sibling =[#B]=) is a separate module, but it + shares the GTK4 layer-shell panel shell built here. + +* Architecture + +Three layers. Keep the bar cheap, the panel rich, the logic in one tested place. + +1. *Engine* — a =net= Python package (src-layout, unittest), exposing a CLI. Wraps + every nmcli op and owns the diagnostics. Emits JSON. This is the testable + core (fake =nmcli= / =curl= / =speedtest-go= on PATH, like the existing + =waybar-netspeed= and =waybar-sysmon= test harnesses). Precedent: pocketbook is + Python in the dotfiles repo; =wtimer= is Python for the same testability + reason. +2. *Indicator* — a thin =waybar-net= script that calls =net status --json= and + renders icon + signal + state + tooltip. Replaces =custom/netspeed= + (throughput folds into the tooltip). +3. *Panel* — a GTK4 + gtk4-layer-shell app (mirrors pocketbook's structure) + that imports the engine. Hosts connection management, diagnostics, and the + speed test. + +How the existing pieces map in: +- =captive= (bash, shipped) — its cheap portal-detection logic is mirrored natively + in the engine for the fast status path so the bar never blocks on a subprocess, + and it still exposes a =--probe-json= mode the engine reuses. *As built (2026-06-30): + the force-portal flow is now native too* — =repair.py='s =portal-login= tier does + the DoT drop, portal-URL recovery, clean-browser launch, and auto-restore in + Python, so =net portal= no longer shells out to =captive= for it. =captive= stays a + usable standalone CLI. +- =waybar-netspeed= (sh, shipped) — retired; its throughput sampling moves into + the engine's status output and renders in the indicator tooltip only. +- =nmcli= — the connection backend for every op. + +Language note: the engine is Python; the indicator is a thin Python or sh +wrapper over =net status --json=. The bar path must stay fast (see Performance +budgets), so the indicator does no network I/O itself — it reads link state and +the cached connectivity result. + +Privileged-path model (v2, planned): repairs that need root (rfkill unblock, nmcli +modify/up, networking off/on, =systemctl restart NetworkManager/systemd-resolved=, +resolvectl dns/revert, the DoT toggle) go through a single root-owned helper +installed by archsetup, with a narrow NOPASSWD sudoers rule scoped to that helper +only (never a blanket =mv=/=systemctl= rule). =repair.py= calls =sudo <helper> +<verb>=. This is what lets every action run in-panel with no terminal: a GTK worker +thread has no tty, so without a passwordless path it can't prompt. It also fixes a +latent bug in the shipped portal flow — the detached DoT-restore watcher runs with +no tty and silently fails to restore encrypted DNS when sudo creds aren't cached. + +* Repository + dependencies + +- *Code lives in the dotfiles repo* (=~/.dotfiles=), not archsetup. The =net= + package sits in-tree like pocketbook (src-layout, unittest, Makefile target); + =waybar-net= and the =net= CLI entry live in the hyprland tier + (=hyprland/.local/bin/=). Tests under =tests/net/= and =tests/waybar-net/=. + archsetup owns only the *dependency install*, not the code. +- *archsetup installs the deps* in its Hyprland step: =gtk4-layer-shell=, + =python-gobject=, plus =nmcli=/=curl=/=resolvectl=/=rfkill= (already present via + NetworkManager/curl/systemd/util-linux). Speed test uses =speedtest-go= (AUR + =speedtest-go-bin=, already installed on velox); archsetup adds it to the AUR + list. librespeed-cli is the documented fallback if a self-hosted LibreSpeed + server is ever wanted. No =gpg= dependency (secrets live in NM's own store). +- *Daily-drivers*: a stowed-script + AUR-dep feature, so ratio needs the same + =git pull= + stow + the archsetup-added deps. Note the manual dep step in the + rollout. + +** Makefile targets (console recovery is a first-class path) +=net doctor= and the diagnostics are reachable from a bare TTY when waybar and +the GUI are down — that's the case where you most need them. The dotfiles +Makefile carries targets that wrap the =net= CLI so "get back online" is one make +command from the console: +- =make online= — =net doctor --fix= (diagnose, then apply the lightest repair: + rfkill-unblock → reset → bounce → open portal). The headline recovery target. +- =make net-doctor= — =net doctor= (read-only diagnose + recommendation). +- =make net-status= / =make net-diagnose= / =make net-portal= / =make net-reset= + / =make net-bounce= — the individual ops. +- =make test= — already runs =tests/*=; the =net= package's unittest suites are + collected the same way. +These intentionally need only nmcli/curl/rfkill (no GUI, no waybar, no Python +GTK), so they work from a TTY on a broken graphical session. + +* Connectivity model — split cadence + +The indicator polls every ~2s, but a real internet/captive probe every 2s wastes +battery and can re-trigger a captive portal. So split it: + +- *Fast path (every poll, cheap, no network)* — interface, type, SSID, signal, + IPv4 presence, throughput sample. From nmcli / sysfs only. No network I/O. +- *Slow path (cached, TTL ~45s)* — the actual internet/captive probe (the 204 + check + meta-refresh portal extraction). Result cached at + =$XDG_RUNTIME_DIR/waybar/net-connectivity.json= with a timestamp. + +The indicator reads the cache each poll. When the cache is older than the TTL, +=net status= kicks =net probe= in the background (spawn + detach, never awaited) +and renders the last cached sub-state meanwhile. A user-triggered +diagnose/reconnect refreshes the cache immediately. This keeps the bar +responsive and the portal un-poked. + +** Concurrency, atomicity, staleness +- *Single-flight* — =net probe= takes a lock file at + =$XDG_RUNTIME_DIR/waybar/net-probe.lock= (flock, non-blocking). A second probe + while one runs is a no-op, so a flapping 2s poll can't pile up overlapping + probes. +- *Atomic writes* — the cache is written to a temp file + =os.replace= (atomic + rename), so a reader never sees a half-written cache. Same pattern as =wtimer=. +- *Max probe runtime* — the probe has a hard timeout (≤ 6s total: curl + =--max-time 5= + slack). On timeout it writes an =unknown= result, never hangs. +- *Stale classes* the indicator distinguishes: fresh (< TTL), stale (TTL..3×TTL, + shown with a subdued/aging hint), expired (> 3×TTL → treat as unknown), + unknown (no cache / probe failed). The bar never shows a confident "online" + past the expired threshold. +- *Invalidation* — the cache records the iface + SSID + active-connection UUID it + was taken under; a change in any of them invalidates it immediately (a + reconnect must not show the old network's verdict). +- *Crash cleanup* — a stale lock older than the max runtime is ignored/reclaimed. + +* Performance budgets (hot path) + +The bar exec path (=waybar-net= → =net status=) must stay responsive: +- *Budget*: =net status= returns in < 100ms typical, < 250ms worst case. +- *No sleeping in the bar path.* Throughput is sampled from two reads of + =/sys/class/net/<iface>/statistics/{rx,tx}_bytes= across the *waybar poll + interval itself* (delta since the last cached sample + timestamp), not via an + in-process =sleep= like the old =waybar-netspeed=. The cache holds the prior + counters. +- *Subprocess cap*: at most one =nmcli= invocation on the hot path (a single + =nmcli -t -f ...= multi-field query), plus sysfs reads. Never a per-field + nmcli call. +- *Every subprocess has a timeout* (=nmcli --wait 2=, =subprocess timeout=). On + timeout or error the indicator emits a degraded JSON state (class + =net-degraded=, a neutral glyph) rather than blocking or crashing waybar. +- *Benchmark test*: a fake slow =nmcli= asserts =net status= still returns within + budget by falling back to the degraded state. + +* Engine — =net= CLI surface + +All subcommands take =--json= where a machine reads them. Pure formatting/state +functions under the CLI; IO (nmcli, curl, file) at the edges. Every subcommand +exits non-zero with a JSON error envelope (see JSON schemas) on failure. + +- =net status [--json] [--iface IF]= — fast link state + cached connectivity + sub-state + throughput. The indicator's source. Never does network I/O. +- =net probe [--iface IF]= — run the connectivity/captive probe now, update the + cache (single-flight, atomic), print online | captive (+ portal URL) | + no-internet | unknown. Mirrors =captive='s cheap detection natively. +- =net list [--json]= — saved connections, MRU order, active flag, plus in-range + wifi with signal. +- =net up <uuid>= / =net down [--iface IF]= — switch / disconnect. Operates on + UUID, not name (see nmcli contract). +- =net add= / =net edit <uuid>= / =net remove <uuid>= — manage connections + (open + WPA-PSK) through nmcli; the secret lands in NM's own + =.nmconnection=. Enterprise profiles are activate-only. +- =net rescan [--iface IF]= — wifi rescan. +- =net diagnose [--json]= — read-only report: gateway ping, DNS config, captive + probe. The structured contract below. Doubles as the post-failure snapshot. +- =net repair <action> [--json]= — mutating remediation, lightest first: + =rfkill= (unblock + radio on), =reset= (fresh MAC), =bounce= (full-stack: + =nmcli networking off/on=, escalating to =systemctl restart NetworkManager=), + =dns-test= (temporary 1.1.1.1 override, auto-reverted). Each confirms via the + caller and verifies cleanup. +- =net doctor [--json] [--fix]= — one-shot "get me online" mode for the console: + runs the full diagnose, then applies the lightest repair that fits (unblock + rfkill, reset, bounce, open portal) — read-only without =--fix=, acting with + it. The TTY recovery path when waybar/the GUI is down (see the Makefile + targets). +- =net portal [--restore]= — the native captive-login flow (=repair.py= =portal-login= + tier): short-circuits if already online, else drops DoT to plain DNS, recovers the + portal URL from the redirect, opens it in a throwaway browser profile, and spawns a + detached watcher that restores DoT once online. =--restore= forces the restore now. +- =net speedtest [--json]= — =speedtest-go --json= run; down/up/ping. + +* nmcli contract + +The command wrapper is the reliability boundary; SSIDs and connection names +contain spaces, colons, duplicates, hidden names, and non-ASCII. Rules: + +- *Terse, field-selected output*: =nmcli -t -f <fields> --escape yes ...= and + =nmcli -g <fields> ...= (get-values) for single-value reads. Parse with the + documented escaping (=\:= and =\\=); never naive =cut -d:=. +- *UUID is the handle.* Every saved-profile op (=up=, =down=, =modify=, =delete=) + uses the connection UUID, never the display name — names duplicate and contain + separators. =net list= surfaces UUIDs; the panel maps row → UUID. +- *Wait budgets*: activation/deactivation use =nmcli --wait <n>= with an explicit + budget (hot-path reads =--wait 2=; activation =--wait 30=). No unbounded waits. +- *Connectivity*: NM's own =nmcli networking connectivity= can return + =none/portal/limited/full/unknown=. Use it as a *cheap hint* on the fast path + when present, but the authoritative captive verdict is still our own probe + (NM's portal detection is coarser and config-dependent). +- *Parser tests* (fake nmcli fixtures): escaped colons and backslashes in SSIDs, + embedded newlines, duplicate connection names, hidden SSID (empty name), + non-ASCII SSID, the wired-appears-mid-session case, and the multi-active case + (wifi + tether both up). + +* JSON schemas + +Versioned (="v": 1=) envelopes so tests lock the contract. Sketches (fields +nullable unless noted): + +- =status=: ={v, iface, type: wifi|ethernet|none, ssid, signal, ipv4, + gateway, throughput: {rx_bps, tx_bps}, connectivity: online|captive|no-internet|unknown, + connectivity_age_s, connectivity_class: fresh|stale|expired|unknown, state: + online|captive|no-internet|connecting|disconnected|airplane|wired|degraded}=. +- =probe=: ={v, result: online|captive|no-internet|unknown, portal_url, http_code, + redirect_host, elapsed_ms, ts}=. +- =list=: ={v, connections: [{uuid, name, type, active, last_used, signal, + in_range, security}]}=. +- =diagnose=: ={v, steps: [<diagnostic step, see contract>], overall: + ok|warn|fail}=. +- =speedtest=: ={v, down_mbps, up_mbps, ping_ms, server, elapsed_ms, ts}=. +- error envelope (any command): ={v, error: {code, message, detail, partial: + bool}}= with a non-zero exit. + +* Diagnostics contract + +=net diagnose --json= returns an ordered list of steps. Each step is the unit the +panel renders and the log records: + +- =id= — stable identifier (e.g. =link=, =dhcp=, =gateway-ping=, =dns-config=, + =dns-resolve=, =http-probe=, =portal=). +- =status= — =pending | running | pass | warn | fail | skipped=. +- =title= — short human label. +- =evidence= — redacted detail (the value seen), per the redaction rules. +- =elapsed_ms=. +- =safety= — =read-only= or =mutating= (diagnose steps are all read-only). +- =next_action= — what the user/agent should do on warn/fail (e.g. "open portal", + "reset connection", "switch network"). + +Repair actions (=net repair=) carry the same shape but =safety: mutating=, plus a +=cleanup_verified: bool= field (e.g. the DNS override was reverted) and a +terminal =cleanup-unverified= status when revert can't be confirmed. + +** Diagnose vs Repair (read-only vs mutating) +The panel separates them visually and behaviorally: +- *Diagnose* — probe, gateway ping, DNS config read, captive check. No state + change, no sudo, runnable freely. +- *Repair* — reset (fresh MAC, deletes+recreates the NM profile), DNS override + test (mutates resolver, auto-reverts), portal force. Each needs an explicit + confirm, shows that it's privacy/state-changing, and verifies cleanup. A + Repair whose cleanup can't be verified ends in a visible =cleanup-unverified= + state, never a silent success. + +* Failure states, messages, recovery + +Each row below gives the *exact, final* user-facing string (not a template) with +=<placeholders>= for redacted evidence, plus the evidence field included and the +next action. The string is canonical: every surface renders the same text, so +there's one source of truth. + +Per-surface rendering of the canonical string: +- *Indicator* — the matching glyph + CSS class; the string is the tooltip + (untruncated). +- *Notification* (=notify=) — title = "Networking"; body = the failure label on + its own line, then the canonical string. +- *CLI* — the string on stderr; =--json= puts it in =error.message= with the + evidence in =error.detail= and a stable =error.code=. +- *Panel* — the string as the section banner, with the diagnostic step's evidence + shown beneath. +Evidence is always redacted per the redaction rules (SSID/host shown; PSK/EAP/ +portal tokens never). + +- *associated, no DHCP* — "Connected to <SSID>, no IP (DHCP failed)" → + evidence: SSID, iface → reset / reconnect. +- *no-internet* — "On <SSID>, no internet (gateway reachable, no route out)" → + diagnose / switch network. +- *captive* — "Captive portal at <host> — login required" → Open portal. +- *DNS hijack* — "DNS is being redirected (portal)" → Open portal. +- *DNS broken* — "DNS not resolving (hotel DNS down); 1.1.1.1 works" → use + override / report. +- *HTTP intercepted* — "Traffic is being intercepted before it leaves" → Open + portal. +- *sudo declined* — "Reset needs admin; it was declined — nothing changed" → + retry with auth. +- *command timed out* — "<op> timed out; the system was left unchanged" → retry. +- *partial mutation* — "<op> partially applied: <what>; rolled back to <state>" + → review. +- *missing speedtest-go* — "speedtest-go not installed" → install hint. +- *no wifi hardware* (desktop) — wifi rows hidden; ethernet-only view. +- *wifi rfkill-blocked* — "WiFi is blocked (rfkill)" → unblock. The indicator + detects a soft-blocked radio (=rfkill list= shows the radio off though hardware + is present) and shows this distinct from disconnected. =net repair rfkill= (and + =net doctor --fix= as its first step) runs =rfkill unblock wifi= + =nmcli radio + wifi on= and reconnects. This is the framework-laptop case: an out-of-power + shutdown sometimes leaves wifi soft-blocked at next boot, and yes — the module + recovers it (the rfkill state is the indicator; the rfkill repair / doctor is + the one-step fix). A *hard* block (physical switch) is reported as + not-recoverable-in-software with that message. +- *wifi rfkill hard-blocked* — "WiFi is blocked by the hardware switch" → + evidence: rfkill hard state → flip the physical switch. +- *wrong password / missing secret* — "Saved password for <SSID> was rejected" → + evidence: SSID, NM auth-failure reason → re-enter the password. +- *enterprise auth/cert failure* — "Enterprise login failed for <SSID> (802.1X)" + → evidence: SSID, EAP failure reason → edit the profile in nmtui/nmcli. +- *upstream / AP / provider* — "On <SSID>, link is fine but the network has no + uplink" → evidence: gateway reachable, no route out, not a portal → switch + network or contact the venue. +- *VPN-routed* — "Connected; internet is routed through a VPN (<dev>)" → + evidence: default route on a tun/wg device or non-NM DNS owner → check the VPN, + not WiFi. +- *HTTP interception, no parseable portal URL* — "A portal is intercepting + traffic but didn't give a login link" → evidence: HTTP code, redirect host → + opens neverssl + the gateway page to log in manually. +- *DNS override cleanup unverified* — "Couldn't confirm DNS was restored after the + test" → evidence: iface, attempted revert → revert DNS manually + (=resolvectl revert <iface>=). +- *VPN kill-switch blocking* — "A VPN kill-switch is blocking all traffic, and the + VPN itself is down" → evidence: a block artifact present with no tunnel up → bring + the VPN back, or clear the kill-switch (the exact root command surfaced, not + auto-run). + +*VPN kill-switch detection + correction.* A kill-switch blocks all non-VPN egress when +the tunnel drops, so the link looks up (wifi, IP, gateway) but nothing reaches the +internet. This extends the =deferred-vpn= branch: when a VPN is active and the probe +fails, run a rootless cascade to tell a working tunnel from a kill-switch that's +blocking because the tunnel is down — +- =ip rule= for wg-quick's =not fwmark 0xca6c= + =suppress_prefixlength 0= (and the + PostUp =REJECT ! -o %i= rule that makes it leak-proof); +- =wg show= for an up tunnel interface; +- =nmcli connection show= for Proton's =pvpn-killswitch= / =pvpn-ipv6leak-protection= + (device =pvpnksintrf0=); +- =nft list ruleset= / =iptables -S OUTPUT= for a drop/reject table (=killswitch=, + =protonvpn=, =oifname != "wg0" ... drop=); +- =nmcli -f connection.zone= for a firewalld =drop= zone. +Classify *kill-switch-blocking* only when a block artifact exists AND no tunnel +interface is up — that's what distinguishes it from a healthy VPN. Correction is tiered +by artifact and every option needs root, so surface the exact command rather than +auto-running it: =wg-quick down <iface>=, =nmcli connection delete pvpn-killswitch +pvpn-ipv6leak-protection=, =nft delete table inet killswitch=, or =nmcli connection +modify <con> connection.zone ''=. (Sits alongside the Phase 5 VPN work; detection can +land earlier since =deferred-vpn= already exists.) + +Each message names whether the system was left unchanged, partially changed (with +what), or fully changed, so the user knows the residue. + +* Doctor: escalation, classification, terminal states + +=net doctor= diagnoses, classifies the failure, then (with =--fix=) applies the +*lightest* repair that fits and re-checks — it never loops destructive repairs +against a failure they can't fix. Each failure resolves to one of four outcomes, +and the doctor stops at any terminal one: + +- =fixable= — a local repair should help. Escalate lightest-first: rfkill-unblock + → reset (fresh MAC) → bounce (full stack) → portal, re-probing after each, and + stop as soon as the probe returns online. +- =needs-user-action= (terminal) — no reset/bounce will help; doctor stops and + names the exact next step. Covers: wrong WPA password / missing NM secret + (enter the password), locked keyring or polkit denial (retry with auth), + enterprise 802.1X cert/identity failure (edit the profile in =nmtui=/=nmcli=), + captive portal login-required (open the portal + accept terms). Doctor must not + delete/recreate the profile against these — that loses the saved password and + makes things worse. +- =upstream-not-local= (terminal) — the local link is up but the problem is past + it: AP has no uplink, gateway down/dropping traffic, DHCP server broken, ISP + outage, portal backend failing. =diagnose= proves it (link up + IP + gateway + reachable, but no route out and not a captive redirect), and =doctor --fix= + stops after local repairs are exhausted with "local repairs tried; likely + upstream/AP/provider" + the evidence. Next action: switch networks or contact + the venue. +- =deferred/vpn= (terminal for v1) — an active VPN / policy route / non-NM + resolver owns the default route or DNS, so "no internet" may be the VPN's fault, + not WiFi's. v1 *detects* this (default route on a =tun/wg= device, or DNS owned + by something other than the NM link) and classifies it separately — "link is + fine; internet is VPN-routed" — rather than misclassifying it as a WiFi failure. + v1 does not repair it (VPN management is Phase 5); it names the VPN as the likely + owner and stops. + +** DNS handling in doctor (explicit per class) +- *Captive DNS hijack* — open the portal (the hijack clears on login). No DNS + mutation. +- *Broken resolver, 1.1.1.1 works* — the shipped =dns-test= repair is *diagnostic*: + it sets 1.1.1.1, confirms the venue resolver is the culprit, then auto-reverts + (=cleanup_verified=). Because it reverts, =doctor --fix= does not currently leave + you online in this case — it falls through to =upstream-not-local=, which + misreports a locally-fixable problem. *V2 fix (planned):* on a dns-test *pass* + (public DNS works), set a PERSISTENT resolver override and verify online, with an + offered revert — and classify it as its own outcome rather than upstream. +- *Port-53 / egress blocked* (even 1.1.1.1 fails) — terminal =upstream-not-local=; + doctor stops, since it's not locally fixable. + +* Failure-mode coverage + +*V2 note (2026-06-30):* the authoritative, exhaustive catalog (~44 modes across 10 +connectivity layers, edge cases included, each tagged fix-and-verify or report-text) +now lives in the redesign task (todo.org "Network panel redesign"). The table below is +the v1 baseline; two rows reflect intent the shipped code doesn't yet match, and the +v2 catalog closes them: =gateway unreachable= claims a bounce that doctor never +actually reaches (a no-route failure goes straight to =upstream-not-local=), and +=broken DNS, 1.1.1.1 works= auto-reverts so the user is left offline and misreported +as upstream (the v2 persistent-override fix closes this). + +For each common field failure: does =net diagnose= detect it, can =net doctor +--fix= repair it, and what terminal user action remains when it can't. (The +=needs-user-action= / =upstream-not-local= / =deferred/vpn= outcomes are defined +above.) + +| Failure mode | diagnose detects | doctor --fix | terminal user action | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| rfkill soft block | yes | yes (unblock) | none | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| rfkill hard block | yes | no | flip the physical switch | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| no wifi hardware | yes | n/a | use ethernet | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| associated, no DHCP | yes | yes (reset/bounce) | none, else switch network | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| gateway unreachable | yes | yes (bounce) | switch network if it | +| | | | persists | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| captive DNS hijack | yes | opens portal | log in at the portal | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| broken DNS, 1.1.1.1 works | yes | yes (temp override, | report the venue's DNS | +| | | auto-reverted) | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| HTTP captive portal | yes | opens portal | log in at the portal | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| HTTP interception, no | yes | opens neverssl + gateway | log in manually | +| parseable URL | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| upstream / AP outage | yes (link up, no route out) | no (stops after local) | switch network / contact | +| | | | venue | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| wrong WPA password / | yes | no | enter the password | +| missing secret | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| enterprise auth / cert | yes | no | edit the profile in | +| failure | | | nmtui/nmcli | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| duplicate SSID / | yes (UUID-keyed) | yes (activate by UUID) | none | +| connection-name | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| hidden SSID | yes | yes (connect by name) | enter SSID + password | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| multiple active links | yes | n/a | pick the interface | +| (wifi+tether) | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| wedged NetworkManager | yes | yes (bounce → restart NM) | none, else reboot | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| slow / hung command | yes (degraded) | retries within budget | retry | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| stale / corrupt cache | yes | self-heals (atomic + | none | +| | | invalidation) | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| DNS cleanup failure | yes | flags cleanup-unverified | revert DNS manually | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| missing speedtest backend | yes | n/a | install speedtest-go | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| +| VPN / policy-routing | yes (route/DNS ownership) | no (deferred to Phase 5) | check the VPN | +| interference | | | | +|----------------------------+-----------------------------+-----------------------------+-----------------------------| + +* Observability — logging + redaction + +- *Event log*: JSONL at =$XDG_STATE_HOME/net/events.jsonl= (fallback + =~/.local/state/net/events.jsonl=), size-rotated (e.g. 1 MB × 3). Every + mutating op and probe appends an event: =ts, op, argv (redacted), exit_code, + stderr_tail, elapsed_ms, iface, nm_uuid, probe_url_class, http_code, + redirect_host, cache_event=. +- *Redaction (always on)*: PSKs, EAP identities/passwords, NM secrets, and + portal query tokens are never logged. MAC addresses, full IPs, and SSID are + redacted when configured (=redact_mac=, =redact_ip=, =redact_ssid= in config). +- *Post-failure diagnosis*: =net doctor --json= is the snapshot + recommendation + (diagnose plus the suggested repair), =net diagnose --json= the raw report, and + the event log the history. =net doctor= is the console-recoverable entry point + (reachable as =make online= / =make net-doctor=). +- *Secret-leak tests*: assert no PSK/EAP/portal-token ever appears in any JSON + output, log line, or error message. + +** Automatic diagnostic verbose-capture (V2) + +A distinct layer from the event log above: that log records what =net= did; +this captures what the *underlying stack* did at debug verbosity during a run, so a +failed diagnosis leaves real ground-truth instead of relying on memory. Two triggers, +one mechanism: + +- *Automatic — on a failing diagnose.* When =net diagnose= ends =overall: fail=, the + next escalation (or =Get Me Online=) runs inside a verbose-capture session. +- *Manual — a debug on/off toggle in the panel's Advanced section.* "Debug on" + elevates and leaves it elevated (with a visible "debug capturing" indicator) so the + user can reproduce an intermittent problem over time; "Debug off" restores and + writes the bundle. Useful when the failure doesn't reproduce inside one diagnose. + +Mechanism (shared): +1. *Snapshot* the current log levels (=nmcli general logging=, resolved's level, + wpa_supplicant's). +2. *Elevate* the relevant components to debug at runtime, no restarts, scoped to the + domains that matter (NM: =WIFI,DHCP,DNS,CORE=; resolved; wpa_supplicant). +3. *Run* the diagnostics / repair. +4. *Capture the window*: =journalctl= for NetworkManager + systemd-resolved + + wpa_supplicant since the run started, a =dmesg= tail (driver / firmware / rfkill), + and any =curl -v= probe output. +5. *Restore* every level to its snapshot. +6. *Write a redacted support bundle* to =$XDG_STATE_HOME/net/bundles/<ts>/= and + surface it in the panel. + +Hard requirements: +- *Restore is guaranteed and idempotent.* A =try/finally= restores even on error, + and a crash-recovery guard detects "a prior run left NM/resolved/wpa_supplicant + elevated" on the next run and puts it back — the same shape as the DoT-restore + watcher. A crash must never strand the stack at debug verbosity. +- *Redaction before anything leaves.* Raw wpa_supplicant and NM debug logs carry the + PSK and EAP credentials in cleartext. The captured journal is scrubbed before the + bundle is written, shown, or shared; the secret-leak test asserts no passphrase or + EAP secret survives into a bundle. +- *Privilege via the V2 sudo-helper.* The log-level toggles need root, so they become + verbs on the passwordless helper (decision 16) — no extra prompt. + +Bonus — this closes a real detection gap, not just observability: the spec notes live +auth-failure detection is a v1 limit (it leans on a one-shot NM state-120 snapshot). +wpa_supplicant at debug during the run is exactly how a wrong-password or EAP failure +is caught properly, so the capture feeds back into classification. + +* Indicator (task #C — Phase 1, the fast win) + +** States (internet sub-state on top of link state) +- online — associated and the probe returned 204. Normal icon. +- captive — associated, probe hit a portal. Distinct glyph + warning CSS class; + tooltip names the portal host; left-click opens diagnostics with the portal + ready to open (Phase 2+; see interactions for the Phase-1 interim). +- no-internet — associated, probe failed (no portal, no 204). Distinct glyph + + warning class. +- degraded — =net status= couldn't read link state within budget (slow/failed + nmcli). Neutral glyph, =net-degraded= class. Never blocks the bar. +- rfkill-blocked — the radio is soft-blocked (=rfkill=), distinct from + disconnected. Distinct glyph; the fix is =net repair rfkill= / =net doctor=. +- connecting / disconnected / airplane / wired — as today, plus wired shown + correctly even when it appears after session start. (airplane is now this + module's state, absorbed from the retired airplane module.) + +** Glyphs +Nerd-font codepoints, final values verified live before merge (same discipline +as wtimer). Reuse the signal-strength ramp already in =waybar-netspeed=; add a +captive / no-internet / degraded overlay glyph. + +** Tooltip +SSID + signal + IPv4 + gateway + the throughput readout (absorbed from +netspeed) + the last probe result and its age (stale/expired hinted). + +** Interactions (no keyboard-modifier clicks — waybar can't qualify clicks by +modifier, so the rich actions live in the panel, not ctrl/super-click) +Clicks never block the bar: each dispatches a detached background job, single-flight +per action. *As built (settled live with Craig, 2026-06-29):* +- *left* — =net-panel= toggle (pkill-or-launch the GTK panel). +- *middle* — =net portal= (the captive-login flow). +- *right* — =net-fix= (=net doctor= with =--notify=: reports the result when the + outcome is one-way, opens a terminal only when it's fixable; the v2 redesign moves + even that into the panel). +- airplane toggle moved off the bar to Super+Shift+A. + +* Panel (tasks #B + #C diagnostics — Phases 2-3) + +GTK4 + gtk4-layer-shell, pocketbook scaffold (src-layout package, unittest, +Makefile, gtk4-layer-shell anchored dropdown under the bar). One panel shell, +reused by the future desktop-settings panel. + +Sections as built (Phases 1-3, a four-page stack switcher): +1. *Connections* — list, MRU-first, active marked, live signal bars for in-range + wifi; row click switches; buttons for add / edit / remove; a rescan control. +2. *Diagnose* (read-only) — Probe (204/captive, shows portal URL + Open), Gateway + ping, DNS config. Streaming step output (the diagnostics contract). +3. *Repair* (mutating, confirmed) — tiered lightest-first: Unblock rfkill, Reset + (fresh MAC), Bounce (full stack), DNS override test, Force portal. A "Get me + online" button runs =net doctor --fix= (the auto-escalating sequence). +4. *Speed test* — Run button, progress, down/up/ping result + last-run line. +As built, the panel also auto-hides on focus-out (popup behavior, suppressed while a +child dialog holds focus) and carries a Close button bottom-right (2026-06-30). + +*V2 nav (planned):* three top tabs — Connections | Diagnostics | Performance. +Diagnostics merges the Diagnose and Repair pages into one: a sub-row +=Diagnose= | =Get Me Online= | =Advanced= over a shared area that shows diagnose +items and streams repair progress in-panel (no terminal). =Advanced= reveals the +individual repair tiers (renamed, with tooltips) plus a *Debug capture on/off* +toggle (the manual side of the verbose-capture feature; a failing diagnose triggers +it automatically). Speed test moves under Performance. + +** V2 panel UX — the target design + +The shipped four-page stack (Connections / Diagnose / Repair / Speed test) is +*history*, not active design. V2 is the sole current target: one panel opened from +the bar, three top tabs — Connections | Diagnostics | Performance — and the page +model below is the contract for what gets built and what gets deleted, not just for +labels. + +*** Connections — saved vs available, join-from-row +Three labelled groups, never one merged list: +- *Saved* — saved NM profiles, MRU-first, rendered instantly without a scan. +- *Available now* — scan-backed in-range SSIDs with signal + security; may carry a + loading/stale hint; unsaved networks appear here. +- *Wired* — ethernet when a wired device is present. +=net list= already yields this (=connections.py= lists saved MRU-first, merges live +signal/security for in-range saved profiles, then appends unsaved in-range SSIDs with +=uuid: nil=); the panel groups and labels it. *Rescan refreshes only the +availability/signal layer* — it never gates or reloads the Saved list. + +*Progressive loading:* render the Saved group immediately on open, then overlay +availability, signal, and the unsaved Available-now rows when the scan returns. Show a +small scan-in-progress state (elapsed + last-scan age). A slow or bad radio scan must +not make the whole panel feel stuck — this is the direct answer to "why does it take +so long to see my connections?" + +*Join-from-row (no Add page):* selecting an unsaved Available-now row *is* the join +flow — SSID and security come prefilled from the scan, never retyped. Open networks +connect (confirm only if needed); WPA/WPA2/WPA3-Personal ask only for the password. +The standalone Add button + modal are deleted for visible networks. A hidden/manual +SSID join lives behind an Advanced "Join hidden network" affordance. + +*** Supported authentication classes (the join matrix) +From the scanned NM =SECURITY= value, V2 handles: +- *Inline-supported* — open, open-with-captive-portal, WPA/WPA2/WPA3-Personal + (PSK/SAE), and WPA2/WPA3 transition mode. The row shows the security label so the + user knows why a password is or isn't asked. +- *Activate-only* — 802.1X / enterprise: connect if already saved, else "edit in + nmtui/nmcli" (no add form in v1/V2, per decision 9). +- *Hidden / manual* — behind the Advanced "Join hidden network" affordance. +- *Rare / unsupported* — WEP, OWE/enhanced-open, MAC-registration, voucher, or + proxy-required: a clear in-panel explanation ("not supported here yet") plus a + non-terminal next step, never a hand-off to a terminal tool. + +*** Diagnostics owns the diagnostic story +Diagnostics holds the read-only checks, the repair stream, Get Me Online, debug +capture (Advanced), and the doctor report. A *lightweight* latency/throughput probe +runs inline as a Diagnose evidence row when internet is available (skipped offline, on +a metered/hotspot warning, or with no backend), and its result is stored in the doctor +report. The *full* speed test stays under Performance (decision 19) — which is also +the home for future throughput history, so Performance earns its tab rather than being +a lone button. + +*** Forget confirmation — future tense + verified +The destructive copy is future tense and names the scope: "This will remove the saved +NetworkManager profile and its stored password from this machine." After the op, +verify the UUID is gone, refresh the Saved list, and report "Forgot <SSID>" or "Could +not forget <SSID>; nothing changed / partial <evidence>" — the verify-every-action +decision applied to a destructive op. + +*** Findable diagnostics report +Every diagnose, repair, and speed/performance run ends with a "Copy report" / "Open +report" action in Diagnostics. The report carries the step statuses + elapsed, the +final classification, the last speed/latency result when available, scan age, +route/interface owner, the redacted event-log tail, and the bundle path when verbose +capture ran. It states explicitly whether any repair mutated state and whether +cleanup/verification passed. "Logs exist somewhere" isn't enough when the network is +already down — the report is the one artifact the user copies to hand over. + +*** Visual contract — a Waybar-attached popup +The panel reads as part of the bar, not a separate app. Match the live Waybar theme: +the dark rounded capsule (=border-radius: 1rem=), the golden border, compact monospace +text, and the =custom/net= state colors. Avoid square corners next to rounded UI, keep +cards out of cards, and use compact icon+label controls with tooltips for the advanced +repairs. Reuse any existing archsetup-owned GTK/panel conventions. (Non-blocking for +engine work; blocks final V2 UX acceptance.) + +** Panel state, cancellation, permissions +State machines for: connection-list loading, rescan-in-progress, +activation-in-progress, diagnose-running, repair-running, speedtest-running. Plus +the real terminal states on this two-machine fleet: no-wifi-hardware (desktop → +ethernet-only view) and missing speedtest-go. (No GPG-key state — there's no +credential store; secrets live in NM.) ("No NetworkManager" is not a modeled +state — NM is always present +on these machines; if nmcli is somehow absent the panel shows a single hard-error +and exits.) Long operations show elapsed time and are cancellable where the +underlying op allows (rescan, speedtest, probe); clearly non-cancellable ones +(an in-flight activation) show elapsed + a disabled control. Permission-denied +(sudo/polkit declined) is a first-class outcome with the "nothing changed" +message, never a silent failure. + +Interaction-pattern catalog (=~/code/rulesets/patterns/=) principles that apply: +- transient-state-buttons — all the network levers in one place, reachable by + one chord (the bar click), state visible. +- default-most-common-friction-proportional — connections MRU-ordered so the + common pick is first; destructive ops (remove) and privacy-changing ones + (reset, override) get a confirm, switching does not. +- one-prompt-picker-typed-prefix — if the connection picker ever goes + keyboard-driven, kind (wifi/eth/saved/in-range) + name in one typed picker. + +** Panel UX flow (settle before Phase 2) +The concrete interaction defaults, so the GTK build isn't inventing them: +- *Default focus*: the Connections section, current connection's row selected. If + the indicator opened the panel because of a captive/no-internet state, focus + Diagnose instead with the relevant action highlighted. +- *Row content*: glyph (signal bars / wired / active check) + name + a secondary + line (security type, "active"/last-used). The active row is visually pinned at + top of its group. +- *Buttons*: one *primary* per section (Connections: Connect to the selected row; + Diagnose: Run diagnose; Repair: "Get me online"; Speed test: Run). Secondary + actions (add / edit / remove / rescan; individual repair tiers) are smaller and + grouped. +- *Disabled rules*: Connect disabled on the already-active row; Repair tiers + disabled while one runs; Speed test disabled while running; add/edit disabled + for enterprise (with the "edit in nmtui/nmcli" hint). +- *Confirmations* (exact wording): Reset → "Reset <SSID>? This drops the + connection and reconnects with a new MAC."; Bounce → "Restart networking? All + links drop briefly."; DNS override → "Temporarily set DNS to 1.1.1.1 for the + test? It reverts automatically."; Remove → "Forget <SSID>? The saved password is + deleted." +- *"Get me online" reporting*: shows each escalation step live (Unblock rfkill → + Reset → Bounce → Portal) with per-step pass/fail and stops at the first that + restores internet or at a terminal state, naming the next action. +- *After close*: the bar reflects the new state immediately (signal/refresh on + next poll); a running speedtest/diagnose keeps running and notifies on finish + (panel close doesn't cancel it). +- *Keyboard*: Esc closes (wired); arrows move row focus and Enter activates a + row (GTK ListBox defaults — row-activate connects, never disconnects); Tab is + the plain GTK focus chain, widget by widget (inside a list it crawls row by + row — no section jumps); there is NO type-to-filter. Verified live via + targeted-key AT-SPI probe 2026-07-02; the earlier tab-between-sections and + type-to-filter claims were aspirational and are struck. If section-jump Tab + or filtering is ever wanted, it's a new task, not an existing behavior. + +* Connection management (nmcli) + +- Every op via nmcli per the nmcli contract above (terse, escaped, UUID-keyed, + bounded =--wait=). +- MRU ordering from NM's =connection.timestamp= (last activated), descending. +- Ethernet appears in the list whenever a wired device is present, selectable at + any time; switching just brings the chosen connection up. +- *Mutation safety + rollback*: switching keeps the current connection up until + the new one activates successfully (=nmcli --wait 30=); on failure it does not + tear down the working link, surfaces the failure, and leaves the prior + connection active. =net down= notes that NM may auto-reactivate a profile and + reports the post-op active connection so the user isn't surprised. A switch that + needs a password it doesn't have prompts (or fails with "password required"), + never silently strands. The exact NM command sequence (preflight active-state + read → activate target → verify default route → on failure, confirm prior + still up) is pinned in the engine and tested against fake nmcli. +- *Add/edit scope*: open + WPA-PSK only in v1. Existing saved profiles of any + type (including enterprise) can be *activated*; editing an enterprise profile + shows "edit via nmtui/nmcli" rather than a broken partial form. + +* Connection secrets (no separate store) + +Per Craig's call: don't build a parallel credential store. Settings and secrets +live where NetworkManager already keeps them, so there's one source of truth and +no extra dependency (no GPG, no gpg-agent, no =~/.config/net/connections=). + +- *Where secrets live*: =/etc/NetworkManager/system-connections/<name>.nmconnection=, + root-owned =0600=, with the PSK/EAP secret stored inline (the default + =secret-flags=0= "owned by NM"). That's already secure-at-rest (root-only) and + is what =nmcli= reads/writes. +- *How we touch them*: every add/edit/remove goes through =nmcli= (=connection add + / modify / delete=), which writes the =.nmconnection= with the right ownership + and perms. We never read or write =system-connections= files directly (root) and + never copy a secret out of them. +- *No export / import / sync* — there's nothing to sync. A new machine gets its + connections the way it always has (the user joins, or restores NM profiles), + not from a tool-specific vault. +- *config file*: =~/.config/net/config= still exists, but only for non-secret + preferences (speedtest server, redaction flags, probe TTL). It holds no + credentials. +- *No secret leakage*: PSK/EAP never appear in =net=' =--json= output, the event + log, or error text (tested) — even though NM is the store, our surfaces must not + echo a secret =nmcli= happens to return. + +* Speed test + +- Backend: *=speedtest-go=* (=--json=, =--server=, =--no-download/--no-upload=), + already installed on velox (AUR =speedtest-go-bin=). No new dependency for v1. + librespeed-cli is the documented fallback for a self-hosted LibreSpeed server. +- =net speedtest --json= parses speedtest-go's JSON into the =speedtest= schema. +- *Server policy*: auto-select nearest by default; allow a pinned server id in + =~/.config/net/config=. +- *Timeout + cancellation*: a hard run timeout (e.g. 60s); the panel run is + cancellable (kills the child). Offline / rate-limited / no-server errors map to + the failure-message table. +- *Tests*: fixture JSON (success) and fixture stderr (offline, no server, + malformed output) drive =net speedtest= parsing without touching the network. + +* Help + documentation + +In-app help has three layers, each reachable in the situation it's needed: + +- *CLI help (works from a dead-GUI TTY)*: =net --help= lists the subcommands in + one screen; =net <cmd> --help= documents each (flags, what it mutates, the + console-recovery targets). The Makefile targets are self-describing (=make help= + lists =online= / =net-doctor= / etc. with one-line descriptions). This is the + layer that matters most when you're at a console with no network. +- *Panel help (in the GUI)*: a small =?= affordance in the panel header opens an + inline help pane — what each section does, which Repair actions mutate state, + what the indicator glyphs/colors mean. Per-control tooltips on the less-obvious + buttons (rfkill, bounce, DNS override). No external help browser. +- *User guide (the durable doc)*: a README / docs page covering every command, + the indicator states + glyphs, the panel sections, the config file keys, the + recovery make targets, troubleshooting (the failure-message table), and + rollback. Written so a future session — or Craig six months out — can operate + and recover the module from the doc alone. + +The failure-message table above is the single source of truth for the +troubleshooting text; the guide and the panel help both render from it rather +than restating it. + +* Enhancement radar + +Low-cost adjacent affordances, each dispositioned so cheap wins aren't lost and +the v1 panel stays focused. (Several are already in v1 by virtue of other +sections; marked here so the consideration is visible.) + +| Enhancement | Disposition | Reason | +|-------------------------------------+-------------+--------------------------------------------------------| +| Open / copy portal URL | v1 | already in the captive flow; trivial Open + Copy | +|-------------------------------------+-------------+--------------------------------------------------------| +| Forget network | v1 | it's the remove op, already specced | +|-------------------------------------+-------------+--------------------------------------------------------| +| Rescan now | v1 | already a Connections control | +|-------------------------------------+-------------+--------------------------------------------------------| +| Retry with hardware MAC | v1 | captive already has --hardware-mac; expose in Repair | +|-------------------------------------+-------------+--------------------------------------------------------| +| Pin speedtest server | v1 | already a config key | +|-------------------------------------+-------------+--------------------------------------------------------| +| Copy redacted doctor report | v1 | cheap, serves the observability/support goal | +|-------------------------------------+-------------+--------------------------------------------------------| +| Show last good network / result | vNext | needs small history persistence | +|-------------------------------------+-------------+--------------------------------------------------------| +| Watch mode for net doctor | vNext | a --watch loop; handy at a TTY, not v1-critical | +|-------------------------------------+-------------+--------------------------------------------------------| +| Actionable desktop notifications | vNext | dunst supports actions; extra wiring | +|-------------------------------------+-------------+--------------------------------------------------------| +| Keyboard connection picker (fuzzel) | vNext | the typed-prefix pattern; panel covers v1 | +|-------------------------------------+-------------+--------------------------------------------------------| +| QR-code share / import WiFi | rejected | low value for a personal 2-machine setup; phones do QR | +|-------------------------------------+-------------+--------------------------------------------------------| + +* Waybar wiring + +- Replace =custom/netspeed= with =custom/net= in the bar's module list (same + slot). +- Module def: =exec: waybar-net=, =return-type: json=, =interval: 2=, a =signal= + for on-demand refresh (next free signal after wtimer's 14), =on-click=, + =on-click-right=, =on-click-middle= per the phase-aware interactions (each + dispatches a detached job, never blocks). +- Remove the old =on-click: pypr toggle network= scratchpad only once the panel + replaces it (Phase 2); Phase 1 keeps it as the interim manager. + +* Testing plan (TDD) + +- *Engine (normal)* — fake =nmcli= + =curl= + =speedtest-go= on PATH; assert + command sequences and parsed/emitted JSON for status, list, up/down, + add/edit/remove, probe, diagnose, repair, speedtest. Pure state/format + functions tested directly. JSON schemas locked by example. +- *Portal parser* — already covered in =tests/captive= (Normal/Boundary/Error + + the real SONIFI body). The engine's native probe reuses the same cases. +- *nmcli parsing* — escaped colon/backslash/newline in SSID, duplicate names, + hidden SSID, non-ASCII, wired-mid-session, multi-active (wifi+tether). +- *Failure + concurrency (the risky classes)* — slow/hung nmcli/curl/speedtest + (degraded state within budget), concurrent =net status= probe refresh + (single-flight), corrupt cache (recovered), stale cache after SSID change + (invalidated), permission denied / sudo declined, DNS-override cleanup failure + (=cleanup-unverified=), NM partial activation (rollback keeps prior link), + secret redaction, missing speedtest-go, no wifi hardware, rfkill soft/hard + block. +- *Doctor classification* — fixture-driven =net doctor= over fake nmcli/curl + asserting the right terminal classification + that =--fix= stops before + destructive repairs: auth failures (=needs-user-action=), upstream/AP failure + (=upstream-not-local=), VPN-routed failure (=deferred/vpn=), and the DNS classes + (hijack → portal, broken-but-1.1.1.1-works → offered override, egress-blocked → + upstream). Assert the failure-mode coverage table's "detects / repairs / terminal + action" holds for each row. +- *Indicator* — drive =net status --json= through =waybar-net=, assert the JSON + per state (online / captive / no-internet / degraded / wired / disconnected / + rfkill), iface override via env. +- *Panel* — pocketbook-style: backing logic (list ordering, op dispatch, + state-machine transitions), not GTK widgets. +- *NM secrets / no-leak* — add/edit writes the secret into NM via nmcli (asserted + against fake nmcli, never to a tool-owned file); assert no PSK/EAP appears in any + =--json=, log line, or error (there is no credential store to round-trip). +- *Live checklist (gated out of the suite)* — a "Manual testing and validation" + task per phase for the real-network states (captive at a hotel, no-internet, + switch under load, reset, speedtest) that can't be faked. + +** Harness + coverage gate +The concrete contract, matching the repo's existing convention (not pytest — the +dotfiles suites are =unittest=, run by =make test= as =python3 -m unittest= over +=tests/*/test_*.py=; 33 suites today): +- *Framework*: =unittest=. Each suite is =tests/<name>/test_<name>.py= + (=tests/net/=, =tests/waybar-net/=), collected by the existing =make test= loop + — no new runner, no pytest dependency. +- *Fakes on a temp PATH*: =fake-nmcli=, =fake-curl=, =fake-speedtest-go=, + =fake-rfkill=, =fake-resolvectl= live as executable stubs in =tests/<name>/= + (the =tests/layout-navigate/fake-hyprctl= pattern). A fixture file encodes the + command→canned-output map and the stub appends each invocation to a log the test + asserts against. Subprocess timeouts are simulated by a stub that sleeps past the + budget; =net status= must still return the degraded state. +- *Waybar wrappers end-to-end*: =waybar-net= is run as a subprocess with the fake + PATH and the env overrides (iface, cache path), asserting the emitted JSON — same + as =tests/waybar-netspeed=. +- *Coverage*: coverage.py is absent system-wide (and not importable), so coverage + runs in a throwaway venv (=python3 -m venv=, =pip install coverage=, =coverage + run -m unittest=, =coverage report=) — the method the wtimer suite used (95%). + Target: *branch* coverage over =net/= and the wrapper, ≥ 90% on the pure + classifier/parser modules. + +** Coverage as a gap-finder, not a number (per phase) +Line coverage alone misses the branches that matter here, so each phase ends with +a *coverage-gap pass*, not just a percentage: +- After the first green run, read the branch report and map every uncovered branch + to either a new test or a consciously-excluded live-only behavior (with a comment + or a Manual-testing entry naming it). +- *Branch coverage is required* for the pure logic: the doctor classifier (every + outcome — fixable / needs-user-action / upstream-not-local / deferred-vpn), the + cleanup-unverified path, the redaction paths, the degraded hot-path fallback, the + timeout branches, and the portal/nmcli parsers. +- A phase isn't "done" until its coverage-gap pass is recorded — uncovered logic is + either tested or explicitly excused, never silently uncovered. + +* Files touched (planned, all in =~/.dotfiles=) + +- =net/= package (src-layout, like pocketbook) — engine + panel. +- =hyprland/.local/bin/waybar-net= — the indicator (replaces =waybar-netspeed=). +- =hyprland/.local/bin/net= — engine CLI entry (console-script shim). +- =hyprland/.config/waybar/config= — swap =custom/netspeed= → =custom/net=; + remove =custom/airplane=. +- =hyprland/.config/waybar/style.css= — captive / no-internet / degraded / + rfkill classes; remove airplane classes. +- =tests/net/=, =tests/waybar-net/= — suites. +- =captive= — refactor: extract probe + reset into functions callable + non-interactively (a =--json= probe mode) so the engine reuses them. +- =~/.config/net/config= — seed config (probe TTL, speedtest server, redaction + flags). No secrets; not a credential store. +- dotfiles =Makefile= — add the console-recovery targets (=online=, =net-doctor=, + =net-status=, =net-diagnose=, =net-portal=, =net-reset=, =net-bounce=). +- *Deletions once net ships* (the airplane module is absorbed): + =hyprland/.local/bin/waybar-airplane=, =hyprland/.local/bin/airplane-mode=, + =tests/waybar-airplane/=, =tests/airplane-mode/=, and the =custom/airplane= + module + its css. +- archsetup Hyprland step — add =gtk4-layer-shell=, =python-gobject=, + =speedtest-go-bin= to the install lists (the only archsetup change; no =gpg= + added, secrets stay in NM's store). + +* Resolved decisions (Craig's calls + this response) + +1. Panel UI tech → GTK4 + gtk4-layer-shell, shared pocketbook scaffold (one + panel shell, reused by the desktop-settings sibling). +2. Engine language → Python =net= package; shells out to =captive= for the + portal-force flow, native cheap probe for the bar path. +3. Connectivity probe → split cadence (fast link poll every 2s + slow cached + internet/captive probe, TTL ~45s) with single-flight + atomic cache. +4. No keyboard-modifier clicks (waybar can't qualify them) — the panel hosts the + rich actions; bar clicks dispatch detached jobs (phase-aware). +5. No separate credential store (Craig's call, cj). Secrets live in NM's own + =system-connections= (root =0600=, inline), touched via nmcli. No GPG, no + gpg-agent, no =~/.config/net/connections=. Supersedes the earlier GPG-store + design. +6. =custom/netspeed= absorbed into =custom/net=; throughput moves to the tooltip. +7. Speed-test backend → =speedtest-go= (already installed), not a new + librespeed-cli dependency; librespeed-cli is the self-hosted fallback. +8. Code lives in the dotfiles repo; archsetup only installs deps. +9. v1 add/edit scope = open + WPA-PSK; enterprise/802.1X is activate-only, + add/edit is vNext (settled by Craig 2026-06-29 — no enterprise networks in his + history, so the form would be unused UI). +10. =net doctor= is in v1 (Craig's call, cj) — a one-shot diagnose+fix mode, + reachable from a TTY via =make online= / =make net-doctor=. (The earlier + "defer the doctor/bundle command" decision is reversed.) +11. Diagnose (read-only) and Repair (mutating, confirmed) are separated in the + panel and the CLI; Repair is tiered lightest-first (rfkill → reset → bounce). +12. =custom/net= absorbs the airplane module (Craig's call, cj). *As built + (2026-06-29, option 1): display-only.* net shows the airplane state (reads + the airplane-mode state file); the =airplane-mode= low-power toggle is kept + (radios + CPU + brightness + services is not a network concern) and moved to + =custom/net='s right-click + signal 15. Only the redundant display pieces — + =waybar-airplane=, =custom/airplane=, and the retired =waybar-netspeed= — + plus their tests/css were deleted. The earlier "delete airplane-mode" framing + is superseded. +13. Repair includes a full-stack bounce and an rfkill-unblock (Craig's calls, + cj) — the latter recovers the framework-laptop post-power-loss soft-block. +14. VPN / WireGuard is a planned Phase 5 (Craig's call, cj), not a permanent + exclusion. + +V2 redesign decisions (Craig, 2026-06-30): + +15. *No terminals anywhere in the module* — =net-popup= is removed; every action and + result renders in the panel. No terminal is ever used to report information to the + user or to collect input from them: every prompt, confirmation, repair stream, and + result lives in the panel UI (Craig, cj, 2026-06-30). Reverses the part of decision + 11 that ran privileged repairs in a terminal "so sudo/polkit can prompt". (Unrelated + to the doctor's "terminal states" — that word means a final outcome, not a tty. The + one open question is the dead-GUI console-recovery path; see the VERIFY in todo.org.) +16. *Passwordless privileged path* — a root-owned helper + a narrow NOPASSWD sudoers + rule scoped to it, archsetup-installed, run as =sudo <helper> <verb>=. This gates + decision 15 (a worker thread can't prompt). Absorbs the earlier DoT-toggle + follow-up and fixes the detached-restore-watcher bug. +17. *Verify every action* — each mutating op (repair, connect, forget, add, DNS + override) re-checks its effect and surfaces pass/fail in the panel. +18. *Detect + respond to every failure mode, edges included* — the full ~44-mode + catalog (todo.org redesign task) is the contract; auto-fix where safe, else report + the exact in-panel text. Includes IPv6-only awareness and multi-homing, which need + diagnose to stop being IPv4-only and single-iface. +19. *Navigation* — top tabs Connections | Diagnostics | Performance; Diagnostics + merges Diagnose + Repair (Diagnose | Get Me Online | Advanced over a shared + streaming area); Speed test under Performance. +20. *Automatic diagnostic verbose-capture* (Craig, 2026-06-30) — on a failing + diagnose, elevate the underlying stack (NM / resolved / wpa_supplicant) to debug, + capture the journal + dmesg window, restore (guaranteed + crash-guarded), and + write a redacted bundle. Plus a manual Debug on/off toggle in Advanced. Restore + bulletproof, secrets scrubbed before the bundle, log-level toggles via the V2 + helper. See Observability. + +* Implementation phases + +*Phases 1-3 are SHIPPED* (2026-06-29 → 2026-06-30, dotfiles); their acceptance +criteria passed and the work is live on velox. Phase 4 (docs/rollout) and Phase 5 +(VPN) remain. The V2 redesign phases at the end are designed, not yet built. + +- *Phase 1 — Indicator + console recovery (task #C).* =net status= + =net probe= + (native cheap probe, reusing captive's logic) + the =captive= probe refactor + + =waybar-net= + the split-cadence cache (single-flight, atomic, stale classes) + + CSS states (incl. rfkill) + performance budget. Plus the CLI-only recovery path: + =net repair= tiers (rfkill / reset / bounce), =net doctor [--fix]=, and the + Makefile targets (=make online= etc.) — all testable without the GTK panel. + Absorbs the airplane state and removes the standalone airplane module. Interim + left-click keeps the existing scratchpad until the panel lands. + - *Acceptance*: fresh-login waybar smoke test shows correct state on + online/captive/no-internet/wired/rfkill; =net status= stays within budget + under a fake slow nmcli (degraded state); =net doctor --fix= recovers a + soft-blocked radio from a TTY; the live captive checklist passes at a real + portal; the airplane state works and the old airplane module is gone; + reverting = swap =custom/netspeed= + =custom/airplane= back. +- *Phase 2 — Panel shell + connection management (task #B core).* GTK4 + layer-shell scaffold + =net list/up/down/add/edit/remove/rescan= + MRU list + + mutation safety/rollback + panel state machines. + - *Acceptance*: switch wifi↔wifi and ethernet↔wifi without stranding; a failed + switch leaves the prior link up; add/edit open + WPA-PSK writes the secret to + NM; remove confirms; panel states render for loading/rescan/activation. +- *Phase 3 — Diagnostics + speed test in the panel.* Wire =net diagnose= / + =net repair= / =net doctor= / =net portal= / =net speedtest= into the Diagnose + vs Repair sections; the "Get me online" button; portal Open button; speedtest + progress + cancel. + - *Acceptance*: diagnose runs read-only; each repair tier confirms + verifies + cleanup (DNS override reverts, shown); speedtest result parses from + speedtest-go and a fixture-driven failure shows the right message. +- *Phase 4 — Docs + rollout.* In-app help (=net --help= / per-command help, the + panel help affordance), README/user-guide (commands, panel, config, + troubleshooting, the make targets, rollback), and the manual dep step on ratio. + - *Acceptance*: =net --help= and each subcommand's help are complete; the + user-guide covers every command + the recovery targets; ratio rollout + documented. +- *Phase 5 — VPN / WireGuard (future).* Fold the existing archsetup wireguard + tooling into the same panel + CLI (=net vpn ...=). Out of the v1 milestone; + specced separately when picked up. + +V2 redesign phases (designed 2026-06-30, dependency order): +- *V2.1 — Sudo helper + NOPASSWD sudoers (gates everything).* Root-owned helper + dispatching net's fixed privileged verbs, archsetup-installed, narrow sudoers. + Also fixes the detached DoT-restore-watcher bug. + - *Acceptance*: every repair runs passwordless in-panel on a non-NOPASSWD machine; + the sudoers rule is scoped to the helper only. +- *V2.2 — Merged Diagnostics panel + nav restructure.* Connections | Diagnostics | + Performance; the Diagnostics sub-row + shared streaming area; Advanced reveal + + tooltips; delete =net-popup=. + - *Acceptance*: no terminal opens for any action; repair progress streams in the + panel; Speed test lives under Performance. +- *V2.3 — IPv6-aware and multi-homing-aware diagnose.* Stop treating no-IPv4 as a + failure when online over IPv6; identify which interface owns the default route. +- *V2.4 — Close every detect/correct gap in the catalog, with post-action + verification.* Work the redesign-task catalog mode by mode. +- *V2.5 — Automatic diagnostic verbose-capture.* Snapshot/elevate/capture/restore + around a failing diagnose + the Advanced Debug on/off toggle; guaranteed + + crash-guarded restore; redacted support bundle; helper log-level verbs. + - *Acceptance*: a failing diagnose leaves a redacted bundle (NM/resolved/ + wpa_supplicant journal + dmesg) and restores every log level; a crash mid-capture + is detected and restored on the next run; the secret-leak test finds no PSK/EAP in + a bundle; the toggle elevates and restores on demand. + +* Open items / risks + +- gtk4-layer-shell dropdown anchoring under a waybar module needs the same + positioning work pocketbook solved; reuse it. (Phase 2.) +- The =captive= refactor must keep the standalone CLI behavior identical while + exposing a non-interactive =--json= probe; covered by the existing + =tests/captive= suite plus new probe-mode tests. (Phase 1.) +- speedtest-go server selection variance (nearest-server flor) — pin a server in + config if results are noisy. (Phase 3.) +- The background-probe kick from =net status= must be truly non-blocking (spawn + + detach); enforced by the single-flight lock and the performance benchmark test. + +* Rollback + +Each phase is independent. The indicator (Phase 1) is a drop-in replacement for +=custom/netspeed= (and =custom/airplane=); reverting is swapping those modules +back in the config and restoring their scripts. The panel is additive — not +wiring its clicks leaves the bar working as before. No credential store to roll +back (secrets stay in NM throughout). + +* Review findings [40/40] + +** DONE Define the structured diagnostics contract :blocking: +The spec says the engine "emits JSON" and that diagnostics "reuse =captive= +verbatim", but the current =~/.dotfiles/common/.local/bin/captive= flow is a +human-readable bash script that mixes diagnostics, sudo prompts, DNS mutation, +browser launch, and terminal prose. A GTK panel cannot reliably turn that into +clear state, progress, cancellation, or useful error messages. Define the +machine contract before implementation: every diagnostic step should have a +stable id, status (=pending/running/pass/warn/fail/skipped=), redacted evidence, +elapsed time, safety outcome, and next action. Keep =captive= as the interactive +CLI, but either refactor reusable probe/reset functions behind =net diagnose +--json= or make =captive= expose a non-interactive JSON mode. This blocks the +panel and logging work because otherwise the implementer must invent the +boundary. + +Disposition: accept — added the "Diagnostics contract" section (per-step id / +status / evidence / elapsed / safety / next_action) and the =captive= =--json= +probe-mode refactor under Architecture + Files touched. + +** DONE Specify user-facing failure messages and recovery actions :blocking: +The spec names failure states like =no-internet=, =captive=, failed probe, +failed reset, missing DNS, and missing speed-test backend, but it does not define +the messages the user sees or what each message tells them to do next. For this +feature, "error" is not enough: a user needs to know whether WiFi is associated, +whether DHCP succeeded, whether DNS is hijacked/broken, whether HTTP is +intercepted, whether sudo was declined, whether a command timed out, and whether +the system was left unchanged or partially changed. Add a message table for the +indicator, panel, and CLI with: failure class, visible text, evidence included, +redaction rule, and next action. This is blocking because UX quality here is the +product, not an implementation detail. + +Disposition: accept — added the "Failure states, messages, recovery" section +covering each class, the visible message, the "what changed" residue note, and +the next action across indicator/panel/CLI. + +** DONE Define the debug log and redacted support bundle :blocking: +There is no observability section. When this fails in a hotel or cafe, an agent +needs enough evidence to diagnose it without rerunning destructive actions. Add +log location, rotation/retention, JSONL event schema, command argv logging, +exit-code/stderr capture, elapsed time, selected iface, NM active connection +UUID, probe URL class, HTTP code, redirect host, DNS servers, and cache +read/write events. Also define a =net doctor --json= or =net debug-bundle= +command that emits redacted status, recent log events, dependency versions, and +a reproduction command. Redact SSID if configured, MAC addresses, portal query +tokens, PSKs, EAP identities/passwords, IPs when requested, and all GPG/NM +secrets. This blocks implementation readiness because post-failure diagnosis is +currently left to ad hoc terminal spelunking. + +Disposition: modify — accepted the JSONL event log, the schema, and the redaction +rules in full (new "Observability" section). Deferred the dedicated =net +debug-bundle= / =net doctor= command to vNext: for a single-user tool =net +diagnose --json= (the snapshot) plus the event log (the history) cover +post-failure diagnosis; a bundle command is gold-plating for v1. Recorded under +Out + Resolved decision 10. + +** DONE Pin the nmcli parsing and timeout contract :blocking: +The spec lists nmcli operations but not the exact fields, output modes, escaping +rules, ID semantics, or timeouts. This is risky because SSIDs and connection +names can contain spaces, colons, duplicates, hidden names, and non-ASCII; the +current =waybar-netspeed= already had an SSID parsing bug. The nmcli manual +documents =--terse=, =--get-values=, =--escape=, =--wait=, ID/UUID/path +selection, =passwd-file=, and built-in connectivity states +(=none/portal/limited/full/unknown=) at +https://man.archlinux.org/man/nmcli.1.en. The spec should require UUIDs for +saved-profile operations, explicit =--wait= budgets, parser tests for escaped +colons/backslashes/newlines/duplicate names/hidden SSIDs, and a decision on when +to use or ignore =nmcli networking connectivity [check]=. This is blocking +because the command wrapper is the core reliability boundary. + +Disposition: accept — added the "nmcli contract" section: terse + =--escape= + +=--get-values=, UUID-keyed ops, explicit =--wait= budgets, NM connectivity as a +cheap hint (our probe authoritative), and the parser test matrix. + +** DONE Define cache concurrency, atomicity, and stale-state behavior :blocking: +=net status= may spawn =net probe= whenever the cache is stale, but the spec +does not define locking, process coalescing, atomic writes, crash cleanup, or +what happens when the probe hangs. With a 2s Waybar interval, a bad network could +start overlapping probes, corrupt the runtime cache, or keep showing stale +"online" while the link is gone. Add a single-flight lock under +=$XDG_RUNTIME_DIR/waybar=, atomic write+rename for cache updates, max probe +runtime, stale age classes (fresh/stale/expired/unknown), cache invalidation on +iface/SSID/connection UUID change, and tests for concurrent =net status= calls. +This blocks the fast-path design because it is the main performance and +correctness risk. + +Disposition: accept — added "Concurrency, atomicity, staleness" under the +Connectivity model: flock single-flight, temp+rename atomic write, ≤6s probe +timeout, fresh/stale/expired/unknown classes, iface/SSID/UUID invalidation, stale +lock reclaim, plus concurrency tests in the test plan. + +** DONE Bound hot-path performance with measured budgets :blocking: +The spec says the cheap poll should be sub-100ms, but the proposed fast path +still may call multiple =nmcli= commands every two seconds, read sysfs, parse +throughput, and maybe spawn a background probe. The existing =waybar-netspeed= +had a deliberate sleep for throughput sampling; replacing it must define how +throughput is sampled without sleeping in the bar path. Add a per-command budget +for =waybar-net= and =net status=, a maximum number of subprocesses on the hot +path, a timeout for every subprocess, benchmark tests with fake slow =nmcli=, +and a rule that the indicator emits a degraded JSON state rather than blocking. +This is blocking because Waybar custom modules can visibly freeze or lag when +their exec path stalls. + +Disposition: accept — added the "Performance budgets" section: <100ms typical / +<250ms worst, throughput sampled across the poll interval (no in-process sleep), +one nmcli call max on the hot path, timeouts on every subprocess, the degraded +state, and a fake-slow-nmcli benchmark test. + +** DONE Make click actions non-blocking and visible :blocking: +Waybar right-click runs =net reset= and middle-click runs =net portal= directly. +Those operations can require sudo, open browsers, mutate DNS, delete/recreate NM +profiles, or hang on network commands, but Waybar click handlers provide no +panel, terminal, progress, or cancellation surface by default. Define whether +right/middle click instead opens the panel focused on the action, dispatches a +background job with notifications, or is removed from v1. If kept, specify +single-flight behavior, how sudo/polkit prompts surface, how success/failure is +reported, and how the user can inspect logs. This blocks UX readiness because +the fastest remediation path is currently the easiest place to hide failure. + +Disposition: modify — accepted the concern; made the interactions phase-aware and +non-blocking. Every click dispatches a detached, single-flight background job and +reports via =notify=; sudo surfaces through polkit/the normal prompt; failures go +to the notify + the event log. In Phase 1 (no panel) left-click runs probe + +notify and keeps the scratchpad; from Phase 2 left-click opens the panel focused +on the action. Recorded in the Indicator "Interactions" subsection. + +** DONE Specify connection mutation safety and rollback :blocking: +The spec says row click switches connections and remove gets a confirm, but it +does not define what happens when a switch partially succeeds, disconnects the +current working link, needs a password, loses the default route, or triggers +auto-activation. The nmcli manual warns that =connection down= does not prevent +future auto-activation and may internally block a profile until user action. +Define preflight, the exact NM command sequence, whether the old active +connection is kept until the new one proves usable, when rollback is attempted, +how long activation waits, and what the panel says when rollback fails. This is +blocking because the module can strand the user offline. + +Disposition: accept — added "Mutation safety + rollback" under Connection +management: keep the prior link up until the target activates (=--wait 30=), no +teardown on failure, password-required surfaced not stranded, =net down= reports +post-op active state + the auto-reactivation caveat, and the pinned NM command +sequence is tested against fake nmcli. + +** DONE Define the credential-store security model :blocking: +The GPG store is described as optional and default-unencrypted, but the spec does +not define file modes, schema, secret-source rules, import/export prompts, +recipient verification, stale secret handling, or what is logged. It also says +NM remains source of truth while the user-owned store contains PSK/EAP secrets, +which creates two truth sources for sensitive data. Add a precise schema, +=0600= file creation with parent-dir permissions, encrypted-recipient checks, +plaintext warning text, explicit opt-in flow, redaction requirements, behavior +when NM has a secret not in the store, behavior when the store has a secret NM +rejects, and tests for no secret leakage in JSON/logs/errors. This blocks Phase +4 and the full spec because otherwise the implementer must make security +decisions mid-code. + +Disposition: accept — rewrote "Credential storage" with the versioned schema, +=0600= file / =0700= dir, recipient verification on opt-in, the plaintext +warning, secret-source rule (entered/exported, never harvested from root store), +the two-source reconciliation policy (NM wins live, store wins for what NM +lacks, stale-secret flagging), and the no-leak tests. + +** DONE Define EAP, enterprise WiFi, and unsupported connection behavior :blocking: +The store says "PSK/EAP" and connection management says add/edit, but there is +no v1 contract for WPA-Enterprise fields, certificates, identity vs anonymous +identity, hidden networks, static IP, proxy settings, metered flags, MAC +randomization, or 802.1X prompt behavior. Either scope v1 to open/WPA-PSK plus +existing saved-profile activation, or define the minimum EAP form and the +unsupported-state messages. This blocks add/edit/import because enterprise WiFi +is too sensitive to hand-wave. + +Disposition: modify (scope) — scoped v1 to open + WPA-PSK add/edit, with +*activation* of any existing saved profile (including enterprise). Enterprise / +802.1X add/edit, static-IP, proxy, metered, and MAC-randomization editing are +vNext, shown as "edit via nmtui/nmcli". Recorded in Scope/Out, Connection +management, and Resolved decision 9. + +** DONE Split read-only diagnostics from mutating remediation :blocking: +The panel's diagnostics section includes probe, bounce/reset, gateway ping, and +DNS override test in one area, while =captive= currently performs resets and +temporary DNS changes as part of its flow. Users need to know which buttons are +read-only and which mutate NM profiles, MAC mode, DNS, or browser state. Add +separate "Diagnose" and "Repair" actions, confirmations for destructive or +privacy-changing operations, explicit cleanup verification for DNS override, and +a terminal state when cleanup is unverified. This blocks readiness because +network repair must not surprise the user or leave hidden residue. + +Disposition: accept — split the panel into a read-only Diagnose section and a +confirmed, mutating Repair section (and split the CLI into =net diagnose= vs =net +repair=). Added =cleanup_verified= + a terminal =cleanup-unverified= state to the +diagnostics contract. + +** DONE Define panel state, cancellation, and permissions UX :blocking: +The panel sections list buttons and a streaming output area, but not loading +states, disabled states, empty states, keyboard/focus behavior, cancellation, or +permission-denied handling. Add panel state machines for connection list loading, +rescan in progress, activation in progress, diagnostics running, speedtest +running, and no NetworkManager/no WiFi/no permissions/no GPG key/no +librespeed-cli. Each long operation should be cancellable where possible or +clearly non-cancellable with an elapsed-time display. This blocks the GTK work +because without it the implementer must invent the user flow. + +Disposition: modify — accepted the state-machine requirement (added "Panel state, +cancellation, permissions"), but scoped the state set to what can actually occur +on the two-machine fleet: dropped "no NetworkManager" as a modeled state (NM is +always present; a missing nmcli is a single hard-error exit) and kept +no-wifi-hardware, missing speedtest-go, no-GPG-key, plus the in-progress states +with elapsed-time + cancellation where the op allows. + +** DONE Verify speed-test dependency, server choice, and failure contract :blocking: +The spec chooses =librespeed-cli= and notes availability/default-server research +as an open risk, but Phase 3 still depends on parsing its JSON and showing +progress. I checked the upstream project page +(https://github.com/librespeed/speedtest-cli) and the AUR URL named by search is +not sufficient as a verified package/install contract in this spec. Add the +exact package name/source to install, command version expected, JSON shape, +server-selection policy, timeout, cancellation behavior, offline/rate-limited +messages, and tests with fixture JSON and fixture stderr. This blocks Phase 3 +because speed-test failure modes are otherwise undefined. + +Disposition: modify — verified live and changed the backend: =speedtest-go= (AUR +=speedtest-go-bin=, 1.x) is already installed on velox and supports =--json=, +=--server=, =--no-download/--no-upload=, so v1 needs no new dependency. +librespeed-cli (AUR =librespeed-cli= / =-bin=) is the documented self-hosted +fallback. Added the "Speed test" section with server policy, timeout, +cancellation, the failure-message mapping, and fixture-JSON/stderr tests. + +** DONE Define dependency installation and repo boundaries :blocking: +The files touched section alternates between archsetup paths and the external +dotfiles repo, while pocketbook has been folded into this repo and its previous +archsetup provisioning was intentionally removed. The spec should state where +the =net= package actually lives, which repository owns the scripts/tests, +whether =gtk4-layer-shell=, =python-gobject=, =librespeed-cli=, =gpg=, =nmcli=, +=curl=, and =resolvectl= are installed by archsetup or assumed present, and the +Makefile targets for test/lint/install. This blocks implementation because the +current path plan can produce code that is not installed on a fresh machine. + +Disposition: accept — added the "Repository + dependencies" section: all code in +=~/.dotfiles= (=net/= package in-tree like pocketbook, scripts in the hyprland +tier, tests under =tests/=), archsetup owns only the dep install +(=gtk4-layer-shell=, =python-gobject=, =speedtest-go-bin=; nmcli/curl/resolvectl +already present), Makefile =make test= collects the package suite, and a +daily-drivers note for ratio. Rewrote Files touched to match. + +** DONE Expand the test plan for failure, concurrency, and live verification :blocking: +The testing plan covers normal parsing and fake command sequences, but it misses +the riskiest behaviors: slow/hung =nmcli=/=curl=/=librespeed=, concurrent +=net status= cache refresh, corrupt cache, stale cache after SSID change, +permission denied, sudo declined, DNS override cleanup failure, NM partial +activation, duplicate connection names, secret redaction, missing optional +dependencies, no WiFi hardware, wired+tether+WiFi ambiguity, portal redirect +tokens, and Waybar click handlers. Add unit/fixture tests for each class plus a +manual/live checklist gated out of the normal suite. This is blocking because +the current plan would leave the exact "things that can go wrong here" mostly +untested. + +Disposition: accept — rewrote the Testing plan with the "Failure + concurrency" +class (slow/hung commands, single-flight, corrupt/stale cache, perm-denied, +cleanup-failure, partial activation, redaction, missing deps, no-wifi, +multi-active) and a per-phase live checklist gated out of the suite. + +** DONE Define status JSON schemas and compatibility rules +The spec says all subcommands take =--json= but does not define schemas. Add +versioned JSON examples for =status=, =probe=, =list=, =diagnose=, =speedtest=, +and error envelopes, including nullable fields and unknown/degraded states. This +is non-blocking for product direction but should be fixed before code so tests +can lock the CLI contract. + +Disposition: accept — added the "JSON schemas" section with versioned (=v:1=) +envelopes for status / probe / list / diagnose / speedtest and a shared error +envelope, including the degraded/unknown states. + +** DONE Rename or alias the phasing section for workflow compatibility +The spec has a usable =Phasing= section, but the spec-review workflow expects an +=Implementation phases= section that can be lifted into =todo.org=. Rename it or +add an alias heading during response. This is non-blocking because the existing +phase decomposition is understandable, but aligning the heading prevents future +workflow friction. + +Disposition: accept — renamed =Phasing= → =Implementation phases= and added +per-phase acceptance criteria. + +** DONE Add documentation and rollout acceptance checks +Rollback is described, but docs and rollout are thin. Add README/user-guide +updates for commands, panel behavior, config file, GPG opt-in, troubleshooting, +and rollback; add acceptance checks for each phase, including a fresh-login +Waybar smoke test and restoring =custom/netspeed=. This is non-blocking but +important for handing the feature to a future session without re-discovery. + +Disposition: accept — added per-phase acceptance criteria under Implementation +phases (incl. the fresh-login waybar smoke test and the =custom/netspeed= +restore), a Phase 4 "Docs + rollout", and (answering Craig's cj follow-up) a +dedicated "Help + documentation" section with the three help layers (CLI help, +panel help affordance, user guide). + +** DONE Add a failure-mode coverage table :blocking: +The spec now names many individual network failures, but it still does not carry +one compact coverage matrix that says, for each common failure mode, whether +=net diagnose= detects it, whether =net doctor --fix= can repair it, and what +terminal user action remains when it cannot. Add a table covering at least: +rfkill soft block, rfkill hard block, no WiFi hardware, associated/no DHCP, +gateway unreachable, captive DNS hijack, broken DNS where 1.1.1.1 works, HTTP +portal, HTTP interception without a parseable portal URL, upstream/AP outage, +wrong WPA password or missing secret, enterprise auth/cert failure, duplicate +SSID/connection-name ambiguity, hidden SSID, multiple active links, wedged +NetworkManager, slow/hung command, stale/corrupt cache, DNS cleanup failure, +missing speedtest backend, and VPN/routing interference. This blocks because +Craig asked for confidence that the diagnostics and doctor cover the real field +failures, and prose scattered across sections is too easy to misread. + +Disposition: accept — added the "Failure-mode coverage" section: a 22-row table +(every mode the finding named) with detect / doctor-fix / terminal-action +columns, conformed to the org-table standard (rules under every row, ≤120). + +** DONE Pin DNS repair semantics in doctor :blocking: +The spec diagnoses DNS hijack, broken hotel DNS, and the temporary 1.1.1.1 +override test, but =net doctor --fix= does not say whether it merely recommends +the override, applies a temporary override during recovery, or leaves DNS alone +after diagnosis. Define the exact behavior for each DNS class: captive hijack +should open the portal, broken DNS where 1.1.1.1 works should either offer an +explicit temporary repair with cleanup verification or recommend the command, +and port-53/egress blocking should stop as upstream/not locally fixable. This is +blocking because DNS is one of the most common "connected but unusable" failures +and the current doctor contract is ambiguous. + +Disposition: accept — added "DNS handling in doctor (explicit per class)" under +the new Doctor section: hijack → open portal (no DNS mutation); broken-but-1.1.1.1 +→ explicit temporary override with cleanup verification under =--fix=, recommend +otherwise; egress-blocked → terminal =upstream-not-local=. + +** DONE Make auth failures terminal user-action states :blocking: +Wrong WPA password, missing NM secret, locked keyring/polkit denial, enterprise +802.1X certificate/identity failure, and portal login-required are not fixed by +resetting or bouncing NetworkManager. The doctor sequence should classify these +as =needs-user-action= terminal states, stop before looping through destructive +repairs, and tell the user the exact next action (enter password, edit profile in +=nmtui=/=nmcli=, accept portal terms, provide cert/identity, or retry with +admin auth). This blocks because repeated reset/bounce against auth failures is +slow, noisy, and can make the network state worse without helping. + +Disposition: accept — added the =needs-user-action= terminal outcome to the +Doctor section: wrong password / missing secret / keyring-or-polkit denial / +802.1X cert-or-identity failure / portal-login-required all stop the doctor before +any destructive repair and name the exact next step. + +** DONE Define upstream/AP/provider failure terminal states :blocking: +Some failures are not client-repairable: AP has no uplink, hotel gateway is +down, DHCP server is broken, gateway drops traffic, ISP outage, or captive +portal backend is failing. The spec should define how =diagnose= proves "local +link is up but upstream is broken" and how =doctor --fix= stops after local +repairs are exhausted with a clear message like "local repairs tried; likely +upstream/AP/provider" plus the evidence. This blocks because users need to know +when to stop poking the laptop and switch networks or contact the venue. + +Disposition: accept — added the =upstream-not-local= terminal outcome: diagnose +proves link-up + IP + gateway-reachable but no route out and no captive redirect; +=doctor --fix= stops after local repairs with "local repairs tried; likely +upstream/AP/provider" + evidence → switch network / contact venue. + +** DONE Decide how VPN and policy routing affect v1 diagnosis +VPN/WireGuard management is Phase 5, but active VPNs, policy routes, DNS +overrides, and firewall killswitches can break apparent internet access in v1. +The current spec does not say whether v1 detects active VPN/policy routing and +classifies "network is fine, VPN route/DNS is broken" separately from WiFi +failure. Add either a v1 diagnostic check for active VPN/default-route/DNS +ownership with a "deferred repair" outcome, or explicitly state that VPN-routed +failures are out of scope and may be misclassified. This is blocking if Craig +expects the module to diagnose normal daily-driver network failures while VPN +tooling remains separate. + +Disposition: accept (chose the detect-and-classify option) — v1 detects an active +VPN / non-NM default route / non-NM DNS owner and classifies =deferred/vpn= ("link +is fine; internet is VPN-routed"), distinct from a WiFi failure. v1 does not +repair it (VPN management is Phase 5); it names the VPN as the likely owner and +stops. Added to the Doctor section + the coverage table + a doctor-classification +test. + +** DONE Remove stale GPG-store references from the resolved spec +The spec now decides "no separate credential store; secrets live in +NetworkManager", but the Testing plan still mentions =gpg round-trip= and =GPG +store= tests, and the panel-state list still mentions a no-GPG-key state. Remove +those stale references and replace them with NM-secret/no-secret-leak tests. +This is non-blocking for product behavior but blocking for implementation +clarity: otherwise tests will be written for a credential store that no longer +exists. + +Disposition: accept — replaced the Testing-plan =gpg round-trip= / =GPG store= +bullets with an "NM secrets / no-leak" test (add/edit writes the secret via nmcli; +assert no PSK/EAP in any JSON/log/error; no store to round-trip) and dropped the +=no-GPG-key= panel state. Residue from the cj-comment pass that dropped the store. + +** DONE Reconcile status, goal, and task text before implementation :blocking: +The spec status says "Implementation-ready with caveats" and "Phase 1 ready to +build", but the body still has an unresolved enterprise add/edit VERIFY, the +Goal still says "optional GPG-encrypted secret store", and the unified task title +still names "GPG-stored secrets" even though the accepted design removed the +store. Before implementation, make the top-level status, goal, scope, task +mapping, and resolved decisions agree with the current design. This blocks +readiness because a developer starting from the top of the file would still build +or plan around abandoned GPG-store behavior. + +Disposition: accept — fixed the Goal ("secrets stay in NM's own store"), the +=[#B]= task-mapping line (notes the "GPG-stored secrets" framing is superseded by +decision 5), the enterprise VERIFY (now resolved → Status updated), and corrected +the stale =pytest= mentions to =unittest= (the repo's actual harness). Top-of-file +status/goal/scope/decisions now agree with the design. + +** DONE Resolve enterprise add/edit scope or make the caveat explicit :blocking: +The spec still says "One open question for Craig: pull enterprise add/edit into +v1?" and points to a VERIFY in =todo.org=. That is a real product-scope decision: +if enterprise add/edit is in v1, panel forms, nmcli command sequences, tests, +error messages, and docs change materially; if it is out, the UI must consistently +show activate-only with "edit in nmtui/nmcli". Decide it in the spec before +implementation, or downgrade the status to =Ready with caveats= with this exact +accepted caveat. As written, the spec cannot be plain =Ready=. + +Disposition: accept — Craig decided (2026-06-29): enterprise add/edit is vNext, +activate-only in v1. Settled in the Status line, the Scope/Out bullet, decision 9, +and the VERIFY (now DONE in todo.org). The UI shows activate-only with "edit in +nmtui/nmcli" consistently. Evidence: 24 saved profiles, 0 enterprise. + +** DONE Define the concrete test harness and coverage gate :blocking: +The spec says TDD, fake binaries on PATH, and benchmark tests, but it does not +define the actual harness contract: pytest vs unittest for the =net= package, +where fake =nmcli=/=curl=/=speedtest-go=/=rfkill=/=resolvectl= live, how test +fixtures encode command histories, how subprocess timeouts are simulated, how +Waybar scripts are executed end-to-end, and how coverage is run. Add the exact +Makefile targets (=test=, =test-unit= or package-local =pytest=), pytest config, +coverage command (e.g. branch coverage over =net/= and =waybar-net= wrappers), +minimum threshold, and the rule for reading the coverage report to add missing +tests before declaring a phase done. This blocks readiness because "what is the +test harness?" is still answerable only by analogy to older suites. + +Disposition: accept — added the "Harness + coverage gate" section. Corrected the +premise: the repo is =unittest= (=make test= → =python3 -m unittest=, 33 suites), +not pytest. Pinned the fake-binary stub convention (=tests/<name>/fake-*= on a +temp PATH), the fixture command→output map, timeout simulation, the end-to-end +=waybar-net= subprocess run, and coverage via a throwaway venv (coverage.py is +absent system-wide) with a ≥90% branch target on the pure modules. + +** DONE Use coverage to find missing behavior, not just report a percentage :blocking: +The spec does not say how coverage findings affect implementation. For this +feature, line coverage alone can miss the important holes: doctor classification +branches, cleanup-unverified paths, redaction paths, degraded hot-path fallbacks, +timeout branches, and auth/upstream/VPN terminal states. Define coverage review +criteria per phase: branch coverage for pure classifiers and parsers, named +untested branches allowed only with comments or manual-check entries, and a +required "coverage gap pass" after the first green test run that maps uncovered +logic back to tests or consciously excluded live-only behavior. This blocks +readiness because the current test plan is broad but does not force the suite to +expose missing edge tests. + +Disposition: accept — added the "Coverage as a gap-finder, not a number (per +phase)" subsection: branch coverage required for the doctor classifier (every +outcome), cleanup-unverified, redaction, degraded-fallback, timeout, and the +parsers; a mandatory coverage-gap pass after the first green run mapping each +uncovered branch to a test or a named live-only exclusion; a phase isn't done +until that pass is recorded. + +** DONE Convert error classes into exact user-facing strings and evidence fields :blocking: +The failure table and doctor outcomes classify errors well, but many messages +are still templates or descriptions rather than final text. Add exact strings +for indicator tooltip, notification, CLI stderr, JSON =error.message=, and panel +banner/step text for every failure-mode row, including cases doctor cannot fix: +wrong password, missing secret, enterprise cert failure, upstream/AP/provider +failure, VPN-routed failure, hard rfkill block, DNS cleanup failure, speedtest +missing, and HTTP interception without parseable URL. For each string, specify +the redacted evidence included and the next action. This blocks UX readiness +because "useful error" is only testable once the actual text and evidence are +defined. + +Disposition: accept — rewrote the Failure states section: each row now carries the +exact final string (with =<placeholder>= evidence), the evidence field, and the +next action, plus a per-surface rendering rule (indicator tooltip / notify / +CLI+JSON error.message+detail+code / panel banner all render the one canonical +string). Added the missing doctor-unfixable rows: hard rfkill, wrong password / +missing secret, enterprise cert failure, upstream/AP/provider, VPN-routed, HTTP +interception without a parseable URL, and DNS cleanup-unverified. + +** DONE Add an enhancement disposition table +The spec captures several good enhancements (doctor, Makefile recovery, rfkill, +airplane absorption, VPN phase), but it does not show that low-cost adjacent +enhancements were considered and accepted/deferred/rejected. Add a small radar +table for likely affordances: copy redacted doctor report, open/copy portal URL, +retry with hardware MAC, forget network, rescan now, pin speedtest server, show +last good network/result, watch mode for =net doctor=, desktop notification +actions, QR-code/share WiFi import/export, and keyboard picker. Mark each +=v1=, =vNext=, or =rejected= with a one-line reason. This is non-blocking, but it +prevents accidental loss of cheap UX wins and keeps the v1 panel focused. + +Disposition: accept — added the "Enhancement radar" table dispositioning all the +named affordances: open/copy portal URL, forget network, rescan, hardware-MAC +retry, pin speedtest server, copy redacted doctor report = v1; last-good +network/result, doctor watch mode, actionable notifications, keyboard picker = +vNext; QR-share = rejected (low value for a 2-machine personal setup). + +** DONE Tighten the panel UX flow before Phase 2 +The panel has sections and state machines, but not a concrete interaction flow: +default focused section, row content, primary/secondary buttons, disabled-state +rules, confirmation wording for reset/bounce/DNS override, how "Get me online" +reports each escalation, what stays visible after the panel closes, and keyboard +navigation. Add a short UX flow spec or wire-level outline before Phase 2. This +is non-blocking for Phase 1, but it blocks Phase 2 implementation because a GTK +panel can easily become noisy or surprising if these defaults are invented while +coding. + +Disposition: accept — added the "Panel UX flow (settle before Phase 2)" +subsection: default focus (Connections, or Diagnose when opened from a captive +state), row content, one primary button per section, disabled-state rules, exact +confirmation wording for reset/bounce/DNS-override/remove, the live "Get me +online" escalation reporting, what survives panel close, and keyboard nav. + +** DONE Reconcile the panel navigation source of truth :blocking: +Disposition: accept — folded into "V2 panel UX". V2 (Connections | Diagnostics | +Performance) is the sole current target; the shipped four-page stack is marked history, +not active design. +The spec now names at least three navigation shapes: the shipped four-page stack +(Connections / Diagnose / Repair / Speed test), the V2 three-tab plan +(Connections / Diagnostics / Performance), and the redesign task's Diagnostics +sub-row (Diagnose / Get Me Online / Advanced). That leaves an implementer free +to keep extra pages and buttons even though Craig is explicitly asking for the +opposite. Make V2 the sole current target: one panel opened from the bar, top +tabs =Connections | Diagnostics | Performance=, with Diagnostics owning the +read-only checks, repair stream, debug capture, doctor report, and related +diagnostic evidence. Mark the old four-page stack as shipped history only, not +active design. This blocks the redesign because the page model determines what +code is deleted, not just labels. + +** DONE Fold speed tests into the diagnostic story :blocking: +Disposition: modify — Craig pre-decided Speed test lives under Performance (decision +19), and Performance carries future throughput history, which meets this finding's own +"keep the tab only if it carries ongoing throughput" condition. Accepted the rest: +Diagnostics runs a lightweight inline latency/throughput probe as a Diagnose evidence +row (with skip conditions for offline / metered / no-backend), and the full speed +result is stored in the doctor report. Folded into "V2 panel UX → Diagnostics owns the +diagnostic story". +Speed test is currently isolated under =Performance=, while the Goal and user +mental model treat speed, latency, and packet loss as part of "diagnostics." +That split risks another top-level button/page whose only job is a diagnostic +measurement. Keep the top-level =Performance= tab only if it carries ongoing +throughput/history later; for V2, specify that Diagnostics can run a lightweight +performance check from the same Diagnose/Get Me Online flow when internet is +available, and that the full speed test is presented as a diagnostic evidence +row or secondary action rather than a separate repair-adjacent workflow. Define +when it is skipped (offline, metered/hotspot warning, missing backend) and how +the result is stored in the doctor report. This is blocking because otherwise +the implementation preserves avoidable navigation and misses a useful failure +signal. + +** DONE Define saved-list vs available-scan semantics :blocking: +Disposition: accept — folded into "V2 panel UX → Connections". Saved / Available now / +Wired groups; Rescan refreshes only the availability/signal layer, never the Saved +list. +=net list= merges saved profiles with in-range scanned networks, while the panel +copy calls the page "Connections" and the control "Rescan." It is not clear to a +user whether they are looking at saved connections, currently available +networks, or both. The current implementation confirms the ambiguity: +=connections.py= lists saved profiles MRU-first, merges live signal/security for +saved profiles that are in range, then appends unsaved in-range SSIDs with +=uuid: nil=. Rename and specify the groups: e.g. =Saved= (instant, does not +require scan), =Available now= (scan-backed, may still be loading/stale), and +=Wired=. =Scan= should refresh only the availability/signal layer, not gate the +saved profile list. This blocks readiness because it affects loading behavior, +button enablement, and whether unsaved rows can be selected. + +** DONE Replace the Add page with join-from-row behavior :blocking: +Disposition: accept — folded into "V2 panel UX → Connections". Selecting an unsaved +Available-now row is the join flow (SSID/security prefilled); the standalone Add modal +is deleted for visible networks; hidden/manual join lives behind Advanced. +The current Add dialog asks for an SSID as free text even though a scan usually +already found the SSID and security type. That is redundant UI and a common +network-manager mistake: it turns "join this visible network" into "copy a name +from the list and type it again." V2 should remove the standalone Add button and +modal for normal visible networks. Selecting an unsaved available row should +become the join flow: the SSID/security are prefilled from the row, open +networks connect with a confirmation only if needed, WPA/WPA2/WPA3-Personal ask +only for the password, and hidden/manual SSID is tucked behind an Advanced +"Join hidden network" affordance. Keep edit/create for enterprise profiles out +of v1/V2 unless explicitly added later. This blocks the redesign because it +changes the primary connection workflow and deletes a whole page/control. + +** DONE Pin the supported authentication types in the join flow :blocking: +Disposition: accept — folded into "V2 panel UX → Supported authentication classes". +The spec says "open + WPA-PSK" and "enterprise activate-only," but cafe/hotel +networks also commonly appear as open captive portals, WPA/WPA2/WPA3-Personal +(PSK/SAE), and sometimes transition-mode networks; less commonly they use +enterprise/802.1X, WEP, OWE/enhanced-open, MAC registration, voucher portals, or +proxy-required networks. Define the V2 join matrix from the scanned NM +=SECURITY= value: supported inline (open, captive/open, WPA/WPA2/WPA3 Personal), +activate-only if already saved (802.1X/enterprise), hidden-manual behind +Advanced, and unsupported/rare types with a clear in-panel explanation plus a +non-terminal next step. If an auth type is common enough to support, support it +in the panel; if it is too rare for V2, say "not supported here yet" and keep +the user in the same UI rather than sending them to a terminal tool. Also define +what security label appears in the row so the user knows why a password is or is +not requested. This blocks because the Add/Join deletion above cannot be +implemented safely without knowing which auth classes the simplified flow covers. + +** DONE Fix destructive confirmation tense and verification +Disposition: accept — folded into "V2 panel UX → Forget confirmation". +The Forget confirmation says "The saved password is deleted" before the user has +clicked Forget. That reads as if the destructive action already happened. Change +the copy to future tense and name the scope, e.g. "This will remove the saved +NetworkManager profile and its stored password from this machine." After the +operation, verify the UUID is gone, refresh the Saved list, and report either +"Forgot <SSID>" or "Could not forget <SSID>; nothing changed / partial state +<evidence>." This is non-blocking because the existing confirm prevents an +accidental click, but the wording is misleading and the V2 "verify every action" +decision should cover it. + +** DONE Make connection loading progressive and observable :blocking: +Disposition: accept — folded into "V2 panel UX → Connections (progressive loading)". +Opening the panel currently says "Loading connections..." while =net list= +collects both saved profiles and the WiFi scan. Saved profiles do not require a +network scan, so a slow scan should not delay the saved list. Split loading into +two phases: render saved NM profiles immediately, then overlay availability, +signal, and unsaved in-range rows when the scan completes. Show a small +scan-in-progress state with elapsed time and stale-last-scan age, and make +Rescan update only the scan-backed fields. This blocks because it is the direct +answer to "why does it take so long to see the list of connections?" and keeps a +bad radio scan from making the whole panel feel broken. + +** DONE Define the visual contract with Waybar and existing Archsetup UI +Disposition: accept — folded into "V2 panel UX → Visual contract". +The panel is a layer-shell popup anchored under Waybar, but the spec does not +state the visual contract. The live Waybar theme uses a dark rounded capsule +(=border-radius: 1rem=), golden border, compact monospace text, and state colors +for =custom/net=; the GTK panel currently has a generic title, stack switcher, +default GTK controls, and square-ish/default widget corners. Add a short style +section: panel should read as a Waybar-attached popup, not a separate app; match +Waybar's palette, border/radius, spacing density, and state colors; avoid square +corners where surrounding UI is rounded; keep cards out of cards; use compact +icon+label controls with tooltips for advanced repairs. Also cite any existing +Archsetup-owned GTK/panel conventions that should be reused. This is +non-blocking for engine work but should block final V2 UX acceptance. + +** DONE Add a diagnostics report affordance that users can actually find +Disposition: accept — folded into "V2 panel UX → Findable diagnostics report". +The observability design has a JSONL event log, =diagnose --json=, automatic +verbose capture, and redacted bundles, but the panel flow does not yet define +the user affordance that turns those into an inspectable diagnosis. Add a +Diagnostics-side "Copy report" / "Open report" action after every diagnose, +repair, and speed/performance run. The report should include the current step +statuses, elapsed time, final classification, last speed/latency result when +available, scan age, route/interface owner, relevant redacted event-log tail, +and bundle path when verbose capture ran. It must explicitly say whether any +repair mutated state and whether cleanup/verification passed. This is blocking +for observability because "logs exist somewhere" is not enough when the network +is already failing. + +* Review and iteration history + +** 2026-06-29 Mon @ 17:00:39 -0400 — Codex — reviewer + +- *What changed or was recommended:* Rubric: =Not ready=. Applied the + spec-review workflow and added blocking findings for diagnostics structure, + user-facing errors, observability, nmcli contracts, cache concurrency, + hot-path performance, Waybar click actions, mutation rollback, credential + security, unsupported WiFi types, panel states, speed-test dependency + verification, install boundaries, and test coverage. +- *Why:* The spec has the right broad shape, but this feature combines a status + indicator, NetworkManager mutation, sudo-backed remediation, captive-portal + probing, secret storage, and GTK UI. Too many failure modes were still left for + implementers to invent during coding. +- *Artifacts:* Findings recorded in =Review findings=. Local code read included + =~/.dotfiles/hyprland/.local/bin/waybar-netspeed=, + =~/.dotfiles/common/.local/bin/captive=, + =~/.dotfiles/tests/waybar-netspeed/test_waybar_netspeed.py=, + =~/.dotfiles/tests/captive/test_captive.py=, pocketbook's GTK layer-shell + scaffold, current Waybar config, and the linked =todo.org= tasks. External + checks included the Arch nmcli man page and upstream LibreSpeed CLI page. + +** 2026-06-29 Mon @ 17:35:00 -0400 — Claude Code (archsetup) — responder + +- *What changed:* Processed all 18 findings — 13 accepted, 5 modified, 0 + rejected. Accepts added new sections: Repository + dependencies, Performance + budgets, nmcli contract, JSON schemas, Diagnostics contract (+ Diagnose vs + Repair), Failure states/messages/recovery, Observability, cache + concurrency/atomicity/staleness, mutation safety + rollback, the rewritten + credential security model, the expanded test plan, JSON schemas, the renamed + Implementation phases with per-phase acceptance criteria, and a Status line. + Modifies: deferred the =net debug-bundle= command (log + =diagnose --json= + cover it), made click actions phase-aware + non-blocking, scoped enterprise + WiFi to activate-only (add/edit vNext), trimmed the panel state set to states + that occur on the two-machine fleet, and switched the speed-test backend to the + already-installed =speedtest-go= (verified live) instead of a new + librespeed-cli dependency. +- *Why:* Codex's review was strong and mostly right; the modifies are scope calls + for a single-user, two-machine tool (avoid gold-plating, use installed deps, + don't model impossible states) and one phasing reconciliation (the panel is + Phase 2, so Phase-1 clicks can't open it). +- *Artifacts:* All 18 findings completed =DONE= in place with dispositions. Live + check: =pacman -Qq | grep speed= (speedtest-go-bin installed; =speedtest-go + --json= confirmed). Status now "review incorporated (Codex, 2026-06-29)", + implementation-ready with the named Phase-2/3 caveats; Phase 1 ready to build. + +** 2026-06-29 Mon @ 17:37:58 -0400 — Claude Code (archsetup) — responder (cj comments) + +- *What changed:* Folded in Craig's 10 cj comments on the spec. Design changes: + (1) dropped the separate credential store entirely — secrets stay in NM's own + =system-connections= (root =0600=, inline), touched via nmcli; no GPG, no + gpg-agent (rewrote the secrets section, decision 5, dropped Phase 4 + the gpg + dep). (2) Added =net doctor [--fix]= + Makefile console-recovery targets + (=make online= etc.) as a first-class TTY path; reversed the earlier + defer-the-doctor call (decision 10). (3) Added a full-stack =bounce= repair and + an =rfkill= unblock repair + indicator state — the rfkill one recovers the + framework-laptop post-power-loss soft-block Craig hit. (4) =custom/net= absorbs + the airplane module; the standalone airplane scripts/tests/module are deleted on + ship. (5) Moved VPN/WireGuard from "out" to a planned Phase 5. (6) Added a + "Help + documentation" section (CLI help / panel help / user guide). Answered + the enterprise-defer rationale and the captive-auto-login explanation inline. +- *Why:* Craig's comments simplify (NM-only secrets, fewer deps) and harden the + recovery story (doctor + make targets + rfkill/bounce reachable from a dead-GUI + console — his stated need). Enterprise add/edit in v1 is the one open call, + raised as a VERIFY. +- *Artifacts:* All 10 cj blocks removed. Live checks: airplane scripts/tests + present (confirmed deletion targets); =/etc/NetworkManager/system-connections/= + is root =0600= with inline secrets (confirms the NM-only secret model); + =rfkill= present. VERIFY filed in =todo.org= for the enterprise-scope decision. + +** 2026-06-29 Mon @ 17:49:24 -0400 — Codex — reviewer + +- *What changed or was recommended:* Added a second feedback batch from the + common network failure-mode review: require a diagnose/doctor coverage table, + pin DNS repair semantics, make auth failures terminal user-action states, + define upstream/AP/provider terminal states, decide how active VPN/policy + routing affects v1 diagnosis, and remove stale GPG-store references. +- *Why:* The design now has a solid recovery shape, but the field question is + sharper: for each common failure, does =diagnose= catch it, does =doctor --fix= + fix it, or does the user need to switch networks/provide credentials/contact + the venue? That mapping needs to live in the spec, not in chat. +- *Artifacts:* Findings appended under =Review findings= after reviewing the + current spec's Diagnostics, Failure states, Doctor, Testing plan, and + responder history. + +** 2026-06-29 Mon @ 18:06:22 -0400 — Codex — reviewer + +- *What changed or was recommended:* Added a readiness pass focused on + development-time questions, missed enhancements, UX cleanliness, error wording, + and test/coverage mechanics. New findings require reconciling stale status/goal + text, resolving or explicitly caveating enterprise add/edit, defining the + concrete harness and coverage gate, making coverage actionable, turning failure + classes into exact user-facing strings, adding an enhancement disposition + table, and tightening the Phase-2 panel UX flow. +- *Why:* The network failure model is now strong, but implementation would still + force developers to invent test infrastructure, coverage expectations, final + error copy, and several UX details. The spec should decide those up front or + explicitly mark them as caveats. +- *Artifacts:* Findings appended under =Review findings= after reading the + updated spec, current archsetup Makefile test targets, pocketbook's pyproject, + and the linked =todo.org= enterprise VERIFY. + +** 2026-06-29 Mon @ 17:55:08 -0400 — Claude Code (archsetup) — responder (2nd review) + +- *What changed:* Processed all 6 second-batch findings — 6 accepted, 0 modified, + 0 rejected. Added a "Doctor: escalation, classification, terminal states" + section with four outcomes (=fixable= escalating lightest-first; =needs-user- + action=, =upstream-not-local=, =deferred/vpn= terminal) and explicit per-class + DNS handling. Added a "Failure-mode coverage" table (22 rows: detect / doctor-fix + / terminal action, org-table-standard conformant, ≤120). Cleaned the last + GPG-store residue (Testing plan → NM-secret/no-leak test; dropped the no-GPG-key + panel state). Added a doctor-classification test class. +- *Why:* The findings sharpen the doctor from "escalate repairs" to "classify, + then stop at the right terminal state" — the safety property that keeps + =doctor --fix= from looping destructive repairs against a wrong password or an + upstream outage. The VPN finding got the detect-and-classify option (not repair; + that's Phase 5) so v1 doesn't misread a VPN-routed failure as WiFi down. The + coverage table is the single artifact Craig asked for to trust the field + coverage. +- *Artifacts:* All 6 findings =DONE= in place; cookie =[24/24]=. Table conformed + via =wrap-org-table.el= (max line 120). No new findings introduced; no scope + expansion past what the findings asked. Spec remains implementation-ready, the + enterprise add/edit VERIFY still the one open decision. + +** 2026-06-29 Mon @ 18:24:51 -0400 — Claude Code (archsetup) — responder (3rd review) + +- *What changed:* Processed the third review (7 findings). The enterprise-scope + finding closed first on Craig's call (vNext, activate-only). The other 6, all + accepted: (1) reconciled the top-of-file text — fixed the Goal's GPG-store + wording, the =[#B]= task-mapping line, the resolved enterprise VERIFY, and + corrected the stale =pytest= mentions to =unittest= (the repo's real harness). + (2) Added a "Harness + coverage gate" section (unittest, fake-binary stubs on a + temp PATH, venv coverage, ≥90% branch on pure modules). (3) Added a per-phase + "coverage as a gap-finder" pass. (4) Rewrote the Failure states section to exact + final strings + evidence fields + a per-surface rendering rule, and added the + missing doctor-unfixable rows. (5) Added the "Enhancement radar" table + (v1/vNext/rejected). (6) Added the "Panel UX flow" subsection. +- *Why:* The findings close the gap between "design decided" and "a developer can + start": the harness/coverage contract, the exact UX strings, and the panel flow + are the things otherwise invented mid-code. The =pytest=→=unittest= correction + was a real defect — the spec contradicted the repo's actual test convention. +- *Artifacts:* All 31 findings =DONE=; cookie =[31/31]=. Both new tables conformed + via =wrap-org-table.el= (coverage 120, radar 110). Harness verified against the + live repo (33 unittest suites, =make test=, coverage.py absent → venv). Status + raised to "Ready for Phase 1; Ready-with-caveats overall" — no open decisions + remain. + +** 2026-06-30 Tue @ 16:32:07 -0400 — Claude Code (archsetup) — responder (build + V2 redesign) + +- *What changed:* Brought the spec current with what shipped and what got decided + across 2026-06-29/30. Recorded Phases 1-3 as SHIPPED (engine, indicator, GTK4 + panel, bar clicks). Added the native captive-login engine (=portal-login= repair + tier replacing the =captive= shell-out), the live-testing portal UX fixes (removed + the polkit-gated flush, already-online short-circuit, Chrome first-run suppression, + in-panel portal, extractor hardening), and the panel auto-hide + Close button. + Then folded in the V2 redesign Craig directed: no terminals anywhere, a passwordless + root-helper + NOPASSWD sudoers as the enabler, verify-every-action, the full + ~44-mode failure catalog (now the authoritative coverage, in todo.org), and the + Connections | Diagnostics | Performance nav. Added decisions 15-19, V2 phases + 2.1-2.4, and corrected the two failure-table rows the shipped code never matched + (gateway-bounce, venue-DNS auto-revert). +- *Why:* The spec had drifted behind the code (still said "Phase 1 shipped, terminal + for privileged ops") and behind Craig's redesign calls. The doc is the contract for + the V2 build, so it has to state the no-terminal reversal, the sudo-helper + dependency, and where coverage now lives. +- *Artifacts:* Updated Status, Architecture, Engine CLI surface, Doctor DNS handling, + Failure-mode coverage, Interactions, Panel, Resolved decisions, Implementation + phases. Shipped commits referenced: dotfiles =a7d7559=, =eef6b0b=, =450b7f0=. Full + catalog + redesign children in todo.org "Network panel redesign". + +** 2026-06-30 Tue @ 17:29:29 -0400 — Codex — reviewer + +- *What changed or was recommended:* Rubric: =Not ready= for the V2 redesign until + the new blocking findings are dispositioned. Added a fourth review batch focused on + diagnostics/observability and the simple UX questions Craig raised: reconcile the + active panel nav, fold performance/speed testing into the diagnostics story, make + saved-vs-available list semantics explicit, replace the Add page with join-from-row, + pin supported WiFi authentication classes, fix Forget confirmation tense, make + connection loading progressive, define the Waybar/Archsetup visual contract, and add + a findable redacted diagnostics report affordance. +- *Why:* The spec has strong engine and failure-mode coverage, but the current doc and + shipped implementation still replicate common network-manager UX mistakes: ambiguous + mixed lists, scan-gated saved data, redundant SSID entry, misleading destructive + copy, diagnostics split across too many pages, and observability that exists in logs + more than in the user's workflow. +- *Artifacts:* Findings appended under =Review findings= after reading the + spec-review workflow, the target spec, todo.org's redesign task, and current + dotfiles implementation files =net/src/net/connections.py=, + =net/src/net/gui.py=, =net/src/net/panel.py=, and Waybar =style.css=. + +** 2026-06-30 Tue @ 17:36:40 -0400 — Claude Code (archsetup) — responder (4th review) + +- *What changed:* Dispositioned all nine fourth-review findings (8 accept, 1 modify, + 0 reject) and wove them into a new "V2 panel UX — the target design" section. + Accepts: V2's three-tab nav is the sole target (the four-page stack is history); + Connections splits into Saved / Available now / Wired with Rescan touching only the + scan layer; selecting an unsaved row is the join flow (the Add modal is deleted); + a join matrix pins which NM SECURITY classes are inline-supported / activate-only / + hidden / unsupported; loading renders Saved first and overlays the scan; the Forget + copy goes future-tense + verified; every run ends with a Copy/Open redacted report; + and a Waybar visual contract (rounded capsule, golden border, state colors). Modify: + the speed-test finding kept Craig's decision-19 placement (full speed test under + Performance, which carries future history) while accepting a lightweight inline + latency probe as Diagnose evidence stored in the doctor report. Cookie [40/40]. +- *Why:* Codex read the live implementation and caught the UX places where the module + still replicated common network-manager mistakes — mixed lists, scan-gated saved + data, redundant SSID entry, misleading destructive copy, diagnostics scattered + across pages, observability that lived in logs more than the workflow. +- *Artifacts:* Findings 32-40 completed in place with dispositions; the modify reason + on the speed-test finding. New "V2 panel UX" section under Panel. todo.org redesign + task updated to point the V2 build at the dispositioned design. + +** 2026-07-01 Wed @ 10:43:18 -0400 — Claude Code (archsetup) — responder (cj comments) + +- *What changed:* Folded in Craig's three cj comments from his review. (1) Notification + rendering: title = "Networking", body = the failure label on its own line then the + canonical string. (2) VPN kill-switch: added a "kill-switch blocking" failure state + plus a detection-and-correction strategy off the =deferred-vpn= branch (rootless + cascade over =ip rule= fwmark 0xca6c / =wg show= / Proton =pvpn-*= NM connections / + =nft=/=iptables= drop tables / firewalld =drop= zone; classify blocking only when a + block artifact exists AND no tunnel is up; correction surfaces the exact root command + per artifact). (3) Terminals: strengthened decision 15 to "no terminal ever reports + to or collects input from the user", disambiguated from the doctor's "terminal + states" wording. +- *Why:* Craig's review annotations. The kill-switch closes a real gap in the + VPN-routed classification; the terminal directive makes the no-terminal rule + absolute for the module UX. +- *Artifacts:* Three cj blocks removed. VPN research subagent cited wg-quick man page, + Pro Custodibus, System76/Proton killswitch docs, and local =doctor.py:42= / + =classify.py:60= / =USNY.conf:15=. One open tension filed as a VERIFY in todo.org: + the dead-GUI console-recovery path (=make online= from a TTY) vs the no-terminal + directive. diff --git a/docs/design/2026-06-29-waybar-timer-module-spec.org b/docs/design/2026-06-29-waybar-timer-module-spec.org new file mode 100644 index 0000000..4b0ed0e --- /dev/null +++ b/docs/design/2026-06-29-waybar-timer-module-spec.org @@ -0,0 +1,217 @@ +#+TITLE: Waybar Timer Module (wtimer) — Design Spec +#+AUTHOR: Craig Jennings & Claude +#+DATE: 2026-06-29 + +* Goal + +One always-visible waybar module that keeps time four ways — countdown timer, +wall-clock alarm, count-up stopwatch, and pomodoro — with several items running +at once. The bar shows the most urgent item with a per-type glyph; the tooltip +lists them all. Backed by a single =wtimer= script over a small JSON state file. +notify fires on completion. fuzzel drives creation. No GTK app. + +Source task: archsetup =todo.org= "Waybar timer module" (=:waybar:=), including +the folded roam-capture scope expansion (mode-selectable single panel, +stopwatch, multiple simultaneous, per-mode hover text). + +* Scope + +** In +- *Timer* — count down a duration, notify on elapse, then remove. +- *Alarm* — fire at a wall-clock time, notify, then remove. +- *Stopwatch* — count up from start; pause/resume; manual stop. +- *Pomodoro* — work/break cycles (25/5, long break 15 after 4 works), auto-advance with a notify at each phase change, runs until cancelled. +- *Multiple simultaneous* — N items of any mix held in state. Bar shows one primary item plus a =+N= badge; tooltip lists every item with its remaining/elapsed and label. +- *Pause / resume* per item; *cancel* one or all. +- *Interactions* — click to create (fuzzel), middle-click pause/resume primary, right-click cancel (fuzzel pick), scroll to cycle which item is primary. +- *Per-type glyph + CSS state classes* (running / paused / urgent / break). +- *Persistence across waybar restarts* (state file in the runtime dir). + +** Out (v1, note for later) +- No GTK panel — waybar module + tooltip + fuzzel only. +- No persistence across *reboot* (runtime-dir state clears). Alarms set before a reboot won't survive. Acceptable v1; revisit with =~/.local/state= + a catch-up-on-boot pass if wanted. +- No sound selection per item (uses notify's type sound). +- No history/stats of completed pomodoros beyond the current run's cycle count. + +* Architecture + +- =wtimer= — a single executable Python script in =hyprland/.local/bin/=. Chosen over POSIX sh (the other waybar backings) deliberately: the multi-item state machine, time arithmetic, pomodoro FSM, and JSON I/O are cleaner in Python, and it gives real line/branch *coverage numbers* (Craig asked for them). Precedent: pocketbook is Python in this repo. +- *Pure core + thin IO shell.* All logic is pure functions taking =now= as a parameter (dependency-injected clock — satisfies testing.md: no recursion, no scope-shadowing, production reads =time.time()=, tests pass an explicit instant). The CLI layer does the IO: read state, call pure fns, write state, emit JSON, shell out to notify/fuzzel. +- *State file*: =$XDG_RUNTIME_DIR/waybar/wtimer.json= (env override =WTIMER_STATE= for tests). Same runtime-dir convention as =sysmon-metric=. +- *Heartbeat*: waybar calls =wtimer render= every 1s. =render= runs the tick logic first (detect elapsed items, fire notify, advance pomodoro, drop finished timers/alarms), then prints the waybar JSON. One entry point waybar polls; no separate daemon. +- *Concurrency (BLOCKER from review).* The 1s =render= and the click/scroll handlers (=add=, =toggle=, =cancel=, =cycle=) are separate processes doing read-modify-write on the same state file. Without serialization, last-writer-wins drops a click's =add=, or clobbers render's "item removed/advanced" write so the same item ticks and notifies again next second. So every read-modify-write takes an exclusive =flock= on the state file for the whole cycle, and writes go through a temp file + =os.replace= (atomic), so a concurrent render never reads a half-written file. This is what actually makes "notify fires once" true — the mutation is only authoritative under the lock. +- *State dir*: =render= and the mutating commands =mkdir -p= the state dir first (=$XDG_RUNTIME_DIR/waybar/= may not exist on a fresh boot). +- *Clock injection everywhere*: =now= comes from =WTIMER_NOW= (epoch) if set, else =time.time()=. Pure fns take =now= as a parameter; the CLI seeds it from the env. This lets the CLI integration tests hit boundary instants (exactly-at-target), not just the pure-fn tests. +- *Instant refresh*: after any mutating command, send waybar =SIGRTMIN+14= (the module's signal) so the bar updates immediately instead of lagging up to 1s. Faked in tests (=WTIMER_REFRESH= override, default =pkill -RTMIN+14 waybar=). + +* State model + +#+begin_src json +{ + "items": [ + {"id": "1", "type": "timer", "label": "tea", "target": 1751240400, "duration": 300, "paused_left": null}, + {"id": "2", "type": "alarm", "label": "", "target": 1751251200, "paused_left": null}, + {"id": "3", "type": "stopwatch", "label": "", "start": 1751240000, "paused_elapsed": null}, + {"id": "4", "type": "pomodoro", "label": "", "target": 1751241900, "phase": "work", + "cycle": 1, "work": 1500, "short": 300, "long": 900, "interval": 4, "paused_left": null} + ], + "primary": "1", + "seq": 4 +} +#+end_src + +- =seq= is the monotonic id source (string ids). +- *Paused* timer/pomodoro: =paused_left= holds seconds remaining; =target= ignored while paused; resume sets =target = now + paused_left=, =paused_left = null=. +- *Paused* stopwatch: =paused_elapsed= holds elapsed seconds; resume sets =start = now - paused_elapsed=. +- =primary= is the id the bar shows; =null= or stale → auto-select (below). + +* Display logic + +** Primary selection (bar text) +1. If =primary= names a live item, show it. +2. Else the running countdown (timer/alarm/pomodoro) with the smallest remaining. +3. Else the first running stopwatch. +4. Else idle (no items). + +** Bar text +- =<glyph> <time>= for the primary, plus = +N= when N other items exist. +- Idle: a dim timer glyph alone (or empty — decide at render; lean dim glyph so the module has a stable click target). +- =time= formatting: =M:SS= under 1h, =H:MM:SS= at/over 1h. Stopwatch counts up; timer/alarm/pomodoro count down to target. +- Paused item: prefix a pause glyph or rely on the =paused= class (CSS dims it). + +** Glyphs (nerd font; final codepoints verified live before merge) +- timer , alarm , stopwatch , pomodoro-work , pomodoro-break (coffee), paused , idle (dim). +- One glyph table at the top of the script so a live-render tweak is one edit. + +** Tooltip (all items) +One line per item: =<glyph> <label-or-type> <remaining/elapsed> (<state>)=. Pomodoro line shows phase + cycle (e.g. =work 2/4=). Header line summarizes count. Empty state: "No timers". + +** CSS classes (the =alt=/=class= field) +=timer= / =alarm= / =stopwatch= / =pomodoro-work= / =pomodoro-break=, plus =paused= and =urgent= (remaining < 60s). Drives color in style.css + both themes. + +* Commands (CLI) + +| Command | Effect | +|---------------------------------+---------------------------------------------------------------------| +| =wtimer render= | tick + emit waybar JSON (the heartbeat) | +| =wtimer add timer <dur> [label]=| add a countdown (=dur= like =25m=, =90s=, =1h30m=, =5= → minutes) | +| =wtimer add alarm <HH:MM> [lbl]=| add a wall-clock alarm (next occurrence of that time) | +| =wtimer add stopwatch [label]= | start a count-up | +| =wtimer add pomodoro [label]= | start a pomodoro at work phase | +| =wtimer new= | fuzzel: pick type, prompt value, dispatch to =add= (thin wrapper) | +| =wtimer toggle [id]= | pause/resume the item (default: primary) | +| =wtimer cancel <id>= | remove one item | +| =wtimer pick-cancel= | fuzzel: choose an item to cancel (right-click handler) | +| =wtimer cancel-all= | clear all | +| =wtimer cycle [next|prev]= | move the primary pointer across all items (incl. paused), state-list order, wrapping | + +Duration parse: =Nh=, =Nm=, =Ns= combos, or a bare integer = minutes. Reject +unparseable input (exit non-zero, notify nothing). Alarm parse: =HH:MM= 24h; if +that time today already passed, target tomorrow. + +* Notifications + +- Timer elapse: =notify alarm "Timer" "<label or duration> done" --persist=. +- Alarm fire: =notify alarm "Alarm" "<HH:MM><, label>" --persist=. +- Pomodoro phase change: =notify info "Pomodoro" "Work → short break (3/4)"= (no =--persist=; phase nudges shouldn't pile up), long-break and work-resume worded accordingly. +- notify is faked on PATH in tests; assert type + that it fired once per event. + +* Pomodoro semantics + +- Defaults: work 25m, short 5m, long 15m, interval 4 (long break after every 4th work). +- FSM: work → short → work → short → work → short → work → long → work … +- =cycle= counts completed works in the current set (1..interval); resets after a long break. +- Each phase elapse advances =phase=, recomputes =target=, fires the phase notify. Pomodoro never auto-removes; cancel ends it. + +* Waybar wiring + +** Module def (config) — signal 14 (next free; 8–13 used) +#+begin_src json +"custom/timer": { + "exec": "wtimer render", + "return-type": "json", + "interval": 1, + "signal": 14, + "on-click": "wtimer new", + "on-click-middle": "wtimer toggle", + "on-click-right": "wtimer pick-cancel", + "on-scroll-up": "wtimer cycle next", + "on-scroll-down": "wtimer cycle prev" +} +#+end_src + +** Position — right of the sysmon (battery/resource) module +Insert =custom/timer= into =modules-right= immediately after =custom/sysmon= +(between =custom/sysmon= and =custom/netspeed=). On screen that places it just +right of the battery/resource readout. + +** Not collapsible — survives the right-side collapse +The module *definition* lives in the canonical config object, and =waybar-collapse= +only swaps the =modules-right= *array* in the runtime copy (which it seeds from +canonical, so the def is always present). So making the timer non-collapsible is +purely an array-membership change: add =custom/timer= to the =waybar-collapse= +right *base set* so it stays listed when the right side collapses: +- laptop: =["custom/arrow-right","custom/sysmon","custom/timer","tray","custom/date","custom/worldclock"]= +- desktop: =["custom/arrow-right","custom/timer","tray","custom/date","custom/worldclock"]= +Update the =tests/waybar-collapse= base-set expectations to match (TDD the change). + +* CSS + +Add =#custom-timer= plus the state classes to all three stylesheets. Keep the +*selectors and structure* parallel across the three (what the theme-drift test +checks); the actual color *values* are per-theme (dupre vs hudson) and differ by +design, so this is structural parity, not byte-identity. Confirm against the real +CSS files what the drift test compares before editing. +- =hyprland/.config/waybar/style.css= +- =hyprland/.config/themes/dupre/...= waybar css +- =hyprland/.config/themes/hudson/...= waybar css +Colors: normal = foreground; =urgent= = a warning hue (reuse the sysmon +warn/crit palette); =paused= = dimmed; =pomodoro-break= = a calmer accent. + +* Testing plan (TDD) + +- Suite: =tests/wtimer/test_wtimer.py= (auto-discovered by =make test='s =tests/*/test_*.py= glob — no enumeration gap). +- *Pure-function tests* (fast, the bulk), explicit injected =now=: + - =parse_duration=: =25m=, =90s=, =1h30m=, =5= (→min), =0=, negative, garbage, empty (Normal/Boundary/Error). + - =parse_alarm=: future today, already-passed-today → tomorrow, =00:00=, =23:59=, =24:00=/=12:60= invalid, non-=HH:MM=. + - =format_time=: 0, 59s, 60s, 3599s, 3600s, multi-hour, negative clamps to 0. + - =add_item= for each type; =seq= increments; ids unique. + - =tick=: timer not-yet-elapsed (no change), exactly-at-target, past-target (fires once, removed); alarm same; pomodoro work→short→…→long→work advance + cycle counting + the 4th-work→long boundary; paused items never tick; multiple items in one tick. + - =select_primary=: explicit primary, stale primary falls back, soonest-remaining rule, stopwatch-only, empty. + - =render_payload=: text/tooltip/class for each type + paused + urgent + =+N= badge + idle. + - =toggle= pause then resume round-trips remaining/elapsed exactly; =cycle= wraps; =cancel= / =cancel-all=. +- *CLI integration tests* (subprocess, fakes on PATH, =WTIMER_NOW= to hit boundaries): =add= then =render= round-trip; =render= fires the faked =notify= once on an elapsed item and drops it; state file created if absent; *missing parent dir* created (fresh-boot case); corrupt/empty state file → treated as empty, no crash; mutating command sends the faked refresh signal. +- *Concurrency test*: spawn overlapping =render= + a mutating command against one state file; assert no lost update (the added item survives) and exactly-once notify (no double-fire from a clobbered tick). This is the regression guard for the flock/atomic-write fix. +- *Mocking boundary*: fake =notify=, =fuzzel=, =killall= on PATH (record calls); never mock the wtimer logic. Clock injected as a parameter. +- *Coverage*: measure with =coverage.py= if present (target 90%+ on the logic per testing.md business-logic bar); report the actual number. If =coverage= is absent, report per-command/per-branch case coverage explicitly and flag the tool gap (verification.md). +- =tests/waybar-collapse= base-set expectations updated for the new module. +- =tests/= theme-drift check stays green (CSS parity). + +* Files touched + +dotfiles branch =waybar-timer-module=: +- =hyprland/.local/bin/wtimer= (new, executable) +- =tests/wtimer/test_wtimer.py= (new) +- =hyprland/.config/waybar/config= (module def + modules-right position) +- =hyprland/.local/bin/waybar-collapse= (base-set) + =tests/waybar-collapse/...= (expectations) +- =hyprland/.config/waybar/style.css= + dupre + hudson waybar css (CSS) + +archsetup (main, at the end): +- this spec +- =todo.org= task closure + +* Resolved decisions (no approvals — my calls) + +- Python, not sh — testability + coverage; pocketbook precedent. +- One =render= heartbeat (no daemon) — simplest, waybar already polls. +- notify fires from =render='s tick, mutation guarantees once-only. +- Primary = user-cycled, else soonest-remaining; =+N= badge for the rest. +- Multiple simultaneous via tooltip list + badge (not a GTK panel) — keeps it "cool yet simple". +- Pomodoro is one self-advancing item, not four chained timers. +- Runtime-dir state (waybar-restart durable, not reboot durable) — v1. + +* Rollback + +All code on the dotfiles =waybar-timer-module= branch off =09815f3=. Squash-merge +at the end; =git switch main && git branch -D waybar-timer-module= reverts cleanly +if it goes sideways. diff --git a/docs/design/2026-06-29-zfs-pre-snapshot-installer.org b/docs/design/2026-06-29-zfs-pre-snapshot-installer.org new file mode 100644 index 0000000..e5a339e --- /dev/null +++ b/docs/design/2026-06-29-zfs-pre-snapshot-installer.org @@ -0,0 +1,106 @@ +#+TITLE: ZFS pre-pacman snapshot installer step (durable retention) +#+DATE: 2026-06-29 +#+SOURCE: handoff from the home project, 2026-06-29 + +* Problem + +A pacman =PreTransaction= hook snapshots =zroot/ROOT/default@pre-pacman_<ts>= +before every transaction, but nothing prunes them. Sanoid doesn't manage them +(they aren't =autosnap_= names), so they accumulated to 53 on velox between +April and the 2026-06-29 health check. Unbounded, they fill the pool over time. + +* What's actually on velox vs. archsetup + +The live =/usr/local/bin/zfs-pre-snapshot= is *not* authored by archsetup — +=git grep= for its content (=MIN_INTERVAL=, the pre-pacman =LOCKFILE= logic) +finds nothing tracked. The =PreTransaction= hooks in the archsetup monolith +(~lines 910, 1907, 1942) are the live-update guard, a different hook. The +script appears hand-placed on velox. + +The 2026-01-17 security doc line "ZFS pre-pacman snapshots (already in +install-archzfs)" is therefore *out of date* — archsetup does not install this. +Incorporating the fix is a NET-NEW installer step, not a patch to an existing +one. Correct that stale doc line as part of the work. + +velox was patched live (pruned to 10, script replaced with the self-pruning +version below); live backup at =/usr/local/bin/zfs-pre-snapshot.bak-2026-06-29=. + +* Proposed installer step + +In the archzfs / ZFS-on-root install path, gated to ZFS-root installs (velox is +the only ZFS daily driver; ratio is btrfs), install: + +1. =/etc/pacman.d/hooks/zfs-snapshot.hook= — the =PreTransaction= hook that + runs the script. *Not included in the handoff* — source it from velox + (=/etc/pacman.d/hooks/zfs-snapshot.hook=) or write it. +2. =/usr/local/bin/zfs-pre-snapshot= — the =KEEP=10= self-pruning version + below. + +Tests live in archsetup, so this wants an archsetup session and a ZFS-root VM +test (=make test FS_PROFILE=zfs=), not a cross-project edit from home. + +* The script (KEEP=10 self-pruning version) + +#+begin_src bash +#!/bin/bash +POOL="zroot" +DATASET="$POOL/ROOT/default" +LOCKFILE="/tmp/.zfs-pre-snapshot.lock" +MIN_INTERVAL=60 +KEEP=10 # how many pre-pacman snapshots to retain (rollback safety for recent transactions) + +# Skip if a snapshot was created within the last 60 seconds +if [ -f "$LOCKFILE" ]; then + last=$(stat -c %Y "$LOCKFILE" 2>/dev/null || echo 0) + now=$(date +%s) + if (( now - last < MIN_INTERVAL )); then + exit 0 + fi +fi + +TIMESTAMP=$(date +%Y-%m-%d_%H-%M-%S) +SNAPSHOT_NAME="pre-pacman_$TIMESTAMP" + +if zfs snapshot "$DATASET@$SNAPSHOT_NAME"; then + echo "Created snapshot: $DATASET@$SNAPSHOT_NAME" + touch "$LOCKFILE" + + # Retention: keep only the most recent $KEEP pre-pacman snapshots, destroy older ones. + # Sanoid does not manage these (they aren't autosnap_), so prune them here at creation time. + zfs list -H -o name -t snapshot -s creation "$DATASET" 2>/dev/null \ + | grep '@pre-pacman_' \ + | head -n -"$KEEP" \ + | while read -r old; do + zfs destroy "$old" && echo "Pruned old snapshot: $old" + done +else + echo "Warning: Failed to create snapshot" >&2 +fi +#+end_src + +* Implementation (2026-06-30) + +- Hook sourced from velox (=/etc/pacman.d/hooks/zfs-snapshot.hook=) and embedded + as a heredoc in =configure_pre_pacman_snapshots()=. +- Insertion point: a new =configure_pre_pacman_snapshots()= gated on + =is_zfs_root=, called from =boot_ux= (the last step) so the hook doesn't fire + during the install's own package operations — the first pre-pacman snapshot is + the fresh system. The script ships as =scripts/zfs-pre-snapshot= (the + =zfs-replicate= pattern), made =ZFS_PRE_*=-env-overridable for testability. +- Tests: =tests/zfs-pre-snapshot/= unit-tests the pruning logic against a fake + =zfs= (creates, prunes oldest-past-KEEP, ignores non-=pre-pacman_= snapshots, + honors the lockfile, warns on snapshot failure); =test_boot.py= asserts the + hook + script land on a ZFS install; the orchestrator test pins the new + =boot_ux= substep. + +* Note on the "stale security doc" + +The 2026-01-17 line "ZFS pre-pacman snapshots (already in install-archzfs)" is +*not* stale: that file is an archive generated by install-archzfs (see its +header and footer), and the claim is accurate for install-archzfs. The real gap +was that archsetup took sanoid from install-archzfs but never ported the +pre-pacman hook. This change ports it. The archive is left untouched. + +* Remaining + +- ZFS-root VM verification (=make test FS_PROFILE=zfs=) before the task closes. diff --git a/docs/design/2026-06-30-captive-portal-login.org b/docs/design/2026-06-30-captive-portal-login.org new file mode 100644 index 0000000..1739689 --- /dev/null +++ b/docs/design/2026-06-30-captive-portal-login.org @@ -0,0 +1,89 @@ +#+TITLE: Captive-portal login — learnings + baking it into the net panel +#+DATE: 2026-06-30 +#+SOURCE: the 2026-06-30 Hyatt wifi saga (velox) + +* Why this exists + +On a locked-down-DNS laptop, captive portals never show their login page, even +though phones get on fine. We spent hours on a Hyatt portal before finding the +mechanism; this captures it so the fix becomes a panel feature instead of a +one-off script. + +* The mechanism (what actually blocks the login) + +A redirect portal works by *DNS hijack*: you query a name, the hotel's resolver +hands back the portal, you get the login page. Two things on velox stop that: + +- *System resolver forces DNS-over-TLS.* =/etc/systemd/resolved.conf.d/dns-over-tls.conf= + hardcodes =DNS=1.1.1.1#... 9.9.9.9#...= with =DNSOverTLS=yes=. The system never + queries the hotel's resolver at all. The hotel blocks 853 (DoT) and external + 53, so system DNS is simply dead on the portal — only 443 (DoH) gets out. +- *Browser DoH.* Chrome "secure DNS" on bypasses the hotel DNS too, so the + browser never gets redirected either. + +A phone works because it uses *plain DNS* from the hotel plus a built-in +captive-portal popper. The laptop has neither. + +Confirmed facts from the saga: +- Front desk: it's a normal redirect-to-login portal. Phone: connects fine. +- No DHCP option 114 (RFC 8910) — the portal doesn't advertise its URL. But the + URL is recoverable from the HTTP 302 once you're on plain DNS. +- The walled garden whitelists OS captive-detection endpoints + (=captive.apple.com= returns "Success") — a *misleading* signal, not real + internet. Don't trust it. +- 443/DoH egress works broadly on the portal; only port-53 DNS is held. So + "system DNS fails" never means "no internet" here. + +* The working fix (=~/.local/bin/hotel-wifi=, to be folded in) + +Temporarily disable DoT → plain hotel DNS → discover the portal URL from the +redirect → open it in a clean browser profile (no DoH, no stale HSTS/cookies) → +click the button → restore DoT. Reversible; tested to restore cleanly. + +#+begin_src sh +#!/bin/sh +# hotel-wifi disable DoT -> find the portal login URL -> open it +# hotel-wifi off restore normal encrypted DNS (run once online) +conf=/etc/systemd/resolved.conf.d/dns-over-tls.conf +if [ "${1:-on}" = "off" ]; then + [ -f "$conf.captive-disabled" ] && sudo mv "$conf.captive-disabled" "$conf" + sudo systemctl restart systemd-resolved + echo "Encrypted DNS (DoT) restored."; exit 0 +fi +[ -f "$conf" ] && sudo mv "$conf" "$conf.captive-disabled" +sudo systemctl restart systemd-resolved; sleep 1 +resolvectl flush-caches 2>/dev/null || true +portal="" +for t in http://captive.apple.com/hotspot-detect.html http://neverssl.com \ + http://detectportal.firefox.com/canonical.html; do + loc=$(curl -sS -m 6 -o /dev/null -w '%{redirect_url}' "$t" 2>/dev/null) + [ -n "$loc" ] && { portal="$loc"; break; } + url=$(curl -sS -m 6 "$t" 2>/dev/null | grep -ioE 'https?://[^"'"'"' >]+' \ + | grep -ivE 'apple\.com|neverssl|firefox|w3\.org|gstatic' | head -1) + [ -n "$url" ] && { portal="$url"; break; } +done +prof=$(mktemp -d) +setsid -f google-chrome-stable --user-data-dir="$prof" "${portal:-http://neverssl.com}" >/dev/null 2>&1 +echo "Click the login button. When online: hotel-wifi off" +#+end_src + +* Baking it into the net panel (the task) + +- The net engine already diagnoses captive / no-internet. When it sees a held + portal, the panel should offer a first-class *"Log in to this network"* + action that runs the plain-DNS + clean-browser flow above, reversibly, and + auto-restores DoT when connectivity returns (or on a timeout). +- Reconcile with the existing =net portal= command and the =captive= helper — + they assumed a DNS-hijack-to-gateway model that did NOT match this portal + (gateway served no web; DNS was held, not hijacked-to-portal). The plain-DNS + approach is the one that worked; make it the engine's portal path. +- The DoT toggle must be safe and reversible (the =off= step). Consider a + per-connection or time-boxed DoT-off that can't strand encrypted DNS. +- Surface the misleading-"Success" lesson: a whitelisted captive-check passing + is not "online" — gate on a real, non-whitelisted fetch. + +* Related fix that unblocked the panel (already shipped) + +The panel could never switch networks because =net up= placed =--wait= after the +nmcli subcommand (it's a global option). Fixed in dotfiles 2432311; fake-nmcli +now rejects the misplaced flag so it can't regress. diff --git a/docs/design/2026-07-02-bluetooth-panel-spec.org b/docs/design/2026-07-02-bluetooth-panel-spec.org new file mode 100644 index 0000000..121197a --- /dev/null +++ b/docs/design/2026-07-02-bluetooth-panel-spec.org @@ -0,0 +1,470 @@ +#+TITLE: Bluetooth Panel — CLI-Driven, Net-Panel Kin +#+AUTHOR: Craig Jennings +#+DATE: 2026-07-02 +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* IMPLEMENTED Status +:PROPERTIES: +:ID: 1271a845-4463-4831-9902-990eda6b2265 +:END: +- [2026-07-02 Thu] IMPLEMENTED — all five phases shipped the same day + (dotfiles eb2230f / 76b2c05 / e372de3 / 2a026b1; archsetup d8d8c53): + engine, panel, bar module + blueman retirement, bt-priv + package swap, + install wiring proven by VM assertions. 43 dotfiles suites green, both + AT-SPI smokes green, panels verified live; the phase 4-5 VM assertions + run on the next VM pass. +- [2026-07-02 Thu] DOING — spec-response decomposed the five phases into + build sub-tasks under the todo.org parent (:SPEC_ID: bound); build + started same day per Craig ("4 first, then 1" — bugs then bluetooth). +- [2026-07-02 Thu] READY — spec-review passed the gate: all four + decisions resolved, phases decomposable, CLI verbs verified against + bluez 5.86. Two non-blocking findings recorded and dispositioned in + the same pass (donor-pattern answers). +- [2026-07-02 Thu] DRAFT — initial spec from Craig's request: a bluetooth + module driving a CLI underneath, consistent with the net panel, minimal + interface, full functionality, diagnostics section, visual mockups. + +* Metadata + +| Field | Value | +|--------+---------------------------------------------------------------------------------| +| Status | implemented | +|--------+---------------------------------------------------------------------------------| +| Owner | Craig Jennings | +|--------+---------------------------------------------------------------------------------| +| Repo | dotfiles (bt module); archsetup (packages, sudoers, keybind defaults) | +|--------+---------------------------------------------------------------------------------| +| Kin | net panel (architecture donor), desktop-settings panel (same donor, shared css) | +|--------+---------------------------------------------------------------------------------| + +* Problem + +Bluetooth on both daily drivers runs through blueman: a tray applet plus a +GTK3 manager window (Super+Shift+B). It's the odd one out on the desktop — +a foreign visual style next to the dupre-themed panels, a tray icon where +every other indicator is a first-class waybar module, and no diagnostics +story at all. When the BT mouse fails to reconnect at boot (a recurring +gotcha — touchpad-auto exists because of it) or headphones pair but route +no audio, the fix is a terminal séance: bluetoothctl, rfkill, systemctl, +wpctl, in whatever order folklore suggests. + +The net panel proved the shape that fixes this: a minimal layer-shell +popup over a GTK-free engine that drives a CLI, with a diagnostics tab +that names the failure and offers the repair. Bluetooth is the same +problem with a smaller surface: one adapter, a handful of devices, a +short list of well-known failure modes. + +* Goals + +1. Visibility: adapter power state and every known device with live state + (connected, battery, signal) in one glance — panel and bar module agree. +2. Control: power, scan, pair, connect, disconnect, forget — full + functionality from the panel, zero terminals (the net panel's V2 + contract). +3. Diagnostics: a doctor that walks the known failure chain (adapter → + rfkill → service → power → device → audio profile), names the broken + link in evidence rows, and offers tiered repairs. +4. Consistency: same stack, same window shape, same interaction grammar, + same palette as the net panel. A user who knows one panel knows both. + +Audio-profile switching is in scope for v1 (Craig, 2026-07-02 — "bitten +by this too many times to count"): the doctor's audio-profile step +carries a one-click repair, not just a diagnosis, and connected audio +devices surface their active profile (details in the doctor chain below). + +Non-goals (this iteration): OBEX file transfer, multi-adapter support +(both machines have one controller), BLE sensor/GATT browsing. + +* Design sketch + +** Architecture — the net panel's stack, verbatim + +- GTK4 + gtk4-layer-shell, Blueprint .blp compiled to committed .ui + (=make ui=), PyGObject at runtime. +- Humble-object split: GTK-free =PanelModel= presenter (unit-tested like + net's), thin composite-widget pages, =bg(work, done)= worker-thread + helper for every slow call. +- Engine: a new =bt= package in dotfiles (=bluetooth/src/bt/=, sibling of + =net/=), CLI entry =bt= with =bt status= / =bt panel= / =bt doctor= — + the same cmd/cli layout as net. +- Layer-shell OVERLAY popup anchored TOP+RIGHT, 380x520, Esc closes, + focus-out auto-hides, single-instance toggle via a =bt-panel= wrapper. + Dupre palette css shared with the net panel (the factored css asset the + desktop-settings spec calls for — three consumers now, so the factoring + happens in this project's phase 1 if settings hasn't landed it). +- Testing: engine TDD with fake binaries on a temp PATH (fake-bluetoothctl, + fake-rfkill, fake-systemctl, fake-wpctl); PanelModel unit suite; one + gated AT-SPI smoke (=make test-panel= pattern). + +** CLI backing — bluetoothctl one-shot verbs + +bluez 5.86 (installed) supports everything non-interactive: + +- Adapter: =bluetoothctl show= (powered, discoverable, pairable), + =bluetoothctl power on|off=. +- Device lists: =bluetoothctl devices Paired|Connected|Trusted= — the + Paired view is a merge of Paired + Connected states; =bluetoothctl info + <mac>= per row fills caption detail (battery percentage rides bluez's + built-in Battery1 profile and appears in info output; RSSI appears + during discovery). +- Scan: =bluetoothctl --timeout N scan on= (bounded discovery burst), + then =devices= diffed against Paired for the Nearby list. The panel + scans in 8s bursts with a live "Scanning…" state rather than an + unbounded scan. +- Connect/disconnect/forget: =bluetoothctl connect|disconnect|remove <mac>=. +- Pairing: the one interactive corner. =bluetoothctl pair <mac>= can demand + a passkey confirmation. The engine drives bluetoothctl's line protocol + over a pty with a bounded state machine (expect "Confirm passkey", + reply yes/no); a passkey prompt surfaces as a panel dialog showing the + six digits, mirroring the net panel's password dialog. NoInputNoOutput + devices (mice, most headphones) sail through without the dialog. +- rfkill: the user is in the =rfkill= group, so block/unblock is + unprivileged (=rfkill unblock bluetooth=). +- Privileged path: exactly one verb needs root — =systemctl restart + bluetooth= — so =bt-priv= is a one-verb closed helper with its own + NOPASSWD sudoers rule placed by archsetup, cloning net-priv's + regex-validated pattern rather than widening net-priv's scope. + +** Panel anatomy + +Two tabs. Devices is the panel; Diagnostics is the escape hatch. + +Devices tab, Paired sub-view (the default — daily use is reconnecting +known devices, not discovering new ones): + +#+begin_example +╭──────────────────────────────────────────────╮ +│ [ Devices ] [ Diagnostics ] │ ← top switcher +│ │ +│ Bluetooth ●──○ hci0 on │ ← adapter row: power switch +│ ──────────────────────────────────────────── │ +│ [ Paired ] [ Nearby ] │ ← sub-view switcher +│ ┌──────────────────────────────────────────┐ │ +│ │ MX Master 3 │ │ +│ │ Connected · battery 80% │ │ +│ │ WH-1000XM4 │ │ +│ │ Paired, not connected │ │ +│ │ K380 Keyboard │ │ +│ │ Paired, not connected │ │ +│ │ │ │ +│ └──────────────────────────────────────────┘ │ +│ [ Disconnect ] [ Forget ] │ ← acts on selected row +╰──────────────────────────────────────────────╯ +#+end_example + +The primary button is one control with a state-following label: +"Connect" when the selection is disconnected (suggested-action styling), +"Disconnect" when connected. Row-activate (Enter / double-click) +connects — never disconnects — matching the net panel's asymmetry. +Captions carry the human state line; the MAC lives in the row tooltip, +not the visible caption. + +Devices tab, Nearby sub-view: + +#+begin_example +╭──────────────────────────────────────────────╮ +│ [ Devices ] [ Diagnostics ] │ +│ │ +│ Bluetooth ●──○ hci0 on │ +│ ──────────────────────────────────────────── │ +│ [ Paired ] [ Nearby ] │ +│ ┌──────────────────────────────────────────┐ │ +│ │ Scanning… (6s) │ │ ← overlay state label +│ │ JBL Flip 6 −58 dBm │ │ +│ │ Pixel 9 −71 dBm │ │ +│ │ (unnamed) 74:A5:… −83 dBm │ │ +│ └──────────────────────────────────────────┘ │ +│ [ Pair ] [ Rescan ] [ Discoverable ⊙ ] │ +╰──────────────────────────────────────────────╯ +#+end_example + +Pair does the whole intended thing — pair, then trust, then connect — +because pairing a device means "use it now and reconnect on its own +later" (decision below). Discoverable is a toggle for the inbound case +(pairing a phone TO the laptop), off by default, auto-off with bluez's +discoverable-timeout. Rows sort by RSSI, strongest first; named devices +above unnamed ones. + +Diagnostics tab (mirrors the net panel's shape: one big verb + streaming +evidence rows + tiered repairs behind confirmation): + +#+begin_example +╭──────────────────────────────────────────────╮ +│ [ Devices ] [ Diagnostics ] │ +│ │ +│ [ Get Bluetooth Working ] [ Advanced ▸]│ +│ ┌──────────────────────────────────────────┐ │ +│ │ ✓ Adapter present (hci0) │ │ +│ │ ✓ Not blocked (rfkill clear) │ │ +│ │ ✓ bluetooth.service active │ │ +│ │ ✓ Adapter powered │ │ +│ │ ✗ MX Master 3: paired but unreachable │ │ +│ │ … Re-pair suggested — see below │ │ +│ │ │ │ +│ │ Fix: [ Reconnect ] [ Re-pair device ] │ │ +│ └──────────────────────────────────────────┘ │ +│ power-cycle · restart service · unblock │ ← tiered repairs (confirm) +╰──────────────────────────────────────────────╯ +#+end_example + +The doctor chain, in order, each an evidence row: + +1. Adapter present — =bluetoothctl list= / rfkill has an hci entry. + Absent → hardware/driver verdict, no repair offered. +2. rfkill state — soft-blocked names the likely cause when the + airplane-mode state file says airplane is on ("Blocked by airplane + mode — turn airplane mode off"), otherwise offers Unblock (no root + needed, rfkill group). +3. bluetooth.service — inactive/failed → offer restart (the one bt-priv + verb), evidence quotes the last journal line. +4. Adapter powered — off → offer power on (and note if a boot-time + policy keeps turning it off). +5. Per-device reachability — paired-but-connect-fails distinguishes + "device off/out of range" (RSSI absent in a scan burst) from "bond + corrupt" (connect error string), and only the latter suggests the + re-pair repair (remove + pair + trust + connect, confirmed first — + it's the destructive tier). +6. Audio profile (audio devices only) — device connected but no wpctl + sink/source, or the card stuck in HSP/HFP when A2DP is expected: + evidence names the active profile and offers the repair inline — + "Switch to A2DP" drives =wpctl set-profile <card> <index>= (profile + inventory from =pw-dump= — ground truth 2026-07-02: wpctl can't + enumerate a card's profiles, and the card's =bluez5.profile= prop + reads "off" mid-stream; the card's Profile param and the sink node's + =api.bluez5.profile= are authoritative), verifies the sink came back + in the expected profile, and reports fixed or no-change. In v1 per + Craig (2026-07-02): this failure mode has bitten repeatedly, so it + gets the one-click fix, not just a diagnosis. Connected audio-device + row captions also show the profile when it's the degraded one + ("Connected · mic mode (HSP)") so the state is visible before the + doctor runs. + +Repairs confirm with the net panel's future-tense scope copy ("This will +restart the Bluetooth service. Connected devices will drop and +reconnect."), run on the worker thread, verify after (re-read state, +report "fixed" or "no change"), and never chain silently. + +** Bar module + +=custom/bluetooth= replacing the blueman-applet tray icon: the panel's +glanceable layer, one glyph, state-following like =custom/net=: + +#+begin_example + off / blocked (dim; red slash variant when rfkill-blocked) + on, nothing connected (dim) + connected (white; tooltip lists devices + battery) +#+end_example + +Tooltip carries device names, battery percentages, and the keybind hints +(the module-tooltip convention shipped 2026-07-02). Click opens the +panel (=bt-panel= toggle wrapper); the existing Super+Shift+B bind moves +from blueman-manager to =bt panel=. Low-battery on a connected device +(<15%) adds a red percentage to the glyph text — the mouse dying +mid-meeting is the one state worth surfacing unprompted. + +** UX conformance notes + +Named against the heuristics the panel family follows (Nielsen's ten, +plus the rulesets patterns catalog): + +- Visibility of status: live captions, scan countdown, elapsed ticker on + long ops, verify-after-repair rows. +- Match to the real world: device-kind glyphs + plain state lines; MACs + demoted to tooltips; "Forget" not "Remove bond". +- User control: Esc closes, Rescan is idempotent, scan bursts are + bounded, repairs confirm, running ops show a Stop where stoppable. +- Consistency: interaction grammar is the net panel's — same switcher + layout, same primary-button contract, same confirm copy shape. +- Error prevention: Forget and Re-pair confirm; power-off while devices + are connected states the consequence in the confirm body. +- Recognition over recall: every action is a visible button; no context + menus, no hidden gestures (transient-state-buttons pattern). +- Minimalism: two tabs, one primary action per view, detail behind + tooltips and the Advanced reveal. +- Help users recover: the doctor's evidence rows name the broken link + and carry the repair inline (default-most-common-friction-proportional: + the likely fix is one click, the destructive one is confirmed). + +Tension found with the net panel while writing this (filed as todo.org +tasks per Craig's instruction, 2026-07-02): transient error toasts +auto-dismiss in 4s, and the V2 spec's keyboard-navigation claims +(tab-between-sections, arrow rows, type-to-filter) aren't verifiably +implemented. Both filed against the net panel rather than cloned here; +this panel adopts whatever resolution those tasks land on. + +* Decisions (Craig) [4/4] + +** DONE Pair implies trust + connect? +CLOSED: [2026-07-02 Thu] +Decided (Craig, 2026-07-02): yes — one Pair verb does pair → trust → +connect. A device that shouldn't auto-reconnect gets untrusted later; a +per-device auto-reconnect toggle can ride a later pass. + +** DONE Retire blueman entirely? +CLOSED: [2026-07-02 Thu] +Decided (Craig, 2026-07-02): drop it outright, no bake-in period — the +package leaves archsetup and both machines once phase 2 lands, +bluetoothctl stays as the terminal fallback. Craig's framing: any issue +after retirement is a signal the doctor needs another check or the panel +has a real bug, and it gets fixed there rather than papered over by +keeping blueman around. + +** DONE Battery in the row caption or tooltip only? +CLOSED: [2026-07-02 Thu] +Approved (Craig, 2026-07-02): caption when the device reports it +("Connected · battery 80%"), tooltip otherwise. + +** DONE Scan burst length and auto-rescan? +CLOSED: [2026-07-02 Thu] +Approved (Craig, 2026-07-02): 8s bursts, no auto-repeat — Rescan stays +explicit, matching the net panel's Available view. + +* Review findings [2/2] + +** DONE Empty-state and no-adapter presentation copy undefined :nonblocking: +CLOSED: [2026-07-02 Thu] +The mockups show populated lists; the spec didn't say what an empty Paired +list, an empty post-scan Nearby list, or a machine with no adapter shows +in the panel and on the bar glyph. Dispositioned same pass: clone the +donor — the net panel's in-box overlay message pattern (=show_loading= / +placeholder label) carries the copy. Paired empty: "No paired devices — +switch to Nearby to pair one." Nearby post-scan empty: "Nothing found — +Rescan, or make the device discoverable." No adapter: adapter row reads +"No Bluetooth adapter", Devices controls disable, Diagnostics stays +usable (the doctor's step 1 names the hardware/driver verdict); bar +glyph shows the off/blocked state. Non-blocking; recorded so the +implementer doesn't invent copy mid-build. + +** DONE Logging/redaction carry-over unstated :nonblocking: +CLOSED: [2026-07-02 Thu] +The spec says "the net panel's stack, verbatim" but didn't name whether +the engine adopts net's =eventlog= (structured op log) and =redact= +(sensitive-field scrubbing) modules. Dispositioned same pass: yes, both +carry over — every mutating verb (pair/connect/forget/repair) logs an +eventlog entry, and MACs are the redaction surface (device names stay, +MACs redact in copied reports, mirroring net's report redaction). +Non-blocking; it's the donor default made explicit. + +* Implementation phases + +1. Engine =bt= package: adapter/device/scan probes over fake-bluetoothctl, + status + doctor chain (rfkill, service, powered, reachability, audio + profile probe + A2DP switch repair over fake-wpctl) — pure TDD, no + GTK. =bt status= and =bt doctor= work from a terminal. Shared dupre + css factored to the common asset if the settings panel hasn't already + done it. +2. Panel: PanelModel presenter + Blueprint pages (Devices with + Paired/Nearby, Diagnostics), worker-thread wiring, pairing-dialog + state machine, bt-panel toggle wrapper, AT-SPI smoke. Super+Shift+B + rebind. +3. Bar module =custom/bluetooth= (glyph states, tooltip, low-battery + surface, refresh signal), waybar config + suite coverage; blueman + retirement per the decision. +4. bt-priv one-verb helper + sudoers rule in archsetup; package-list + swap (blueman out per decision, bluez-utils stays); VM test + assertions. +5. archsetup keybind/config defaults so a fresh install lands the panel + wired (waybar module present, bind set, sudoers placed). + +* Review and iteration history + +** 2026-07-02 Thu @ 15:19:58 -0400 — Claude Code (archsetup) — phase 5 builder, spec closed +- *What changed or was recommended:* Phase 5 shipped and the spec flipped + to IMPLEMENTED. No new install code was needed — the waybar module, the + =Super+Shift+B= bind, and the shared panel css all ride the dotfiles + hyprland tier that a fresh install already clones and stows, and sudoers + is covered by the blanket grant. The phase's substance is proof: + =test_desktop.py= gained hyprland-gated assertions for the four stowed + bt bins, the =custom/bluetooth= waybar entry, the =bt-panel= keybind, + and the stowed =panel.css=. +- *Why:* Final phase of the DOING decomposition; with it the todo parent + closed and the lifecycle keyword flipped with a history line. +- *Artifacts:* archsetup =scripts/testing/tests/test_desktop.py=; todo.org + parent DONE + dated phase 5 / test-surface entries; this spec's Status + heading. + +** 2026-07-02 Thu @ 15:16:51 -0400 — Claude Code (archsetup) — phase 4 builder +- *What changed or was recommended:* Phase 4 shipped. Dotfiles =2a026b1=: + the stowed =bt-priv= shim (one verb, verified against the fake-systemctl) + and the sxhkd =Super+Shift+B= bind repointed from blueman-manager to + =st -e bluetoothctl= (terminal fallback per the retirement decision — the + panel is Wayland-only). archsetup: blueman dropped from the + =desktop_environment= package loop; VM assertions added (bluez/bluez-utils + present, blueman absent). blueman also removed live from velox. +- *Why:* Build order per the DOING decomposition. The spec's "sudoers rule" + item resolved as net-priv's did: archsetup already grants the primary + user blanket =NOPASSWD: ALL= (archsetup:1089), so a narrow bt-priv rule + would be dead config — no new sudoers needed, and phase 5's "sudoers + placed" is satisfied by the existing grant. +- *Artifacts:* dotfiles =hyprland/.local/bin/bt-priv=, + =common/.config/sxhkd/sxhkdrc=; archsetup =archsetup= (bluetooth loop), + =scripts/testing/tests/test_packages.py=; dated phase 4 entry under the + todo.org parent. + +** 2026-07-02 Thu @ 15:06:00 -0400 — Claude Code (archsetup) — phase 3 builder +- *What changed or was recommended:* Phase 3 shipped (dotfiles =e372de3=): + the =custom/bluetooth= bar module (state-following glyph, low-battery red + percentage, device+battery tooltip with the keybind hint, signal 10 with + the panel poking it after each reload) and the blueman retirement from the + Hyprland session (exec-once + windowrules removed, applet killed live). + The phase 2 deferred items also closed this pass: both AT-SPI smokes green + (the bt smoke's primary-button assertion fixed for the state-following + label, =c1a8219=), both panels eyeballed correct in dupre, and the + net-panel keyboard claims verified live (archsetup =e80df2b= — false + claims struck from the net spec). +- *Why:* Build order per the DOING decomposition; the Zoom meeting ended, + unblocking the visual work. Phases 4-5 (bt-priv/sudoers/packages, install + defaults — archsetup side) remain. +- *Artifacts:* dotfiles =bluetooth/src/bt/indicator.py=, =waybar-bt=, + waybar config + three css files; dated phase 3 entry under the todo.org + parent. + +** 2026-07-02 Thu @ 14:15:27 -0400 — Claude Code (archsetup) — phase 2 builder +- *What changed or was recommended:* Phase 2 shipped (dotfiles =76b2c05=): + the GTK panel — PanelModel/viewmodel presenter pair (69 tests), Blueprint + pages, pairing pty state machine with default-deny passkey confirms, + manage.py op envelopes shared by CLI and panel (power + discoverable verbs + added), =bt-panel= toggle, Super+Shift+B rebind. The shared dupre css + factoring landed as planned: net's inline =_CSS= became + =themes/dupre/panel.css= with =dupre-*= classes, both panels consume it. + 43 suites green. The AT-SPI smoke (=make test-panel-bt=) is written but + not yet run live — a Zoom meeting occupied the compositor; it runs when + the meeting ends, along with a visual check of both panels. +- *Why:* Build order per the DOING decomposition; phases 3-5 (bar module, + bt-priv/sudoers, install defaults) remain. +- *Artifacts:* dotfiles =bluetooth/src/bt/{panel,viewmodel,pairing,manage, + gui,pages}.py=, =ui/*.blp=, =tests/bt/test_btpanel.py=, the panel smoke; + dated phase 2 entry under the todo.org parent. + +** 2026-07-02 Thu @ 13:31:00 -0400 — Claude Code (archsetup) — phase 1 builder +- *What changed or was recommended:* Phase 1 shipped (dotfiles =eb2230f=): + the =bt= engine package, 101 tests over fakes, live-verified read-only + on velox. Two spec corrections from ground truth: profile inventory + comes from =pw-dump= (wpctl can't enumerate profiles), and the active + profile reads from the card's Profile param / sink's + =api.bluez5.profile= (the card's =bluez5.profile= prop is unreliable). + The shared-css factoring moved into phase 2 — net's css is an inline + string in its =gui.py=, so extracting it belongs with the first second + consumer rather than as a standalone poke at the working net panel. +- *Why:* Build order per the DOING decomposition; corrections keep the + spec honest for the phase 2 implementer. +- *Artifacts:* dotfiles =bluetooth/src/bt/=, =tests/bt/=, the stowed + =bt= shim; dated phase 1 entry under the todo.org parent. + +** 2026-07-02 Thu @ 13:10:00 -0400 — Claude Code (archsetup) — reviewer + responder +- *What changed or was recommended:* Ran the spec-review gate: passed. + All four decisions were already DONE (cookie added to the heading); + the five phases are each a clean single-session stop; CLI verbs are + verified against installed bluez 5.86. Two non-blocking findings + recorded and dispositioned in the same fused pass (empty-state / + no-adapter copy, eventlog + redaction carry-over) — both resolve to + "clone the net-panel donor," now stated explicitly. Flipped DRAFT → + READY → DOING and decomposed the phases into build sub-tasks under the + todo.org parent with :SPEC_ID: bound. +- *Why:* Craig queued the build ("4 first, then 1", 2026-07-02) after + resolving all decisions the same morning; the gate held nothing back, + so review and response fused to keep the speedrun moving. +- *Artifacts:* Findings in =* Review findings [2/2]= above; build parent + in todo.org ("Bluetooth panel + bar module"); net-panel toast fix the + UX-conformance note references landed as dotfiles =0f017d4=. diff --git a/docs/design/2026-07-02-desktop-settings-panel-spec.org b/docs/design/2026-07-02-desktop-settings-panel-spec.org new file mode 100644 index 0000000..8becf71 --- /dev/null +++ b/docs/design/2026-07-02-desktop-settings-panel-spec.org @@ -0,0 +1,128 @@ +#+TITLE: Desktop-Settings Dropdown Panel +#+AUTHOR: Craig Jennings +#+DATE: 2026-07-02 +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* DRAFT Status +:PROPERTIES: +:ID: fb7eec22-a214-4568-82c4-903612f4832f +:END: +- [2026-07-02 Thu] DRAFT — initial spec from the todo.org task "Desktop-settings + dropdown panel" (2026-06-24 review), updated for the Blueprint/GTK4 pipeline + the net panel stood up 2026-07-01. + +* Metadata + +| Field | Value | +|--------+----------------------------------------------| +| Status | draft | +|--------+----------------------------------------------| +| Owner | Craig Jennings | +|--------+----------------------------------------------| +| Repo | dotfiles | +|--------+----------------------------------------------| +| Kin | net panel (architecture donor), theme studio | +|--------+----------------------------------------------| + +* Problem + +Desktop toggles are scattered: dim, caffeine/idle, touchpad/mouse, airplane +mode each own a bar module and a keybind; brightness and keyboard-backlight +have keybinds but no visible control or level readout. The bar is running out +of glanceable width (hence the collapse arrows), and sliders can't live in +waybar at all. One settings dropdown — a gear glyph opening a small panel — +gathers them. + +* Goals + +1. One panel with every desktop toggle + slider: auto-dim, idle/caffeine, + touchpad, mouse, airplane (laptop-only), screen brightness, keyboard + backlight. +2. Conditional rows appear only when the hardware/context applies (mouse + present, trackpad present, battery present) — reuse the detection the + airplane/touchpad indicators already do. +3. Every control reflects live state and verifies its action took (the net + panel's verify-everything contract). +4. Bar stays the quick layer: which standalone indicators survive is a + decision below. + +* Design sketch + +** Architecture — clone the net panel's proven stack + +- GTK4 + gtk4-layer-shell, Blueprint .blp sources compiled to committed .ui + (=make ui=; dev-only build dependency, fresh clones run without the + compiler). +- Humble-object split: a GTK-free PanelModel presenter (unit-tested to 100% + like the net PanelModel) + thin composite-widget pages. Backing actions in + a GTK-free settings.py that shells out to brightnessctl / hyprctl / the + existing toggle scripts, TDD'd with fake binaries like every dotfiles + suite. +- One gated AT-SPI smoke (the run-panel-smoke.sh pattern), no bespoke + headless widget suite. +- Dupre WIP palette CSS, shared with the net panel — factor the palette + block into a common css asset both panels load rather than duplicating + (feeds the theme-studio task later). + +** Controls and their backings + +| Control | Backing | +|--------------------+----------------------------------------------| +| Auto-dim toggle | hyprctl decoration:dim_inactive (dim-toggle) | +|--------------------+----------------------------------------------| +| Idle / caffeine | hypridle start/stop (caffeine-toggle) | +|--------------------+----------------------------------------------| +| Touchpad toggle | toggle-touchpad + touchpad-state file | +|--------------------+----------------------------------------------| +| Mouse toggle | same mechanism, mouse-state file | +|--------------------+----------------------------------------------| +| Airplane mode | airplane-mode script (laptop-only row) | +|--------------------+----------------------------------------------| +| Screen brightness | brightnessctl (backlight class), slider + % | +|--------------------+----------------------------------------------| +| Keyboard backlight | brightnessctl (kbd_backlight class), slider | +|--------------------+----------------------------------------------| + +Slider changes apply live (throttled) and read back the actual level after +apply — verify-everything. Toggles re-read their source of truth after +firing, same as the bar indicators do, and the bar modules get their refresh +signals so both surfaces agree. + +** Open/close behavior + +Gear glyph module on the bar right cluster; click toggles the panel +(layer-shell anchored under the bar, right-aligned). Focus-out auto-hide + +Close button, matching the net panel. Keybind decision below. + +* Decisions (Craig) + +** TODO Which standalone bar indicators collapse into the panel? +Options per module (dim, touchpad, caffeine): keep on bar + mirrored in +panel; or panel-only (frees bar width). Recommendation: keep touchpad and +caffeine visible on the bar (state you glance at), move dim into the panel +(you set it rarely), keep airplane where it is. + +** TODO Keybind for the panel? +Super+Shift+G (gear) is free. Or no keybind — mouse-only surface. + +** TODO Where does the code live? +Recommendation: dotfiles =settings/= sibling to =net/= (same src-layout, +tests in tests/settings/), sharing the palette css. In-tree pocketbook-style +was the old note; the net panel is the better donor now. + +** TODO Slider granularity and floor +brightnessctl exposes 0-100%; a 5% floor stops "screen went black in a dark +room" lockouts. Confirm the floor (or allow 0 with a long-press escape +hatch). + +* Implementation phases + +1. settings.py backings (brightness get/set, kbd backlight, toggle + state readers) — pure engine, TDD with fake brightnessctl/hyprctl. +2. PanelModel presenter (rows, conditional visibility, verify-after-apply + semantics) — unit-tested, no GTK. +3. Blueprint UI + gear bar module + open/close wiring; palette css factored + to a shared asset; AT-SPI smoke. +4. Bar-module consolidation per the decision above (drop/keep indicators, + refresh-signal wiring, keybind). diff --git a/docs/design/2026-07-02-file-manager-swallow-spec.org b/docs/design/2026-07-02-file-manager-swallow-spec.org new file mode 100644 index 0000000..4c61be1 --- /dev/null +++ b/docs/design/2026-07-02-file-manager-swallow-spec.org @@ -0,0 +1,141 @@ +#+TITLE: File-Manager Swallow Pattern +#+AUTHOR: Craig Jennings +#+DATE: 2026-07-02 +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* CANCELLED Status +:PROPERTIES: +:ID: d92e0074-f594-4e83-81a0-faf282e15ed0 +:END: +- [2026-07-02 Thu] CANCELLED — targeted the wrong file manager. Craig's ask + is about the dirvish popup (Super+F, an Emacs frame), not nautilus (the + Super+Shift+F bind that misled the grounding). For dirvish the right + design is elisp-side and strictly better: Emacs is the launcher, so it + can spawn the handler directly (=start-process=), hide the popup frame, + and restore it from a process sentinel — exact exit tracking plus a + failure notify, no window-event heuristics. Reassigned to .emacs.d via + its inbox (2026-07-02-2231-from-archsetup-dirvish-popup-swallow-handoff). + The gio double-fork finding below stands for any gio-launching file + manager; the daemon design is kept for reference only. +- [2026-07-02 Thu] DRAFT — initial spec from Craig's roam capture ("when the + file manager launches another app, it should hide and return when that + process ends"). Feasibility ground truth sampled live on velox same + evening: Hyprland's native swallow cannot work here (see Problem), so the + design is an event-listener daemon. + +* Metadata + +| Field | Value | +|--------+----------------------------------------------------| +| Status | cancelled | +|--------+----------------------------------------------------| +| Owner | Craig Jennings | +|--------+----------------------------------------------------| +| Repo | dotfiles (daemon + config); archsetup (none) | +|--------+----------------------------------------------------| +| Kin | touchpad-auto (socket-listener donor), | +| | hypr-refocus-scratchpad (event-daemon sibling) | +|--------+----------------------------------------------------| + +* Problem + +Opening a file from nautilus (Super+Shift+F, tiled, class +=org.gnome.Nautilus=) spawns a viewer window while nautilus stays in the +layout. The wanted behavior is the swallow pattern: the file manager hides +while the app it launched runs, and returns when that app exits. Today +there's no signal connecting the two windows — the viewer lands wherever +the layout puts it, nautilus lingers, and quitting is manual. + +*Hyprland's native swallow is ruled out — measured, not assumed.* +=misc:enable_swallow= + =swallow_regex= would be exactly this feature in two +config lines, but it matches by walking the new window's PID ancestry to +the swallow candidate's PID. Nautilus launches handlers through GLib +(=g_app_info_launch_default_for_uri=), and that path orphans the child: +reproduced live on velox 2026-07-02 with a python-gi launcher — feh came up +with PPID 1 (reparented to init) while the launcher was still alive. The +ancestry walk hits init before it hits nautilus, every time, for every +handler. Any design that depends on PID parentage is dead on arrival; the +signal has to come from window events instead. + +Ground truth on handlers (velox, 2026-07-02): pdf → zathura, image → feh, +video → mpv, text/code → emacsclient (window belongs to the emacs daemon). +Side-note, out of scope here: feh is X11 — an XWayland viewer on a +no-XWayland-by-preference setup; a default-handler review is its own task. + +* Goals + +- Double-click a file in nautilus → the viewer takes its place; nautilus is + gone (special workspace, not killed — state and tabs survive). +- Quit the viewer → nautilus returns and has focus. +- Nothing else changes: terminals, scratchpads, and every other window keep + their current behavior. +- Config-driven, testable logic, one small daemon — the touchpad-auto shape. + +* Design sketch + +A =hypr-swallow= daemon (dotfiles, =hyprland/.local/bin/=) listening on the +Hyprland IPC event socket (socket2), same as =touchpad-auto=: + +- Track the active window (=activewindow>>= events carry class + title; + =activewindowv2>>= carries the address). +- On =openwindow>>= (address, workspace, class, title) while the active + window's class is a configured *parent* (nautilus): dispatch + =movetoworkspacesilent special:swallow,address:0x<parent>=, record + child-address → {parent-address, origin workspace}. +- On =closewindow>>= of a recorded child: bring the parent back + (=movetoworkspace=) and focus it; drop the record. +- On =closewindow>>= of a hidden parent (nautilus quit while hidden): drop + the record, nothing to restore. +- Exception classes (fuzzel, dunst, scratchpad classes, the panels) never + trigger a swallow even when they open over nautilus. +- Pure event-machine core (parse lines → state transitions → dispatch list), + unit-tested against recorded event streams; a thin socket loop around it. + +Known edge, handled: Super+Shift+F while nautilus is hidden re-runs +=nautilus=, which activates the existing (hidden) instance instead of +opening a window. The daemon (or the bind) must restore-and-untrack in that +case so the bind never appears dead. + +Known limitation, accepted: the emacsclient case never swallows — the +window belongs to the long-running emacs daemon and =closewindow= for it +means a frame closed, not "the file is done." The parent-class trigger plus +exception list naturally leaves it alone only if we exclude it explicitly — +see decision 2. + +* Decisions (Craig) + +** TODO Trigger breadth: any new window while nautilus is active, or an allowlist of viewer classes? +"Any window" is simple and catches every handler, but a false positive +exists: an app you launched seconds earlier from elsewhere finishes starting +while you're focused on nautilus → nautilus gets swallowed by an unrelated +window. An allowlist (zathura, mpv, imv, feh, …) can't be surprised but +needs maintaining. Recommendation: any-window + exception list — the false +positive is rare and self-healing (close the window or refocus). + +** TODO The emacs frame case: swallow or exempt? +Opening a text file from nautilus raises/creates an emacs frame. Swallowing +nautilus under it "works" going in, but the restore fires when *any* frame +closes, which may be much later or never. Recommendation: exempt =emacs= — +text files just open, nautilus stays. + +** TODO Restore destination: the workspace nautilus came from, or the one you're on when the viewer closes? +If you move the viewer to another workspace and quit it there, "origin" +teleports you back; "current" brings nautilus to you. Recommendation: +current workspace — the restore should land where your attention is. + +** TODO Multiple children: refcount or single-slot? +You can only launch a second file after restoring nautilus manually, so +overlap is rare — but a fast double-launch can record two children. +Recommendation: refcount — restore when the last tracked child closes. + +* Implementation phases + +1. =hypr-swallow= core: pure event-machine (TDD over recorded event + streams; fake hyprctl for dispatch assertions), config block at the top + (parent classes, exception classes), unittest suite in =tests/=. +2. Socket loop + wiring: exec-once in hyprland.conf, the Super+Shift+F + restore-if-hidden interplay, daemon single-instance guard. +3. Live verification on velox (zathura + mpv round-trips, the emacs case, + the false-positive probe) + manual-testing entries; ratio rides the + dotfiles pull. diff --git a/docs/design/2026-07-02-net-panel-other-interfaces-spec.org b/docs/design/2026-07-02-net-panel-other-interfaces-spec.org new file mode 100644 index 0000000..6b0a72d --- /dev/null +++ b/docs/design/2026-07-02-net-panel-other-interfaces-spec.org @@ -0,0 +1,189 @@ +#+TITLE: Net Panel — Tailscale, VPN, and WireGuard Interfaces +#+AUTHOR: Craig Jennings +#+DATE: 2026-07-02 +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* IMPLEMENTED Status +:PROPERTIES: +:ID: 79a1075a-4b56-4f25-a861-b69f120a636a +:END: +- [2026-07-02 Thu] IMPLEMENTED — all six phases shipped (dotfiles 2d9d060, + 21db05a, 31ba056, b4010bf, b5c8442; archsetup 0389790 + the wireguard + import script): probes, panel Tunnels view, diagnose/doctor route + awareness, bar badge, installer swap + operator, velox config migration. + Residual human steps filed under todo.org "Manual testing and + validation": proton CLI sign-in (per machine) and the first live + badge/tunnel round-trip. Ratio picks up the import + package swap on its + trip. +- [2026-07-02 Thu] DOING — decomposed into six build phases under the + todo.org parent (:SPEC_ID: bound); build started same evening per Craig + ("tunnels build now + audio-panel spec alongside"). +- [2026-07-02 Thu] READY — fused review passed the gate: 4/4 decisions + resolved, phases decomposable, claims re-verified live (proton-vpn-cli + 1.0.1 in extra, binary =/usr/bin/protonvpn=, no package conflict with the + GTK app; =tailscale status --json= shape confirmed on velox — Self/Peer/ + CurrentTailnet.Name/MagicDNSSuffix; zero NM wireguard connections yet, + seven configs in assets awaiting the phase 6 import). +- [2026-07-02 Thu] DRAFT — initial spec from the roam capture "other network + interfaces (tailscale, VPNs, wireguard)" filed in todo.org 2026-07-02. + +* Metadata + +| Field | Value | +|--------+---------------------------------------------------| +| Status | implemented | +|--------+---------------------------------------------------| +| Owner | Craig Jennings | +|--------+---------------------------------------------------| +| Repo | dotfiles (net module); archsetup (packages) | +|--------+---------------------------------------------------| +| Parent | Waybar network module spec (2026-06-29), V2 panel | +|--------+---------------------------------------------------| + +* Problem + +The net panel's Connections tab shows what NetworkManager knows: WiFi networks +and wired links. The machines also run overlay and tunnel interfaces the panel +is blind to: + +- Tailscale (tailscaled, both daily drivers; the tailnet is how the machines + reach each other; not an NM device) +- WireGuard configs (assets/wireguard-config/ carries Proton VPN configs; + importable as NM connections of type wireguard or run via wg-quick) +- Commercial VPN clients (Proton VPN GTK app is installed on velox; owns its + own tunnel device) + +When one of these is up it changes routing, DNS, and reachability — exactly +the things the Diagnostics tab reasons about — yet the panel neither shows nor +controls them, and the doctor can misattribute a VPN-caused failure to the +underlying link. + +* Goals + +1. Visibility: the Connections tab shows overlay/tunnel interfaces with live + state (up/down, address, and for tailscale the tailnet peers summary). +2. Control: bring each up or down from the panel row, same interaction shape + as Join/Disconnect on WiFi rows (no terminals — V2 contract). +3. Diagnostics awareness: diagnose/doctor know when a tunnel owns the default + route or DNS, name it in evidence rows, and stop misattributing its + failures to the physical link. + +Non-goals (this iteration): installing or configuring VPN providers, tailnet +ACL management, exit-node selection UI (a "use exit node" affordance can ride +a later pass), kill-switch management (tracked separately in the spec's +failure catalog). + +* Design sketch + +** Data sources — one probe per backend, engine-side + +New GTK-free module net/src/net/overlays.py with one probe per backend, +each returning the same small dict shape ({kind, name, state, addr, detail, +can_toggle}): + +- tailscale: =tailscale status --json= (rich: self, peers, exit node, health + messages). Daemon down → state "stopped". Binary absent → backend absent. +- wireguard-nm: =nmcli -t connection show= filtered to type wireguard — + up/down via the existing nmcli wrapper (activate/deactivate connection). + The seven Proton configs in assets/wireguard-config/ import cleanly + (=nmcli connection import type wireguard file <conf>=, then + =connection.autoconnect no= immediately — imports default to autoconnect + yes). They use only PrivateKey/Address/DNS + PublicKey/AllowedIPs/Endpoint, + no PostUp/PostDown anywhere, so no wg-quick path is needed at all + (Craig, 2026-07-02). All are full-tunnel (AllowedIPs 0.0.0.0/0) — the + panel should treat them as mutually exclusive. +- proton: drive the official proton-vpn-cli (Arch extra repo, v1.0.x, + stable since 2026-04) — connect/disconnect/status verbs. It drives NM + underneath (python-proton-vpn-network-manager), so the panel still sees + connection events through NM. Runtime-exclusive with the GTK app, which + gets dropped from the install. The imported NM wireguard configs remain + a raw fallback when the CLI/API path is down; the CLI stays primary + because the raw configs lack kill switch, port forwarding, and server + rotation. + +** Panel + +A fourth Connections group "Tunnels" (after Saved / Available now / Wired) +using the existing group-header + row machinery. Row: glyph per kind, name, +state caption; primary action Up/Down where can_toggle, else Open app. +Tailscale row detail (subtitle or tooltip): tailnet name, peer count online, +exit node if any. + +** Privileged path + +- tailscale up/down: needs root or operator — =tailscale set --operator= at + install time (archsetup) makes the user an operator, so no sudo needed at + runtime. Fallback: the V2 net-priv helper gains tailscale-up/down verbs. +- NM wireguard connections: no privilege needed (NM polkit default for the + active user). + +** Diagnostics awareness + +- diag gains an "overlay owns default route/DNS" detection step: when the + default route or resolv.conf points at a tunnel interface, evidence names + it ("default route via tailscale0") and failure classification runs the + physical-link checks against the underlying device instead. +- doctor: a tunnel-caused egress failure (VPN up but its endpoint dead) + classifies fixable with next_action "bring the tunnel down / reconnect", + not a WiFi reset. + +** Bar indicator + +Part of v1 (Craig, 2026-07-02 — "shouldn't be optional"): a small overlay +badge on the net glyph when a tunnel owns the default route. Rides the same +route/DNS-ownership detection the diagnostics step adds. + +* Decisions (Craig) + +** DONE Which backends ship in the first pass? +CLOSED: [2026-07-02 Thu] +Approved (Craig, 2026-07-02): tailscale + NM-managed wireguard. Craig asked +whether the wireguard configs can be ported to NM so wg-quick drops out +entirely — yes: all seven configs in assets/wireguard-config/ use only the +six directives NM imports cleanly (verified 2026-07-02; import command and +autoconnect caveat now in the design sketch). wg-quick is out of the spec, +not deferred. Proton control is CLI-driven per the Proton decision below, +superseding the detection-only recommendation here. + +** DONE Tailscale control path: operator flag at install vs net-priv verbs? +CLOSED: [2026-07-02 Thu] +Approved (Craig, 2026-07-02): =tailscale set --operator=$USER= in archsetup's +tailscale step (declarative, no sudo at runtime); net-priv verbs only if +operator mode proves insufficient (e.g. up with flags). +** DONE Does "Tunnels" belong in Connections or its own tab? +CLOSED: [2026-07-02 Thu] +Approved (Craig, 2026-07-02): a Connections group. A fourth top tab dilutes +the V2 nav for three rows. + +** DONE Proton VPN: detect-only or drive its CLI? +CLOSED: [2026-07-02 Thu] +Decided (Craig, 2026-07-02): drive it through a CLI. Research (2026-07-02): +Proton shipped an official Linux CLI — first release 2025-11, stable v1.0.0 +2026-04, packaged in Arch extra as proton-vpn-cli (1.0.1 at check time), +with kill switch, port forwarding, NetShield, server selection, and a +status command. It drives NM underneath, so the panel sees its connections +through the existing NM event path. Spec changes: the proton backend calls +protonvpn connect/disconnect/status instead of device-detection +(can_toggle true); archsetup installs proton-vpn-cli and drops +proton-vpn-gtk-app (the two can't run concurrently per the project README — +untested locally); the imported NM wireguard configs stay as a raw fallback. +Sources: [[https://protonvpn.com/support/linux-cli][Proton Linux CLI guide]], +[[https://protonvpn.com/support/release-notes-linux-cli][CLI release notes]], +[[https://github.com/ProtonVPN/proton-vpn-cli][proton-vpn-cli repo]]. +* Implementation phases + +1. overlays.py probes (tailscale JSON, nmcli wireguard filter, proton-vpn-cli + status) — pure engine, TDD with fake binaries; =net status= grows an + overlays section. +2. Panel Tunnels group + Up/Down wiring through the worker thread; AT-SPI + smoke extension. +3. Diagnose/doctor overlay awareness (route/DNS ownership step, classifier + rows, evidence text) — TDD against the diag harness. +4. waybar-net tunnel badge on the net glyph (v1 per the bar-indicator + decision), riding phase 3's route-ownership detection; suite coverage. +5. archsetup: tailscale operator flag in the tailscale install step; + proton-vpn-cli replaces proton-vpn-gtk-app in the package list; VM test + assertions. +6. One-time per-machine migration: import the seven assets/wireguard-config + configs into NM with autoconnect off (scriptable; both daily drivers). diff --git a/docs/design/2026-07-02-timer-panel-spec.org b/docs/design/2026-07-02-timer-panel-spec.org new file mode 100644 index 0000000..2c9f7d4 --- /dev/null +++ b/docs/design/2026-07-02-timer-panel-spec.org @@ -0,0 +1,113 @@ +#+TITLE: Timer GTK Panel +#+AUTHOR: Craig Jennings +#+DATE: 2026-07-02 +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* DRAFT Status +:PROPERTIES: +:ID: 1770af2e-b093-4024-a512-ae4324a2869f +:END: +- [2026-07-02 Thu] DRAFT — initial spec from Craig's roam capture "give the + timer a gtk UI/UX like the network panel. spec this out." + +* Metadata + +| Field | Value | +|--------+---------------------------------------------------| +| Status | draft | +|--------+---------------------------------------------------| +| Owner | Craig Jennings | +|--------+---------------------------------------------------| +| Repo | dotfiles | +|--------+---------------------------------------------------| +| Kin | net panel (architecture donor), wtimer (backing), | +| | desktop-settings panel spec (sibling) | +|--------+---------------------------------------------------| + +* Problem + +The timer's whole UI is a chain of three fuzzel prompts (type, value, label) +plus a fourth for cancel. That flow can't show what's already running while +you create, can't offer one-tap presets, gives no feedback on a typo until +the add silently fails, and pomodoro state (phase, cycle) is only visible in +a tooltip. The 2026-07-02 styling pass made the dialogs presentable, but the +shape is still four blind modals for what is really one small control +surface. + +* Goals + +1. One panel, opened from the bar's timer module, that shows everything + running (live countdowns, pomodoro phase/cycle, paused state) and creates + new items without leaving it. +2. One-tap presets for the common cases (tea, pomodoro, quick alarm) next to + freeform entry, with inline validation before the add. +3. Per-item controls: pause/resume, cancel, promote to primary (the bar + glyph slot). +4. wtimer stays the single owner of timer state and the notification path; + the panel is a view over it, never a second engine. + +* Design sketch + +** Architecture — clone the net panel's proven stack + +- GTK4 + gtk4-layer-shell dropdown anchored under the timer module, Blueprint + .blp compiled to committed .ui (=make ui=; compiler is dev-only). +- Humble-object split: GTK-free PanelModel presenter, unit-tested to 100%, + with thin widget bindings; one gated AT-SPI smoke via the + run-panel-smoke.sh pattern. +- Backing: shell out to the existing wtimer CLI (=add=, =toggle=, =cancel=, + =cycle=, =render=). =render= already emits a JSON payload; the panel polls + it (or subscribes to the same RTMIN+14 refresh signal) for live state. + wtimer's 89-case suite keeps owning the logic; panel tests fake the CLI + like every dotfiles suite fakes binaries. +- Dupre WIP palette CSS shared with the net panel (same factoring the + desktop-settings spec calls for — one palette asset, three panels). + +** Layout sketch + +- Header row: running-item count + a Clear All button (maps to cancel-all). +- Item list: one row per item — type glyph, label, live countdown / clock + time / phase+cycle for pomodoro, pause and cancel buttons, click-to-promote. +- Create strip: four type buttons (the wtimer glyphs), preset chips per type + (e.g. 5m / 15m / 25m / 60m for timers), a freeform entry validated with + wtimer's own parsers, an optional label field. +- Empty state: the create strip alone, centered. + +** What happens to the fuzzel flow + +The keybind/fuzzel path stays as the keyboard-fast lane (it's now styled and +tested); the panel replaces the click-driven path on the bar module. Whether +the fuzzel chain eventually retires is a decision below. + +* Decisions (Craig) + +** TODO Panel scope: standalone timer panel, or a page in the desktop-settings panel? +The desktop-settings spec (sibling DRAFT) could host timers as a page. +Standalone matches the net panel's one-domain-one-panel shape and keeps the +timer dropdown small; folding in means one panel binary fewer. Recommend +standalone, sharing the palette/css asset. + +** TODO Fuzzel flow: keep as keyboard fast lane, or retire once the panel lands? +Keeping both costs two creation paths to maintain (though the fuzzel chain is +small and freshly tested). Recommend keep until the panel proves itself, then +revisit. + +** TODO Presets: which chips per type? +Strawman: timer 5m/15m/25m/60m; alarm +30m/top-of-hour/07:00; pomodoro +default cycle only; stopwatch needs none. Adjust to taste. + +** TODO Live updates: poll render (1s, like the bar) or a wtimer "watch" mode? +Polling reuses what exists and matches the bar's cadence; a watch/subscribe +mode is cleaner but grows wtimer. Recommend polling first. + +* Implementation phases + +1. PanelModel presenter + CLI-backing seam (TDD, GTK-free, 100% like the net + PanelModel). +2. Blueprint UI: item list + create strip, wired to the presenter; palette + css factored to the shared asset. +3. Bar integration: timer module left-click opens the panel (replacing the + fuzzel menu binding there), RTMIN+14 refresh keeps bar and panel in step. +4. AT-SPI smoke + manual-testing checklist; decide the fuzzel flow's future + after a week of real use. diff --git a/docs/design/2026-07-02-waybar-expansion-animation-feasibility.org b/docs/design/2026-07-02-waybar-expansion-animation-feasibility.org new file mode 100644 index 0000000..cb195c6 --- /dev/null +++ b/docs/design/2026-07-02-waybar-expansion-animation-feasibility.org @@ -0,0 +1,53 @@ +#+TITLE: Waybar Expansion Animation — Feasibility Assessment +#+AUTHOR: Craig Jennings +#+DATE: 2026-07-02 + +* Question + +The todo.org task "Smooth waybar expansion animation" [#C]: the collapse/expand +jump is abrupt, and a few systray icons pop in one-by-one afterward. Can the +expansion animate smoothly? + +* How the collapse actually works + +=waybar-collapse= rewrites the module arrays in a runtime copy of the config +(=$XDG_RUNTIME_DIR/waybar/config=) and sends waybar SIGUSR2. Waybar reloads: +it tears down every module widget and rebuilds the bar from the new config. + +* Findings + +1. *No widget survives the reload, so nothing can transition.* GTK3 CSS + transitions animate property changes on live widgets. The collapse + mechanism replaces the whole widget tree; there is no widget on both sides + of the change to interpolate. The jump is structural, not stylistic. +2. *GTK3 doesn't animate widget add/remove anyway.* Smooth insert/remove needs + =GtkRevealer= wrapping, which waybar does not use for modules. Making it do + so is an upstream waybar patch, not a config or CSS matter. +3. *The layer surface resize isn't animatable either.* Hyprland layerrules can + animate map/unmap (slide/fade), but the bar stays mapped through a collapse + — the same surface changes width. No compositor-side hook exists for that. +4. *A CSS-only fake covers custom modules at best.* Custom modules could emit + a "collapsed" class and transition font-size/padding toward zero (GTK3 CSS + can animate those). But the collapsed set includes built-ins — tray, + pulseaudio, workspaces — which take no script-driven classes. The result + would be half the modules gliding and half popping: worse than the clean + jump. +5. *Tray icons popping in one-by-one is separate and unfixable here.* That's + asynchronous StatusNotifier re-registration after the reload; each app + answers on its own schedule. Only keeping the tray alive across the change + (i.e. not reloading) avoids it. + +* Conclusion + +Not feasible with the current collapse mechanism, and no acceptable partial +measure exists. A real animation requires waybar itself to support dynamic +module sets with Revealer-style transitions (an upstream feature), or +replacing the collapse-by-reload design entirely. + +* Recommendation + +Close the task as infeasible-for-now (or park at [#D] with a pointer here). +Revisit only if waybar upstream gains dynamic module visibility (worth a +check at major waybar releases) or if the bar ever migrates to a custom +GTK4 shell — the Blueprint pipeline from the net panel would make Revealer +transitions natural there. diff --git a/docs/design/2026-07-03-audio-panel-spec.org b/docs/design/2026-07-03-audio-panel-spec.org new file mode 100644 index 0000000..16a087d --- /dev/null +++ b/docs/design/2026-07-03-audio-panel-spec.org @@ -0,0 +1,160 @@ +#+TITLE: Audio Panel — the pulsemixer console +#+AUTHOR: Craig Jennings +#+DATE: 2026-07-03 +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* IMPLEMENTED Status +:PROPERTIES: +:ID: 71f556c6-ee02-47cc-a3be-68c8289380f3 +:END: +- [2026-07-03 Fri] IMPLEMENTED — built in a no-approvals speedrun in the + dotfiles repo (branch panel-bugfixing): engine (pactl), presenter, GTK + panel, PTT arming, bar indicator, and the bar/keybind wiring, across four + commits 65e5bb0..9601420. 102 unit tests + a passing AT-SPI smoke on velox. + All five Decisions below resolved. Live-eyeball validation (visual polish, + PTT-in-a-meeting, fader feel, the master-mute hardware key) is the one open + follow-up, tracked as a manual-testing task in todo.org. +- [2026-07-03 Fri] DRAFT — stub from the todo.org task "Audio panel spec" + (roam ask 2026-07-02) plus the 2026-07-03 waybar/sound design discussion. + Written to iterate alongside the prototype + (=working/sound-panel/sound-panel-prototype.html=). Spine is present; the + Decisions and Design detail get filled in as we go. + +* Metadata + +| Field | Value | +|--------+---------------------------------------------------| +| Status | implemented | +|--------+---------------------------------------------------| +| Owner | Craig Jennings | +|--------+---------------------------------------------------| +| Repo | dotfiles | +|--------+---------------------------------------------------| +| Kin | net panel + bt panel (architecture + aesthetic | +| | donors), desktop-settings panel (sibling) | +|--------+---------------------------------------------------| + +* Problem + +Audio control today is the pyprland audio scratchpad (Super+A) — a floating +pulsemixer TUI — plus scattered bar affordances: =pulseaudio= (volume, click +to mute sink), =pulseaudio#mic= (mic glyph + mic-toggle), Super+M audio-cycle +ring, Super+Shift+A mic-toggle. There's no single glanceable surface that +shows every sink and source, lets you set the default output/input, and +carries the meeting-grade mic controls Craig wants (a clean muted mode and a +hold-to-talk mode). The net + bluetooth panels set the pattern for exactly +this shape; audio is the third instrument in the family. + +* Goals + +1. One panel, opened from the bar's sound glyph, exposing the full pulsemixer + surface: every sink and source, per-device volume, per-device mute, and + switching the default output and input. +2. Replace the pyprland audio scratchpad (Super+A) as the primary audio UI. +3. Mic modes for meetings: *live*, *muted*, and *push-to-talk* (mic stays + muted except while Space is held, releasing re-mutes). +4. A *master quick-mute* — one action mutes all output — reachable from the + faceplate and a keybind. +5. Instrument-console aesthetic and architecture consistent with net + bt: + same faceplate, lamps, engraved sections, console keys, needle gauges, + verify-everything contract. +6. The bar glyph reflects live state: speaker + three arcs normally, a + speaker-with-✕ when muted (Craig's called glyphs). + +* Design sketch + +Prototype: =working/sound-panel/sound-panel-prototype.html= (the reference for +layout + idioms below). + +** Surface (from the prototype) + +- *Faceplate* — status lamp, sound glyph, state word (PLAYBACK / MUTED), a + MUTED badge, the SND·01 unit label, and the *master quick-mute switch* + (same switch idiom as net wifi / bt power), plus the close ✕. +- *OUTPUTS section* — one row per sink. Row body click = set default (gold + DEF tag). A machined fader sets that sink's volume; the trailing glyph + mutes just that device. Active/default row is emphasized (cream name, gold + lamp/glyph). +- *INPUTS section* — one row per source, same idioms. +- *Mic mode* — three console keys: LIVE / MUTED / PUSH·TALK. Push-to-talk + keeps the mic muted (red IN needle) and un-mutes only while Space is held. +- *Twin VU needles* — output level + input level, the sound analog of net + throughput and bt battery gauges. Needle goes red when its side is muted. + +** Architecture — clone the net/bt panel stack + +- GTK4 + gtk4-layer-shell, Blueprint =.blp= → committed =.ui= (=make ui=, + dev-only build dep). +- Humble-object split: a GTK-free PanelModel presenter (unit-tested like the + net/bt PanelModels) + thin composite-widget pages. Backing actions in a + GTK-free =audio.py= that shells to the audio control layer (pactl / + wpctl / pulsemixer — pick below), TDD'd with fake binaries. +- One gated AT-SPI smoke (=run-panel-smoke.sh= pattern). +- Shared instrument-console palette CSS asset (the one net/bt/settings all + load) — do not duplicate the palette block. +- Code lives in dotfiles =audio/= sibling to =net/= (src-layout, tests in + =tests/audio/=). + +* Decisions (Craig) + +** DONE Audio control backend — pactl vs wpctl vs pulsemixer +CLOSED: [2026-07-03 Fri] +Resolved: =pactl= (the engine module is =pactl.py=). Both ratio and velox run +PipeWire with the pipewire-pulse compat layer and no PulseAudio daemon, so +pactl and wpctl hit the same graph — but =pactl -f json= gives structured, +name-addressable output where wpctl offers only a volatile-id tree. Reads go +through =pactl -f json list sinks|sources= + =get-default-*=; writes target +devices by stable name behind an argv-charset guard. + +** DONE Push-to-talk mechanism under Wayland (feasibility — phase 1) +CLOSED: [2026-07-03 Fri] +Resolved: route (a), Hyprland dynamic binds. The phase-1 spike confirmed all +three primitives on velox (Hyprland 0.55.4): =hyprctl keyword bind/unbind= +adds and removes a bind live, =bindr= fires on release, and =pactl +set-source-mute @DEFAULT_SOURCE@ 0|1= toggles the mic cleanly. =ptt.py= arms a +press bind (un-mute) + a bindr (re-mute) on entering PTT mode and unbinds on +leaving, so the talk key isn't grabbed globally otherwise. No evdev needed. +Documented behavior: while PTT is armed, the talk key is the talk key. + +** DONE Quick-mute keybind + scope +CLOSED: [2026-07-03 Fri] +Resolved: the XF86AudioMute hardware key (Super+Shift+M turned out to be taken +by the monocle-layout bind, so the spec's assumption was wrong). The mute key +now runs =audio quick-mute=, which mutes every output (master), not just the +default sink — identical on a single-sink machine, correct on a multi-sink +one. Also reachable from the faceplate master switch and the panel. Scope: +master mute of all sinks, with verify-after-apply per sink. + +** DONE Bar glyph click map +CLOSED: [2026-07-03 Fri] +Resolved with the low-regret wiring: kept the existing =pulseaudio= waybar +module (left-click mute, scroll volume — no regression) and repointed its +right-click from the retired pulsemixer scratchpad to =audio-panel=. So: left += mute, right = open panel, scroll = volume. A fuller =custom/audio= indicator +(state-following speaker glyph + its own click map) is built and tested +(=indicator.py= + =waybar-audio=) but stays unwired until the new bar glyph +gets a live eyeball — the swap is a one-line waybar edit when Craig's ready. + +** DONE Fate of the existing audio affordances +CLOSED: [2026-07-03 Fri] +Resolved: Super+A repurposed from =pypr toggle audio= (the pulsemixer +scratchpad) to =audio-panel= — the panel is the primary audio UI now, so the +scratchpad is retired. Its definition still sits in the machine-local +=pyprland.toml= (not stowed) and can be deleted by hand. Kept: =pulseaudio= + +=pulseaudio#mic= waybar modules (glance + scroll + the mic-mute glance), +Super+M cycle, Super+Shift+A + XF86AudioMicMute mic-toggle. Changed: +XF86AudioMute → master quick-mute (see the quick-mute decision above). + +* Implementation phases + +1. Push-to-talk feasibility spike (decision above) — the one unknown; settle + the mechanism before committing the mic-mode design. +2. =audio.py= backings (list/get/set/mute/default for sinks + sources) — + pure engine, TDD with a fake audio backend. +3. PanelModel presenter (rows, default tracking, mic modes, master mute, + verify-after-apply) — unit-tested, no GTK. +4. Blueprint UI + sound bar glyph (normal / muted / ptt states) + open/close + wiring; shared palette css; AT-SPI smoke. +5. Bar-affordance consolidation per the decision above; retire the Super+A + scratchpad; keybinds. diff --git a/docs/design/2026-07-03-instrument-console-panels-spec.org b/docs/design/2026-07-03-instrument-console-panels-spec.org new file mode 100644 index 0000000..315e0b4 --- /dev/null +++ b/docs/design/2026-07-03-instrument-console-panels-spec.org @@ -0,0 +1,158 @@ +#+TITLE: Instrument-console rebuild — net + bluetooth panels +#+DATE: 2026-07-03 +#+TODO: TODO | DONE +#+TODO: DRAFT READY DOING | IMPLEMENTED SUPERSEDED CANCELLED + +* IMPLEMENTED Status +:PROPERTIES: +:ID: e73877f5-4f5b-4f81-b946-dbaa6145e0d5 +:END: +- 2026-07-03 Fri @ 06:49 -0400 :: DOING → IMPLEMENTED: all six phases shipped + (net GTK-free layer 81ec9c3, net view 800ef60; bt GTK-free layer 5318b34, bt + view 66f03d9; phase-6 dead-code removal f4e688e). Both panels are single-screen + instrument consoles, verified live on velox — 46 suites + full make test green, + both AT-SPI smokes green end to end, render matching the approved prototype. The + three folded tasks (network panel redesign, bt switch placement + title, bt + rename devices) closed with the build. +- 2026-07-03 Fri @ 02:07 -0400 :: DRAFT → READY → DOING in one stroke: Craig + approved the design through five interactive prototype iterations and + authorized the no-approvals speedrun ("let's build them now... go"). The + review gate was the live prototype session itself. +- 2026-07-03 Fri @ 02:07 -0400 :: Created (DRAFT) from the prototype session. + +* Metadata + +| Field | Value | +|---------------------+------------------------------------------------------------| +| Status | implemented | +|---------------------+------------------------------------------------------------| +| Owner | Craig Jennings | +|---------------------+------------------------------------------------------------| +| Repos | dotfiles (net/, bluetooth/, themes), archsetup | +|---------------------+------------------------------------------------------------| +| Normative reference | [[file:../../assets/2026-07-03-instrument-console-panels-prototype.html][assets/2026-07-03-instrument-console-panels-prototype.html]] | +|---------------------+------------------------------------------------------------| + +* Summary + +Rebuild both GTK layer-shell panels (net, bluetooth) from the tabbed layout +to the instrument-console design: one screen, no tabs, a faceplate with a +state word + badges + radio switch + close, engraved section labels, lamp +rows that act on click, dial meters under the console keys, and a doctor +that does it all. The interactive prototype =panel-console-v3.html= is the +normative design reference — when this spec and the prototype disagree on a +visual or interaction, the prototype wins. + +* Decisions (all resolved — Craig, prototype session 2026-07-02/03) + +- Replace the tabbed panels outright. No fallback flag; git history is the + rollback. Net panel first, bluetooth second. +- Advanced repair tiers leave the panel entirely. DOCTOR runs the full + diagnose → classify → lightest-repair → re-verify escalation (the engine + already does this). The surgical tiers stay CLI-only (=net repair ...=). +- Faceplate (both panels): state lamp + state word, badges, unit label + (NET·01 / BT·01), radio switch (wifi radio / bt adapter power), close ✕. + Badges: TUNNEL (gold, net), AIRPLANE (gold, both), LOW BATT (red, bt). +- Sections in order — net: CHANNEL, NETWORKS (+ hidden action), TUNNELS, + CONSOLE (DOCTOR / SPEED TEST keys), meters, output. bt: ADAPTER (with + clickable =discoverable= chip), PAIRED, NEARBY (+ scanning note), CONSOLE + (DOCTOR / SCAN), battery gauges, output. +- Section row budgets, half-row peek, internal scroll (thin slate + scrollbar): NETWORKS 5.5 rows, TUNNELS 4.5, PAIRED 5.5, NEARBY 4.5. + In-range networks sort active-first then strongest-signal-first. Counts on + the engraved headers ("networks · 12 in range", "tunnels · 1 up of 9", + "paired · 3", "nearby · 12"). The panel silhouette never grows with list + length; only the output well is variable and it caps at ~170px. +- Lamp-row grammar: green = live/connected, gold = available/actionable, + off = down/stored, red = failed; busy = pulsing gold during transitions. + Rows act on click (tunnels toggle, networks join, paired devices + connect/disconnect toggle, nearby devices pair). +- Arm-first for anything disruptive or destructive, 3s auto-disarm: + - forget (network or bt device): hover reveals ✕; first click arms the + row terracotta ("forget? click ✕ again"), second fires. No dialog. + - disconnect (active network): click the active row; first click arms in + GOLD ("disconnect? click again") — disruptive, not destructive — second + fires. +- Meters (net): two dials, RX·DOWN / TX·UP, gold needles, mode tag top-left + (LIVE green / TEST gold), HOLD tag top-right. Idle: live link throughput. + Speed test: cards flash gold, needles sweep the measured rate, then PIN + the final value with HOLD; clicking a held meter releases it to LIVE. + Scale 0–100 Mbps, auto-relabel to 0–1000 when a reading exceeds 100. + Dial top inset ~13px so the corner tags never touch the arc. +- Speed test output well gets ONLY: location line ("location: <city> by + <sponsor>"), ping (+jitter), final line, conditioned tip(s). The rates + live in the meters, not the text. +- Battery gauges (bt): same dial chrome; one per connected device (two + slots; empty slot dim "NO DEVICE"/"ADAPTER OFF"); needle+value red under + 15% and the LOW BATT faceplate badge lights. +- Output well: doctor streams the checks with their narration lines + (viewmodel.STEP_NARRATION) and repair steps in gold; verdict line closes + (olive for pass/fixed). A dismiss ✕ appears in the well's corner whenever + it has content. Both panels. +- WiFi radio switch: =nmcli radio wifi on|off=. Off empties NETWORKS to one + dim "wifi radio off" row, drops the connection, kills tailscale rows' + reachability; on rejoins the last network (NM autoconnect does this for + real). Airplane mode is system-level (Super+Shift+A owns it): both panels + reflect it (state word AIRPLANE, gold badge, switches down); a switch + flipped under airplane refuses with a toast naming the exit. A routed + ethernet link keeps the net panel ONLINE through airplane mode. +- Ethernet: presence-based row pinned atop NETWORKS when a cable is up + ("enp… · active · wired · 1.0 Gbps" / "wired · standby"); CHANNEL swaps + the signal ladder for "wired · <speed> full-duplex" when routed; clicking + the row toggles route ownership via device disconnect/connect. +- Pairing (bt): nearby row click → busy → passkey-confirm dialog (large + gold digits) → device moves to PAIRED and connects. SCAN key refreshes + with a "scanning…" note on the header. +- Rename (bt): hover ✎ on a paired row → dialog prefilled → bluez + =set-alias= (closes the filed rename task). +- Tooltips: any ellipsized row label carries its full text as the tooltip. +- Dialogs (join / hidden SSID / passkey / rename) keep the in-panel dupre + dialog style (gold border, dark well inputs, gold caret). +- Close: ✕ on the faceplate + Esc (already shipped; keep). +- Folded tasks: "Network panel redesign", "Bluetooth panel: switch placement + + panel title", "Bluetooth panel: rename devices" — all close with this + build's phases. + +* Engine gaps (small, close during phases) + +- radio verb: =nmcli radio wifi on|off= helper (manage or sysio) + tests. +- hidden-SSID join: =manage.add= grows a hidden flag + (=802-11-wireless.hidden yes=). +- ethernet: device rows from =nmcli dev= (type ethernet) + disconnect/ + connect verbs (device-level; =net down --iface= already disconnects). +- bt rename: btctl =set-alias= one-shot verb + verify-after read. +- bt battery: already exposed (indicator uses it). +- speedtest meters: =run_speedtest_stream= on_update already ticks (pty). +- link speed for wired channel line: =ethtool=-free read from + =/sys/class/net/<dev>/speed=. + +* Implementation phases + +1. [X] Spec + task wiring (this file; todo.org parent task with :SPEC_ID:). +2. [X] Net GTK-free layer (TDD): viewmodel row composers for the console + sections (network rows sorted+counted, tunnel rows, channel facts, + faceplate state word derivation, meter scale logic, arm state machines + for forget/disconnect), PanelModel restructure (sections, no tabs). + Engine gaps: radio verb, hidden join, ethernet rows, wired link speed. +3. [X] Net view rebuild: gui.py single-page console built in Python + (faceplate, engraved scrolled sections, console keys, cairo dial meters + with mode/hold tags, output well + dismiss), panel.css additions + (engrave, lamps, dial, badges, arm tints). AT-SPI smoke + driver + rewritten for the console layout. Shipped with phase 4 (dotfiles + 800ef60): a view-only intermediate is a broken panel (rows and switches + that do nothing), so view + interactions landed together. +4. [X] Net interactions: join/hidden/forget (arm terracotta)/disconnect + (arm gold)/radio switch/ethernet toggle/doctor stream/speed-test-drives- + meters, toasts. Verified live on velox (DOCTOR streams, SPEED TEST sweeps + both dials then HOLD). Shipped in dotfiles 800ef60 with phase 3. +5. [X] Bluetooth panel: same treatment end to end (faceplate + power + switch, adapter chip, paired/nearby lamp rows, pair passkey flow, + rename via set-alias, forget arm, battery gauges + LOW BATT, DOCTOR / + SCAN keys, output). bt smoke rewritten. Shipped in two commits mirroring + net: dotfiles 5318b34 (GTK-free layer + engine gaps) and 66f03d9 (view + + interactions + smoke). rename lands on the bluez Alias via busctl + (set-alias has no MAC-addressed one-shot); verified live on velox (smoke + green end to end, screenshot matches the prototype). +6. [X] Live verification both panels on velox + all suites + smokes green; + summary of findings written to file; folded tasks closed; dead code + removed; session context finalized. |
