aboutsummaryrefslogtreecommitdiff
path: root/docs/design/2026-06-29-waybar-network-module-spec.org
diff options
context:
space:
mode:
Diffstat (limited to 'docs/design/2026-06-29-waybar-network-module-spec.org')
-rw-r--r--docs/design/2026-06-29-waybar-network-module-spec.org2094
1 files changed, 2094 insertions, 0 deletions
diff --git a/docs/design/2026-06-29-waybar-network-module-spec.org b/docs/design/2026-06-29-waybar-network-module-spec.org
new file mode 100644
index 0000000..3a1260c
--- /dev/null
+++ b/docs/design/2026-06-29-waybar-network-module-spec.org
@@ -0,0 +1,2094 @@
+#+TITLE: Waybar Network Module — Design Spec
+#+AUTHOR: Craig Jennings
+#+DATE: 2026-06-29
+
+* Status
+
+*Phases 1-3 SHIPPED* (2026-06-29 → 2026-06-30, dotfiles). The core module is live:
+the =net= engine (=status/probe/list/up/down/add/edit/remove/rescan/diagnose/repair/
+doctor/portal/speedtest=), the =waybar-net= indicator (split-cadence cache, redacted
+event log, display-only airplane absorption per decision 12), and the GTK4
+layer-shell panel (Connections / Diagnose / Repair / Speed test) with the settled bar
+clicks (left = panel, middle = =net portal=, right = =net-fix=; airplane on
+Super+Shift+A). 230+ net tests; full dotfiles suite green. Live-verified on velox.
+
+Built on top since the original spec:
+- *Captive-portal login engine* (2026-06-30, dotfiles =a7d7559=) — =net portal= now
+ runs a native =portal-login= repair tier (drop DoT → recover the portal URL from
+ the redirect → open a throwaway browser profile → auto-restore DoT once online),
+ replacing the old shell-out to =captive= for the force-portal flow. =net portal
+ --restore= is the manual fallback.
+- *Portal UX fixes from live testing* (2026-06-30, dotfiles =eef6b0b=) — removed a
+ polkit-gated =resolvectl flush-caches= that popped an auth dialog (the DoT-drop
+ restart already clears the cache); added an already-online short-circuit so a
+ forced run on a working connection opens nothing; suppressed Chrome's first-run
+ wizard; moved =net portal= off the terminal into the panel status line; hardened
+ the portal-URL extractor against Firefox's detection page.
+- *Panel auto-hide + Close button* (2026-06-30, dotfiles =450b7f0=) — the panel
+ closes on focus-out (popup behavior, suppressed while a child dialog holds focus)
+ and carries a Close button bottom-right.
+
+*V2 redesign in flight* (designed 2026-06-30, not yet built — see todo.org "Network
+panel redesign — no terminals, verify-everything, full failure coverage"). It
+reverses two earlier choices and widens coverage:
+- *No terminals anywhere.* =net-popup= is removed; every action and result renders
+ in the panel. This depends on a passwordless privileged path — a root-owned helper
+ plus a narrow NOPASSWD sudoers rule, archsetup-installed — because an in-panel
+ worker thread has no tty to prompt for a password. Reverses decision 11's
+ "privileged tiers run in a terminal".
+- *New navigation* — top tabs Connections | Diagnostics | Performance. Diagnostics
+ merges Diagnose + Repair (sub-row Diagnose | Get Me Online | Advanced; a shared
+ area below shows diagnose items and streams repair progress; Advanced reveals the
+ individual repair buttons, renamed with tooltips). Speed test lives under
+ Performance.
+- *Verify every action* (each mutating op confirms its effect before reporting
+ success) and *detect + respond to every failure mode* — the full ~44-mode catalog,
+ edge cases included, lives in the redesign task and supersedes the table below.
+
+Phase 4 (docs / rollout) and Phase 5 (VPN) remain. Review incorporated (Codex,
+2026-06-30): four review rounds + Craig's cj comments are all dispositioned
+([40/40], no open findings) — the fourth round reshaped the V2 panel UX (single nav
+target, saved-vs-available groups, join-from-row, the auth matrix, progressive
+loading, a findable diagnostics report, and the Waybar visual contract; see "V2 panel
+UX"). Phases 1-3's manual live checks are under todo.org "Manual testing and
+validation".
+
+* Goal
+
+One waybar network component that does the whole job: shows connection state
+(including the missing "associated but no internet / captive portal" state),
+manages connections from a dropdown (nmcli-backed; secrets stay in
+NetworkManager's own store, no separate credential file), and runs the network
+diagnostics and remediation off the same place
+(captive-portal detection + forcing, bounce/reset, gateway/DNS checks, speed
+test).
+
+It unifies three todo tasks that are really one feature:
+- =[#C]= "archsetup Waybar Wi-Fi module should show no-internet state" — the
+ indicator state plus the 2026-06-22 roam expansion (bounce, diagnostics, speed
+ test off the component).
+- =[#B]= "Network-manager dropdown, nmcli-backed" — the management dropdown. (The
+ todo task's original "GPG-stored secrets" framing is superseded: secrets stay in
+ NM's own store, decision 5.)
+- The network diagnostics already shipped in =captive= (the hotel/captive-portal
+ tool, formerly =login-page=) become this module's diagnostics engine rather
+ than a standalone CLI.
+
+* Scope
+
+** In
+- *Indicator* — wifi/ethernet icon + signal + SSID, plus an internet sub-state:
+ online / captive / no-internet / connecting / disconnected / airplane.
+- *Absorbs the airplane module* — the airplane state + toggle move into
+ =custom/net= (airplane is a network concern). Once this ships, the standalone
+ =custom/airplane= module, the =waybar-airplane= + =airplane-mode= scripts, their
+ =tests/=, and the css are deleted (listed under Files touched). The
+ desktop-settings panel (sibling =[#B]=) no longer needs an airplane row.
+- *Interface-correct* — targets the wifi (or chosen) device, not the
+ default-route interface, so an active USB tether or wired link can't mask
+ wifi state. (Same lesson =captive= fixed; the current =custom/netspeed= keys
+ off the default route and has the bug.)
+- *Connection management (panel)* — list saved connections most-recently-used
+ first, live signal for in-range wifi, click to switch; add / edit / remove for
+ open + WPA-PSK; activate any existing saved profile (including enterprise ones
+ NM already stores); ethernet↔wifi and wifi↔wifi switching even when a link
+ appears mid-session.
+- *Diagnostics (panel)* — read-only Diagnose (captive probe 204-vs-portal with
+ the extracted portal URL, gateway ping, DNS config) separated from mutating
+ Repair. Repair has tiers, lightest first: rfkill-unblock, per-connection reset
+ (fresh MAC), full-stack bounce (=nmcli networking off/on=, then restart
+ NetworkManager if that fails), and the temporary 1.1.1.1 override test. Each
+ Repair action confirms and verifies cleanup.
+- *Speed test (panel)* — down/up/ping with a progress indicator and last-result
+ shown, via the already-installed =speedtest-go --json=.
+- *Connection secrets* — none of our own. Settings and passwords live where NM
+ already keeps them: =/etc/NetworkManager/system-connections/*.nmconnection=
+ (root-only =0600=, the PSK/EAP secret stored inline). We read/write them through
+ nmcli, which handles the privilege. No separate file, no GPG, no gpg-agent — one
+ fewer dependency, and NM's store is already the secure-at-rest source of truth.
+- *Persistence* — connectivity probe result cached in the runtime dir so the
+ bar reads it cheaply between probes.
+- *Observability* — a redacted JSONL event log so a post-failure session can
+ diagnose without re-running destructive actions.
+
+** Out (v1, note for later)
+- No replacement of NetworkManager's connection engine. NM stays the thing that
+ connects; we drive it via nmcli.
+- No add/edit *form* for WPA-Enterprise / 802.1X in v1. The reason is effort vs
+ payoff: 802.1X has many interdependent fields (CA cert, client cert, identity,
+ anonymous identity, phase-2 auth) where a wrong entry silently fails auth, so a
+ trustworthy form is a lot of UI for connections Craig rarely adds (open +
+ WPA-PSK covers home, hotels, and phone hotspots). v1 still *activates* existing
+ saved enterprise profiles and points editing at =nmtui=/=nmcli=. Settled
+ (Craig, 2026-06-29): enterprise add/edit is vNext — 24 saved profiles on velox,
+ 0 enterprise, so the form would be unused UI; if one ever appears nmtui adds it
+ once and the module activates it thereafter.
+- No per-connection captive-portal *auto-login* in v1. (That would mean storing a
+ portal's login form answers — room number, surname, a checkbox — and replaying
+ them automatically when a known portal is detected, so the page never appears.
+ Out for v1 because every portal's form differs and it means storing per-venue
+ answers; v1 just opens the portal for you.)
+- No graphing/history of speed-test results beyond the last run.
+- No static-IP / proxy / metered / MAC-randomization editing in v1 (activate
+ existing, edit elsewhere).
+- No VPN / WireGuard management in v1, but it's a planned later phase (Phase 5),
+ not a permanent exclusion — it folds the existing archsetup wireguard tooling
+ into the same panel/CLI.
+- The desktop-settings dropdown (sibling =[#B]=) is a separate module, but it
+ shares the GTK4 layer-shell panel shell built here.
+
+* Architecture
+
+Three layers. Keep the bar cheap, the panel rich, the logic in one tested place.
+
+1. *Engine* — a =net= Python package (src-layout, unittest), exposing a CLI. Wraps
+ every nmcli op and owns the diagnostics. Emits JSON. This is the testable
+ core (fake =nmcli= / =curl= / =speedtest-go= on PATH, like the existing
+ =waybar-netspeed= and =waybar-sysmon= test harnesses). Precedent: pocketbook is
+ Python in the dotfiles repo; =wtimer= is Python for the same testability
+ reason.
+2. *Indicator* — a thin =waybar-net= script that calls =net status --json= and
+ renders icon + signal + state + tooltip. Replaces =custom/netspeed=
+ (throughput folds into the tooltip).
+3. *Panel* — a GTK4 + gtk4-layer-shell app (mirrors pocketbook's structure)
+ that imports the engine. Hosts connection management, diagnostics, and the
+ speed test.
+
+How the existing pieces map in:
+- =captive= (bash, shipped) — its cheap portal-detection logic is mirrored natively
+ in the engine for the fast status path so the bar never blocks on a subprocess,
+ and it still exposes a =--probe-json= mode the engine reuses. *As built (2026-06-30):
+ the force-portal flow is now native too* — =repair.py='s =portal-login= tier does
+ the DoT drop, portal-URL recovery, clean-browser launch, and auto-restore in
+ Python, so =net portal= no longer shells out to =captive= for it. =captive= stays a
+ usable standalone CLI.
+- =waybar-netspeed= (sh, shipped) — retired; its throughput sampling moves into
+ the engine's status output and renders in the indicator tooltip only.
+- =nmcli= — the connection backend for every op.
+
+Language note: the engine is Python; the indicator is a thin Python or sh
+wrapper over =net status --json=. The bar path must stay fast (see Performance
+budgets), so the indicator does no network I/O itself — it reads link state and
+the cached connectivity result.
+
+Privileged-path model (v2, planned): repairs that need root (rfkill unblock, nmcli
+modify/up, networking off/on, =systemctl restart NetworkManager/systemd-resolved=,
+resolvectl dns/revert, the DoT toggle) go through a single root-owned helper
+installed by archsetup, with a narrow NOPASSWD sudoers rule scoped to that helper
+only (never a blanket =mv=/=systemctl= rule). =repair.py= calls =sudo <helper>
+<verb>=. This is what lets every action run in-panel with no terminal: a GTK worker
+thread has no tty, so without a passwordless path it can't prompt. It also fixes a
+latent bug in the shipped portal flow — the detached DoT-restore watcher runs with
+no tty and silently fails to restore encrypted DNS when sudo creds aren't cached.
+
+* Repository + dependencies
+
+- *Code lives in the dotfiles repo* (=~/.dotfiles=), not archsetup. The =net=
+ package sits in-tree like pocketbook (src-layout, unittest, Makefile target);
+ =waybar-net= and the =net= CLI entry live in the hyprland tier
+ (=hyprland/.local/bin/=). Tests under =tests/net/= and =tests/waybar-net/=.
+ archsetup owns only the *dependency install*, not the code.
+- *archsetup installs the deps* in its Hyprland step: =gtk4-layer-shell=,
+ =python-gobject=, plus =nmcli=/=curl=/=resolvectl=/=rfkill= (already present via
+ NetworkManager/curl/systemd/util-linux). Speed test uses =speedtest-go= (AUR
+ =speedtest-go-bin=, already installed on velox); archsetup adds it to the AUR
+ list. librespeed-cli is the documented fallback if a self-hosted LibreSpeed
+ server is ever wanted. No =gpg= dependency (secrets live in NM's own store).
+- *Daily-drivers*: a stowed-script + AUR-dep feature, so ratio needs the same
+ =git pull= + stow + the archsetup-added deps. Note the manual dep step in the
+ rollout.
+
+** Makefile targets (console recovery is a first-class path)
+=net doctor= and the diagnostics are reachable from a bare TTY when waybar and
+the GUI are down — that's the case where you most need them. The dotfiles
+Makefile carries targets that wrap the =net= CLI so "get back online" is one make
+command from the console:
+- =make online= — =net doctor --fix= (diagnose, then apply the lightest repair:
+ rfkill-unblock → reset → bounce → open portal). The headline recovery target.
+- =make net-doctor= — =net doctor= (read-only diagnose + recommendation).
+- =make net-status= / =make net-diagnose= / =make net-portal= / =make net-reset=
+ / =make net-bounce= — the individual ops.
+- =make test= — already runs =tests/*=; the =net= package's unittest suites are
+ collected the same way.
+These intentionally need only nmcli/curl/rfkill (no GUI, no waybar, no Python
+GTK), so they work from a TTY on a broken graphical session.
+
+* Connectivity model — split cadence
+
+The indicator polls every ~2s, but a real internet/captive probe every 2s wastes
+battery and can re-trigger a captive portal. So split it:
+
+- *Fast path (every poll, cheap, no network)* — interface, type, SSID, signal,
+ IPv4 presence, throughput sample. From nmcli / sysfs only. No network I/O.
+- *Slow path (cached, TTL ~45s)* — the actual internet/captive probe (the 204
+ check + meta-refresh portal extraction). Result cached at
+ =$XDG_RUNTIME_DIR/waybar/net-connectivity.json= with a timestamp.
+
+The indicator reads the cache each poll. When the cache is older than the TTL,
+=net status= kicks =net probe= in the background (spawn + detach, never awaited)
+and renders the last cached sub-state meanwhile. A user-triggered
+diagnose/reconnect refreshes the cache immediately. This keeps the bar
+responsive and the portal un-poked.
+
+** Concurrency, atomicity, staleness
+- *Single-flight* — =net probe= takes a lock file at
+ =$XDG_RUNTIME_DIR/waybar/net-probe.lock= (flock, non-blocking). A second probe
+ while one runs is a no-op, so a flapping 2s poll can't pile up overlapping
+ probes.
+- *Atomic writes* — the cache is written to a temp file + =os.replace= (atomic
+ rename), so a reader never sees a half-written cache. Same pattern as =wtimer=.
+- *Max probe runtime* — the probe has a hard timeout (≤ 6s total: curl
+ =--max-time 5= + slack). On timeout it writes an =unknown= result, never hangs.
+- *Stale classes* the indicator distinguishes: fresh (< TTL), stale (TTL..3×TTL,
+ shown with a subdued/aging hint), expired (> 3×TTL → treat as unknown),
+ unknown (no cache / probe failed). The bar never shows a confident "online"
+ past the expired threshold.
+- *Invalidation* — the cache records the iface + SSID + active-connection UUID it
+ was taken under; a change in any of them invalidates it immediately (a
+ reconnect must not show the old network's verdict).
+- *Crash cleanup* — a stale lock older than the max runtime is ignored/reclaimed.
+
+* Performance budgets (hot path)
+
+The bar exec path (=waybar-net= → =net status=) must stay responsive:
+- *Budget*: =net status= returns in < 100ms typical, < 250ms worst case.
+- *No sleeping in the bar path.* Throughput is sampled from two reads of
+ =/sys/class/net/<iface>/statistics/{rx,tx}_bytes= across the *waybar poll
+ interval itself* (delta since the last cached sample + timestamp), not via an
+ in-process =sleep= like the old =waybar-netspeed=. The cache holds the prior
+ counters.
+- *Subprocess cap*: at most one =nmcli= invocation on the hot path (a single
+ =nmcli -t -f ...= multi-field query), plus sysfs reads. Never a per-field
+ nmcli call.
+- *Every subprocess has a timeout* (=nmcli --wait 2=, =subprocess timeout=). On
+ timeout or error the indicator emits a degraded JSON state (class
+ =net-degraded=, a neutral glyph) rather than blocking or crashing waybar.
+- *Benchmark test*: a fake slow =nmcli= asserts =net status= still returns within
+ budget by falling back to the degraded state.
+
+* Engine — =net= CLI surface
+
+All subcommands take =--json= where a machine reads them. Pure formatting/state
+functions under the CLI; IO (nmcli, curl, file) at the edges. Every subcommand
+exits non-zero with a JSON error envelope (see JSON schemas) on failure.
+
+- =net status [--json] [--iface IF]= — fast link state + cached connectivity
+ sub-state + throughput. The indicator's source. Never does network I/O.
+- =net probe [--iface IF]= — run the connectivity/captive probe now, update the
+ cache (single-flight, atomic), print online | captive (+ portal URL) |
+ no-internet | unknown. Mirrors =captive='s cheap detection natively.
+- =net list [--json]= — saved connections, MRU order, active flag, plus in-range
+ wifi with signal.
+- =net up <uuid>= / =net down [--iface IF]= — switch / disconnect. Operates on
+ UUID, not name (see nmcli contract).
+- =net add= / =net edit <uuid>= / =net remove <uuid>= — manage connections
+ (open + WPA-PSK) through nmcli; the secret lands in NM's own
+ =.nmconnection=. Enterprise profiles are activate-only.
+- =net rescan [--iface IF]= — wifi rescan.
+- =net diagnose [--json]= — read-only report: gateway ping, DNS config, captive
+ probe. The structured contract below. Doubles as the post-failure snapshot.
+- =net repair <action> [--json]= — mutating remediation, lightest first:
+ =rfkill= (unblock + radio on), =reset= (fresh MAC), =bounce= (full-stack:
+ =nmcli networking off/on=, escalating to =systemctl restart NetworkManager=),
+ =dns-test= (temporary 1.1.1.1 override, auto-reverted). Each confirms via the
+ caller and verifies cleanup.
+- =net doctor [--json] [--fix]= — one-shot "get me online" mode for the console:
+ runs the full diagnose, then applies the lightest repair that fits (unblock
+ rfkill, reset, bounce, open portal) — read-only without =--fix=, acting with
+ it. The TTY recovery path when waybar/the GUI is down (see the Makefile
+ targets).
+- =net portal [--restore]= — the native captive-login flow (=repair.py= =portal-login=
+ tier): short-circuits if already online, else drops DoT to plain DNS, recovers the
+ portal URL from the redirect, opens it in a throwaway browser profile, and spawns a
+ detached watcher that restores DoT once online. =--restore= forces the restore now.
+- =net speedtest [--json]= — =speedtest-go --json= run; down/up/ping.
+
+* nmcli contract
+
+The command wrapper is the reliability boundary; SSIDs and connection names
+contain spaces, colons, duplicates, hidden names, and non-ASCII. Rules:
+
+- *Terse, field-selected output*: =nmcli -t -f <fields> --escape yes ...= and
+ =nmcli -g <fields> ...= (get-values) for single-value reads. Parse with the
+ documented escaping (=\:= and =\\=); never naive =cut -d:=.
+- *UUID is the handle.* Every saved-profile op (=up=, =down=, =modify=, =delete=)
+ uses the connection UUID, never the display name — names duplicate and contain
+ separators. =net list= surfaces UUIDs; the panel maps row → UUID.
+- *Wait budgets*: activation/deactivation use =nmcli --wait <n>= with an explicit
+ budget (hot-path reads =--wait 2=; activation =--wait 30=). No unbounded waits.
+- *Connectivity*: NM's own =nmcli networking connectivity= can return
+ =none/portal/limited/full/unknown=. Use it as a *cheap hint* on the fast path
+ when present, but the authoritative captive verdict is still our own probe
+ (NM's portal detection is coarser and config-dependent).
+- *Parser tests* (fake nmcli fixtures): escaped colons and backslashes in SSIDs,
+ embedded newlines, duplicate connection names, hidden SSID (empty name),
+ non-ASCII SSID, the wired-appears-mid-session case, and the multi-active case
+ (wifi + tether both up).
+
+* JSON schemas
+
+Versioned (="v": 1=) envelopes so tests lock the contract. Sketches (fields
+nullable unless noted):
+
+- =status=: ={v, iface, type: wifi|ethernet|none, ssid, signal, ipv4,
+ gateway, throughput: {rx_bps, tx_bps}, connectivity: online|captive|no-internet|unknown,
+ connectivity_age_s, connectivity_class: fresh|stale|expired|unknown, state:
+ online|captive|no-internet|connecting|disconnected|airplane|wired|degraded}=.
+- =probe=: ={v, result: online|captive|no-internet|unknown, portal_url, http_code,
+ redirect_host, elapsed_ms, ts}=.
+- =list=: ={v, connections: [{uuid, name, type, active, last_used, signal,
+ in_range, security}]}=.
+- =diagnose=: ={v, steps: [<diagnostic step, see contract>], overall:
+ ok|warn|fail}=.
+- =speedtest=: ={v, down_mbps, up_mbps, ping_ms, server, elapsed_ms, ts}=.
+- error envelope (any command): ={v, error: {code, message, detail, partial:
+ bool}}= with a non-zero exit.
+
+* Diagnostics contract
+
+=net diagnose --json= returns an ordered list of steps. Each step is the unit the
+panel renders and the log records:
+
+- =id= — stable identifier (e.g. =link=, =dhcp=, =gateway-ping=, =dns-config=,
+ =dns-resolve=, =http-probe=, =portal=).
+- =status= — =pending | running | pass | warn | fail | skipped=.
+- =title= — short human label.
+- =evidence= — redacted detail (the value seen), per the redaction rules.
+- =elapsed_ms=.
+- =safety= — =read-only= or =mutating= (diagnose steps are all read-only).
+- =next_action= — what the user/agent should do on warn/fail (e.g. "open portal",
+ "reset connection", "switch network").
+
+Repair actions (=net repair=) carry the same shape but =safety: mutating=, plus a
+=cleanup_verified: bool= field (e.g. the DNS override was reverted) and a
+terminal =cleanup-unverified= status when revert can't be confirmed.
+
+** Diagnose vs Repair (read-only vs mutating)
+The panel separates them visually and behaviorally:
+- *Diagnose* — probe, gateway ping, DNS config read, captive check. No state
+ change, no sudo, runnable freely.
+- *Repair* — reset (fresh MAC, deletes+recreates the NM profile), DNS override
+ test (mutates resolver, auto-reverts), portal force. Each needs an explicit
+ confirm, shows that it's privacy/state-changing, and verifies cleanup. A
+ Repair whose cleanup can't be verified ends in a visible =cleanup-unverified=
+ state, never a silent success.
+
+* Failure states, messages, recovery
+
+Each row below gives the *exact, final* user-facing string (not a template) with
+=<placeholders>= for redacted evidence, plus the evidence field included and the
+next action. The string is canonical: every surface renders the same text, so
+there's one source of truth.
+
+Per-surface rendering of the canonical string:
+- *Indicator* — the matching glyph + CSS class; the string is the tooltip
+ (untruncated).
+- *Notification* (=notify=) — title = "Networking"; body = the failure label on
+ its own line, then the canonical string.
+- *CLI* — the string on stderr; =--json= puts it in =error.message= with the
+ evidence in =error.detail= and a stable =error.code=.
+- *Panel* — the string as the section banner, with the diagnostic step's evidence
+ shown beneath.
+Evidence is always redacted per the redaction rules (SSID/host shown; PSK/EAP/
+portal tokens never).
+
+- *associated, no DHCP* — "Connected to <SSID>, no IP (DHCP failed)" →
+ evidence: SSID, iface → reset / reconnect.
+- *no-internet* — "On <SSID>, no internet (gateway reachable, no route out)" →
+ diagnose / switch network.
+- *captive* — "Captive portal at <host> — login required" → Open portal.
+- *DNS hijack* — "DNS is being redirected (portal)" → Open portal.
+- *DNS broken* — "DNS not resolving (hotel DNS down); 1.1.1.1 works" → use
+ override / report.
+- *HTTP intercepted* — "Traffic is being intercepted before it leaves" → Open
+ portal.
+- *sudo declined* — "Reset needs admin; it was declined — nothing changed" →
+ retry with auth.
+- *command timed out* — "<op> timed out; the system was left unchanged" → retry.
+- *partial mutation* — "<op> partially applied: <what>; rolled back to <state>"
+ → review.
+- *missing speedtest-go* — "speedtest-go not installed" → install hint.
+- *no wifi hardware* (desktop) — wifi rows hidden; ethernet-only view.
+- *wifi rfkill-blocked* — "WiFi is blocked (rfkill)" → unblock. The indicator
+ detects a soft-blocked radio (=rfkill list= shows the radio off though hardware
+ is present) and shows this distinct from disconnected. =net repair rfkill= (and
+ =net doctor --fix= as its first step) runs =rfkill unblock wifi= + =nmcli radio
+ wifi on= and reconnects. This is the framework-laptop case: an out-of-power
+ shutdown sometimes leaves wifi soft-blocked at next boot, and yes — the module
+ recovers it (the rfkill state is the indicator; the rfkill repair / doctor is
+ the one-step fix). A *hard* block (physical switch) is reported as
+ not-recoverable-in-software with that message.
+- *wifi rfkill hard-blocked* — "WiFi is blocked by the hardware switch" →
+ evidence: rfkill hard state → flip the physical switch.
+- *wrong password / missing secret* — "Saved password for <SSID> was rejected" →
+ evidence: SSID, NM auth-failure reason → re-enter the password.
+- *enterprise auth/cert failure* — "Enterprise login failed for <SSID> (802.1X)"
+ → evidence: SSID, EAP failure reason → edit the profile in nmtui/nmcli.
+- *upstream / AP / provider* — "On <SSID>, link is fine but the network has no
+ uplink" → evidence: gateway reachable, no route out, not a portal → switch
+ network or contact the venue.
+- *VPN-routed* — "Connected; internet is routed through a VPN (<dev>)" →
+ evidence: default route on a tun/wg device or non-NM DNS owner → check the VPN,
+ not WiFi.
+- *HTTP interception, no parseable portal URL* — "A portal is intercepting
+ traffic but didn't give a login link" → evidence: HTTP code, redirect host →
+ opens neverssl + the gateway page to log in manually.
+- *DNS override cleanup unverified* — "Couldn't confirm DNS was restored after the
+ test" → evidence: iface, attempted revert → revert DNS manually
+ (=resolvectl revert <iface>=).
+- *VPN kill-switch blocking* — "A VPN kill-switch is blocking all traffic, and the
+ VPN itself is down" → evidence: a block artifact present with no tunnel up → bring
+ the VPN back, or clear the kill-switch (the exact root command surfaced, not
+ auto-run).
+
+*VPN kill-switch detection + correction.* A kill-switch blocks all non-VPN egress when
+the tunnel drops, so the link looks up (wifi, IP, gateway) but nothing reaches the
+internet. This extends the =deferred-vpn= branch: when a VPN is active and the probe
+fails, run a rootless cascade to tell a working tunnel from a kill-switch that's
+blocking because the tunnel is down —
+- =ip rule= for wg-quick's =not fwmark 0xca6c= + =suppress_prefixlength 0= (and the
+ PostUp =REJECT ! -o %i= rule that makes it leak-proof);
+- =wg show= for an up tunnel interface;
+- =nmcli connection show= for Proton's =pvpn-killswitch= / =pvpn-ipv6leak-protection=
+ (device =pvpnksintrf0=);
+- =nft list ruleset= / =iptables -S OUTPUT= for a drop/reject table (=killswitch=,
+ =protonvpn=, =oifname != "wg0" ... drop=);
+- =nmcli -f connection.zone= for a firewalld =drop= zone.
+Classify *kill-switch-blocking* only when a block artifact exists AND no tunnel
+interface is up — that's what distinguishes it from a healthy VPN. Correction is tiered
+by artifact and every option needs root, so surface the exact command rather than
+auto-running it: =wg-quick down <iface>=, =nmcli connection delete pvpn-killswitch
+pvpn-ipv6leak-protection=, =nft delete table inet killswitch=, or =nmcli connection
+modify <con> connection.zone ''=. (Sits alongside the Phase 5 VPN work; detection can
+land earlier since =deferred-vpn= already exists.)
+
+Each message names whether the system was left unchanged, partially changed (with
+what), or fully changed, so the user knows the residue.
+
+* Doctor: escalation, classification, terminal states
+
+=net doctor= diagnoses, classifies the failure, then (with =--fix=) applies the
+*lightest* repair that fits and re-checks — it never loops destructive repairs
+against a failure they can't fix. Each failure resolves to one of four outcomes,
+and the doctor stops at any terminal one:
+
+- =fixable= — a local repair should help. Escalate lightest-first: rfkill-unblock
+ → reset (fresh MAC) → bounce (full stack) → portal, re-probing after each, and
+ stop as soon as the probe returns online.
+- =needs-user-action= (terminal) — no reset/bounce will help; doctor stops and
+ names the exact next step. Covers: wrong WPA password / missing NM secret
+ (enter the password), locked keyring or polkit denial (retry with auth),
+ enterprise 802.1X cert/identity failure (edit the profile in =nmtui=/=nmcli=),
+ captive portal login-required (open the portal + accept terms). Doctor must not
+ delete/recreate the profile against these — that loses the saved password and
+ makes things worse.
+- =upstream-not-local= (terminal) — the local link is up but the problem is past
+ it: AP has no uplink, gateway down/dropping traffic, DHCP server broken, ISP
+ outage, portal backend failing. =diagnose= proves it (link up + IP + gateway
+ reachable, but no route out and not a captive redirect), and =doctor --fix=
+ stops after local repairs are exhausted with "local repairs tried; likely
+ upstream/AP/provider" + the evidence. Next action: switch networks or contact
+ the venue.
+- =deferred/vpn= (terminal for v1) — an active VPN / policy route / non-NM
+ resolver owns the default route or DNS, so "no internet" may be the VPN's fault,
+ not WiFi's. v1 *detects* this (default route on a =tun/wg= device, or DNS owned
+ by something other than the NM link) and classifies it separately — "link is
+ fine; internet is VPN-routed" — rather than misclassifying it as a WiFi failure.
+ v1 does not repair it (VPN management is Phase 5); it names the VPN as the likely
+ owner and stops.
+
+** DNS handling in doctor (explicit per class)
+- *Captive DNS hijack* — open the portal (the hijack clears on login). No DNS
+ mutation.
+- *Broken resolver, 1.1.1.1 works* — the shipped =dns-test= repair is *diagnostic*:
+ it sets 1.1.1.1, confirms the venue resolver is the culprit, then auto-reverts
+ (=cleanup_verified=). Because it reverts, =doctor --fix= does not currently leave
+ you online in this case — it falls through to =upstream-not-local=, which
+ misreports a locally-fixable problem. *V2 fix (planned):* on a dns-test *pass*
+ (public DNS works), set a PERSISTENT resolver override and verify online, with an
+ offered revert — and classify it as its own outcome rather than upstream.
+- *Port-53 / egress blocked* (even 1.1.1.1 fails) — terminal =upstream-not-local=;
+ doctor stops, since it's not locally fixable.
+
+* Failure-mode coverage
+
+*V2 note (2026-06-30):* the authoritative, exhaustive catalog (~44 modes across 10
+connectivity layers, edge cases included, each tagged fix-and-verify or report-text)
+now lives in the redesign task (todo.org "Network panel redesign"). The table below is
+the v1 baseline; two rows reflect intent the shipped code doesn't yet match, and the
+v2 catalog closes them: =gateway unreachable= claims a bounce that doctor never
+actually reaches (a no-route failure goes straight to =upstream-not-local=), and
+=broken DNS, 1.1.1.1 works= auto-reverts so the user is left offline and misreported
+as upstream (the v2 persistent-override fix closes this).
+
+For each common field failure: does =net diagnose= detect it, can =net doctor
+--fix= repair it, and what terminal user action remains when it can't. (The
+=needs-user-action= / =upstream-not-local= / =deferred/vpn= outcomes are defined
+above.)
+
+| Failure mode | diagnose detects | doctor --fix | terminal user action |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| rfkill soft block | yes | yes (unblock) | none |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| rfkill hard block | yes | no | flip the physical switch |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| no wifi hardware | yes | n/a | use ethernet |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| associated, no DHCP | yes | yes (reset/bounce) | none, else switch network |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| gateway unreachable | yes | yes (bounce) | switch network if it |
+| | | | persists |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| captive DNS hijack | yes | opens portal | log in at the portal |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| broken DNS, 1.1.1.1 works | yes | yes (temp override, | report the venue's DNS |
+| | | auto-reverted) | |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| HTTP captive portal | yes | opens portal | log in at the portal |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| HTTP interception, no | yes | opens neverssl + gateway | log in manually |
+| parseable URL | | | |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| upstream / AP outage | yes (link up, no route out) | no (stops after local) | switch network / contact |
+| | | | venue |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| wrong WPA password / | yes | no | enter the password |
+| missing secret | | | |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| enterprise auth / cert | yes | no | edit the profile in |
+| failure | | | nmtui/nmcli |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| duplicate SSID / | yes (UUID-keyed) | yes (activate by UUID) | none |
+| connection-name | | | |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| hidden SSID | yes | yes (connect by name) | enter SSID + password |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| multiple active links | yes | n/a | pick the interface |
+| (wifi+tether) | | | |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| wedged NetworkManager | yes | yes (bounce → restart NM) | none, else reboot |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| slow / hung command | yes (degraded) | retries within budget | retry |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| stale / corrupt cache | yes | self-heals (atomic + | none |
+| | | invalidation) | |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| DNS cleanup failure | yes | flags cleanup-unverified | revert DNS manually |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| missing speedtest backend | yes | n/a | install speedtest-go |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+| VPN / policy-routing | yes (route/DNS ownership) | no (deferred to Phase 5) | check the VPN |
+| interference | | | |
+|----------------------------+-----------------------------+-----------------------------+-----------------------------|
+
+* Observability — logging + redaction
+
+- *Event log*: JSONL at =$XDG_STATE_HOME/net/events.jsonl= (fallback
+ =~/.local/state/net/events.jsonl=), size-rotated (e.g. 1 MB × 3). Every
+ mutating op and probe appends an event: =ts, op, argv (redacted), exit_code,
+ stderr_tail, elapsed_ms, iface, nm_uuid, probe_url_class, http_code,
+ redirect_host, cache_event=.
+- *Redaction (always on)*: PSKs, EAP identities/passwords, NM secrets, and
+ portal query tokens are never logged. MAC addresses, full IPs, and SSID are
+ redacted when configured (=redact_mac=, =redact_ip=, =redact_ssid= in config).
+- *Post-failure diagnosis*: =net doctor --json= is the snapshot + recommendation
+ (diagnose plus the suggested repair), =net diagnose --json= the raw report, and
+ the event log the history. =net doctor= is the console-recoverable entry point
+ (reachable as =make online= / =make net-doctor=).
+- *Secret-leak tests*: assert no PSK/EAP/portal-token ever appears in any JSON
+ output, log line, or error message.
+
+** Automatic diagnostic verbose-capture (V2)
+
+A distinct layer from the event log above: that log records what =net= did;
+this captures what the *underlying stack* did at debug verbosity during a run, so a
+failed diagnosis leaves real ground-truth instead of relying on memory. Two triggers,
+one mechanism:
+
+- *Automatic — on a failing diagnose.* When =net diagnose= ends =overall: fail=, the
+ next escalation (or =Get Me Online=) runs inside a verbose-capture session.
+- *Manual — a debug on/off toggle in the panel's Advanced section.* "Debug on"
+ elevates and leaves it elevated (with a visible "debug capturing" indicator) so the
+ user can reproduce an intermittent problem over time; "Debug off" restores and
+ writes the bundle. Useful when the failure doesn't reproduce inside one diagnose.
+
+Mechanism (shared):
+1. *Snapshot* the current log levels (=nmcli general logging=, resolved's level,
+ wpa_supplicant's).
+2. *Elevate* the relevant components to debug at runtime, no restarts, scoped to the
+ domains that matter (NM: =WIFI,DHCP,DNS,CORE=; resolved; wpa_supplicant).
+3. *Run* the diagnostics / repair.
+4. *Capture the window*: =journalctl= for NetworkManager + systemd-resolved +
+ wpa_supplicant since the run started, a =dmesg= tail (driver / firmware / rfkill),
+ and any =curl -v= probe output.
+5. *Restore* every level to its snapshot.
+6. *Write a redacted support bundle* to =$XDG_STATE_HOME/net/bundles/<ts>/= and
+ surface it in the panel.
+
+Hard requirements:
+- *Restore is guaranteed and idempotent.* A =try/finally= restores even on error,
+ and a crash-recovery guard detects "a prior run left NM/resolved/wpa_supplicant
+ elevated" on the next run and puts it back — the same shape as the DoT-restore
+ watcher. A crash must never strand the stack at debug verbosity.
+- *Redaction before anything leaves.* Raw wpa_supplicant and NM debug logs carry the
+ PSK and EAP credentials in cleartext. The captured journal is scrubbed before the
+ bundle is written, shown, or shared; the secret-leak test asserts no passphrase or
+ EAP secret survives into a bundle.
+- *Privilege via the V2 sudo-helper.* The log-level toggles need root, so they become
+ verbs on the passwordless helper (decision 16) — no extra prompt.
+
+Bonus — this closes a real detection gap, not just observability: the spec notes live
+auth-failure detection is a v1 limit (it leans on a one-shot NM state-120 snapshot).
+wpa_supplicant at debug during the run is exactly how a wrong-password or EAP failure
+is caught properly, so the capture feeds back into classification.
+
+* Indicator (task #C — Phase 1, the fast win)
+
+** States (internet sub-state on top of link state)
+- online — associated and the probe returned 204. Normal icon.
+- captive — associated, probe hit a portal. Distinct glyph + warning CSS class;
+ tooltip names the portal host; left-click opens diagnostics with the portal
+ ready to open (Phase 2+; see interactions for the Phase-1 interim).
+- no-internet — associated, probe failed (no portal, no 204). Distinct glyph +
+ warning class.
+- degraded — =net status= couldn't read link state within budget (slow/failed
+ nmcli). Neutral glyph, =net-degraded= class. Never blocks the bar.
+- rfkill-blocked — the radio is soft-blocked (=rfkill=), distinct from
+ disconnected. Distinct glyph; the fix is =net repair rfkill= / =net doctor=.
+- connecting / disconnected / airplane / wired — as today, plus wired shown
+ correctly even when it appears after session start. (airplane is now this
+ module's state, absorbed from the retired airplane module.)
+
+** Glyphs
+Nerd-font codepoints, final values verified live before merge (same discipline
+as wtimer). Reuse the signal-strength ramp already in =waybar-netspeed=; add a
+captive / no-internet / degraded overlay glyph.
+
+** Tooltip
+SSID + signal + IPv4 + gateway + the throughput readout (absorbed from
+netspeed) + the last probe result and its age (stale/expired hinted).
+
+** Interactions (no keyboard-modifier clicks — waybar can't qualify clicks by
+modifier, so the rich actions live in the panel, not ctrl/super-click)
+Clicks never block the bar: each dispatches a detached background job, single-flight
+per action. *As built (settled live with Craig, 2026-06-29):*
+- *left* — =net-panel= toggle (pkill-or-launch the GTK panel).
+- *middle* — =net portal= (the captive-login flow).
+- *right* — =net-fix= (=net doctor= with =--notify=: reports the result when the
+ outcome is one-way, opens a terminal only when it's fixable; the v2 redesign moves
+ even that into the panel).
+- airplane toggle moved off the bar to Super+Shift+A.
+
+* Panel (tasks #B + #C diagnostics — Phases 2-3)
+
+GTK4 + gtk4-layer-shell, pocketbook scaffold (src-layout package, unittest,
+Makefile, gtk4-layer-shell anchored dropdown under the bar). One panel shell,
+reused by the future desktop-settings panel.
+
+Sections as built (Phases 1-3, a four-page stack switcher):
+1. *Connections* — list, MRU-first, active marked, live signal bars for in-range
+ wifi; row click switches; buttons for add / edit / remove; a rescan control.
+2. *Diagnose* (read-only) — Probe (204/captive, shows portal URL + Open), Gateway
+ ping, DNS config. Streaming step output (the diagnostics contract).
+3. *Repair* (mutating, confirmed) — tiered lightest-first: Unblock rfkill, Reset
+ (fresh MAC), Bounce (full stack), DNS override test, Force portal. A "Get me
+ online" button runs =net doctor --fix= (the auto-escalating sequence).
+4. *Speed test* — Run button, progress, down/up/ping result + last-run line.
+As built, the panel also auto-hides on focus-out (popup behavior, suppressed while a
+child dialog holds focus) and carries a Close button bottom-right (2026-06-30).
+
+*V2 nav (planned):* three top tabs — Connections | Diagnostics | Performance.
+Diagnostics merges the Diagnose and Repair pages into one: a sub-row
+=Diagnose= | =Get Me Online= | =Advanced= over a shared area that shows diagnose
+items and streams repair progress in-panel (no terminal). =Advanced= reveals the
+individual repair tiers (renamed, with tooltips) plus a *Debug capture on/off*
+toggle (the manual side of the verbose-capture feature; a failing diagnose triggers
+it automatically). Speed test moves under Performance.
+
+** V2 panel UX — the target design
+
+The shipped four-page stack (Connections / Diagnose / Repair / Speed test) is
+*history*, not active design. V2 is the sole current target: one panel opened from
+the bar, three top tabs — Connections | Diagnostics | Performance — and the page
+model below is the contract for what gets built and what gets deleted, not just for
+labels.
+
+*** Connections — saved vs available, join-from-row
+Three labelled groups, never one merged list:
+- *Saved* — saved NM profiles, MRU-first, rendered instantly without a scan.
+- *Available now* — scan-backed in-range SSIDs with signal + security; may carry a
+ loading/stale hint; unsaved networks appear here.
+- *Wired* — ethernet when a wired device is present.
+=net list= already yields this (=connections.py= lists saved MRU-first, merges live
+signal/security for in-range saved profiles, then appends unsaved in-range SSIDs with
+=uuid: nil=); the panel groups and labels it. *Rescan refreshes only the
+availability/signal layer* — it never gates or reloads the Saved list.
+
+*Progressive loading:* render the Saved group immediately on open, then overlay
+availability, signal, and the unsaved Available-now rows when the scan returns. Show a
+small scan-in-progress state (elapsed + last-scan age). A slow or bad radio scan must
+not make the whole panel feel stuck — this is the direct answer to "why does it take
+so long to see my connections?"
+
+*Join-from-row (no Add page):* selecting an unsaved Available-now row *is* the join
+flow — SSID and security come prefilled from the scan, never retyped. Open networks
+connect (confirm only if needed); WPA/WPA2/WPA3-Personal ask only for the password.
+The standalone Add button + modal are deleted for visible networks. A hidden/manual
+SSID join lives behind an Advanced "Join hidden network" affordance.
+
+*** Supported authentication classes (the join matrix)
+From the scanned NM =SECURITY= value, V2 handles:
+- *Inline-supported* — open, open-with-captive-portal, WPA/WPA2/WPA3-Personal
+ (PSK/SAE), and WPA2/WPA3 transition mode. The row shows the security label so the
+ user knows why a password is or isn't asked.
+- *Activate-only* — 802.1X / enterprise: connect if already saved, else "edit in
+ nmtui/nmcli" (no add form in v1/V2, per decision 9).
+- *Hidden / manual* — behind the Advanced "Join hidden network" affordance.
+- *Rare / unsupported* — WEP, OWE/enhanced-open, MAC-registration, voucher, or
+ proxy-required: a clear in-panel explanation ("not supported here yet") plus a
+ non-terminal next step, never a hand-off to a terminal tool.
+
+*** Diagnostics owns the diagnostic story
+Diagnostics holds the read-only checks, the repair stream, Get Me Online, debug
+capture (Advanced), and the doctor report. A *lightweight* latency/throughput probe
+runs inline as a Diagnose evidence row when internet is available (skipped offline, on
+a metered/hotspot warning, or with no backend), and its result is stored in the doctor
+report. The *full* speed test stays under Performance (decision 19) — which is also
+the home for future throughput history, so Performance earns its tab rather than being
+a lone button.
+
+*** Forget confirmation — future tense + verified
+The destructive copy is future tense and names the scope: "This will remove the saved
+NetworkManager profile and its stored password from this machine." After the op,
+verify the UUID is gone, refresh the Saved list, and report "Forgot <SSID>" or "Could
+not forget <SSID>; nothing changed / partial <evidence>" — the verify-every-action
+decision applied to a destructive op.
+
+*** Findable diagnostics report
+Every diagnose, repair, and speed/performance run ends with a "Copy report" / "Open
+report" action in Diagnostics. The report carries the step statuses + elapsed, the
+final classification, the last speed/latency result when available, scan age,
+route/interface owner, the redacted event-log tail, and the bundle path when verbose
+capture ran. It states explicitly whether any repair mutated state and whether
+cleanup/verification passed. "Logs exist somewhere" isn't enough when the network is
+already down — the report is the one artifact the user copies to hand over.
+
+*** Visual contract — a Waybar-attached popup
+The panel reads as part of the bar, not a separate app. Match the live Waybar theme:
+the dark rounded capsule (=border-radius: 1rem=), the golden border, compact monospace
+text, and the =custom/net= state colors. Avoid square corners next to rounded UI, keep
+cards out of cards, and use compact icon+label controls with tooltips for the advanced
+repairs. Reuse any existing archsetup-owned GTK/panel conventions. (Non-blocking for
+engine work; blocks final V2 UX acceptance.)
+
+** Panel state, cancellation, permissions
+State machines for: connection-list loading, rescan-in-progress,
+activation-in-progress, diagnose-running, repair-running, speedtest-running. Plus
+the real terminal states on this two-machine fleet: no-wifi-hardware (desktop →
+ethernet-only view) and missing speedtest-go. (No GPG-key state — there's no
+credential store; secrets live in NM.) ("No NetworkManager" is not a modeled
+state — NM is always present
+on these machines; if nmcli is somehow absent the panel shows a single hard-error
+and exits.) Long operations show elapsed time and are cancellable where the
+underlying op allows (rescan, speedtest, probe); clearly non-cancellable ones
+(an in-flight activation) show elapsed + a disabled control. Permission-denied
+(sudo/polkit declined) is a first-class outcome with the "nothing changed"
+message, never a silent failure.
+
+Interaction-pattern catalog (=~/code/rulesets/patterns/=) principles that apply:
+- transient-state-buttons — all the network levers in one place, reachable by
+ one chord (the bar click), state visible.
+- default-most-common-friction-proportional — connections MRU-ordered so the
+ common pick is first; destructive ops (remove) and privacy-changing ones
+ (reset, override) get a confirm, switching does not.
+- one-prompt-picker-typed-prefix — if the connection picker ever goes
+ keyboard-driven, kind (wifi/eth/saved/in-range) + name in one typed picker.
+
+** Panel UX flow (settle before Phase 2)
+The concrete interaction defaults, so the GTK build isn't inventing them:
+- *Default focus*: the Connections section, current connection's row selected. If
+ the indicator opened the panel because of a captive/no-internet state, focus
+ Diagnose instead with the relevant action highlighted.
+- *Row content*: glyph (signal bars / wired / active check) + name + a secondary
+ line (security type, "active"/last-used). The active row is visually pinned at
+ top of its group.
+- *Buttons*: one *primary* per section (Connections: Connect to the selected row;
+ Diagnose: Run diagnose; Repair: "Get me online"; Speed test: Run). Secondary
+ actions (add / edit / remove / rescan; individual repair tiers) are smaller and
+ grouped.
+- *Disabled rules*: Connect disabled on the already-active row; Repair tiers
+ disabled while one runs; Speed test disabled while running; add/edit disabled
+ for enterprise (with the "edit in nmtui/nmcli" hint).
+- *Confirmations* (exact wording): Reset → "Reset <SSID>? This drops the
+ connection and reconnects with a new MAC."; Bounce → "Restart networking? All
+ links drop briefly."; DNS override → "Temporarily set DNS to 1.1.1.1 for the
+ test? It reverts automatically."; Remove → "Forget <SSID>? The saved password is
+ deleted."
+- *"Get me online" reporting*: shows each escalation step live (Unblock rfkill →
+ Reset → Bounce → Portal) with per-step pass/fail and stops at the first that
+ restores internet or at a terminal state, naming the next action.
+- *After close*: the bar reflects the new state immediately (signal/refresh on
+ next poll); a running speedtest/diagnose keeps running and notifies on finish
+ (panel close doesn't cancel it).
+- *Keyboard*: Esc closes (wired); arrows move row focus and Enter activates a
+ row (GTK ListBox defaults — row-activate connects, never disconnects); Tab is
+ the plain GTK focus chain, widget by widget (inside a list it crawls row by
+ row — no section jumps); there is NO type-to-filter. Verified live via
+ targeted-key AT-SPI probe 2026-07-02; the earlier tab-between-sections and
+ type-to-filter claims were aspirational and are struck. If section-jump Tab
+ or filtering is ever wanted, it's a new task, not an existing behavior.
+
+* Connection management (nmcli)
+
+- Every op via nmcli per the nmcli contract above (terse, escaped, UUID-keyed,
+ bounded =--wait=).
+- MRU ordering from NM's =connection.timestamp= (last activated), descending.
+- Ethernet appears in the list whenever a wired device is present, selectable at
+ any time; switching just brings the chosen connection up.
+- *Mutation safety + rollback*: switching keeps the current connection up until
+ the new one activates successfully (=nmcli --wait 30=); on failure it does not
+ tear down the working link, surfaces the failure, and leaves the prior
+ connection active. =net down= notes that NM may auto-reactivate a profile and
+ reports the post-op active connection so the user isn't surprised. A switch that
+ needs a password it doesn't have prompts (or fails with "password required"),
+ never silently strands. The exact NM command sequence (preflight active-state
+ read → activate target → verify default route → on failure, confirm prior
+ still up) is pinned in the engine and tested against fake nmcli.
+- *Add/edit scope*: open + WPA-PSK only in v1. Existing saved profiles of any
+ type (including enterprise) can be *activated*; editing an enterprise profile
+ shows "edit via nmtui/nmcli" rather than a broken partial form.
+
+* Connection secrets (no separate store)
+
+Per Craig's call: don't build a parallel credential store. Settings and secrets
+live where NetworkManager already keeps them, so there's one source of truth and
+no extra dependency (no GPG, no gpg-agent, no =~/.config/net/connections=).
+
+- *Where secrets live*: =/etc/NetworkManager/system-connections/<name>.nmconnection=,
+ root-owned =0600=, with the PSK/EAP secret stored inline (the default
+ =secret-flags=0= "owned by NM"). That's already secure-at-rest (root-only) and
+ is what =nmcli= reads/writes.
+- *How we touch them*: every add/edit/remove goes through =nmcli= (=connection add
+ / modify / delete=), which writes the =.nmconnection= with the right ownership
+ and perms. We never read or write =system-connections= files directly (root) and
+ never copy a secret out of them.
+- *No export / import / sync* — there's nothing to sync. A new machine gets its
+ connections the way it always has (the user joins, or restores NM profiles),
+ not from a tool-specific vault.
+- *config file*: =~/.config/net/config= still exists, but only for non-secret
+ preferences (speedtest server, redaction flags, probe TTL). It holds no
+ credentials.
+- *No secret leakage*: PSK/EAP never appear in =net=' =--json= output, the event
+ log, or error text (tested) — even though NM is the store, our surfaces must not
+ echo a secret =nmcli= happens to return.
+
+* Speed test
+
+- Backend: *=speedtest-go=* (=--json=, =--server=, =--no-download/--no-upload=),
+ already installed on velox (AUR =speedtest-go-bin=). No new dependency for v1.
+ librespeed-cli is the documented fallback for a self-hosted LibreSpeed server.
+- =net speedtest --json= parses speedtest-go's JSON into the =speedtest= schema.
+- *Server policy*: auto-select nearest by default; allow a pinned server id in
+ =~/.config/net/config=.
+- *Timeout + cancellation*: a hard run timeout (e.g. 60s); the panel run is
+ cancellable (kills the child). Offline / rate-limited / no-server errors map to
+ the failure-message table.
+- *Tests*: fixture JSON (success) and fixture stderr (offline, no server,
+ malformed output) drive =net speedtest= parsing without touching the network.
+
+* Help + documentation
+
+In-app help has three layers, each reachable in the situation it's needed:
+
+- *CLI help (works from a dead-GUI TTY)*: =net --help= lists the subcommands in
+ one screen; =net <cmd> --help= documents each (flags, what it mutates, the
+ console-recovery targets). The Makefile targets are self-describing (=make help=
+ lists =online= / =net-doctor= / etc. with one-line descriptions). This is the
+ layer that matters most when you're at a console with no network.
+- *Panel help (in the GUI)*: a small =?= affordance in the panel header opens an
+ inline help pane — what each section does, which Repair actions mutate state,
+ what the indicator glyphs/colors mean. Per-control tooltips on the less-obvious
+ buttons (rfkill, bounce, DNS override). No external help browser.
+- *User guide (the durable doc)*: a README / docs page covering every command,
+ the indicator states + glyphs, the panel sections, the config file keys, the
+ recovery make targets, troubleshooting (the failure-message table), and
+ rollback. Written so a future session — or Craig six months out — can operate
+ and recover the module from the doc alone.
+
+The failure-message table above is the single source of truth for the
+troubleshooting text; the guide and the panel help both render from it rather
+than restating it.
+
+* Enhancement radar
+
+Low-cost adjacent affordances, each dispositioned so cheap wins aren't lost and
+the v1 panel stays focused. (Several are already in v1 by virtue of other
+sections; marked here so the consideration is visible.)
+
+| Enhancement | Disposition | Reason |
+|-------------------------------------+-------------+--------------------------------------------------------|
+| Open / copy portal URL | v1 | already in the captive flow; trivial Open + Copy |
+|-------------------------------------+-------------+--------------------------------------------------------|
+| Forget network | v1 | it's the remove op, already specced |
+|-------------------------------------+-------------+--------------------------------------------------------|
+| Rescan now | v1 | already a Connections control |
+|-------------------------------------+-------------+--------------------------------------------------------|
+| Retry with hardware MAC | v1 | captive already has --hardware-mac; expose in Repair |
+|-------------------------------------+-------------+--------------------------------------------------------|
+| Pin speedtest server | v1 | already a config key |
+|-------------------------------------+-------------+--------------------------------------------------------|
+| Copy redacted doctor report | v1 | cheap, serves the observability/support goal |
+|-------------------------------------+-------------+--------------------------------------------------------|
+| Show last good network / result | vNext | needs small history persistence |
+|-------------------------------------+-------------+--------------------------------------------------------|
+| Watch mode for net doctor | vNext | a --watch loop; handy at a TTY, not v1-critical |
+|-------------------------------------+-------------+--------------------------------------------------------|
+| Actionable desktop notifications | vNext | dunst supports actions; extra wiring |
+|-------------------------------------+-------------+--------------------------------------------------------|
+| Keyboard connection picker (fuzzel) | vNext | the typed-prefix pattern; panel covers v1 |
+|-------------------------------------+-------------+--------------------------------------------------------|
+| QR-code share / import WiFi | rejected | low value for a personal 2-machine setup; phones do QR |
+|-------------------------------------+-------------+--------------------------------------------------------|
+
+* Waybar wiring
+
+- Replace =custom/netspeed= with =custom/net= in the bar's module list (same
+ slot).
+- Module def: =exec: waybar-net=, =return-type: json=, =interval: 2=, a =signal=
+ for on-demand refresh (next free signal after wtimer's 14), =on-click=,
+ =on-click-right=, =on-click-middle= per the phase-aware interactions (each
+ dispatches a detached job, never blocks).
+- Remove the old =on-click: pypr toggle network= scratchpad only once the panel
+ replaces it (Phase 2); Phase 1 keeps it as the interim manager.
+
+* Testing plan (TDD)
+
+- *Engine (normal)* — fake =nmcli= + =curl= + =speedtest-go= on PATH; assert
+ command sequences and parsed/emitted JSON for status, list, up/down,
+ add/edit/remove, probe, diagnose, repair, speedtest. Pure state/format
+ functions tested directly. JSON schemas locked by example.
+- *Portal parser* — already covered in =tests/captive= (Normal/Boundary/Error +
+ the real SONIFI body). The engine's native probe reuses the same cases.
+- *nmcli parsing* — escaped colon/backslash/newline in SSID, duplicate names,
+ hidden SSID, non-ASCII, wired-mid-session, multi-active (wifi+tether).
+- *Failure + concurrency (the risky classes)* — slow/hung nmcli/curl/speedtest
+ (degraded state within budget), concurrent =net status= probe refresh
+ (single-flight), corrupt cache (recovered), stale cache after SSID change
+ (invalidated), permission denied / sudo declined, DNS-override cleanup failure
+ (=cleanup-unverified=), NM partial activation (rollback keeps prior link),
+ secret redaction, missing speedtest-go, no wifi hardware, rfkill soft/hard
+ block.
+- *Doctor classification* — fixture-driven =net doctor= over fake nmcli/curl
+ asserting the right terminal classification + that =--fix= stops before
+ destructive repairs: auth failures (=needs-user-action=), upstream/AP failure
+ (=upstream-not-local=), VPN-routed failure (=deferred/vpn=), and the DNS classes
+ (hijack → portal, broken-but-1.1.1.1-works → offered override, egress-blocked →
+ upstream). Assert the failure-mode coverage table's "detects / repairs / terminal
+ action" holds for each row.
+- *Indicator* — drive =net status --json= through =waybar-net=, assert the JSON
+ per state (online / captive / no-internet / degraded / wired / disconnected /
+ rfkill), iface override via env.
+- *Panel* — pocketbook-style: backing logic (list ordering, op dispatch,
+ state-machine transitions), not GTK widgets.
+- *NM secrets / no-leak* — add/edit writes the secret into NM via nmcli (asserted
+ against fake nmcli, never to a tool-owned file); assert no PSK/EAP appears in any
+ =--json=, log line, or error (there is no credential store to round-trip).
+- *Live checklist (gated out of the suite)* — a "Manual testing and validation"
+ task per phase for the real-network states (captive at a hotel, no-internet,
+ switch under load, reset, speedtest) that can't be faked.
+
+** Harness + coverage gate
+The concrete contract, matching the repo's existing convention (not pytest — the
+dotfiles suites are =unittest=, run by =make test= as =python3 -m unittest= over
+=tests/*/test_*.py=; 33 suites today):
+- *Framework*: =unittest=. Each suite is =tests/<name>/test_<name>.py=
+ (=tests/net/=, =tests/waybar-net/=), collected by the existing =make test= loop
+ — no new runner, no pytest dependency.
+- *Fakes on a temp PATH*: =fake-nmcli=, =fake-curl=, =fake-speedtest-go=,
+ =fake-rfkill=, =fake-resolvectl= live as executable stubs in =tests/<name>/=
+ (the =tests/layout-navigate/fake-hyprctl= pattern). A fixture file encodes the
+ command→canned-output map and the stub appends each invocation to a log the test
+ asserts against. Subprocess timeouts are simulated by a stub that sleeps past the
+ budget; =net status= must still return the degraded state.
+- *Waybar wrappers end-to-end*: =waybar-net= is run as a subprocess with the fake
+ PATH and the env overrides (iface, cache path), asserting the emitted JSON — same
+ as =tests/waybar-netspeed=.
+- *Coverage*: coverage.py is absent system-wide (and not importable), so coverage
+ runs in a throwaway venv (=python3 -m venv=, =pip install coverage=, =coverage
+ run -m unittest=, =coverage report=) — the method the wtimer suite used (95%).
+ Target: *branch* coverage over =net/= and the wrapper, ≥ 90% on the pure
+ classifier/parser modules.
+
+** Coverage as a gap-finder, not a number (per phase)
+Line coverage alone misses the branches that matter here, so each phase ends with
+a *coverage-gap pass*, not just a percentage:
+- After the first green run, read the branch report and map every uncovered branch
+ to either a new test or a consciously-excluded live-only behavior (with a comment
+ or a Manual-testing entry naming it).
+- *Branch coverage is required* for the pure logic: the doctor classifier (every
+ outcome — fixable / needs-user-action / upstream-not-local / deferred-vpn), the
+ cleanup-unverified path, the redaction paths, the degraded hot-path fallback, the
+ timeout branches, and the portal/nmcli parsers.
+- A phase isn't "done" until its coverage-gap pass is recorded — uncovered logic is
+ either tested or explicitly excused, never silently uncovered.
+
+* Files touched (planned, all in =~/.dotfiles=)
+
+- =net/= package (src-layout, like pocketbook) — engine + panel.
+- =hyprland/.local/bin/waybar-net= — the indicator (replaces =waybar-netspeed=).
+- =hyprland/.local/bin/net= — engine CLI entry (console-script shim).
+- =hyprland/.config/waybar/config= — swap =custom/netspeed= → =custom/net=;
+ remove =custom/airplane=.
+- =hyprland/.config/waybar/style.css= — captive / no-internet / degraded /
+ rfkill classes; remove airplane classes.
+- =tests/net/=, =tests/waybar-net/= — suites.
+- =captive= — refactor: extract probe + reset into functions callable
+ non-interactively (a =--json= probe mode) so the engine reuses them.
+- =~/.config/net/config= — seed config (probe TTL, speedtest server, redaction
+ flags). No secrets; not a credential store.
+- dotfiles =Makefile= — add the console-recovery targets (=online=, =net-doctor=,
+ =net-status=, =net-diagnose=, =net-portal=, =net-reset=, =net-bounce=).
+- *Deletions once net ships* (the airplane module is absorbed):
+ =hyprland/.local/bin/waybar-airplane=, =hyprland/.local/bin/airplane-mode=,
+ =tests/waybar-airplane/=, =tests/airplane-mode/=, and the =custom/airplane=
+ module + its css.
+- archsetup Hyprland step — add =gtk4-layer-shell=, =python-gobject=,
+ =speedtest-go-bin= to the install lists (the only archsetup change; no =gpg=
+ added, secrets stay in NM's store).
+
+* Resolved decisions (Craig's calls + this response)
+
+1. Panel UI tech → GTK4 + gtk4-layer-shell, shared pocketbook scaffold (one
+ panel shell, reused by the desktop-settings sibling).
+2. Engine language → Python =net= package; shells out to =captive= for the
+ portal-force flow, native cheap probe for the bar path.
+3. Connectivity probe → split cadence (fast link poll every 2s + slow cached
+ internet/captive probe, TTL ~45s) with single-flight + atomic cache.
+4. No keyboard-modifier clicks (waybar can't qualify them) — the panel hosts the
+ rich actions; bar clicks dispatch detached jobs (phase-aware).
+5. No separate credential store (Craig's call, cj). Secrets live in NM's own
+ =system-connections= (root =0600=, inline), touched via nmcli. No GPG, no
+ gpg-agent, no =~/.config/net/connections=. Supersedes the earlier GPG-store
+ design.
+6. =custom/netspeed= absorbed into =custom/net=; throughput moves to the tooltip.
+7. Speed-test backend → =speedtest-go= (already installed), not a new
+ librespeed-cli dependency; librespeed-cli is the self-hosted fallback.
+8. Code lives in the dotfiles repo; archsetup only installs deps.
+9. v1 add/edit scope = open + WPA-PSK; enterprise/802.1X is activate-only,
+ add/edit is vNext (settled by Craig 2026-06-29 — no enterprise networks in his
+ history, so the form would be unused UI).
+10. =net doctor= is in v1 (Craig's call, cj) — a one-shot diagnose+fix mode,
+ reachable from a TTY via =make online= / =make net-doctor=. (The earlier
+ "defer the doctor/bundle command" decision is reversed.)
+11. Diagnose (read-only) and Repair (mutating, confirmed) are separated in the
+ panel and the CLI; Repair is tiered lightest-first (rfkill → reset → bounce).
+12. =custom/net= absorbs the airplane module (Craig's call, cj). *As built
+ (2026-06-29, option 1): display-only.* net shows the airplane state (reads
+ the airplane-mode state file); the =airplane-mode= low-power toggle is kept
+ (radios + CPU + brightness + services is not a network concern) and moved to
+ =custom/net='s right-click + signal 15. Only the redundant display pieces —
+ =waybar-airplane=, =custom/airplane=, and the retired =waybar-netspeed= —
+ plus their tests/css were deleted. The earlier "delete airplane-mode" framing
+ is superseded.
+13. Repair includes a full-stack bounce and an rfkill-unblock (Craig's calls,
+ cj) — the latter recovers the framework-laptop post-power-loss soft-block.
+14. VPN / WireGuard is a planned Phase 5 (Craig's call, cj), not a permanent
+ exclusion.
+
+V2 redesign decisions (Craig, 2026-06-30):
+
+15. *No terminals anywhere in the module* — =net-popup= is removed; every action and
+ result renders in the panel. No terminal is ever used to report information to the
+ user or to collect input from them: every prompt, confirmation, repair stream, and
+ result lives in the panel UI (Craig, cj, 2026-06-30). Reverses the part of decision
+ 11 that ran privileged repairs in a terminal "so sudo/polkit can prompt". (Unrelated
+ to the doctor's "terminal states" — that word means a final outcome, not a tty. The
+ one open question is the dead-GUI console-recovery path; see the VERIFY in todo.org.)
+16. *Passwordless privileged path* — a root-owned helper + a narrow NOPASSWD sudoers
+ rule scoped to it, archsetup-installed, run as =sudo <helper> <verb>=. This gates
+ decision 15 (a worker thread can't prompt). Absorbs the earlier DoT-toggle
+ follow-up and fixes the detached-restore-watcher bug.
+17. *Verify every action* — each mutating op (repair, connect, forget, add, DNS
+ override) re-checks its effect and surfaces pass/fail in the panel.
+18. *Detect + respond to every failure mode, edges included* — the full ~44-mode
+ catalog (todo.org redesign task) is the contract; auto-fix where safe, else report
+ the exact in-panel text. Includes IPv6-only awareness and multi-homing, which need
+ diagnose to stop being IPv4-only and single-iface.
+19. *Navigation* — top tabs Connections | Diagnostics | Performance; Diagnostics
+ merges Diagnose + Repair (Diagnose | Get Me Online | Advanced over a shared
+ streaming area); Speed test under Performance.
+20. *Automatic diagnostic verbose-capture* (Craig, 2026-06-30) — on a failing
+ diagnose, elevate the underlying stack (NM / resolved / wpa_supplicant) to debug,
+ capture the journal + dmesg window, restore (guaranteed + crash-guarded), and
+ write a redacted bundle. Plus a manual Debug on/off toggle in Advanced. Restore
+ bulletproof, secrets scrubbed before the bundle, log-level toggles via the V2
+ helper. See Observability.
+
+* Implementation phases
+
+*Phases 1-3 are SHIPPED* (2026-06-29 → 2026-06-30, dotfiles); their acceptance
+criteria passed and the work is live on velox. Phase 4 (docs/rollout) and Phase 5
+(VPN) remain. The V2 redesign phases at the end are designed, not yet built.
+
+- *Phase 1 — Indicator + console recovery (task #C).* =net status= + =net probe=
+ (native cheap probe, reusing captive's logic) + the =captive= probe refactor +
+ =waybar-net= + the split-cadence cache (single-flight, atomic, stale classes) +
+ CSS states (incl. rfkill) + performance budget. Plus the CLI-only recovery path:
+ =net repair= tiers (rfkill / reset / bounce), =net doctor [--fix]=, and the
+ Makefile targets (=make online= etc.) — all testable without the GTK panel.
+ Absorbs the airplane state and removes the standalone airplane module. Interim
+ left-click keeps the existing scratchpad until the panel lands.
+ - *Acceptance*: fresh-login waybar smoke test shows correct state on
+ online/captive/no-internet/wired/rfkill; =net status= stays within budget
+ under a fake slow nmcli (degraded state); =net doctor --fix= recovers a
+ soft-blocked radio from a TTY; the live captive checklist passes at a real
+ portal; the airplane state works and the old airplane module is gone;
+ reverting = swap =custom/netspeed= + =custom/airplane= back.
+- *Phase 2 — Panel shell + connection management (task #B core).* GTK4
+ layer-shell scaffold + =net list/up/down/add/edit/remove/rescan= + MRU list +
+ mutation safety/rollback + panel state machines.
+ - *Acceptance*: switch wifi↔wifi and ethernet↔wifi without stranding; a failed
+ switch leaves the prior link up; add/edit open + WPA-PSK writes the secret to
+ NM; remove confirms; panel states render for loading/rescan/activation.
+- *Phase 3 — Diagnostics + speed test in the panel.* Wire =net diagnose= /
+ =net repair= / =net doctor= / =net portal= / =net speedtest= into the Diagnose
+ vs Repair sections; the "Get me online" button; portal Open button; speedtest
+ progress + cancel.
+ - *Acceptance*: diagnose runs read-only; each repair tier confirms + verifies
+ cleanup (DNS override reverts, shown); speedtest result parses from
+ speedtest-go and a fixture-driven failure shows the right message.
+- *Phase 4 — Docs + rollout.* In-app help (=net --help= / per-command help, the
+ panel help affordance), README/user-guide (commands, panel, config,
+ troubleshooting, the make targets, rollback), and the manual dep step on ratio.
+ - *Acceptance*: =net --help= and each subcommand's help are complete; the
+ user-guide covers every command + the recovery targets; ratio rollout
+ documented.
+- *Phase 5 — VPN / WireGuard (future).* Fold the existing archsetup wireguard
+ tooling into the same panel + CLI (=net vpn ...=). Out of the v1 milestone;
+ specced separately when picked up.
+
+V2 redesign phases (designed 2026-06-30, dependency order):
+- *V2.1 — Sudo helper + NOPASSWD sudoers (gates everything).* Root-owned helper
+ dispatching net's fixed privileged verbs, archsetup-installed, narrow sudoers.
+ Also fixes the detached DoT-restore-watcher bug.
+ - *Acceptance*: every repair runs passwordless in-panel on a non-NOPASSWD machine;
+ the sudoers rule is scoped to the helper only.
+- *V2.2 — Merged Diagnostics panel + nav restructure.* Connections | Diagnostics |
+ Performance; the Diagnostics sub-row + shared streaming area; Advanced reveal +
+ tooltips; delete =net-popup=.
+ - *Acceptance*: no terminal opens for any action; repair progress streams in the
+ panel; Speed test lives under Performance.
+- *V2.3 — IPv6-aware and multi-homing-aware diagnose.* Stop treating no-IPv4 as a
+ failure when online over IPv6; identify which interface owns the default route.
+- *V2.4 — Close every detect/correct gap in the catalog, with post-action
+ verification.* Work the redesign-task catalog mode by mode.
+- *V2.5 — Automatic diagnostic verbose-capture.* Snapshot/elevate/capture/restore
+ around a failing diagnose + the Advanced Debug on/off toggle; guaranteed +
+ crash-guarded restore; redacted support bundle; helper log-level verbs.
+ - *Acceptance*: a failing diagnose leaves a redacted bundle (NM/resolved/
+ wpa_supplicant journal + dmesg) and restores every log level; a crash mid-capture
+ is detected and restored on the next run; the secret-leak test finds no PSK/EAP in
+ a bundle; the toggle elevates and restores on demand.
+
+* Open items / risks
+
+- gtk4-layer-shell dropdown anchoring under a waybar module needs the same
+ positioning work pocketbook solved; reuse it. (Phase 2.)
+- The =captive= refactor must keep the standalone CLI behavior identical while
+ exposing a non-interactive =--json= probe; covered by the existing
+ =tests/captive= suite plus new probe-mode tests. (Phase 1.)
+- speedtest-go server selection variance (nearest-server flor) — pin a server in
+ config if results are noisy. (Phase 3.)
+- The background-probe kick from =net status= must be truly non-blocking (spawn +
+ detach); enforced by the single-flight lock and the performance benchmark test.
+
+* Rollback
+
+Each phase is independent. The indicator (Phase 1) is a drop-in replacement for
+=custom/netspeed= (and =custom/airplane=); reverting is swapping those modules
+back in the config and restoring their scripts. The panel is additive — not
+wiring its clicks leaves the bar working as before. No credential store to roll
+back (secrets stay in NM throughout).
+
+* Review findings [40/40]
+
+** DONE Define the structured diagnostics contract :blocking:
+The spec says the engine "emits JSON" and that diagnostics "reuse =captive=
+verbatim", but the current =~/.dotfiles/common/.local/bin/captive= flow is a
+human-readable bash script that mixes diagnostics, sudo prompts, DNS mutation,
+browser launch, and terminal prose. A GTK panel cannot reliably turn that into
+clear state, progress, cancellation, or useful error messages. Define the
+machine contract before implementation: every diagnostic step should have a
+stable id, status (=pending/running/pass/warn/fail/skipped=), redacted evidence,
+elapsed time, safety outcome, and next action. Keep =captive= as the interactive
+CLI, but either refactor reusable probe/reset functions behind =net diagnose
+--json= or make =captive= expose a non-interactive JSON mode. This blocks the
+panel and logging work because otherwise the implementer must invent the
+boundary.
+
+Disposition: accept — added the "Diagnostics contract" section (per-step id /
+status / evidence / elapsed / safety / next_action) and the =captive= =--json=
+probe-mode refactor under Architecture + Files touched.
+
+** DONE Specify user-facing failure messages and recovery actions :blocking:
+The spec names failure states like =no-internet=, =captive=, failed probe,
+failed reset, missing DNS, and missing speed-test backend, but it does not define
+the messages the user sees or what each message tells them to do next. For this
+feature, "error" is not enough: a user needs to know whether WiFi is associated,
+whether DHCP succeeded, whether DNS is hijacked/broken, whether HTTP is
+intercepted, whether sudo was declined, whether a command timed out, and whether
+the system was left unchanged or partially changed. Add a message table for the
+indicator, panel, and CLI with: failure class, visible text, evidence included,
+redaction rule, and next action. This is blocking because UX quality here is the
+product, not an implementation detail.
+
+Disposition: accept — added the "Failure states, messages, recovery" section
+covering each class, the visible message, the "what changed" residue note, and
+the next action across indicator/panel/CLI.
+
+** DONE Define the debug log and redacted support bundle :blocking:
+There is no observability section. When this fails in a hotel or cafe, an agent
+needs enough evidence to diagnose it without rerunning destructive actions. Add
+log location, rotation/retention, JSONL event schema, command argv logging,
+exit-code/stderr capture, elapsed time, selected iface, NM active connection
+UUID, probe URL class, HTTP code, redirect host, DNS servers, and cache
+read/write events. Also define a =net doctor --json= or =net debug-bundle=
+command that emits redacted status, recent log events, dependency versions, and
+a reproduction command. Redact SSID if configured, MAC addresses, portal query
+tokens, PSKs, EAP identities/passwords, IPs when requested, and all GPG/NM
+secrets. This blocks implementation readiness because post-failure diagnosis is
+currently left to ad hoc terminal spelunking.
+
+Disposition: modify — accepted the JSONL event log, the schema, and the redaction
+rules in full (new "Observability" section). Deferred the dedicated =net
+debug-bundle= / =net doctor= command to vNext: for a single-user tool =net
+diagnose --json= (the snapshot) plus the event log (the history) cover
+post-failure diagnosis; a bundle command is gold-plating for v1. Recorded under
+Out + Resolved decision 10.
+
+** DONE Pin the nmcli parsing and timeout contract :blocking:
+The spec lists nmcli operations but not the exact fields, output modes, escaping
+rules, ID semantics, or timeouts. This is risky because SSIDs and connection
+names can contain spaces, colons, duplicates, hidden names, and non-ASCII; the
+current =waybar-netspeed= already had an SSID parsing bug. The nmcli manual
+documents =--terse=, =--get-values=, =--escape=, =--wait=, ID/UUID/path
+selection, =passwd-file=, and built-in connectivity states
+(=none/portal/limited/full/unknown=) at
+https://man.archlinux.org/man/nmcli.1.en. The spec should require UUIDs for
+saved-profile operations, explicit =--wait= budgets, parser tests for escaped
+colons/backslashes/newlines/duplicate names/hidden SSIDs, and a decision on when
+to use or ignore =nmcli networking connectivity [check]=. This is blocking
+because the command wrapper is the core reliability boundary.
+
+Disposition: accept — added the "nmcli contract" section: terse + =--escape= +
+=--get-values=, UUID-keyed ops, explicit =--wait= budgets, NM connectivity as a
+cheap hint (our probe authoritative), and the parser test matrix.
+
+** DONE Define cache concurrency, atomicity, and stale-state behavior :blocking:
+=net status= may spawn =net probe= whenever the cache is stale, but the spec
+does not define locking, process coalescing, atomic writes, crash cleanup, or
+what happens when the probe hangs. With a 2s Waybar interval, a bad network could
+start overlapping probes, corrupt the runtime cache, or keep showing stale
+"online" while the link is gone. Add a single-flight lock under
+=$XDG_RUNTIME_DIR/waybar=, atomic write+rename for cache updates, max probe
+runtime, stale age classes (fresh/stale/expired/unknown), cache invalidation on
+iface/SSID/connection UUID change, and tests for concurrent =net status= calls.
+This blocks the fast-path design because it is the main performance and
+correctness risk.
+
+Disposition: accept — added "Concurrency, atomicity, staleness" under the
+Connectivity model: flock single-flight, temp+rename atomic write, ≤6s probe
+timeout, fresh/stale/expired/unknown classes, iface/SSID/UUID invalidation, stale
+lock reclaim, plus concurrency tests in the test plan.
+
+** DONE Bound hot-path performance with measured budgets :blocking:
+The spec says the cheap poll should be sub-100ms, but the proposed fast path
+still may call multiple =nmcli= commands every two seconds, read sysfs, parse
+throughput, and maybe spawn a background probe. The existing =waybar-netspeed=
+had a deliberate sleep for throughput sampling; replacing it must define how
+throughput is sampled without sleeping in the bar path. Add a per-command budget
+for =waybar-net= and =net status=, a maximum number of subprocesses on the hot
+path, a timeout for every subprocess, benchmark tests with fake slow =nmcli=,
+and a rule that the indicator emits a degraded JSON state rather than blocking.
+This is blocking because Waybar custom modules can visibly freeze or lag when
+their exec path stalls.
+
+Disposition: accept — added the "Performance budgets" section: <100ms typical /
+<250ms worst, throughput sampled across the poll interval (no in-process sleep),
+one nmcli call max on the hot path, timeouts on every subprocess, the degraded
+state, and a fake-slow-nmcli benchmark test.
+
+** DONE Make click actions non-blocking and visible :blocking:
+Waybar right-click runs =net reset= and middle-click runs =net portal= directly.
+Those operations can require sudo, open browsers, mutate DNS, delete/recreate NM
+profiles, or hang on network commands, but Waybar click handlers provide no
+panel, terminal, progress, or cancellation surface by default. Define whether
+right/middle click instead opens the panel focused on the action, dispatches a
+background job with notifications, or is removed from v1. If kept, specify
+single-flight behavior, how sudo/polkit prompts surface, how success/failure is
+reported, and how the user can inspect logs. This blocks UX readiness because
+the fastest remediation path is currently the easiest place to hide failure.
+
+Disposition: modify — accepted the concern; made the interactions phase-aware and
+non-blocking. Every click dispatches a detached, single-flight background job and
+reports via =notify=; sudo surfaces through polkit/the normal prompt; failures go
+to the notify + the event log. In Phase 1 (no panel) left-click runs probe +
+notify and keeps the scratchpad; from Phase 2 left-click opens the panel focused
+on the action. Recorded in the Indicator "Interactions" subsection.
+
+** DONE Specify connection mutation safety and rollback :blocking:
+The spec says row click switches connections and remove gets a confirm, but it
+does not define what happens when a switch partially succeeds, disconnects the
+current working link, needs a password, loses the default route, or triggers
+auto-activation. The nmcli manual warns that =connection down= does not prevent
+future auto-activation and may internally block a profile until user action.
+Define preflight, the exact NM command sequence, whether the old active
+connection is kept until the new one proves usable, when rollback is attempted,
+how long activation waits, and what the panel says when rollback fails. This is
+blocking because the module can strand the user offline.
+
+Disposition: accept — added "Mutation safety + rollback" under Connection
+management: keep the prior link up until the target activates (=--wait 30=), no
+teardown on failure, password-required surfaced not stranded, =net down= reports
+post-op active state + the auto-reactivation caveat, and the pinned NM command
+sequence is tested against fake nmcli.
+
+** DONE Define the credential-store security model :blocking:
+The GPG store is described as optional and default-unencrypted, but the spec does
+not define file modes, schema, secret-source rules, import/export prompts,
+recipient verification, stale secret handling, or what is logged. It also says
+NM remains source of truth while the user-owned store contains PSK/EAP secrets,
+which creates two truth sources for sensitive data. Add a precise schema,
+=0600= file creation with parent-dir permissions, encrypted-recipient checks,
+plaintext warning text, explicit opt-in flow, redaction requirements, behavior
+when NM has a secret not in the store, behavior when the store has a secret NM
+rejects, and tests for no secret leakage in JSON/logs/errors. This blocks Phase
+4 and the full spec because otherwise the implementer must make security
+decisions mid-code.
+
+Disposition: accept — rewrote "Credential storage" with the versioned schema,
+=0600= file / =0700= dir, recipient verification on opt-in, the plaintext
+warning, secret-source rule (entered/exported, never harvested from root store),
+the two-source reconciliation policy (NM wins live, store wins for what NM
+lacks, stale-secret flagging), and the no-leak tests.
+
+** DONE Define EAP, enterprise WiFi, and unsupported connection behavior :blocking:
+The store says "PSK/EAP" and connection management says add/edit, but there is
+no v1 contract for WPA-Enterprise fields, certificates, identity vs anonymous
+identity, hidden networks, static IP, proxy settings, metered flags, MAC
+randomization, or 802.1X prompt behavior. Either scope v1 to open/WPA-PSK plus
+existing saved-profile activation, or define the minimum EAP form and the
+unsupported-state messages. This blocks add/edit/import because enterprise WiFi
+is too sensitive to hand-wave.
+
+Disposition: modify (scope) — scoped v1 to open + WPA-PSK add/edit, with
+*activation* of any existing saved profile (including enterprise). Enterprise /
+802.1X add/edit, static-IP, proxy, metered, and MAC-randomization editing are
+vNext, shown as "edit via nmtui/nmcli". Recorded in Scope/Out, Connection
+management, and Resolved decision 9.
+
+** DONE Split read-only diagnostics from mutating remediation :blocking:
+The panel's diagnostics section includes probe, bounce/reset, gateway ping, and
+DNS override test in one area, while =captive= currently performs resets and
+temporary DNS changes as part of its flow. Users need to know which buttons are
+read-only and which mutate NM profiles, MAC mode, DNS, or browser state. Add
+separate "Diagnose" and "Repair" actions, confirmations for destructive or
+privacy-changing operations, explicit cleanup verification for DNS override, and
+a terminal state when cleanup is unverified. This blocks readiness because
+network repair must not surprise the user or leave hidden residue.
+
+Disposition: accept — split the panel into a read-only Diagnose section and a
+confirmed, mutating Repair section (and split the CLI into =net diagnose= vs =net
+repair=). Added =cleanup_verified= + a terminal =cleanup-unverified= state to the
+diagnostics contract.
+
+** DONE Define panel state, cancellation, and permissions UX :blocking:
+The panel sections list buttons and a streaming output area, but not loading
+states, disabled states, empty states, keyboard/focus behavior, cancellation, or
+permission-denied handling. Add panel state machines for connection list loading,
+rescan in progress, activation in progress, diagnostics running, speedtest
+running, and no NetworkManager/no WiFi/no permissions/no GPG key/no
+librespeed-cli. Each long operation should be cancellable where possible or
+clearly non-cancellable with an elapsed-time display. This blocks the GTK work
+because without it the implementer must invent the user flow.
+
+Disposition: modify — accepted the state-machine requirement (added "Panel state,
+cancellation, permissions"), but scoped the state set to what can actually occur
+on the two-machine fleet: dropped "no NetworkManager" as a modeled state (NM is
+always present; a missing nmcli is a single hard-error exit) and kept
+no-wifi-hardware, missing speedtest-go, no-GPG-key, plus the in-progress states
+with elapsed-time + cancellation where the op allows.
+
+** DONE Verify speed-test dependency, server choice, and failure contract :blocking:
+The spec chooses =librespeed-cli= and notes availability/default-server research
+as an open risk, but Phase 3 still depends on parsing its JSON and showing
+progress. I checked the upstream project page
+(https://github.com/librespeed/speedtest-cli) and the AUR URL named by search is
+not sufficient as a verified package/install contract in this spec. Add the
+exact package name/source to install, command version expected, JSON shape,
+server-selection policy, timeout, cancellation behavior, offline/rate-limited
+messages, and tests with fixture JSON and fixture stderr. This blocks Phase 3
+because speed-test failure modes are otherwise undefined.
+
+Disposition: modify — verified live and changed the backend: =speedtest-go= (AUR
+=speedtest-go-bin=, 1.x) is already installed on velox and supports =--json=,
+=--server=, =--no-download/--no-upload=, so v1 needs no new dependency.
+librespeed-cli (AUR =librespeed-cli= / =-bin=) is the documented self-hosted
+fallback. Added the "Speed test" section with server policy, timeout,
+cancellation, the failure-message mapping, and fixture-JSON/stderr tests.
+
+** DONE Define dependency installation and repo boundaries :blocking:
+The files touched section alternates between archsetup paths and the external
+dotfiles repo, while pocketbook has been folded into this repo and its previous
+archsetup provisioning was intentionally removed. The spec should state where
+the =net= package actually lives, which repository owns the scripts/tests,
+whether =gtk4-layer-shell=, =python-gobject=, =librespeed-cli=, =gpg=, =nmcli=,
+=curl=, and =resolvectl= are installed by archsetup or assumed present, and the
+Makefile targets for test/lint/install. This blocks implementation because the
+current path plan can produce code that is not installed on a fresh machine.
+
+Disposition: accept — added the "Repository + dependencies" section: all code in
+=~/.dotfiles= (=net/= package in-tree like pocketbook, scripts in the hyprland
+tier, tests under =tests/=), archsetup owns only the dep install
+(=gtk4-layer-shell=, =python-gobject=, =speedtest-go-bin=; nmcli/curl/resolvectl
+already present), Makefile =make test= collects the package suite, and a
+daily-drivers note for ratio. Rewrote Files touched to match.
+
+** DONE Expand the test plan for failure, concurrency, and live verification :blocking:
+The testing plan covers normal parsing and fake command sequences, but it misses
+the riskiest behaviors: slow/hung =nmcli=/=curl=/=librespeed=, concurrent
+=net status= cache refresh, corrupt cache, stale cache after SSID change,
+permission denied, sudo declined, DNS override cleanup failure, NM partial
+activation, duplicate connection names, secret redaction, missing optional
+dependencies, no WiFi hardware, wired+tether+WiFi ambiguity, portal redirect
+tokens, and Waybar click handlers. Add unit/fixture tests for each class plus a
+manual/live checklist gated out of the normal suite. This is blocking because
+the current plan would leave the exact "things that can go wrong here" mostly
+untested.
+
+Disposition: accept — rewrote the Testing plan with the "Failure + concurrency"
+class (slow/hung commands, single-flight, corrupt/stale cache, perm-denied,
+cleanup-failure, partial activation, redaction, missing deps, no-wifi,
+multi-active) and a per-phase live checklist gated out of the suite.
+
+** DONE Define status JSON schemas and compatibility rules
+The spec says all subcommands take =--json= but does not define schemas. Add
+versioned JSON examples for =status=, =probe=, =list=, =diagnose=, =speedtest=,
+and error envelopes, including nullable fields and unknown/degraded states. This
+is non-blocking for product direction but should be fixed before code so tests
+can lock the CLI contract.
+
+Disposition: accept — added the "JSON schemas" section with versioned (=v:1=)
+envelopes for status / probe / list / diagnose / speedtest and a shared error
+envelope, including the degraded/unknown states.
+
+** DONE Rename or alias the phasing section for workflow compatibility
+The spec has a usable =Phasing= section, but the spec-review workflow expects an
+=Implementation phases= section that can be lifted into =todo.org=. Rename it or
+add an alias heading during response. This is non-blocking because the existing
+phase decomposition is understandable, but aligning the heading prevents future
+workflow friction.
+
+Disposition: accept — renamed =Phasing= → =Implementation phases= and added
+per-phase acceptance criteria.
+
+** DONE Add documentation and rollout acceptance checks
+Rollback is described, but docs and rollout are thin. Add README/user-guide
+updates for commands, panel behavior, config file, GPG opt-in, troubleshooting,
+and rollback; add acceptance checks for each phase, including a fresh-login
+Waybar smoke test and restoring =custom/netspeed=. This is non-blocking but
+important for handing the feature to a future session without re-discovery.
+
+Disposition: accept — added per-phase acceptance criteria under Implementation
+phases (incl. the fresh-login waybar smoke test and the =custom/netspeed=
+restore), a Phase 4 "Docs + rollout", and (answering Craig's cj follow-up) a
+dedicated "Help + documentation" section with the three help layers (CLI help,
+panel help affordance, user guide).
+
+** DONE Add a failure-mode coverage table :blocking:
+The spec now names many individual network failures, but it still does not carry
+one compact coverage matrix that says, for each common failure mode, whether
+=net diagnose= detects it, whether =net doctor --fix= can repair it, and what
+terminal user action remains when it cannot. Add a table covering at least:
+rfkill soft block, rfkill hard block, no WiFi hardware, associated/no DHCP,
+gateway unreachable, captive DNS hijack, broken DNS where 1.1.1.1 works, HTTP
+portal, HTTP interception without a parseable portal URL, upstream/AP outage,
+wrong WPA password or missing secret, enterprise auth/cert failure, duplicate
+SSID/connection-name ambiguity, hidden SSID, multiple active links, wedged
+NetworkManager, slow/hung command, stale/corrupt cache, DNS cleanup failure,
+missing speedtest backend, and VPN/routing interference. This blocks because
+Craig asked for confidence that the diagnostics and doctor cover the real field
+failures, and prose scattered across sections is too easy to misread.
+
+Disposition: accept — added the "Failure-mode coverage" section: a 22-row table
+(every mode the finding named) with detect / doctor-fix / terminal-action
+columns, conformed to the org-table standard (rules under every row, ≤120).
+
+** DONE Pin DNS repair semantics in doctor :blocking:
+The spec diagnoses DNS hijack, broken hotel DNS, and the temporary 1.1.1.1
+override test, but =net doctor --fix= does not say whether it merely recommends
+the override, applies a temporary override during recovery, or leaves DNS alone
+after diagnosis. Define the exact behavior for each DNS class: captive hijack
+should open the portal, broken DNS where 1.1.1.1 works should either offer an
+explicit temporary repair with cleanup verification or recommend the command,
+and port-53/egress blocking should stop as upstream/not locally fixable. This is
+blocking because DNS is one of the most common "connected but unusable" failures
+and the current doctor contract is ambiguous.
+
+Disposition: accept — added "DNS handling in doctor (explicit per class)" under
+the new Doctor section: hijack → open portal (no DNS mutation); broken-but-1.1.1.1
+→ explicit temporary override with cleanup verification under =--fix=, recommend
+otherwise; egress-blocked → terminal =upstream-not-local=.
+
+** DONE Make auth failures terminal user-action states :blocking:
+Wrong WPA password, missing NM secret, locked keyring/polkit denial, enterprise
+802.1X certificate/identity failure, and portal login-required are not fixed by
+resetting or bouncing NetworkManager. The doctor sequence should classify these
+as =needs-user-action= terminal states, stop before looping through destructive
+repairs, and tell the user the exact next action (enter password, edit profile in
+=nmtui=/=nmcli=, accept portal terms, provide cert/identity, or retry with
+admin auth). This blocks because repeated reset/bounce against auth failures is
+slow, noisy, and can make the network state worse without helping.
+
+Disposition: accept — added the =needs-user-action= terminal outcome to the
+Doctor section: wrong password / missing secret / keyring-or-polkit denial /
+802.1X cert-or-identity failure / portal-login-required all stop the doctor before
+any destructive repair and name the exact next step.
+
+** DONE Define upstream/AP/provider failure terminal states :blocking:
+Some failures are not client-repairable: AP has no uplink, hotel gateway is
+down, DHCP server is broken, gateway drops traffic, ISP outage, or captive
+portal backend is failing. The spec should define how =diagnose= proves "local
+link is up but upstream is broken" and how =doctor --fix= stops after local
+repairs are exhausted with a clear message like "local repairs tried; likely
+upstream/AP/provider" plus the evidence. This blocks because users need to know
+when to stop poking the laptop and switch networks or contact the venue.
+
+Disposition: accept — added the =upstream-not-local= terminal outcome: diagnose
+proves link-up + IP + gateway-reachable but no route out and no captive redirect;
+=doctor --fix= stops after local repairs with "local repairs tried; likely
+upstream/AP/provider" + evidence → switch network / contact venue.
+
+** DONE Decide how VPN and policy routing affect v1 diagnosis
+VPN/WireGuard management is Phase 5, but active VPNs, policy routes, DNS
+overrides, and firewall killswitches can break apparent internet access in v1.
+The current spec does not say whether v1 detects active VPN/policy routing and
+classifies "network is fine, VPN route/DNS is broken" separately from WiFi
+failure. Add either a v1 diagnostic check for active VPN/default-route/DNS
+ownership with a "deferred repair" outcome, or explicitly state that VPN-routed
+failures are out of scope and may be misclassified. This is blocking if Craig
+expects the module to diagnose normal daily-driver network failures while VPN
+tooling remains separate.
+
+Disposition: accept (chose the detect-and-classify option) — v1 detects an active
+VPN / non-NM default route / non-NM DNS owner and classifies =deferred/vpn= ("link
+is fine; internet is VPN-routed"), distinct from a WiFi failure. v1 does not
+repair it (VPN management is Phase 5); it names the VPN as the likely owner and
+stops. Added to the Doctor section + the coverage table + a doctor-classification
+test.
+
+** DONE Remove stale GPG-store references from the resolved spec
+The spec now decides "no separate credential store; secrets live in
+NetworkManager", but the Testing plan still mentions =gpg round-trip= and =GPG
+store= tests, and the panel-state list still mentions a no-GPG-key state. Remove
+those stale references and replace them with NM-secret/no-secret-leak tests.
+This is non-blocking for product behavior but blocking for implementation
+clarity: otherwise tests will be written for a credential store that no longer
+exists.
+
+Disposition: accept — replaced the Testing-plan =gpg round-trip= / =GPG store=
+bullets with an "NM secrets / no-leak" test (add/edit writes the secret via nmcli;
+assert no PSK/EAP in any JSON/log/error; no store to round-trip) and dropped the
+=no-GPG-key= panel state. Residue from the cj-comment pass that dropped the store.
+
+** DONE Reconcile status, goal, and task text before implementation :blocking:
+The spec status says "Implementation-ready with caveats" and "Phase 1 ready to
+build", but the body still has an unresolved enterprise add/edit VERIFY, the
+Goal still says "optional GPG-encrypted secret store", and the unified task title
+still names "GPG-stored secrets" even though the accepted design removed the
+store. Before implementation, make the top-level status, goal, scope, task
+mapping, and resolved decisions agree with the current design. This blocks
+readiness because a developer starting from the top of the file would still build
+or plan around abandoned GPG-store behavior.
+
+Disposition: accept — fixed the Goal ("secrets stay in NM's own store"), the
+=[#B]= task-mapping line (notes the "GPG-stored secrets" framing is superseded by
+decision 5), the enterprise VERIFY (now resolved → Status updated), and corrected
+the stale =pytest= mentions to =unittest= (the repo's actual harness). Top-of-file
+status/goal/scope/decisions now agree with the design.
+
+** DONE Resolve enterprise add/edit scope or make the caveat explicit :blocking:
+The spec still says "One open question for Craig: pull enterprise add/edit into
+v1?" and points to a VERIFY in =todo.org=. That is a real product-scope decision:
+if enterprise add/edit is in v1, panel forms, nmcli command sequences, tests,
+error messages, and docs change materially; if it is out, the UI must consistently
+show activate-only with "edit in nmtui/nmcli". Decide it in the spec before
+implementation, or downgrade the status to =Ready with caveats= with this exact
+accepted caveat. As written, the spec cannot be plain =Ready=.
+
+Disposition: accept — Craig decided (2026-06-29): enterprise add/edit is vNext,
+activate-only in v1. Settled in the Status line, the Scope/Out bullet, decision 9,
+and the VERIFY (now DONE in todo.org). The UI shows activate-only with "edit in
+nmtui/nmcli" consistently. Evidence: 24 saved profiles, 0 enterprise.
+
+** DONE Define the concrete test harness and coverage gate :blocking:
+The spec says TDD, fake binaries on PATH, and benchmark tests, but it does not
+define the actual harness contract: pytest vs unittest for the =net= package,
+where fake =nmcli=/=curl=/=speedtest-go=/=rfkill=/=resolvectl= live, how test
+fixtures encode command histories, how subprocess timeouts are simulated, how
+Waybar scripts are executed end-to-end, and how coverage is run. Add the exact
+Makefile targets (=test=, =test-unit= or package-local =pytest=), pytest config,
+coverage command (e.g. branch coverage over =net/= and =waybar-net= wrappers),
+minimum threshold, and the rule for reading the coverage report to add missing
+tests before declaring a phase done. This blocks readiness because "what is the
+test harness?" is still answerable only by analogy to older suites.
+
+Disposition: accept — added the "Harness + coverage gate" section. Corrected the
+premise: the repo is =unittest= (=make test= → =python3 -m unittest=, 33 suites),
+not pytest. Pinned the fake-binary stub convention (=tests/<name>/fake-*= on a
+temp PATH), the fixture command→output map, timeout simulation, the end-to-end
+=waybar-net= subprocess run, and coverage via a throwaway venv (coverage.py is
+absent system-wide) with a ≥90% branch target on the pure modules.
+
+** DONE Use coverage to find missing behavior, not just report a percentage :blocking:
+The spec does not say how coverage findings affect implementation. For this
+feature, line coverage alone can miss the important holes: doctor classification
+branches, cleanup-unverified paths, redaction paths, degraded hot-path fallbacks,
+timeout branches, and auth/upstream/VPN terminal states. Define coverage review
+criteria per phase: branch coverage for pure classifiers and parsers, named
+untested branches allowed only with comments or manual-check entries, and a
+required "coverage gap pass" after the first green test run that maps uncovered
+logic back to tests or consciously excluded live-only behavior. This blocks
+readiness because the current test plan is broad but does not force the suite to
+expose missing edge tests.
+
+Disposition: accept — added the "Coverage as a gap-finder, not a number (per
+phase)" subsection: branch coverage required for the doctor classifier (every
+outcome), cleanup-unverified, redaction, degraded-fallback, timeout, and the
+parsers; a mandatory coverage-gap pass after the first green run mapping each
+uncovered branch to a test or a named live-only exclusion; a phase isn't done
+until that pass is recorded.
+
+** DONE Convert error classes into exact user-facing strings and evidence fields :blocking:
+The failure table and doctor outcomes classify errors well, but many messages
+are still templates or descriptions rather than final text. Add exact strings
+for indicator tooltip, notification, CLI stderr, JSON =error.message=, and panel
+banner/step text for every failure-mode row, including cases doctor cannot fix:
+wrong password, missing secret, enterprise cert failure, upstream/AP/provider
+failure, VPN-routed failure, hard rfkill block, DNS cleanup failure, speedtest
+missing, and HTTP interception without parseable URL. For each string, specify
+the redacted evidence included and the next action. This blocks UX readiness
+because "useful error" is only testable once the actual text and evidence are
+defined.
+
+Disposition: accept — rewrote the Failure states section: each row now carries the
+exact final string (with =<placeholder>= evidence), the evidence field, and the
+next action, plus a per-surface rendering rule (indicator tooltip / notify /
+CLI+JSON error.message+detail+code / panel banner all render the one canonical
+string). Added the missing doctor-unfixable rows: hard rfkill, wrong password /
+missing secret, enterprise cert failure, upstream/AP/provider, VPN-routed, HTTP
+interception without a parseable URL, and DNS cleanup-unverified.
+
+** DONE Add an enhancement disposition table
+The spec captures several good enhancements (doctor, Makefile recovery, rfkill,
+airplane absorption, VPN phase), but it does not show that low-cost adjacent
+enhancements were considered and accepted/deferred/rejected. Add a small radar
+table for likely affordances: copy redacted doctor report, open/copy portal URL,
+retry with hardware MAC, forget network, rescan now, pin speedtest server, show
+last good network/result, watch mode for =net doctor=, desktop notification
+actions, QR-code/share WiFi import/export, and keyboard picker. Mark each
+=v1=, =vNext=, or =rejected= with a one-line reason. This is non-blocking, but it
+prevents accidental loss of cheap UX wins and keeps the v1 panel focused.
+
+Disposition: accept — added the "Enhancement radar" table dispositioning all the
+named affordances: open/copy portal URL, forget network, rescan, hardware-MAC
+retry, pin speedtest server, copy redacted doctor report = v1; last-good
+network/result, doctor watch mode, actionable notifications, keyboard picker =
+vNext; QR-share = rejected (low value for a 2-machine personal setup).
+
+** DONE Tighten the panel UX flow before Phase 2
+The panel has sections and state machines, but not a concrete interaction flow:
+default focused section, row content, primary/secondary buttons, disabled-state
+rules, confirmation wording for reset/bounce/DNS override, how "Get me online"
+reports each escalation, what stays visible after the panel closes, and keyboard
+navigation. Add a short UX flow spec or wire-level outline before Phase 2. This
+is non-blocking for Phase 1, but it blocks Phase 2 implementation because a GTK
+panel can easily become noisy or surprising if these defaults are invented while
+coding.
+
+Disposition: accept — added the "Panel UX flow (settle before Phase 2)"
+subsection: default focus (Connections, or Diagnose when opened from a captive
+state), row content, one primary button per section, disabled-state rules, exact
+confirmation wording for reset/bounce/DNS-override/remove, the live "Get me
+online" escalation reporting, what survives panel close, and keyboard nav.
+
+** DONE Reconcile the panel navigation source of truth :blocking:
+Disposition: accept — folded into "V2 panel UX". V2 (Connections | Diagnostics |
+Performance) is the sole current target; the shipped four-page stack is marked history,
+not active design.
+The spec now names at least three navigation shapes: the shipped four-page stack
+(Connections / Diagnose / Repair / Speed test), the V2 three-tab plan
+(Connections / Diagnostics / Performance), and the redesign task's Diagnostics
+sub-row (Diagnose / Get Me Online / Advanced). That leaves an implementer free
+to keep extra pages and buttons even though Craig is explicitly asking for the
+opposite. Make V2 the sole current target: one panel opened from the bar, top
+tabs =Connections | Diagnostics | Performance=, with Diagnostics owning the
+read-only checks, repair stream, debug capture, doctor report, and related
+diagnostic evidence. Mark the old four-page stack as shipped history only, not
+active design. This blocks the redesign because the page model determines what
+code is deleted, not just labels.
+
+** DONE Fold speed tests into the diagnostic story :blocking:
+Disposition: modify — Craig pre-decided Speed test lives under Performance (decision
+19), and Performance carries future throughput history, which meets this finding's own
+"keep the tab only if it carries ongoing throughput" condition. Accepted the rest:
+Diagnostics runs a lightweight inline latency/throughput probe as a Diagnose evidence
+row (with skip conditions for offline / metered / no-backend), and the full speed
+result is stored in the doctor report. Folded into "V2 panel UX → Diagnostics owns the
+diagnostic story".
+Speed test is currently isolated under =Performance=, while the Goal and user
+mental model treat speed, latency, and packet loss as part of "diagnostics."
+That split risks another top-level button/page whose only job is a diagnostic
+measurement. Keep the top-level =Performance= tab only if it carries ongoing
+throughput/history later; for V2, specify that Diagnostics can run a lightweight
+performance check from the same Diagnose/Get Me Online flow when internet is
+available, and that the full speed test is presented as a diagnostic evidence
+row or secondary action rather than a separate repair-adjacent workflow. Define
+when it is skipped (offline, metered/hotspot warning, missing backend) and how
+the result is stored in the doctor report. This is blocking because otherwise
+the implementation preserves avoidable navigation and misses a useful failure
+signal.
+
+** DONE Define saved-list vs available-scan semantics :blocking:
+Disposition: accept — folded into "V2 panel UX → Connections". Saved / Available now /
+Wired groups; Rescan refreshes only the availability/signal layer, never the Saved
+list.
+=net list= merges saved profiles with in-range scanned networks, while the panel
+copy calls the page "Connections" and the control "Rescan." It is not clear to a
+user whether they are looking at saved connections, currently available
+networks, or both. The current implementation confirms the ambiguity:
+=connections.py= lists saved profiles MRU-first, merges live signal/security for
+saved profiles that are in range, then appends unsaved in-range SSIDs with
+=uuid: nil=. Rename and specify the groups: e.g. =Saved= (instant, does not
+require scan), =Available now= (scan-backed, may still be loading/stale), and
+=Wired=. =Scan= should refresh only the availability/signal layer, not gate the
+saved profile list. This blocks readiness because it affects loading behavior,
+button enablement, and whether unsaved rows can be selected.
+
+** DONE Replace the Add page with join-from-row behavior :blocking:
+Disposition: accept — folded into "V2 panel UX → Connections". Selecting an unsaved
+Available-now row is the join flow (SSID/security prefilled); the standalone Add modal
+is deleted for visible networks; hidden/manual join lives behind Advanced.
+The current Add dialog asks for an SSID as free text even though a scan usually
+already found the SSID and security type. That is redundant UI and a common
+network-manager mistake: it turns "join this visible network" into "copy a name
+from the list and type it again." V2 should remove the standalone Add button and
+modal for normal visible networks. Selecting an unsaved available row should
+become the join flow: the SSID/security are prefilled from the row, open
+networks connect with a confirmation only if needed, WPA/WPA2/WPA3-Personal ask
+only for the password, and hidden/manual SSID is tucked behind an Advanced
+"Join hidden network" affordance. Keep edit/create for enterprise profiles out
+of v1/V2 unless explicitly added later. This blocks the redesign because it
+changes the primary connection workflow and deletes a whole page/control.
+
+** DONE Pin the supported authentication types in the join flow :blocking:
+Disposition: accept — folded into "V2 panel UX → Supported authentication classes".
+The spec says "open + WPA-PSK" and "enterprise activate-only," but cafe/hotel
+networks also commonly appear as open captive portals, WPA/WPA2/WPA3-Personal
+(PSK/SAE), and sometimes transition-mode networks; less commonly they use
+enterprise/802.1X, WEP, OWE/enhanced-open, MAC registration, voucher portals, or
+proxy-required networks. Define the V2 join matrix from the scanned NM
+=SECURITY= value: supported inline (open, captive/open, WPA/WPA2/WPA3 Personal),
+activate-only if already saved (802.1X/enterprise), hidden-manual behind
+Advanced, and unsupported/rare types with a clear in-panel explanation plus a
+non-terminal next step. If an auth type is common enough to support, support it
+in the panel; if it is too rare for V2, say "not supported here yet" and keep
+the user in the same UI rather than sending them to a terminal tool. Also define
+what security label appears in the row so the user knows why a password is or is
+not requested. This blocks because the Add/Join deletion above cannot be
+implemented safely without knowing which auth classes the simplified flow covers.
+
+** DONE Fix destructive confirmation tense and verification
+Disposition: accept — folded into "V2 panel UX → Forget confirmation".
+The Forget confirmation says "The saved password is deleted" before the user has
+clicked Forget. That reads as if the destructive action already happened. Change
+the copy to future tense and name the scope, e.g. "This will remove the saved
+NetworkManager profile and its stored password from this machine." After the
+operation, verify the UUID is gone, refresh the Saved list, and report either
+"Forgot <SSID>" or "Could not forget <SSID>; nothing changed / partial state
+<evidence>." This is non-blocking because the existing confirm prevents an
+accidental click, but the wording is misleading and the V2 "verify every action"
+decision should cover it.
+
+** DONE Make connection loading progressive and observable :blocking:
+Disposition: accept — folded into "V2 panel UX → Connections (progressive loading)".
+Opening the panel currently says "Loading connections..." while =net list=
+collects both saved profiles and the WiFi scan. Saved profiles do not require a
+network scan, so a slow scan should not delay the saved list. Split loading into
+two phases: render saved NM profiles immediately, then overlay availability,
+signal, and unsaved in-range rows when the scan completes. Show a small
+scan-in-progress state with elapsed time and stale-last-scan age, and make
+Rescan update only the scan-backed fields. This blocks because it is the direct
+answer to "why does it take so long to see the list of connections?" and keeps a
+bad radio scan from making the whole panel feel broken.
+
+** DONE Define the visual contract with Waybar and existing Archsetup UI
+Disposition: accept — folded into "V2 panel UX → Visual contract".
+The panel is a layer-shell popup anchored under Waybar, but the spec does not
+state the visual contract. The live Waybar theme uses a dark rounded capsule
+(=border-radius: 1rem=), golden border, compact monospace text, and state colors
+for =custom/net=; the GTK panel currently has a generic title, stack switcher,
+default GTK controls, and square-ish/default widget corners. Add a short style
+section: panel should read as a Waybar-attached popup, not a separate app; match
+Waybar's palette, border/radius, spacing density, and state colors; avoid square
+corners where surrounding UI is rounded; keep cards out of cards; use compact
+icon+label controls with tooltips for advanced repairs. Also cite any existing
+Archsetup-owned GTK/panel conventions that should be reused. This is
+non-blocking for engine work but should block final V2 UX acceptance.
+
+** DONE Add a diagnostics report affordance that users can actually find
+Disposition: accept — folded into "V2 panel UX → Findable diagnostics report".
+The observability design has a JSONL event log, =diagnose --json=, automatic
+verbose capture, and redacted bundles, but the panel flow does not yet define
+the user affordance that turns those into an inspectable diagnosis. Add a
+Diagnostics-side "Copy report" / "Open report" action after every diagnose,
+repair, and speed/performance run. The report should include the current step
+statuses, elapsed time, final classification, last speed/latency result when
+available, scan age, route/interface owner, relevant redacted event-log tail,
+and bundle path when verbose capture ran. It must explicitly say whether any
+repair mutated state and whether cleanup/verification passed. This is blocking
+for observability because "logs exist somewhere" is not enough when the network
+is already failing.
+
+* Review and iteration history
+
+** 2026-06-29 Mon @ 17:00:39 -0400 — Codex — reviewer
+
+- *What changed or was recommended:* Rubric: =Not ready=. Applied the
+ spec-review workflow and added blocking findings for diagnostics structure,
+ user-facing errors, observability, nmcli contracts, cache concurrency,
+ hot-path performance, Waybar click actions, mutation rollback, credential
+ security, unsupported WiFi types, panel states, speed-test dependency
+ verification, install boundaries, and test coverage.
+- *Why:* The spec has the right broad shape, but this feature combines a status
+ indicator, NetworkManager mutation, sudo-backed remediation, captive-portal
+ probing, secret storage, and GTK UI. Too many failure modes were still left for
+ implementers to invent during coding.
+- *Artifacts:* Findings recorded in =Review findings=. Local code read included
+ =~/.dotfiles/hyprland/.local/bin/waybar-netspeed=,
+ =~/.dotfiles/common/.local/bin/captive=,
+ =~/.dotfiles/tests/waybar-netspeed/test_waybar_netspeed.py=,
+ =~/.dotfiles/tests/captive/test_captive.py=, pocketbook's GTK layer-shell
+ scaffold, current Waybar config, and the linked =todo.org= tasks. External
+ checks included the Arch nmcli man page and upstream LibreSpeed CLI page.
+
+** 2026-06-29 Mon @ 17:35:00 -0400 — Claude Code (archsetup) — responder
+
+- *What changed:* Processed all 18 findings — 13 accepted, 5 modified, 0
+ rejected. Accepts added new sections: Repository + dependencies, Performance
+ budgets, nmcli contract, JSON schemas, Diagnostics contract (+ Diagnose vs
+ Repair), Failure states/messages/recovery, Observability, cache
+ concurrency/atomicity/staleness, mutation safety + rollback, the rewritten
+ credential security model, the expanded test plan, JSON schemas, the renamed
+ Implementation phases with per-phase acceptance criteria, and a Status line.
+ Modifies: deferred the =net debug-bundle= command (log + =diagnose --json=
+ cover it), made click actions phase-aware + non-blocking, scoped enterprise
+ WiFi to activate-only (add/edit vNext), trimmed the panel state set to states
+ that occur on the two-machine fleet, and switched the speed-test backend to the
+ already-installed =speedtest-go= (verified live) instead of a new
+ librespeed-cli dependency.
+- *Why:* Codex's review was strong and mostly right; the modifies are scope calls
+ for a single-user, two-machine tool (avoid gold-plating, use installed deps,
+ don't model impossible states) and one phasing reconciliation (the panel is
+ Phase 2, so Phase-1 clicks can't open it).
+- *Artifacts:* All 18 findings completed =DONE= in place with dispositions. Live
+ check: =pacman -Qq | grep speed= (speedtest-go-bin installed; =speedtest-go
+ --json= confirmed). Status now "review incorporated (Codex, 2026-06-29)",
+ implementation-ready with the named Phase-2/3 caveats; Phase 1 ready to build.
+
+** 2026-06-29 Mon @ 17:37:58 -0400 — Claude Code (archsetup) — responder (cj comments)
+
+- *What changed:* Folded in Craig's 10 cj comments on the spec. Design changes:
+ (1) dropped the separate credential store entirely — secrets stay in NM's own
+ =system-connections= (root =0600=, inline), touched via nmcli; no GPG, no
+ gpg-agent (rewrote the secrets section, decision 5, dropped Phase 4 + the gpg
+ dep). (2) Added =net doctor [--fix]= + Makefile console-recovery targets
+ (=make online= etc.) as a first-class TTY path; reversed the earlier
+ defer-the-doctor call (decision 10). (3) Added a full-stack =bounce= repair and
+ an =rfkill= unblock repair + indicator state — the rfkill one recovers the
+ framework-laptop post-power-loss soft-block Craig hit. (4) =custom/net= absorbs
+ the airplane module; the standalone airplane scripts/tests/module are deleted on
+ ship. (5) Moved VPN/WireGuard from "out" to a planned Phase 5. (6) Added a
+ "Help + documentation" section (CLI help / panel help / user guide). Answered
+ the enterprise-defer rationale and the captive-auto-login explanation inline.
+- *Why:* Craig's comments simplify (NM-only secrets, fewer deps) and harden the
+ recovery story (doctor + make targets + rfkill/bounce reachable from a dead-GUI
+ console — his stated need). Enterprise add/edit in v1 is the one open call,
+ raised as a VERIFY.
+- *Artifacts:* All 10 cj blocks removed. Live checks: airplane scripts/tests
+ present (confirmed deletion targets); =/etc/NetworkManager/system-connections/=
+ is root =0600= with inline secrets (confirms the NM-only secret model);
+ =rfkill= present. VERIFY filed in =todo.org= for the enterprise-scope decision.
+
+** 2026-06-29 Mon @ 17:49:24 -0400 — Codex — reviewer
+
+- *What changed or was recommended:* Added a second feedback batch from the
+ common network failure-mode review: require a diagnose/doctor coverage table,
+ pin DNS repair semantics, make auth failures terminal user-action states,
+ define upstream/AP/provider terminal states, decide how active VPN/policy
+ routing affects v1 diagnosis, and remove stale GPG-store references.
+- *Why:* The design now has a solid recovery shape, but the field question is
+ sharper: for each common failure, does =diagnose= catch it, does =doctor --fix=
+ fix it, or does the user need to switch networks/provide credentials/contact
+ the venue? That mapping needs to live in the spec, not in chat.
+- *Artifacts:* Findings appended under =Review findings= after reviewing the
+ current spec's Diagnostics, Failure states, Doctor, Testing plan, and
+ responder history.
+
+** 2026-06-29 Mon @ 18:06:22 -0400 — Codex — reviewer
+
+- *What changed or was recommended:* Added a readiness pass focused on
+ development-time questions, missed enhancements, UX cleanliness, error wording,
+ and test/coverage mechanics. New findings require reconciling stale status/goal
+ text, resolving or explicitly caveating enterprise add/edit, defining the
+ concrete harness and coverage gate, making coverage actionable, turning failure
+ classes into exact user-facing strings, adding an enhancement disposition
+ table, and tightening the Phase-2 panel UX flow.
+- *Why:* The network failure model is now strong, but implementation would still
+ force developers to invent test infrastructure, coverage expectations, final
+ error copy, and several UX details. The spec should decide those up front or
+ explicitly mark them as caveats.
+- *Artifacts:* Findings appended under =Review findings= after reading the
+ updated spec, current archsetup Makefile test targets, pocketbook's pyproject,
+ and the linked =todo.org= enterprise VERIFY.
+
+** 2026-06-29 Mon @ 17:55:08 -0400 — Claude Code (archsetup) — responder (2nd review)
+
+- *What changed:* Processed all 6 second-batch findings — 6 accepted, 0 modified,
+ 0 rejected. Added a "Doctor: escalation, classification, terminal states"
+ section with four outcomes (=fixable= escalating lightest-first; =needs-user-
+ action=, =upstream-not-local=, =deferred/vpn= terminal) and explicit per-class
+ DNS handling. Added a "Failure-mode coverage" table (22 rows: detect / doctor-fix
+ / terminal action, org-table-standard conformant, ≤120). Cleaned the last
+ GPG-store residue (Testing plan → NM-secret/no-leak test; dropped the no-GPG-key
+ panel state). Added a doctor-classification test class.
+- *Why:* The findings sharpen the doctor from "escalate repairs" to "classify,
+ then stop at the right terminal state" — the safety property that keeps
+ =doctor --fix= from looping destructive repairs against a wrong password or an
+ upstream outage. The VPN finding got the detect-and-classify option (not repair;
+ that's Phase 5) so v1 doesn't misread a VPN-routed failure as WiFi down. The
+ coverage table is the single artifact Craig asked for to trust the field
+ coverage.
+- *Artifacts:* All 6 findings =DONE= in place; cookie =[24/24]=. Table conformed
+ via =wrap-org-table.el= (max line 120). No new findings introduced; no scope
+ expansion past what the findings asked. Spec remains implementation-ready, the
+ enterprise add/edit VERIFY still the one open decision.
+
+** 2026-06-29 Mon @ 18:24:51 -0400 — Claude Code (archsetup) — responder (3rd review)
+
+- *What changed:* Processed the third review (7 findings). The enterprise-scope
+ finding closed first on Craig's call (vNext, activate-only). The other 6, all
+ accepted: (1) reconciled the top-of-file text — fixed the Goal's GPG-store
+ wording, the =[#B]= task-mapping line, the resolved enterprise VERIFY, and
+ corrected the stale =pytest= mentions to =unittest= (the repo's real harness).
+ (2) Added a "Harness + coverage gate" section (unittest, fake-binary stubs on a
+ temp PATH, venv coverage, ≥90% branch on pure modules). (3) Added a per-phase
+ "coverage as a gap-finder" pass. (4) Rewrote the Failure states section to exact
+ final strings + evidence fields + a per-surface rendering rule, and added the
+ missing doctor-unfixable rows. (5) Added the "Enhancement radar" table
+ (v1/vNext/rejected). (6) Added the "Panel UX flow" subsection.
+- *Why:* The findings close the gap between "design decided" and "a developer can
+ start": the harness/coverage contract, the exact UX strings, and the panel flow
+ are the things otherwise invented mid-code. The =pytest=→=unittest= correction
+ was a real defect — the spec contradicted the repo's actual test convention.
+- *Artifacts:* All 31 findings =DONE=; cookie =[31/31]=. Both new tables conformed
+ via =wrap-org-table.el= (coverage 120, radar 110). Harness verified against the
+ live repo (33 unittest suites, =make test=, coverage.py absent → venv). Status
+ raised to "Ready for Phase 1; Ready-with-caveats overall" — no open decisions
+ remain.
+
+** 2026-06-30 Tue @ 16:32:07 -0400 — Claude Code (archsetup) — responder (build + V2 redesign)
+
+- *What changed:* Brought the spec current with what shipped and what got decided
+ across 2026-06-29/30. Recorded Phases 1-3 as SHIPPED (engine, indicator, GTK4
+ panel, bar clicks). Added the native captive-login engine (=portal-login= repair
+ tier replacing the =captive= shell-out), the live-testing portal UX fixes (removed
+ the polkit-gated flush, already-online short-circuit, Chrome first-run suppression,
+ in-panel portal, extractor hardening), and the panel auto-hide + Close button.
+ Then folded in the V2 redesign Craig directed: no terminals anywhere, a passwordless
+ root-helper + NOPASSWD sudoers as the enabler, verify-every-action, the full
+ ~44-mode failure catalog (now the authoritative coverage, in todo.org), and the
+ Connections | Diagnostics | Performance nav. Added decisions 15-19, V2 phases
+ 2.1-2.4, and corrected the two failure-table rows the shipped code never matched
+ (gateway-bounce, venue-DNS auto-revert).
+- *Why:* The spec had drifted behind the code (still said "Phase 1 shipped, terminal
+ for privileged ops") and behind Craig's redesign calls. The doc is the contract for
+ the V2 build, so it has to state the no-terminal reversal, the sudo-helper
+ dependency, and where coverage now lives.
+- *Artifacts:* Updated Status, Architecture, Engine CLI surface, Doctor DNS handling,
+ Failure-mode coverage, Interactions, Panel, Resolved decisions, Implementation
+ phases. Shipped commits referenced: dotfiles =a7d7559=, =eef6b0b=, =450b7f0=. Full
+ catalog + redesign children in todo.org "Network panel redesign".
+
+** 2026-06-30 Tue @ 17:29:29 -0400 — Codex — reviewer
+
+- *What changed or was recommended:* Rubric: =Not ready= for the V2 redesign until
+ the new blocking findings are dispositioned. Added a fourth review batch focused on
+ diagnostics/observability and the simple UX questions Craig raised: reconcile the
+ active panel nav, fold performance/speed testing into the diagnostics story, make
+ saved-vs-available list semantics explicit, replace the Add page with join-from-row,
+ pin supported WiFi authentication classes, fix Forget confirmation tense, make
+ connection loading progressive, define the Waybar/Archsetup visual contract, and add
+ a findable redacted diagnostics report affordance.
+- *Why:* The spec has strong engine and failure-mode coverage, but the current doc and
+ shipped implementation still replicate common network-manager UX mistakes: ambiguous
+ mixed lists, scan-gated saved data, redundant SSID entry, misleading destructive
+ copy, diagnostics split across too many pages, and observability that exists in logs
+ more than in the user's workflow.
+- *Artifacts:* Findings appended under =Review findings= after reading the
+ spec-review workflow, the target spec, todo.org's redesign task, and current
+ dotfiles implementation files =net/src/net/connections.py=,
+ =net/src/net/gui.py=, =net/src/net/panel.py=, and Waybar =style.css=.
+
+** 2026-06-30 Tue @ 17:36:40 -0400 — Claude Code (archsetup) — responder (4th review)
+
+- *What changed:* Dispositioned all nine fourth-review findings (8 accept, 1 modify,
+ 0 reject) and wove them into a new "V2 panel UX — the target design" section.
+ Accepts: V2's three-tab nav is the sole target (the four-page stack is history);
+ Connections splits into Saved / Available now / Wired with Rescan touching only the
+ scan layer; selecting an unsaved row is the join flow (the Add modal is deleted);
+ a join matrix pins which NM SECURITY classes are inline-supported / activate-only /
+ hidden / unsupported; loading renders Saved first and overlays the scan; the Forget
+ copy goes future-tense + verified; every run ends with a Copy/Open redacted report;
+ and a Waybar visual contract (rounded capsule, golden border, state colors). Modify:
+ the speed-test finding kept Craig's decision-19 placement (full speed test under
+ Performance, which carries future history) while accepting a lightweight inline
+ latency probe as Diagnose evidence stored in the doctor report. Cookie [40/40].
+- *Why:* Codex read the live implementation and caught the UX places where the module
+ still replicated common network-manager mistakes — mixed lists, scan-gated saved
+ data, redundant SSID entry, misleading destructive copy, diagnostics scattered
+ across pages, observability that lived in logs more than the workflow.
+- *Artifacts:* Findings 32-40 completed in place with dispositions; the modify reason
+ on the speed-test finding. New "V2 panel UX" section under Panel. todo.org redesign
+ task updated to point the V2 build at the dispositioned design.
+
+** 2026-07-01 Wed @ 10:43:18 -0400 — Claude Code (archsetup) — responder (cj comments)
+
+- *What changed:* Folded in Craig's three cj comments from his review. (1) Notification
+ rendering: title = "Networking", body = the failure label on its own line then the
+ canonical string. (2) VPN kill-switch: added a "kill-switch blocking" failure state
+ plus a detection-and-correction strategy off the =deferred-vpn= branch (rootless
+ cascade over =ip rule= fwmark 0xca6c / =wg show= / Proton =pvpn-*= NM connections /
+ =nft=/=iptables= drop tables / firewalld =drop= zone; classify blocking only when a
+ block artifact exists AND no tunnel is up; correction surfaces the exact root command
+ per artifact). (3) Terminals: strengthened decision 15 to "no terminal ever reports
+ to or collects input from the user", disambiguated from the doctor's "terminal
+ states" wording.
+- *Why:* Craig's review annotations. The kill-switch closes a real gap in the
+ VPN-routed classification; the terminal directive makes the no-terminal rule
+ absolute for the module UX.
+- *Artifacts:* Three cj blocks removed. VPN research subagent cited wg-quick man page,
+ Pro Custodibus, System76/Proton killswitch docs, and local =doctor.py:42= /
+ =classify.py:60= / =USNY.conf:15=. One open tension filed as a VERIFY in todo.org:
+ the dead-GUI console-recovery path (=make online= from a TTY) vs the no-terminal
+ directive.