aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorCraig Jennings <c@cjennings.net>2026-06-30 16:22:26 -0400
committerCraig Jennings <c@cjennings.net>2026-06-30 16:22:26 -0400
commitf9c9df6dc079f39b304938fb7fd795df7c319120 (patch)
treeadd0fafcdb5afa89666e131343d7a3eca209ac48
parentf8f9865a0d97e65e0924eaf2f0a68ef544bed8ae (diff)
downloadarchsetup-f9c9df6dc079f39b304938fb7fd795df7c319120.tar.gz
archsetup-f9c9df6dc079f39b304938fb7fd795df7c319120.zip
docs(todo): file network panel redesign + full failure-mode catalog
-rw-r--r--todo.org114
1 files changed, 111 insertions, 3 deletions
diff --git a/todo.org b/todo.org
index 7597795..3882a98 100644
--- a/todo.org
+++ b/todo.org
@@ -34,9 +34,6 @@ Force portal (panel Repair tab) = =net-popup net portal= = the same portal-login
- Chrome first-run wizard fired on every launch — =_open_portal= made a fresh tempfile profile but passed no first-run flags. Added =--no-first-run --no-default-browser-check= + a unit test.
- Flashing sudo prompt for the DoT drop + pointless resolved restart on velox, where the DoT drop-in the code looks for (=/etc/systemd/resolved.conf.d/dns-over-tls.conf=) doesn't exist. Guarded =_disable_dot=/=_restore_dot= to be true no-ops (no sudo, no restart) when there's no DoT drop-in to move; tests assert no systemctl call fires.
-*** TODO [#C] Passwordless DoT toggle for DoT-on machines + reconcile velox DoT path :network:bug:
-On a machine where DoT *is* managed via the drop-in, the portal flow still needs sudo to move it + restart resolved. The detached restore-watcher (=portal_restore_watch=) runs with no tty, so it can't prompt — if sudo creds aren't cached when it fires, =_restore_dot= silently fails and leaves DNS unencrypted (DoT off) until a manual =net portal --restore=. Fix: a small root-owned helper (=/usr/local/bin/net-dot-toggle on|off=, archsetup-installed, NOT stowed/user-writable) + a narrow sudoers NOPASSWD drop-in scoped to that helper (blanket NOPASSWD on mv/systemctl is too broad). Then =repair.py= calls =sudo net-dot-toggle …=. Separately: reconcile where velox's DoT actually lives — it currently shows -DNSOverTLS with no drop-in, so the "drop DoT" premise is a no-op here; decide whether velox should run DoT at all (=NET_DOT_CONF= overrides the path). velox itself is now unaffected (the no-op guard), so this is the durable cross-machine fix, not a velox blocker.
-
** TODO [#B] Consistent red=off across waybar toggle modules :waybar:
Extend the red=off convention (just added to the touchpad/mouse indicator) to the other toggles — sound volume, microphone mute, and caffeine — so a disabled / muted / off state reads red across the board. Skip the "cross"/slash; the color alone carries it. Origin: roam inbox capture.
@@ -84,6 +81,117 @@ Add =@emacs-eask/cli= to archsetup's provisioning so fresh machines get it. Eask
- Decision: also set a persistent user npm prefix (=~/.npmrc= with =prefix=${HOME}/.local=)? If yes, that =~/.npmrc= is a legitimate dotfile to stow; if no, rely on the explicit =--prefix= flag alone. =~/.eask/= is a regenerable cache — leave un-stowed.
- Acceptance: fresh run leaves =eask= on PATH at =~/.local/bin/eask= (no root); =cd ~/code/chime && make setup && make test= works.
+** TODO [#B] Network panel redesign — no terminals, verify-everything, full failure coverage :feature:waybar:network:
+Major evolution of the shipped =custom/net= module ([[file:docs/design/2026-06-29-waybar-network-module-spec.org]]).
+Reverses the spec's "privileged tiers run in a net-popup terminal" decision. Origin:
+design conversation 2026-06-30.
+
+*** Locked decisions
+- *No terminals anywhere in the module.* Delete =net-popup= entirely. Every action and
+ every result renders in the panel.
+- *Passwordless privileged path (the enabler).* A single root-owned helper runs net's
+ specific privileged commands (rfkill unblock, nmcli modify/up, networking off/on,
+ systemctl restart NetworkManager/systemd-resolved, resolvectl dns/revert, DoT toggle),
+ installed by archsetup with a narrow NOPASSWD sudoers rule scoped to that helper only
+ (never blanket mv/systemctl). =repair.py= calls =sudo <helper> <verb>=. This supersedes
+ and absorbs the earlier [#C] "Passwordless DoT toggle" follow-up. Without it an in-panel
+ worker thread can't prompt for a password, so this gates the whole no-terminal goal.
+- *Verify every action.* Every mutating op confirms its effect before reporting success
+ (doctor already re-probes; generalize so each repair, connect, forget, add, and DNS
+ override re-checks and surfaces pass/fail in the panel).
+- *Detect + respond to every failure mode below* (auto-fix where we can, else report the
+ helpful text), including the edge cases.
+
+*** Navigation (confirmed)
+- Top tabs: =Connections= | =Diagnostics= | =Performance=.
+- Connections: saved + in-range list, connect / add / forget.
+- Diagnostics: sub-row =Diagnose= | =Get Me Online= | =Advanced=; shared area below shows
+ diagnose items AND streams repair progress (replacing the terminal). =Advanced= reveals
+ the individual repair buttons, renamed with tooltips describing each.
+- Performance: Speedtest (+ live throughput later).
+
+*** Failure-mode catalog — detect / correct-or-report (the completeness backbone)
+Organized by the connectivity stack, bottom-up. "Fix" = auto-correct + verify; "Report" =
+the in-panel text when there's no safe auto-fix. Audit this list for completeness; it is the
+contract for what diagnose must detect and what the panel must say.
+
+**** Radio / hardware
+- rfkill soft block — Detect: rfkill soft. Fix: unblock + =nmcli radio wifi on=, verify radio unblocked.
+- rfkill hard block — Detect: rfkill hard. Report: "WiFi is off at the hardware switch — flip the physical switch or Fn key."
+- No WiFi adapter present — Detect: no wifi device in nmcli + rfkill absent. Report: "No WiFi adapter detected — use ethernet, or check the driver (dmesg | grep firmware)."
+- Driver/firmware not loaded — Detect: device present but errored / no operational state. Report: "WiFi driver or firmware didn't load — check dmesg for the adapter."
+- USB WiFi adapter unplugged — Detect: device disappeared since last scan. Report: "WiFi adapter was removed — reconnect it."
+- Airplane mode on — Detect: airplane state file set. Fix: offer toggle off (Super+Shift+A), verify radios back.
+
+**** Association (L2 link)
+- Not connected / disconnected — Detect: link down, device disconnected. Fix: reset (reconnect saved), verify link up.
+- Stuck "connecting" — Detect: device state connecting > budget. Fix: reset, verify; if it persists Report: "Stuck connecting to <ssid> — the AP may be rejecting us."
+- Weak signal / high loss — Detect: associated but signal below threshold (dBm) or heavy packet loss. Report: "Signal is weak (<dBm>) — move closer to the access point."
+- Saved network not in range — Detect: profile active target not in scan. Report: "<ssid> isn't in range here."
+- AP roaming flap — Detect: BSSID bouncing. Report: "Connection is unstable — switching between access points."
+
+**** Authentication
+- Wrong WPA password / missing secret — Detect: NM state 120 (snapshot; live detection is a known limit). Report + in-panel re-enter: "Saved password for <ssid> was rejected — re-enter it."
+- Enterprise / 802.1X cert or identity failure — Detect: 802.1X profile + activation failure. Report: "Enterprise auth failed — check the certificate or identity (edit the profile)."
+- Randomized MAC rejected by AP — Detect: reset-with-random-MAC fails where a prior connect worked. Fix: retry reset with the permanent MAC, verify; else Report.
+- WPA3/SAE incompatibility — Detect: SAE key-mgmt + association failure. Report: "This network needs WPA3 and the adapter or profile may not support it."
+
+**** IP / DHCP
+- No IPv4 lease (DHCP timeout) — Detect: connected, no IP4.ADDRESS. Fix: reset → bounce, verify lease.
+- APIPA / link-local only (169.254.x) — Detect: only a link-local IPv4. Fix: reset/bounce, verify real lease; else Report: "DHCP server didn't answer — switch network."
+- IPv6-only network (no IPv4 by design) — Detect: no IPv4 but IPv6 address + online via v6. Report (not a failure): "Online over IPv6 (no IPv4 here)." Requires making diagnose IPv6-aware.
+- IP but no gateway — Detect: IP4.ADDRESS present, IP4.GATEWAY empty. Fix: bounce, verify gateway; else Report.
+- Duplicate IP / ARP conflict — Detect: kernel ARP-conflict signal. Report: "Another device is using our IP address — reconnect to get a new lease." (edge)
+
+**** Gateway (L3 local)
+- Gateway unreachable — Detect: no route out, gateway no ICMP. Fix: try one bounce (renew route), verify online; else Report: "No route to the gateway — switch network." (closes the spec/code gap where bounce was never tried)
+
+**** DNS
+- No resolver configured — Detect: IP4.DNS empty. Fix: bounce to re-pull DHCP DNS, verify; else Report.
+- Venue DNS broken, public DNS works — Detect: name fails to resolve but 1.1.1.1 resolves (dns-test). Fix: set a PERSISTENT resolver override (1.1.1.1 / 9.9.9.9), verify resolution + online, offer revert. (closes gap #1 — today dns-test reverts and misreports as upstream.)
+- DNS hijack (resolves to gateway / private IP) — Detect: classify_resolution hijack. Treat as captive → portal-login flow.
+- DNSSEC validation failure — Detect: resolution fails with SERVFAIL where public resolver succeeds without DNSSEC. Report: "DNS security checks are failing on this network." (edge)
+- Encrypted DNS (DoT/DoH) hiding the portal — Detect: captive suspected + DoT on. Fix: portal-login drops DoT, opens portal, auto-restores. (existing)
+
+**** Egress / internet
+- Upstream / AP outage (no uplink) — Detect: link/IP/DNS fine, http-probe fail, not a redirect. Report: "This network has no internet — switch network or contact the venue."
+- Captive portal (redirect) — Detect: probe redirected. Fix: portal-login opens the page; verify online after login.
+- Captive blocked pre-auth (no portal URL) — Detect: probe blocked, no URL. Fix: fresh MAC + open trigger; verify.
+- Proxy-required network — Detect: probe fails but a PAC/proxy is advertised (WPAD/env). Report: "This network requires a proxy — configure it in settings." (edge)
+- MTU / MSS blackhole (PMTUD broken) — Detect: small probe ok, large transfer hangs. Fix: lower the interface MTU, verify; else Report. (edge)
+- Clock skew breaking TLS — Detect: HTTPS/portal fails with cert-time errors + system clock far off. Fix: trigger a time sync, verify; else Report: "System clock is wrong — fix the date/time." (edge)
+
+**** Routing / multi-homing
+- VPN owns the route, no internet through it — Detect: VPN device connected + http-probe fail. Report: "Internet is routed through a VPN (<dev>) — check the VPN, not WiFi."
+- VPN up but dead — Detect: VPN device up, no traffic/handshake. Report: "The VPN is connected but not passing traffic." (Phase 5 territory)
+- WiFi + tether/ethernet both active — Detect: which iface owns the default route + whether the system is online by any path. Report: "You're online through <other iface>; WiFi itself has no internet," or let the user pick. (closes gap #4)
+
+**** Infrastructure / system
+- Wedged NetworkManager — Detect: nmcli fails / API unresponsive. Fix: restart NetworkManager (bounce escalation), verify.
+- NetworkManager not running — Detect: service inactive. Fix: start it, verify; else Report.
+- systemd-resolved down — Detect: resolved inactive / DNS via it fails. Fix: restart, verify.
+- resolv.conf not resolved-managed — Detect: /etc/resolv.conf not the resolved stub. Report: "DNS isn't managed by systemd-resolved — manual resolv.conf in play." (edge)
+
+**** Tooling / environment
+- nmcli / NM API unavailable — Detect: nmcli error or timeout. Report: "Can't reach NetworkManager — is it installed and running?"
+- Slow / hung tool — Detect: step exceeds budget. Fix: degrade that step, retry within budget.
+- Stale / corrupt cache — Detect: schema/age mismatch. Fix: self-heal (atomic write + invalidation).
+- Missing speedtest backend — Detect: speedtest-go absent. Report: "Install speedtest-go to run a speed test."
+- Privileged op fails (helper missing / sudo declined) — Detect: helper exits non-zero or absent. Report: "Couldn't get admin rights for this repair — <install/fix the helper>."
+
+*** TODO Sudo helper + NOPASSWD sudoers (gates everything; archsetup-installed)
+Root-owned =/usr/local/bin/net-priv= (or similar), NOT stowed/user-writable, dispatching net's
+fixed privileged verbs; narrow NOPASSWD sudoers scoped to that helper only. =repair.py= calls
+=sudo net-priv <verb>=. Must also fix the latent bug it unblocks: the detached
+=portal_restore_watch= runs with no tty and can't prompt, so today =_restore_dot= silently fails
+when sudo creds aren't cached, leaving DNS unencrypted until a manual =net portal --restore=.
+Separately reconcile where velox's DoT actually lives (currently -DNSOverTLS, no drop-in, so the
+"drop DoT" step is a no-op there; =NET_DOT_CONF= overrides the path) — decide whether velox should
+run DoT at all.
+*** TODO Merged Diagnostics panel + nav restructure (Connections | Diagnostics | Performance)
+*** TODO Make diagnose IPv6-aware and multi-homing-aware
+*** TODO Close every detect/correct gap in the catalog, with post-action verification
+
** TODO [#B] Waybar network module — custom/net :feature:waybar:network:
:PROPERTIES:
:LAST_REVIEWED: 2026-06-29