diff options
| author | Craig Jennings <c@cjennings.net> | 2026-06-30 16:22:26 -0400 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-06-30 16:22:26 -0400 |
| commit | f9c9df6dc079f39b304938fb7fd795df7c319120 (patch) | |
| tree | add0fafcdb5afa89666e131343d7a3eca209ac48 /todo.org | |
| parent | f8f9865a0d97e65e0924eaf2f0a68ef544bed8ae (diff) | |
| download | archsetup-f9c9df6dc079f39b304938fb7fd795df7c319120.tar.gz archsetup-f9c9df6dc079f39b304938fb7fd795df7c319120.zip | |
docs(todo): file network panel redesign + full failure-mode catalog
Diffstat (limited to 'todo.org')
| -rw-r--r-- | todo.org | 114 |
1 files changed, 111 insertions, 3 deletions
@@ -34,9 +34,6 @@ Force portal (panel Repair tab) = =net-popup net portal= = the same portal-login - Chrome first-run wizard fired on every launch — =_open_portal= made a fresh tempfile profile but passed no first-run flags. Added =--no-first-run --no-default-browser-check= + a unit test. - Flashing sudo prompt for the DoT drop + pointless resolved restart on velox, where the DoT drop-in the code looks for (=/etc/systemd/resolved.conf.d/dns-over-tls.conf=) doesn't exist. Guarded =_disable_dot=/=_restore_dot= to be true no-ops (no sudo, no restart) when there's no DoT drop-in to move; tests assert no systemctl call fires. -*** TODO [#C] Passwordless DoT toggle for DoT-on machines + reconcile velox DoT path :network:bug: -On a machine where DoT *is* managed via the drop-in, the portal flow still needs sudo to move it + restart resolved. The detached restore-watcher (=portal_restore_watch=) runs with no tty, so it can't prompt — if sudo creds aren't cached when it fires, =_restore_dot= silently fails and leaves DNS unencrypted (DoT off) until a manual =net portal --restore=. Fix: a small root-owned helper (=/usr/local/bin/net-dot-toggle on|off=, archsetup-installed, NOT stowed/user-writable) + a narrow sudoers NOPASSWD drop-in scoped to that helper (blanket NOPASSWD on mv/systemctl is too broad). Then =repair.py= calls =sudo net-dot-toggle …=. Separately: reconcile where velox's DoT actually lives — it currently shows -DNSOverTLS with no drop-in, so the "drop DoT" premise is a no-op here; decide whether velox should run DoT at all (=NET_DOT_CONF= overrides the path). velox itself is now unaffected (the no-op guard), so this is the durable cross-machine fix, not a velox blocker. - ** TODO [#B] Consistent red=off across waybar toggle modules :waybar: Extend the red=off convention (just added to the touchpad/mouse indicator) to the other toggles — sound volume, microphone mute, and caffeine — so a disabled / muted / off state reads red across the board. Skip the "cross"/slash; the color alone carries it. Origin: roam inbox capture. @@ -84,6 +81,117 @@ Add =@emacs-eask/cli= to archsetup's provisioning so fresh machines get it. Eask - Decision: also set a persistent user npm prefix (=~/.npmrc= with =prefix=${HOME}/.local=)? If yes, that =~/.npmrc= is a legitimate dotfile to stow; if no, rely on the explicit =--prefix= flag alone. =~/.eask/= is a regenerable cache — leave un-stowed. - Acceptance: fresh run leaves =eask= on PATH at =~/.local/bin/eask= (no root); =cd ~/code/chime && make setup && make test= works. +** TODO [#B] Network panel redesign — no terminals, verify-everything, full failure coverage :feature:waybar:network: +Major evolution of the shipped =custom/net= module ([[file:docs/design/2026-06-29-waybar-network-module-spec.org]]). +Reverses the spec's "privileged tiers run in a net-popup terminal" decision. Origin: +design conversation 2026-06-30. + +*** Locked decisions +- *No terminals anywhere in the module.* Delete =net-popup= entirely. Every action and + every result renders in the panel. +- *Passwordless privileged path (the enabler).* A single root-owned helper runs net's + specific privileged commands (rfkill unblock, nmcli modify/up, networking off/on, + systemctl restart NetworkManager/systemd-resolved, resolvectl dns/revert, DoT toggle), + installed by archsetup with a narrow NOPASSWD sudoers rule scoped to that helper only + (never blanket mv/systemctl). =repair.py= calls =sudo <helper> <verb>=. This supersedes + and absorbs the earlier [#C] "Passwordless DoT toggle" follow-up. Without it an in-panel + worker thread can't prompt for a password, so this gates the whole no-terminal goal. +- *Verify every action.* Every mutating op confirms its effect before reporting success + (doctor already re-probes; generalize so each repair, connect, forget, add, and DNS + override re-checks and surfaces pass/fail in the panel). +- *Detect + respond to every failure mode below* (auto-fix where we can, else report the + helpful text), including the edge cases. + +*** Navigation (confirmed) +- Top tabs: =Connections= | =Diagnostics= | =Performance=. +- Connections: saved + in-range list, connect / add / forget. +- Diagnostics: sub-row =Diagnose= | =Get Me Online= | =Advanced=; shared area below shows + diagnose items AND streams repair progress (replacing the terminal). =Advanced= reveals + the individual repair buttons, renamed with tooltips describing each. +- Performance: Speedtest (+ live throughput later). + +*** Failure-mode catalog — detect / correct-or-report (the completeness backbone) +Organized by the connectivity stack, bottom-up. "Fix" = auto-correct + verify; "Report" = +the in-panel text when there's no safe auto-fix. Audit this list for completeness; it is the +contract for what diagnose must detect and what the panel must say. + +**** Radio / hardware +- rfkill soft block — Detect: rfkill soft. Fix: unblock + =nmcli radio wifi on=, verify radio unblocked. +- rfkill hard block — Detect: rfkill hard. Report: "WiFi is off at the hardware switch — flip the physical switch or Fn key." +- No WiFi adapter present — Detect: no wifi device in nmcli + rfkill absent. Report: "No WiFi adapter detected — use ethernet, or check the driver (dmesg | grep firmware)." +- Driver/firmware not loaded — Detect: device present but errored / no operational state. Report: "WiFi driver or firmware didn't load — check dmesg for the adapter." +- USB WiFi adapter unplugged — Detect: device disappeared since last scan. Report: "WiFi adapter was removed — reconnect it." +- Airplane mode on — Detect: airplane state file set. Fix: offer toggle off (Super+Shift+A), verify radios back. + +**** Association (L2 link) +- Not connected / disconnected — Detect: link down, device disconnected. Fix: reset (reconnect saved), verify link up. +- Stuck "connecting" — Detect: device state connecting > budget. Fix: reset, verify; if it persists Report: "Stuck connecting to <ssid> — the AP may be rejecting us." +- Weak signal / high loss — Detect: associated but signal below threshold (dBm) or heavy packet loss. Report: "Signal is weak (<dBm>) — move closer to the access point." +- Saved network not in range — Detect: profile active target not in scan. Report: "<ssid> isn't in range here." +- AP roaming flap — Detect: BSSID bouncing. Report: "Connection is unstable — switching between access points." + +**** Authentication +- Wrong WPA password / missing secret — Detect: NM state 120 (snapshot; live detection is a known limit). Report + in-panel re-enter: "Saved password for <ssid> was rejected — re-enter it." +- Enterprise / 802.1X cert or identity failure — Detect: 802.1X profile + activation failure. Report: "Enterprise auth failed — check the certificate or identity (edit the profile)." +- Randomized MAC rejected by AP — Detect: reset-with-random-MAC fails where a prior connect worked. Fix: retry reset with the permanent MAC, verify; else Report. +- WPA3/SAE incompatibility — Detect: SAE key-mgmt + association failure. Report: "This network needs WPA3 and the adapter or profile may not support it." + +**** IP / DHCP +- No IPv4 lease (DHCP timeout) — Detect: connected, no IP4.ADDRESS. Fix: reset → bounce, verify lease. +- APIPA / link-local only (169.254.x) — Detect: only a link-local IPv4. Fix: reset/bounce, verify real lease; else Report: "DHCP server didn't answer — switch network." +- IPv6-only network (no IPv4 by design) — Detect: no IPv4 but IPv6 address + online via v6. Report (not a failure): "Online over IPv6 (no IPv4 here)." Requires making diagnose IPv6-aware. +- IP but no gateway — Detect: IP4.ADDRESS present, IP4.GATEWAY empty. Fix: bounce, verify gateway; else Report. +- Duplicate IP / ARP conflict — Detect: kernel ARP-conflict signal. Report: "Another device is using our IP address — reconnect to get a new lease." (edge) + +**** Gateway (L3 local) +- Gateway unreachable — Detect: no route out, gateway no ICMP. Fix: try one bounce (renew route), verify online; else Report: "No route to the gateway — switch network." (closes the spec/code gap where bounce was never tried) + +**** DNS +- No resolver configured — Detect: IP4.DNS empty. Fix: bounce to re-pull DHCP DNS, verify; else Report. +- Venue DNS broken, public DNS works — Detect: name fails to resolve but 1.1.1.1 resolves (dns-test). Fix: set a PERSISTENT resolver override (1.1.1.1 / 9.9.9.9), verify resolution + online, offer revert. (closes gap #1 — today dns-test reverts and misreports as upstream.) +- DNS hijack (resolves to gateway / private IP) — Detect: classify_resolution hijack. Treat as captive → portal-login flow. +- DNSSEC validation failure — Detect: resolution fails with SERVFAIL where public resolver succeeds without DNSSEC. Report: "DNS security checks are failing on this network." (edge) +- Encrypted DNS (DoT/DoH) hiding the portal — Detect: captive suspected + DoT on. Fix: portal-login drops DoT, opens portal, auto-restores. (existing) + +**** Egress / internet +- Upstream / AP outage (no uplink) — Detect: link/IP/DNS fine, http-probe fail, not a redirect. Report: "This network has no internet — switch network or contact the venue." +- Captive portal (redirect) — Detect: probe redirected. Fix: portal-login opens the page; verify online after login. +- Captive blocked pre-auth (no portal URL) — Detect: probe blocked, no URL. Fix: fresh MAC + open trigger; verify. +- Proxy-required network — Detect: probe fails but a PAC/proxy is advertised (WPAD/env). Report: "This network requires a proxy — configure it in settings." (edge) +- MTU / MSS blackhole (PMTUD broken) — Detect: small probe ok, large transfer hangs. Fix: lower the interface MTU, verify; else Report. (edge) +- Clock skew breaking TLS — Detect: HTTPS/portal fails with cert-time errors + system clock far off. Fix: trigger a time sync, verify; else Report: "System clock is wrong — fix the date/time." (edge) + +**** Routing / multi-homing +- VPN owns the route, no internet through it — Detect: VPN device connected + http-probe fail. Report: "Internet is routed through a VPN (<dev>) — check the VPN, not WiFi." +- VPN up but dead — Detect: VPN device up, no traffic/handshake. Report: "The VPN is connected but not passing traffic." (Phase 5 territory) +- WiFi + tether/ethernet both active — Detect: which iface owns the default route + whether the system is online by any path. Report: "You're online through <other iface>; WiFi itself has no internet," or let the user pick. (closes gap #4) + +**** Infrastructure / system +- Wedged NetworkManager — Detect: nmcli fails / API unresponsive. Fix: restart NetworkManager (bounce escalation), verify. +- NetworkManager not running — Detect: service inactive. Fix: start it, verify; else Report. +- systemd-resolved down — Detect: resolved inactive / DNS via it fails. Fix: restart, verify. +- resolv.conf not resolved-managed — Detect: /etc/resolv.conf not the resolved stub. Report: "DNS isn't managed by systemd-resolved — manual resolv.conf in play." (edge) + +**** Tooling / environment +- nmcli / NM API unavailable — Detect: nmcli error or timeout. Report: "Can't reach NetworkManager — is it installed and running?" +- Slow / hung tool — Detect: step exceeds budget. Fix: degrade that step, retry within budget. +- Stale / corrupt cache — Detect: schema/age mismatch. Fix: self-heal (atomic write + invalidation). +- Missing speedtest backend — Detect: speedtest-go absent. Report: "Install speedtest-go to run a speed test." +- Privileged op fails (helper missing / sudo declined) — Detect: helper exits non-zero or absent. Report: "Couldn't get admin rights for this repair — <install/fix the helper>." + +*** TODO Sudo helper + NOPASSWD sudoers (gates everything; archsetup-installed) +Root-owned =/usr/local/bin/net-priv= (or similar), NOT stowed/user-writable, dispatching net's +fixed privileged verbs; narrow NOPASSWD sudoers scoped to that helper only. =repair.py= calls +=sudo net-priv <verb>=. Must also fix the latent bug it unblocks: the detached +=portal_restore_watch= runs with no tty and can't prompt, so today =_restore_dot= silently fails +when sudo creds aren't cached, leaving DNS unencrypted until a manual =net portal --restore=. +Separately reconcile where velox's DoT actually lives (currently -DNSOverTLS, no drop-in, so the +"drop DoT" step is a no-op there; =NET_DOT_CONF= overrides the path) — decide whether velox should +run DoT at all. +*** TODO Merged Diagnostics panel + nav restructure (Connections | Diagnostics | Performance) +*** TODO Make diagnose IPv6-aware and multi-homing-aware +*** TODO Close every detect/correct gap in the catalog, with post-action verification + ** TODO [#B] Waybar network module — custom/net :feature:waybar:network: :PROPERTIES: :LAST_REVIEWED: 2026-06-29 |
