| author | Craig Jennings <c@cjennings.net> | 2026-06-30 10:59:08 -0400 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-06-30 10:59:08 -0400 |
| commit | fdbaa52b4e308be6c809e98a785c3723273835f9 (patch) | |
| tree | 7f9b77891d1db5a4015f69ce24c8bb2e19d665d7 /docs | |
| parent | 6bd832897813c730deb12768d1eb5b02af66ad20 (diff) | |
docs: capture captive-portal login learnings + close the ZFS task
File the captive-portal-login design doc from the 2026-06-30 Hyatt saga — the actual mechanism (system DoT + browser DoH both bypass the hotel's redirecting DNS; plain DNS is what works), the working hotel-wifi script, and the plan to make it a first-class net-panel action — plus a [#B] feature task to bake it in. Also close the ZFS pre-pacman snapshot task: the installer step shipped and the ZFS VM install passed 97/0 with the new hook assertion.
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/design/2026-06-30-captive-portal-login.org | 89 |
1 files changed, 89 insertions, 0 deletions
diff --git a/docs/design/2026-06-30-captive-portal-login.org b/docs/design/2026-06-30-captive-portal-login.org
new file mode 100644
index 0000000..1739689
--- /dev/null
+++ b/docs/design/2026-06-30-captive-portal-login.org
@@ -0,0 +1,89 @@
+#+TITLE: Captive-portal login — learnings + baking it into the net panel
+#+DATE: 2026-06-30
+#+SOURCE: the 2026-06-30 Hyatt wifi saga (velox)
+
+* Why this exists
+
+On a locked-down-DNS laptop, captive portals never show their login page, even
+though phones get on fine. We spent hours on a Hyatt portal before finding the
+mechanism; this captures it so the fix becomes a panel feature instead of a
+one-off script.
+
+* The mechanism (what actually blocks the login)
+
+A redirect portal works by *DNS hijack*: you query a name, the hotel's resolver
+hands back the portal, you get the login page. Two things on velox stop that:
+
+- *System resolver forces DNS-over-TLS.* =/etc/systemd/resolved.conf.d/dns-over-tls.conf=
+  hardcodes =DNS=1.1.1.1#... 9.9.9.9#...= with =DNSOverTLS=yes=. The system never
+  queries the hotel's resolver at all. The hotel blocks 853 (DoT) and external
+  53, so system DNS is simply dead on the portal — only 443 (DoH) gets out.
+- *Browser DoH.* Chrome "secure DNS" on bypasses the hotel DNS too, so the
+  browser never gets redirected either.
+
+A phone works because it uses *plain DNS* from the hotel plus a built-in
+captive-portal popper. The laptop has neither.
+
+Confirmed facts from the saga:
+- Front desk: it's a normal redirect-to-login portal. Phone: connects fine.
+- No DHCP option 114 (RFC 8910) — the portal doesn't advertise its URL. But the
+  URL is recoverable from the HTTP 302 once you're on plain DNS.
+- The walled garden whitelists OS captive-detection endpoints
+  (=captive.apple.com= returns "Success") — a *misleading* signal, not real
+  internet. Don't trust it.
+- 443/DoH egress works broadly on the portal; only port-53 DNS is held. So
+  "system DNS fails" never means "no internet" here.
+
+* The working fix (=~/.local/bin/hotel-wifi=, to be folded in)
+
+Temporarily disable DoT → plain hotel DNS → discover the portal URL from the
+redirect → open it in a clean browser profile (no DoH, no stale HSTS/cookies) →
+click the button → restore DoT. Reversible; tested to restore cleanly.
+
+#+begin_src sh
+#!/bin/sh
+# hotel-wifi        disable DoT -> find the portal login URL -> open it
+# hotel-wifi off    restore normal encrypted DNS (run once online)
+conf=/etc/systemd/resolved.conf.d/dns-over-tls.conf
+if [ "${1:-on}" = "off" ]; then
+  [ -f "$conf.captive-disabled" ] && sudo mv "$conf.captive-disabled" "$conf"
+  sudo systemctl restart systemd-resolved
+  echo "Encrypted DNS (DoT) restored."; exit 0
+fi
+[ -f "$conf" ] && sudo mv "$conf" "$conf.captive-disabled"
+sudo systemctl restart systemd-resolved; sleep 1
+resolvectl flush-caches 2>/dev/null || true
+portal=""
+for t in http://captive.apple.com/hotspot-detect.html http://neverssl.com \
+         http://detectportal.firefox.com/canonical.html; do
+  loc=$(curl -sS -m 6 -o /dev/null -w '%{redirect_url}' "$t" 2>/dev/null)
+  [ -n "$loc" ] && { portal="$loc"; break; }
+  url=$(curl -sS -m 6 "$t" 2>/dev/null | grep -ioE 'https?://[^"'"'"' >]+' \
+        | grep -ivE 'apple\.com|neverssl|firefox|w3\.org|gstatic' | head -1)
+  [ -n "$url" ] && { portal="$url"; break; }
+done
+prof=$(mktemp -d)
+setsid -f google-chrome-stable --user-data-dir="$prof" "${portal:-http://neverssl.com}" >/dev/null 2>&1
+echo "Click the login button. When online: hotel-wifi off"
+#+end_src
+
+* Baking it into the net panel (the task)
+
+- The net engine already diagnoses captive / no-internet. When it sees a held
+  portal, the panel should offer a first-class *"Log in to this network"*
+  action that runs the plain-DNS + clean-browser flow above, reversibly, and
+  auto-restores DoT when connectivity returns (or on a timeout).
+- Reconcile with the existing =net portal= command and the =captive= helper —
+  they assumed a DNS-hijack-to-gateway model that did NOT match this portal
+  (gateway served no web; DNS was held, not hijacked-to-portal). The plain-DNS
+  approach is the one that worked; make it the engine's portal path.
+- The DoT toggle must be safe and reversible (the =off= step). Consider a
+  per-connection or time-boxed DoT-off that can't strand encrypted DNS.
+- Surface the misleading-"Success" lesson: a whitelisted captive-check passing
+  is not "online" — gate on a real, non-whitelisted fetch.
+
+* Related fix that unblocked the panel (already shipped)
+
+The panel could never switch networks because =net up= placed =--wait= after the
+nmcli subcommand (it's a global option). Fixed in dotfiles 2432311; fake-nmcli
+now rejects the misplaced flag so it can't regress.
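
Reviewer sketch, not part of the commit: the doc's "gate on a real, non-whitelisted fetch" point could look roughly like the following. It assumes an HTTPS endpoint you control that returns a fixed token, which a walled garden should not be able to answer for without a certificate error. `PROBE_URL`, `PROBE_TOKEN`, `classify_probe`, and `probe_online` are all hypothetical names, and the URL is a placeholder.

```shell
#!/bin/sh
# Sketch only, not from the commit. PROBE_URL and PROBE_TOKEN are hypothetical:
# point them at an HTTPS endpoint you control that returns a fixed token.
PROBE_URL="${PROBE_URL:-https://probe.example.invalid/online}"
PROBE_TOKEN="${PROBE_TOKEN:-online-ok}"

# classify_probe STATUS BODY -> prints online | captive | offline
# Only a 200 carrying the expected token counts as online; any other HTTP
# answer is treated as a portal or walled garden responding in its place.
classify_probe() {
  status=$1
  body=$2
  case "$status" in
    000) echo offline ;;   # curl reports 000 when it got no HTTP response
    200)
      if [ "$body" = "$PROBE_TOKEN" ]; then echo online; else echo captive; fi ;;
    *) echo captive ;;
  esac
}

# probe_online: fetch PROBE_URL once and classify the result.
probe_online() {
  tmp=$(mktemp) || return 1
  status=$(curl -s -m 6 -o "$tmp" -w '%{http_code}' "$PROBE_URL" 2>/dev/null)
  body=$(cat "$tmp")
  rm -f "$tmp"
  classify_probe "${status:-000}" "$body"
}
```

A panel loop could then restore DoT only when `probe_online` prints `online`, rather than when a whitelisted captive-check endpoint answers "Success".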

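
Reviewer sketch, not part of the commit: on the "can't strand encrypted DNS" concern, one possible guard is a state check the panel runs at startup or on reconnect. This assumes the `.captive-disabled` rename convention used by the `hotel-wifi` script in the diff; `dot_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only; dot_state is a hypothetical helper, not an existing command.
# dot_state CONF -> prints enforced | stashed | absent
#   enforced: the DoT drop-in is in place
#   stashed:  it was renamed to CONF.captive-disabled and never restored
#   absent:   neither file exists (DoT was never configured via this drop-in)
dot_state() {
  if [ -f "$1" ]; then
    echo enforced
  elif [ -f "$1.captive-disabled" ]; then
    echo stashed
  else
    echo absent
  fi
}
```

Called as `dot_state /etc/systemd/resolved.conf.d/dns-over-tls.conf`, a result of `stashed` while the machine is otherwise online would be the cue to run `hotel-wifi off`.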