author Craig Jennings <c@cjennings.net> 2026-06-30 10:59:08 -0400
committer Craig Jennings <c@cjennings.net> 2026-06-30 10:59:08 -0400
commit fdbaa52b4e308be6c809e98a785c3723273835f9
tree 7f9b77891d1db5a4015f69ce24c8bb2e19d665d7 /docs
parent 6bd832897813c730deb12768d1eb5b02af66ad20
docs: capture captive-portal login learnings + close the ZFS task
File the captive-portal-login design doc from the 2026-06-30 Hyatt saga — the actual mechanism (system DoT + browser DoH both bypass the hotel's redirecting DNS; plain DNS is what works), the working hotel-wifi script, and the plan to make it a first-class net-panel action — plus a [#B] feature task to bake it in. Also close the ZFS pre-pacman snapshot task: the installer step shipped and the ZFS VM install passed 97/0 with the new hook assertion.
Diffstat (limited to 'docs')
-rw-r--r-- docs/design/2026-06-30-captive-portal-login.org | 89
1 files changed, 89 insertions, 0 deletions
diff --git a/docs/design/2026-06-30-captive-portal-login.org b/docs/design/2026-06-30-captive-portal-login.org
new file mode 100644
index 0000000..1739689
--- /dev/null
+++ b/docs/design/2026-06-30-captive-portal-login.org
@@ -0,0 +1,89 @@
+#+TITLE: Captive-portal login — learnings + baking it into the net panel
+#+DATE: 2026-06-30
+#+SOURCE: the 2026-06-30 Hyatt wifi saga (velox)
+
+* Why this exists
+
+On a locked-down-DNS laptop, captive portals never show their login page, even
+though phones get on fine. We spent hours on a Hyatt portal before finding the
+mechanism; this captures it so the fix becomes a panel feature instead of a
+one-off script.
+
+* The mechanism (what actually blocks the login)
+
+A redirect portal works by *DNS hijack*: you query a name, the hotel's resolver
+hands back the portal, you get the login page. Two things on velox stop that:
+
+- *System resolver forces DNS-over-TLS.* =/etc/systemd/resolved.conf.d/dns-over-tls.conf=
+ hardcodes =DNS=1.1.1.1#... 9.9.9.9#...= with =DNSOverTLS=yes=. The system never
+ queries the hotel's resolver at all. The hotel blocks 853 (DoT) and external
+ 53, so system DNS is simply dead on the portal — only 443 (DoH) gets out.
+- *Browser DoH.* Chrome "secure DNS" on bypasses the hotel DNS too, so the
+ browser never gets redirected either.
+
+A phone works because it uses *plain DNS* from the hotel plus a built-in
+captive-portal detector that pops up the login page. The laptop has neither.
+
+Confirmed facts from the saga:
+- Front desk: it's a normal redirect-to-login portal. Phone: connects fine.
+- No DHCP option 114 (RFC 8910) — the portal doesn't advertise its URL. But the
+ URL is recoverable from the HTTP 302 once you're on plain DNS.
+- The walled garden whitelists OS captive-detection endpoints
+ (=captive.apple.com= returns "Success") — a *misleading* signal, not real
+ internet. Don't trust it.
+- 443/DoH egress works broadly on the portal; only port-53 DNS is held. So
+ "system DNS fails" never means "no internet" here.
+
+* The working fix (=~/.local/bin/hotel-wifi=, to be folded in)
+
+Temporarily disable DoT → plain hotel DNS → discover the portal URL from the
+redirect → open it in a clean browser profile (no DoH, no stale HSTS/cookies) →
+click the button → restore DoT. Reversible; tested to restore cleanly.
+
+#+begin_src sh
+#!/bin/sh
+# hotel-wifi disable DoT -> find the portal login URL -> open it
+# hotel-wifi off restore normal encrypted DNS (run once online)
+conf=/etc/systemd/resolved.conf.d/dns-over-tls.conf
+if [ "${1:-on}" = "off" ]; then
+ [ -f "$conf.captive-disabled" ] && sudo mv "$conf.captive-disabled" "$conf"
+ sudo systemctl restart systemd-resolved
+ echo "Encrypted DNS (DoT) restored."; exit 0
+fi
+[ -f "$conf" ] && sudo mv "$conf" "$conf.captive-disabled"
+sudo systemctl restart systemd-resolved; sleep 1
+resolvectl flush-caches 2>/dev/null || true
+portal=""
+for t in http://captive.apple.com/hotspot-detect.html http://neverssl.com \
+ http://detectportal.firefox.com/canonical.html; do
+ loc=$(curl -sS -m 6 -o /dev/null -w '%{redirect_url}' "$t" 2>/dev/null)
+ [ -n "$loc" ] && { portal="$loc"; break; }
+ url=$(curl -sS -m 6 "$t" 2>/dev/null | grep -ioE 'https?://[^"'"'"' >]+' \
+ | grep -ivE 'apple\.com|neverssl|firefox|w3\.org|gstatic' | head -1)
+ [ -n "$url" ] && { portal="$url"; break; }
+done
+prof=$(mktemp -d)
+setsid -f google-chrome-stable --user-data-dir="$prof" "${portal:-http://neverssl.com}" >/dev/null 2>&1
+echo "Click the login button. When online: hotel-wifi off"
+#+end_src
+
+* Baking it into the net panel (the task)
+
+- The net engine already diagnoses captive / no-internet. When it sees a held
+ portal, the panel should offer a first-class *"Log in to this network"*
+ action that runs the plain-DNS + clean-browser flow above, reversibly, and
+ auto-restores DoT when connectivity returns (or on a timeout).
+- Reconcile with the existing =net portal= command and the =captive= helper —
+ they assumed a DNS-hijack-to-gateway model that did NOT match this portal
+ (gateway served no web; DNS was held, not hijacked-to-portal). The plain-DNS
+ approach is the one that worked; make it the engine's portal path.
+- The DoT toggle must be safe and reversible (the =off= step). Consider a
+ per-connection or time-boxed DoT-off that can't strand encrypted DNS.
+- Surface the misleading-"Success" lesson: a whitelisted captive-check passing
+ is not "online" — gate on a real, non-whitelisted fetch.
+
+* Related fix that unblocked the panel (already shipped)
+
+The panel could never switch networks because =net up= placed =--wait= after the
+nmcli subcommand; it is a global option and must come before it. Fixed in
+dotfiles 2432311; fake-nmcli now rejects the misplaced flag so it can't regress.