-rw-r--r-- docs/design/2026-06-29-zfs-pre-snapshot-installer.org 87
-rw-r--r-- todo.org 77
2 files changed, 147 insertions, 17 deletions
diff --git a/docs/design/2026-06-29-zfs-pre-snapshot-installer.org b/docs/design/2026-06-29-zfs-pre-snapshot-installer.org
new file mode 100644
index 0000000..413bfa5
--- /dev/null
+++ b/docs/design/2026-06-29-zfs-pre-snapshot-installer.org
@@ -0,0 +1,87 @@
+#+TITLE: ZFS pre-pacman snapshot installer step (durable retention)
+#+DATE: 2026-06-29
+#+SOURCE: handoff from the home project, 2026-06-29
+
+* Problem
+
+A pacman =PreTransaction= hook snapshots =zroot/ROOT/default@pre-pacman_<ts>=
+before every transaction, but nothing prunes them. Sanoid doesn't manage them
+(they aren't =autosnap_= names), so they accumulated to 53 on velox between
+April and the 2026-06-29 health check. Left unpruned, they eventually fill the pool.
+
+* What's actually on velox vs. archsetup
+
+The live =/usr/local/bin/zfs-pre-snapshot= is *not* authored by archsetup —
+=git grep= for its content (=MIN_INTERVAL=, the pre-pacman =LOCKFILE= logic)
+finds nothing tracked. The =PreTransaction= hooks in the archsetup monolith
+(~lines 910, 1907, 1942) are the live-update guard, a different hook. The
+script appears hand-placed on velox.
+
+The 2026-01-17 security doc line "ZFS pre-pacman snapshots (already in
+install-archzfs)" is therefore *out of date* — archsetup does not install this.
+Incorporating the fix is a NET-NEW installer step, not a patch to an existing
+one. Correct that stale doc line as part of the work.
+
+velox was patched live (pruned to 10, script replaced with the self-pruning
+version below); live backup at =/usr/local/bin/zfs-pre-snapshot.bak-2026-06-29=.
+
+* Proposed installer step
+
+In the archzfs / ZFS-on-root install path, gated to ZFS-root installs (velox is
+the only ZFS daily driver; ratio is btrfs), install:
+
+1. =/etc/pacman.d/hooks/zfs-snapshot.hook=: the =PreTransaction= hook that
+   runs the script. *Not included in the handoff*; source it from velox
+   (=/etc/pacman.d/hooks/zfs-snapshot.hook=) or write it.
+2. =/usr/local/bin/zfs-pre-snapshot=: the =KEEP=10= self-pruning version
+   below.
+
+Tests live in archsetup, so this wants an archsetup session and a ZFS-root VM
+test (=make test FS_PROFILE=zfs=), not a cross-project edit from home.
+
+* The script (KEEP=10 self-pruning version)
+
+#+begin_src bash
+#!/bin/bash
+POOL="zroot"
+DATASET="$POOL/ROOT/default"
+LOCKFILE="/tmp/.zfs-pre-snapshot.lock"
+MIN_INTERVAL=60
+KEEP=10 # how many pre-pacman snapshots to retain (rollback safety for recent transactions)
+
+# Skip if a snapshot was created within the last $MIN_INTERVAL seconds
+if [ -f "$LOCKFILE" ]; then
+    last=$(stat -c %Y "$LOCKFILE" 2>/dev/null || echo 0)
+    now=$(date +%s)
+    if (( now - last < MIN_INTERVAL )); then
+        exit 0
+    fi
+fi
+
+TIMESTAMP=$(date +%Y-%m-%d_%H-%M-%S)
+SNAPSHOT_NAME="pre-pacman_$TIMESTAMP"
+
+if zfs snapshot "$DATASET@$SNAPSHOT_NAME"; then
+    echo "Created snapshot: $DATASET@$SNAPSHOT_NAME"
+    touch "$LOCKFILE"
+
+    # Retention: keep only the most recent $KEEP pre-pacman snapshots, destroy older ones.
+    # Sanoid does not manage these (they aren't autosnap_), so prune them here at creation time.
+    zfs list -H -o name -t snapshot -s creation "$DATASET" 2>/dev/null \
+        | grep '@pre-pacman_' \
+        | head -n -"$KEEP" \
+        | while read -r old; do
+            zfs destroy "$old" && echo "Pruned old snapshot: $old"
+        done
+else
+    echo "Warning: Failed to create snapshot" >&2
+fi
+#+end_src
+
+* Open items before implementation
+
+- Source or write =/etc/pacman.d/hooks/zfs-snapshot.hook= (the trigger).
+- Decide the exact insertion point in the ZFS-root install path.
+- Add a ZFS-root VM test asserting the hook + script land and the script
+  self-prunes past =KEEP=.
+- Correct the stale 2026-01-17 security-doc line.
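
Review note on the retention pipeline in the script above: the selection logic (filter to =@pre-pacman_=, then drop the newest =KEEP= with =head -n -KEEP=) can be exercised without a pool. The sketch below is only an illustration, not part of the commit. =prune_candidates= and =fake_snapshots= are hypothetical names, the snapshot list is made up, and it assumes GNU =head= (negative line counts are a coreutils extension).

```bash
#!/bin/bash
# Hypothetical helper: reads snapshot names (oldest first) on stdin and
# prints the ones the script would destroy, i.e. every pre-pacman
# snapshot except the newest $1.
prune_candidates() {
    local keep="$1"
    grep '@pre-pacman_' | head -n -"$keep"
}

# Made-up data standing in for `zfs list -H -o name -t snapshot -s creation`.
# The autosnap_ entry shows that Sanoid-managed names are left alone.
fake_snapshots() {
    printf '%s\n' \
        'zroot/ROOT/default@pre-pacman_2026-06-01_10-00-00' \
        'zroot/ROOT/default@autosnap_2026-06-01_daily' \
        'zroot/ROOT/default@pre-pacman_2026-06-02_10-00-00' \
        'zroot/ROOT/default@pre-pacman_2026-06-03_10-00-00' \
        'zroot/ROOT/default@pre-pacman_2026-06-04_10-00-00'
}

# With KEEP=2, the two oldest pre-pacman snapshots are selected for pruning.
fake_snapshots | prune_candidates 2
```

When fewer than =KEEP= pre-pacman snapshots exist, =head -n -KEEP= prints nothing, so the =while read= loop in the real script never runs =zfs destroy=. A VM test for the open item above could assert the same property against a real dataset.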
diff --git a/todo.org b/todo.org
index 2e3d97f..98c6ed3 100644
--- a/todo.org
+++ b/todo.org
@@ -21,6 +21,26 @@ The vocabulary is open — topic tags are coined as needed — so these are conv
- *Effort / autonomy*: =:quick:= a spare-moment fix (minutes, not a sitting); =:solo:= Claude can carry it end to end — there's a build path, a test path, and no upfront decision needed (a leftover manual spot-check doesn't disqualify it).
- *Topic / area* (open): the subsystem a task touches — e.g. =:hyprland:= =:waybar:= =:mpd:= =:music:= =:network:= =:tooling:= =:llm:= =:eask:= =:pocketbook:= =:cmail:=. Coin a new one when it aids filtering.
* Archsetup Open Work
+** TODO [#B] ZFS pre-pacman snapshot installer step (ZFS-root) :feature:zfs:
+Add a ZFS-root-gated installer step that installs the pre-pacman snapshot pacman hook plus a self-pruning =/usr/local/bin/zfs-pre-snapshot= (KEEP=10). The script is hand-placed on velox, not authored by archsetup, so a reinstall loses it; snapshots accumulated unbounded (53 since April) because nothing prunes them and Sanoid ignores non-autosnap_ names. Gate to ZFS-root (velox; ratio is btrfs). Also correct the stale 2026-01-17 security-doc line claiming it's "already in install-archzfs". Needs the hook file (source from velox) and a ZFS-root VM test.
+
+Design notes and the KEEP=10 script: [[file:docs/design/2026-06-29-zfs-pre-snapshot-installer.org]]. Origin: home handoff 2026-06-29.
+
+** TODO [#B] Consistent red=off across waybar toggle modules :waybar:
+Extend the red=off convention (just added to the touchpad/mouse indicator) to the other toggles — sound volume, microphone mute, and caffeine — so a disabled / muted / off state reads red across the board. Skip the "cross"/slash; the color alone carries it. Origin: roam inbox capture.
+
+** TODO [#B] Microphone-mute keybind :feature:waybar:quick:
+A keyboard shortcut to toggle the mic mute. The pulseaudio#mic module shows the state but there's no hotkey to flip it. Wire a hyprland bind to a mic-mute toggle. Origin: roam inbox capture.
+
+** TODO [#B] File-manager swallow pattern :feature:hyprland:
+When the file manager launches another app, it should hide to a special workspace (the "swallow" pattern) and return when that process ends, rather than vanishing. Today it disappears with no signal of whether it's coming back, so the user can't tell success from failure — they should quit explicitly instead. Origin: roam inbox capture.
+
+** TODO [#C] Keybind hints in waybar module tooltips :waybar:
+Every module's hover tooltip should list its keyboard shortcut(s), for discoverability. Audit the modules and add the bindings to each tooltip. Origin: roam inbox capture.
+
+** TODO [#C] Smooth waybar expansion animation :waybar:
+The cluster expansion jumps instead of animating, and a few systray icons pop in one-by-one afterward, which reads as glitchy. Animate the expansion smoothly if waybar allows it — width transitions are limited, so feasibility is uncertain (hence [#C]). Origin: roam inbox capture.
+
** TODO [#B] Scrolling/Carousel layout: frame fit + wrap-around :hyprland:
:PROPERTIES:
:LAST_REVIEWED: 2026-06-13
@@ -154,15 +174,30 @@ green (32 suites). Live-verified on velox: panel opens/toggles, list shows real
profiles, right-click notification delivers (Craig confirmed). Phase 3 (diagnose/repair/
speedtest IN the panel) is next; the engine for it already exists from Phase 1.
-*** TODO Phase 3 — diagnostics + speed test in the panel :network:
-Deliverable: wire =net diagnose= / =net repair= / =net doctor= / =net portal= /
-=net speedtest= into the Diagnose (read-only) vs Repair (mutating, confirmed)
-sections; "Get me online" with live escalation reporting; portal Open button;
-speedtest (=speedtest-go --json=) progress + cancel; failure-mode → exact-string
-rendering across surfaces.
-Tests: diagnose read-only; each repair tier confirms + verifies cleanup (DNS
-override reverts → cleanup_verified, else cleanup-unverified); speedtest parse from
-fixture JSON + fixture stderr failure messages.
+*** 2026-06-29 Mon @ 22:43:40 -0400 Phase 3 shipped — diagnostics + speed test in the panel
+Shipped to dotfiles (=91277cf=..=691abcb=) + archsetup (=48052d6=, speedtest-go-bin),
+pushed. Engine: =net speedtest= (parses speedtest-go --json → ping from latency ns,
+down/up from per-server byte rates; missing-backend / offline / malformed → error
+envelope per the failure table). Panel grew a section switcher with four pages:
+- Connections (Phase 2).
+- Diagnose: =net diagnose= on a worker thread, each step a row (✓/✗/… glyph + title +
+ redacted evidence), read-only; Open-portal button when captive.
+- Repair: "Get me online" (=net doctor --fix=) + tiers (rfkill/reset/bounce/dns-test)
+ + force portal. Confirmations in-panel with the spec's exact wording; the privileged
+ tiers run via =net-popup= terminal (where the sudo prompt + step output, incl.
+ cleanup-verified, show) — a panel has no tty, and pkexec would mean a prompt per op.
+- Speed test: in-process =net speedtest= (no privilege → inline result: ↓/↑ Mbps + ping
+ + server), Run/Cancel (Cancel pkills the child), error envelope shown.
+
+213 net tests; pure helpers (step_indicator, format_speedtest) unit-tested. Full
+dotfiles suite green (32 suites). One unverified assumption: speedtest-go's dl/ul unit
+(taken as bytes/s; =BYTES_PER_SEC= flips it) — needs one real run vs a reference. The
+in-panel repair streaming (vs terminal) is a named future polish once the GUI-privilege
+story settles.
+
+The waybar network module ([#B] parent) is now COMPLETE through Phase 3. Phase 4
+(in-app help + user guide) and Phase 5 (VPN/WireGuard) remain as future work; the core
+feature (indicator + recovery + panel + diagnostics + speed test) is done.
Verify (manual, live): see Manual testing and validation.
*** TODO Phase 4 — docs + rollout :network:
@@ -822,17 +857,25 @@ rfkill list wifi # confirm Soft blocked: yes
make -C ~/.dotfiles online # or: net doctor --fix
#+end_src
- Expected: doctor reports the rfkill block, runs =rfkill unblock wifi= + =nmcli radio wifi on=, reconnects, and ends "online" — all from the TTY.
-*** Network module Phase 1 — bar clicks + airplane keybind
-What we're verifying: the custom/net clicks do the useful thing and airplane is a deliberate keybind, not a misclickable foot-gun (it disconnects you). Clicks (revised after live use 2026-06-29): left = =net doctor --notify= (desktop notification, no terminal — diagnose is read-only), middle = nmtui scratchpad, right = =net portal= in a floating terminal (=net-popup=, so the sudo prompt + browser work). Airplane = Super+Shift+A. The sudo fix path (=net doctor --fix=) stays =make online= in a terminal.
-- Left-click =custom/net= while online.
-- Expected: a desktop notification "Network / Online" (success), nothing changed. When offline it notifies the problem + next action.
-- Right-click =custom/net= on a captive network (or at a hotel).
-- Expected: =net portal= runs in the floating terminal — reset + opens the portal page to log in.
-- Middle-click =custom/net=.
-- Expected: the nmtui scratchpad (manual connection manager).
+*** Network module — bar clicks + airplane keybind (FINAL scheme)
+What we're verifying: the custom/net clicks and the airplane keybind. Clicks (settled with Craig over live use 2026-06-29): left = =net-panel= toggle (the GTK panel), middle = =net portal= (floating terminal), right = =net-fix= (notify the doctor result when one-way; open a terminal only when fixable). Airplane = Super+Shift+A.
+- Left-click =custom/net=.
+- Expected: the GTK connection panel toggles open (left-click again, or Esc, closes it).
+- Right-click =custom/net= while online.
+- Expected: a desktop notification "Network / Online" (success), no terminal. When a repair is needed it instead opens a terminal running =net doctor --fix=. (Craig confirmed the notification delivers, 2026-06-29.)
+- Middle-click =custom/net= on a captive network.
+- Expected: =net portal= runs in the floating terminal — reset + opens the portal page.
- Press Super+Shift+A.
- Expected: airplane engages (wifi off, dim, low-power); =custom/net= shows the airplane glyph in gold. Super+Shift+A again restores everything.
- Check =airplane-mode= is still present (=ls ~/.local/bin/airplane-mode=), and =waybar-airplane= / =waybar-netspeed= / =custom/airplane= are gone.
+*** Network module Phase 3 — panel Diagnose / Repair / Speed test tabs
+What we're verifying: the four-tab panel works end to end. Left-click =custom/net= to open it.
+- Diagnose tab → "Run diagnose".
+- Expected: a list of steps (link, DHCP, gateway, DNS config, DNS resolution, internet) each with a ✓/✗/… glyph + evidence; on a captive network an "Open portal" button appears.
+- Repair tab → click Reset (or Bounce, or DNS override test).
+- Expected: a confirmation dialog with the exact wording (Reset names the network + new-MAC warning; Bounce "links drop briefly"; DNS test "reverts automatically"). Proceed opens a floating terminal that runs the repair (sudo prompt there) and shows the step output incl. cleanup-verified for the DNS test.
+- Speed test tab → "Run speed test" (uses ~30s + data — do it on real wifi, not the metered hotspot).
+- Expected: ↓/↑ Mbps + ping + server shown inline. CONFIRM THE NUMBERS are sane vs a reference (fast.com) — this verifies the byte-rate→Mbps unit. If off by ~8x, the =BYTES_PER_SEC= constant in =net/src/net/speedtest.py= flips.
** DOING [#B] Prepare for GitHub open-source release
:PROPERTIES: