#+TITLE: ZFSBootMenu Implementation Plan
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2026-01-22

* Overview

Replace GRUB bootloader with ZFSBootMenu in the archzfs installation script.
ZFSBootMenu provides native ZFS snapshot booting, eliminates kernel/snapshot
version mismatch issues, and simplifies the boot architecture.

* Why ZFSBootMenu?

| Feature         | GRUB                              | ZFSBootMenu                  |
|-----------------+-----------------------------------+------------------------------|
| Snapshot boot   | Custom scripts (grub-zfs-snap)    | Native, built-in             |
| Kernel location | Separate /boot partition          | On ZFS with root             |
| Rollback safety | Can mismatch kernel/snapshot      | Kernel travels with snapshot |
| Boot menu       | Regenerate with grub-mkconfig     | Auto-discovers datasets      |
| EFI size needed | ~1GB (kernels + GRUB)             | ~64MB (single binary)        |
| Complexity      | High (scripts, hooks, generators) | Low (single binary + config) |

* Current Architecture (GRUB)

#+begin_example
EFI Partition (1GB, /boot):
├── EFI/
│   └── GRUB/
│       └── grubx64.efi
├── grub/
│   ├── grub.cfg
│   └── fonts/
├── vmlinuz-linux-lts
├── initramfs-linux-lts.img
└── initramfs-linux-lts-fallback.img

ZFS Pool:
└── zroot/
    └── ROOT/
        └── default (mountpoint=/)
#+end_example

* Target Architecture (ZFSBootMenu)

#+begin_example
EFI Partition (512MB, /efi):
└── EFI/
    └── ZBM/
        └── zfsbootmenu.efi

ZFS Pool:
└── zroot/
    └── ROOT/                 (org.zfsbootmenu:commandline set here)
        └── default           (mountpoint=/) (bootfs property points here)
            └── boot/         <-- regular directory, NOT a dataset!
                ├── vmlinuz-linux-lts
                ├── initramfs-linux-lts.img
                └── initramfs-linux-lts-fallback.img
#+end_example

*Key insight from research:* /boot is a regular directory inside ROOT/default,
NOT a separate ZFS dataset. This ensures:

1. Snapshots of ROOT/default include the matching kernel
2. Rolling back a snapshot also rolls back to the kernel that was installed at that time
3. ZFSBootMenu can find the kernel at the expected path

* Files to Modify

** custom/install-archzfs
Primary installation script - most changes here.

** build.sh
ISO build script - remove GRUB snapshot tooling.

** custom/grub-zfs-snap (DELETE)
No longer needed - ZFSBootMenu handles natively.

** custom/40_zfs_snapshots (DELETE)
GRUB generator - no longer needed.

** custom/zz-grub-zfs-snap.hook (DELETE)
Pacman hook for GRUB - no longer needed.

** custom/zfssnapshot
Update to remove grub-zfs-snap call (ZFSBootMenu auto-detects).

** custom/zfsrollback
Update to remove grub-zfs-snap call.

* Implementation Steps

** Step 1: Update partition_disks()

Location: custom/install-archzfs, lines 707-750

Changes:
- Reduce EFI partition from 1GB to 512MB
- Update comments to reflect new purpose

#+begin_src bash
# Change this line:
sgdisk -n 1:0:+1G -t 1:ef00 -c 1:"EFI" "$disk"

# To:
sgdisk -n 1:0:+512M -t 1:ef00 -c 1:"EFI" "$disk"
#+end_src

** Step 2: Update create_datasets()

Location: custom/install-archzfs, lines 817-859

*CRITICAL: DO NOT create a separate /boot dataset!*

From research (PandaScience, sandreas/zarch, ZFSBootMenu docs):
ZFSBootMenu expects /boot to be a *regular directory* inside the root dataset,
NOT a separate ZFS dataset. The kernels must live at the path /boot/* within
the root filesystem for ZFSBootMenu to find them.

Changes:
- Do NOT create a /boot dataset
- The /boot directory will be created automatically by pacstrap when installing the kernel
- This ensures snapshots of ROOT/default include the matching kernel

#+begin_src bash
# DO NOT ADD THIS - it's WRONG:
# zfs create -o mountpoint=/boot "$POOL_NAME/ROOT/default/boot"

# /boot is just a regular directory inside ROOT/default
# mkinitcpio puts kernel/initramfs there automatically
#+end_src

Note: With ZFSBootMenu, kernels live ON the root ZFS dataset (not EFI partition).
When you snapshot ROOT/default, the kernel is included in the snapshot.

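The no-separate-dataset rule from Step 2 can be checked mechanically. The following is a minimal sketch and not part of install-archzfs: =has_boot_dataset= is a hypothetical helper name, and it assumes its input is the tab-separated listing printed by =zfs list -H -o name,mountpoint= on the installed system.

#+begin_src bash
# Hypothetical helper (assumption, not an existing installer function).
# Reads "name<TAB>mountpoint" lines on stdin and succeeds (exit 0) if any
# dataset is mounted at /boot or below it, which Step 2 forbids.
has_boot_dataset() {
    awk -F'\t' '
        $2 == "/boot" || index($2, "/boot/") == 1 { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Intended use on the installed system (requires the zfs CLI):
#   if zfs list -H -o name,mountpoint | has_boot_dataset; then
#       echo "ERROR: /boot is a separate dataset" >&2
#   fi
#+end_src

Keeping the check as a filter on stdin means it can be exercised with canned listings, without a pool being present.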
** Step 3: Replace mount_efi()

Location: custom/install-archzfs, lines 861-867

Changes:
- Rename to mount_filesystems()
- Mount EFI at /efi instead of /boot
- /boot needs no separate mount: it is a directory inside the ZFS root dataset

#+begin_src bash
mount_filesystems() {
    step "Mounting Filesystems"

    # EFI partition - only holds ZFSBootMenu binary
    mkdir -p /mnt/efi
    mount "${EFI_PARTS[0]}" /mnt/efi
    info "EFI partition ${EFI_PARTS[0]} mounted at /mnt/efi"

    # /boot is a directory inside the ZFS root dataset (created by pacstrap)
    # No separate mount needed - it's part of the root filesystem
}
#+end_src

** Step 4: Update install_base()

Location: custom/install-archzfs, lines 869-920

Changes:
- Remove: grub, freetype2 (GRUB font support)
- Keep: efibootmgr (needed for EFI boot entries)

#+begin_src bash
# Remove these from pacstrap:
#   grub \
#   freetype2 \

# Keep efibootmgr
#+end_src

** Step 5: Update configure_system() fstab

Location: custom/install-archzfs, lines 926-929

Changes:
- Mount EFI at /efi instead of /boot

#+begin_src bash
# Change:
echo "UUID=$(blkid -s UUID -o value "${EFI_PARTS[0]}")  /boot  vfat  defaults,noatime  0 2"

# To:
echo "UUID=$(blkid -s UUID -o value "${EFI_PARTS[0]}")  /efi  vfat  defaults,noatime  0 2"
#+end_src

** Step 6: Update configure_initramfs()

Location: custom/install-archzfs, lines 1021-1098

Changes:
- Update preset to use /boot (now on ZFS)
- No changes to hooks - ZFS hook still needed

The preset file paths remain the same (/boot/vmlinuz-linux-lts, etc.) but
/boot is now on ZFS instead of EFI partition.

** Step 7: Replace configure_bootloader() with configure_zfsbootmenu()

Location: custom/install-archzfs, lines 1100-1164

Delete the entire GRUB function and replace with:

#+begin_src bash
configure_zfsbootmenu() {
    step "Configuring ZFSBootMenu"

    # Ensure hostid exists and get value
    # CRITICAL: Must be done BEFORE pool creation ideally, but we do it here too
    if [[ ! -f /etc/hostid ]]; then
        zgenhostid
    fi
    local host_id=$(hostid)

    # Copy hostid to installed system (ZFS uses this for pool ownership)
    cp /etc/hostid /mnt/etc/hostid

    # Create ZFSBootMenu directory on EFI
    mkdir -p /mnt/efi/EFI/ZBM

    # Download ZFSBootMenu release EFI binary
    # Using the bundled release which includes everything needed
    # (Alternative: build from AUR with generate-zbm, but this is simpler)
    info "Downloading ZFSBootMenu..."
    local zbm_url="https://get.zfsbootmenu.org/efi"
    if ! curl -fsSL -o /mnt/efi/EFI/ZBM/zfsbootmenu.efi "$zbm_url"; then
        error "Failed to download ZFSBootMenu"
    fi
    info "ZFSBootMenu binary installed."

    # Set kernel command line on the ROOT PARENT dataset
    # This allows inheritance to all boot environments (future-proofing)
    # ZFSBootMenu reads org.zfsbootmenu:commandline property
    local cmdline="rw loglevel=3"

    # Add any AMD GPU workarounds if needed (detect Strix Halo etc)
    # lspci prints the device class before the vendor, e.g.
    # "VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] ..."
    if lspci | grep -qiE '(vga|display).*(amd|advanced micro devices)'; then
        info "AMD GPU detected - adding workaround parameters"
        cmdline="$cmdline amdgpu.pg_mask=0 amdgpu.cwsr_enable=0"
    fi

    # Set on ROOT parent so all boot environments inherit it
    zfs set org.zfsbootmenu:commandline="$cmdline" "$POOL_NAME/ROOT"
    info "Kernel command line set on $POOL_NAME/ROOT (inherited by children)"

    # Set bootfs property - tells ZFSBootMenu which dataset to boot by default
    zpool set bootfs="$POOL_NAME/ROOT/default" "$POOL_NAME"
    info "Default boot filesystem set to $POOL_NAME/ROOT/default"

    # Create EFI boot entries for each disk
    # ZFSBootMenu EFI parameters (passed via --unicode):
    #   spl_hostid=0x...         - Required for pool import
    #   zbm.timeout=3            - Seconds before auto-boot (-1 = always show menu)
    #   zbm.prefer=POOLNAME      - Preferred pool to boot from
    #   zbm.import_policy=hostid - How to handle pool imports
    local zbm_cmdline="spl_hostid=0x${host_id} zbm.timeout=3 zbm.prefer=${POOL_NAME} zbm.import_policy=hostid"

    for i in "${!SELECTED_DISKS[@]}"; do
        local disk="${SELECTED_DISKS[$i]}"
        local label="ZFSBootMenu"
        if [[ ${#SELECTED_DISKS[@]} -gt 1 ]]; then
            label="ZFSBootMenu-disk$((i+1))"
        fi

        # Determine partition number (always 1 - first partition is EFI)
        local part_num=1

        info "Creating EFI boot entry: $label on $disk"
        efibootmgr --create \
            --disk "$disk" \
            --part "$part_num" \
            --label "$label" \
            --loader '\EFI\ZBM\zfsbootmenu.efi' \
            --unicode "$zbm_cmdline" \
            --quiet
    done

    # Get the boot entry number and set as first in boot order
    local bootnum=$(efibootmgr | grep "ZFSBootMenu" | head -1 | grep -oP 'Boot\K[0-9A-F]+')
    if [[ -n "$bootnum" ]]; then
        # Get current boot order, prepend our entry
        local current_order=$(efibootmgr | grep "BootOrder" | cut -d: -f2 | tr -d ' ')
        efibootmgr --bootorder "$bootnum,$current_order" --quiet
        info "ZFSBootMenu set as primary boot option"
    fi

    info "ZFSBootMenu configuration complete."
}
#+end_src

** Step 8: Delete configure_grub_zfs_snap()

Location: custom/install-archzfs, lines 1166-1184

Delete the entire function - ZFSBootMenu handles snapshot menus natively.

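One detail in Step 7 is worth guarding against: the BootOrder value read back from =efibootmgr= may already contain the newly created entry, in which case prepending =$bootnum= lists it twice. A minimal sketch of a de-duplicating helper follows. =prepend_boot_entry= is a hypothetical name, not an existing function in install-archzfs, and it only manipulates strings, so whether it is needed depends on how =efibootmgr --create= behaves on the target firmware.

#+begin_src bash
# Hypothetical helper (assumption): print a BootOrder string with entry $1
# first, followed by the entries of the comma-separated list $2 in their
# original order, skipping empty fields and any repeat of $1.
prepend_boot_entry() {
    local entry="$1" order="$2" result="$1" num
    local -a nums
    IFS=',' read -r -a nums <<< "$order"
    for num in "${nums[@]}"; do
        [[ -z "$num" || "$num" == "$entry" ]] && continue
        result+=",$num"
    done
    printf '%s\n' "$result"
}

# Possible use inside configure_zfsbootmenu():
#   efibootmgr --bootorder "$(prepend_boot_entry "$bootnum" "$current_order")" --quiet
#+end_src

Because the helper is pure string handling, it can be tested outside a UEFI environment.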
** Step 9: Update sync_efi_partitions()

Location: custom/install-archzfs, lines 1285-1315

Changes:
- Sync ZFSBootMenu binary instead of GRUB
- EFI boot entries for secondary disks are created in configure_zfsbootmenu() (Step 7), not here

#+begin_src bash
sync_efi_partitions() {
    [[ ${#EFI_PARTS[@]} -le 1 ]] && return

    step "Syncing EFI Partitions for Redundancy"

    local temp_mount="/mnt/efi_sync"

    for i in "${!EFI_PARTS[@]}"; do
        [[ $i -eq 0 ]] && continue

        local efi_part="${EFI_PARTS[$i]}"

        info "Syncing ZFSBootMenu to EFI partition $((i+1)): $efi_part"

        mkdir -p "$temp_mount"
        mount "$efi_part" "$temp_mount"

        # Copy ZFSBootMenu binary
        mkdir -p "$temp_mount/EFI/ZBM"
        cp /mnt/efi/EFI/ZBM/zfsbootmenu.efi "$temp_mount/EFI/ZBM/"

        umount "$temp_mount"
    done

    rmdir "$temp_mount" 2>/dev/null || true
    info "All EFI partitions synchronized."
}
#+end_src

** Step 10: Update cleanup()

Location: custom/install-archzfs, lines 1379-1393

Changes:
- Unmount /mnt/efi instead of /mnt/boot

#+begin_src bash
# Change:
umount /mnt/boot 2>/dev/null || true

# To:
umount /mnt/efi 2>/dev/null || true
#+end_src

** Step 11: Update print_summary()

Location: custom/install-archzfs, lines 1395-1424

Changes:
- Update bootloader references from GRUB to ZFSBootMenu
- Update useful commands section

#+begin_src bash
# Update the "ZFS Features" section:
echo "ZFS Features:"
echo "  - ZFSBootMenu: boot from any snapshot"
echo "  - Genesis snapshot: pristine post-install state"
echo "  - Pre-pacman snapshots for safe upgrades"
echo ""
echo "Boot Menu Keys (at ZFSBootMenu):"
echo "  Enter   - Boot selected environment"
echo "  Ctrl+E  - Edit kernel command line"
echo "  Ctrl+S  - Show snapshot selector"
echo "  Ctrl+R  - Recovery shell"
#+end_src

** Step 12: Update build.sh

Location: build.sh

Changes:
- Remove grub-zfs-snap file copies (lines ~375-380)
- Remove grub-zfs-snap permissions (line ~408)
- Keep bootloader configs for live ISO (still uses GRUB/syslinux)

#+begin_src bash
# DELETE these lines from build.sh:

# Copy grub-zfs-snap to ISO
cp custom/grub-zfs-snap profile/airootfs/usr/local/bin/grub-zfs-snap
mkdir -p profile/airootfs/usr/local/share/grub-zfs-snap
cp custom/40_zfs_snapshots profile/airootfs/usr/local/share/grub-zfs-snap/
cp custom/zz-grub-zfs-snap.hook profile/airootfs/usr/local/share/grub-zfs-snap/

# And from file_permissions:
["usr/local/bin/grub-zfs-snap"]="0:0:755"
#+end_src

** Step 13: Update zfssnapshot and zfsrollback

Location: custom/zfssnapshot, custom/zfsrollback

Changes:
- Remove calls to grub-zfs-snap
- ZFSBootMenu auto-detects snapshots, no regeneration needed

#+begin_src bash
# DELETE from zfssnapshot (around line 107):
grub-zfs-snap 2>/dev/null || true

# DELETE from zfsrollback (around line 177):
grub-zfs-snap 2>/dev/null || true
#+end_src

** Step 14: Delete GRUB-specific files

Files to delete from custom/:
- custom/grub-zfs-snap
- custom/40_zfs_snapshots
- custom/zz-grub-zfs-snap.hook

#+begin_src bash
rm custom/grub-zfs-snap
rm custom/40_zfs_snapshots
rm custom/zz-grub-zfs-snap.hook
#+end_src

** Step 15: Update main() function call order

Location: custom/install-archzfs, main() around line 1443

Changes:
- Replace configure_bootloader with configure_zfsbootmenu
- Remove configure_grub_zfs_snap call

#+begin_src bash
# Change this sequence:
configure_initramfs
configure_bootloader        # <- rename
configure_grub_zfs_snap     # <- delete
configure_zfs_services

# To:
configure_initramfs
configure_zfsbootmenu       # <- new function
configure_zfs_services
#+end_src

* Testing Plan

** Test Environment

- QEMU VM with UEFI firmware (OVMF)
- Multiple test scenarios for different disk configurations
- Existing test script: scripts/test-vm.sh

** Test 1: Single Disk Install

#+begin_src bash
# Start VM
./scripts/test-vm.sh

# In VM, run installer
install-archzfs

# Select single disk
# Complete installation
# Reboot
#+end_src

*Validation Points:*
- [ ] EFI partition is 512MB (not 1GB)
- [ ] /efi contains only EFI/ZBM/zfsbootmenu.efi
- [ ] /boot is a directory (NOT a dataset): =zfs list= should NOT show
  zroot/ROOT/default/boot
- [ ] Kernel files exist in /boot/ (=ls /boot/vmlinuz*=)
- [ ] ZFSBootMenu menu appears on boot
- [ ] Can boot into installed system
- [ ] After login: =zfs get org.zfsbootmenu:commandline zroot/ROOT= shows cmdline (set on parent)
- [ ] After login: =zpool get bootfs zroot= shows zroot/ROOT/default

** Test 2: Mirror Install (2 disks)

#+begin_src bash
# Create second virtual disk
qemu-img create -f qcow2 test-disk2.qcow2 50G

# Modify test-vm.sh to add second disk
# -drive file=test-disk2.qcow2,if=virtio

# Run installer, select both disks, choose mirror
#+end_src

*Validation Points:*
- [ ] Both disks have EFI partitions
- [ ] ZFSBootMenu binary exists on both EFI partitions
- [ ] EFI boot entries exist for both disks (efibootmgr -v)
- [ ] Can boot from either disk (test by removing first disk)
- [ ] ZFS pool shows mirror topology (zpool status)

** Test 3: RAIDZ1 Install (3 disks)

*Validation Points:*
- [ ] All three disks have EFI partitions with ZFSBootMenu
- [ ] Three EFI boot entries created
- [ ] ZFS pool shows raidz1 topology

** Test 4: Snapshot Boot

#+begin_src bash
# After installation and first boot:

# Create a test file
echo "original" > /root/test.txt

# Create a snapshot
zfs snapshot zroot/ROOT/default@test-snap

# Modify the file
echo "modified" > /root/test.txt

# Reboot, at ZFSBootMenu press Ctrl+S
# Select zroot/ROOT/default@test-snap
# Boot from snapshot
#+end_src

*Validation Points:*
- [ ] ZFSBootMenu shows snapshot selector with Ctrl+S
- [ ] Snapshot appears in list
- [ ] Booting from snapshot shows original file content
- [ ] Kernel version matches (no mismatch errors)

** Test 5: Kernel Update Scenario

#+begin_src bash
# Simulate kernel update (or actually do one)
pacman -Syu

# Reboot
#+end_src

*Validation Points:*
- [ ] New kernel is on ZFS (included in future snapshots)
- [ ] ZFSBootMenu detects and boots new kernel
- [ ] No manual regeneration needed (unlike GRUB)

** Test 6: Recovery Shell

#+begin_src bash
# At ZFSBootMenu,
# press Ctrl+R
#+end_src

*Validation Points:*
- [ ] Recovery shell accessible
- [ ] ZFS pool is importable from recovery
- [ ] Can manually mount and chroot if needed

** Test 7: Encrypted Pool

#+begin_src bash
# Run installer with encryption enabled
# Enter passphrase when prompted
#+end_src

*Validation Points:*
- [ ] ZFSBootMenu prompts for passphrase
- [ ] Pool unlocks successfully
- [ ] System boots normally after passphrase entry

* Validation Checklist (All Tests)

** Pre-Installation
- [ ] Live ISO boots successfully
- [ ] ZFS module loads (lsmod | grep zfs)

** Partitioning
- [ ] EFI partition is 512MB
- [ ] ZFS partition uses remaining space
- [ ] Partition table is GPT

** Filesystem Layout
- [ ] /efi is vfat, mounted from EFI partition
- [ ] /boot is a directory inside ROOT/default (NOT a separate dataset)
- [ ] Verify: =zfs list= should NOT show a zroot/ROOT/default/boot dataset
- [ ] Kernel/initramfs exist in /boot/ (on the ZFS root filesystem)

** ZFSBootMenu
- [ ] zfsbootmenu.efi exists at /efi/EFI/ZBM/
- [ ] EFI boot entry points to ZFSBootMenu
- [ ] org.zfsbootmenu:commandline property set on zroot/ROOT (parent of the boot environments)
- [ ] hostid included in the EFI boot entry arguments (spl_hostid=0x..., as set in Step 7)

** Boot Process
- [ ] ZFSBootMenu menu appears
- [ ] Countdown timer works
- [ ] Default boot entry is correct
- [ ] Boot completes successfully
- [ ] All ZFS datasets mount correctly

** Multi-Disk (if applicable)
- [ ] All EFI partitions contain zfsbootmenu.efi
- [ ] All disks have EFI boot entries
- [ ] Can boot from any disk

** Snapshots
- [ ] Genesis snapshot created
- [ ] Ctrl+S shows snapshot selector
- [ ] Can boot from snapshot
- [ ] Snapshot includes matching kernel

** Services
- [ ] zfs-import-scan.service enabled
- [ ] zfs-mount.service enabled
- [ ] NetworkManager starts
- [ ] SSH accessible (if enabled)

* Rollback Plan

If ZFSBootMenu implementation fails:

1. Keep GRUB version in a git branch before changes
2. ISO still boots with GRUB (live environment unchanged)
3. Can install GRUB manually from live environment:

#+begin_src bash
# Assumes the EFI partition is mounted at /mnt/boot, as in the original GRUB layout
pacstrap /mnt grub
arch-chroot /mnt grub-install --target=x86_64-efi --efi-directory=/boot
arch-chroot /mnt grub-mkconfig -o /boot/grub/grub.cfg
#+end_src

* Research Findings (from comparable projects)

This plan incorporates best practices from these open-source Arch+ZFS installers:

** eoli3n/archiso-zfs + arch-config
- ZFSBootMenu built from source with =generate-zbm= (we use pre-built binary for simplicity)
- Uses file-based encryption key (=/etc/zfs/zroot.key=) embedded in initramfs to avoid double passphrase prompt
- Sets =org.zfsbootmenu:commandline= on ROOT parent for inheritance to all boot environments
- Minimal dataset layout: ROOT, data/home

** PandaScience/arch-on-zfs
- Uses rEFInd (not ZFSBootMenu), but documents the "smart dataset layout":
  - =system/= for root, =userdata/= for home, =nobackup/= for cache/tmp
- Emphasizes =canmount=noauto= on root dataset (we already do this)
- Recommends =/boot= as directory inside root, NOT separate dataset

** sandreas/zarch
- Downloads ZFSBootMenu binary from =https://get.zfsbootmenu.org/latest.EFI= (approach adopted; Step 7 downloads from =https://get.zfsbootmenu.org/efi=)
- Uses efibootmgr =--unicode= parameter for ZFSBootMenu cmdline (adopted)
- ZFSBootMenu parameters: =spl_hostid=, =zbm.timeout=, =zbm.prefer=, =zbm.import_policy= (adopted)
- Uses zrepl for time-based snapshots (we use pacman hooks - complementary approach)

** danboid/ALEZ
- Two-pool design (bpool + rpool) for GRUB compatibility - NOT needed with ZFSBootMenu
- systemd-boot ZFS entry uses =zfs=POOL/ROOT/default= parameter
- Pool export/reimport pattern for clean state

** danfossi/Arch-ZFS-Root-Installation-Script
- Uses =compatibility=grub2= pool option for GRUB - NOT needed with ZFSBootMenu
- Good partition suffix helper for nvme/mmcblk naming (we already have this)
- Separate bpool for boot - NOT needed with ZFSBootMenu

** Key Lessons Adopted

1. *DO NOT create separate /boot dataset* - must be directory inside root
2. *Set commandline on ROOT parent* - inherited by all boot environments
3. *Use =--unicode= for ZFSBootMenu parameters* - spl_hostid, zbm.timeout, zbm.prefer
4. *Download pre-built EFI binary* - simpler than building from AUR
5. *Copy hostid to installed system* - required for pool import
6. *Set bootfs pool property* - tells ZFSBootMenu default boot target

** Optional Enhancement: File-Based Encryption Key

From eoli3n: To avoid entering passphrase twice (ZFSBootMenu + initramfs):

#+begin_src bash
# During pool creation, use keylocation=file instead of prompt
echo "$ZFS_PASSPHRASE" > /mnt/etc/zfs/zroot.key
chmod 000 /mnt/etc/zfs/zroot.key

# Add to mkinitcpio FILES
echo 'FILES+=(/etc/zfs/zroot.key)' >> /mnt/etc/mkinitcpio.conf.d/zfs-key.conf

# Change keylocation
zfs set keylocation=file:///etc/zfs/zroot.key zroot
#+end_src

Trade-off: Simpler UX (one passphrase) but key is in initramfs on ZFS.
*Current plan uses prompt-based* - user enters passphrase at ZFSBootMenu.

* References

** Official Documentation
- ZFSBootMenu Documentation: https://docs.zfsbootmenu.org/
- ZFSBootMenu GitHub: https://github.com/zbm-dev/zfsbootmenu
- ZFSBootMenu man page: https://docs.zfsbootmenu.org/en/latest/man/zfsbootmenu.7.html
- Arch Wiki ZFS: https://wiki.archlinux.org/title/ZFS

** Researched Projects
- eoli3n/archiso-zfs: https://github.com/eoli3n/archiso-zfs
- eoli3n/arch-config: https://github.com/eoli3n/arch-config
- PandaScience/arch-on-zfs: https://github.com/PandaScience/arch-on-zfs
- sandreas/zarch: https://github.com/sandreas/zarch
- danboid/ALEZ: https://github.com/danboid/ALEZ
- danfossi/Arch-ZFS-Root-Installation-Script: https://github.com/danfossi/Arch-ZFS-Root-Installation-Script

** Guides
- Florian Esser's ZFSBootMenu Guide: https://florianesser.ch/posts/20220714-arch-install-zbm/
- Arch Wiki ZFSBootMenu: https://wiki.archlinux.org/title/User:Kayvlim/Install_UEFI_and_BIOS_compatible_Arch_Linux_with_Encrypted_ZFS_and_ZFSBootMenu

* Implementation Order

1. Create git branch: =git checkout -b zfsbootmenu=
2. Delete GRUB files (Step 14)
3. Update build.sh (Step 12)
4. Update install-archzfs (Steps 1-11, 15)
5. Update helper scripts (Step 13)
6. Build new ISO
7. Run Test 1 (single disk)
8. Fix any issues
9. Run Tests 2-7
10. Merge to main when all tests pass
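
Before building the new ISO (item 6 above), it may help to confirm that items 2-5 left no stray references to the deleted GRUB tooling. The following is a minimal sketch under that assumption. =find_grub_leftovers= is a hypothetical helper, not an existing script in the repository, and it searches only for the string =grub-zfs-snap=, since the live ISO still legitimately uses GRUB.

#+begin_src bash
# Hypothetical pre-build check (assumption, not part of the repository).
# Prints every file under the given paths that still mentions grub-zfs-snap
# and returns 1 if any were found, 0 otherwise.
find_grub_leftovers() {
    local matches
    matches=$(grep -rl -- 'grub-zfs-snap' "$@" 2>/dev/null || true)
    if [[ -n "$matches" ]]; then
        printf '%s\n' "$matches"
        return 1
    fi
    return 0
}

# Possible use from the repository root:
#   find_grub_leftovers custom build.sh || echo "GRUB snapshot references remain" >&2
#+end_src

A non-zero return lists the files to revisit under Steps 12-14.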