| author | Craig Jennings <c@cjennings.net> | 2026-02-22 23:23:59 -0600 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-02-22 23:23:59 -0600 |
| commit | 8b9c2cc247f8d71d570921cb127e7e08cfac7674 (patch) | |
| tree | 0b0e00f580f2130cad98e2f2a9e2e7175116a7c0 | |
| parent | 26831607be9ae3bbc542bfc352dab718a486cab2 (diff) | |
| download | archangel-8b9c2cc247f8d71d570921cb127e7e08cfac7674.tar.gz archangel-8b9c2cc247f8d71d570921cb127e7e08cfac7674.zip | |
clean personal info and private files from repository
- Remove personal hardware specs, machine-specific troubleshooting docs,
and video transcript from assets/
- Remove stale PLAN-zfsbootmenu-implementation.org (feature complete)
- Remove .stignore (Syncthing config, not project-relevant)
- Untrack todo.org (personal task tracker with private infra details)
- Make archsetup path configurable via ARCHSETUP_DIR env var in build.sh
- Use $REAL_USER instead of hardcoded username in build-release scp
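The last two bullets describe a common shell pattern: let an environment variable override a built-in default, and find the human user behind a `sudo` invocation. The block below is a minimal sketch of that pattern in bash. The fallback path `$HOME/code/archsetup`, deriving `REAL_USER` from `SUDO_USER`, and the helper names `resolve_archsetup_dir` and `resolve_real_user` are all hypothetical. None of them come from build.sh or scripts/build-release.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the two patterns named in the commit message.
# The fallback path and the REAL_USER derivation are assumptions, not
# the actual code in build.sh or scripts/build-release.

# Print the archsetup checkout to use: $ARCHSETUP_DIR when set and
# non-empty, otherwise an assumed default under the caller's home.
resolve_archsetup_dir() {
    echo "${ARCHSETUP_DIR:-$HOME/code/archsetup}"
}

# Print the human user behind the build. sudo exports SUDO_USER, so a
# script run via sudo reports the invoking user instead of root.
resolve_real_user() {
    echo "${SUDO_USER:-$(id -un)}"
}

ARCHSETUP_DIR="$(resolve_archsetup_dir)"
REAL_USER="$(resolve_real_user)"

# Dry run only: the ISO name and destination host are placeholders.
echo "archsetup dir: $ARCHSETUP_DIR"
echo "would run: scp out/archangel.iso ${REAL_USER}@build-host.example:/srv/iso/"
```

The `:-` form treats an empty `ARCHSETUP_DIR` the same as an unset one, so the script never builds against an empty path. The colon-less `${ARCHSETUP_DIR-...}` form would keep the empty value instead.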
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | .gitignore | 1 |
| -rw-r--r-- | .stignore | 2 |
| -rw-r--r-- | PLAN-zfsbootmenu-implementation.org | 712 |
| -rw-r--r-- | assets/2026-01-22-mkinitcpio-fixes-applied-detail.org | 194 |
| -rw-r--r-- | assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org | 152 |
| -rw-r--r-- | assets/Donato Capitella-ROCm+Linux Support on Strix Halo: It's finally stable in 2026!.txt | 1 |
| -rw-r--r-- | assets/cogito-hardware-specs.txt | 17 |
| -rwxr-xr-x | build.sh | 5 |
| -rwxr-xr-x | scripts/build-release | 2 |
| -rw-r--r-- | todo.org | 844 |
10 files changed, 5 insertions, 1925 deletions
@@ -11,3 +11,4 @@ reference-repos/
 
 # Personal session/workflow docs (not project documentation)
 docs/
+todo.org
diff --git a/.stignore b/.stignore
deleted file mode 100644
index e5998b3..0000000
--- a/.stignore
+++ /dev/null
@@ -1,2 +0,0 @@
-.git
-work/
diff --git a/PLAN-zfsbootmenu-implementation.org b/PLAN-zfsbootmenu-implementation.org
deleted file mode 100644
index 6733c26..0000000
--- a/PLAN-zfsbootmenu-implementation.org
+++ /dev/null
@@ -1,712 +0,0 @@
-#+TITLE: ZFSBootMenu Implementation Plan
-#+AUTHOR: Craig Jennings & Claude
-#+DATE: 2026-01-22
-
-* Overview
-
-Replace GRUB bootloader with ZFSBootMenu in the archzfs installation script. ZFSBootMenu provides native ZFS snapshot booting, eliminates kernel/snapshot version mismatch issues, and simplifies the boot architecture.
-
-* Why ZFSBootMenu?
-
-| Feature         | GRUB                              | ZFSBootMenu                  |
-|-----------------+-----------------------------------+------------------------------|
-| Snapshot boot   | Custom scripts (grub-zfs-snap)    | Native, built-in             |
-| Kernel location | Separate /boot partition          | On ZFS with root             |
-| Rollback safety | Can mismatch kernel/snapshot      | Kernel travels with snapshot |
-| Boot menu       | Regenerate with grub-mkconfig     | Auto-discovers datasets      |
-| EFI size needed | ~1GB (kernels + GRUB)             | ~64MB (single binary)        |
-| Complexity      | High (scripts, hooks, generators) | Low (single binary + config) |
-
-* Current Architecture (GRUB)
-
-#+begin_example
-EFI Partition (1GB, /boot):
-├── EFI/
-│   └── GRUB/
-│       └── grubx64.efi
-├── grub/
-│   ├── grub.cfg
-│   └── fonts/
-├── vmlinuz-linux-lts
-├── initramfs-linux-lts.img
-└── initramfs-linux-lts-fallback.img
-
-ZFS Pool:
-└── zroot/
-    └── ROOT/
-        └── default (mountpoint=/)
-#+end_example
-
-* Target Architecture (ZFSBootMenu)
-
-#+begin_example
-EFI Partition (512MB, /efi):
-└── EFI/
-    └── ZBM/
-        └── zfsbootmenu.efi
-
-ZFS Pool:
-└── zroot/
-    └── ROOT/                (org.zfsbootmenu:commandline set here)
-        └── default          (mountpoint=/) (bootfs property points here)
-            └── boot/        <-- regular directory, NOT a dataset!
-                ├── vmlinuz-linux-lts
-                ├── initramfs-linux-lts.img
-                └── initramfs-linux-lts-fallback.img
-#+end_example
-
-*Key insight from research:* /boot is a regular directory inside ROOT/default,
-NOT a separate ZFS dataset. This ensures:
-1. Snapshots of ROOT/default include the matching kernel
-2. Rolling back a snapshot also rolls back to the kernel that was installed at that time
-3. ZFSBootMenu can find the kernel at the expected path
-
-* Files to Modify
-
-** custom/install-archzfs
-Primary installation script - most changes here.
-
-** build.sh
-ISO build script - remove GRUB snapshot tooling.
-
-** custom/grub-zfs-snap (DELETE)
-No longer needed - ZFSBootMenu handles natively.
-
-** custom/40_zfs_snapshots (DELETE)
-GRUB generator - no longer needed.
-
-** custom/zz-grub-zfs-snap.hook (DELETE)
-Pacman hook for GRUB - no longer needed.
-
-** custom/zfssnapshot
-Update to remove grub-zfs-snap call (ZFSBootMenu auto-detects).
-
-** custom/zfsrollback
-Update to remove grub-zfs-snap call.
-
-* Implementation Steps
-
-** Step 1: Update partition_disks()
-
-Location: custom/install-archzfs, lines 707-750
-
-Changes:
-- Reduce EFI partition from 1GB to 512MB
-- Update comments to reflect new purpose
-
-#+begin_src bash
-# Change this line:
-sgdisk -n 1:0:+1G -t 1:ef00 -c 1:"EFI" "$disk"
-
-# To:
-sgdisk -n 1:0:+512M -t 1:ef00 -c 1:"EFI" "$disk"
-#+end_src
-
-** Step 2: Update create_datasets()
-
-Location: custom/install-archzfs, lines 817-859
-
-*CRITICAL: DO NOT create a separate /boot dataset!*
-
-From research (PandaScience, sandreas/zarch, ZFSBootMenu docs):
-ZFSBootMenu expects /boot to be a *regular directory* inside the root dataset,
-NOT a separate ZFS dataset. The kernels must live at the path /boot/* within
-the root filesystem for ZFSBootMenu to find them.
-
-Changes:
-- Do NOT create a /boot dataset
-- The /boot directory will be created automatically by pacstrap when installing the kernel
-- This ensures snapshots of ROOT/default include the matching kernel
-
-#+begin_src bash
-# DO NOT ADD THIS - it's WRONG:
-# zfs create -o mountpoint=/boot "$POOL_NAME/ROOT/default/boot"
-
-# /boot is just a regular directory inside ROOT/default
-# mkinitcpio puts kernel/initramfs there automatically
-#+end_src
-
-Note: With ZFSBootMenu, kernels live ON the root ZFS dataset (not EFI partition).
-When you snapshot ROOT/default, the kernel is included in the snapshot.
-
-** Step 3: Replace mount_efi()
-
-Location: custom/install-archzfs, lines 861-867
-
-Changes:
-- Rename to mount_filesystems()
-- Mount EFI at /efi instead of /boot
-- /boot is already mounted as ZFS dataset
-
-#+begin_src bash
-mount_filesystems() {
-    step "Mounting Filesystems"
-
-    # EFI partition - only holds ZFSBootMenu binary
-    mkdir -p /mnt/efi
-    mount "${EFI_PARTS[0]}" /mnt/efi
-    info "EFI partition ${EFI_PARTS[0]} mounted at /mnt/efi"
-
-    # /boot is a directory inside the ZFS root dataset (created by pacstrap)
-    # No separate mount needed - it's part of the root filesystem
-}
-#+end_src
-
-** Step 4: Update install_base()
-
-Location: custom/install-archzfs, lines 869-920
-
-Changes:
-- Remove: grub, freetype2 (GRUB font support)
-- Keep: efibootmgr (needed for EFI boot entries)
-
-#+begin_src bash
-# Remove these from pacstrap:
-#   grub \
-#   freetype2 \
-
-# Keep efibootmgr
-#+end_src
-
-** Step 5: Update configure_system() fstab
-
-Location: custom/install-archzfs, lines 926-929
-
-Changes:
-- Mount EFI at /efi instead of /boot
-
-#+begin_src bash
-# Change:
-echo "UUID=$(blkid -s UUID -o value "${EFI_PARTS[0]}") /boot vfat defaults,noatime 0 2"
-
-# To:
-echo "UUID=$(blkid -s UUID -o value "${EFI_PARTS[0]}") /efi vfat defaults,noatime 0 2"
-#+end_src
-
-** Step 6: Update configure_initramfs()
-
-Location: custom/install-archzfs, lines 1021-1098
-
-Changes:
-- Update preset to use /boot (now on ZFS)
-- No changes to hooks - ZFS hook still needed
-
-The preset file paths remain the same (/boot/vmlinuz-linux-lts, etc.) but /boot is now on ZFS instead of EFI partition.
-
-** Step 7: Replace configure_bootloader() with configure_zfsbootmenu()
-
-Location: custom/install-archzfs, lines 1100-1164
-
-Delete the entire GRUB function and replace with:
-
-#+begin_src bash
-configure_zfsbootmenu() {
-    step "Configuring ZFSBootMenu"
-
-    # Ensure hostid exists and get value
-    # CRITICAL: Must be done BEFORE pool creation ideally, but we do it here too
-    if [[ ! -f /etc/hostid ]]; then
-        zgenhostid
-    fi
-    local host_id=$(hostid)
-
-    # Copy hostid to installed system (ZFS uses this for pool ownership)
-    cp /etc/hostid /mnt/etc/hostid
-
-    # Create ZFSBootMenu directory on EFI
-    mkdir -p /mnt/efi/EFI/ZBM
-
-    # Download ZFSBootMenu release EFI binary
-    # Using the bundled release which includes everything needed
-    # (Alternative: build from AUR with generate-zbm, but this is simpler)
-    info "Downloading ZFSBootMenu..."
-    local zbm_url="https://get.zfsbootmenu.org/efi"
-    if ! curl -fsSL -o /mnt/efi/EFI/ZBM/zfsbootmenu.efi "$zbm_url"; then
-        error "Failed to download ZFSBootMenu"
-    fi
-    info "ZFSBootMenu binary installed."
-
-    # Set kernel command line on the ROOT PARENT dataset
-    # This allows inheritance to all boot environments (future-proofing)
-    # ZFSBootMenu reads org.zfsbootmenu:commandline property
-    local cmdline="rw loglevel=3"
-
-    # Add any AMD GPU workarounds if needed (detect Strix Halo etc)
-    if lspci | grep -qi "amd.*display\|amd.*vga"; then
-        info "AMD GPU detected - adding workaround parameters"
-        cmdline="$cmdline amdgpu.pg_mask=0 amdgpu.cwsr_enable=0"
-    fi
-
-    # Set on ROOT parent so all boot environments inherit it
-    zfs set org.zfsbootmenu:commandline="$cmdline" "$POOL_NAME/ROOT"
-    info "Kernel command line set on $POOL_NAME/ROOT (inherited by children)"
-
-    # Set bootfs property - tells ZFSBootMenu which dataset to boot by default
-    zpool set bootfs="$POOL_NAME/ROOT/default" "$POOL_NAME"
-    info "Default boot filesystem set to $POOL_NAME/ROOT/default"
-
-    # Create EFI boot entries for each disk
-    # ZFSBootMenu EFI parameters (passed via --unicode):
-    #   spl_hostid=0x...         - Required for pool import
-    #   zbm.timeout=3            - Seconds before auto-boot (-1 = always show menu)
-    #   zbm.prefer=POOLNAME      - Preferred pool to boot from
-    #   zbm.import_policy=hostid - How to handle pool imports
-    local zbm_cmdline="spl_hostid=0x${host_id} zbm.timeout=3 zbm.prefer=${POOL_NAME} zbm.import_policy=hostid"
-
-    for i in "${!SELECTED_DISKS[@]}"; do
-        local disk="${SELECTED_DISKS[$i]}"
-        local label="ZFSBootMenu"
-        if [[ ${#SELECTED_DISKS[@]} -gt 1 ]]; then
-            label="ZFSBootMenu-disk$((i+1))"
-        fi
-
-        # Determine partition number (always 1 - first partition is EFI)
-        local part_num=1
-
-        info "Creating EFI boot entry: $label on $disk"
-        efibootmgr --create \
-            --disk "$disk" \
-            --part "$part_num" \
-            --label "$label" \
-            --loader '\EFI\ZBM\zfsbootmenu.efi' \
-            --unicode "$zbm_cmdline" \
-            --quiet
-    done
-
-    # Get the boot entry number and set as first in boot order
-    local bootnum=$(efibootmgr | grep "ZFSBootMenu" | head -1 | grep -oP 'Boot\K[0-9A-F]+')
-    if [[ -n "$bootnum" ]]; then
-        # Get current boot order, prepend our entry
-        local current_order=$(efibootmgr | grep "BootOrder" | cut -d: -f2 | tr -d ' ')
-        efibootmgr --bootorder "$bootnum,$current_order" --quiet
-        info "ZFSBootMenu set as primary boot option"
-    fi
-
-    info "ZFSBootMenu configuration complete."
-}
-#+end_src
-
-** Step 8: Delete configure_grub_zfs_snap()
-
-Location: custom/install-archzfs, lines 1166-1184
-
-Delete the entire function - ZFSBootMenu handles snapshot menus natively.
-
-** Step 9: Update sync_efi_partitions()
-
-Location: custom/install-archzfs, lines 1285-1315
-
-Changes:
-- Sync ZFSBootMenu binary instead of GRUB
-- Create EFI boot entries for secondary disks
-
-#+begin_src bash
-sync_efi_partitions() {
-    [[ ${#EFI_PARTS[@]} -le 1 ]] && return
-
-    step "Syncing EFI Partitions for Redundancy"
-
-    for i in "${!EFI_PARTS[@]}"; do
-        [[ $i -eq 0 ]] && continue
-
-        local efi_part="${EFI_PARTS[$i]}"
-        local temp_mount="/mnt/efi_sync"
-
-        info "Syncing ZFSBootMenu to EFI partition $((i+1)): $efi_part"
-
-        mkdir -p "$temp_mount"
-        mount "$efi_part" "$temp_mount"
-
-        # Copy ZFSBootMenu binary
-        mkdir -p "$temp_mount/EFI/ZBM"
-        cp /mnt/efi/EFI/ZBM/zfsbootmenu.efi "$temp_mount/EFI/ZBM/"
-
-        umount "$temp_mount"
-    done
-
-    rmdir "$temp_mount" 2>/dev/null || true
-    info "All EFI partitions synchronized."
-}
-#+end_src
-
-** Step 10: Update cleanup()
-
-Location: custom/install-archzfs, lines 1379-1393
-
-Changes:
-- Unmount /mnt/efi instead of /mnt/boot
-
-#+begin_src bash
-# Change:
-umount /mnt/boot 2>/dev/null || true
-
-# To:
-umount /mnt/efi 2>/dev/null || true
-#+end_src
-
-** Step 11: Update print_summary()
-
-Location: custom/install-archzfs, lines 1395-1424
-
-Changes:
-- Update bootloader references from GRUB to ZFSBootMenu
-- Update useful commands section
-
-#+begin_src bash
-# Update the "ZFS Features" section:
-echo "ZFS Features:"
-echo " - ZFSBootMenu: boot from any snapshot"
-echo " - Genesis snapshot: pristine post-install state"
-echo " - Pre-pacman snapshots for safe upgrades"
-echo ""
-echo "Boot Menu Keys (at ZFSBootMenu):"
-echo " Enter - Boot selected environment"
-echo " e - Edit kernel command line"
-echo " Ctrl+D - Show snapshot selector"
-echo " Ctrl+R - Recovery shell"
-#+end_src
-
-** Step 12: Update build.sh
-
-Location: build.sh
-
-Changes:
-- Remove grub-zfs-snap file copies (lines ~375-380)
-- Remove grub-zfs-snap permissions (line ~408)
-- Keep bootloader configs for live ISO (still uses GRUB/syslinux)
-
-#+begin_src bash
-# DELETE these lines from build.sh:
-
-# Copy grub-zfs-snap to ISO
-cp custom/grub-zfs-snap profile/airootfs/usr/local/bin/grub-zfs-snap
-mkdir -p profile/airootfs/usr/local/share/grub-zfs-snap
-cp custom/40_zfs_snapshots profile/airootfs/usr/local/share/grub-zfs-snap/
-cp custom/zz-grub-zfs-snap.hook profile/airootfs/usr/local/share/grub-zfs-snap/
-
-# And from file_permissions:
-["usr/local/bin/grub-zfs-snap"]="0:0:755"
-#+end_src
-
-** Step 13: Update zfssnapshot and zfsrollback
-
-Location: custom/zfssnapshot, custom/zfsrollback
-
-Changes:
-- Remove calls to grub-zfs-snap
-- ZFSBootMenu auto-detects snapshots, no regeneration needed
-
-#+begin_src bash
-# DELETE from zfssnapshot (around line 107):
-grub-zfs-snap 2>/dev/null || true
-
-# DELETE from zfsrollback (around line 177):
-grub-zfs-snap 2>/dev/null || true
-#+end_src
-
-** Step 14: Delete GRUB-specific files
-
-Files to delete from custom/:
-- custom/grub-zfs-snap
-- custom/40_zfs_snapshots
-- custom/zz-grub-zfs-snap.hook
-
-#+begin_src bash
-rm custom/grub-zfs-snap
-rm custom/40_zfs_snapshots
-rm custom/zz-grub-zfs-snap.hook
-#+end_src
-
-** Step 15: Update main() function call order
-
-Location: custom/install-archzfs, main() around line 1443
-
-Changes:
-- Replace configure_bootloader with configure_zfsbootmenu
-- Remove configure_grub_zfs_snap call
-
-#+begin_src bash
-# Change this sequence:
-    configure_initramfs
-    configure_bootloader       # <- rename
-    configure_grub_zfs_snap    # <- delete
-    configure_zfs_services
-
-# To:
-    configure_initramfs
-    configure_zfsbootmenu      # <- new function
-    configure_zfs_services
-#+end_src
-
-* Testing Plan
-
-** Test Environment
-
-- QEMU VM with UEFI firmware (OVMF)
-- Multiple test scenarios for different disk configurations
-- Existing test script: scripts/test-vm.sh
-
-** Test 1: Single Disk Install
-
-#+begin_src bash
-# Start VM
-./scripts/test-vm.sh
-
-# In VM, run installer
-install-archzfs
-
-# Select single disk
-# Complete installation
-# Reboot
-#+end_src
-
-*Validation Points:*
-- [ ] EFI partition is 512MB (not 1GB)
-- [ ] /efi contains only EFI/ZBM/zfsbootmenu.efi
-- [ ] /boot is a directory (NOT a dataset): =zfs list= should NOT show zroot/ROOT/default/boot
-- [ ] Kernel files exist in /boot/ (=ls /boot/vmlinuz*=)
-- [ ] ZFSBootMenu menu appears on boot
-- [ ] Can boot into installed system
-- [ ] After login: =zfs get org.zfsbootmenu:commandline zroot/ROOT= shows cmdline (set on parent)
-- [ ] After login: =zpool get bootfs zroot= shows zroot/ROOT/default
-
-** Test 2: Mirror Install (2 disks)
-
-#+begin_src bash
-# Create second virtual disk
-qemu-img create -f qcow2 test-disk2.qcow2 50G
-
-# Modify test-vm.sh to add second disk
-#   -drive file=test-disk2.qcow2,if=virtio
-
-# Run installer, select both disks, choose mirror
-#+end_src
-
-*Validation Points:*
-- [ ] Both disks have EFI partitions
-- [ ] ZFSBootMenu binary exists on both EFI partitions
-- [ ] EFI boot entries exist for both disks (efibootmgr -v)
-- [ ] Can boot from either disk (test by removing first disk)
-- [ ] ZFS pool shows mirror topology (zpool status)
-
-** Test 3: RAIDZ1 Install (3 disks)
-
-*Validation Points:*
-- [ ] All three disks have EFI partitions with ZFSBootMenu
-- [ ] Three EFI boot entries created
-- [ ] ZFS pool shows raidz1 topology
-
-** Test 4: Snapshot Boot
-
-#+begin_src bash
-# After installation and first boot:
-
-# Create a test file
-echo "original" > /root/test.txt
-
-# Create a snapshot
-zfs snapshot zroot/ROOT/default@test-snap
-
-# Modify the file
-echo "modified" > /root/test.txt
-
-# Reboot, at ZFSBootMenu press Ctrl+D
-# Select zroot/ROOT/default@test-snap
-# Boot from snapshot
-#+end_src
-
-*Validation Points:*
-- [ ] ZFSBootMenu shows snapshot selector with Ctrl+D
-- [ ] Snapshot appears in list
-- [ ] Booting from snapshot shows original file content
-- [ ] Kernel version matches (no mismatch errors)
-
-** Test 5: Kernel Update Scenario
-
-#+begin_src bash
-# Simulate kernel update (or actually do one)
-pacman -Syu
-
-# Reboot
-#+end_src
-
-*Validation Points:*
-- [ ] New kernel is on ZFS (included in future snapshots)
-- [ ] ZFSBootMenu detects and boots new kernel
-- [ ] No manual regeneration needed (unlike GRUB)
-
-** Test 6: Recovery Shell
-
-#+begin_src bash
-# At ZFSBootMenu, press Ctrl+R
-#+end_src
-
-*Validation Points:*
-- [ ] Recovery shell accessible
-- [ ] ZFS pool is importable from recovery
-- [ ] Can manually mount and chroot if needed
-
-** Test 7: Encrypted Pool
-
-#+begin_src bash
-# Run installer with encryption enabled
-# Enter passphrase when prompted
-#+end_src
-
-*Validation Points:*
-- [ ] ZFSBootMenu prompts for passphrase
-- [ ] Pool unlocks successfully
-- [ ] System boots normally after passphrase entry
-
-* Validation Checklist (All Tests)
-
-** Pre-Installation
-- [ ] Live ISO boots successfully
-- [ ] ZFS module loads (lsmod | grep zfs)
-
-** Partitioning
-- [ ] EFI partition is 512MB
-- [ ] ZFS partition uses remaining space
-- [ ] Partition table is GPT
-
-** Filesystem Layout
-- [ ] /efi is vfat, mounted from EFI partition
-- [ ] /boot is a directory inside ROOT/default (NOT a separate dataset)
-- [ ] Verify: =zfs list= should NOT show a zroot/ROOT/default/boot dataset
-- [ ] Kernel/initramfs exist in /boot/ (on the ZFS root filesystem)
-
-** ZFSBootMenu
-- [ ] zfsbootmenu.efi exists at /efi/EFI/ZBM/
-- [ ] EFI boot entry points to ZFSBootMenu
-- [ ] org.zfsbootmenu:commandline property set on root dataset
-- [ ] hostid included in cmdline (spl.spl_hostid=0x...)
-
-** Boot Process
-- [ ] ZFSBootMenu menu appears
-- [ ] Countdown timer works
-- [ ] Default boot entry is correct
-- [ ] Boot completes successfully
-- [ ] All ZFS datasets mount correctly
-
-** Multi-Disk (if applicable)
-- [ ] All EFI partitions contain zfsbootmenu.efi
-- [ ] All disks have EFI boot entries
-- [ ] Can boot from any disk
-
-** Snapshots
-- [ ] Genesis snapshot created
-- [ ] Ctrl+D shows snapshot selector
-- [ ] Can boot from snapshot
-- [ ] Snapshot includes matching kernel
-
-** Services
-- [ ] zfs-import-scan.service enabled
-- [ ] zfs-mount.service enabled
-- [ ] NetworkManager starts
-- [ ] SSH accessible (if enabled)
-
-* Rollback Plan
-
-If ZFSBootMenu implementation fails:
-
-1. Keep GRUB version in a git branch before changes
-2. ISO still boots with GRUB (live environment unchanged)
-3. Can install GRUB manually from live environment:
-   #+begin_src bash
-   pacstrap /mnt grub
-   arch-chroot /mnt grub-install --target=x86_64-efi --efi-directory=/boot
-   arch-chroot /mnt grub-mkconfig -o /boot/grub/grub.cfg
-   #+end_src
-
-* Research Findings (from comparable projects)
-
-This plan incorporates best practices from these open-source Arch+ZFS installers:
-
-** eoli3n/archiso-zfs + arch-config
-- ZFSBootMenu built from source with =generate-zbm= (we use pre-built binary for simplicity)
-- Uses file-based encryption key (=/etc/zfs/zroot.key=) embedded in initramfs to avoid double passphrase prompt
-- Sets =org.zfsbootmenu:commandline= on ROOT parent for inheritance to all boot environments
-- Minimal dataset layout: ROOT, data/home
-
-** PandaScience/arch-on-zfs
-- Uses rEFInd (not ZFSBootMenu), but documents the "smart dataset layout":
-  - =system/= for root, =userdata/= for home, =nobackup/= for cache/tmp
-- Emphasizes =canmount=noauto= on root dataset (we already do this)
-- Recommends =/boot= as directory inside root, NOT separate dataset
-
-** sandreas/zarch
-- Downloads ZFSBootMenu binary from =https://get.zfsbootmenu.org/latest.EFI= (adopted)
-- Uses efibootmgr =--unicode= parameter for ZFSBootMenu cmdline (adopted)
-- ZFSBootMenu parameters: =spl_hostid=, =zbm.timeout=, =zbm.prefer=, =zbm.import_policy= (adopted)
-- Uses zrepl for time-based snapshots (we use pacman hooks - complementary approach)
-
-** danboid/ALEZ
-- Two-pool design (bpool + rpool) for GRUB compatibility - NOT needed with ZFSBootMenu
-- systemd-boot ZFS entry uses =zfs=POOL/ROOT/default= parameter
-- Pool export/reimport pattern for clean state
-
-** danfossi/Arch-ZFS-Root-Installation-Script
-- Uses =compatibility=grub2= pool option for GRUB - NOT needed with ZFSBootMenu
-- Good partition suffix helper for nvme/mmcblk naming (we already have this)
-- Separate bpool for boot - NOT needed with ZFSBootMenu
-
-** Key Lessons Adopted
-
-1. *DO NOT create separate /boot dataset* - must be directory inside root
-2. *Set commandline on ROOT parent* - inherited by all boot environments
-3. *Use =--unicode= for ZFSBootMenu parameters* - spl_hostid, zbm.timeout, zbm.prefer
-4. *Download pre-built EFI binary* - simpler than building from AUR
-5. *Copy hostid to installed system* - required for pool import
-6. *Set bootfs pool property* - tells ZFSBootMenu default boot target
-
-** Optional Enhancement: File-Based Encryption Key
-
-From eoli3n: To avoid entering passphrase twice (ZFSBootMenu + initramfs):
-
-#+begin_src bash
-# During pool creation, use keylocation=file instead of prompt
-echo "$ZFS_PASSPHRASE" > /mnt/etc/zfs/zroot.key
-chmod 000 /mnt/etc/zfs/zroot.key
-
-# Add to mkinitcpio FILES
-echo 'FILES+=(/etc/zfs/zroot.key)' >> /mnt/etc/mkinitcpio.conf.d/zfs-key.conf
-
-# Change keylocation
-zfs set keylocation=file:///etc/zfs/zroot.key zroot
-#+end_src
-
-Trade-off: Simpler UX (one passphrase) but key is in initramfs on ZFS.
-*Current plan uses prompt-based* - user enters passphrase at ZFSBootMenu.
-
-* References
-
-** Official Documentation
-- ZFSBootMenu Documentation: https://docs.zfsbootmenu.org/
-- ZFSBootMenu GitHub: https://github.com/zbm-dev/zfsbootmenu
-- ZFSBootMenu man page: https://docs.zfsbootmenu.org/en/latest/man/zfsbootmenu.7.html
-- Arch Wiki ZFS: https://wiki.archlinux.org/title/ZFS
-
-** Researched Projects
-- eoli3n/archiso-zfs: https://github.com/eoli3n/archiso-zfs
-- eoli3n/arch-config: https://github.com/eoli3n/arch-config
-- PandaScience/arch-on-zfs: https://github.com/PandaScience/arch-on-zfs
-- sandreas/zarch: https://github.com/sandreas/zarch
-- danboid/ALEZ: https://github.com/danboid/ALEZ
-- danfossi/Arch-ZFS-Root-Installation-Script: https://github.com/danfossi/Arch-ZFS-Root-Installation-Script
-
-** Guides
-- Florian Esser's ZFSBootMenu Guide: https://florianesser.ch/posts/20220714-arch-install-zbm/
-- Arch Wiki ZFSBootMenu: https://wiki.archlinux.org/title/User:Kayvlim/Install_UEFI_and_BIOS_compatible_Arch_Linux_with_Encrypted_ZFS_and_ZFSBootMenu
-
-* Implementation Order
-
-1. Create git branch: =git checkout -b zfsbootmenu=
-2. Delete GRUB files (Step 14)
-3. Update build.sh (Step 12)
-4. Update install-archzfs (Steps 1-11, 15)
-5. Update helper scripts (Step 13)
-6. Build new ISO
-7. Run Test 1 (single disk)
-8. Fix any issues
-9. Run Tests 2-7
-10. Merge to main when all tests pass
diff --git a/assets/2026-01-22-mkinitcpio-fixes-applied-detail.org b/assets/2026-01-22-mkinitcpio-fixes-applied-detail.org
deleted file mode 100644
index 68c6f0e..0000000
--- a/assets/2026-01-22-mkinitcpio-fixes-applied-detail.org
+++ /dev/null
@@ -1,194 +0,0 @@
-#+TITLE: Detailed mkinitcpio Fixes Applied to ratio
-#+DATE: 2026-01-22
-
-* Overview
-
-This documents the exact fixes applied to ratio's mkinitcpio configuration to make it bootable. These fixes worked - the system booted successfully after applying them. The install-archzfs script needs to be updated to apply these configurations during installation.
-
-* Fix 1: /etc/mkinitcpio.conf HOOKS
-
-** Problem
-
-The HOOKS line was configured for a systemd-based initramfs without ZFS support.
-
-** Before (broken)
-#+begin_example
-HOOKS=(base systemd autodetect microcode modconf kms keyboard keymap sd-vconsole block filesystems fsck)
-#+end_example
-
-** After (working)
-#+begin_example
-HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont block zfs filesystems)
-#+end_example
-
-** Changes Explained
-
-| Removed        | Added/Changed  | Reason                                                    |
-|----------------+----------------+-----------------------------------------------------------|
-| systemd        | udev           | ZFS hook is busybox-based, incompatible with systemd init |
-| sd-vconsole    | consolefont    | sd-vconsole is systemd-specific; consolefont is busybox   |
-| fsck           | (removed)      | fsck is for ext4/xfs, not needed for ZFS                  |
-| (missing)      | zfs            | Required to import ZFS pool and mount root at boot        |
-
-** Command Used
-#+begin_src bash
-sed -i "s/^HOOKS=.*/HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont block zfs filesystems)/" /etc/mkinitcpio.conf
-#+end_src
-
-* Fix 2: Remove /etc/mkinitcpio.conf.d/archiso.conf
-
-** Problem
-
-The archzfs live ISO uses a drop-in config file at =/etc/mkinitcpio.conf.d/archiso.conf=. This file was not removed during installation, and it *overrides* the HOOKS setting in mkinitcpio.conf.
-
-** Contents of archiso.conf (should not exist on installed system)
-#+begin_example
-HOOKS=(base udev microcode modconf kms memdisk archiso archiso_loop_mnt archiso_pxe_common archiso_pxe_nbd archiso_pxe_http archiso_pxe_nfs block filesystems keyboard)
-COMPRESSION="xz"
-COMPRESSION_OPTIONS=(-9e)
-#+end_example
-
-** Why This Breaks Things
-
-Even if mkinitcpio.conf has the correct HOOKS, this drop-in file overrides them with archiso-specific hooks (memdisk, archiso, archiso_loop_mnt, etc.) that are only for the live ISO environment. The =zfs= hook is notably absent.
- -** Fix Applied -#+begin_src bash -rm -f /etc/mkinitcpio.conf.d/archiso.conf -#+end_src - -** Note for install-archzfs - -The script should remove this file after arch-chroot setup: -#+begin_src bash -rm -f /mnt/etc/mkinitcpio.conf.d/archiso.conf -#+end_src - -* Fix 3: /etc/mkinitcpio.d/linux-lts.preset - -** Problem - -The preset file was still configured for the archiso live environment, not a normal installed system. - -** Before (broken) -#+begin_example -# mkinitcpio preset file for the 'linux-lts' package on archiso - -PRESETS=('archiso') - -ALL_kver='/boot/vmlinuz-linux-lts' -archiso_config='/etc/mkinitcpio.conf.d/archiso.conf' - -archiso_image="/boot/initramfs-linux-lts.img" -#+end_example - -** After (working) -#+begin_example -# mkinitcpio preset file for linux-lts - -PRESETS=(default fallback) - -ALL_kver="/boot/vmlinuz-linux-lts" - -default_image="/boot/initramfs-linux-lts.img" - -fallback_image="/boot/initramfs-linux-lts-fallback.img" -fallback_options="-S autodetect" -#+end_example - -** Changes Explained - -| Before | After | Reason | -|---------------------------------+------------------------+-----------------------------------------------------| -| PRESETS=('archiso') | PRESETS=(default fallback) | Normal system needs default + fallback images | -| archiso_config=... (drop-in) | (removed) | Don't use archiso drop-in config | -| archiso_image=... | default_image=... | Use standard naming | -| (missing) | fallback_image=... 
| Fallback image for recovery | -| (missing) | fallback_options="-S autodetect" | Fallback skips autodetect for broader hardware support | - -** Command Used -#+begin_src bash -cat > /etc/mkinitcpio.d/linux-lts.preset << 'EOF' -# mkinitcpio preset file for linux-lts - -PRESETS=(default fallback) - -ALL_kver="/boot/vmlinuz-linux-lts" - -default_image="/boot/initramfs-linux-lts.img" - -fallback_image="/boot/initramfs-linux-lts-fallback.img" -fallback_options="-S autodetect" -EOF -#+end_src - -* Fix 4: Rebuild initramfs - -After applying the above fixes, the initramfs must be rebuilt: - -#+begin_src bash -mkinitcpio -P -#+end_src - -This regenerates both default and fallback images with the correct hooks. - -* Verification - -** Verify HOOKS are correct -#+begin_src bash -grep "^HOOKS" /etc/mkinitcpio.conf -# Should show: HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont block zfs filesystems) -#+end_src - -** Verify no archiso drop-in -#+begin_src bash -ls /etc/mkinitcpio.conf.d/ -# Should be empty or not contain archiso.conf -#+end_src - -** Verify preset is correct -#+begin_src bash -grep "PRESETS" /etc/mkinitcpio.d/linux-lts.preset -# Should show: PRESETS=(default fallback) -#+end_src - -** Verify ZFS hook is in initramfs -#+begin_src bash -lsinitcpio /boot/initramfs-linux-lts.img | grep -E "^hooks/zfs|zfs.ko" -# Should show: -# hooks/zfs -# usr/lib/modules/.../zfs.ko.zst -#+end_src - -* Summary for install-archzfs Script - -The script needs to add these steps after installing packages and before running final mkinitcpio: - -#+begin_src bash -# 1. Set correct HOOKS for ZFS boot -sed -i "s/^HOOKS=.*/HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont block zfs filesystems)/" /mnt/etc/mkinitcpio.conf - -# 2. Remove archiso drop-in config -rm -f /mnt/etc/mkinitcpio.conf.d/archiso.conf - -# 3. 
Create proper preset file (adjust kernel name if not linux-lts) -cat > /mnt/etc/mkinitcpio.d/linux-lts.preset << 'EOF' -# mkinitcpio preset file for linux-lts - -PRESETS=(default fallback) - -ALL_kver="/boot/vmlinuz-linux-lts" - -default_image="/boot/initramfs-linux-lts.img" - -fallback_image="/boot/initramfs-linux-lts-fallback.img" -fallback_options="-S autodetect" -EOF - -# 4. Rebuild initramfs with correct config -arch-chroot /mnt mkinitcpio -P -#+end_src - -* Result - -After applying these fixes and rebuilding initramfs from the live ISO, ratio booted successfully. The system froze on a subsequent =mkinitcpio -P= run, but that's a separate AMD GPU issue (see 2026-01-22-mkinitcpio-freeze-during-rebuild.org), not a configuration problem. diff --git a/assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org b/assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org deleted file mode 100644 index 1132ddd..0000000 --- a/assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org +++ /dev/null @@ -1,152 +0,0 @@ -#+TITLE: System freezes during mkinitcpio -P rebuild -#+DATE: 2026-01-22 - -* Problem Summary - -After fixing the mkinitcpio configuration issues (see 2026-01-22-mkinitcpio-config-boot-failure.org), the system successfully booted. However, running =mkinitcpio -P= again caused the system to freeze, requiring a power cycle. - -This indicates the mkinitcpio config fix was correct, but there's a separate issue causing freezes during initramfs rebuilds. - -* Timeline - -1. System wouldn't boot due to broken mkinitcpio config (wrong HOOKS, missing zfs) -2. Booted from archzfs live ISO -3. Fixed mkinitcpio.conf, preset file, removed archiso.conf drop-in -4. Rebuilt initramfs via chroot - completed successfully -5. Rebooted - system booted successfully -6. Ran =mkinitcpio -P= again - system froze -7. Had to power cycle, now back on live ISO - -* What This Tells Us - -The mkinitcpio configuration fix was correct (system booted). 
But something about running mkinitcpio itself is triggering a system freeze. - -* Suspected Cause: AMD GPU Power Gating Bug - -ratio has an AMD Strix Halo GPU (RDNA 3.5) with a known VPE power gating bug. When the VPE (Video Processing Engine) tries to power gate after 1 second of idle, the SMU hangs and the system freezes. - -Symptoms before freeze: -#+begin_example -amdgpu: SMU: I'm not done with your previous command -amdgpu: Failed to power gate VPE! -[drm:vpe_set_powergating_state] *ERROR* Dpm disable vpe failed, ret = -62 -#+end_example - -The fix is to disable power gating via =/etc/modprobe.d/amdgpu.conf=: -#+begin_example -options amdgpu pg_mask=0 -#+end_example - -*CRITICAL*: After creating this file, must run =mkinitcpio -P= to include it in initramfs (the modconf hook reads /etc/modprobe.d/ at build time). - -* The Chicken-and-Egg Problem - -1. Need to run =mkinitcpio -P= to apply the GPU fix (include amdgpu.conf in initramfs) -2. But running =mkinitcpio -P= triggers the GPU freeze -3. The fix can't be applied because applying it causes the problem it's meant to fix - -* Possible Solutions to Investigate - -** Option 1: Apply GPU fix at runtime before mkinitcpio - -Before running mkinitcpio, manually set pg_mask at runtime: -#+begin_src bash -echo 0 | sudo tee /sys/module/amdgpu/parameters/pg_mask -#+end_src - -Then run mkinitcpio while power gating is disabled. This might prevent the freeze. - -** Option 2: Build initramfs from live ISO - -Boot from archzfs live ISO (which doesn't have the GPU issue), mount the system, and rebuild initramfs from there. The live ISO uses a different GPU driver state. - -We tried this and it worked - the rebuild completed. But then running mkinitcpio on the booted system froze. - -** Option 3: Add amdgpu.conf before rebuilding from live ISO - -When rebuilding from live ISO: -1. Create /etc/modprobe.d/amdgpu.conf with pg_mask=0 -2. Rebuild initramfs -3. Boot - now the GPU fix should be in effect -4. 
Future mkinitcpio runs might not freeze - -This might work because the initramfs would load with power gating disabled from the start. - -** Option 4: Wait for kernel 6.18+ - -The upstream fix (VPE_IDLE_TIMEOUT increased from 1s to 2s) is in kernel 6.15+. When linux-lts reaches 6.18, the workaround won't be needed. - -Current: linux-lts 6.12.66 -Target: linux-lts 6.18 - -* Current State of ratio - -- Booted to archzfs live ISO -- ZFS pool: zroot (mirror of nvme0n1p2 + nvme1n1p2) -- mkinitcpio.conf: FIXED (has correct HOOKS with zfs) -- /etc/mkinitcpio.conf.d/archiso.conf: REMOVED -- /etc/mkinitcpio.d/linux-lts.preset: FIXED -- /etc/modprobe.d/amdgpu.conf: EXISTS but may not be in initramfs -- Current pg_mask value on booted system: Unknown (need to check after boot) - -* Verification Commands - -Check if GPU fix is active: -#+begin_src bash -cat /sys/module/amdgpu/parameters/pg_mask -# Should return: 0 -# If returns 4294967295 (0xFFFFFFFF), fix is NOT active -#+end_src - -Check if amdgpu.conf is in initramfs: -#+begin_src bash -lsinitcpio /boot/initramfs-linux-lts.img | grep amdgpu.conf -#+end_src - -* Recovery Procedure (Option 3 - recommended) - -From archzfs live ISO: - -#+begin_src bash -# Import the pool with altroot /mnt (-R) and no auto-mount (-N), then mount root -zpool import -f -N -R /mnt zroot -zfs mount zroot/ROOT/default -mount /dev/nvme0n1p1 /mnt/boot - -# Ensure GPU fix file exists on the installed system (not the live ISO) -cat > /mnt/etc/modprobe.d/amdgpu.conf << 'EOF' -# Disable power gating to prevent VPE freeze on Strix Halo GPUs -# Remove this file when linux-lts reaches 6.18+ -options amdgpu pg_mask=0 -EOF - -# Rebuild initramfs inside the installed system (should include amdgpu.conf via modconf hook) -# arch-chroot sets up /dev, /sys and /proc itself -arch-chroot /mnt mkinitcpio -P - -# Verify amdgpu.conf is in initramfs -lsinitcpio /mnt/boot/initramfs-linux-lts.img | grep amdgpu.conf - -# Unmount and export cleanly, then reboot and test -umount /mnt/boot -zpool export zroot -reboot -#+end_src - -After reboot, verify pg_mask=0 is active, then test =mkinitcpio -P= again.
- -* Related Files - -- [[file:2026-01-22-mkinitcpio-config-boot-failure.org]] - The config fix that was applied -- archsetup NOTES.org - AMD GPU freeze diagnosis details - -* Machine Details - -- Machine: ratio (desktop) -- CPU: AMD (Strix Halo) -- GPU: AMD RDNA 3.5 (integrated) -- Storage: Two NVMe in ZFS mirror -- Kernel: linux-lts 6.12.66-1 diff --git a/assets/Donato Capitella-ROCm+Linux Support on Strix Halo: It's finally stable in 2026!.txt b/assets/Donato Capitella-ROCm+Linux Support on Strix Halo: It's finally stable in 2026!.txt deleted file mode 100644 index 322893a..0000000 --- a/assets/Donato Capitella-ROCm+Linux Support on Strix Halo: It's finally stable in 2026!.txt +++ /dev/null @@ -1 +0,0 @@ -Speaker A: In this video I want to give you an update on the current state of Linux support for Strix Halo. I know that most of my recent viewers either have a device with this AMD APU or are thinking of buying one. As a reminder, this is the integrated GPU inside AMD Ryzen AI Max, and it's codenamed gfx1151. Over the last two months there's been a lot of confusion, broken setups and contradictory advice, and this video is meant to clarify what changed and what actually works. If you've been trying to run LLMs, ComfyUI or other ROCm-based AI workflows on Strix Halo and things broke depending on your distribution, kernel and ROCm version, this wasn't user error. The software stack itself was inconsistent, and only recently has it started to converge again to something stable. If you just want a working system without digging into the details, here's what you have to do: use linux-firmware 20260110 or newer, and avoid 20251125, because that firmware is broken for ROCm on Strix Halo. Use Linux kernel 6.18.4 or newer. Use my toolboxes that have the ROCm nightly builds from TheRock, or alternatively the ROCm 7.2 builds once they are officially released. This is currently the only combination that includes the full stability fixes for gfx1151.
Importantly, if you try to run older versions of ROCm on newer kernels, they won't work. If you want more details about what's been happening, keep watching: essentially, we had two major, unrelated issues plaguing this iGPU. Before moving on, the usual ask: the research that goes into these videos is fun to do but also time-consuming. I'd really appreciate it if you could take a second to support the channel in all the usual ways, like subscribing, liking and commenting on the video. Your support does make a difference. Back in November, AMD pushed a firmware update that got bundled into linux-firmware 20251125 and quickly made its way into major Linux distributions like Fedora. Unfortunately, that firmware completely broke ROCm support on Strix Halo: ROCm would simply fail to initialize and became unusable. AMD reverted that firmware fairly quickly, but several distributions never picked up the revert. Fedora is the most obvious example. For roughly two months, a fully up-to-date Fedora system simply could not run ROCm on Strix Halo. I find the reluctance of the Fedora maintainers to push a fix for this hard to understand; this wasn't a corner case or an obscure configuration issue. It completely broke a flagship AMD platform for a whole cluster of users doing GPU compute. The only workaround during that period was to manually downgrade linux-firmware to 20251111, which was the last known working version. I documented this downgrade process (the link is in the description), and a lot of people ended up having to do that in order to run ROCm on their Strix Halo systems. Finally, in January 2026, a new firmware release, linux-firmware 20260110, started landing in mainstream distributions. That version restored ROCm functionality without requiring a downgrade. So from a firmware perspective, this specific regression is now resolved if you update your system.
However, that firmware update only addressed the immediate ROCm regression; it did not fix the underlying stability problems, which were caused by a separate issue elsewhere in the stack. Typical symptoms were GPU kernel crashes and resets, causing AI workflows to fail randomly, and ComfyUI is a good example here. It's the de facto standard software for image and video generation, and it really stresses the GPU. On Strix Halo it would often work briefly and then fall over, which exposed the ROCm stability issues we've been talking about. This is also why I didn't focus much on ComfyUI in my earlier video on image and video generation: at that time it simply wasn't stable enough on Strix Halo to recommend. AMD finally identified the underlying issue causing all this trouble. The fix turned out to require changes in two places at the same time: the amdgpu driver in the Linux kernel and ROCm itself. The core problem was a mismatch in how hardware resource limits were defined and communicated for gfx1151, specifically around VGPRs (vector general-purpose registers). For Strix Halo, the actual VGPR capacity is significantly higher than what ROCm had been assuming. Older ROCm versions were effectively using the wrong register limits for this GPU. This led to GPU kernels being scheduled with invalid assumptions about available registers. The result wasn't a clean failure; it was undefined behavior, often resulting in hangs and eventually GPU resets. AMD addressed this by changing both sides of the stack. The important point is that both sides must agree. If the kernel thinks more registers are available but ROCm still assumes the old limits, or vice versa, the runtime ends up scheduling work that doesn't line up with what the hardware expects, which leads to failures. These fixes landed in mainline Linux starting with kernel 6.18.4, with matching changes in ROCm, and this is the key point.
The kernel fix and the ROCm fix must be used together. This is where most of the current confusion comes from. If you run kernel 6.18.4 or newer with an older ROCm version, for example 6.4.4 or 7.1.1, things will break immediately. This isn't a regression in those ROCm versions; it's a compatibility mismatch. The kernel now expects ROCm to behave differently, older ROCm builds don't know about these changes, and so the stack crashes. The first ROCm release that properly matches these kernel changes will be ROCm 7.2, but at the time of recording 7.2 hasn't been officially released yet. That's why, if you are on a newer kernel today, you need to use the ROCm nightly builds from TheRock, which already include this fix. All of my current toolboxes provide this option. The table on screen summarizes which combinations actually work on gfx1151 and which ones are known to break. This is the part most people trip over. Right now there are two valid configurations. The first is the new kernel path: kernel 6.18.4 or newer with ROCm builds that already include the fixes, meaning the nightly builds from TheRock and toolboxes built against those nightlies. The second is the old ROCm compatibility path: ROCm 6.4 or 7.1 with kernel 6.18.3 or older. Mixing these paths does not work. That's the key takeaway. If you update your kernel but keep using older ROCm toolboxes, you will hit crashes. If you want to stay on older ROCm versions for benchmarking or comparison, you must also stay on the older kernel. This is Donato from the future. As I was making this video, I realized I wanted to make an additional point. It is perfectly possible that in the next few months AMD decides to cherry-pick this stability patch back into older branches of ROCm. So we might get, for example, a 6.4.5 release that includes this fix, and the same might happen for 7.1 with a 7.1.2. I don't know this, but it is possible.
Likewise, distributions like Fedora build ROCm from scratch, and this particular patch is incredibly easy to cherry-pick and backport. I think Fedora is currently looking at backporting it. So, long story short, it might become possible to use older versions of ROCm with newer kernels pretty soon. Up to now, I've kept multiple toolboxes around using different ROCm versions. That was intentional: ROCm performance on gfx1151 has been quite inconsistent, and in some cases older versions were genuinely faster. But with the kernel fixes in place, and with AMD now clearly moving forward with Strix Halo as a supported AI platform, especially after the Ryzen AI Halo announcement at CES 2026, it no longer makes sense to anchor on old stacks. Performance improvements will eventually land in the latest ROCm versions, even if there are still some regressions today due to a mix of ROCm and, for example, llama.cpp changes. These are all being worked on as we speak, so over time I'll be retiring the older ROCm toolboxes and focusing on the latest stack. As a result, I can now release a Strix Halo toolbox focused on ComfyUI with proper benchmarks and stability, similar to what I did with the Radeon R9700. That toolbox is ready, and a dedicated video on ComfyUI performance and workflows on Strix Halo is coming next.
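The transcript's compatibility rule hinges on a single kernel-version threshold (6.18.4). Below is a minimal sketch of that check, assuming GNU coreutils `sort -V` and a `uname -r` string such as `6.12.66-1-lts`. The function names `version_ge` and `rocm_path_for_kernel` are hypothetical and are not part of archangel or archsetup. The sketch only classifies the kernel; matching the ROCm build (nightly or 7.2 versus 6.4 or 7.1) is still a manual step.

```shell
#!/usr/bin/env bash
# Sketch: classify a kernel version against the 6.18.4 threshold described
# in the transcript. Function names are hypothetical, not from archangel.
set -eu

# version_ge A B: succeed when version A >= version B.
# GNU `sort -V` orders version strings; if B sorts first (or ties), A >= B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# rocm_path_for_kernel KVER: print "new" for kernels that need the fixed
# ROCm (nightly/7.2), "old" for the ROCm 6.4/7.1 compatibility path.
rocm_path_for_kernel() {
    local kver="${1%%-*}"   # drop the local suffix: "6.12.66-1-lts" -> "6.12.66"
    if version_ge "$kver" "6.18.4"; then
        echo new
    else
        echo old
    fi
}

rocm_path_for_kernel "$(uname -r)"
```

For example, `rocm_path_for_kernel 6.12.66-1-lts` (the linux-lts kernel recorded for ratio) prints `old`, so only the ROCm 6.4/7.1 path applies on that kernel.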
diff --git a/assets/cogito-hardware-specs.txt b/assets/cogito-hardware-specs.txt deleted file mode 100644 index 7a3b285..0000000 --- a/assets/cogito-hardware-specs.txt +++ /dev/null @@ -1,17 +0,0 @@ -Framework Desktop ML (cogito) Hardware Specifications - -Model: AMD Ryzen AI MAX+ 395 (Strix Halo) -CPU: 16 cores Zen 5 -GPU: Radeon 8060S (40 RDNA 3.5 CUs) -Total RAM: 128GB unified memory -Max VRAM: 96GB (via AMD Variable Graphics Memory) -NPU: XDNA 2, 50+ peak AI TOPS -GPU Arch: gfx1151 -Peak Perf: 59.4 FP16/BF16 TFLOPS @ 2.9GHz - -Inference Performance (AMD testing with LM Studio 0.3.11 / llama.cpp 1.18): -- Small models (1-3B): ~100+ tokens/sec -- Medium models (7-8B): ~60-80 tokens/sec -- Large models (20B): ~58 tokens/sec -- Very large models (120B): ~38 tokens/sec -- Context support: Up to 256K tokens with Flash Attention @@ -421,7 +421,8 @@ if grep -q "file_permissions=" "$PROFILE_DIR/profiledef.sh"; then fi # Copy archsetup into airootfs (exclude large/unnecessary directories) -if [[ -d /home/cjennings/code/archsetup ]]; then +ARCHSETUP_DIR="${ARCHSETUP_DIR:-$HOME/code/archsetup}" +if [[ -d "$ARCHSETUP_DIR" ]]; then info "Copying archsetup into ISO..." 
mkdir -p "$PROFILE_DIR/airootfs/code" rsync -a --exclude='.git' \ @@ -430,7 +431,7 @@ if [[ -d /home/cjennings/code/archsetup ]]; then --exclude='test-results' \ --exclude='*.qcow2' \ --exclude='*.iso' \ - /home/cjennings/code/archsetup "$PROFILE_DIR/airootfs/code/" + "$ARCHSETUP_DIR" "$PROFILE_DIR/airootfs/code/" fi # Pre-populate tealdeer (tldr) cache for offline use diff --git a/scripts/build-release b/scripts/build-release index 83f34b9..bdd6711 100755 --- a/scripts/build-release +++ b/scripts/build-release @@ -139,7 +139,7 @@ distribute_truenas() { fi info "Copying to $TRUENAS_HOST:$TRUENAS_PATH/" - if $scp_cmd "$ISO_FILE" "cjennings@$TRUENAS_HOST:$TRUENAS_PATH/"; then + if $scp_cmd "$ISO_FILE" "$REAL_USER@$TRUENAS_HOST:$TRUENAS_PATH/"; then info "Done: $TRUENAS_HOST:$TRUENAS_PATH/$ISO_NAME" TRUENAS_SUCCESS=true else diff --git a/todo.org b/todo.org deleted file mode 100644 index b650d94..0000000 --- a/todo.org +++ /dev/null @@ -1,844 +0,0 @@ -* Archangel Open Work -** TODO [#A] Rename ISO to (project)-(YYYY-MM-DD)-vmlinuz-(version).iso -Current format: archangel-vmlinuz-6.12.66-lts-2026-01-24-x86_64.iso -New format: archangel-2026-01-24-vmlinuz-6.12.66-lts-x86_64.iso - -Date should come right after project name for easier sorting and identification. -Update build.sh ISO_NAME variable. - -** TODO [#A] Manually verify LUKS btrfs installations boot correctly -Automated reboot testing for LUKS configs is blocked - can't send passphrase to -initramfs encrypt hook via QEMU. Installation tests pass, but need manual verification -that systems actually boot and decrypt correctly. - -Test on physical hardware or VM with manual interaction: -1. Boot from archangel ISO -2. Run: archangel --config-file with LUKS config -3. Reboot, enter passphrase at GRUB prompt -4. Enter passphrase at initramfs prompt -5. 
Verify system boots to login - -Configs to test: btrfs-luks (single disk), btrfs-mirror-luks (2-disk RAID1) - -See [[file:docs/TESTING-STRATEGY.org][TESTING-STRATEGY.org]] for background on automation limitations. - -** TODO [#A] Fix mkinitcpio configuration in archangel (causes boot failure) -After kernel updates or mkinitcpio regeneration, systems fail to boot because archangel -leaves incorrect mkinitcpio configuration from the live ISO environment. - -See [[file:docs/2026-01-22-mkinitcpio-config-boot-failure.org][bug report]] for full details. - -*** Three issues to fix - -1. *Wrong HOOKS in mkinitcpio.conf* - uses systemd init (incompatible with ZFS hook), missing zfs hook - #+BEGIN_SRC bash - sed -i 's/^HOOKS=.*/HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont block zfs filesystems)/' /mnt/etc/mkinitcpio.conf - #+END_SRC - -2. *Leftover archiso.conf drop-in* - overrides HOOKS setting - #+BEGIN_SRC bash - rm -f /mnt/etc/mkinitcpio.conf.d/archiso.conf - #+END_SRC - -3. *Wrong preset file* - has archiso configuration instead of standard - #+BEGIN_SRC bash - cat > /mnt/etc/mkinitcpio.d/linux-lts.preset << 'EOF' - PRESETS=(default fallback) - ALL_kver="/boot/vmlinuz-linux-lts" - default_image="/boot/initramfs-linux-lts.img" - fallback_image="/boot/initramfs-linux-lts-fallback.img" - fallback_options="-S autodetect" - EOF - #+END_SRC - -4. *Rebuild initramfs after fixing* - #+BEGIN_SRC bash - arch-chroot /mnt mkinitcpio -P - #+END_SRC - -** TODO [#A] Build AUR packages and include in ISO as local repository -Build AUR packages during ISO creation and include them in a local pacman repository. -This allows AUR software to work both in the live environment AND be installable to target systems. - -*** Implementation Plan - -**** 1. 
Create build infrastructure -Add to build.sh or separate script (build-aur.sh): -#+BEGIN_SRC bash -build_aur_packages() { - local aur_packages=(downgrade yay sanoid informant rate-mirrors) - local repo_dir="$PROJECT_DIR/aur-packages" - local build_dir="/tmp/aur-build" - - mkdir -p "$repo_dir" "$build_dir" - - for pkg in "${aur_packages[@]}"; do - info "Building AUR package: $pkg" - git clone --depth 1 "https://aur.archlinux.org/${pkg}.git" "$build_dir/${pkg}" - (cd "$build_dir/${pkg}" && makepkg -s --noconfirm --needed) - cp "$build_dir/${pkg}"/*.pkg.tar.zst "$repo_dir/" - done - - # Create/update repo database - repo-add "$repo_dir/aur.db.tar.gz" "$repo_dir"/*.pkg.tar.zst -} -#+END_SRC - -**** 2. Add local repo to ISO's pacman.conf -In profile/pacman.conf, add: -#+BEGIN_SRC ini -[aur] -SigLevel = Optional TrustAll -Server = file:///usr/share/aur-packages -#+END_SRC - -**** 3. Copy repo into ISO -In build.sh, copy aur-packages/ to profile/airootfs/usr/share/aur-packages/ - -**** 4. Add packages to packages.x86_64 -Once in local repo, packages can be listed normally and pacman will find them. - -**** 5. 
Make available during installation -In install-archzfs, copy local repo to target or mount it: -#+BEGIN_SRC bash -# Copy AUR repo to target for installation -cp -r /usr/share/aur-packages /mnt/usr/share/ -# Add repo to target's pacman.conf temporarily -#+END_SRC - -*** AUR Packages to Include - -**** Essential (Priority A) -| Package | Description | Why needed | -|---------+-------------+------------| -| downgrade | Roll back to previous package versions | Essential for recovery when updates break | -| yay | AUR helper | Users can install additional AUR packages | -| informant | Check Arch news before upgrading | Prevents breaking changes from surprises | -| arch-wiki-lite | Offline Arch Wiki with CLI reader | Documentation when network is down | - -**** ZFS Management (Priority A) -| Package | Description | Why needed | -|---------+-------------+------------| -| sanoid | ZFS snapshot policy management | Automated snapshot creation/pruning | -| syncoid | ZFS replication tool (part of sanoid) | Backup to remote systems | -| zrepl | ZFS replication daemon | Alternative to sanoid for replication | - -**** System Maintenance (Priority B) -| Package | Description | Why needed | -|---------+-------------+------------| -| rate-mirrors | Fast Arch mirror selection | Better than reflector for speed | -| paru | Alternative AUR helper (Rust) | Some prefer over yay | -| pacman-cleanup-hook | Auto-remove old package cache | Disk space management | -| arch-audit | CVE security monitoring | Check for vulnerable packages | - -**** Recovery Tools (Priority B) -| Package | Description | Why needed | -|---------+-------------+------------| -| ventoy-bin | Create multiboot USB drives | Useful rescue tool | -| topgrade | Universal upgrade tool | Update everything at once | -| mkinitcpio-firmware | Suppress firmware warnings | Cleaner initramfs builds | - -**** Nice to Have (Priority C) -| Package | Description | Why needed | -|---------+-------------+------------| -| zfs-auto-snapshot 
| Automatic ZFS snapshots | Simple cron-based snapshots | -| btop | Modern resource monitor | Better than htop | -| duf | Modern disk usage viewer | Better than df | -| dust | Modern du replacement | Intuitive disk usage | -| procs | Modern ps replacement | Better process viewer | - -*** Considerations -- Build must run on Arch Linux (or in Arch container) -- Some AUR packages have dependencies that are also AUR - need to handle build order -- Package versions will be frozen at ISO build time -- Consider caching built packages to speed rebuilds -- May want to GPG sign the local repo for security - -*** Size Estimate -Most AUR packages are small (<5MB each). Estimate ~50-100MB for full suite. -Significantly less than pre-cloning git repos. - -** TODO [#A] Fix ZFS rollback breaking boot (/boot not on ZFS) -ZFS rollbacks can leave the system unbootable because /boot is on a separate EFI partition -that doesn't get rolled back with the ZFS root filesystem. - -*** The Problem -When rolling back ZFS: -- /usr/lib/modules/ (kernel modules) gets rolled back -- /var/lib/pacman/ (package database) gets rolled back -- Everything else on ZFS root gets rolled back - -But /boot (EFI partition) does NOT roll back: -- Kernel images (vmlinuz-*) remain at newer version -- Initramfs images remain (may reference missing modules) -- GRUB config still lists kernels that may not have matching modules - -Result: After rollback, GRUB shows kernels that can't boot because their modules -no longer exist on root. User gets kernel panic or missing module errors. - -*** Why This Matters -- Kernel updates happen frequently and often go unnoticed -- User does ZFS rollback for unrelated reason -- System fails to boot with confusing errors -- Defeats the purpose of ZFS snapshots for easy recovery - -*** Solutions - -**** Option 1: ZFSBootMenu (Recommended) -Replace GRUB with ZFSBootMenu which is designed for ZFS boot environments. 
-- Boots directly from ZFS snapshots -- Kernel and initramfs stored on ZFS (rolled back together) -- Can select boot environment from boot menu -- See existing task below for implementation details - -**** Option 2: Put /boot on ZFS -- GRUB can read ZFS (with limitations) -- Requires careful GRUB configuration -- May have issues with ZFS features GRUB doesn't support - -**** Option 3: Sync /boot snapshots with ZFS -- Script to backup /boot before ZFS snapshot -- Restore /boot when rolling back ZFS -- More complex, error-prone - -**** Option 4: Always rebuild initramfs after rollback -- Document this as required step -- Add helper script to automate -- Doesn't help if kernel package itself was rolled back - -*** References -- https://zfsbootmenu.org/ -- https://wiki.archlinux.org/title/Install_Arch_Linux_on_ZFS -- https://openzfs.github.io/openzfs-docs/Getting%20Started/Arch%20Linux/index.html - -** TODO [#A] Prep archangel for open sourcing -Address all issues preventing the codebase from being publicly released. 
- -*** Critical — Must Fix -- [ ] Remove phone number from docs/protocols.org (also in git history — consider BFG or filter-branch) -- [ ] Remove or gitignore personal docs/ files (session history, personal workflows, protocols with personal info) -- [ ] Create LICENSE file (GPL-3.0 — README already references it) -- [ ] Change SigLevel from "Optional TrustAll" to "Never" in build.sh (already fixed in custom/archangel) - -*** Major — Should Fix -- [ ] Remove hardcoded /home/cjennings/code/archsetup path in build.sh (make configurable or env var) -- [ ] Remove hardcoded distribution paths in build-release (~/downloads/isos, ~/code/archsetup/inbox, truenas.local) -- [ ] Add prominent warnings about SSH root login with known password on live ISO -- [ ] Scrub personal email addresses (c@cjennings.net, craigmartinjennings@gmail.com) from scripts and docs - -*** Moderate — Code Quality -- [ ] Fix ShellCheck SC2155 warnings in custom/lib/*.sh (declare and assign separately) -- [ ] Don't echo root password in build.sh output (line 481) -- [ ] Add set -u to scripts for undefined variable safety -- [ ] Remove dead pacman-key import code (already removed from archangel, verify no remnants) - -*** Minor — Polish -- [ ] Add CONTRIBUTING.md with development setup and contribution guidelines -- [ ] Make OVMF firmware paths configurable in test-vm.sh (not portable across distros) -- [ ] Clean up commented-out code in archangel.conf.example (or add explanations) -- [ ] Standardize shebang lines across scripts (#!/usr/bin/env bash) - -** TODO [#A] Update GRUB and ZFSBootMenu references in README.org - -Note: the information here is old. We have added btrfs to the filesystem choices since this task was written. This task still needs to be completed, keeping in mind that we now use GRUB for btrfs and ZFSBootMenu for ZFS.
- -README.org contains multiple outdated references to GRUB that are now incorrect: -- Line 19: "EFI Boot Redundancy - GRUB installed on all disks" - now uses ZFSBootMenu for ZFS installs -- Lines 417-472: Entire section on "grub-zfs-snap" and GRUB snapshot boot entries - doesn't exist -- Lines 98-100: Project structure lists grub-zfs-snap, zfs-snap-prune, 40_zfs_snapshots - files don't exist - -*** Actions -- Remove/update "EFI Boot Redundancy" line to mention ZFSBootMenu -- Delete or rewrite "ZFS Snapshot Boot Entries (grub-zfs-snap)" section -- Update project structure to reflect actual files -- Update "Post-Installation" section for ZFSBootMenu workflow - -** TODO [#A] Add LICENSE file (GPL-3.0) -README.org line 723 references [[file:LICENSE][LICENSE]] but the file doesn't exist. -Create LICENSE file with GPL-3.0 text as stated in README. - -** TODO [#A] Delete or complete custom/archsetup-zfs -The script has full function definitions but main() just prints "this is a skeleton". -A skeleton script that pretends to work is worse than nothing. - -*** Options -1. Delete it entirely - users can run archsetup from ~/code/archsetup -2. Complete the implementation -3. Replace with a simple launcher that calls archsetup with ZFS-specific flags - -** TODO [#A] Add initial user password to install-archzfs config -Currently hardcoded as "welcome" in archsetup-zfs. Should be configurable via: -- Interactive prompt during install-archzfs -- Config file option for unattended installs -- Document that password must be changed on first login - -** TODO [#A] Add ZFS encrypted volume tests -ZFS native encryption is implemented in lib/zfs.sh but not tested. -All current ZFS test configs use NO_ENCRYPT=yes. - -*** Implementation Details -ZFS encryption uses native AES-256-GCM (not LUKS). The passphrase is: -1. Provided at pool creation: echo "$passphrase" | zpool create ... -O encryption=aes-256-gcm -O keyformat=passphrase -2. Used to derive a wrapping key (PBKDF2); only the wrapped encryption key is stored on disk, never the passphrase itself -3.
Prompted by ZFSBootMenu at boot time (not initramfs encrypt hook) - -*** Test Configs to Create -- zfs-luks.conf - Single disk ZFS with encryption -- zfs-mirror-luks.conf - 2-disk mirror with encryption - -*** Config Format -#+BEGIN_SRC -HOSTNAME=test-zfs-luks -FILESYSTEM=zfs -DISKS=/dev/vda -ZFS_PASSPHRASE=testpassphrase -ROOT_PASSWORD=testpass -ENABLE_SSH=yes -#+END_SRC - -*** Verification Checks -- zpool get feature@encryption zroot (should show enabled) -- zfs get encryption zroot/ROOT (should show aes-256-gcm) -- zfs get keystatus zroot/ROOT (should show available after unlock) - -*** Notes -- ZFSBootMenu handles passphrase prompt at boot -- Test framework may need adjustment if ZFSBootMenu prompt differs from LUKS -- Passphrase unlocks all datasets in pool (single prompt) - -** DONE [#A] Install Arch Wiki on ISO for offline package help -CLOSED: [2026-01-24 Sat] -Added to profile/packages.x86_64: -- arch-wiki-docs (full HTML version at /usr/share/doc/arch-wiki/html/) -- arch-wiki-lite (CLI search via wiki-search command) - -Both are in official repos (not AUR). Documented in RESCUE-GUIDE.txt. - -** DONE [#A] Integrate ZFSBootMenu as alternative boot manager -CLOSED: [2026-01-24 Sat] -Implemented in custom/lib/zfs.sh configure_zfsbootmenu(). -Downloads EFI binary from get.zfsbootmenu.org and installs to /efi/EFI/ZBM/. -ZFS installs use ZFSBootMenu; btrfs installs use GRUB. - -** TODO [#B] grub menu too small on hidpi displays -** TODO [#B] Add --chroot mode to archsetup for in-chroot execution -Enable running archsetup from within install-archangel chroot so users get a fully -configured workstation on first boot instead of running archsetup manually post-reboot. - -*** Required changes to ~/code/archsetup/archsetup - -**** 1. Add --chroot flag to argument parsing -#+BEGIN_SRC bash -chroot_mode=false ---chroot) - chroot_mode=true - shift - ;; -#+END_SRC - -**** 2. 
Skip systemctl start calls (5 locations) -Wrap in chroot check - services will start on reboot: -- Line 712: systemctl start systemd-resolved -- Line 807: systemctl start rngd -- Line 877: systemctl start sshd -- Line 905: systemctl start fail2ban -- Line 1199: systemctl start grub-btrfsd - -**** 3. Skip ping network checks (3 locations) -Network available via live ISO, skip validation: -- Line 243: connectivity check -- Lines 1105-1107: TrueNAS detection - -**** 4. Skip tmpfs mount (line 702) -Not needed/problematic in chroot - use regular directory. - -*** Integration with install-archangel -At end of install, prompt: "Run archsetup for full workstation setup? [y/N]" -If yes: -1. Copy archsetup into chroot (or bind mount) -2. Run: arch-chroot /mnt /path/to/archsetup --chroot --config-file /path/to/config -3. Continue with reboot - -*** What works in chroot (confirmed by scan) -- All systemctl enable calls (20+) -- User creation (useradd, chpasswd, usermod) -- Package installation (pacman, yay/makepkg) -- Git clone operations -- All file/config operations - -** TODO [#B] Delete stale SESSION-CONTEXT.md -SESSION-CONTEXT.md in project root is from 2026-01-19 and references old GRUB workflow. -Superseded by docs/session-context.org. Delete to avoid confusion. - -** TODO [#B] Move PLAN-zfsbootmenu-implementation.org to docs/ -Implementation plan files should be in docs/ or archived after completion. -The plan is complete - ZFSBootMenu is now the bootloader. - -** TODO [#B] Clean up docs/ directory -- Delete docs/someday-maybe.org (empty, 0 bytes) -- Move date-specific docs from assets/ to docs/ for consistency -- Document or delete docs/scripts/ directory (unclear purpose) - -** TODO [#B] Fix Makefile lint target to fail on errors -Current lint target has `|| true` which swallows shellcheck errors: -#+BEGIN_SRC makefile -lint: - @shellcheck -x build.sh scripts/*.sh custom/install-archzfs ... 
|| true -#+END_SRC - -Change to actually fail on lint errors so CI can catch issues. - -** TODO [#B] Document or gitignore unclear directories -These directories exist but aren't documented or gitignored: -- zfs-packages/ - unclear purpose -- reference-repos/ - unclear purpose -- test-logs/ - should probably be gitignored - -** TODO [#B] Fill in README.org #+AUTHOR field -Line 2 has empty #+AUTHOR: - looks unfinished. Add author info. - -** TODO [#B] Set up CI/CD pipeline for automated ISO builds - -*** Options to evaluate -- Self-hosted on TrueNAS (primary target) - - Gitea + Gitea Actions or Drone CI - - Jenkins in a jail/VM - - Woodpecker CI (lightweight Drone fork) -- GitHub Actions (if repo mirrored to GitHub) -- GitLab CI (self-hosted or gitlab.com) - -*** Requirements -- Arch Linux build environment (container or VM) -- Sudo/root access for mkarchiso -- ~10GB disk space per build -- Caching for pacman packages to speed builds - -*** Considerations -- Trigger builds on push to main -- Scheduled builds (weekly?) to catch upstream updates -- Store artifacts (ISO) with retention policy -- Notifications on build failure -- Test automation (boot ISO in QEMU, run checks) - -*** TrueNAS-specific tips -- Use a jail or VM for the CI runner -- Consider bhyve VM with Arch Linux for native builds -- Mount dataset for build artifacts and cache -- Snapshot before/after builds for easy cleanup - -** TODO [#B] Add pre-flight validation to install-archzfs -Validate configuration and environment before any destructive operations. -Fail fast with clear error messages rather than failing mid-install. - -*** Validations to add -- Disk exists and is accessible -- Disk is not mounted or in use -- Sufficient disk space (minimum 20GB recommended) -- Network connectivity (for package downloads) -- Required commands available (zpool, zfs, etc.) 
-- Config file syntax valid (if using unattended mode) -- EFI variables accessible (for UEFI installs) - -*** Benefits -- Prevents partial installations that leave system in bad state -- Clear error messages help users fix issues before starting -- Reduces support burden from avoidable failures -- Aligns with "fail fast" testing principle - -*** Implementation -Add validate_environment() function called before any disk operations: -#+BEGIN_SRC bash -validate_environment() { - local errors=0 - - # Check disk exists (errors=$((errors + 1)) avoids ((errors++)) returning 1 under set -e) - [[ -b "$INSTALL_DISK" ]] || { error "Disk $INSTALL_DISK not found"; errors=$((errors + 1)); } - - # Check neither the disk nor any partition is mounted (mountpoint(1) tests directories, not devices) - lsblk -nro MOUNTPOINT "$INSTALL_DISK" 2>/dev/null | grep -q . && { error "Disk is mounted"; errors=$((errors + 1)); } - - # Check ZFS tools - command -v zpool >/dev/null || { error "zpool not found"; errors=$((errors + 1)); } - - if (( errors > 0 )); then exit 1; fi -} -#+END_SRC - -** TODO [#B] Create Makefile with distinct build targets -Replace or supplement build.sh with a Makefile for cleaner build orchestration. - -*** Proposed targets -- make deps - Install all dependencies (pacman + AUR) needed to build and test -- make lint - Run shellcheck on all bash scripts -- make build - Full ISO build (current build.sh behavior) -- make clean - Remove work/ and output/ directories -- make test - Run VM tests (single disk, mirror, raidz) - depends on lint -- make test-quick - Quick single-disk test only - depends on lint -- make aur - Build AUR packages only -- make iso - Build ISO only (skip AUR if already built) -- make deploy - Copy ISO to truenas.local and USB drive (if present) -- make all - Full build + tests - -*** Benefits -- Familiar interface for developers -- Dependency tracking (rebuild only what changed) -- Parallel execution where possible -- Self-documenting (make help) -- Easy CI/CD integration - -*** Considerations -- Keep build.sh as the underlying implementation -- Makefile calls build.sh with appropriate flags -- Or refactor build.sh logic into Makefile directly - -** TODO [#B] Add
Docker/Podman container support for builds -Idea from: https://github.com/stevleibelt/arch-linux-live-cd-iso-with-zfs - -Use containers with minimal capabilities instead of full root/privileged mode. -This improves security and reproducibility. - -*** Capabilities needed for mkarchiso -- DAC_OVERRIDE -- SYS_ADMIN -- SYS_CHROOT -- SYS_MODULE -- Device: /dev/loop-control - -*** Benefits -- Reproducible builds across different host systems -- No need to install archiso on host -- Cleaner build environment -- Easier CI/CD integration - -** TODO [#B] Support building against Arch Linux Archive snapshots -Idea from: https://github.com/stevleibelt/arch-linux-live-cd-iso-with-zfs - -When archzfs lags behind the main Arch repos, builds can fail due to kernel version mismatch. -Pinning to historical repo snapshots solves this problem. - -*** Implementation -- Add -r/--repo-date flag to build.sh -- Use archive.archlinux.org for historical packages -- Example: ./build.sh -r 2026/01/15 or ./build.sh -r week - -This solves the common problem of ZFS packages not being available for the latest kernel. - -** TODO [#B] Add build logging with tee -Idea from: https://github.com/stevleibelt/arch-linux-live-cd-iso-with-zfs - -Capture all build output to a log file for debugging and CI artifact collection. - -*** Implementation -See ~/code/archsetup for a reference implementation. The approach there provides a better -user experience by hiding confusing (sometimes scary-looking) messages on the console while -preserving complete build output in the log file. Users see clean progress indicators while -full diagnostic information is available for troubleshooting. - -#+BEGIN_SRC bash -# Basic approach -exec &> >(tee "build-$(date +%Y%m%d-%H%M%S).log") - -# Better UX: show progress on console, full output to log -exec 3>&1 4>&2 -exec 1> >(tee -a "$LOG_FILE") 2>&1 -# Then use fd 3 for user-facing output: echo "Installing..." 
>&3 -#+END_SRC - -*** Additional features -- Check log for known error patterns (e.g., DKMS failures) and fail fast -- Rotate old logs to prevent disk space issues -- Include system info header (date, kernel version, etc.) - -** DONE [#B] Implement btrfs support (archangel expansion) -CLOSED: [2026-01-24 Sat] -See [[file:docs/PLAN-archangel-btrfs.org][PLAN-archangel-btrfs.org]] for full implementation plan. - -Completed phases: -- Phase 1: Refactor (ZFS works) -- Phase 2: Single-disk btrfs with snapper/grub-btrfs -- Phase 2.8: LUKS encryption support -- Phase 3: Multi-disk btrfs (RAID0/RAID1) with LUKS - -All tests passing: btrfs-single, btrfs-luks, btrfs-mirror, btrfs-stripe, btrfs-mirror-luks - -** DONE [#B] Add RAID configuration tests (mirror, raidz) -CLOSED: [2026-01-24 Sat] -Test configs exist in scripts/test-configs/: -- mirror.conf (ZFS 2-disk mirror) -- raidz1.conf (ZFS 3-disk raidz1) -- btrfs-mirror.conf (btrfs RAID1) -- btrfs-stripe.conf (btrfs RAID0) -- btrfs-mirror-luks.conf (btrfs RAID1 + LUKS) - -All tests passing via test-install.sh. - -** DONE [#B] Extract install-archzfs into testable functions -CLOSED: [2026-01-24 Sat] -Completed as Phase 1 of btrfs implementation. Refactored into: -- custom/lib/common.sh - Colors, output, prompts -- custom/lib/config.sh - Config file handling, argument parsing -- custom/lib/disk.sh - Partitioning, EFI formatting -- custom/lib/zfs.sh - ZFS pool/dataset creation, ZFSBootMenu -- custom/lib/btrfs.sh - Btrfs volume/subvolume creation, snapper, LUKS - -Main script (archangel) sources libraries and orchestrates install flow. - -** TODO [#C] Review code review workflow document and provide feedback -Review [[file:docs/project-workflows/code-review.org][docs/project-workflows/code-review.org]] and refine based on feedback. -Created: 2026-01-23 -Postponed: After archangel project completion. 
- -** TODO [#C] Standardize shell script conventions -*** Shebang inconsistency -- build.sh: #!/bin/bash -- zfssnapshot: #!/bin/env bash -- archsetup-zfs: #!/bin/sh -Pick one convention (recommend #!/usr/bin/env bash for portability) - -*** Email inconsistency -- Some files: c@cjennings.net -- archsetup-zfs: craigmartinjennings@gmail.com -Standardize to one email address. - -** TODO [#C] Add .editorconfig for consistent formatting -No project-wide formatting rules. Add .editorconfig to enforce: -- Indent style (spaces vs tabs) -- Indent size -- End of line -- Trim trailing whitespace -- Final newline - -** TODO [#C] Consolidate test scripts documentation -scripts/ has multiple test files with unclear relationships: -- test-vm.sh - Manual VM testing -- sanity-test.sh - Quick automated checks -- test-install.sh - Installation testing -- full-test.sh - Comprehensive testing -- test-zfs-snap-prune.sh - Unit tests for prune script - -Document the testing strategy and when to use each script. - -** TODO [#C] Consider adding bootable archzfs ISO to GRUB boot menu -Store the archzfs ISO on disk and add a GRUB menu entry to boot it directly - no USB drive needed for recovery/reinstall. - -*** Benefits -- Always have a rescue environment available -- Can reinstall or rollback without external media -- Useful for remote/headless servers - -*** Challenges -1. Storage location - ISO is 5GB. Can't live on ZFS (GRUB can't read it). Options: - - EFI partition (currently 1GB - would need to be larger) - - Dedicated recovery partition (ext4 or FAT32) - - Second EFI partition just for the ISO - -2. GRUB loopback boot - Arch ISOs support this with the right kernel params: - #+BEGIN_SRC - menuentry "Archzfs Recovery" { - loopback loop /path/to/archzfs.iso - linux (loop)/arch/boot/x86_64/vmlinuz-linux archisolabel=ARCHZFS - initrd (loop)/arch/boot/x86_64/initramfs-linux.img - } - #+END_SRC - -3. 
Keeping it updated - Would need a mechanism to update the ISO when rebuilding - -*** Questions to resolve -- Is this for recovery scenarios, or would you actually reinstall from it? -- Would you want this integrated into the installer (auto-create recovery partition)? -- Or just document how to set it up manually? - -** TODO [#C] Research mkosi as alternative to mkarchiso -Investigate whether mkosi (systemd project) offers advantages over mkarchiso. - -*** Comparison -| Aspect | mkarchiso | mkosi | -|--------|-----------|-------| -| Purpose | Live ISO images | Disk images, containers, ISOs | -| Config | Shell scripts + file structure | Declarative TOML files | -| Output | ISO9660 (USB/CD) | GPT disk images, tarballs, ISOs | -| Boot | GRUB/syslinux/systemd-boot | UKI (Unified Kernel Images) | -| Distros | Arch only | Arch, Fedora, Debian, Ubuntu | -| Build env | Host or chroot | Container-native, reproducible | - -*** Where mkosi shines -- Reproducible builds - designed for CI/CD, hermetic builds -- Unified Kernel Images - modern secure boot (kernel+initrd+cmdline in one signed EFI) -- VM images - can output raw disk images directly (great for QEMU testing) -- Declarative - TOML config instead of shell scripts - -*** Where mkarchiso is better for us -- Arch ecosystem - all docs, examples, community use it -- ZFS live environment - archiso has the hooks we need -- Proven - we know it works for our use case - -*** Verdict -Keep mkarchiso for now. mkosi could be valuable for: -- VM test images instead of booting ISOs -- Future UKI boot (more secure boot chain) -- Reproducibility when CI/CD becomes important - -*** References -- https://wiki.archlinux.org/title/Mkosi -- https://github.com/systemd/mkosi - -** TODO [#C] Add 1-minute countdown timer before automatic reboot after installation -Display a countdown timer (1 minute) with red text after installation completes, before automatically rebooting the system. 
-Gives user time to review the installation summary and cancel if needed. - -*** Implementation -In install-archzfs, after displaying the completion message: -#+BEGIN_SRC bash -# Red text countdown before reboot -echo -e "\n\033[0;31mSystem will reboot in 60 seconds. Press Ctrl+C to cancel.\033[0m" -for i in {60..1}; do - printf "\r\033[0;31mRebooting in %2d seconds...\033[0m" "$i" - sleep 1 -done -echo -reboot -#+END_SRC - -** TODO [#C] Add negative/failure test cases -Current tests only verify happy path (successful installation). -Add tests for error conditions to ensure graceful failure handling. - -*** Test cases to add -- Installation with insufficient disk space -- Installation with disk that disappears mid-install -- Installation with network failure during pacstrap -- Installation with invalid config file -- Installation on already-mounted disk -- Verify error messages are helpful and actionable - -*** Benefits -- Ensures failures don't leave system in corrupted state -- Validates error messages help users diagnose issues -- Catches regressions in error handling code -- Aligns with quality engineering "error cases" principle - -** TODO [#C] Add install-archzfs --dry-run mode -Show what would be done without making any changes. -Useful for validating configuration before committing to installation. 
- -*** What dry-run should show -- Disk partitioning plan (sizes, types) -- ZFS pool and dataset structure -- Packages that would be installed -- Services that would be enabled -- Bootloader configuration - -*** Implementation approach -- Add DRY_RUN=1 flag checked before destructive operations -- Replace actual commands with echo statements showing what would run -- Validate all inputs and configuration -- Exit with success if everything validates - -*** Benefits -- Users can verify configuration before destroying data -- Easier debugging of configuration issues -- Supports "measure twice, cut once" workflow -- Can be used in CI to validate config without full install - -** TODO [#C] Pre-clone useful tools and documentation into ISO -Idea from: https://github.com/stevleibelt/arch-linux-live-cd-iso-with-zfs - -Bundle Git repos (without .git dirs) into /root for offline access: -- archinstall (official installer) -- downgrade (package rollback) -- ZFS howtos and documentation -- Recovery scripts - -Already partially implemented (have rescue tools), but could expand with: -- Pre-cloned arch-linux-configuration scripts -- ZFS administration cheatsheets -- Offline troubleshooting guides - -** TODO [#C] Add environment file configuration (.env pattern) -Idea from: https://github.com/stevleibelt/arch-linux-live-cd-iso-with-zfs - -Allow build customization via .env file instead of command-line flags. -Useful for CI/CD and reproducible builds. - -*** Example .env -#+BEGIN_SRC -KERNEL=linux-lts -USE_DKMS=1 -BE_VERBOSE=0 -PACKAGES_TO_ADD=git,vim -PACKAGES_TO_REMOVE=b43-fwcutter -#+END_SRC - -** TODO [#C] Add dry-run mode to build.sh -Idea from: https://github.com/stevleibelt/arch-linux-live-cd-iso-with-zfs - -Support --dry-run flag that shows what would be done without executing. -Useful for testing configuration changes and debugging. 
- -** TODO [#D] Support multi-variant ISO builds -Idea from: https://github.com/stevleibelt/arch-linux-live-cd-iso-with-zfs - -The reference project builds 8 variants automatically: -- linux vs linux-lts kernel -- DKMS vs native ZFS packages -- Default vs experimental archzfs repos - -Very low priority. We're focused on robustness and compatibility first, bleeding edge last. -The linux-lts + DKMS combination provides maximum stability and hardware compatibility. -Only consider this if there's clear user demand for bleeding-edge kernel support. - -* Archangel Resolved - -** DONE [#B] Add zfsrollback and zfssnapshot scripts to ISO -CLOSED: [2026-01-19 Sun] -Include dedicated ZFS snapshot management scripts in the archzfs ISO rather than archsetup. -These tools belong here since they're useful for rescue scenarios and post-install management. - -** DONE [#C] Consider Avahi for USB boot disk discoverability -CLOSED: [2026-01-19 Sun] -Make the live ISO discoverable on the network by name (e.g., archzfs.local) when booted. - -Implemented in commit 0bd172a: -- Added avahi and nss-mdns packages to ISO -- Enabled avahi-daemon.service -- Set hostname to "archzfs" -- Live ISO now accessible as root@archzfs.local - -** DONE [#B] Add Avahi mDNS to installed systems -CLOSED: [2026-01-19 Sun] -Matches archsetup's implementation: install avahi + nss-mdns, enable avahi-daemon. - -Added to install-archzfs: -- Packages: avahi, nss-mdns (in pacstrap) -- Service: avahi-daemon enabled - -After installation, system will be accessible as <hostname>.local on the local network. - -** DONE [#B] Add config file information to README -Config file format documented in README.org with full reference and examples. - -** DONE [#B] Add CI/CD test infrastructure -Added Makefile, test-install.sh, and test configs for automated VM testing. - -** CANCELLED [#C] Consider Dialog-Based Interface for Status, Information, and Questions -Using fzf instead. 
- -** DONE [#C] Consider fzf interface for choices -Implemented fzf for timezone, locale, keymap, disk, RAID, and WiFi selection. - -** DONE [#A] Create comprehensive project documentation (README.org) -CLOSED: [2026-01-18 Sun 02:01] - -** DONE [#C] Add date/timestamp to install-archzfs log -Log filename now includes timestamp: /tmp/install-archzfs-YYYY-MM-DD-HH-MM-SS.log -Also includes header with start time inside the log file. - -** DONE [#B] Add common recovery tools to archzfs ISO -CLOSED: [2026-01-18 Sat] -Make the ISO double as a general-purpose recovery disk. -See custom/RESCUE-GUIDE.txt for comprehensive documentation of all tools. - -** TODO [#C] Consider adding btrfs RAID10 support -Multi-disk btrfs currently only supports RAID1 (mirror). -RAID10 (striped mirrors) would require 4+ disks but offers better performance. -Evaluate if there are real use cases before implementing. |
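The removed todo.org sketched a validate_environment() pre-flight check for install-archzfs. As written, that sketch has three shell pitfalls:

- `((errors++))` evaluates to 0 when errors is 0, so it returns status 1. Because it sits after the final `||`, that can abort a script running under `set -e`.
- `mountpoint -q` tests whether a path is a mount point. Given a device node it will not report mounted partitions, and the unquoted glob can expand to several arguments.
- The closing `[[ $errors -gt 0 ]] && exit 1` leaves the function returning 1 when every check passes.

A minimal corrected sketch follows. It assumes INSTALL_DISK names a whole-disk block device and that error() is a simple stderr logger. Both are hypothetical here, and the real installer may define them differently.

```shell
#!/usr/bin/env bash
# Pre-flight sketch for install-archzfs. INSTALL_DISK and error() are
# assumptions for illustration, not the installer's actual definitions.
set -euo pipefail

# Hypothetical logger: print an error message to stderr.
error() { printf 'ERROR: %s\n' "$*" >&2; }

validate_environment() {
    local errors=0 cmd

    # The target must exist and be a block device.
    if [[ ! -b "${INSTALL_DISK:-}" ]]; then
        error "Disk ${INSTALL_DISK:-<unset>} not found"
        errors=$((errors + 1))
    else
        # lsblk prints one MOUNTPOINT line for the disk and each partition;
        # any non-empty line means something on it is mounted or is active swap.
        if lsblk -nro MOUNTPOINT "$INSTALL_DISK" | grep -q .; then
            error "Disk $INSTALL_DISK (or one of its partitions) is mounted"
            errors=$((errors + 1))
        fi
    fi

    # ZFS userland tools must be on PATH.
    for cmd in zpool zfs; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            error "$cmd not found"
            errors=$((errors + 1))
        fi
    done

    # errors=$((errors + 1)) never returns a failing status, unlike ((errors++)).
    # Report once, then return (not exit) so the caller decides what happens next.
    if (( errors > 0 )); then
        error "$errors pre-flight check(s) failed"
        return 1
    fi
    return 0
}
```

A caller would run it before any disk operation, for example `validate_environment || exit 1`. Returning a status instead of exiting inside the function keeps the check testable in isolation.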
