| field | value | date |
|---|---|---|
| author | Craig Jennings <c@cjennings.net> | 2026-01-19 23:20:24 -0600 |
| committer | Craig Jennings <c@cjennings.net> | 2026-01-19 23:20:24 -0600 |
| commit | 5ed8c9567ca2075d29ac00078fa6dd5ad7580629 | |
| tree | 89349f7c4ae719c928e2fe553d625a35aa8ae92d | |
| parent | 2cf3e88b630b2a6287926c02ca8024ffc5065deb | |
Fix hostid mismatch bug that prevented booting installed systems

Root cause: The `hostid` command returns a value even without /etc/hostid,
but `zgenhostid` generates a DIFFERENT random value. The install script
was calling `hostid` for the GRUB kernel parameter, then later calling
`zgenhostid` to create /etc/hostid - resulting in a mismatch.

ZFS refuses to auto-import pools when spl.spl_hostid doesn't match
/etc/hostid, causing "Failed to mount /sysroot" at boot.

Fix: Generate hostid with zgenhostid FIRST (in configure_bootloader),
then read the consistent value for the GRUB kernel parameter. The
configure_zfs_services function now just copies the already-existing
/etc/hostid to the installed system.

Verified in VM: GRUB and /etc/hostid both show identical values after
installation.
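
The mismatch described above can be checked by hand before rebooting. The following is a minimal sketch and not part of this commit: the function names `hostid_from_file` and `hostid_from_cmdline` are hypothetical, and it assumes a little-endian machine (such as x86_64) where /etc/hostid holds the 32-bit value as four raw bytes, plus a kernel parameter written as `spl.spl_hostid=` followed by hex digits with an optional `0x` prefix.

```bash
# Decode a 4-byte hostid file into the 8-digit hex form that `hostid` prints.
# Assumption: the file stores the value little-endian (true on x86_64).
hostid_from_file() {
    # od prints the first four bytes as hex pairs, e.g. " 56 19 c0 a8"
    set -- $(od -An -tx1 -N4 "$1")
    [ "$#" -eq 4 ] || return 1
    # Reverse the byte order to get the numeric value
    printf '%s%s%s%s\n' "$4" "$3" "$2" "$1"
}

# Extract the spl.spl_hostid value from a kernel command line string.
hostid_from_cmdline() {
    local word
    for word in $1; do
        case $word in
            spl.spl_hostid=*)
                word=${word#spl.spl_hostid=}
                # Drop an optional 0x prefix so both helpers print bare hex
                printf '%s\n' "${word#0x}"
                return 0
                ;;
        esac
    done
    return 1
}
```

On an installed system, `hostid_from_file /etc/hostid` and `hostid_from_cmdline "$(cat /proc/cmdline)"` should print the same eight hex digits when the two values agree.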
| mode | file | lines changed |
|---|---|---|
| -rwxr-xr-x | custom/install-archzfs | 18 |
| -rw-r--r-- | docs/session-context.org | 81 |

2 files changed, 67 insertions, 32 deletions
diff --git a/custom/install-archzfs b/custom/install-archzfs
index c604868..a6679eb 100755
--- a/custom/install-archzfs
+++ b/custom/install-archzfs
@@ -1071,7 +1071,14 @@ EOF
 configure_bootloader() {
     step "Configuring GRUB Bootloader"
 
-    # Get hostid for kernel parameter
+    # Ensure hostid exists BEFORE reading it
+    # This is critical: hostid command returns a value even without /etc/hostid,
+    # but zgenhostid creates a DIFFERENT value. We must generate first, then read.
+    if [[ ! -f /etc/hostid ]]; then
+        zgenhostid
+    fi
+
+    # Now get the consistent hostid for kernel parameter
     local host_id
     host_id=$(hostid)
 
@@ -1156,13 +1163,8 @@ configure_zfs_services() {
     arch-chroot /mnt systemctl enable zfs-import.target
 
     # Copy hostid to installed system (ZFS uses this for pool ownership)
-    if [[ -f /etc/hostid ]]; then
-        cp /etc/hostid /mnt/etc/hostid
-    else
-        # Generate hostid if it doesn't exist
-        zgenhostid
-        cp /etc/hostid /mnt/etc/hostid
-    fi
+    # Note: hostid is generated in configure_bootloader, so it always exists here
+    cp /etc/hostid /mnt/etc/hostid
 
     # Generate zpool cache
     mkdir -p /mnt/etc/zfs
diff --git a/docs/session-context.org b/docs/session-context.org
index aeaeb02..72ca9b4 100644
--- a/docs/session-context.org
+++ b/docs/session-context.org
@@ -1,35 +1,68 @@
 #+TITLE: Session Context
 #+DATE: 2026-01-19
 
-* Current Session: Monday 2026-01-19 17:29 CST (ongoing)
+* Current Session: Monday 2026-01-19 17:29 CST
 
-** Summary
+** FIXED: Hostid mismatch bug causing boot failures
 
-Continued from interrupted session. Rebuilt ISO with Avahi/hostname fixes, tested successfully, deployed, committed. Verified ratio.local install - no errors.
+*** Problem
+ratio wouldn't boot - dropped to emergency mode with "Failed to mount /sysroot"
+Root account locked in initramfs, preventing debugging.
+
+*** Root Cause
+The install script was generating inconsistent hostids:
+- Line 1076: ~hostid~ command returned value X (used for GRUB cmdline)
+- Line 1163: ~zgenhostid~ created /etc/hostid with value Y
+
+The ~hostid~ command returns a value even without /etc/hostid, but ~zgenhostid~
+generates a DIFFERENT random value. ZFS refused to auto-import the pool because
+the hostids didn't match.
+
+*** Fix Applied (custom/install-archzfs)
+Move ~zgenhostid~ call to configure_bootloader() BEFORE reading hostid:
+
+#+BEGIN_SRC bash
+configure_bootloader() {
+    # Ensure hostid exists BEFORE reading it
+    if [[ ! -f /etc/hostid ]]; then
+        zgenhostid
+    fi
+    # Now get the consistent hostid for kernel parameter
+    local host_id
+    host_id=$(hostid)
+    ...
+}
+#+END_SRC
+
+Simplified configure_zfs_services() to just copy /etc/hostid (always exists now).
+
+*** Verification
+Tested in VM - after installation:
+- GRUB hostid: 073ad2a5
+- /etc/hostid: 073ad2a5
+- MATCH confirmed
+
+*** For ratio
+Needs reinstall with fixed ISO, or manual fix:
+#+BEGIN_SRC bash
+# From live ISO booted on ratio:
+printf '\x56\x19\xc0\xa8' > /mnt/etc/hostid # 0xa8c01956 in little-endian
+arch-chroot /mnt mkinitcpio -P
+#+END_SRC
+
+** Earlier in Session
+
+- Rebuilt ISO with Avahi/hostname fixes
+- Tested: hostname "archzfs", avahi active, mDNS works
+- Deployed to ~/Downloads/isos/, TrueNAS, USB drives
+- Committed: 0bd172a (Avahi), 4f9eadb (TODO update)
+- Added TODO for Avahi on installed systems
+- Wrote ISO to new 115G flash drive (/dev/sdb)
 
 ** Commits This Session
 
 | Commit | Description |
 |--------|-------------|
 | 0bd172a | Add Avahi mDNS for easy SSH access, fix ISP firmware path |
-
-** What Was Done
-
-1. Recovered from interrupted session
-2. Built ISO with Avahi/hostname fixes
-3. Sanity tests: 13/13 PASSED
-4. Deployed to ~/Downloads/isos/, TrueNAS, USB drive
-5. VM test: hostname "archzfs", avahi active, mDNS working
-6. Committed and pushed
-7. Verified ratio.local install:
-   - ZFS pool ONLINE (mirror, 7.12TB)
-   - All 15 datasets mounted correctly
-   - No failed services
-   - Only benign warnings (RDSEED32, ZFS CDDL taint, amdgpu timeout)
-   - ISP firmware and journald fixes working
-
-** ratio.local Status
-
-- Framework Desktop, AMD Ryzen AI Max 300
-- Kernel 6.12.66-1-lts, ZFS 2.3.3-1
-- Install: SUCCESS, no errors
+| 4f9eadb | Add TODO for Avahi on installed systems, mark live ISO done |
+| PENDING | Fix hostid mismatch bug in install-archzfs |
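
The manual fix in the session notes writes the four hostid bytes by hand. As a hedged sketch of the same idea for an arbitrary value, the hypothetical helper `write_hostid_file` below (not part of this commit) takes the eight hex digits passed in `spl.spl_hostid` and writes them least-significant byte first. That byte order is an assumption that holds for a little-endian target, matching the comment in the notes.

```bash
# Write a hostid given as 8 hex digits (optional 0x prefix) to a file
# as four raw bytes, least significant byte first.
# Assumption: the target machine is little-endian (e.g. x86_64).
write_hostid_file() {
    local hex="${1#0x}" out="$2" i byte
    # Require exactly eight hex digits
    [ "${#hex}" -eq 8 ] || return 1
    case $hex in
        *[!0-9a-fA-F]*) return 1 ;;
    esac
    # Walk the digit pairs from the least significant byte to the most
    for i in 7 5 3 1; do
        byte=$(printf '%s' "$hex" | cut -c"$i"-"$((i + 1))")
        # Convert the hex pair to an octal escape and emit the raw byte
        printf '%b' "\\0$(printf '%o' "0x$byte")"
    done > "$out"
}
```

For the value in the notes, `write_hostid_file a8c01956 /mnt/etc/hostid` would produce the same bytes as the `printf` line there. The initramfs would still need regenerating afterwards, as the notes show with `mkinitcpio -P`.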
