Diffstat (limited to 'assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org')
-rw-r--r-- assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org | 152
1 files changed, 0 insertions, 152 deletions
diff --git a/assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org b/assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org
deleted file mode 100644
index 1132ddd..0000000
--- a/assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org
+++ /dev/null
@@ -1,152 +0,0 @@
-#+TITLE: System freezes during mkinitcpio -P rebuild
-#+DATE: 2026-01-22
-
-* Problem Summary
-
-After fixing the mkinitcpio configuration issues (see 2026-01-22-mkinitcpio-config-boot-failure.org), the system successfully booted. However, running =mkinitcpio -P= again caused the system to freeze, requiring a power cycle.
-
-This indicates the mkinitcpio config fix was correct, but there's a separate issue causing freezes during initramfs rebuilds.
-
-* Timeline
-
-1. System wouldn't boot due to broken mkinitcpio config (wrong HOOKS, missing zfs)
-2. Booted from archzfs live ISO
-3. Fixed mkinitcpio.conf, preset file, removed archiso.conf drop-in
-4. Rebuilt initramfs via chroot - completed successfully
-5. Rebooted - system booted successfully
-6. Ran =mkinitcpio -P= again - system froze
-7. Had to power cycle, now back on live ISO
-
-* What This Tells Us
-
-The mkinitcpio configuration fix was correct (system booted). But something about running mkinitcpio itself is triggering a system freeze.
-
-* Suspected Cause: AMD GPU Power Gating Bug
-
-The machine (ratio) has an AMD Strix Halo GPU (RDNA 3.5) with a known VPE power gating bug. When the VPE (Video Processing Engine) tries to power gate after 1 second of idle, the SMU hangs and the system freezes.
-
-Symptoms before freeze:
-#+begin_example
-amdgpu: SMU: I'm not done with your previous command
-amdgpu: Failed to power gate VPE!
-[drm:vpe_set_powergating_state] *ERROR* Dpm disable vpe failed, ret = -62
-#+end_example
-
-The fix is to disable power gating via =/etc/modprobe.d/amdgpu.conf=:
-#+begin_example
-options amdgpu pg_mask=0
-#+end_example
-
-*CRITICAL*: After creating this file, you must run =mkinitcpio -P= so that it is included in the initramfs (the modconf hook reads /etc/modprobe.d/ at build time).
-
-* The Chicken-and-Egg Problem
-
-1. Need to run =mkinitcpio -P= to apply the GPU fix (include amdgpu.conf in initramfs)
-2. But running =mkinitcpio -P= triggers the GPU freeze
-3. The fix can't be applied because applying it causes the problem it's meant to fix
-
-* Possible Solutions to Investigate
-
-** Option 1: Apply GPU fix at runtime before mkinitcpio
-
-Before running mkinitcpio, manually set pg_mask at runtime:
-#+begin_src bash
-echo 0 | sudo tee /sys/module/amdgpu/parameters/pg_mask
-#+end_src
-
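-Then run mkinitcpio while power gating is disabled. This might prevent the freeze, but the write may be rejected: amdgpu exposes most of its parameters read-only in sysfs, and pg_mask is normally consulted only when the driver initializes.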
-
-** Option 2: Build initramfs from live ISO
-
-Boot from archzfs live ISO (which doesn't have the GPU issue), mount the system, and rebuild initramfs from there. The live ISO uses a different GPU driver state.
-
-We tried this and it worked - the rebuild completed. But then running mkinitcpio on the booted system froze.
-
-** Option 3: Add amdgpu.conf before rebuilding from live ISO
-
-When rebuilding from live ISO:
-1. Create /etc/modprobe.d/amdgpu.conf with pg_mask=0
-2. Rebuild initramfs
-3. Boot - now the GPU fix should be in effect
-4. Future mkinitcpio runs might not freeze
-
-This might work because the initramfs would load with power gating disabled from the start.
-
-** Option 4: Wait for kernel 6.18+
-
-The upstream fix (VPE_IDLE_TIMEOUT increased from 1s to 2s) is in kernel 6.15+. When linux-lts reaches 6.18, the workaround won't be needed.
-
-Current: linux-lts 6.12.66
-Target: linux-lts 6.18
-
-* Current State of ratio
-
-- Booted to archzfs live ISO
-- ZFS pool: zroot (mirror of nvme0n1p2 + nvme1n1p2)
-- mkinitcpio.conf: FIXED (has correct HOOKS with zfs)
-- /etc/mkinitcpio.conf.d/archiso.conf: REMOVED
-- /etc/mkinitcpio.d/linux-lts.preset: FIXED
-- /etc/modprobe.d/amdgpu.conf: EXISTS but may not be in initramfs
-- Current pg_mask value on booted system: Unknown (need to check after boot)
-
-* Verification Commands
-
-Check if GPU fix is active:
-#+begin_src bash
-cat /sys/module/amdgpu/parameters/pg_mask
-# Should return: 0
-# If it returns 4294967295 (0xFFFFFFFF), the fix is NOT active
-#+end_src
-
-Check if amdgpu.conf is in initramfs:
-#+begin_src bash
-lsinitcpio /boot/initramfs-linux-lts.img | grep 'modprobe.d/amdgpu.conf'
-#+end_src
-
-* Recovery Procedure (Option 3 - recommended)
-
-From archzfs live ISO:
-
-#+begin_src bash
-# Import and mount ZFS under /mnt (assumed altroot, so the live root is not shadowed)
-zpool import -f -R /mnt zroot
-zfs mount zroot/ROOT/default
-mount /dev/nvme0n1p1 /mnt/boot
-
-# Ensure GPU fix file exists
-cat > /mnt/etc/modprobe.d/amdgpu.conf << 'EOF'
-# Disable power gating to prevent VPE freeze on Strix Halo GPUs
-# Remove this file when linux-lts reaches 6.18+
-options amdgpu pg_mask=0
-EOF
-
-# Mount system directories for chroot
-mount --rbind /dev /mnt/dev
-mount --rbind /sys /mnt/sys
-mount --rbind /proc /mnt/proc
-mount --rbind /run /mnt/run
-
-# Rebuild initramfs (should include amdgpu.conf via modconf hook)
-chroot /mnt mkinitcpio -P
-
-# Verify amdgpu.conf is in initramfs
-lsinitcpio /mnt/boot/initramfs-linux-lts.img | grep 'modprobe.d/amdgpu.conf'
-
-# Reboot and test
-reboot
-#+end_src
-
-After reboot, verify pg_mask=0 is active, then test =mkinitcpio -P= again.
-
-* Related Files
-
-- [[file:2026-01-22-mkinitcpio-config-boot-failure.org]] - The config fix that was applied
-- archsetup NOTES.org - AMD GPU freeze diagnosis details
-
-* Machine Details
-
-- Machine: ratio (desktop)
-- CPU: AMD (Strix Halo)
-- GPU: AMD RDNA 3.5 (integrated)
-- Storage: Two NVMe in ZFS mirror
-- Kernel: linux-lts 6.12.66-1
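
The post-reboot check described in the deleted note ("verify pg_mask=0 is active, then test mkinitcpio -P again") could be wrapped in a small guard so the rebuild is never started while power gating is still enabled. This is only a sketch and was not part of the removed file: the function name `pg_mask_is_zero` and the overridable `PG_MASK_FILE` variable are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical guard: succeed only when amdgpu power gating is already disabled.
# PG_MASK_FILE is an assumption added so the check can be pointed at a test file.
PG_MASK_FILE="${PG_MASK_FILE:-/sys/module/amdgpu/parameters/pg_mask}"

pg_mask_is_zero() {
    # True only if the given file is readable and contains exactly "0".
    [ -r "$1" ] && [ "$(cat "$1")" = "0" ]
}

# Usage (as root on the booted system); the rebuild runs only if the check passes:
#   pg_mask_is_zero "$PG_MASK_FILE" && mkinitcpio -P
```

If the check fails, the rebuild is skipped and the shell returns a non-zero status, which suggests falling back to the live-ISO procedure instead.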