#+TITLE: System freezes during mkinitcpio -P rebuild
#+DATE: 2026-01-22

* Problem Summary
After fixing the mkinitcpio configuration issues (see 2026-01-22-mkinitcpio-config-boot-failure.org), the system booted successfully. However, running =mkinitcpio -P= again froze the system and required a power cycle.

This indicates that the mkinitcpio config fix was correct, but that a separate issue causes freezes during initramfs rebuilds.

* Timeline
1. System wouldn't boot due to a broken mkinitcpio config (wrong HOOKS, missing zfs)
2. Booted from the archzfs live ISO
3. Fixed mkinitcpio.conf and the preset file, removed the archiso.conf drop-in
4. Rebuilt initramfs via chroot: completed successfully
5. Rebooted: system booted successfully
6. Ran =mkinitcpio -P= again: system froze
7. Power cycled; now back on the live ISO

* What This Tells Us
The mkinitcpio configuration fix was correct (the system booted with it). Separately, something about running mkinitcpio on the booted system triggers a freeze.

* Suspected Cause: AMD GPU Power Gating Bug
ratio has an AMD Strix Halo GPU (RDNA 3.5) with a known VPE power gating bug. When the VPE (Video Processing Engine) tries to power gate after 1 second of idle, the SMU (System Management Unit) hangs and the system freezes.

Kernel messages logged before the freeze:
#+begin_example
amdgpu: SMU: I'm not done with your previous command
amdgpu: Failed to power gate VPE!
[drm:vpe_set_powergating_state] *ERROR* Dpm disable vpe failed, ret = -62
#+end_example

(=-62= is =-ETIME=: the SMU command timed out.)

The fix is to disable power gating via =/etc/modprobe.d/amdgpu.conf=:
#+begin_example
options amdgpu pg_mask=0
#+end_example

*CRITICAL*: After creating this file, you must run =mkinitcpio -P= to include it in the initramfs. The modconf hook reads /etc/modprobe.d/ at build time, and if amdgpu is loaded from the initramfs (as with the kms hook), it only sees the options that were packed into the image.

* The Chicken-and-Egg Problem
1. Running =mkinitcpio -P= is needed to apply the GPU fix (include amdgpu.conf in the initramfs)
2. But running =mkinitcpio -P= triggers the GPU freeze
3. So the fix can't be applied on the booted system, because applying it causes the problem it's meant to fix

* Possible Solutions to Investigate
** Option 1: Apply GPU fix at runtime before mkinitcpio
Before running mkinitcpio, manually set pg_mask at runtime:
#+begin_src bash
# Caveat: pg_mask appears to be registered read-only (0444), so this write will likely be rejected
echo 0 | sudo tee /sys/module/amdgpu/parameters/pg_mask
#+end_src

Then run mkinitcpio while power gating is disabled. This might prevent the freeze, but it is unlikely to work as written: as far as we can tell, amdgpu exposes =pg_mask= as a read-only (0444) module parameter, so the write should fail even as root. The mask is also only applied when the driver initializes the GPU, so changing it afterwards would not affect an already-running device.

** Option 2: Build initramfs from live ISO
Boot from the archzfs live ISO (which doesn't show the GPU issue), mount the system, and rebuild the initramfs from there. The live ISO presumably leaves the GPU in a different driver state.

We tried this and it worked: the rebuild completed. But running mkinitcpio afterwards on the booted system froze again, so this only gets one clean rebuild and does not fix the underlying problem.

** Option 3: Add amdgpu.conf before rebuilding from live ISO
When rebuilding from the live ISO:
1. Create /etc/modprobe.d/amdgpu.conf with pg_mask=0
2. Rebuild the initramfs
3. Boot: the GPU fix should now be in effect
4. Future mkinitcpio runs might then not freeze

This might work because the initramfs would load amdgpu with power gating disabled from the start, so the VPE never tries to power gate.

** Option 4: Wait for kernel 6.18+
The upstream fix (VPE_IDLE_TIMEOUT increased from 1s to 2s) is in kernel 6.15+. linux-lts follows the LTS series, so it should pick up the fix once it moves to the 6.18 LTS; at that point the workaround won't be needed.
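To tell when Option 4's condition is met, a check of the running kernel against the 6.18 threshold can be sketched as follows. It assumes GNU coreutils =sort -V= for version ordering and Arch's =uname -r= format (e.g. =6.12.66-1-lts=); the function name =kernel_at_least= is hypothetical, not part of any tool.

#+begin_src bash
#!/usr/bin/env bash
# Sketch (assumption): compare the running kernel with the 6.18 threshold
# from Option 4. kernel_at_least is a hypothetical helper, not a real tool.

# Succeed (exit 0) if version $1 is equal to or newer than version $2.
kernel_at_least() {
    local have="$1" want="$2"
    # sort -V orders versions ascending; if the lower of the two is
    # "want", then "have" is equal to or newer than "want".
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Strip the package release and flavour suffix: "6.12.66-1-lts" -> "6.12.66"
running="$(uname -r)"
running="${running%%-*}"

if kernel_at_least "$running" 6.18; then
    echo "kernel $running >= 6.18: amdgpu.conf workaround may be removable"
else
    echo "kernel $running < 6.18: keep options amdgpu pg_mask=0"
fi
#+end_src

On ratio's current kernel this takes the "keep" branch, since 6.12.66 sorts before 6.18.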
Current: linux-lts 6.12.66
Target: linux-lts 6.18

* Current State of ratio
- Booted to archzfs live ISO
- ZFS pool: zroot (mirror of nvme0n1p2 + nvme1n1p2)
- mkinitcpio.conf: FIXED (has correct HOOKS with zfs)
- /etc/mkinitcpio.conf.d/archiso.conf: REMOVED
- /etc/mkinitcpio.d/linux-lts.preset: FIXED
- /etc/modprobe.d/amdgpu.conf: EXISTS but may not be in initramfs
- Current pg_mask value on booted system: Unknown (need to check after boot)

* Verification Commands
Check if GPU fix is active:
#+begin_src bash
cat /sys/module/amdgpu/parameters/pg_mask
# Should return: 0
# If it returns 4294967295 (0xFFFFFFFF, the default), the fix is NOT active
#+end_src

Check if amdgpu.conf is in initramfs:
#+begin_src bash
# Match the config file itself; a bare "grep amdgpu" also matches the amdgpu.ko module
lsinitcpio /boot/initramfs-linux-lts.img | grep 'modprobe.d/amdgpu.conf'
#+end_src

* Recovery Procedure (Option 3 - recommended)
From the archzfs live ISO, with the installed system mounted under /mnt (not over the live ISO's own root):
#+begin_src bash
# Import the pool with an altroot so its datasets mount under /mnt
zpool import -f -R /mnt zroot
zfs mount zroot/ROOT/default
mount /dev/nvme0n1p1 /mnt/boot

# Ensure the GPU fix file exists on the installed system
cat > /mnt/etc/modprobe.d/amdgpu.conf << 'EOF'
# Disable power gating to prevent VPE freeze on Strix Halo GPUs
# Remove this file when linux-lts reaches 6.18+
options amdgpu pg_mask=0
EOF

# Bind the live system's /dev, /sys, /proc and /run into the chroot
mount --rbind /dev /mnt/dev
mount --rbind /sys /mnt/sys
mount --rbind /proc /mnt/proc
mount --rbind /run /mnt/run

# Rebuild initramfs inside the installed system (modconf should pick up amdgpu.conf)
chroot /mnt mkinitcpio -P

# Verify amdgpu.conf is in the initramfs (match the config, not just the module)
lsinitcpio /mnt/boot/initramfs-linux-lts.img | grep 'modprobe.d/amdgpu.conf'

# Unmount everything and export the pool cleanly, then reboot and test
umount -R /mnt
zpool export zroot
reboot
#+end_src

After reboot, verify that pg_mask=0 is active, then test =mkinitcpio -P= again.

* Related Files
- [[file:2026-01-22-mkinitcpio-config-boot-failure.org]] - The config fix that was applied
- archsetup NOTES.org - AMD GPU freeze diagnosis details

* Machine Details
- Machine: ratio (desktop)
- CPU: AMD (Strix Halo)
- GPU: AMD RDNA 3.5 (integrated)
- Storage: Two NVMe in ZFS mirror
- Kernel: linux-lts 6.12.66-1
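The two checks from the Verification Commands section can be combined into one post-reboot status report. This is a sketch under assumptions: it reuses the image path from this note, relies on =lsinitcpio= (shipped with mkinitcpio) for the listing, and the helper names =check_pg_mask= and =check_initramfs= are hypothetical.

#+begin_src bash
#!/usr/bin/env bash
# Sketch (assumption): combined status check for the pg_mask workaround.
# check_pg_mask and check_initramfs are hypothetical helper names.

PARAM=/sys/module/amdgpu/parameters/pg_mask
IMG=/boot/initramfs-linux-lts.img

# Describe a pg_mask value as read from sysfs.
check_pg_mask() {
    case "$1" in
        0)          echo "active" ;;
        4294967295) echo "inactive (default: all power gating enabled)" ;;
        "")         echo "unknown (is amdgpu loaded?)" ;;
        *)          echo "custom mask $1" ;;
    esac
}

# Succeed if the initramfs listing contains the modprobe.d config file.
check_initramfs() {
    lsinitcpio "$1" 2>/dev/null | grep -q 'modprobe.d/amdgpu.conf'
}

# An unreadable or missing sysfs file yields an empty value ("unknown").
value="$(cat "$PARAM" 2>/dev/null)"
echo "pg_mask: $(check_pg_mask "$value")"

if check_initramfs "$IMG"; then
    echo "initramfs: amdgpu.conf present in $IMG"
else
    echo "initramfs: amdgpu.conf NOT found in $IMG"
fi
#+end_src

Run it as root, since the initramfs image may not be world-readable.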