| author | Craig Jennings <c@cjennings.net> | 2026-02-22 23:23:59 -0600 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-02-22 23:23:59 -0600 |
| commit | 77fdae15fb5ad1498d8b006104a0d6fd151060bb (patch) | |
| tree | 0b0e00f580f2130cad98e2f2a9e2e7175116a7c0 /assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org | |
| parent | 8ef5c24abd55d3fe7dafe93687be5928e0152c02 (diff) | |
clean personal info and private files from repository
- Remove personal hardware specs, machine-specific troubleshooting docs,
and video transcript from assets/
- Remove stale PLAN-zfsbootmenu-implementation.org (feature complete)
- Remove .stignore (Syncthing config, not project-relevant)
- Untrack todo.org (personal task tracker with private infra details)
- Make archsetup path configurable via ARCHSETUP_DIR env var in build.sh
- Use $REAL_USER instead of hardcoded username in build-release scp
Diffstat (limited to 'assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org')
| -rw-r--r-- | assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org | 152 |
1 files changed, 0 insertions, 152 deletions
diff --git a/assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org b/assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org
deleted file mode 100644
index 1132ddd..0000000
--- a/assets/2026-01-22-mkinitcpio-freeze-during-rebuild.org
+++ /dev/null
@@ -1,152 +0,0 @@
-#+TITLE: System freezes during mkinitcpio -P rebuild
-#+DATE: 2026-01-22
-
-* Problem Summary
-
-After fixing the mkinitcpio configuration issues (see 2026-01-22-mkinitcpio-config-boot-failure.org), the system successfully booted. However, running =mkinitcpio -P= again caused the system to freeze, requiring a power cycle.
-
-This indicates the mkinitcpio config fix was correct, but there's a separate issue causing freezes during initramfs rebuilds.
-
-* Timeline
-
-1. System wouldn't boot due to broken mkinitcpio config (wrong HOOKS, missing zfs)
-2. Booted from archzfs live ISO
-3. Fixed mkinitcpio.conf, preset file, removed archiso.conf drop-in
-4. Rebuilt initramfs via chroot - completed successfully
-5. Rebooted - system booted successfully
-6. Ran =mkinitcpio -P= again - system froze
-7. Had to power cycle, now back on live ISO
-
-* What This Tells Us
-
-The mkinitcpio configuration fix was correct (system booted). But something about running mkinitcpio itself is triggering a system freeze.
-
-* Suspected Cause: AMD GPU Power Gating Bug
-
-ratio has an AMD Strix Halo GPU (RDNA 3.5) with a known VPE power gating bug. When the VPE (Video Processing Engine) tries to power gate after 1 second of idle, the SMU hangs and the system freezes.
-
-Symptoms before freeze:
-#+begin_example
-amdgpu: SMU: I'm not done with your previous command
-amdgpu: Failed to power gate VPE!
-[drm:vpe_set_powergating_state] *ERROR* Dpm disable vpe failed, ret = -62
-#+end_example
-
-The fix is to disable power gating via =/etc/modprobe.d/amdgpu.conf=:
-#+begin_example
-options amdgpu pg_mask=0
-#+end_example
-
-*CRITICAL*: After creating this file, must run =mkinitcpio -P= to include it in initramfs (the modconf hook reads /etc/modprobe.d/ at build time).
-
-* The Chicken-and-Egg Problem
-
-1. Need to run =mkinitcpio -P= to apply the GPU fix (include amdgpu.conf in initramfs)
-2. But running =mkinitcpio -P= triggers the GPU freeze
-3. The fix can't be applied because applying it causes the problem it's meant to fix
-
-* Possible Solutions to Investigate
-
-** Option 1: Apply GPU fix at runtime before mkinitcpio
-
-Before running mkinitcpio, manually set pg_mask at runtime:
-#+begin_src bash
-echo 0 | sudo tee /sys/module/amdgpu/parameters/pg_mask
-#+end_src
-
-Then run mkinitcpio while power gating is disabled. This might prevent the freeze.
-
-** Option 2: Build initramfs from live ISO
-
-Boot from archzfs live ISO (which doesn't have the GPU issue), mount the system, and rebuild initramfs from there. The live ISO uses a different GPU driver state.
-
-We tried this and it worked - the rebuild completed. But then running mkinitcpio on the booted system froze.
-
-** Option 3: Add amdgpu.conf before rebuilding from live ISO
-
-When rebuilding from live ISO:
-1. Create /etc/modprobe.d/amdgpu.conf with pg_mask=0
-2. Rebuild initramfs
-3. Boot - now the GPU fix should be in effect
-4. Future mkinitcpio runs might not freeze
-
-This might work because the initramfs would load with power gating disabled from the start.
-
-** Option 4: Wait for kernel 6.18+
-
-The upstream fix (VPE_IDLE_TIMEOUT increased from 1s to 2s) is in kernel 6.15+. When linux-lts reaches 6.18, the workaround won't be needed.
-
-Current: linux-lts 6.12.66
-Target: linux-lts 6.18
-
-* Current State of ratio
-
-- Booted to archzfs live ISO
-- ZFS pool: zroot (mirror of nvme0n1p2 + nvme1n1p2)
-- mkinitcpio.conf: FIXED (has correct HOOKS with zfs)
-- /etc/mkinitcpio.conf.d/archiso.conf: REMOVED
-- /etc/mkinitcpio.d/linux-lts.preset: FIXED
-- /etc/modprobe.d/amdgpu.conf: EXISTS but may not be in initramfs
-- Current pg_mask value on booted system: Unknown (need to check after boot)
-
-* Verification Commands
-
-Check if GPU fix is active:
-#+begin_src bash
-cat /sys/module/amdgpu/parameters/pg_mask
-# Should return: 0
-# If returns 4294967295 (0xFFFFFFFF), fix is NOT active
-#+end_src
-
-Check if amdgpu.conf is in initramfs:
-#+begin_src bash
-lsinitcpio /boot/initramfs-linux-lts.img | grep amdgpu
-#+end_src
-
-* Recovery Procedure (Option 3 - recommended)
-
-From archzfs live ISO:
-
-#+begin_src bash
-# Import and mount ZFS
-zpool import -f zroot
-zfs mount zroot/ROOT/default
-mount /dev/nvme0n1p1 /boot
-
-# Ensure GPU fix file exists
-cat > /etc/modprobe.d/amdgpu.conf << 'EOF'
-# Disable power gating to prevent VPE freeze on Strix Halo GPUs
-# Remove this file when linux-lts reaches 6.18+
-options amdgpu pg_mask=0
-EOF
-
-# Mount system directories for chroot
-mount --rbind /dev /dev
-mount --rbind /sys /sys
-mount --rbind /proc /proc
-mount --rbind /run /run
-
-# Rebuild initramfs (should include amdgpu.conf via modconf hook)
-chroot / mkinitcpio -P
-
-# Verify amdgpu.conf is in initramfs
-lsinitcpio /boot/initramfs-linux-lts.img | grep amdgpu
-
-# Reboot and test
-reboot
-#+end_src
-
-After reboot, verify pg_mask=0 is active, then test =mkinitcpio -P= again.
-
-* Related Files
-
-- [[file:2026-01-22-mkinitcpio-config-boot-failure.org]] - The config fix that was applied
-- archsetup NOTES.org - AMD GPU freeze diagnosis details
-
-* Machine Details
-
-- Machine: ratio (desktop)
-- CPU: AMD (Strix Halo)
-- GPU: AMD RDNA 3.5 (integrated)
-- Storage: Two NVMe in ZFS mirror
-- Kernel: linux-lts 6.12.66-1
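
Note (not part of the commit): the removed notes check the GPU workaround by reading `pg_mask` by hand. The sketch below is one way that check might be scripted. It assumes the sysfs path quoted in the notes, and it treats any non-zero value as "power gating still enabled", which is a simplification of the notes' `0` versus `4294967295` comparison. The function name `pg_mask_status` and the `PG_MASK_FILE` override are hypothetical and do not exist in the repository.

```shell
#!/usr/bin/env bash
# Sketch only: classify the amdgpu pg_mask value described in the removed notes.
# pg_mask_status and PG_MASK_FILE are illustrative names, not repository code.

pg_mask_status() {
    local value="$1"
    case "$value" in
        0)  echo "disabled" ;;   # workaround active: power gating is off
        "") echo "unknown" ;;    # parameter could not be read
        *)  echo "enabled" ;;    # any non-zero mask: some power gating remains on
    esac
}

# Path taken from the notes; overridable so the check can be pointed elsewhere.
PG_MASK_FILE="${PG_MASK_FILE:-/sys/module/amdgpu/parameters/pg_mask}"

if [ -r "$PG_MASK_FILE" ]; then
    echo "power gating: $(pg_mask_status "$(cat "$PG_MASK_FILE")")"
else
    echo "power gating: $(pg_mask_status "")"
fi
```

A wrapper could refuse to run `mkinitcpio -P` unless the reported status is `disabled`. Whether that actually avoids the freeze is untested here, as it was in the notes.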
