aboutsummaryrefslogtreecommitdiff
path: root/docs/retrospectives
diff options
context:
space:
mode:
Diffstat (limited to 'docs/retrospectives')
-rw-r--r--docs/retrospectives/2026-01-22-ratio-boot-fix.org45
1 files changed, 45 insertions, 0 deletions
diff --git a/docs/retrospectives/2026-01-22-ratio-boot-fix.org b/docs/retrospectives/2026-01-22-ratio-boot-fix.org
new file mode 100644
index 0000000..430a03d
--- /dev/null
+++ b/docs/retrospectives/2026-01-22-ratio-boot-fix.org
@@ -0,0 +1,45 @@
+#+TITLE: Retrospective: Ratio Boot Fix
+#+DATE: 2026-01-22
+
+* Summary
+
+Diagnosed and fixed boot failures on ratio (Framework Desktop, AMD Strix Halo).
+Root cause: missing linux-firmware 20260110. Secondary issues from chroot cleanup mistakes.
+
+See [[file:../2026-01-22-ratio-boot-fix-session.org][full session doc]] for technical details.
+
+* What Went Well
+
+- Systematic diagnosis - isolated variables (firmware vs kernel vs ZFS)
+- External research - video transcript and community posts gave us the firmware fix
+- Documentation - captured everything in session doc for future reference
+- Collaboration sync - after feedback, stayed in step and confirmed each action
+- ZFS snapshots - rollback capability enabled safe experimentation
+
+* What Didn't Go Well
+
+- Jumped ahead repeatedly - rebooting without confirming, running commands without checking in
+- Chroot cleanup mistakes - left mountpoint=legacy and /mnt prefixes causing boot failures
+- Wrong assumptions - initially assumed kernel 6.15 was the fix; firmware was the real issue
+- UUID mistake - used wrong boot partition UUID (didn't account for mirrored NVMe)
+- SSH debugging waste - spent time on sshpass issues when keys would have been simpler
+
+* Behavioral Lessons (Added to PRINCIPLES.org)
+
+1. *Sync Before Action* - Confirm before destructive actions, wait for go-ahead
+2. *Clean Up After Yourself* - Reset mountpoints after chroot, verify before export
+3. *Verify Assumptions* - When "should work" doesn't, question the assumption
+4. *Patience Over Speed* - "Wait, wait, wait" improved our effectiveness
+
+* What Would We Do Differently
+
+- Set up SSH keys at start of remote troubleshooting session
+- Create a chroot cleanup checklist and follow it every time
+- State the plan and wait for confirmation before each reboot
+- Test one fix at a time instead of stacking changes
+
+* Action Items
+
+- [X] Create PRINCIPLES.org with behavioral lessons
+- [X] Add retrospective workflow to protocols
+- [X] Document session for future reference