Diffstat (limited to 'docs')
-rw-r--r-- docs/2026-01-27-ratio-amd-gpu-suspend-workaround.org | 217
-rw-r--r-- docs/workflows/assemble-email.org | 181
-rw-r--r-- docs/workflows/email.org | 198
-rw-r--r-- docs/workflows/retrospective-workflow.org | 90
4 files changed, 686 insertions, 0 deletions
diff --git a/docs/2026-01-27-ratio-amd-gpu-suspend-workaround.org b/docs/2026-01-27-ratio-amd-gpu-suspend-workaround.org
new file mode 100644
index 0000000..46e403d
--- /dev/null
+++ b/docs/2026-01-27-ratio-amd-gpu-suspend-workaround.org
@@ -0,0 +1,217 @@
+#+TITLE: Ratio AMD GPU Suspend Freeze - Workaround & Fix Tracking
+#+DATE: 2026-01-27
+
+* Summary
+
+Ratio (Framework Desktop, AMD Ryzen AI Max / Strix Halo) freezes hard on
+resume from suspend due to a VPE power gating race condition in the amdgpu
+driver. The freeze requires a hard power cycle, which causes journal
+corruption and can leave the btrfs filesystem read-only.
+
+As of 2026-01-27, the proper kernel fix exists (merged in 6.18) but is
+unusable due to separate CWSR bugs in 6.18+. Ratio runs kernel 6.12 LTS,
+which does not have the fix and will not receive a backport.
+
+A systemd suspend mask is applied as a workaround to prevent the system from
+ever entering the suspend/resume path.
+
+* The Bug
+
+** What Happens
+
+Roughly 8% of suspend/resume cycles on Strix Halo result in a hard system freeze
+approximately 1 second after the screen turns on during resume.
+
+** Root Cause: VPE Power Gating Race Condition
+
+The freeze is caused by a race condition in the amdgpu driver's VPE (Video
+Processing Engine) power management during resume:
+
+1. System resumes from suspend.
+2. amdgpu schedules =amdgpu_device_delayed_init_work_handler= (2s delay) to
+ run self-tests, including =vpe_ring_test_ib= which briefly powers on VPE.
+3. The ring buffer test is very short. VPE goes idle.
+4. After 1 second of idle, =vpe_idle_work_handler= fires and tells the SMU
+ (System Management Unit) to power gate (shut down) VPE.
+5. *But VPE is still at a high DPM level.* Newer VPE firmware only drops DPM
+ back to the lowest level (DPM0) after a workload has run for 2+ seconds.
+ The ring buffer test was too short to trigger that drop.
+6. The SMU tries to power gate VPE while it's at a high DPM level. On Strix
+ Halo, this hangs the SMU.
+7. The SMU hang cascades -- VCN, JPEG, and other GPU IPs can't be managed.
+ Half the GPU is frozen.
+8. The thread that issued the SMU command is stuck. System is locked up.
+ No further logging is possible.
+
+It only triggers on resume because that's when the driver runs the ring
+buffer self-test. During normal operation, VPE either isn't used or has had
+enough time to settle its DPM level before power gating.
+
+** Error Messages (if visible before freeze)
+
+#+begin_example
+SMU: I'm not done with your previous command
+Failed to power gate VPE!
+Dpm disable vpe failed, ret = -62
+Failed to power gate JPEG
+Failed to power gate VCN instance 0
+Dpm disable uvd failed
+#+end_example
+
+** References
+
+- [[https://lkml.org/lkml/2025/8/24/139][Original VPE_IDLE_TIMEOUT patch (LKML, Aug 2025)]]
+- [[https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg130657.html][VPE DPM0 fix v5 (amd-gfx, Oct 2025)]]
+- [[https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg130804.html][Follow-up: missing return statement fix]]
+- [[https://gitlab.freedesktop.org/drm/amd/-/issues/4615][Freedesktop bug #4615]]
+- [[https://community.frame.work/t/attn-critical-bugs-in-amdgpu-driver-included-with-kernel-6-18-x-6-19-x/79221][Framework Community: Critical 6.18/6.19 CWSR bugs]]
+
+* Kernel Fix Status
+
+** The Proper Fix
+
+Mario Limonciello (AMD) wrote =drm/amd: Check that VPE has reached DPM0 in
+idle handler= -- makes the idle handler check that VPE has actually reached
+DPM0 before attempting the power gate. Targets VPE 6.1.1 (Strix Halo) with
+firmware versions below =0x0a640500=.
+
+Merged into Linux 6.18 during the RC phase (drm-fixes-6.18, Oct 29, 2025).
+Closes freedesktop bug #4615.
+
+** Why We Can't Use 6.18
+
+Kernel 6.18.x and 6.19.x have critical CWSR (Compute Wavefront Save/Restore)
+bugs that cause hard GPU hangs on RDNA3/RDNA4 during compute workloads. The
+Framework Community recommends staying on 6.15-6.17 for Strix Halo until
+AMD resolves both VPE and CWSR issues in the same kernel.
+
+** Backport Status
+
+The fix was tagged =Cc: stable@vger.kernel.org= for backport but has NOT
+appeared in any 6.12 LTS release as of 6.12.67. It likely won't be
+backported to 6.12 due to infrastructure differences.
+
+** When to Check Again
+
+Monitor these for resolution:
+- Arch =linux-lts= package updates (=pacman -Si linux-lts=)
+- [[https://cdn.kernel.org/pub/linux/kernel/v6.x/][Kernel.org changelogs]] for 6.12.x stable releases
+- [[https://community.frame.work/t/attn-critical-bugs-in-amdgpu-driver-included-with-kernel-6-18-x-6-19-x/79221][Framework Community thread]] for CWSR resolution status
+- [[https://gitlab.freedesktop.org/drm/amd/-/issues/4615][Freedesktop #4615]] for any further developments
+
+* What We Applied (2026-01-27)
+
+** Workaround: Disable Suspend via systemd
+
+Prevents the system from entering the suspend/resume path entirely.
+The GPU bug is still present but never triggered.
+
+#+begin_src bash
+# Applied 2026-01-27:
+sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
+#+end_src
+
+Effects:
+- hypridle can no longer suspend the system
+- Screen stays on at idle (active power draw)
+- No more freeze → hard reboot → filesystem corruption cycle
+
+** Kernel Parameters NOT Applied
+
+The following parameters were identified as possible workarounds but caused
+boot failures on ratio both times they were tried:
+
+#+begin_example
+amdgpu.pg_mask=0 # Disables all GPU power gating
+amdgpu.cwsr_enable=0 # Disables Compute Wavefront Save/Restore
+#+end_example
+
+It is unclear whether the boot failures were caused by the parameters
+themselves or by a corrupted initramfs from running mkinitcpio while the
+GPU was in a bad state. Testing via the GRUB =e= key (temporary, no
+permanent change) is planned but deferred.
+
+** Current Kernel Command Line (for reference)
+
+#+begin_example
+BOOT_IMAGE=/@/boot/vmlinuz-linux-lts root=UUID=5b9f7f7f-2477-488f-8fb1-52b5c7d90e98
+rw rootflags=subvol=@ console=tty0 console=ttyS0,115200 rw loglevel=2
+rd.systemd.show_status=auto rd.udev.log_level=2 nvme.noacpi=1
+mem_sleep_default=deep nowatchdog random.trust_cpu=off quiet splash
+#+end_example
+
+* How to Undo When a Fixed Kernel Arrives
+
+** Step 1: Verify the Fix is in the New Kernel
+
+Check that the VPE DPM0 fix is present:
+
+#+begin_src bash
+# Check kernel version
+uname -r
+
+# Search for the fix in the changelog
+# Look for "VPE" or "DPM0" or "vpe_idle" in the relevant changelog:
+# https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-<version>
+
+# Or check the source directly:
+grep -r "vpe_need_dpm0_at_power_down\|vpe_get_dpm_level" /usr/src/linux/drivers/gpu/drm/amd/ 2>/dev/null
+#+end_src
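+
+A minimal sketch of the version check, written in Python for scripting. The
+helper names ~parse_release~ and ~at_least~ are hypothetical, and the target
+tuple is a placeholder: the release that finally carries both fixes is not
+known here and has to be filled in by the reader.
+
+#+begin_src python
+import re
+
+def parse_release(release: str) -> tuple[int, int, int]:
+    """Extract (major, minor, patch) from a `uname -r` string."""
+    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
+    if match is None:
+        raise ValueError(f"unrecognized kernel release: {release!r}")
+    major, minor, patch = match.groups()
+    return int(major), int(minor), int(patch or 0)
+
+def at_least(release: str, target: tuple[int, int, int]) -> bool:
+    """True when the running release is at or past the target version."""
+    return parse_release(release) >= target
+
+# Placeholder target: replace with the first release confirmed to carry the fix.
+print(at_least("6.12.67-1-lts", (6, 18, 0)))
+#+end_src
+
+A version comparison is only a first filter. A stable backport can land in an
+older series, so the changelog or source check above remains the actual
+confirmation.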
+
+Also verify that CWSR bugs are resolved (check Framework Community thread).
+
+** Step 2: Unmask Suspend Targets
+
+#+begin_src bash
+sudo systemctl unmask sleep.target suspend.target hibernate.target hybrid-sleep.target
+#+end_src
+
+** Step 3: Test Suspend/Resume
+
+#+begin_src bash
+# Test a single suspend/resume cycle
+sudo systemctl suspend
+
+# If system resumes cleanly, test a few more times
+# The original bug had ~8% failure rate, so test at least 20 cycles
+#+end_src
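+
+How much confidence 20 clean cycles gives can be estimated with a little
+arithmetic. The sketch below assumes each cycle fails independently at the 8%
+rate quoted above, which is a simplification, and the function names are
+illustrative.
+
+#+begin_src python
+import math
+
+def miss_probability(failure_rate: float, cycles: int) -> float:
+    """Chance that all `cycles` resumes succeed although the bug is still present."""
+    return (1.0 - failure_rate) ** cycles
+
+def cycles_needed(failure_rate: float, confidence: float) -> int:
+    """Smallest number of clean cycles that rules out the bug at `confidence`."""
+    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
+
+print(round(miss_probability(0.08, 20), 3))  # 0.189
+print(cycles_needed(0.08, 0.95))             # 36
+#+end_src
+
+Under that assumption, 20 clean cycles still leave roughly a 19% chance that an
+unfixed kernel slipped through; about 36 clean cycles are needed for 95%
+confidence.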
+
+** Step 4: If Kernel Parameters Were Applied
+
+If =amdgpu.pg_mask=0= and =amdgpu.cwsr_enable=0= were added to GRUB, remove
+them once the kernel fix is confirmed working:
+
+#+begin_src bash
+# Edit GRUB config
+sudo vim /etc/default/grub
+# Remove amdgpu.pg_mask=0 and amdgpu.cwsr_enable=0 from GRUB_CMDLINE_LINUX_DEFAULT
+
+# Rebuild GRUB config
+sudo grub-mkconfig -o /boot/grub/grub.cfg
+
+# Reboot and test suspend
+#+end_src
+
+* Log Evidence (2026-01-27 Investigation)
+
+** System Info
+
+- Machine: Framework Desktop (AMD Ryzen AI Max 300 Series)
+- Hostname: ratio
+- Kernel: 6.12.67-1-lts
+- Filesystem: btrfs RAID1 on 2x NVMe (nvme0n1p2 + nvme1n1p2)
+- GPU: AMD Strix Halo (RDNA 3.5)
+
+** Findings
+
+- 13 boots between Jan 25-27, most ending in suspend then hard freeze
+- Journal corruption on boots -7, -5, and -3 (unclean shutdown)
+- =mc= (Midnight Commander) stuck in D state (uninterruptible I/O) during
+ failed freeze attempts, in =io_schedule → folio_wait_bit_common →
+ filemap_read= path
+- Suspend freeze pattern: =PM: suspend entry (deep)= → =PM: suspend exit= →
+ =PM: suspend entry (s2idle)= → no more logs → hard reboot required
+- =mu= database corruption (error 121) from repeated unclean shutdowns
+- btrfs device stats: zero errors on both NVMe drives
+- No explicit BTRFS read-only event logged (freeze kills logging before it
+ can be recorded)
diff --git a/docs/workflows/assemble-email.org b/docs/workflows/assemble-email.org
new file mode 100644
index 0000000..bae647f
--- /dev/null
+++ b/docs/workflows/assemble-email.org
@@ -0,0 +1,181 @@
+#+TITLE: Email Assembly Workflow
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-01-29
+
+* Overview
+
+This workflow assembles documents for an email that will be sent via Craig's email client (Proton Mail). It creates a temporary workspace, gathers relevant documents, drafts the email, and cleans up after sending.
+
+Use this workflow when Craig needs to send an email with multiple attachments that require gathering from various locations in the project.
+
+* When to Use This Workflow
+
+When Craig says:
+- "assemble an email" or "email assembly workflow"
+- "gather documents for an email"
+- "I need to send [person] some documents"
+
+* The Workflow
+
+** Step 1: Create Temporary Workspace
+
+Create a temporary folder at the project root:
+
+#+begin_src bash
+mkdir -p ./tmp
+#+end_src
+
+This folder will hold:
+- Copies of all attachments
+- The draft email text
+
+** Step 2: Identify Required Documents
+
+Discuss with Craig what documents are needed. Common categories:
+- Legal documents (deeds, certificates, agreements)
+- Financial documents (statements, invoices)
+- Correspondence (prior emails, letters)
+- Identity documents (death certificates, ID copies)
+
+For each document:
+1. Locate it in the project
+2. Confirm with Craig it's the right one
+3. Open it in zathura for Craig to verify if needed
+
+** Step 3: Copy Documents to Workspace
+
+*IMPORTANT: Always COPY, never MOVE documents.*
+
+#+begin_src bash
+cp /path/to/original/document.pdf ./tmp/
+#+end_src
+
+After copying, list the workspace contents to confirm:
+
+#+begin_src bash
+ls -lh ./tmp/
+#+end_src
+
+** Step 4: Draft the Email
+
+Create a draft email file in the workspace:
+
+#+begin_src bash
+touch ./tmp/email-draft.txt
+#+end_src
+
+Include:
+- To: (recipient email)
+- Subject: (clear, descriptive subject line)
+- Body: (context, list of attachments, contact info)
+
+The body should:
+- Provide context for why documents are being sent
+- List all attachments with brief descriptions
+- Include Craig's contact information
+
+** Step 5: Open Draft in Emacs
+
+Open the draft for Craig to review and edit:
+
+#+begin_src bash
+emacsclient -n ./tmp/email-draft.txt
+#+end_src
+
+Wait for Craig to finish editing before proceeding.
+
+** Step 6: Craig Sends Email
+
+Craig will:
+1. Open his email client (Proton Mail)
+2. Create a new email using the draft text
+3. Attach documents from the tmp folder
+4. Send the email
+
+** Step 7: Process Sent Email
+
+Once Craig confirms the email was sent:
+
+1. Craig saves the sent email to the inbox
+2. Use the extraction script to process it:
+
+#+begin_src bash
+python3 docs/scripts/extract_attachments.py "./inbox/[email-file].eml"
+#+end_src
+
+3. Read the extracted content to verify
+4. Rename and refile the email appropriately:
+
+#+begin_src bash
+mv "./inbox/[email-file].eml" ./[appropriate-folder]/YYYY-MM-DD-email-to-[recipient]-[topic].eml
+#+end_src
+
+5. Delete any duplicate extracted attachments from inbox
+
+** Step 8: Clean Up Workspace
+
+Delete the temporary folder:
+
+#+begin_src bash
+rm -rf ./tmp/
+#+end_src
+
+* Best Practices
+
+** Document Verification
+
+Before copying documents:
+- Open each one in zathura for Craig to verify
+- Confirm it's the correct version
+- Check that sensitive information is appropriate to send
+
+** Email Draft Structure
+
+A good email draft includes:
+
+#+begin_example
+To: recipient@example.com
+Subject: [Clear Topic] - [Property/Case Reference]
+
+Hi [Name],
+
+[Opening - context for why you're sending this]
+
+[Middle - explanation of what's attached and why]
+
+Attached are the following documents:
+
+1. [Document name] - [brief description]
+2. [Document name] - [brief description]
+3. [Document name] - [brief description]
+
+[Closing - next steps, request for confirmation, offer to provide more]
+
+Thank you,
+
+Craig Jennings
+510-316-9357
+c@cjennings.net
+#+end_example
+
+** Filing Conventions
+
+When refiling sent emails:
+- Use format: YYYY-MM-DD-email-to-[recipient]-[topic].eml
+- File in the most relevant project folder
+- Remove duplicate attachments extracted to inbox
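+
+The naming format can be generated rather than typed by hand. This is a sketch
+only: ~refile_name~ is a hypothetical helper, and lowercasing and hyphenating
+the recipient and topic is an assumption, not something the convention above
+prescribes.
+
+#+begin_src python
+import re
+from datetime import date
+
+def refile_name(sent: date, recipient: str, topic: str) -> str:
+    """Build YYYY-MM-DD-email-to-[recipient]-[topic].eml from its parts."""
+    def slug(text: str) -> str:
+        # Assumed normalization: lowercase, runs of other characters become "-".
+        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
+    return f"{sent.isoformat()}-email-to-{slug(recipient)}-{slug(topic)}.eml"
+
+print(refile_name(date(2026, 1, 29), "Seabreeze", "HOA refund"))
+# 2026-01-29-email-to-seabreeze-hoa-refund.eml
+#+end_src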
+
+* Example Usage
+
+Craig: "I need to send Seabreeze the documents for the HOA refund"
+
+Claude:
+1. Creates ./tmp/ folder
+2. Discusses needed documents (death certificate, closing docs, purchase agreement)
+3. Locates and opens each document for verification
+4. Copies verified documents to ./tmp/
+5. Drafts email and opens in emacsclient
+6. Craig edits, then sends via Proton Mail
+7. Craig saves sent email to inbox
+8. Claude extracts, reads, renames, and refiles email
+9. Claude deletes ./tmp/ folder
diff --git a/docs/workflows/email.org b/docs/workflows/email.org
new file mode 100644
index 0000000..cfd7adf
--- /dev/null
+++ b/docs/workflows/email.org
@@ -0,0 +1,198 @@
+#+TITLE: Email Workflow
+#+AUTHOR: Craig Jennings & Claude
+#+DATE: 2026-01-26
+
+* Overview
+
+This workflow sends emails with optional attachments via msmtp using the cmail account (c@cjennings.net via Proton Bridge).
+
+* When to Use This Workflow
+
+When Craig says:
+- "email workflow" or "send an email"
+- "email [person] about [topic]"
+- "send [file] to [person]"
+
+* Required Information
+
+Before sending, gather and confirm:
+
+1. *To:* (required) - recipient email address(es)
+2. *CC:* (optional) - carbon copy recipients
+3. *BCC:* (optional) - blind carbon copy recipients
+4. *Subject:* (required) - email subject line
+5. *Body:* (required) - email body text
+6. *Attachments:* (optional) - file path(s) to attach
+
+* The Workflow
+
+** Step 1: Gather Missing Information
+
+If any required fields are missing, prompt Craig:
+
+#+begin_example
+To send this email, I need:
+- To: [who should receive this?]
+- Subject: [what's the subject line?]
+- Body: [what should the email say?]
+- Attachments: [any files to attach?]
+- CC/BCC: [anyone to copy?]
+#+end_example
+
+** Step 2: Validate Email Addresses
+
+Look up all recipient names/emails in the contacts file:
+
+#+begin_src bash
+grep -i "[name or email]" ~/sync/org/contacts.org
+#+end_src
+
+*Note:* If contacts.org is empty, check for sync-conflict files:
+#+begin_src bash
+ls ~/sync/org/contacts*.org
+#+end_src
+
+For each recipient:
+1. Search contacts by name or email
+2. Confirm the email address matches
+3. If name not found, ask Craig to confirm the email is correct
+4. If multiple emails for a contact, ask which one to use
+
+** Step 3: Confirm Before Sending
+
+Display the complete email for review:
+
+#+begin_example
+Ready to send:
+
+From: c@cjennings.net
+To: [validated email(s)]
+CC: [if any]
+BCC: [if any]
+Subject: [subject]
+
+[body text]
+
+Attachments: [list files if any]
+
+Send this email? [Y/n]
+#+end_example
+
+** Step 4: Send the Email
+
+Use Python to construct a MIME message and pipe it to msmtp:
+
+#+begin_src bash
+python3 << 'EOF' | msmtp -a cmail [recipient]
+import sys
+from email.mime.multipart import MIMEMultipart
+from email.mime.text import MIMEText
+from email.mime.application import MIMEApplication
+from email.utils import formatdate
+import os
+
+msg = MIMEMultipart()
+msg['From'] = 'c@cjennings.net'
+msg['To'] = '[to_address]'
+# msg['Cc'] = '[cc_address]' # if applicable
+# msg['Bcc'] = '[bcc_address]' # if applicable
+msg['Subject'] = '[subject]'
+msg['Date'] = formatdate(localtime=True)
+
+body = """[body text]"""
+msg.attach(MIMEText(body, 'plain'))
+
+# For each attachment:
+# pdf_path = '/path/to/file.pdf'
+# with open(pdf_path, 'rb') as f:
+# attachment = MIMEApplication(f.read(), _subtype='pdf')
+# attachment.add_header('Content-Disposition', 'attachment', filename='filename.pdf')
+# msg.attach(attachment)
+
+print(msg.as_string())
+EOF
+#+end_src
+
+*Important:* When there are CC or BCC recipients, pass ALL recipients to msmtp:
+#+begin_src bash
+python3 << 'EOF' | msmtp -a cmail to@example.com cc@example.com bcc@example.com
+#+end_src
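+
+Collecting the full recipient list by hand is easy to get wrong, so here is a minimal sketch that derives it from the To, CC and BCC values. The helper names ~envelope_recipients~ and ~msmtp_command~ are hypothetical; only ~email.utils.getaddresses~ comes from the standard library.
+
+#+begin_src python
+from email.utils import getaddresses
+
+def envelope_recipients(to, cc=(), bcc=()):
+    """Return every bare address from To, Cc and Bcc, de-duplicated in order."""
+    seen = []
+    for _name, addr in getaddresses([*to, *cc, *bcc]):
+        if addr and addr not in seen:
+            seen.append(addr)
+    return seen
+
+def msmtp_command(to, cc=(), bcc=(), account="cmail"):
+    """Build the argv list for msmtp; the message itself goes to stdin."""
+    return ["msmtp", "-a", account, *envelope_recipients(to, cc, bcc)]
+
+print(msmtp_command(["to@example.com"], ["Carol <cc@example.com>"], ["bcc@example.com"]))
+#+end_src
+
+The resulting list can be handed to ~subprocess.run(cmd, input=msg.as_bytes(), check=True)~ as an alternative to the shell pipe.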
+
+** Step 5: Verify Delivery
+
+Check the msmtp log for confirmation:
+
+#+begin_src bash
+tail -3 ~/.msmtp.cmail.log
+#+end_src
+
+Look for: ~smtpstatus=250~ and ~exitcode=EX_OK~
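+
+The same check can be scripted. The sketch below tests one log line for both markers; ~delivered~ is a hypothetical helper and the sample line is illustrative, not copied from a real msmtp log.
+
+#+begin_src python
+import re
+
+def delivered(log_line: str) -> bool:
+    """True when the line reports both smtpstatus=250 and exitcode=EX_OK."""
+    status = re.search(r"\bsmtpstatus=(\d+)", log_line)
+    exit_code = re.search(r"\bexitcode=(\w+)", log_line)
+    return bool(status and exit_code
+                and status.group(1) == "250"
+                and exit_code.group(1) == "EX_OK")
+
+# Illustrative line only; real entries carry more fields.
+sample = ("Jan 26 10:15:02 host=127.0.0.1 tls=on auth=on "
+          "recipients=to@example.com smtpstatus=250 exitcode=EX_OK")
+print(delivered(sample))
+#+end_src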
+
+** Step 6: Sync to Sent Folder (Optional)
+
+If Craig wants the email in his Sent folder:
+
+#+begin_src bash
+mbsync cmail
+#+end_src
+
+* msmtp Configuration
+
+The cmail account should be configured in ~/.msmtprc:
+
+#+begin_example
+account cmail
+tls_certcheck off
+auth on
+host 127.0.0.1
+port 1025
+protocol smtp
+from c@cjennings.net
+user c@cjennings.net
+passwordeval "cat ~/.config/.cmailpass"
+tls on
+tls_starttls on
+logfile ~/.msmtp.cmail.log
+#+end_example
+
+*Note:* ~tls_certcheck off~ is used because Proton Bridge uses self-signed certificates on localhost.
+
+* Attachment Handling
+
+** Supported Types
+
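+Common ~_subtype~ values. Note that ~MIMEApplication~ always produces an ~application/<subtype>~ content type, so images and text are better attached with ~MIMEImage~ and ~MIMEText~ from the same ~email.mime~ package:
+- PDF: ~_subtype='pdf'~
+- Images: ~_subtype='png'~, ~_subtype='jpeg'~ (with ~MIMEImage~)
+- Text: ~_subtype='plain'~ (with ~MIMEText~)
+- Generic: ~_subtype='octet-stream'~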
+
+** Multiple Attachments
+
+Add multiple attachment blocks before ~print(msg.as_string())~
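+
+When several files of different types are attached, the content type can be guessed from the file name instead of hard-coded. This sketch swaps in the standard library's ~EmailMessage~ API rather than the ~MIMEMultipart~ classes used in Step 4; ~guess_mime~ and ~attach_bytes~ are hypothetical helpers and the attachment bytes are placeholders.
+
+#+begin_src python
+import mimetypes
+from email.message import EmailMessage
+
+def guess_mime(filename: str) -> tuple[str, str]:
+    """Return (maintype, subtype), falling back to application/octet-stream."""
+    ctype, encoding = mimetypes.guess_type(filename)
+    if ctype is None or encoding is not None:
+        return "application", "octet-stream"
+    maintype, subtype = ctype.split("/", 1)
+    return maintype, subtype
+
+def attach_bytes(msg: EmailMessage, data: bytes, filename: str) -> None:
+    """Attach raw bytes under a content type guessed from the file name."""
+    maintype, subtype = guess_mime(filename)
+    msg.add_attachment(data, maintype=maintype, subtype=subtype, filename=filename)
+
+msg = EmailMessage()
+msg.set_content("See attached.")
+attach_bytes(msg, b"%PDF-1.4 placeholder", "report.pdf")
+attach_bytes(msg, b"\x89PNG placeholder", "photo.png")
+print([part.get_content_type() for part in msg.iter_attachments()])
+#+end_src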
+
+* Troubleshooting
+
+** Password File Missing
+Ensure ~/.config/.cmailpass exists with the Proton Bridge SMTP password.
+
+** TLS Certificate Errors
+Use ~tls_certcheck off~ in msmtprc for Proton Bridge (localhost only).
+
+** Proton Bridge Not Running
+Start Proton Bridge before sending. Check if port 1025 is listening:
+#+begin_src bash
+ss -tlnp | grep 1025
+#+end_src
+
+* Example Usage
+
+Craig: "email workflow - send the November 3rd SOV to Christine"
+
+Claude:
+1. Searches contacts for "Christine" -> finds cciarmello@gmail.com
+2. Asks for subject and body if not provided
+3. Locates the SOV file in assets/
+4. Shows confirmation
+5. Sends via msmtp
+6. Verifies delivery in log
diff --git a/docs/workflows/retrospective-workflow.org b/docs/workflows/retrospective-workflow.org
new file mode 100644
index 0000000..440c14e
--- /dev/null
+++ b/docs/workflows/retrospective-workflow.org
@@ -0,0 +1,90 @@
+#+TITLE: Retrospective Workflow
+#+DESCRIPTION: How to run a retrospective after major problem-solving sessions
+
+* When to Run a Retrospective
+
+Run after:
+- Major debugging/troubleshooting sessions
+- Complex multi-step implementations
+- Any session where significant friction occurred
+- Sessions lasting more than an hour with trial-and-error
+
+* The Process
+
+** 1. Trigger the Retrospective
+
+Either party can say: "Let's do a retrospective" or "Retrospective time"
+
+** 2. Answer These Questions (Both Parties)
+
+*** What went well?
+Identify patterns worth reinforcing. Be specific.
+
+*** What didn't go well?
+Identify friction points, mistakes, wasted time. No blame, just facts.
+
+*** What behavioral changes should we make?
+Focus on *how we work*, not technical facts.
+- Good: "Confirm before rebooting"
+- Not behavioral: "AMD needs firmware 20260110"
+
+*** What would we do differently next time?
+Specific scenarios and better approaches.
+
+*** Any new principles to add?
+Distill lessons into short, actionable principles for retrospectives/PRINCIPLES.org.
+
+** 3. Copy and Update retrospectives/PRINCIPLES.org
+
+Copy the template retrospectives/PRINCIPLES.org.
+
+Using the copied template, add new behavioral principles learned. Keep them:
+- Short and actionable
+- Focused on behavior, not facts
+- Easy to remember and apply
+
+** 4. Create Retrospective Record
+
+Save to =docs/retrospectives/YYYY-MM-DD-topic.org= with:
+- Summary of what happened
+- Answers to the questions above
+- Link to the detailed session doc, if one exists
+
+** 5. Commit Changes
+
+Commit PRINCIPLES.org updates and retrospective record.
+
+* PRINCIPLES.org Structure
+
+#+BEGIN_SRC org
+,* How We Work Together
+,** Principle Name
+- Bullet points explaining the principle
+- When it applies
+- Why it matters
+
+,* Checklists
+,** Checklist Name
+- [ ] Step 1
+- [ ] Step 2
+#+END_SRC
+
+* Integration with Session Startup
+
+Add to the project's protocols.org or session startup:
+- Check if PRINCIPLES.org was updated since last session
+- Review any new principles before starting work
+
+* Example Principles (Starters)
+
+** Sync Before Action
+- Confirm before destructive or irreversible actions
+- State what you're about to do and wait for go-ahead
+
+** Verify Assumptions
+- When something "should work" but doesn't, question the assumption
+- Test one variable at a time
+
+** Clean Up After Yourself
+- Reset temporary changes before finishing
+- Verify system is in expected state