#+TITLE: Extract Email Workflow
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2026-02-06
* Overview
Extract the content and attachments of an EML file, rename the resulting files using a consistent naming convention, and refile them to =assets/=.
* When to Use This Workflow
When Craig says:
- "extract the email"
- "get the attachment from [email]"
- "pull the info from [email]"
- "process the email in inbox"
* Sources
The EML file may come from two places:
** Already in =inbox/=
Emails dropped into the project's =inbox/= directory via Syncthing, manual copy, or other means. These are ready for extraction immediately.
** From =~/.mail/=
Emails in the local maildir managed by mbsync/mu. Use the [[file:find-email.org][find-email workflow]] to locate the message, then copy (don't move) it into =inbox/= before proceeding. Never modify =~/.mail/= directly.
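The copy-don't-move rule can be illustrated with a minimal Python sketch. Python is used here to match the extraction script. The helper name =copy_to_inbox= is hypothetical. So is the choice to drop any Maildir flag suffix (the part after =:=) and give the copy an =.eml= extension. The find-email workflow remains the authority on locating the message.

#+begin_src python
import shutil
from pathlib import Path


def copy_to_inbox(maildir_file, inbox_dir, name=None):
    """Copy (never move) a message file into the project inbox and return the new path."""
    src = Path(maildir_file)
    dest_dir = Path(inbox_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming: strip a Maildir info suffix such as ":2,S" and add ".eml".
    dest = dest_dir / (name or (src.name.split(":")[0] + ".eml"))
    if dest.exists():
        # Refuse to clobber something already waiting in the inbox.
        raise FileExistsError(f"{dest} already exists")
    shutil.copy2(src, dest)  # copy2 leaves the original in place and preserves timestamps
    return dest
#+end_src

A call would look like =copy_to_inbox(path_from_find_email, "inbox")=, where =path_from_find_email= is a placeholder for the path find-email reports. The file under =~/.mail/= is only read, never modified.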
* The Workflow
** Step 0: Context Hygiene
Before starting, write out the session context file and ask Craig whether to compact the context. With many emails this is a long process, and if the context window collapses, important details may be lost. Writing out the session context first prevents that loss.
** Step 1: Run Extraction Script
Run the extraction script with =--output-dir= to perform the full pipeline (create temp dir, parse, auto-rename, extract attachments, refile, clean up):
#+begin_src bash
python3 docs/scripts/eml-view-and-extract-attachments.py inbox/message.eml --output-dir assets/
#+end_src
The script automatically:
- Parses email headers, body, and attachments
- Generates filenames using the naming convention (see below)
- Creates =.eml= (renamed copy), =.txt= (body text), and attachment files
- Checks for filename collisions in the output directory
- Moves all files to =assets/=
- Cleans up its temp directory
- Prints a summary of created files
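The parsing stage can be approximated with Python's standard =email= package. This is a minimal sketch for orientation, not the script's actual code. The function name =summarize_eml=, its returned keys, and the sample message values are hypothetical.

#+begin_src python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def summarize_eml(raw: bytes) -> dict:
    """Parse raw EML bytes into headers, plain-text body, and attachment filenames."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # get_body picks the best body part; here we only accept text/plain.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    # iter_attachments yields the non-body sub-parts of a multipart message.
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "date": str(msg["Date"]),
        "body": body,
        "attachments": attachments,
    }


if __name__ == "__main__":
    # Build a small sample message in memory (hypothetical values).
    demo = EmailMessage()
    demo["From"] = "Jonathan Smith <jonathan@example.com>"
    demo["Subject"] = "Re: Fw: 4319 Danneel Street"
    demo["Date"] = "Thu, 05 Feb 2026 11:36:00 -0600"
    demo.set_content("Letter attached.")
    demo.add_attachment(b"%PDF-1.4 stub", maintype="application",
                        subtype="pdf", filename="Ltr Carrollton.pdf")
    print(summarize_eml(demo.as_bytes()))
#+end_src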
** Step 2: Review Summary Output
Review the script's summary output, then:
- Check that the filenames look correct (rename manually if needed)
- Delete junk attachments (e.g., signature logos, tracking pixels)
- Delete the source EML from =inbox/= once the results are confirmed
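Deciding what counts as junk is a judgment call, but a quick listing can help. The sketch below flags small image attachments as candidates for a human to review. The 10 KB threshold, the extension list, the =*-ATTACH-*= glob, and the name =junk_candidates= are all hypothetical choices. Nothing is deleted automatically.

#+begin_src python
from pathlib import Path

# Hypothetical heuristics: tiny images are often signature logos or tracking pixels.
IMAGE_EXTS = {".png", ".gif", ".jpg", ".jpeg"}
MAX_JUNK_BYTES = 10 * 1024


def junk_candidates(directory, pattern="*-ATTACH-*"):
    """Return attachment files that look like logos or tracking pixels, smallest first."""
    hits = [
        p for p in Path(directory).glob(pattern)
        if p.is_file()
        and p.suffix.lower() in IMAGE_EXTS
        and p.stat().st_size <= MAX_JUNK_BYTES
    ]
    return sorted(hits, key=lambda p: p.stat().st_size)


if __name__ == "__main__":
    # List candidates with their sizes; deleting them stays a manual decision.
    for path in junk_candidates("assets"):
        print(f"{path.stat().st_size:>8}  {path.name}")
#+end_src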
** Step 3: Report Results
Report to Craig:
- Summary of email content
- What files were extracted and their final names
- Where files were saved
* Naming Convention
Pattern: =YYYY-MM-DD-HHMM-Sender-TYPE-Description.ext=
| Component | Source |
|-------------+---------------------------------------------------------------------------|
| YYYY-MM-DD | From the email's Date header (server time) |
| HHMM | Hours and minutes from the Date header |
| Sender | First name of the sender |
| TYPE | =EMAIL= for the email body (.eml and .txt), =ATTACH= for attachments |
| Description | Shortened subject line for EMAIL files; original filename for ATTACH files |
** Example
For an email from Jonathan Smith, subject "Re: Fw: 4319 Danneel Street", sent 2026-02-05 at 11:36, with a PDF attachment "Ltr Carrollton.pdf":
#+begin_src
2026-02-05-1136-Jonathan-EMAIL-Re-Fw-4319-Danneel-Street.eml
2026-02-05-1136-Jonathan-EMAIL-Re-Fw-4319-Danneel-Street.txt
2026-02-05-1136-Jonathan-ATTACH-Ltr-Carrollton.pdf
#+end_src
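The convention can be expressed as a small Python function. This is a sketch under stated assumptions, not the script's implementation. The function names are hypothetical. The time is taken as written in the Date header, with no timezone conversion. The first name is the first word of the From display name, falling back to the address's local part. The subject is only slugified, not shortened.

#+begin_src python
import os
import re
from email.utils import parseaddr, parsedate_to_datetime


def slugify(text):
    """Collapse every run of non-alphanumeric characters into a single hyphen."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")


def first_name(from_header):
    """First word of the display name, else the local part of the address (assumption)."""
    name, addr = parseaddr(from_header)
    return (name.split() or [addr.split("@")[0]])[0]


def build_name(date_header, from_header, kind, description, ext):
    """Assemble YYYY-MM-DD-HHMM-Sender-TYPE-Description.ext."""
    # Keep the wall-clock time written in the header; no timezone conversion.
    stamp = parsedate_to_datetime(date_header).strftime("%Y-%m-%d-%H%M")
    return f"{stamp}-{first_name(from_header)}-{kind}-{slugify(description)}{ext}"


def email_names(date_header, from_header, subject):
    """The .eml and .txt names for the email body (TYPE = EMAIL)."""
    return [build_name(date_header, from_header, "EMAIL", subject, ext)
            for ext in (".eml", ".txt")]


def attach_name(date_header, from_header, filename):
    """The name for one attachment (TYPE = ATTACH), keeping its extension."""
    stem, ext = os.path.splitext(filename)
    return build_name(date_header, from_header, "ATTACH", stem, ext)
#+end_src

Applied to the example above with a hypothetical =-0600= offset and address, =attach_name("Thu, 05 Feb 2026 11:36:00 -0600", "Jonathan Smith <jonathan@example.com>", "Ltr Carrollton.pdf")= gives =2026-02-05-1136-Jonathan-ATTACH-Ltr-Carrollton.pdf=.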
* Backwards-Compatible Mode
Without =--output-dir=, the script runs in its original, non-filing mode: it prints the metadata and body to stdout and extracts attachments into the same directory as the EML file. This is useful for quick inspection without filing.
#+begin_src bash
python3 docs/scripts/eml-view-and-extract-attachments.py inbox/message.eml
#+end_src
* Batch Processing
When processing multiple emails, complete all steps for one email before starting the next. Do not parallelize across emails.
* Principles
- *Never modify =~/.mail/=* — always copy first, work on the copy
- *EML is authoritative* — always keep it alongside extracted files
- *Use email Date header for timestamps* — not extraction time
- *Refer to find-email for maildir searches* — don't duplicate those instructions
- *Script checks for collisions* — won't overwrite existing files in output dir
- *One email at a time* — complete the full cycle before starting the next
- *Source EML stays untouched* — the script copies, never moves the source; Claude deletes after verifying results
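The collision rule amounts to checking every destination before moving anything. The sketch below assumes files are first staged in a temporary directory, as Step 1 describes. The function name =refile= and its all-or-nothing behaviour are assumptions, not documented script behaviour.

#+begin_src python
import shutil
from pathlib import Path


def refile(staged_files, output_dir):
    """Move staged files into output_dir, refusing if any name is already taken."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    staged = [Path(p) for p in staged_files]
    # Check all destinations first so a collision leaves nothing half-moved.
    clashes = [p.name for p in staged if (out / p.name).exists()]
    if clashes:
        raise FileExistsError(f"would overwrite: {', '.join(clashes)}")
    moved = []
    for p in staged:
        # shutil.move returns the destination path as a string.
        moved.append(Path(shutil.move(str(p), str(out / p.name))))
    return moved
#+end_src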
* Tools Reference
| Tool | Purpose |
|-------------------------------------+---------------------------------|
| eml-view-and-extract-attachments.py | Extract content and attachments |
Script location: =docs/scripts/eml-view-and-extract-attachments.py=