
#+TITLE: Process Meeting Transcript Workflow
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2026-02-03

* Overview

This workflow covers meeting recordings from start to finish: finding recordings, extracting audio, transcribing via AssemblyAI, identifying speakers, correcting errors, and archiving files.

* When to Use This Workflow

Trigger this workflow when:
- Craig says "process the transcript" or "process the recording" or similar
- New recording files (.mkv) appear in ~/sync/recordings/ after meetings
- Craig wants to process meeting recordings into labeled transcripts

* Prerequisites

- Recording file(s) exist in ~/sync/recordings/ (*.mkv)
- Calendar files available at ~/.emacs.d/data/*cal.org for meeting titles
- AssemblyAI transcription script at ~/.emacs.d/scripts/assemblyai-transcribe
- AssemblyAI API key stored in ~/.authinfo.gpg (machine api.assemblyai.com)
- ffmpeg available for audio extraction

* The Workflow

** Step 1: Identify Engagement and Write Session Context

Before starting transcript processing:

1. *Identify which engagement this meeting belongs to:*
   - DeepSat (default for current work)
   - Vineti (historical)
   - Salesforce (historical)
   - If unclear, ask Craig

2. *Set destination paths based on engagement:*
   - Assets: ~{engagement}/assets/~ (e.g., ~deepsat/assets/~)
   - Meetings: ~{engagement}/meetings/~ (e.g., ~deepsat/meetings/~)
   - Knowledge: ~{engagement}/knowledge.org~ for reference

3. Update docs/session-context.org with current status:
   - Note that we're about to process a meeting transcript
   - Get meeting name by checking ~/.emacs.d/data/*cal.org (match date/time to transcript timestamp)
   - If meeting not found in calendar, ask Craig for the meeting title

4. Ask Craig if he wants to compact the conversation context:
   - Transcript processing can use significant context
   - Compacting now preserves the session context file for recovery

** Step 2: Find Recording Files

Find and match recording files with calendar events:

1. *List recordings:* Find all .mkv files in ~/sync/recordings/
   #+begin_src bash
   ls -la ~/sync/recordings/*.mkv
   #+end_src

2. *Extract timestamps:* Parse date/time from each filename (format: YYYY-MM-DD_HH-MM-SS.mkv)

3. *Match with calendar:* Check ~/.emacs.d/data/*cal.org for meetings at those times
   #+begin_src bash
   grep -A2 "YYYY-MM-DD" ~/.emacs.d/data/*cal.org
   #+end_src

4. *Present selection table to Craig:*
   | Filename                | Meeting / Date-Time             |
   |-------------------------+---------------------------------|
   | 2026-02-03_10-00-00.mkv | DeepSat Standup (from calendar) |
   | 2026-02-03_14-30-00.mkv | 2026-02-03 14:30 (no match)     |

5. *Craig selects files:* One, several, or all files to process

6. *Queue for processing:* Selected files ordered oldest → newest for serial processing
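
Items 2 and 6 can be sketched in shell as follows. Both function names are hypothetical, and the sketch assumes every filename follows the YYYY-MM-DD_HH-MM-SS.mkv pattern exactly.

#+begin_src bash
# Print "YYYY-MM-DD HH:MM" for a recording named YYYY-MM-DD_HH-MM-SS.mkv.
recording_timestamp() {
  local base day clock
  base="$(basename "$1" .mkv)"   # e.g. 2026-02-03_14-30-00
  day="${base%%_*}"              # part before the underscore
  clock="${base#*_}"             # part after the underscore: HH-MM-SS
  printf '%s %s:%s\n' "$day" "${clock:0:2}" "${clock:3:2}"
}

# The names start with a zero-padded timestamp, so a plain lexical sort
# orders the queue oldest to newest.
queue_recordings() {
  printf '%s\n' "$@" | sort
}
#+end_src

For example, calling ~recording_timestamp~ on 2026-02-03_14-30-00.mkv prints "2026-02-03 14:30", which is the value to look for in the calendar files.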

** Step 3: Extract Audio

For each selected recording file, extract audio for transcription:

#+begin_src bash
ffmpeg -i ~/sync/recordings/FILENAME.mkv -vn -ac 1 -c:a aac -b:a 96k /tmp/FILENAME.m4a
#+end_src

Settings:
- =-vn= : no video (audio only)
- =-ac 1= : mono channel (sufficient for speech, smaller file)
- =-c:a aac= : AAC codec
- =-b:a 96k= : 96kbps bitrate (sufficient for speech transcription)

Output: /tmp/FILENAME.m4a (temporary, deleted after transcription)
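
When several recordings are queued, the command above can be wrapped so the output name is derived from the input. A minimal sketch follows; ~audio_path_for~ and ~extract_audio~ are hypothetical names, and the added =-nostdin= and =-y= flags are a suggestion for unattended runs, not part of the original command.

#+begin_src bash
# Derive the temporary audio path for a recording.
audio_path_for() {
  printf '/tmp/%s.m4a\n' "$(basename "$1" .mkv)"
}

# Extract mono 96k AAC audio and print the resulting path on success.
extract_audio() {
  local src="$1" out
  out="$(audio_path_for "$src")"
  # -nostdin stops ffmpeg from reading the terminal or a loop's input;
  # -y overwrites a stale temp file left by an earlier failed run.
  ffmpeg -nostdin -y -i "$src" -vn -ac 1 -c:a aac -b:a 96k "$out" \
    && printf '%s\n' "$out"
}

# Usage: audio="$(extract_audio "$HOME/sync/recordings/FILENAME.mkv")"
#+end_src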

** Step 4: Transcribe with AssemblyAI

1. *Run transcription:*
   #+begin_src bash
   ~/.emacs.d/scripts/assemblyai-transcribe /tmp/FILENAME.m4a > ~/sync/recordings/FILENAME.txt
   #+end_src

2. *Clean up:* Delete the intermediate .m4a file after successful transcription
   #+begin_src bash
   rm /tmp/FILENAME.m4a
   #+end_src

3. *Output format:* The script produces speaker-diarized output:
   #+begin_example
   Speaker A: First speaker's text here.
   Speaker B: Second speaker's response.
   Speaker A: First speaker continues.
   #+end_example

4. Continue to Step 5 to verify the output files before identifying speakers.
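
Because the redirect in item 1 creates the .txt file even when transcription fails, the two commands can be chained so the audio is deleted only after a non-empty transcript exists. This is a sketch under assumptions: it assumes the script exits non-zero on failure (not confirmed here), and ~transcribe_recording~ and the ~TRANSCRIBE_CMD~ override are hypothetical names.

#+begin_src bash
transcribe_recording() {
  local audio="$1" transcript="$2"
  # TRANSCRIBE_CMD is a hypothetical override, useful for dry runs.
  local script="${TRANSCRIBE_CMD:-$HOME/.emacs.d/scripts/assemblyai-transcribe}"
  # Write to a temporary name so a failed run never leaves a partial transcript.
  if "$script" "$audio" > "$transcript.partial" && [ -s "$transcript.partial" ]; then
    mv "$transcript.partial" "$transcript"
    rm -f "$audio"
  else
    rm -f "$transcript.partial"
    echo "transcription failed; kept $audio for retry" >&2
    return 1
  fi
}

# Usage: transcribe_recording /tmp/FILENAME.m4a "$HOME/sync/recordings/FILENAME.txt"
#+end_src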

** Step 5: Locate Files

Confirm the transcript and recording files are ready:

1. *Verify transcript exists:*
   #+begin_src bash
   ls -la ~/sync/recordings/FILENAME.txt
   #+end_src

2. *Verify recording exists:*
   #+begin_src bash
   ls -la ~/sync/recordings/FILENAME.mkv
   #+end_src

3. *Get meeting title:* If not already known from Step 2, check calendar
   - Calendar location: ~/.emacs.d/data/*cal.org
   - Match the meeting time to the transcript timestamp
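
The two existence checks can be combined into one guard that also catches an empty transcript, which =ls= alone would not flag. A minimal sketch; ~check_ready~ is a hypothetical name, and the optional second argument exists only so the check can be pointed at another directory.

#+begin_src bash
# Succeed only if STEM.txt is non-empty and STEM.mkv exists.
check_ready() {
  local stem="$1" dir="${2:-$HOME/sync/recordings}"
  [ -s "$dir/$stem.txt" ] || { echo "missing or empty transcript: $dir/$stem.txt" >&2; return 1; }
  [ -f "$dir/$stem.mkv" ] || { echo "missing recording: $dir/$stem.mkv" >&2; return 1; }
}

# Usage: check_ready 2026-02-03_10-00-00 && echo "ready for speaker identification"
#+end_src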

** Step 6: Read and Analyze Transcript

1. Read the full transcript file

2. Identify speakers by analyzing context clues:
   - Names mentioned in conversation ("Thanks, Ryan")
   - Role references ("as the developer", "on the IT side")
   - Project-specific knowledge (who works on what)
   - Previous meeting context (known attendees)
   - Speaking order patterns

3. Build a speaker identification table:
   | Speaker | Person | Evidence |
   |---------+--------+----------|
   | A       | Name   | Clues... |

** Step 7: Confirm Speaker Identifications

Present the speaker identification table to Craig for confirmation:
- List each speaker label and proposed name
- Include the evidence/reasoning
- Ask about any uncertain identifications
- Note any new people to add to notes.org contacts

** Step 8: Create Labeled Transcript

1. Replace all speaker labels with actual names

2. Correct transcription errors:
   - Common mishearings (names, technical terms, company names)
   - Known substitutions from this project (the Common Transcription Errors table has the full list):
     - "Vanetti" → "Vineti"
     - "Fresh" → "Vrezh"
     - "Clean4" / "clone" → "CLIN 4"
     - "Vascan" → "Vazgan"
     - "Hike" / "Ike" → "Hayk"
     - "High Tech" → "HyeTech"
     - "Java software" → "JAMA software"
     - "JSON" (person) → "Jason"
     - "their S" / "ress" → "Nerses"
   - Technical terms specific to DeepSat (GovCloud, AFRL, SOUTHCOM, etc.)

3. Save to engagement assets folder:
   - Location: ~{engagement}/assets/~ (e.g., ~deepsat/assets/~)
   - Filename: YYYY-MM-DD-meeting-name.txt
   - Example: deepsat/assets/2026-02-03-standup-ipm-grooming.txt
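
The label replacement and the unambiguous substitutions can be applied in one =sed= pass. This is only a sketch: ~label_and_correct~ is a hypothetical name, and the speaker mapping shown is a placeholder for whatever Craig confirms in Step 7. Substitutions whose source is an ordinary word ("Fresh", "clone", "Hike", "JSON") are left out on purpose, since a blind replace could corrupt legitimate uses; those are better reviewed by hand.

#+begin_src bash
# Read a raw transcript on stdin, write the labeled transcript to stdout.
# The Speaker A/B names below are placeholders, not confirmed identifications.
label_and_correct() {
  sed \
    -e 's/^Speaker A:/Craig:/' \
    -e 's/^Speaker B:/Nerses:/' \
    -e 's/Vanetti/Vineti/g' \
    -e 's/Vascan/Vazgan/g' \
    -e 's/Java software/JAMA software/g' \
    -e 's/High Tech/HyeTech/g'
}

# Usage:
#   label_and_correct < "$HOME/sync/recordings/FILENAME.txt" \
#     > deepsat/assets/2026-02-03-standup-ipm-grooming.txt
#+end_src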

** Step 9: Copy Recording to Meetings Folder

1. Ensure the engagement's meetings folder exists and that .gitignore contains the pattern ~*/meetings/*.mkv~

2. Copy the .mkv file with descriptive name:
   #+begin_src bash
   cp ~/sync/recordings/YYYY-MM-DD_HH-MM-SS.mkv {engagement}/meetings/YYYY-MM-DD_HH-MM-meeting-name.mkv
   #+end_src
   Example: ~deepsat/meetings/2026-02-03_11-02-standup-ipm-grooming.mkv~

3. Verify the copy succeeded
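
The rename and the verification in items 2 and 3 can be sketched as follows. The function names are hypothetical; the sketch assumes the source name ends in -SS.mkv and uses =cmp -s= as one possible way to confirm the copy is byte-identical.

#+begin_src bash
# Build YYYY-MM-DD_HH-MM-<slug>.mkv from YYYY-MM-DD_HH-MM-SS.mkv.
archive_name() {
  local base slug="$2"
  base="$(basename "$1" .mkv)"
  # ${base%-*} drops the trailing -SS (seconds) component.
  printf '%s-%s.mkv\n' "${base%-*}" "$slug"
}

# Copy the recording, verify it, and print the destination path on success.
copy_recording() {
  local src="$1" dest_dir="$2" slug="$3" dest
  dest="$dest_dir/$(archive_name "$src" "$slug")"
  mkdir -p "$dest_dir" && cp "$src" "$dest" && cmp -s "$src" "$dest" \
    && printf '%s\n' "$dest"
}

# Usage: copy_recording "$HOME/sync/recordings/2026-02-03_11-02-05.mkv" deepsat/meetings standup-ipm-grooming
#+end_src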

** Step 10: Update Session Context with Meeting Summary

Add a meeting summary section to docs/session-context.org including:

1. *Attendees* - List all participants

2. *Key Decisions* - Important choices made

3. *Action Items* - Tasks assigned, especially for Craig

4. *New Information* - Things learned that should be noted

5. *New Contacts* - People to add to notes.org

** Step 11: Write Session Context File

Update docs/session-context.org with:
- Files created this session (transcript, recording)
- Summary of what was processed
- Next steps (file to assets, update notes.org, etc.)

*** Context Management (for multiple files)

When processing multiple recordings in a queue:

1. *After completing each file's workflow*, update docs/session-context.org with:
   - Files processed so far
   - Current position in queue
   - Summary of meeting just processed

2. *Ask Craig if compact is needed* before starting next file:
   - Transcript processing uses significant context
   - Compacting preserves session context for recovery

3. *If autocompact occurs*, reread session-context.org to:
   - Resume at correct position in queue
   - Avoid reprocessing already-completed files

** Step 12: Clean Up Source Files

After successful completion of all previous steps, delete the source files from ~/sync/recordings/:

1. *Delete the original recording:*
   #+begin_src bash
   rm ~/sync/recordings/FILENAME.mkv
   #+end_src

2. *Delete the raw transcript* (if generated):
   #+begin_src bash
   rm ~/sync/recordings/FILENAME.txt
   #+end_src

This step happens last to ensure all files are safely copied/processed before deletion. If anything goes wrong earlier in the workflow, the source files remain intact for retry.
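
The "delete last" rule can be enforced mechanically by refusing to remove the sources until both archived outputs exist and are non-empty. A minimal sketch; ~cleanup_sources~ is a hypothetical name and its argument order is an illustration only.

#+begin_src bash
# Remove STEM.mkv and STEM.txt from the recordings directory, but only
# when the archived recording and the labeled transcript are both in place.
cleanup_sources() {
  local stem="$1" archived_mkv="$2" labeled_txt="$3"
  local dir="${4:-$HOME/sync/recordings}"
  if [ -s "$archived_mkv" ] && [ -s "$labeled_txt" ]; then
    rm -f "$dir/$stem.mkv" "$dir/$stem.txt"
  else
    echo "archive incomplete; leaving $dir/$stem.* in place" >&2
    return 1
  fi
}

# Usage:
#   cleanup_sources 2026-02-03_11-02-05 \
#     deepsat/meetings/2026-02-03_11-02-standup-ipm-grooming.mkv \
#     deepsat/assets/2026-02-03-standup-ipm-grooming.txt
#+end_src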

* Output Files

| File               | Location                                                | Purpose                            |
|--------------------+---------------------------------------------------------+------------------------------------|
| Labeled transcript | {engagement}/assets/YYYY-MM-DD-meeting-name.txt         | Corrected transcript for reference |
| Meeting recording  | {engagement}/meetings/YYYY-MM-DD_HH-MM-meeting-name.mkv | Video for review (gitignored)      |
| Session context    | docs/session-context.org                                | Crash recovery, meeting summary    |
| Knowledge base     | {engagement}/knowledge.org                              | Team, infrastructure, corrections  |

* Common Transcription Errors

Keep this list updated as new patterns emerge:

| Heard As      | Correct       | Context                                        |
|---------------+---------------+------------------------------------------------|
| Vanetti       | Vineti        | Company where Craig, Nerses, Eric, Ryan worked |
| Fresh         | Vrezh         | Developer name                                 |
| Clean4, clone | CLIN 4        | Contract milestone                             |
| Vascan        | Vazgan        | MagicalLabs AI team member                     |
| Hike, Ike     | Hayk          | CTO name                                       |
| High Tech     | HyeTech       | Armenian tech community org                    |
| Java software | JAMA software | Requirements traceability tool                 |
| JSON (person) | Jason         | DevSecOps or advisor                           |
| their S, ress | Nerses        | CEO name                                       |
| sir Keith     | Sarkis        | BD/investor relations                          |
| Fastgas       | MagicalLabs   | Armenian AI contractor                         |
| Sitelix       | Cytellix      | CMMC security/compliance partner               |

* Tips

1. *Read the whole transcript first* - Context from later in the meeting often helps identify speakers from earlier

2. *Use the calendar* - Meeting names help set expectations for who attended

3. *Check engagement knowledge.org* - Team roster and transcription corrections specific to this engagement

4. *Ask about unknowns* - If a new person appears, ask Craig for context

5. *Note new learnings* - Update engagement knowledge.org with new contacts, corrections, or context after processing

* Validation Checklist

- [ ] Engagement identified and destination paths set
- [ ] Session context written before starting
- [ ] Recording files listed and matched with calendar
- [ ] Craig selected files to process
- [ ] Audio extracted to .m4a (mono, 96k AAC)
- [ ] AssemblyAI transcription completed
- [ ] Intermediate .m4a file deleted
- [ ] Transcript file verified
- [ ] All speakers identified
- [ ] Speaker identifications confirmed with Craig
- [ ] Transcript corrected and saved to {engagement}/assets/
- [ ] Recording copied to {engagement}/meetings/ with proper name
- [ ] Session context updated with meeting summary
- [ ] New contacts/info flagged for {engagement}/knowledge.org update
- [ ] (If multiple files) Queue position tracked in session context
- [ ] Source files deleted from ~/sync/recordings/